Non-ASCII characters are not correctly displayed in PDF when served via HttpResponse and AJAX

Tags: , , , ,



I have generated a PDF file which contains Cyrillic characters (non-ASCII) with ReportLab. For this purpose I have used the “Montserrat” font, which support such characters. When I look in the generated PDF file inside the media folder of Django, the characters are correctly displayed:

enter image description here

I have embedded the font by using the following code in the function generating the PDF:

from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

pdfmetrics.registerFont(TTFont('Montserrat', 'apps/Generic/static/Generic/tff/Montserrat-Regular.ttf'))
canvas_test = canvas.Canvas("media/"+filename, pagesize=A4)
canvas_test.setFont('Montserrat', 18)
canvas_test.drawString(10, 150, "Some text encoded in UTF-8")
canvas_test.drawString(10, 100, "как поживаешь")
canvas_test.save()

However, when I try to serve this PDF via HttpResponse, the Cyrillic characters are not properly displayed, despite being displayed in the Montserrat font:

enter image description here

The code that serves the PDF is the following:

# Return the pdf as a response
fs = FileSystemStorage()
if fs.exists(filename):
    with fs.open(filename) as pdf:
        response = HttpResponse(
            pdf, content_type='application/pdf; encoding=utf-8; charset=utf-8')
        response['Content-Disposition'] = 'inline; filename="'+filename+'"'
        return response

I have tried nearly everything (using FileResponse, opening the PDF with with open(fs.location + "/" + filename, 'rb') as pdf…) without success. Actually, I do not understand why, if ReportLab embeddes correctly the font (local file inside media folder), the file provided to the browser is not embedding the font.

It is also interesting to note that I have used Foxit Reader via Chrome or Edge to read the PDF. When I use the default PDF viewer of Firefox, different erroneous characters are displayed. Actually the font seems to be also erroneous in that case:

enter image description here

Edit

Thanks to @Melvyn, I have realized that the error did not lay in the response directly sent from the Python view, but in the success code in the AJAX call, which I leave hereafter:

$.ajax({
    method: "POST",
    url: window.location.href,
    data: { trigger: 'print_pdf', orientation: orientation, size: size},
    success: function (data) {
        if (data.error === undefined) {
            var blob = new Blob([data]);
            var link = document.createElement('a');
            link.href = window.URL.createObjectURL(blob);
            link.download = filename + '.pdf';
            link.click();
        }
    }
 });

This is the part of the code that is changing somehow the encoding.

Solution with the ideas from comments

I finally come up with a solution thanks to all the comments I have received, specially from @Melvyn. Instead of creating a Blob object, I have just set the responseType of the AJAX to Blob type. This is possible since JQuery 3:

$.ajax({
    method: "POST",
    url: window.location.href,
    xhrFields:{
        responseType: 'blob'
    },
    data: { trigger: 'print_pdf', orientation: orientation, size: size},
    success: function (data) {
        if (data.error === undefined) {
            var link = document.createElement('a');
            link.href = window.URL.createObjectURL(data);
            link.download = filename + '.pdf';
            link.click();
        }
    }
 });

I hope this post helps people with the same problem while generating PDFs in non-ASCII (Cyrillic) characters. It took me several days…

Answer

You are doing some encoding/recoding, because if you look at the diff between the files, it’s littered with unicode replacement characters:

% diff -ua Cyrillic_good.pdf Cyrillic_wrong.pdf > out.diff

% hexdump out.diff|grep 'ef bf bd'|wc -l
    2659

You said you tried without setting the encoding and charset, but I don’t think that was tested properly – most likely you saw an aggressively browser-cached version.

The proper way to do this is to use FileResponse, pass in the filename and let Django figure out the right content type.

The following is a reproducible test of a working situation:

First of all, put Cyrillic_good.pdf (not wrong.pdf), in your media root.

Add the following to urls.py:

#urls.py
from django.urls import path
from .views import pdf_serve

urlpatterns = [
    path("pdf/<str:filename>", pdf_serve),
]

And views.py in the same directory:

#views.py
from pathlib import Path

from django.conf import settings
from django.http import (
    HttpResponseNotFound, HttpResponseServerError, FileResponse
)

def pdf_serve(request, filename: str):
    pdf = Path(settings.MEDIA_ROOT) / filename
    if pdf.exists():
        response = FileResponse(open(pdf, "rb"), filename=filename)
        filesize = pdf.stat().st_size
        cl = int(response["Content-Length"])
        if cl != filesize:
            return HttpResponseServerError(
                f"Expected {filesize} bytes but response is {cl} bytes"
            )
        return response

    return HttpResponseNotFound(f"No such file: {filename}")


Now start runserver and request http://localhost:8000/pdf/Cyrillic_good.pdf.

If this doesn’t reproduce a valid pdf, it is a local problem and you should look at middleware or your OS or little green men, but not the code. I have this working locally with your file and no mangling is happening.

In fact, the only way to get a mangled pdf now is browser cache or response being modified after Django sends it, since the content length check would prevent sending a file that has different size then the one on disk.

JS Part

I would expect the conversion to happen in the blob constructor as it’s possible to hand a blob a type. I’m not sure the default is binary-safe. It’s also weird your data has an error property and you pass the entire thing to the blob, but we can’t see what promise you’re reacting on.
success: function (data) {
    if (data.error === undefined) {
        console.log(data) // This will be informative
        var blob = new Blob([data]);
        var link = document.createElement('a');
        link.href = window.URL.createObjectURL(blob);
        link.download = filename + '.pdf';
        link.click();
    }
}


Source: stackoverflow