
GemBox Support Forum

Problem with special characters converting .DOC files to PDF

Hi,

I have a problem converting a .DOC file that contains special French accented characters (é, è, etc.).
It works perfectly with .DOCX, but if the source document is .DOC, there is a problem:
[image]

Hi,

Can you send us your DOC file so that we can reproduce this issue?

Currently, I cannot say for sure, but is it possible that you have HTML content that’s saved into a file with a “.doc” extension?
If that is the case, you’ll need to specify the right encoding using the HtmlLoadOptions.Encoding property.
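To illustrate why the encoding matters, here is a minimal sketch in plain Python (not GemBox code; the helper name `decode_with` is made up for this example). It assumes the text was saved as UTF-8 and that a reader guesses the Windows-1252 code page instead, which is one common way accented characters end up garbled:

```python
def decode_with(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given encoding, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")


# The bytes a UTF-8 encoded HTML file would contain for two accented letters.
raw = "é è".encode("utf-8")

# Assumption: the reader guesses a Windows code page instead of UTF-8.
wrong = decode_with(raw, "cp1252")
# The reader is told the actual encoding.
right = decode_with(raw, "utf-8")

print(repr(wrong))  # 'Ã© Ã¨'
print(repr(right))  # 'é è'
```

Each accented letter is two bytes in UTF-8, so a single-byte code page shows it as two unrelated characters. Telling the loader the real encoding avoids that.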

Anyway, to tell you exactly why the problem occurs, I’ll need to investigate the file.

Regards,
Mario

Hi Mario,

Thank you for the reply.
Unfortunately, I cannot send you the document because it’s confidential.
Nevertheless, I’ll try the encoding and let you know the results.

Thank you,

Regards,
Diamantis

Hi Diamantis,

Any progress so far?

Perhaps you could open your DOC file in some text editor (like Notepad, Notepad++, or VS Code) and take a small screenshot of its content at the beginning.
With that, we should be able to conclude what format it actually is.
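If a screenshot is inconvenient, the same check can be done on the first few bytes of the file. The following is a rough sketch in plain Python, not part of GemBox; the function names `sniff_format` and `sniff_file` and the returned labels are made up for this example. It relies on well-known signatures: a binary DOC file is an OLE2 compound file, a DOCX file is a ZIP package, and RTF and HTML files start with readable text.

```python
# Signature of an OLE2 compound file, the container used by binary .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header of a ZIP archive; .docx files are ZIP packages.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "doc"
    if head.startswith(ZIP_MAGIC):
        return "zip"  # possibly DOCX; other ZIP-based formats match too
    if head.startswith(b"{\\rtf"):
        return "rtf"
    # Skip a UTF-8 byte order mark and leading whitespace before text checks.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<!doctype html", b"<html", b"<?xml")):
        return "html-or-xml"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 64 bytes of the file at path and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(64))
```

Calling `sniff_file` on the problematic file (the path is whatever your file is called) returns "doc" only for a true binary DOC file. A result of "html-or-xml" would point to the encoding scenario mentioned earlier.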

If the file is actually in DOC format, then you could try editing it with Microsoft Word: remove everything except one word with that special character, resave it, and check whether the issue remains.
If it does, then you can send us that small DOC file.
This won’t work if the file is not of DOC format because on resave Microsoft Word will change it to DOC format.

Regards,
Mario

Hi Mario,

Indeed, if I remove everything except the problematic words, it works.
The problem comes from the header of the document, where text is mixed with images.

Thank you for your help!
🙂

Hi Diamantis,

What about the screenshot? Is the file really in DOC format or something else?

Unfortunately, without reproducing your issue I’m unable to resolve it for you.
Can you try removing any confidential data from the file and send it to us?
Or, can you try recreating the document with the content that you found to be problematic and send it to us?

Regards,
Mario

Hi Mario,

Here is a screenshot (opened with Notepad++).

Apparently the problem is in the header.
When I create a new .doc document and put in special characters (French, Greek, etc.), it seems to work fine.
But with this specific text in the header, there is a problem.

Thank you

Hi,

From the screenshot, I see that this is indeed DOC format.

I have now tried creating a new DOC file with those special characters in the header.
Unfortunately, I was unable to reproduce any issue with them.

Here is the input DOC file I used:

And the resulting PDF file I got:

Regards,
Mario