Hi,
I have a problem converting a .DOC file that contains French accented characters (é, è, etc.).
It works perfectly with .DOCX, but if the initial document is .DOC then there is a problem:
Hi,
Can you send us your DOC file so that we can reproduce this issue?
Currently, I cannot say for sure, but is it possible that you have HTML content that’s saved into a file with a “.doc” extension?
If that is the case, you’ll need to specify the right encoding using the HtmlLoadOptions.Encoding property.
Anyway, to tell you exactly why the problem occurs, I’ll need to investigate the file.
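To illustrate why the encoding matters here, the following is a minimal, library-independent sketch in Python (it does not use HtmlLoadOptions itself). It assumes, purely for illustration, that the text was saved as UTF-8 and then read as Windows-1252, which is one common way accented characters get garbled.

```python
# Illustration only: the same bytes decoded with two different encodings.
text = "é è à"

# Bytes as a file saved in UTF-8 would hold them (assumed for this example).
raw = text.encode("utf-8")

# A reader that assumes Windows-1252 turns each accented letter into two characters.
wrong = raw.decode("cp1252")

# A reader that uses the encoding the file was saved in recovers the text.
right = raw.decode("utf-8")

print(wrong)
print(right)
```

If the converted output shows pairs of odd characters where a single accented letter should be, a mismatch of this kind is a likely cause.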
Regards,
Mario
Hi Mario,
Thank you for the reply.
Unfortunately, I cannot send you the document because it’s confidential.
Nevertheless, I’ll try the encoding and let you know the results.
Thank you,
Regards,
Diamantis
Hi Diamantis,
Any progress so far?
Perhaps you could open your DOC file in some text editor (like Notepad, Notepad++, or VS Code) and take a small screenshot of its content at the beginning.
With that, we should be able to conclude what format it actually is.
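The same check can be done without a screenshot by looking at the first few bytes of the file. Below is a rough sketch in Python, independent of any conversion library; the function names `sniff_format` and `sniff_file` are made up for this example, and the check only covers a handful of well-known signatures.

```python
# Well-known leading byte signatures.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (binary .doc, also .xls/.ppt)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP package (.docx is one of these)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "ole2 (binary doc)"
    if head.startswith(ZIP_MAGIC):
        return "zip (possibly docx)"
    if head.startswith(b"{\\rtf"):
        return "rtf"
    # Skip a UTF-8 byte order mark and leading whitespace before looking for markup.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    if text.startswith(b"<?xml"):
        return "xml"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the start of the file at `path` and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(64))
```

For example, `sniff_file("report.doc")` (a placeholder file name) reporting `html` or `rtf` would mean the file only carries a ".doc" extension and is not a binary DOC file.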
If the file really is in DOC format, then you could try editing it in Microsoft Word: remove everything except one word with that special character, resave it, and check whether the issue remains.
If it does, then you can send us that small DOC file.
This won’t work if the file is not of DOC format because on resave Microsoft Word will change it to DOC format.
Regards,
Mario
Hi Mario,
Indeed, if I remove everything except the problematic words, it works.
The problem comes from the header of the document, where text is mixed with images.
Thank you for your help!
Hi Diamantis,
What about the screenshot, is it really a DOC format or something else?
Unfortunately, without reproducing your issue I’m unable to resolve it for you.
Can you try removing any confidential data from the file and send it to us?
Or, can you try recreating the document with the content that you found to be problematic and send it to us?
Regards,
Mario
Hi Mario,
Here is a screenshot (opened with Notepad++).
Apparently the problem is in the header.
When I create a new .doc document and put special characters in it (French, Greek, etc.), it seems to work fine.
But with this specific text in the header, there is a problem.
Thank you