Detect encoding of text file

Hi GemBox Team,

Is there a way to determine the encoding of a TXT file when loading it from an existing file into a DocumentModel?

I am handling files encoded in either UTF-8 or Windows-1252, and some non-ASCII characters in the Windows-1252 files are not recognized. This is because UTF-8 is the default encoding in the TXT load options, but setting the encoding to Windows-1252 breaks the UTF-8 files instead. How can I dynamically select the correct encoding when loading a TXT file?

Kind regards,
Sarah

Hi Sarah,

Unfortunately, GemBox.Document currently doesn’t provide a way to detect a file's encoding.

However, you could try the UTF.Unknown package, which detects the character encoding of a file, stream, or byte array:
https://www.nuget.org/packages/UTF.Unknown/

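using System.Text;
using GemBox.Document;
using UtfUnknown;

// If using the free version, set the license key before using GemBox.Document.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

string path = "input.txt";

// Detect the file's encoding; Detected is null when nothing could be determined, so fall back to UTF-8.
var result = CharsetDetector.DetectFromFile(path);
var detectedEncoding = result.Detected?.Encoding ?? Encoding.UTF8;

var document = DocumentModel.Load(path,
    new TxtLoadOptions() { Encoding = detectedEncoding });
// ...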
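Since your files are known to be either UTF-8 or Windows-1252, a lighter alternative that needs no extra package might be to decode the bytes as strict UTF-8 and fall back to Windows-1252 when that fails. This is a minimal sketch assuming only those two encodings occur; the EncodingGuesser class and its methods are hypothetical names, and on .NET Core / .NET 5+ code page 1252 is only available after registering CodePagesEncodingProvider.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: assumes every input file is either UTF-8 or Windows-1252.
public static class EncodingGuesser
{
    // Strict UTF-8 decoder: no BOM emitted, throws on invalid byte sequences.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    static EncodingGuesser()
    {
        // On .NET Core / .NET 5+, code page 1252 requires this provider to be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Detect(byte[] bytes)
    {
        try
        {
            // Valid UTF-8 (including plain ASCII) decodes without throwing.
            StrictUtf8.GetString(bytes);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume the legacy Windows-1252 code page.
            return Encoding.GetEncoding(1252);
        }
    }

    public static Encoding DetectFromFile(string path) => Detect(File.ReadAllBytes(path));
}
```

The result could then be passed to GemBox.Document in the same way, e.g. new TxtLoadOptions() { Encoding = EncodingGuesser.DetectFromFile(path) }. Unlike a statistical detector such as UTF.Unknown, this check only distinguishes the two encodings from your question, and a Windows-1252 file whose bytes happen to form valid UTF-8 would be misclassified, although that is uncommon for real text.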

Regards,
Mario

Hi Mario,

Thanks, that sounds good. I will have a look at it.

Kind regards,
Sarah