Exception on Load

I have a large set of PDF files uploaded by our users. A large share of them (60%) throw the following exceptions when I attempt to load them via GemBox.Pdf.PdfDocument.Load(path), which means we can’t do anything with those files.

Invalid character ‘n’ was read at index 0 in keyword “trailer”.
Invalid character was detected while reading a PDF cross-reference table subsection.

Any advice?

Hi Jason,

Without investigating those PDF files, it’s impossible for me to say anything for sure.
Nevertheless, it seems like you have invalid PDF files that don’t follow the PDF specification.

GemBox.Pdf is able to make some repairs to invalid PDFs, but currently it’s unable to repair invalid IDs of indirect objects, damaged cross-reference tables, or invalid syntax.

Note that PDF is a file format that supports “lazy-loading”, meaning that only the objects required to view a given page need to be loaded; the other objects do not have to be read from the PDF file. GemBox.Pdf also reads PDF files in this “lazy” fashion.

This is especially useful for huge PDF files that contain several thousand pages, because you can read a specific page very fast. But lazy reading relies on the cross-reference table to tell the reader where each object is located in the file. With this kind of invalid PDF file, the “lazy reading” feature is impossible because all objects would have to be iterated to find them.
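
To illustrate what “lazy” reading depends on, here is a minimal Python sketch. It is not GemBox.Pdf code and says nothing about how GemBox.Pdf is implemented; the function names (read_xref_offsets, load_object, build_sample) are made up for this example. It assumes a classic cross-reference table (not a cross-reference stream) and ignores incremental updates.

```python
import re


def read_xref_offsets(data):
    """Return {object number: byte offset} from a classic xref table.

    Sketch only: handles a single table, no /Prev chain, no xref streams.
    """
    # The last "startxref" keyword is followed by the table's byte offset.
    tail = data.rfind(b"startxref")
    if tail < 0:
        raise ValueError("no startxref keyword found")
    m = re.match(rb"startxref\s+(\d+)", data[tail:])
    if m is None:
        raise ValueError("startxref is not followed by an offset")
    pos = int(m.group(1))
    if data[pos:pos + 4] != b"xref":
        raise ValueError("startxref does not point at an xref table")

    offsets = {}
    obj_num = 0
    for raw in data[pos:].splitlines()[1:]:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(b"trailer"):
            break
        header = re.fullmatch(rb"(\d+) (\d+)", line)
        if header:
            # Subsection header: "<first object number> <entry count>".
            obj_num = int(header.group(1))
            continue
        entry = re.fullmatch(rb"(\d{10}) (\d{5}) ([nf])", line)
        if entry is None:
            raise ValueError("invalid xref entry: %r" % line)
        if entry.group(3) == b"n":  # "n" = in use, "f" = free
            offsets[obj_num] = int(entry.group(1))
        obj_num += 1
    return offsets


def load_object(data, offsets, number):
    """Jump straight to one object without scanning the others."""
    start = offsets[number]
    end = data.index(b"endobj", start) + len(b"endobj")
    return data[start:end]


def build_sample():
    """Build a tiny two-object PDF in memory with a correct xref table."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    out = b"%PDF-1.4\n"
    positions = []
    for obj in objects:
        positions.append(len(out))
        out += obj
    xref_pos = len(out)
    out += b"xref\n0 3\n0000000000 65535 f \n"
    for p in positions:
        out += b"%010d 00000 n \n" % p
    out += b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n"
    out += b"%d\n%%%%EOF\n" % xref_pos
    return out
```

With a correct table, load_object(pdf, read_xref_offsets(pdf), 2) reads object 2 directly. If the offsets or the table syntax are wrong, that shortcut is gone and a reader has to scan the whole file for “N G obj” markers instead.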

Nevertheless, note that we have an internal support ticket in which we keep track of this feature’s priority (support for fixing missing or invalid cross-reference tables), and we plan to implement it sometime in the future. But at this moment, I’m afraid I cannot say exactly when that will be.
This is not on our current roadmap, so it won’t be done within the first quarter of next year.

Regards,
Mario

They are valid files; I’m able to open them with any PDF viewer. Is there any diagnostic I could run on my end and provide to you?

Hi Jason,

Just because you can open them with PDF applications doesn’t mean they are valid.
You see, in quite a few cases the applications will perform a silent repair.

One way to notice this: when you open the file in Adobe and then immediately try to close it, the application asks whether you want to save the changes.
This indicates that the application had to make some changes to the file while reading it.
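
If you want a rough check on your side, a short script along these lines can flag the most common cross-reference problems. This is only a sketch in Python, unrelated to the GemBox.Pdf API; the name diagnose_pdf is made up for this example, and it tests only a few things (the header position, the startxref offset, and what that offset points at). An empty result does not prove that a file is valid.

```python
import re
import sys


def diagnose_pdf(data):
    """Return a list of problems found; empty means these checks passed."""
    problems = []

    # The header should be at byte 0; extra leading bytes can shift offsets.
    header = data.find(b"%PDF-", 0, 1024)
    if header < 0:
        problems.append("no %PDF- header in the first 1024 bytes")
    elif header > 0:
        problems.append(
            "%d bytes precede the %%PDF- header; xref offsets may be shifted"
            % header
        )

    # The last startxref keyword gives the offset of the newest xref section.
    tail = data.rfind(b"startxref")
    if tail < 0:
        problems.append("no startxref keyword found")
        return problems
    m = re.match(rb"startxref\s+(\d+)", data[tail:])
    if m is None:
        problems.append("startxref is not followed by a byte offset")
        return problems
    pos = int(m.group(1))
    if pos >= len(data):
        problems.append(
            "startxref offset %d is past the end of the file (%d bytes)"
            % (pos, len(data))
        )
        return problems

    target = data[pos:pos + 32]
    if target.startswith(b"xref"):
        # Classic table: a trailer keyword should follow it.
        if b"trailer" not in data[pos:tail]:
            problems.append("no trailer keyword between xref and startxref")
    elif re.match(rb"\d+\s+\d+\s+obj", target):
        pass  # cross-reference stream (PDF 1.5+); not inspected here
    else:
        problems.append(
            "startxref offset %d points at %r, not at an xref table or stream"
            % (pos, target[:12])
        )
    return problems


if __name__ == "__main__":
    # Usage: python diagnose.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            found = diagnose_pdf(fh.read())
        print(path, "->", found if found else "no problems found by this check")
```

Running it over the failing files and grouping them by the reported problem should show whether they share one cause, for example bytes prepended before the header by the upload step.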

Anyway, we can run the diagnostics for you. Please send us your PDF file(s) so that we can take a look at them.

Regards,
Mario