Adding a table of contents to a dynamically built PDF document

Hi,

We are currently building a solution that allows our clients to upload files in different formats (Docx, Doc and Pdf).
Our solution then combines all these files into one large file with a fixed structure (a cover, a table of contents and other elements).

So far, we deliver the resulting file in Word (Docx). To do so, we convert every file that is not already in Docx format (Doc => Docx and Pdf => Docx). It works most of the time (some files are converted with a strange layout, but nothing crazy).

Now we need to export the resulting file as Pdf, and this is where the problem lies.
We are able to merge all the files together, but we are not able to create a table of contents.

To explain a bit more, we convert every file that is not already in Pdf format (Doc => Pdf and Docx => Pdf) and merge them. The resulting file may exceed 300 pages.
We’d like to build a table of contents from the content of the document (as is done in Word), but we have not been able to achieve that so far.

The only solution we have is to first create the resulting file in Word (Docx), then update the table of contents using C#, and finally save the file as Pdf.
We cannot do that, because the conversion takes too much time and, as I said earlier, some conversions produce a strange layout.

Our question is: is it possible to build a table of contents from scratch inside an existing Pdf document? The table of contents should list every “header” from the document, as a Word document does.

If my situation isn’t clear enough, I’ll be happy to explain more.

Thank you!

I presume you’re using the default PDF reader in GemBox.Document.
Try using the high-fidelity PDF reader instead:

However, since your goal is to generate a merged PDF, I would suggest you keep those original PDF files and merge them directly.

First note that for converting DOC, DOCX, ODT, etc. files to PDF you’ll need GemBox.Document, but for merging multiple PDF files into one you’ll need GemBox.Pdf:

Now regarding the TOC, the problem is that there are no “Heading” styles in PDF files like what you have in Word files. There are no outline levels on paragraph elements that would indicate if the paragraph should be listed in TOC.
Also, converting PDF to DOCX will not resolve this problem because the resulting DOCX still won’t have any “Heading” paragraph.

So, to create some kind of TOC element from PDF files you’ll need to define what paragraphs you’ll consider as “Heading” paragraphs.
However, you mentioned that your client is uploading the files, so I’m not sure that you’ll be able to have an accurate indicator of “Heading” paragraphs because the client could upload a PDF file with any kind of content.
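
To illustrate what "defining Heading paragraphs" could look like, here is a minimal, library-agnostic sketch in Python. It is not GemBox code: it assumes you have already extracted the text lines of the merged PDF together with their page number and font size. The `TextLine` type, the `guess_headings` function and the threshold values are all hypothetical names and numbers chosen for illustration. The heuristic treats short lines that are noticeably larger than the most common font size as headings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical record for one extracted line of text."""
    page: int         # 1-based page number in the merged PDF
    text: str
    font_size: float  # in points


def guess_headings(lines, ratio=1.2, max_len=80):
    """Return (page, text) pairs for lines that look like headings.

    Assumption: the most common font size is the body text size, and a
    heading is a short, non-empty line at least `ratio` times larger.
    """
    if not lines:
        return []
    # Take the most frequent font size as the body text size.
    body_size = Counter(line.font_size for line in lines).most_common(1)[0][0]
    headings = []
    for line in lines:
        text = line.text.strip()
        if text and len(text) <= max_len and line.font_size >= body_size * ratio:
            headings.append((line.page, text))
    return headings


sample = [
    TextLine(1, "Introduction", 16.0),
    TextLine(1, "Some body text.", 11.0),
    TextLine(1, "More body text.", 11.0),
    TextLine(2, "Scope", 16.0),
    TextLine(2, "Body again.", 11.0),
]
toc_entries = guess_headings(sample)
```

The resulting (page, text) pairs could then feed whatever TOC or bookmark mechanism you use. With arbitrary client uploads, a heuristic like this will misclassify lines in some files, so the thresholds would need tuning against real samples.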

Do you perhaps have some sample files of what’s expected to be uploaded?

Regards,
Mario

I presume you’re using the default PDF reader in GemBox.Document.
Try using the high-fidelity PDF reader instead:

We are already using the high-fidelity reader. We even tried converting the document with the online converter on the “Convert PDF to Word (DOCX) from C# / VB.NET applications” example page.
I cannot upload a PDF file here, but the file is available at https://documentation.medial.ca/documentation/assignation-temporaire-dun-travail.pdf
Converting it to Word gives an unexpected result, even with high-fidelity.

First note that for converting DOC, DOCX, ODT, etc. files to PDF you’ll need GemBox.Document, but for merging multiple PDF files into one you’ll need GemBox.Pdf:

We have both. This part is fine.

Now regarding the TOC, the problem is that there are no “Heading” styles in PDF files like what you have in Word files. There are no outline levels on paragraph elements that would indicate if the paragraph should be listed in TOC.

Thanks for the answer. That explains why we were unable to build from scratch a TOC based on dynamic content.

Do you perhaps have some sample files of what’s expected to be uploaded?

Yes and no :) Yes, we have some sample files, but no, because the uploads can be anything, including files with a strange layout.
To give a bit more detail, our solution helps our clients build an OHS program for their employees and their businesses. Depending on their business model (office workers vs. factory workers vs. kindergarten, etc.), their OHS program will be completely different.

So, since we must output a clean and professional document, it must contain a TOC. I guess we’ll have no choice but to create a Word document, build the TOC, and then convert it to PDF if needed. Is that right?

Thanks

No, you’ll still have the same issue at hand.
When you convert a Word document to PDF, you still won’t have the “Heading” paragraphs.