
GemBox Support Forum

Headers, footers, section breaks and page breaks in Html to Docx conversion

Is there clear documentation on how to properly use headers, footers and breaks in Html so they convert correctly to Docx?

I have them working to some degree, but sometimes a small change to my Html stops the page breaks from being inserted properly, or adds page breaks where they don't belong, often right after a header, so I end up with a blank first page. I'm finding the Html very fragile when it comes to converting correctly to Docx, even though the browser respects all my page breaks when I actually print or save to PDF.

As a related issue, is it possible to build the Html in a way that it will use different headers or footers in different sections of the resulting document? Our site allows someone to download a single document or download sets of documents. If they download sets then it builds a page that has all the content on it and converts the entire thing to one Docx file. It would be nice if I could change the header to match the sub-document. I’ve tried using elements in the html, hoping that might result in a different section in the Docx but that doesn’t seem to work.

I know that ultimately it would be better for me to build these directly in code as Docx from scratch, but I'm working with what I have. Switching from the old Cloud Convert method to GemBox was faster than rebuilding everything to create the document directly with the GemBox object model.

Hi Steve!

Regarding the conversion of page-related properties from HTML to DOCX, I recommend that you take a look at the following example: Convert HTML pages to PDF files from C# / VB.NET applications.

GemBox.Document currently doesn't support loading multiple headers/footers from HTML. Only the following HTML is supported:
If <header> is the first element in the HTML file, then its content will be read as a document’s default header; if <footer> is the last element in the HTML file, then its content will be read as a document’s default footer.
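The rule above can be sketched as a minimal input file. This assumes that "first element" and "last element" refer to the children of `<body>`; the title and text are placeholders, and only the placement of `<header>` and `<footer>` comes from the reply.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Report</title>
</head>
<body>
  <!-- First element in the body: expected to become the default header -->
  <header>
    <p>Quarterly report</p>
  </header>

  <!-- Ordinary body content -->
  <h1>Introduction</h1>
  <p>Body text goes here.</p>

  <!-- Last element in the body: expected to become the default footer -->
  <footer>
    <p>Confidential</p>
  </footer>
</body>
</html>
```

Because only these two positions are recognized, a single HTML file can carry at most one header and one footer.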

If you want to build HTML with multiple sections that have different headers/footers, then I recommend loading each HTML section, with its own header/footer, into a separate DocumentModel instance, and then cloning the Section(s) of that instance into a final DocumentModel instance that you save to DOCX or any other file format.

Regards,
Stipo

Thank you, that's what I suspected. I had seen that document and have successfully created headers and footers, and based on it I guessed there was no way to create more than one using just HTML. Thank you for the suggested workaround for multiple sections; I will explore that idea a bit.

That page says "It uses a subset of CSS properties and some additional arbitrary properties from Microsoft Word (like mso-pagination and mso-rotate)." Is there a list anywhere of the mso- properties it supports, or is it just those two? It would be useful to know what else I could use to help with the Docx conversion.

Is there a better way to force page breaks? I've found them to be somewhat inconsistent. I can add CSS to force page breaks after certain elements, and very frequently that CSS is ignored. I'm pretty sure it doesn't support CSS pseudo-classes. I've had the best luck with an element carrying a dedicated class, which I insert wherever I want a page break, with CSS that sets that class to always break the page. I still sometimes get page breaks at odd locations where the break makes no sense: it leaves me with a mostly empty page and the next page with only a little content on it. If I print a physical page or use the browser's print/save to PDF, I don't get those random page breaks.

I know conversion to Docx is complex so I don’t expect it to be perfect, I’m just hoping for a bit more documentation about how to craft my Html to make it work better.

Hi,

No, we do not document our support for proprietary HTML/CSS extensions. Looking at our source code, I see that we currently support mso-rotate, mso-pagination, mso-field-code, mso-header-margin, mso-footer-margin, and mso-outline-level.
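For illustration, here is how those properties typically appear in HTML written in Microsoft Word's style. The reply only confirms that the property names are recognized; the values, the use of `@page` for the header/footer margins, the placement of `mso-rotate` on a table cell, and the `mso-field-code` syntax are assumptions based on Word's own HTML conventions, so each one should be checked against the converted DOCX.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <style>
    @page {
      margin: 2cm;
      /* Assumed: distance from the page edge to the header and footer */
      mso-header-margin: 1cm;
      mso-footer-margin: 1cm;
    }
    /* Assumed: Word's widow/orphan control for ordinary paragraphs */
    p { mso-pagination: widow-orphan; }
  </style>
</head>
<body>
  <header><p>Report header</p></header>

  <!-- Assumed: marks the paragraph as outline level 1 -->
  <p style="mso-outline-level: 1">Chapter 1</p>
  <p>Body text.</p>

  <table>
    <tr>
      <!-- Assumed: rotates the cell text by 90 degrees -->
      <td style="mso-rotate: 90">Rotated</td>
    </tr>
  </table>

  <!-- Assumed: PAGE and NUMPAGES field codes with placeholder results -->
  <footer>
    <p>Page <span style="mso-field-code: PAGE">1</span> of <span style="mso-field-code: NUMPAGES">1</span></p>
  </footer>
</body>
</html>
```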

Page breaks should be easy to achieve with page-break-before: always, as shown in the example Convert HTML pages to PDF files from C# / VB.NET applications.
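A minimal sketch of that approach, where each block that should start on a new page carries the break rule itself instead of relying on a separate marker element. The class name `new-page` is hypothetical; only the `page-break-before: always` declaration comes from the reply.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <style>
    /* Hypothetical class: every element with it should start on a new page */
    .new-page { page-break-before: always; }
  </style>
</head>
<body>
  <!-- First block: no break class, so no break is requested at the very top -->
  <h1>Document A</h1>
  <p>Content of document A.</p>

  <!-- Should start a new page -->
  <h1 class="new-page">Document B</h1>
  <p>Content of document B.</p>

  <!-- Should start a new page -->
  <h1 class="new-page">Document C</h1>
  <p>Content of document C.</p>
</body>
</html>
```

Leaving the class off the first block avoids requesting a break before any content exists, which is a cautious choice given the blank first pages described earlier in the thread.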

If you have a problem with this approach, I recommend that you create a support ticket with the problematic HTML, and we will investigate.

Regards,
Stipo