GemBox Support Forum

Incorrect table contents when reading a PDF document

I ran into a couple of problems when reading tables from a PDF that I found as a sample of a table spanning multiple pages.

In the first big table (index 2) there should be 19 columns, but the rows after the header row report 21 cells, i.e. there are 2 extra empty columns for no obvious reason (the first one and one in the middle).

Probably some invisible formatting elements are involved.

More importantly, the table on the second page reports only 16 cells per row, and the empty columns are just missing from the Table object. The problem with the missing empty columns is that this makes it impossible to combine the contents of multiple tables. Is there a way to instruct the component to preserve the columns with empty values?

Hi Ivan,

Apologies for the slightly late response.

Anyway, please note that GemBox.Document’s PDF reader never left the BETA stage and I’m afraid that we’re not actively working on its recognition algorithms.

In the future, we intend to replace the current PDF reader with a newer implementation from GemBox.Pdf, but I’m afraid that won’t be available any time soon.


OK, if you do work on this, I think adding a PreserveEmptyColumns option would definitely be a good (and probably easy to implement) extension for dealing with tables that continue across multiple pages.
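
Until something like that exists, one possible workaround is to normalize the tables on the caller's side. The sketch below is only an illustration in Python on plain lists of cell strings. It does not use the GemBox API, and the helper names `drop_empty_columns` and `combine_tables` are hypothetical. It assumes that each table's body rows (without the header row) have already been extracted as text, and that the same columns are empty on every page, which may not hold for a real document. Under that assumption, removing the all-empty columns from every page makes the widths agree, so the rows can be concatenated.

```python
from typing import List

Row = List[str]


def drop_empty_columns(rows: List[Row]) -> List[Row]:
    """Remove every column whose cells are blank in all of the given rows."""
    if not rows:
        return []
    # Pad short rows so every row has the same number of cells.
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]
    # Keep only the column positions that hold text in at least one row.
    keep = [i for i in range(width) if any(row[i].strip() for row in padded)]
    return [[row[i] for i in keep] for row in padded]


def combine_tables(tables: List[List[Row]]) -> List[Row]:
    """Concatenate the body rows of several per-page tables.

    Raises ValueError if the widths still differ after the empty columns
    are dropped, because the rows cannot then be aligned reliably.
    """
    combined: List[Row] = []
    expected = None
    for rows in tables:
        for row in drop_empty_columns(rows):
            if expected is None:
                expected = len(row)
            elif len(row) != expected:
                raise ValueError(
                    f"row has {len(row)} cells, expected {expected}"
                )
            combined.append(row)
    return combined


# Page 1 keeps two spurious empty columns; page 2 has already lost them.
page1 = [["", "a", "", "b"], ["", "c", "", "d"]]
page2 = [["e", "f"]]
merged = combine_tables([page1, page2])
```

If a column happens to be empty on one page but filled on another, the widths will not match and the sketch raises an error instead of guessing, which is exactly the case where an option to preserve empty columns would be needed.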