Paginating HTML document with style in % causes stack overflow

Loading an HTML file with the following content results in a stack overflow and stops the running application:

<html>
<body>
<div style="margin:30%;">
Test
</div>
</body>
</html>

Calling DocumentModel.Load() on the file and then GetPaginator() on the loaded document reproduces the issue.
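
For reference, the two calls named above can be put together roughly as follows. This is only a minimal sketch: the file name "input.html" is a placeholder for a file containing the markup above, and the "FREE-LIMITED-KEY" license call is an assumption about how the component is set up in a test project.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        // Assumption: free-limited mode; replace with your own license key.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // "input.html" is a placeholder for the file with the markup above.
        var document = DocumentModel.Load("input.html");

        // According to the report, the stack overflow happens while paginating.
        var paginator = document.GetPaginator();
        var pages = paginator.Pages;
    }
}
```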

The same happens when the width or another style property is set in %, as long as no parent element has a style with an absolute value (in px, for example). Setting the body width to 1000px, for instance, avoids the issue.

Is it possible to set the body width/height of the document before getting the paginator? Or is there a way to have an exception thrown before the stack overflow stops the application?

Hi Sarah,

I was unable to reproduce your issue. I tried the following:

string html = @"<html>
<body>
<div style='margin:30%;'>
Test
</div>
</body>
</html>";

var document = new DocumentModel();
document.Content.LoadText(html, LoadOptions.HtmlDefault);
document.Save("output.pdf");

I also tried loading that content from an “input.html” file, and still no issue occurred.

Just in case, please try again with the latest bugfix version:
https://www.gemboxsoftware.com/document/downloads/bugfixes.html

Or the latest NuGet package:
https://www.nuget.org/packages/GemBox.Document/

Regards,
Mario

Hi Mario,

I was still using GemBox.Bundle version 47.0.1006. Updating to 47.0.1024 (which includes GemBox.Document 35.0.1300) fixed the issue.

Thank you for the investigation!

Regards,
Sarah

Hi Mario,

I now have a similar issue with another HTML file. The application does not hit a stack overflow, but pagination (loading the document and then calling GetPaginator().Pages) takes more than 5 minutes, which causes my application to throw a TimeoutException.

The HTML file is not large; it is the body of a newsletter email. I uploaded it here.

I tried it with the newest version of GemBox.Document. Could you have a look at it with this HTML content?

Regards,
Sarah

Hi Sarah,

Unfortunately, GemBox.Document currently has an issue with saving HTML content that has a large number of nested tables: they take quite some time to render to PDF.

As a workaround for now, can you try the following?

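using System.Linq;
using GemBox.Document;
using GemBox.Document.Tables;

static void Main()
{
    var document = DocumentModel.Load("document.html");

    // Remove empty cells and rows (cells containing pictures are kept).
    foreach (Table table in document.GetChildElements(true, ElementType.Table))
        CleanTable(table);

    // Flatten wrapper tables: replace each single-cell parent table
    // with the table nested inside it, until none are left.
    Table nestedTable;
    while ((nestedTable = GetNestedTable(document)) != null)
    {
        // Table -> TableCell -> TableRow -> Table
        var parentTable = (Table)nestedTable.Parent.Parent.Parent;
        nestedTable.TableFormat = parentTable.TableFormat.Clone();
        // Copy the nested table in front of its parent, then delete the parent.
        parentTable.Content.Start.InsertRange(nestedTable.Content);
        parentTable.Content.Delete();
    }

    document.Save("output.pdf");
}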

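// Removes cells that hold only whitespace or non-breaking spaces (and no picture),
// then removes rows that are left without cells.
static void CleanTable(Table table)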
{
    foreach (var row in table.Rows.ToList())
    {
        foreach (var cell in row.Cells.ToList())
        {
            if (cell.GetChildElements(true, ElementType.Picture).Any())
                continue;

            string content = cell.Content.ToString().Replace("\x00A0", "").Trim();
            if (string.IsNullOrEmpty(content))
                row.Cells.Remove(cell);
        }

        if (row.Cells.Count == 0)
            table.Rows.Remove(row);
    }
}

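// Returns the first table that is the only block inside a single-cell, single-row table
// which is itself the only block inside another single-cell, single-row table; otherwise null.
static Table GetNestedTable(DocumentModel document)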
{
    foreach (Table table in document.GetChildElements(true, ElementType.Table))
    {
        var parentCell = table.Parent as TableCell;
        if (parentCell == null)
            continue;

        var parentRow = parentCell.Parent;
        var parentTable = parentRow.Parent;
        if (parentCell.Blocks.Count != 1 || parentRow.Cells.Count != 1 || parentTable.Rows.Count != 1)
            continue;

        var parentParentCell = parentTable.Parent as TableCell;
        if (parentParentCell == null)
            continue;

        var parentParentRow = parentParentCell.Parent;
        var parentParentTable = parentParentRow.Parent;
        if (parentParentCell.Blocks.Count != 1 || parentParentRow.Cells.Count != 1 || parentParentTable.Rows.Count != 1)
            continue;

        return table;
    }

    return null;
}

I hope this helps.

Regards,
Mario

Hi Mario,

Thank you for the suggested workaround.
Since we try to avoid analysing in detail the content of the documents that our application processes, we will continue to handle the problem by catching the TimeoutException.
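
One way the timeout handling mentioned above could look is sketched below. It is only an illustration under stated assumptions: TimeoutHelper and RunWithTimeout are hypothetical names that are not part of GemBox.Document, and the thread does not say how the application's timeout is actually implemented. The helper relies solely on standard .NET task APIs.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of GemBox.Document.
static class TimeoutHelper
{
    // Runs 'work' on a thread-pool thread and waits at most 'timeout' for it.
    // Throws TimeoutException if the work has not finished in time.
    public static T RunWithTimeout<T>(Func<T> work, TimeSpan timeout)
    {
        Task<T> task = Task.Run(work);

        // WaitAny returns -1 when the timeout elapses before the task completes.
        int finished = Task.WaitAny(new Task[] { task }, timeout);
        if (finished == -1)
            throw new TimeoutException("Operation did not finish within " + timeout + ".");

        // Returns the result, or rethrows the original exception if the work failed.
        return task.GetAwaiter().GetResult();
    }
}

// Possible usage with the calls named in this thread (5 minutes is an example value):
// var pages = TimeoutHelper.RunWithTimeout(
//     () => DocumentModel.Load("document.html").GetPaginator().Pages,
//     TimeSpan.FromMinutes(5));
```

Note that this only stops waiting; the pagination itself keeps running in the background until it completes. It also cannot guard against a stack overflow, which terminates the process instead of raising a catchable exception.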

Should I create a feature request for faster processing of HTML files with nested tables? Or is the issue already being worked on?

Best regards,
Sarah

We already have a backlog ticket for this and I’ve added your report to it in order to increase its priority.
But at the moment I cannot say when this could become available.
Please note that we prioritize greater time investments by the number of users requesting them and currently we’re working on some other features that have greater priority.

Hi Mario,

Alright, thank you for the information!

Best regards,
Sarah