Loading an HTML file with the following content results in a stack overflow and stops the running application:
<html>
<body>
<div style="margin:30%;">
Test
</div>
</body>
</html>
Calling DocumentModel.Load() on the file and then GetPaginator() on the loaded document reproduces the issue.
The same happens when the width or other style properties are set in percentages, as long as the parent element does not have a style with an absolute value (in px, for example). For instance, setting the body width to 1000px avoids the issue.
Is it possible to set the body width/height of the document before calling GetPaginator()? Or is there a way to get an exception thrown before the stack overflow terminates the application?
I was unable to reproduce your issue. I tried the following:
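using GemBox.Document;

// A license key must be set before using GemBox.Document.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

string html = @"<html>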
<body>
<div style='margin:30%;'>
Test
</div>
</body>
</html>";
var document = new DocumentModel();
document.Content.LoadText(html, LoadOptions.HtmlDefault);
document.Save("output.pdf");
I also tried loading that content from an "input.html" file, and still no issue occurred.
With another HTML file, I now have a similar issue. The application does not run into a stack overflow, but pagination (loading the document and then calling GetPaginator().Pages) takes more than 5 minutes, which causes my application to throw a TimeoutException.
The HTML file is not large; it is the body of a newsletter email. I uploaded it here.
I tried it with the latest version of GemBox.Document. Could you take a look at this HTML content?
Unfortunately, GemBox.Document currently has an issue with saving HTML content that contains a large number of nested tables: such documents take a long time to render to PDF.
As a workaround for now, could you try the following:
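using System.Linq;
using GemBox.Document;
using GemBox.Document.Tables;

class Program
{
    static void Main()
    {
        // A license key must be set before using GemBox.Document.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        var document = DocumentModel.Load("document.html");

        // Remove empty cells and rows from every table.
        foreach (Table table in document.GetChildElements(true, ElementType.Table))
            CleanTable(table);

        // Flatten redundant nesting: while some table is wrapped in two levels of
        // single-row, single-cell tables, replace its immediate wrapper table with it.
        Table nestedTable;
        while ((nestedTable = GetNestedTable(document)) != null)
        {
            // nestedTable -> TableCell -> TableRow -> Table (the immediate wrapper).
            var parentTable = (Table)nestedTable.Parent.Parent.Parent;

            // Keep the wrapper's formatting on the table that replaces it.
            nestedTable.TableFormat = parentTable.TableFormat.Clone();

            // Copy the nested table in place of the wrapper, then delete the wrapper
            // (which also removes the original nested table).
            parentTable.Content.Start.InsertRange(nestedTable.Content);
            parentTable.Content.Delete();
        }

        document.Save("output.pdf");
    }

    // Removes cells that contain no pictures and no visible text (non-breaking
    // spaces count as empty), then removes rows that are left without cells.
    static void CleanTable(Table table)
    {
        // ToList() takes snapshots so items can be removed while iterating.
        foreach (var row in table.Rows.ToList())
        {
            foreach (var cell in row.Cells.ToList())
            {
                // Keep cells that contain pictures, even if they have no text.
                if (cell.GetChildElements(true, ElementType.Picture).Any())
                    continue;

                string content = cell.Content.ToString().Replace("\u00A0", "").Trim();
                if (string.IsNullOrEmpty(content))
                    row.Cells.Remove(cell);
            }

            if (row.Cells.Count == 0)
                table.Rows.Remove(row);
        }
    }

    // Returns the first table that is the only block of a single-row, single-cell
    // table, which is itself the only block of another single-row, single-cell table.
    // Returns null if there is no such table.
    static Table GetNestedTable(DocumentModel document)
    {
        foreach (Table table in document.GetChildElements(true, ElementType.Table))
        {
            // First level: the table's immediate wrapper.
            var parentCell = table.Parent as TableCell;
            if (parentCell == null)
                continue;
            var parentRow = parentCell.Parent;
            var parentTable = parentRow.Parent;
            if (parentCell.Blocks.Count != 1 || parentRow.Cells.Count != 1 || parentTable.Rows.Count != 1)
                continue;

            // Second level: the wrapper's own wrapper.
            var parentParentCell = parentTable.Parent as TableCell;
            if (parentParentCell == null)
                continue;
            var parentParentRow = parentParentCell.Parent;
            var parentParentTable = parentParentRow.Parent;
            if (parentParentCell.Blocks.Count != 1 || parentParentRow.Cells.Count != 1 || parentParentTable.Rows.Count != 1)
                continue;

            return table;
        }
        return null;
    }
}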
Thank you for the suggested workaround.
Since we try to avoid analysing in detail the content of the documents our application processes, we will continue to handle the problem by catching the TimeoutException.
Should I create a Feature Request for faster processing of HTML files with nested tables? Or is the issue already being worked on?
We already have a backlog ticket for this and I’ve added your report to it in order to increase its priority.
But at the moment I cannot say when this could become available.
Please note that we prioritize larger time investments by the number of users requesting them, and we are currently working on other features with higher priority.
Is the backlog ticket you mentioned resolved yet?
I added a test document to check the workaround, and I recently noticed that the workaround no longer seems to be necessary. Is this just a coincidence with the test document, or was the issue fixed in a newer version?