Loading a small HTML document takes nearly two minutes

Hi GemBox Team,

I have a small HTML document (11 KB) that takes a long time to load, causing my application to run into a timeout. Executing the line DocumentModel.Load(filepath, LoadOptions.HtmlDefault) takes about 1:40 minutes (100 seconds).

Is there a problem with the document or can I somehow avoid the long loading time?

I am using version 47.0.1184-hotfix of GemBox.Bundle.
This is the content of the HTML document:

<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><meta name="copyright" content="(c) 2022 Deutsche Post AG">
    <meta name="viewport" content="width=device-width">




<style type="text/css">
<!--
    html, body {
        background-color: white;
        color: #333333;
        margin: 0;
        padding: 0;
        font-family: Arial, Helvetica, sans-serif;
        font-size: 15px;
        font-weight: normal;
        line-height: 22px; }

    td {
        font-family: Arial, sans-serif;
        font-size: 15px;
        line-height: 22px;
        vertical-align: top; }

    p { margin: 0; padding: 0; }
    p.h1 { font-size: 26px; margin-bottom: 22px; }
    a { color: #333333 !important; }
    a.mediumgrey { color: #9C9C9C !important; text-decoration: none; }
    img { border: 0; }
    tr.lightgrey-bg, td.lightgrey-bg { background-color: #F3F3F3; }
    td.mediumgrey { color: #9C9C9C; }
    td.small, p.small { font-size: 11px; line-height: 16px; color: #737373; }
    @media only screen and (max-width: 596px) {
        .show-for-large { display: none !important; }
        table .show-for-large {
            width: 0;
            mso-hide: all;
            overflow: hidden; }
    }

    sup, sub  { color: #666666; font-size: 0.8em; line-height: 100%;}
    .discount { text-decoration: line-through;  color: #666666;}

    table.admin, table.admin th, table.admin td { border: 1px solid lightgrey; }
    table.admin th, table.admin td { text-align: center; vertical-align: middle; }

    .teaserbutton{
    margin: 5px;
    padding: 5px;
    vertical-align: top;
    display:inline-block;
    line-height: 20px;
       font-size: 16px;
       font-weight:bold;
       min-width: 107px;
       background: #F3F3F3;
       padding: 10px;
       border-radius: 4px;
       color: rgba(0,0,0,0.9);
       }
       .dhl-header{
       background:#ffcc00;
       padding:5px;
       }

-->
</style>



</head>
<body>
    <table width="100%" cellpadding="0" cellspacing="0" border="0">
        <tr>
            <td>
                <table width="600" cellpadding="0" cellspacing="0" border="0" align="center">
                    <tr>
                        <td>
                            <table id="header" width="600" cellpadding="0" cellspacing="0" border="0" align="center">
                                <tr>
                                    <td>
                                        <img src="https://shop.deutschepost.de/shop/images/mailing/header.png" border="0" alt="Deutsche Post Shop" style="display: block;">
                                    </td>
                                </tr>
                            </table>
                        </td>
                    </tr>





    <tr>
        <td height="30"></td>
    </tr>











        <tr>
                <td>
                        <table width="540" cellpadding="0" cellspacing="0" border="0" align="center">
                                <tr>
                                        <td>













    <p>























                        Guten Tag

                        xxxxxxxx,






    </p>
    <p>&nbsp;</p>





                                                <p>




                                                                               <br>
                                                                               im Anhang dieser E-Mail finden Sie die Rechnung zu Ihrer Bestellung Nr.
                                                                               xxxxxxxxx
                                                                               als PDF-Datei.




                                                </p>
                                        </td>
                                </tr>
                        </table>
                <td>
        </tr>
















    <tr>
        <td height="30"></td>
    </tr>


            <tr>
                <td class="lightgrey-bg">
                    <table width="600" cellpadding="0" cellspacing="0" border="0" align="center">
                        <tr>
                            <td colspan="3" height="15"></td>
                        </tr>
                        <tr>
                            <td width="30"></td>
                            <td>
                                <table width="100%" cellpadding="0" cellspacing="0" border="0" align="center">


                                    <tr>
                                        <td width="220">
                                            <strong>
                                                Kundennummer:
                                            </strong>
                                        </td>
                                        <td width="320" style="text-align: right;">








                                                    xxxxxxxxxxxx


                                        </td>
                                    </tr>

                                    <tr>
                                        <td width="220">
                                            <strong>
                                                Bestellnummer:
                                            </strong>
                                        </td>
                                        <td width="320" style="text-align: right;">
                                            xxxxxxxxxxxxx
                                        </td>
                                    </tr>
                                    <tr>
                                        <td width="220">
                                            <strong>
                                                Bestelldatum:
                                            </strong>
                                        </td>
                                        <td width="320" style="text-align: right;">
                                            xxxxxxxxxxxx
                                        </td>
                                    </tr>









                                                <tr>
                                                    <td width="220">
                                                        <strong>
                                                            Zahlungsart:
                                                        </strong>
                                                    </td>
                                                    <td width="320" style="text-align: right;">






    xxxxxxxxx

                                                    </td>
                                                </tr>



                                </table>
                            </td>
                            <td width="30"></td>
                        </tr>
                        <tr>
                            <td colspan="3" height="15"></td>
                        </tr>
                    </table>
                </td>
            </tr>






    <tr>
        <td height="15"></td>
    </tr>













































        <tr>
                <td>
                        <table width="540" cellpadding="0" cellspacing="0" border="0" align="center">
                                <tr>
                                        <td>






















































                                                  <span></span>
                                        </td>
                                </tr>
                        </table>
                </td>
        </tr>


















    <tr>
        <td height="20"></td>
    </tr>


<tr>
        <td>
                <table width="540" cellpadding="0" cellspacing="0" border="0" align="center">
                        <tr>
                                <td>













                                                                <p>Für Rückfragen an unser Service Center nutzen Sie bitte unser Kontaktformular unter xxxxxxxx, so 
dass wir Ihr Anliegen zielgerichtet bearbeiten können.</p>
<p>&nbsp;</p>
<p>Mit freundlichen Grüßen</p>
<p>&nbsp;</p>
<p>Ihr Shop der Deutschen Post</p>



                                </td>
                        </tr>
                </table>
        </td>
</tr>






    <tr>
        <td height="20"></td>
    </tr>


















    <!-- Spacer //-->
    <tr>
        <td height="1" bgcolor="#dddddd"></td>
    </tr>
    <!-- // Spacer -->





<tr>
        <td>
                <table width="540" cellpadding="0" cellspacing="0" border="0" align="center">





    <tr>
        <td height="20"></td>
    </tr>


                        <tr>
                                <td>
















                                                                        <p>
  <strong>Deutsche Post AG</strong><br>
  Service- und Versandzentrum<br>
  92631 Weiden</p>
<p>&nbsp;</p>
<p>E-Mail: xxxxxx</p>




                                </td>
                        </tr>





    <tr>
        <td height="20"></td>
    </tr>


                </table>
        </td>
</tr>






    <!-- Spacer //-->
    <tr>
        <td height="1" bgcolor="#dddddd"></td>
    </tr>
    <!-- // Spacer -->





<tr>
    <td>
        <table width="540" cellpadding="0" cellspacing="0" border="0" align="center">





    <tr>
        <td height="20"></td>
    </tr>


            <tr>
                <td class="mediumgrey">





                        © 2023 Deutsche Post AG<br>
<div style="font-size:12px; line-height:16px; margin:10px 0">
 Sitz Bonn; Registergericht Bonn HRB 6792<br>
 Vorstand: Dr. Tobias Meyer, Vorsitzender; Oscar de Bok, Pablo Ciano, Nikola Hagleitner, Melanie Kreis, Dr. Thomas Ogilvie, John Pearson, Tim Scharwath<br>
 Vorsitzender des Aufsichtsrates: Dr. Nikolaus von Bomhard
</div>
 | 

                </td>
            </tr>





    <tr>
        <td height="30"></td>
    </tr>


        </table>
    </td>
</tr>

Hi Sarah,

The problem occurs when retrieving the header image (https://shop.deutschepost.de/shop/images/mailing/header.png).
In other words, you can reproduce the issue with just this:

using System.Net.Http;

// Downloading the header image on its own is enough to reproduce the delay.
string url = "https://shop.deutschepost.de/shop/images/mailing/header.png";
using var client = new HttpClient();
using var response = await client.GetAsync(url);
byte[] data = await response.Content.ReadAsByteArrayAsync();
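As a diagnostic sketch (not part of the original thread): the reported load time of 1:40 is exactly 100 seconds, which is also the default of both HttpClient.Timeout and WebRequest.Timeout. One plausible but unconfirmed reading is that the image request stalls until a default timeout expires. Timing the same download with an explicit, shorter timeout would show whether the request ever completes. DownloadTimer and TimeDownloadAsync are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical diagnostic helper: times a single download with an explicit timeout.
public static class DownloadTimer
{
    public static async Task<(bool Succeeded, TimeSpan Elapsed)> TimeDownloadAsync(string url, TimeSpan timeout)
    {
        // Timeout must be set before the client sends its first request.
        using var client = new HttpClient { Timeout = timeout };
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await client.GetByteArrayAsync(url);
            return (true, stopwatch.Elapsed);
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports an exceeded Timeout as a TaskCanceledException.
            return (false, stopwatch.Elapsed);
        }
        catch (HttpRequestException)
        {
            // DNS, connection, or HTTP status failure.
            return (false, stopwatch.Elapsed);
        }
    }
}
```

Calling it with the header.png URL and, say, TimeSpan.FromSeconds(10) should either return quickly with Succeeded = true, or fail after roughly 10 seconds, which would point at the server not answering rather than at the HTML itself.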

Unfortunately, I'm not sure what the cause is.
I tried changing a few options on HttpMessageHandler and HttpClient, but none of them helped.

If you have any idea what the problem could be, let me know.
Alternatively, if you want to skip downloading that image or use a shorter timeout, handle the HtmlLoadOptions.ResourceLoading event, like this:

using GemBox.Document;

var htmlLoadOptions = new HtmlLoadOptions();
htmlLoadOptions.ResourceLoading += (sender, e) =>
{
    string url = e.Uri.OriginalString;
    // Download the resource synchronously and assign its bytes to "e.Data",
    // or skip the resource download by setting "e.Cancel" to true.
    // Avoid an async lambda here: it would return at its first await,
    // before "e.Data" has been set.
};

var document = DocumentModel.Load("input.html", htmlLoadOptions);

I hope this helps.

Regards,
Mario

Hi Mario,

Thank you for the investigation.
I tried using the Cancel setting to skip the resource download:

htmlLoadOptions.ResourceLoading += (sender, e) =>
{
    e.Cancel = false;
};

but the loading time remained the same. Did I misunderstand the Cancel option? Do I have to set it another way?

Best regards,
Sarah

Apologies, please try setting e.Cancel to true instead.

Hi Mario,

When I cancel the resource loading, the document loads within seconds, so the workaround is fine for this specific document.
However, applying it across my whole application would also stop other files, which don't have this problem, from loading their images, even though they could load them quickly. Is it possible to cancel loading a resource only if it takes longer than a specific number of seconds? In other words, can I configure a resource-loading timeout through HtmlLoadOptions?

Best regards,
Sarah

Hi Sarah,

Try this; it cancels loading a resource if no response arrives within 1 second (1000 ms).

using System.IO;
using System.Net;
using GemBox.Document;

var htmlLoadOptions = new HtmlLoadOptions();
htmlLoadOptions.ResourceLoading += (sender, e) =>
{
    try
    {
        // Download the resource ourselves, waiting at most 1000 ms for a response.
        var request = WebRequest.Create(e.Uri);
        request.Timeout = 1000;

        using var response = request.GetResponse();
        using var responseStream = response.GetResponseStream();
        using var memoryStream = new MemoryStream();

        // Hand the downloaded bytes to GemBox.
        responseStream.CopyTo(memoryStream);
        e.Data = memoryStream.ToArray();
    }
    catch (WebException)
    {
        // Timed out or failed: skip this resource instead of waiting.
        e.Cancel = true;
    }
};

var document = DocumentModel.Load("input.html", htmlLoadOptions);
document.Save("output.docx");

Regards,
Mario


Hi Mario,

This is exactly what I needed! Thanks 🙂

One small change on my side: I used System.Net.Http.HttpClient instead of System.Net.WebRequest, but both approaches worked fine for me.
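The HttpClient version itself isn't shown in the thread. A minimal sketch of one way to write it, assuming the ResourceLoading handler must set e.Data synchronously (as in Mario's WebRequest version) and that only http/https resources need a timeout. ResourceDownloader and TryDownload are hypothetical names, not GemBox APIs; the helper uses one shared HttpClient and a per-request CancellationTokenSource so each resource gets its own time budget.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Threading;

// Hypothetical helper (not part of GemBox): downloads an http/https resource
// with a per-request timeout and returns null instead of throwing on failure.
public static class ResourceDownloader
{
    // One shared client for all downloads; creating a client per request
    // can exhaust sockets under load.
    private static readonly HttpClient Client = new HttpClient();

    public static byte[]? TryDownload(Uri uri, TimeSpan timeout)
    {
        // The token cancels the request, including buffering the body,
        // once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // Block here: the handler has to set e.Data before it returns.
            using HttpResponseMessage response = Client
                .GetAsync(uri, cts.Token)
                .GetAwaiter()
                .GetResult();
            response.EnsureSuccessStatusCode();

            // The body is already buffered, so this read completes immediately.
            return response.Content.ReadAsByteArrayAsync().GetAwaiter().GetResult();
        }
        catch (OperationCanceledException)
        {
            // Timed out (surfaced as TaskCanceledException).
            return null;
        }
        catch (HttpRequestException)
        {
            // DNS or connection failure, or a non-success status code.
            return null;
        }
    }
}
```

Inside the ResourceLoading handler, this would be called only for http/https URIs, e.g. ResourceDownloader.TryDownload(e.Uri, TimeSpan.FromSeconds(1)): assign the result to e.Data when it is not null, and set e.Cancel to true otherwise. A per-request token is used instead of HttpClient.Timeout because a shared client's Timeout cannot be changed after it has sent its first request.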

Best regards,
Sarah