Only 'IPM.Note' messages are supported error

Brian · September 10, 2021, 8:13am

Hello, we are using GemBox to convert emails to PDFs, but for an unknown reason this email fails.
This is the email: File-Upload.net - BesprechungumdieInventurplanungzumachen.msg

We are using the latest version. I also tried it with the online example page, where I get an error too.
It would be great if you could tell us what's wrong and how we can fix it.

using System.Linq;
using System.Text.RegularExpressions;
using GemBox.Document;
using GemBox.Email;
using GemBox.Email.Mime;

class Program
{
    static void Main()
    {
        // If using Professional version, put your GemBox.Email serial key below.
        GemBox.Email.ComponentInfo.SetLicense("FREE-LIMITED-KEY");
        
        // If using Professional version, put your GemBox.Document serial key below.
        GemBox.Document.ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Load an email file.
        MailMessage message = MailMessage.Load("Attachment.msg");

        // Create a new document.
        DocumentModel document = new DocumentModel();

        // Import the email's content to the document.
        LoadHeaders(message, document);
        LoadBody(message, document);
        LoadAttachments(message.Attachments, document);

        // Save the document as PDF.
        document.Save("Export.pdf");
    }

    static void LoadHeaders(MailMessage message, DocumentModel document)
    {
        // Create HTML content from the email headers.
        var htmlHeaders = $@"
            <style>
              * {{ font-size: 12px; font-family: Calibri; }}
              th {{ text-align: left; padding-right: 24px; }}
            </style>
            <table>
              <tr><th>From:</th><td>{message.From[0].ToString().Replace("<", "&lt;").Replace(">", "&gt;")}</td></tr>
              <tr><th>Sent:</th><td>{message.Date:dddd, d MMM yyyy}</td></tr>
              <tr><th>To:</th><td>{message.To[0].ToString().Replace("<", "&lt;").Replace(">", "&gt;")}</td></tr>
              <tr><th>Subject:</th><td>{message.Subject}</td></tr>
            </table>
            <hr>";

        // Load the HTML headers to the document.
        document.Content.End.LoadText(htmlHeaders, LoadOptions.HtmlDefault);
    }

    static void LoadBody(MailMessage message, DocumentModel document)
    {
        if (!string.IsNullOrEmpty(message.BodyHtml))
            // Load the HTML body to the document.
            document.Content.End.LoadText(
                ReplaceEmbeddedImages(message.BodyHtml, message.Attachments),
                LoadOptions.HtmlDefault);
        else
            // Load the TXT body to the document.
            document.Content.End.LoadText(
                message.BodyText,
                LoadOptions.TxtDefault);
    }

    // Replace attached CID images to inlined DATA urls.
    static string ReplaceEmbeddedImages(string htmlBody, AttachmentCollection attachments)
    {
        var srcPattern =
            "(?<=<img.+?src=[\"'])" +
            "(.+?)" +
            "(?=[\"'].*?>)";

        // Iterate through the "src" attributes from HTML images in reverse order.
        foreach (var match in Regex.Matches(htmlBody, srcPattern, RegexOptions.IgnoreCase).Cast<Match>().Reverse())
        {
            var imageId = match.Value.Replace("cid:", "");
            Attachment attachment = attachments.FirstOrDefault(a => a.ContentId == imageId);

            if (attachment != null)
            {
                // Create inlined image data. E.g. "data:image/png;base64,AABBCC..."
                ContentEntity entity = attachment.MimeEntity;
                var embeddedImage = entity.Charset.GetString(entity.Content);
                var embeddedSrc = $"data:{entity.ContentType};{entity.TransferEncoding},{embeddedImage}";

                // Replace the "src" attribute with the inlined image.
                htmlBody = $"{htmlBody.Substring(0, match.Index)}{embeddedSrc}{htmlBody.Substring(match.Index + match.Length)}";
            }
        }

        return htmlBody;
    }

    static void LoadAttachments(AttachmentCollection attachments, DocumentModel document)
    {
        var htmlSubtitle = "<hr><p style='font: bold 12px Calibri;'>Attachments:</p>";
        document.Content.End.LoadText(htmlSubtitle, LoadOptions.HtmlDefault);

        foreach (Attachment attachment in attachments.Where(
            a => a.DispositionType == ContentDispositionType.Attachment &&
                 a.MimeEntity.ContentType.TopLevelType == "image"))
        {
            document.Content.End.InsertRange(
                new Paragraph(document, new Picture(document, attachment.Data)).Content);
        }
    }
}

This is just the standard example code.

Greetings Brian

Brian · September 10, 2021, 1:27pm

Hello, I also had another issue with this email:
https://filehorst.de/d/dIxxHuEx

It takes ages to load. I narrowed it down to this line of code from your example:

document.Content.End.LoadText(
    ReplaceEmbeddedImages(message.BodyHtml, message.Attachments),
    LoadOptions.HtmlDefault);

What could I do to make it faster?

I found out that the email contains an image link that points to a website; the image is downloaded from there and inserted in its place. How can I prevent that?

Saving and loading the text also takes very long for those emails, which is very strange.
Greetings Brian

mario.gembox · September 13, 2021, 8:33am

Hi Brian,

Regarding the failed email loading, the problem with that MSG file is that it is not an email.
If you open it in Microsoft Outlook, you’ll notice that it opens as a meeting.
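
If the conversion runs over a whole folder of MSG files, one way to cope with such items is to skip them instead of failing. The following is only a minimal sketch under stated assumptions: `TryLoadEmail` is a hypothetical helper, the loader is passed in as a delegate so the snippet compiles on its own, and matching on the 'IPM.Note' text from the error in this thread's title is an assumption, because the exception type is not stated here.

```csharp
using System;

static class EmailLoading
{
    // Hypothetical helper: calls the supplied loader and reports whether it succeeded.
    // Assumption: a non-email item (meeting, task, ...) fails with an exception whose
    // message mentions 'IPM.Note', as in the error quoted in this thread's title.
    public static bool TryLoadEmail<T>(string path, Func<string, T> load, out T message)
    {
        try
        {
            message = load(path);
            return true;
        }
        catch (Exception ex) when (ex.Message.Contains("IPM.Note"))
        {
            // Not a mail message: let the caller skip or log the file.
            message = default(T);
            return false;
        }
    }
}
```

In the example program this would be called as `EmailLoading.TryLoadEmail(path, p => MailMessage.Load(p), out MailMessage message)`, and the file would be skipped when it returns false. Any other exception still propagates, so real load errors are not hidden.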

Regarding the slow HTML loading, the problem is with this image URL:
https://info.cloudacademy.com/e2t/to/VWtMcp8cQyFnW3dWLxX1tT2jYW9ccHTJ4sgN6MW1m0Fbj1gjjmx103

It takes more than 30 seconds to load. For example, please check this:

var watch = Stopwatch.StartNew();

var options = new HtmlLoadOptions();
options.ResourceLoading += (sender, e) =>
{
    Console.WriteLine(watch.Elapsed);
    Console.WriteLine();
    Console.WriteLine($"Loading: {e.Uri}");
};

string html = ReplaceEmbeddedImages(message.BodyHtml, message.Attachments);
document.Content.End.LoadText(html, options);

watch.Stop();
Console.WriteLine($"Finished: {watch.Elapsed}");

This is probably a tracking pixel in the email’s message.
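
If the remote images are not needed, one option is to drop them from the HTML before it is loaded, so that nothing has to be downloaded. Below is a minimal sketch that uses only `System.Text.RegularExpressions`. `RemoveRemoteImages` is a hypothetical helper name, and the sketch assumes it is acceptable to lose every `<img>` whose source starts with http or https. Embedded `cid:` images are left alone so that `ReplaceEmbeddedImages` can still inline them.

```csharp
using System;
using System.Text.RegularExpressions;

static class RemoteImages
{
    // Matches a whole <img ...> tag whose src value starts with http:// or https://.
    private static readonly Regex RemoteImageTag = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']https?://[^""'>]*[""'][^>]*>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: removes remote images and keeps all other markup untouched.
    public static string RemoveRemoteImages(string html)
    {
        return RemoteImageTag.Replace(html, string.Empty);
    }
}
```

It could be applied before the existing helper, for example `ReplaceEmbeddedImages(RemoteImages.RemoveRemoteImages(message.BodyHtml), message.Attachments)`. A regular expression is not a full HTML parser, so unusual markup (for example an unquoted `src` value) would slip through.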

I hope this helps.

Regards,
Mario

Brian · September 17, 2021, 11:50am

Hey @mario.gembox, so now everything works, but this email takes forever to save and I don’t know why. I removed the pictures that would be downloaded and then tried to save the email without them.

I changed the code to this:

private string ReplaceEmbeddedImages(string htmlBody, AttachmentCollection attachments)
{
    var srcPattern =
        "(?<=<img.+?src=[\"'])" +
        "(.+?)" +
        "(?=[\"'].*?>)";

    // Iterate through the "src" attributes from HTML images in reverse order.
    foreach (var match in Regex.Matches(htmlBody, srcPattern, RegexOptions.IgnoreCase).Cast<Match>().Reverse())
    {
        // Remove the "src" value when it is a well-formed URI, so that nothing is downloaded.
        // Note: with UriKind.RelativeOrAbsolute this check may also match "cid:" values.
        if (Uri.IsWellFormedUriString(match.ToString(), UriKind.RelativeOrAbsolute))
        {
            // Replace the "src" attribute value with an empty string.
            htmlBody = $"{htmlBody.Substring(0, match.Index)}{htmlBody.Substring(match.Index + match.Length)}";
        }
        else
        {
            var imageId = match.Value.Replace("cid:", "");
            Attachment attachment = attachments.FirstOrDefault(a => a.ContentId == imageId);

            if (attachment != null)
            {
                // Create inlined image data. E.g. "data:image/png;base64,AABBCC..."
                ContentEntity entity = attachment.MimeEntity;
                var embeddedImage = entity.Charset.GetString(entity.Content);
                var embeddedSrc = $"data:{entity.ContentType};{entity.TransferEncoding},{embeddedImage}";

                // Replace the "src" attribute with the inlined image.
                htmlBody = $"{htmlBody.Substring(0, match.Index)}{embeddedSrc}{htmlBody.Substring(match.Index + match.Length)}";
            }
        }
    }

    return htmlBody;
}

But it still takes way too long to save. Here is the email:
https://www.file-upload.net/download-14693661/JetztSEMrushSensorausprobieren-undweiterecoae3a955d-defb-4c04-a5b5-7610868168af.eml.html

Greetings Brian

mario.gembox · September 20, 2021, 6:43am

Hi Brian,

Unfortunately, the problem occurs because the email’s body has a lot of nested table elements with different layouts (inline vs. floating), and GemBox.Document’s rendering engine takes a very long time to process them.

I’m afraid that at this moment we cannot provide an improvement for this.
We will try to address this in the future, but for now can you try using the following workaround before saving to PDF:

static void Main()
{
    // ...

    Table table = null;
    while ((table = GetNestedTable(document)) != null)
    {
        var parentParentTable = (Table)table.Parent.Parent.Parent.Parent.Parent.Parent;
        table.TableFormat = parentParentTable.TableFormat.Clone();

        parentParentTable.Content.Start.InsertRange(table.Content);
        parentParentTable.Content.Delete();
    }

    document.Save("output.pdf");
}

static Table GetNestedTable(DocumentModel document)
{
    foreach (Table table in document.GetChildElements(true, ElementType.Table))
    {
        var parentCell = table.Parent as TableCell;
        if (parentCell == null)
            continue;

        var parentRow = parentCell.Parent;
        var parentTable = parentRow.Parent;
        if (parentCell.Blocks.Count != 1 || parentRow.Cells.Count != 1 || parentTable.Rows.Count != 1)
            continue;

        var parentParentCell = parentTable.Parent as TableCell;
        if (parentParentCell == null)
            continue;

        var parentParentRow = parentParentCell.Parent;
        var parentParentTable = parentParentRow.Parent;
        if (parentParentCell.Blocks.Count != 1 || parentParentRow.Cells.Count != 1 || parentParentTable.Rows.Count != 1)
            continue;

        return table;
    }

    return null;
}

In short, the “while” loop looks for tables that are nested at least two levels deep and are the only content of their parent tables, and then replaces the outermost of those tables with the nested one.

Lastly, as an alternative, you could use the PdfSaveOptions.ProgressChanged event to cancel the saving if it takes too long.
For instance, check the following example:
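
The linked example is not reproduced here. As a rough sketch of the idea only: it assumes that the event arguments expose a `CancelOperation()` method and that a cancelled save throws `OperationCanceledException`. The 30-second limit and the `input.html` file name are arbitrary placeholders.

```csharp
using System;
using System.Diagnostics;
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Placeholder input; in this thread the document is built from the email instead.
        DocumentModel document = DocumentModel.Load("input.html");

        var watch = Stopwatch.StartNew();
        var saveOptions = new PdfSaveOptions();

        // Assumption: the handler is called repeatedly while saving and can cancel the operation.
        saveOptions.ProgressChanged += (sender, e) =>
        {
            if (watch.Elapsed > TimeSpan.FromSeconds(30))
                e.CancelOperation();
        };

        try
        {
            document.Save("output.pdf", saveOptions);
        }
        catch (OperationCanceledException)
        {
            // Saving took too long; fall back to something simpler or skip the email.
            Console.WriteLine("Saving was cancelled after 30 seconds.");
        }
    }
}
```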

I hope this helps.

Regards,
Mario

Brian · September 20, 2021, 8:26am

Thank you very much. On Wednesday I will try the workaround. The most important thing is to keep the content; the fancy HTML formatting is not that important.

Greetings Brian

Brian · September 22, 2021, 6:58am

Hey @mario.gembox, I have one last question: this email reports an "invalid URI hostname" error while saving. Is there a way to ignore this issue while saving, or how can I get the hostname?

https://www.file-upload.net/download-14698266/FWWireTransferNoticea46ac96d-eebe-46aa-bee2-7be13d972c15.eml.html

Greetings Brian

mario.gembox · September 22, 2021, 7:30am

Hi Brian,

Try using the latest bugfix version:

Does this solve your issue?

Regards,
Mario

Brian · September 22, 2021, 7:35am

I will try it, but I think we are already using the latest version. Your workaround did help, and the saving is now much faster, but on my side I now have this email HTML code in my PDF:

I can do some pattern matching and just remove it, but maybe you can adjust your code too.

Greetings Brian.

Brian · September 22, 2021, 7:42am

I looked in the content of the document:

And saw that the code was copied there from the HTML.
I also looked up the version: we already have the newest one, and this email fails on your website too.

Brian · September 22, 2021, 7:54am

@mario.gembox here is the email where it's happening.
https://www.file-upload.net/download-14698281/IhreBuchungERNGV4RIdunesistbaldanderZeia1359a6e-9282-44fb-b36c-273a9cdfce81.eml.html

Brian · September 22, 2021, 8:46am

Here is another email that takes very long to save, even with the pictures removed and the nested tables unnested.
https://www.file-upload.net/download-14698308/HeidelbergInnovationForum2017AgTechandFoo12fdc870-a14d-4d06-9ef5-a498253eed75.eml.html

mario.gembox · September 22, 2021, 10:29am

Hi,

Regarding the invalid URI hostname exception when saving to PDF, please try again with this bugfix:
https://www.gemboxsoftware.com/document/nightlybuilds/GBD33v1307.zip

Or this NuGet package:
Install-Package GemBox.Document -Version 33.0.1307-hotfix

Regarding the irregular HTML comments, note that we’re still working on it.

Regarding that last file which results in a long save time, I’m afraid the problem is again due to excessive usage of nested tables.
Anyway, I don’t think there is any point in creating another workaround that would handle this when it’s clear that you may have any kind of HTML content.

So, considering how you mentioned that you don’t care about the “fancy html formatting”, how about you export the document content as plain text?
For example, like this:

document.Content.LoadText(document.Content.ToString(), document.DefaultCharacterFormat);
document.Save("output.pdf");

Would that work for you?

Regards,
Mario

Brian · September 22, 2021, 12:45pm

Okay, thanks for all the information. Yes, that should work. I hope you get the irregular HTML comments under control. Thanks for all the help provided.

Greetings Brian

mario.gembox · September 22, 2021, 4:51pm

Hi,

Regarding the irregular HTML comments, please try again with this bugfix:
https://www.gemboxsoftware.com/document/nightlybuilds/GBD33v1309.zip

Or this NuGet package:
Install-Package GemBox.Document -Version 33.0.1309-hotfix

Does this solve your issue?

Regards,
Mario

Brian · September 27, 2021, 6:46am

Yes, thanks, the irregular HTML comments issue is fixed now.

Brian · September 27, 2021, 8:16am

Hey @mario.gembox, I am very sorry, but I have a new email that causes a stack overflow on your side while saving.
https://www.file-upload.net/download-14702580/WGWarebereitzurAuslieferung-Rechnungsnummer.eml.html
I tried it on your website too, and an error occurs there as well. I hope you can help.

Greetings Brian

mario.gembox · September 29, 2021, 11:53am

Hi Brian,

Regarding the StackOverflowException, please try again with this bugfix:
https://www.gemboxsoftware.com/document/nightlybuilds/GBD33v1317.zip

Or this NuGet package:
Install-Package GemBox.Document -Version 33.0.1317-hotfix

Does this solve your issue?

Regards,
Mario