We are using the latest version, and I also tried it with the example from the online page, where I get an error as well.
It would be great if you could tell us what's wrong and how we can fix it.
using System.Linq;
using System.Text.RegularExpressions;
using GemBox.Document;
using GemBox.Email;
using GemBox.Email.Mime;
class Program
{
static void Main()
{
// If using Professional version, put your GemBox.Email serial key below.
GemBox.Email.ComponentInfo.SetLicense("FREE-LIMITED-KEY");
// If using Professional version, put your GemBox.Document serial key below.
GemBox.Document.ComponentInfo.SetLicense("FREE-LIMITED-KEY");
// Load an email file.
MailMessage message = MailMessage.Load("Attachment.msg");
// Create a new document.
DocumentModel document = new DocumentModel();
// Import the email's content to the document.
LoadHeaders(message, document);
LoadBody(message, document);
LoadAttachments(message.Attachments, document);
// Save the document as PDF.
document.Save("Export.pdf");
}
static void LoadHeaders(MailMessage message, DocumentModel document)
{
// Create HTML content from the email headers.
var htmlHeaders = $@"
<style>
* {{ font-size: 12px; font-family: Calibri; }}
th {{ text-align: left; padding-right: 24px; }}
</style>
<table>
<tr><th>From:</th><td>{message.From[0].ToString().Replace("<", "&lt;").Replace(">", "&gt;")}</td></tr>
<tr><th>Sent:</th><td>{message.Date:dddd, d MMM yyyy}</td></tr>
<tr><th>To:</th><td>{message.To[0].ToString().Replace("<", "&lt;").Replace(">", "&gt;")}</td></tr>
<tr><th>Subject:</th><td>{message.Subject}</td></tr>
</table>
<hr>";
// Load the HTML headers to the document.
document.Content.End.LoadText(htmlHeaders, LoadOptions.HtmlDefault);
}
static void LoadBody(MailMessage message, DocumentModel document)
{
if (!string.IsNullOrEmpty(message.BodyHtml))
// Load the HTML body to the document.
document.Content.End.LoadText(
ReplaceEmbeddedImages(message.BodyHtml, message.Attachments),
LoadOptions.HtmlDefault);
else
// Load the TXT body to the document.
document.Content.End.LoadText(
message.BodyText,
LoadOptions.TxtDefault);
}
// Replace attached CID images with inlined data URLs.
static string ReplaceEmbeddedImages(string htmlBody, AttachmentCollection attachments)
{
var srcPattern =
"(?<=<img.+?src=[\"'])" +
"(.+?)" +
"(?=[\"'].*?>)";
// Iterate through the "src" attributes from HTML images in reverse order.
foreach (var match in Regex.Matches(htmlBody, srcPattern, RegexOptions.IgnoreCase).Cast<Match>().Reverse())
{
var imageId = match.Value.Replace("cid:", "");
Attachment attachment = attachments.FirstOrDefault(a => a.ContentId == imageId);
if (attachment != null)
{
// Create inlined image data. E.g. "..."
ContentEntity entity = attachment.MimeEntity;
var embeddedImage = entity.Charset.GetString(entity.Content);
var embeddedSrc = $"data:{entity.ContentType};{entity.TransferEncoding},{embeddedImage}";
// Replace the "src" attribute with the inlined image.
htmlBody = $"{htmlBody.Substring(0, match.Index)}{embeddedSrc}{htmlBody.Substring(match.Index + match.Length)}";
}
}
return htmlBody;
}
static void LoadAttachments(AttachmentCollection attachments, DocumentModel document)
{
var htmlSubtitle = "<hr><p style='font: bold 12px Calibri;'>Attachments:</p>";
document.Content.End.LoadText(htmlSubtitle, LoadOptions.HtmlDefault);
foreach (Attachment attachment in attachments.Where(
a => a.DispositionType == ContentDispositionType.Attachment &&
a.MimeEntity.ContentType.TopLevelType == "image"))
{
document.Content.End.InsertRange(
new Paragraph(document, new Picture(document, attachment.Data)).Content);
}
}
}
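The header table in the example above escapes only the angle brackets of the From and To addresses, and the subject is inserted as-is. A minimal sketch of a more general approach, assuming the standard `System.Net.WebUtility.HtmlEncode` method is acceptable here; the `HeaderHtml` class and its `Row` method are hypothetical names and not part of GemBox:

```csharp
using System.Net;

static class HeaderHtml
{
    // Builds one <tr> of the header table (hypothetical helper).
    // Both label and value are HTML-encoded so that characters such as
    // <, > and & are shown as text instead of being parsed as markup.
    public static string Row(string label, string value)
    {
        return $"<tr><th>{WebUtility.HtmlEncode(label)}</th><td>{WebUtility.HtmlEncode(value)}</td></tr>";
    }
}
```

For instance, `HeaderHtml.Row("Subject:", message.Subject)` could stand in for the hand-written subject row in `LoadHeaders`.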
I found out that the mail contains an image link that points to a website; the image is downloaded from there and inserted into the document. How can I prevent that?
Saving and loading the text also takes a very long time for those emails, which is strange.
Greetings, Brian
Regarding the failed email loading, the problem with that MSG file is that it is not an email.
Try opening it in Microsoft Outlook; you’ll notice that it opens as a meeting.
Regarding the slow HTML loading, the problem is with this image URL: https://info.cloudacademy.com/e2t/to/VWtMcp8cQyFnW3dWLxX1tT2jYW9ccHTJ4sgN6MW1m0Fbj1gjjmx103
It takes more than 30 seconds to load. For example, please check this:
var watch = Stopwatch.StartNew();
var options = new HtmlLoadOptions();
options.ResourceLoading += (sender, e) =>
{
Console.WriteLine(watch.Elapsed);
Console.WriteLine();
Console.WriteLine($"Loading: {e.Uri}");
};
string html = ReplaceEmbeddedImages(message.BodyHtml, message.Attachments);
document.Content.End.LoadText(html, options);
watch.Stop();
Console.WriteLine($"Finished: {watch.Elapsed}");
This is probably a tracking pixel in the email’s message.
Hey @mario.gembox, now everything works, but this email takes forever to save and I don’t know why. I removed the pictures that would be downloaded and then tried to save the email without pictures.
I changed the code to this:
private string ReplaceEmbeddedImages(string htmlBody, AttachmentCollection attachments)
{
var srcPattern =
"(?<=<img.+?src=[\"'])" +
"(.+?)" +
"(?=[\"'].*?>)";
// Iterate through the "src" attributes from HTML images in reverse order.
foreach (var match in Regex.Matches(htmlBody, srcPattern, RegexOptions.IgnoreCase).Cast<Match>().Reverse())
{
// Clear the "src" value when it is a well-formed URL, so that the image is not downloaded.
if (Uri.IsWellFormedUriString(match.ToString(), UriKind.RelativeOrAbsolute))
{
// Remove the "src" value instead of replacing it with an inlined image.
htmlBody = htmlBody.Substring(0, match.Index) + htmlBody.Substring(match.Index + match.Length);
}
else
{
var imageId = match.Value.Replace("cid:", "");
Attachment attachment = attachments.FirstOrDefault(a => a.ContentId == imageId);
if (attachment != null)
{
// Create inlined image data. E.g. "..."
ContentEntity entity = attachment.MimeEntity;
var embeddedImage = entity.Charset.GetString(entity.Content);
var embeddedSrc = $"data:{entity.ContentType};{entity.TransferEncoding},{embeddedImage}";
// Replace the "src" attribute with the inlined image.
htmlBody = $"{htmlBody.Substring(0, match.Index)}{embeddedSrc}{htmlBody.Substring(match.Index + match.Length)}";
}
}
}
return htmlBody;
}
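As a variation on the code above, the remote images can also be blanked in a separate pass that looks only at the URL scheme. This is a minimal sketch using only the .NET base class library; the `RemoteImageFilter` class, the `StripRemoteImages` method, and the rule that "remote" means `http://`, `https://` or a protocol-relative `//` source are assumptions made for illustration, not GemBox API:

```csharp
using System;
using System.Text.RegularExpressions;

static class RemoteImageFilter
{
    // Matches the "src" attribute of an <img> tag and captures its quoted value.
    private static readonly Regex ImgSrc = new Regex(
        "(?<prefix><img\\b[^>]*?\\ssrc\\s*=\\s*)(?<quote>[\"'])(?<src>.*?)\\k<quote>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Empties the "src" of images that point to a web address.
    // Sources such as "cid:..." and "data:..." are left untouched.
    public static string StripRemoteImages(string html)
    {
        return ImgSrc.Replace(html, m =>
        {
            string src = m.Groups["src"].Value.Trim();
            bool isRemote =
                src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                src.StartsWith("https://", StringComparison.OrdinalIgnoreCase) ||
                src.StartsWith("//", StringComparison.Ordinal);

            if (!isRemote)
                return m.Value;

            // Keep the attribute and its quotes, drop only the URL.
            string quote = m.Groups["quote"].Value;
            return m.Groups["prefix"].Value + quote + quote;
        });
    }
}
```

It could be applied to the result of `ReplaceEmbeddedImages(message.BodyHtml, message.Attachments)` before the HTML is passed to `LoadText`, so that embedded images are inlined first and only the remaining web links are cleared.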
Unfortunately, the problem occurs because the email’s body has a lot of nested table elements with different layouts (inline vs. floating), which cause the GemBox.Document rendering engine to take a very long time to process it.
I’m afraid that at this moment we cannot provide an improvement for this.
We will try to address this in the future, but for now, can you try the following workaround before saving to PDF:
static void Main()
{
// ...
Table table = null;
while ((table = GetNestedTable(document)) != null)
{
var parentParentTable = (Table)table.Parent.Parent.Parent.Parent.Parent.Parent;
table.TableFormat = parentParentTable.TableFormat.Clone();
parentParentTable.Content.Start.InsertRange(table.Content);
parentParentTable.Content.Delete();
}
document.Save("output.pdf");
}
static Table GetNestedTable(DocumentModel document)
{
foreach (Table table in document.GetChildElements(true, ElementType.Table))
{
var parentCell = table.Parent as TableCell;
if (parentCell == null)
continue;
var parentRow = parentCell.Parent;
var parentTable = parentRow.Parent;
if (parentCell.Blocks.Count != 1 || parentRow.Cells.Count != 1 || parentTable.Rows.Count != 1)
continue;
var parentParentCell = parentTable.Parent as TableCell;
if (parentParentCell == null)
continue;
var parentParentRow = parentParentCell.Parent;
var parentParentTable = parentParentRow.Parent;
if (parentParentCell.Blocks.Count != 1 || parentParentRow.Cells.Count != 1 || parentParentTable.Rows.Count != 1)
continue;
return table;
}
return null;
}
In short, the “while” loop moves up the tables that are nested at least two levels deep and are the only child elements of their parent tables.
Lastly, as an alternative, you could use the PdfSaveOptions.ProgressChanged event to cancel the saving if it takes too long.
For instance, check the following example:
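A minimal sketch of that approach, under stated assumptions: it assumes the `ProgressChanged` event arguments expose a `CancelOperation()` method and that a cancelled save throws `OperationCanceledException`; the input file name and the 30-second limit are hypothetical values chosen for illustration. It requires the GemBox.Document package.

```csharp
using System;
using System.Diagnostics;
using GemBox.Document;

class Program
{
    static void Main()
    {
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Hypothetical input; any document that is slow to render will do.
        DocumentModel document = DocumentModel.Load("Input.docx");

        // Assumed time limit for the save operation.
        TimeSpan timeout = TimeSpan.FromSeconds(30);
        Stopwatch stopwatch = Stopwatch.StartNew();

        var saveOptions = new PdfSaveOptions();
        saveOptions.ProgressChanged += (sender, args) =>
        {
            // Assumption: CancelOperation() requests cancellation of the save.
            if (stopwatch.Elapsed > timeout)
                args.CancelOperation();
        };

        try
        {
            document.Save("Output.pdf", saveOptions);
            Console.WriteLine("Saved.");
        }
        catch (OperationCanceledException)
        {
            // Assumption: a cancelled save surfaces as this exception.
            Console.WriteLine("Saving was cancelled because it took too long.");
        }
    }
}
```

The timeout is only checked when progress is reported, so the actual cancellation may happen somewhat later than the configured limit.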
Thank you very much. On Wednesday I will try the workaround. The most important thing is to keep the content; the fancy HTML formatting is not that important.
Hey @mario.gembox, I have one last question: saving this email fails with an “invalid URI hostname” error. Is there a way to ignore this issue while saving, or how can I get the hostname?
I will try it, but I think we are already using the latest version. Your workaround did help, and the saving time is now much faster, but on my side I now have this email HTML code in my PDF:
Or this NuGet package: Install-Package GemBox.Document -Version 33.0.1307-hotfix
Regarding the irregular HTML comments, note that we’re still working on it.
Regarding that last file which results in a long save time, I’m afraid the problem is again due to excessive usage of nested tables.
Anyway, I don’t think there is any point in creating another workaround that would handle this when it’s clear that you may have any kind of HTML content.
So, considering how you mentioned that you don’t care about the “fancy html formatting”, how about you export the document content as plain text?
For example, like this: