Margins change when merging a DOCX template into a PDF

I have a template whose margins change after I merge it and save it as a PDF, which breaks the layout (especially the two-column area):

Output in PDF:

When I print it to PDF directly from Word, this doesn’t happen. Is there a workaround for this? Currently, the only code I have is for the conversion. This is the code where the document is saved as PDF:

Hi Noe,

Please send us your DOCX file so that we can reproduce this issue and investigate it.

Regards,
Mario

How do I do so, Mario?

You can send the file via email or support ticket, see the Contact page.


The problem occurs because the document uses font scaling:

Unfortunately, GemBox.Document currently doesn’t support the CharacterFormat.Scaling property in PDF output:
https://www.gemboxsoftware.com/document/docs/supported-file-formats.html#support-level-for-pdf-xps-and-image-formats

Because of that, less content fits in the first column, and the second column is broken because of the explicit column break.
You can see the same result in Microsoft Word if you reset the scaling to 100%.
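A rough worked estimate shows why the columns overflow. It assumes that Scaling compresses each run only horizontally by S percent and that the PDF output ignores it; S = 80 is a hypothetical value, not the document’s actual setting:

$$
w_{\text{Word}} = \frac{S}{100}\, w_{100\%}, \qquad
w_{\text{PDF}} = w_{100\%}, \qquad
\frac{w_{\text{PDF}}}{w_{\text{Word}}} = \frac{100}{S} = \frac{100}{80} = 1.25
$$

Each line is then about 25% wider in the PDF, so the first column holds only about 80% of the text it holds in Word, and the rest moves into the second column before the explicit column break is reached.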

Anyway, note that we already have an internal support ticket for this, and I’ve added your report to it to increase its priority.
But at this moment, I cannot tell you exactly when it will be implemented.
We prioritize feature requests by the number of users requesting them, and we are currently working on another feature that has a higher priority (more user requests).

Regards,
Mario

Is there any workaround you can suggest for the template layout to avoid this scaling?

Also, some shapes and pictures are not rendered in the output PDF:

Here you can see that the squares (checkboxes) are not rendered, but the logo in the footer is.

Regarding the workaround for the scaling, can you try using this:

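using GemBox.Document;

var document = DocumentModel.Load("Minnesota-HIPAA-Auth-Form.docx");
foreach (Run run in document.GetChildElements(true, ElementType.Run))
{
    var format = run.CharacterFormat;
    // Fold the horizontal scaling into the font size (an approximation:
    // this also shrinks the glyph height, not only the width).
    double ratio = format.Scaling / 100.0;
    format.Size *= ratio;
    // Reset the scaling so it isn't applied twice if the document is also saved as DOCX.
    format.Scaling = 100;
}
document.Save("Minnesota-HIPAA-Auth-Form.pdf");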
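The workaround relies on a glyph’s advance width growing roughly linearly with the font size, so multiplying Size by S/100 gives about the same line width as S percent horizontal scaling. The standalone sketch below shows that arithmetic and its side effect; FoldScalingIntoSize is a hypothetical helper, not a GemBox.Document API:

```csharp
using System;

// Hypothetical helper (not part of GemBox.Document) that mirrors the workaround:
// it folds a horizontal scaling percentage into the font size.
public static class ScalingWorkaround
{
    // Returns a font size whose glyph advance widths roughly match text set at
    // `size` points with `scalingPercent` horizontal scaling. Unlike real
    // horizontal scaling, the glyph height shrinks by the same factor.
    public static double FoldScalingIntoSize(double size, double scalingPercent)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));
        if (scalingPercent <= 0)
            throw new ArgumentOutOfRangeException(nameof(scalingPercent));

        return size * (scalingPercent / 100.0);
    }
}
```

For example, FoldScalingIntoSize(11, 80) gives about 8.8: an 11 pt run at 80% scaling becomes an 8.8 pt run of roughly the same width, but its glyphs and line height are also about 20% smaller, so text and anything positioned relative to it can still shift vertically.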

Regarding the missing shapes, the problem is that those are shapes with custom geometry.
Can you change those into rectangle shapes?

That said, note that we’re currently working on adding support for rendering shapes with custom geometry. So, can you please try again with this latest bugfix version:

Install-Package GemBox.Document -Version 35.0.1495-hotfix

Does this solve your issue?

Regards,
Mario


It renders shapes with custom geometry now (thanks for the hotfix), but sometimes I still see a shape placed with some offset from its position in the template, even when it is in line with text.


The thing is, templates are created by export tools that convert PDF to DOCX, and users just place merge fields using text boxes wherever they need them. Can you suggest which export tool produces DOCX files that work best with GemBox?

Sometimes the original document is not a PDF but an RTF. Whether it is PDF or RTF, the file is converted to DOCX to be used as a template.

I noticed that GemBox.Document replaces column breaks with a new line that pushes content down, and sometimes a new page is needed for only one line of text.

The problem is that the position of floating and inline shapes may depend on the previous or surrounding content. So if that content is rendered differently (for instance, due to the character spacing used), the position may end up slightly different.

Have you tried using GemBox.Document for this:

Also, note that GemBox.Document can convert RTF to DOCX.

What do you mean by that?
Are you perhaps referring to your previous issue in which the column break was moved to the second column and thus it resulted in a page break?

Regards,
Mario

Not exactly. After implementing your code that applies the scaling ratio to the text size, a new line is sometimes inserted, and after inspecting it, I discovered that column breaks come out of the merge as a new line:

Input

Output

After removing the column break, the layout breaks, but after fixing the layout without the column break:

I hadn’t tried this, but you can see what I mean now that I’ve given it a try today:

Can you try saving the DocumentModel to DOCX to check where that new line comes from?
Of course, when you change the Size, set the Scaling to 100, because otherwise the output DOCX file will have both adjustments applied.

Can you send us your input PDF file?
Also, can you tell us how that PDF file was generated?


I’ve already sent the documents.

Hey Mario,

The latest hotfix doesn’t solve the issue.

Thank you for your support.

Thank you for your support via email. I really appreciate it.

The main cause of the checkbox misalignment was the workaround for the unsupported Scaling, so we have now added support for CharacterFormat.Scaling in PDF output:

Install-Package GemBox.Document -Version 35.0.1517-hotfix

The missing lines occur because these are VML polyline shapes that need to be converted to custom geometry shapes, but unfortunately, GemBox.Document currently doesn’t support this conversion.
We will introduce this in the future, but for now, perhaps something like this can be used:

using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using GemBox.Document;

var document = DocumentModel.Load("Minnesota-HIPAA-Auth-Form.docx");

// Materialize the list first, because the loop replaces elements in the document.
var preservedInlines = document.GetChildElements(true).OfType<PreservedInline>().ToList();
foreach (var preservedInline in preservedInlines)
{
    // Parse the raw XML of the preserved (unsupported) element.
    string rawXml = preservedInline.ToString();
    var xml = XElement.Parse(rawXml);

    // Only handle <w:pict> elements that contain a VML <v:polyline>.
    if (xml.Name != "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pict")
        continue;

    var polyline = xml.Element("{urn:schemas-microsoft-com:vml}polyline");
    if (polyline == null)
        continue;

    // Only simple two-point polylines (x1, y1, x2, y2) are converted into a line.
    // Coordinates may be separated by commas or spaces.
    var points = polyline.Attribute("points").Value
        .Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(coord => ParseNumber(coord))
        .ToList();
    if (points.Count != 4)
        continue;

    string style = polyline.Attribute("style").Value;

    // Map the VML relative-position settings to GemBox.Document anchors.
    string horizontal = Regex.Match(style, "mso-position-horizontal-relative:([a-z]+)").Groups[1].Value;
    var horizontalAnchor = horizontal switch
    {
        "page" => HorizontalPositionAnchor.Page,
        _ => throw new NotImplementedException()
    };

    string vertical = Regex.Match(style, "mso-position-vertical-relative:([a-z]+)").Groups[1].Value;
    var verticalAnchor = vertical switch
    {
        "page" => VerticalPositionAnchor.Page,
        "text" => VerticalPositionAnchor.Paragraph,
        _ => throw new NotImplementedException()
    };

    // The stroke color is expected in "#RRGGBB" form.
    string strokecolor = polyline.Attribute("strokecolor").Value;
    var color = new Color(int.Parse(strokecolor.TrimStart('#'), NumberStyles.HexNumber));

    string strokeweight = polyline.Attribute("strokeweight").Value;
    var weight = ParseNumber(strokeweight);

    // Build a floating line shape in front of the text at the polyline's position.
    var horizontalLine = new Shape(document, ShapeType.Line,
        new FloatingLayout(
            new HorizontalPosition(points[0], LengthUnit.Point, horizontalAnchor),
            new VerticalPosition(points[1], LengthUnit.Point, verticalAnchor),
            new Size(points[2] - points[0], points[3] - points[1]))
        { WrappingStyle = TextWrappingStyle.InFrontOfText });

    horizontalLine.Outline.Fill.SetSolid(color);
    horizontalLine.Outline.Width = weight;

    // Replace the preserved polyline with the new shape at the same position.
    var parent = preservedInline.ParentCollection;
    int index = parent.IndexOf(preservedInline);
    parent.RemoveAt(index);
    parent.Insert(index, horizontalLine);
}

document.Save("Output.pdf");

// Converts a VML length with a two-letter unit (for example "0.75pt") to points.
double ParseNumber(string value)
{
    int unitIndex = value.Length - 2;
    // Use the invariant culture so "0.75" isn't misread on locales with a decimal comma.
    double number = double.Parse(value.Remove(unitIndex), NumberStyles.Float, CultureInfo.InvariantCulture);
    LengthUnit unit = value.Substring(unitIndex) switch
    {
        "pt" => LengthUnit.Point,
        "px" => LengthUnit.Pixel,
        "in" => LengthUnit.Inch,
        _ => throw new NotImplementedException()
    };
    return LengthUnitConverter.Convert(number, unit, LengthUnit.Point);
}
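To check the attribute parsing without a document (for example, in a unit test), the VML-specific parts of the workaround can be pulled into a standalone helper. This is a sketch, not GemBox code: the VmlPolyline class and its methods are hypothetical, it assumes every length carries a two-letter unit and that 1 px = 0.75 pt (96 DPI), and ParseStyle is offered as an alternative to the regular expressions in the workaround:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical standalone helper (not a GemBox.Document API) that parses the
// VML <v:polyline> attribute values read by the workaround.
// Assumptions: every length ends with a two-letter unit, 1 in = 72 pt and
// 1 px = 0.75 pt (96 DPI).
public static class VmlPolyline
{
    // Converts a VML length such as "0.75pt", "4px" or "1in" to points.
    // The invariant culture keeps "0.75" from being misread on locales
    // that use a decimal comma.
    public static double ToPoints(string value)
    {
        value = value.Trim();
        if (value.Length < 3)
            throw new FormatException($"Expected a number followed by a unit: '{value}'");

        string unit = value.Substring(value.Length - 2);
        double number = double.Parse(value.Substring(0, value.Length - 2),
            NumberStyles.Float, CultureInfo.InvariantCulture);

        return unit switch
        {
            "pt" => number,
            "px" => number * 0.75,
            "in" => number * 72.0,
            _ => throw new NotSupportedException($"Unsupported VML unit '{unit}'")
        };
    }

    // Parses the "points" attribute, accepting commas and/or spaces as separators.
    public static double[] ParsePoints(string points) =>
        points.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries)
              .Select(ToPoints)
              .ToArray();

    // Parses a CSS-like style string ("name:value;name:value") into a dictionary,
    // as an alternative to matching each property with a regular expression.
    public static Dictionary<string, string> ParseStyle(string style)
    {
        var properties = new Dictionary<string, string>();
        foreach (string declaration in style.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = declaration.IndexOf(':');
            if (colon <= 0)
                continue; // Skip malformed declarations without a name.

            string name = declaration.Substring(0, colon).Trim();
            properties[name] = declaration.Substring(colon + 1).Trim();
        }
        return properties;
    }
}
```

In the workaround, ParsePoints could stand in for the Split/Select chain, and ParseStyle(style)["mso-position-horizontal-relative"] for the Regex.Match call; a missing property would then surface as a KeyNotFoundException instead of an empty string.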

A new line appearing in place of the column break is usually the expected behavior. However, this document seems to be some sort of exception.

Unfortunately, we were unable to find the reason for that, because the layout is very complex.
What’s worse, we noticed that nothing changes in the Word document even when we remove that column break (as if it’s being ignored completely).