Get bounds of each word

Dear Gembox,

Is it possible to get the bounds of each word (or, even better, each character) in a PdfTextContent? I see there is a GetGlyphOffsets method, but that only gives me the x position of each letter. Also, is it possible to get a SpaceCharWidth value for that PdfTextContent (the .format.[text].WordSpacing seems to be 0)?

Hi Mario,

Can I nudge this question for an answer, please?

Hi Simon,

There is no way to get height information for each letter, but we’re working on it.
I’ll contact you again as soon as possible.

Regards,
Mario


Hi Simon,

We added support for retrieving the glyphs of PDF text content through the PdfTextContent.Text property.
Please try using this NuGet package:

Install-Package GemBox.Pdf -Version 2025.4.100-hotfix

Note that this is a hidden (unlisted) version. To install it, you’ll need to run the above command in the NuGet Package Manager Console (Tools → NuGet Package Manager → Package Manager Console).

Then try the following example, which draws rectangles around the bounds of each PdfTextContent element (red) and of each individual glyph (green):

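using GemBox.Pdf;
using GemBox.Pdf.Content;

static void Main()
{
    // If using the Professional version, put your serial key below.
    ComponentInfo.SetLicense("FREE-LIMITED-KEY");

    using var document = PdfDocument.Load("input.pdf");
    var page = document.Pages[0];

    var elements = page.Content.Elements;
    // Add an empty group for the drawn rectangles, then wrap the original content (everything before that group) in a group of its own.
    var boundsGroup = elements.AddGroup();
    elements.Group(elements.First, elements.Last.Previous);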

    // Use page.Transform if not drawing bounds on the page but calculating the bounds on a potentially transformed (rotated) page.
    // var transform = page.Transform;
    var transform = PdfMatrix.Identity;

    // Walk all content elements; flattenForms: true also includes the content of forms.
    using var enumerator = elements.All(transform, flattenForms: true).GetEnumerator();
    while (enumerator.MoveNext())
    {
        // Only text elements carry glyphs.
        if (enumerator.Current.ElementType != PdfContentElementType.Text)
            continue;

        var textElement = (PdfTextContent)enumerator.Current;

        // Glyph bounds: apply the text element's own transform, then the enumerator's transform.
        transform = textElement.Transform * enumerator.Transform;
        foreach (var glyph in textElement.Text)
        {
            var glyphBounds = glyph.Bounds;
            transform.Transform(ref glyphBounds);
            DrawBounds(boundsGroup, PdfColors.Green, glyphBounds);
        }

        // Element bounds: only the enumerator's transform is applied.
        transform = enumerator.Transform;
        var elementBounds = textElement.Bounds;
        transform.Transform(ref elementBounds);
        DrawBounds(boundsGroup, PdfColors.Red, elementBounds);
    }

    document.Save("output.pdf");
}

static void DrawBounds(PdfContentGroup group, PdfColor color, PdfQuad bounds)
{
    var pathElement = group.Elements.AddPath();
    pathElement.BeginSubpath(bounds.Point0, isClosed: true)
        .LineTo(bounds.Point1)
        .LineTo(bounds.Point2)
        .LineTo(bounds.Point3);

    var strokeFormat = pathElement.Format.Stroke;
    strokeFormat.IsApplied = true;
    strokeFormat.Width = 0.5;
    strokeFormat.Color = color;
}
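
Since the original question was about word bounds, the per-glyph bounds can be merged into per-word bounds. The following is only a minimal sketch and not part of GemBox.Pdf: Box, FromCorners, WordBounds.Group, and the maxGap threshold are hypothetical names introduced here for illustration. It assumes unrotated, left-to-right text, glyphs supplied in reading order, and that whitespace glyphs (if the library reports any) have already been filtered out by the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical axis-aligned box; this is NOT a GemBox.Pdf type.
public readonly record struct Box(double Left, double Bottom, double Right, double Top)
{
    // Smallest axis-aligned box that contains all the given corner points,
    // for example the four corners of a glyph quad.
    public static Box FromCorners(params (double X, double Y)[] corners)
    {
        double left = double.PositiveInfinity, bottom = double.PositiveInfinity;
        double right = double.NegativeInfinity, top = double.NegativeInfinity;
        foreach (var (x, y) in corners)
        {
            left = Math.Min(left, x);
            bottom = Math.Min(bottom, y);
            right = Math.Max(right, x);
            top = Math.Max(top, y);
        }
        return new Box(left, bottom, right, top);
    }

    // Smallest box that contains both this box and the other one.
    public Box Union(Box other) => new(
        Math.Min(Left, other.Left), Math.Min(Bottom, other.Bottom),
        Math.Max(Right, other.Right), Math.Max(Top, other.Top));
}

public static class WordBounds
{
    // Merges glyph boxes (in reading order) into word boxes. A new word starts
    // when the horizontal gap to the current word exceeds maxGap, or when the
    // glyph does not overlap the current word vertically (assumed new line).
    public static List<Box> Group(IEnumerable<Box> glyphs, double maxGap)
    {
        var words = new List<Box>();
        Box? current = null;
        foreach (var glyph in glyphs)
        {
            if (current is Box word)
            {
                bool sameLine = glyph.Bottom < word.Top && glyph.Top > word.Bottom;
                double gap = glyph.Left - word.Right;
                if (sameLine && gap <= maxGap)
                {
                    current = word.Union(glyph);
                    continue;
                }
                words.Add(word);
            }
            current = glyph;
        }
        if (current is Box last)
            words.Add(last);
        return words;
    }
}
```

In the example above, the four corners of each transformed glyphBounds (Point0 to Point3) could be passed to Box.FromCorners, assuming their coordinates are readable as X and Y values. A sensible maxGap depends on the font size; a fraction of the glyph height is one possible starting point, but that is a guess that needs tuning per document.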

I hope this helps.

Regards,
Mario

Hi Mario,

Works perfectly, thank you.

Does that mean this feature will be released in the next general update, or do I need to avoid updating the DLL so as not to overwrite this hotfix version?

Regards,
Simon

Hi Simon,

We have released a new version today; it includes this, so you can feel free to update to the latest version.

Regards,
Mario
