Extracting image from PDF file

cschellenbach · May 22, 2020, 11:31pm

When I extract the attached content element using the Export Images example, only the background image is saved; the lines are not included.

Is there a way to extract all of it together and save the result as an image?

mario.gembox · May 23, 2020, 5:10am

Hi Craig,

Unfortunately, there is no file in the attachment.
Can you try sending it again so that I can take a look at the mentioned content element?

Regards,
Mario

mario.gembox · May 27, 2020, 5:41am

After investigating the PDF file, we found that those lines come from PdfPathContent elements that are located on top of the targeted PdfImageContent element.

To extract this content, you can crop the page to just the image's area using the SetMediaBox method and then save the page as an image, like the following:

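using System.Linq;
using GemBox.Pdf;
using GemBox.Pdf.Content;

static void Main()
{
    // GemBox.Pdf requires a license key to be set before use.
    ComponentInfo.SetLicense("FREE-LIMITED-KEY");

    using var document = PdfDocument.Load("input.pdf");
    var page = document.Pages[0];

    // Find the first image on the page, including images nested inside groups.
    var image = (PdfImageContent)page.Content.Elements.All().First(e => e.ElementType == PdfContentElementType.Image);
    // Combine the image's own transform with the transforms of all its parent groups.
    var imageTransform = GetImageTransform(page.Content.Elements, image, PdfMatrix.Identity).Value;

    double offsetX = imageTransform.OffsetX;
    double offsetY = imageTransform.OffsetY;

    // Crop the page to the image's area. This treats M11 and M22 as the image's
    // width and height, which holds only if the image is not rotated or skewed.
    page.SetMediaBox(offsetX, offsetY, offsetX + imageTransform.M11, offsetY + imageTransform.M22);

    // Saving with a .png extension renders the cropped page, paths included, as an image.
    document.Save("output.png");
}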

// Walks the content tree recursively and returns the full transform of targetImage
// (its own transform combined with those of its parent groups), or null if it is not found.
static PdfMatrix? GetImageTransform(PdfContentElementCollection elements, PdfImageContent targetImage, PdfMatrix transform)
{
    PdfMatrix? matrix;
    foreach (var element in elements)
        switch (element)
        {
            case PdfImageContent image:
                if (image == targetImage)
                    return PdfMatrix.Multiply(image.Transform, transform);
                break;
            case PdfContentGroup group:
                // Descend into the group, accumulating its transform along the way.
                matrix = GetImageTransform(group.Elements, targetImage, PdfMatrix.Multiply(group.Transform, transform));
                if (matrix != null)
                    return matrix;
                break;
        }
    return null;
}
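
The crop above uses M11 and M22 as the image's width and height, which works when the image is placed without rotation or skew. If that is not the case, the crop area can be taken as the bounding box of the unit square after it is mapped through the transform. Below is a minimal sketch of that calculation. It uses plain doubles rather than GemBox types, and the names ImageBounds and Compute are hypothetical, not part of GemBox.Pdf. It assumes the usual PDF convention x' = a*x + c*y + e, y' = b*x + d*y + f, with a, b, c, d, e, f corresponding to M11, M12, M21, M22, OffsetX, OffsetY.

```csharp
using System;
using System.Linq;

static class ImageBounds
{
    // Maps the unit square (where PDF images are painted) through the affine
    // matrix [a b c d e f] and returns the axis-aligned bounding box of the result.
    public static (double Left, double Bottom, double Right, double Top) Compute(
        double a, double b, double c, double d, double e, double f)
    {
        // Transformed corners of (0,0), (1,0), (0,1) and (1,1), in that order.
        double[] xs = { e, a + e, c + e, a + c + e };
        double[] ys = { f, b + f, d + f, b + d + f };

        return (xs.Min(), ys.Min(), xs.Max(), ys.Max());
    }
}
```

The four values are in the same left, bottom, right, top order as the SetMediaBox arguments in the code above. For an unrotated image with positive M11 and M22, the result matches that original calculation.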

Regards,
Mario