Remove text from PDF

Hi Mario

My task is:

  1. Load PDF file
  2. Find text “Hello”
  3. Remove this text
  4. Save to PDF again

This is what I am trying:

var page = document.Pages[0];
var textElement = page.Content.FindText("HELLO").OfType<PdfTextContent>().Reverse();
textElement.Collection.Remove(textElement);

Hi,

Try this:

var page = document.Pages[0];
foreach (var text in page.Content.GetText().Find("HELLO"))
    text.Redact();

Does this work for you?

Regards,
Mario
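
For readers who want all four steps from the original question in one place, here is a minimal sketch that wraps the snippet above with loading and saving. It only uses calls that appear elsewhere in this thread, plus `ComponentInfo.SetLicense`. The file names `Input.pdf` and `Output.pdf` are placeholders, and the free-limited key is an assumption; substitute your own license key if you have one.

```csharp
using System.Linq;
using GemBox.Pdf;

class Program
{
    static void Main()
    {
        // Assumption: running in free-limited mode; replace with your own key.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // 1. Load the PDF file ("Input.pdf" is a placeholder name).
        using (var document = PdfDocument.Load("Input.pdf"))
        {
            var page = document.Pages[0];

            // 2. Find the text. The matches are copied to a list first so the
            //    page content is not modified while it is still being enumerated.
            var texts = page.Content.GetText().Find("HELLO").ToList();

            // 3. Remove every match from the page content.
            foreach (var text in texts)
                text.Redact();

            // 4. Save to a new PDF file ("Output.pdf" is a placeholder name).
            document.Save("Output.pdf");
        }
    }
}
```

This sketch only touches the first page. To cover the whole document, the same find-and-redact loop could be repeated for each page in `document.Pages`.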

Hi Mario. Thanks, it works!
But may I ask how to change “HELLO” to another word, “Thanks”, and save to PDF again? I mean find and replace text in a PDF.

GemBox.Pdf currently doesn’t have an API for “Find and Replace”, but we do intend to provide one in the future.
For now, can you try using something like this:

var page = document.Pages[0];
var pageContent = page.Content;
var texts = pageContent.GetText().Find("HELLO").ToList();

foreach (var text in texts)
{
    using var formattedText = new PdfFormattedText();
    formattedText.Append("Thanks");
    pageContent.DrawText(formattedText, new PdfPoint(text.Bounds.Left, text.Bounds.Bottom));
    text.Redact();
}

Thanks. It works perfectly.

I was also looking for viable search-and-replace code for PDF.
This one works, but the written text does not use the same font as the source text that was found.
Is it possible to extract the proper font from the source text and apply it to the replacement text, instead of using the generic PdfFormattedText?

Hi Paolo,

We are currently investigating the possible ways to add this functionality and we will inform you about our progress.

Regards,
Stipo

Hi Paolo,

Please try again with this NuGet package:

Install-Package GemBox.Pdf -Version 17.0.1593-hotfix

Note that this is a hidden pre-release version. To install it, you’ll need to run the above command in the NuGet Package Manager Console (Tools → NuGet Package Manager → Package Manager Console).

And try running the following example:

var testFiles = new string[]
{
    "TestFileMicrosoft.pdf",
    "TestFileAdobe.pdf",
    "TestFileGemBoxDocument.pdf",
    "TestFileGemBoxPdf.pdf"
};

foreach (var testFile in testFiles)
{
    using var document = PdfDocument.Load(testFile);
    document.Load();

    var page = document.Pages[0];
    var texts = page.Content.GetText().Find("HELLO").ToList();

    using var formattedText = new PdfFormattedText();
    foreach (var text in texts)
    {
        formattedText.Clear();
        formattedText.Font = text.Format.Text.Font;
        formattedText.Color = text.Format.Fill.Color;
        formattedText.Append("Thanks");

        page.Content.DrawText(formattedText, new PdfPoint(text.Bounds.Left, text.Bounds.Bottom));
        text.Redact();
    }

    document.Save(Path.GetFileNameWithoutExtension(testFile) + "-Replaced.pdf");
}

From here you can download the input PDF files, created with various sources, and the output PDF files created with the above example code.

I hope this helps.

Regards,
Mario