Extracting images from a document

Hi guys,

I am looking for some advice on extracting “images” from a Word document into files in their own right.
Ideally, I would like to get Shape, Textbox, and Picture elements in a format that lets me save them as an image representation.

This seems to be possible with Picture (using PictureStream) but the other elements don’t have an image representation.

The FilesDirectoryPath option in HtmlSaveOptions seems to back this up: shapes and textboxes are not written to that directory, even when they are just “images” on the page.

Any suggestions?

Hi Dave,

Please try again with the latest BugFix version or the latest NuGet package and use the following code:

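var document = DocumentModel.Load("input.docx");

int counter = 0;
// Walk every drawing in the document, including nested ones (recursive search).
foreach (var drawing in document.GetChildElements(true).OfType<DrawingElement>())
{
    // Skip drawings that are only preserved from the input file.
    if (drawing.ElementType == ElementType.PreservedDrawingElement)
        continue;

    ++counter;
    // Save the drawing as a standalone PNG file.
    drawing.FormatDrawing().Save($"drawing_{counter}.png");
}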

Does this solve your issue?

Regards,
Mario

Hi Mario,
I won’t get a chance to look at this until tomorrow, unfortunately, but this looks promising.
Just a thought… if the .FormatDrawing().Save() method supported saving to a stream, that would be perfect. I am uploading these files to cloud storage, so having to create “temp” files locally, read them back in, and upload them is just a bit annoying.

Update:
I get this error
One or more errors occurred. (Could not load file or assembly 'PresentationCore, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'. The system cannot find the file specified.)

When I execute this line:
using var ms = new MemoryStream();
source.FormatDrawing().Save(ms, new ImageSaveOptions(ImageSaveFormat.Png));

I suspect it’s because I’m not using WPF?

Yes, you can save to stream.

The FormattedDrawingElement has the same Save overload methods as DocumentModel, except for the Save(XmlWriter, HtmlSaveOptions).
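As a rough sketch of the stream-based variant, the loop from earlier in this thread can write each drawing into a MemoryStream and hand it on without touching the disk. The Upload method is a hypothetical placeholder for whatever cloud storage client you use (it is not part of GemBox.Document), and "input.docx" and the FREE-LIMITED-KEY license mode are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Linq;
using GemBox.Document;
using GemBox.Document.Drawing;

class Program
{
    // Hypothetical placeholder for a cloud upload; not part of GemBox.Document.
    static void Upload(string name, Stream content) =>
        Console.WriteLine($"{name}: {content.Length} bytes");

    static void Main()
    {
        // Assumption: free mode; replace with your own license key if you have one.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        var document = DocumentModel.Load("input.docx");

        int counter = 0;
        foreach (var drawing in document.GetChildElements(true).OfType<DrawingElement>())
        {
            if (drawing.ElementType == ElementType.PreservedDrawingElement)
                continue;

            ++counter;

            // Render the drawing as PNG into memory instead of a temp file.
            using var ms = new MemoryStream();
            drawing.FormatDrawing().Save(ms, new ImageSaveOptions(ImageSaveFormat.Png));

            // Rewind so the consumer reads from the start of the image data.
            ms.Position = 0;
            Upload($"drawing_{counter}.png", ms);
        }
    }
}
```

Resetting Position before handing the stream over matters because Save leaves the stream positioned at the end of the written data.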

Yes, GemBox.Document currently requires WPF for saving to image formats, so for now you’ll need to enable WPF usage in your .NET Core application.

However, some Windows environments don’t support WPF, for instance, Azure Functions.
What type of application do you have?

Lastly, as an FYI, our current roadmap includes removing the WPF dependency for saving to image formats and providing cross-platform support for it.

Regards,
Mario

Ah… OK… so I’m using ASP.NET Core 3.1, which I don’t think lets me use the WPF references?
Happy to be advised otherwise though :slight_smile:

Turns out it’s dead easy to make this work…well on my local environment anyway…
I’ve just added the “UseWPF” line into my csproj file…
<PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <UseWPF>true</UseWPF>
</PropertyGroup>

To enable WPF in an ASP.NET Core 3.1 application, use this:

<Project Sdk="Microsoft.NET.Sdk.Web">

    <PropertyGroup>
        <TargetFramework>netcoreapp3.1</TargetFramework>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="GemBox.Document" Version="*" />
    </ItemGroup>

    <ItemGroup>
        <FrameworkReference Include="Microsoft.WindowsDesktop.App.Wpf" />
    </ItemGroup>

</Project>

Also, to avoid any issue with hosting on IIS, add the following compatibility switch:

public Startup(IConfiguration configuration)
{
    Configuration = configuration;
    // Add compatibility switch.
    AppContext.SetSwitch("Switch.System.Windows.Media.ShouldRenderEvenWhenNoDisplayDevicesAreAvailable", true);
}

Note that the latest version of GemBox.Document has cross-platform support for saving documents to images.
https://www.nuget.org/packages/GemBox.Document/
In other words, there is no longer a need to enable WPF when calling the DrawingElement.FormatDrawing() method.