If I do this in a console app:
    static string ExtractTextFromPdf(string pdfPath)
    {
        using var ms = new MemoryStream(File.ReadAllBytes(pdfPath));
        DocumentModel document = DocumentModel.Load(ms);
        using var stream = new MemoryStream();
        document.Save(stream, SaveOptions.TxtDefault);
        return Encoding.UTF8.GetString(stream.GetBuffer(), 0, (int)stream.Length);
    }
it works perfectly, but if I use it inside an ASP.NET controller like this:
    public async Task<MemoryStream> ConverToTXT(IFormFile file)
    {
        using var ms = new MemoryStream();
        await file.CopyToAsync(ms);
        DocumentModel document = DocumentModel.Load(ms);
        var stream = new MemoryStream();
        document.Save(stream, SaveOptions.TxtDefault);
        return stream;
    }
I get the following error when converting the same PDF file to TXT:
No document file format was recognized from the stream
Any idea what’s wrong?
I should add that I tried the same method for converting DOCX and DOC files to TXT and it works perfectly. The problem only occurs when loading a PDF that comes from an IFormFile stream.
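
One difference between the two snippets that may be relevant: in the console version the MemoryStream is constructed from a byte array, so its position starts at 0, while in the controller the stream is filled with CopyToAsync, which leaves its position at the end of the data. If the PDF loader detects the format by reading from the current position (an assumption, not something confirmed here), it would find nothing to read. The sketch below uses only System.IO to show the position behaviour; the class name, the method name CopyAndMeasureAsync, and the sample bytes are hypothetical and it does not call GemBox at all.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class StreamRewindDemo
{
    // Copies the source into a MemoryStream the same way the controller does,
    // then reports how many bytes are readable from the current position
    // before and after rewinding.
    public static async Task<(long beforeRewind, long afterRewind)> CopyAndMeasureAsync(Stream source)
    {
        using var ms = new MemoryStream();
        await source.CopyToAsync(ms);

        // After the copy, the position sits at the end of the written data.
        long beforeRewind = ms.Length - ms.Position;

        // Rewind so that a reader starts at the first byte.
        ms.Position = 0;
        long afterRewind = ms.Length - ms.Position;

        return (beforeRewind, afterRewind);
    }

    public static async Task Main()
    {
        // Hypothetical stand-in for an uploaded file's content.
        using var source = new MemoryStream(Encoding.ASCII.GetBytes("%PDF-1.7 hypothetical content"));
        var (before, after) = await CopyAndMeasureAsync(source);
        Console.WriteLine($"Readable before rewind: {before}, after rewind: {after}");
    }
}
```

If this is the cause, setting ms.Position = 0 right after await file.CopyToAsync(ms) in the controller would be the thing to try. The returned stream is in the same state after document.Save, so it probably needs rewinding as well before the caller reads it.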