Re: xsane & tesseract

Joe Pfeiffer wrote:
> I scanned the document to ppm files, sent them to tesseract, put the
> output of tesseract into a .txt file, and cleaned up from there.

You could try gimagereader, a frontend for tesseract, which makes this
process somewhat easier. Among other things, it has a spell checker, so
recognition errors are easy to spot.

If the scanned image has a resolution of at least 300 dpi, text
recognition with tesseract is very good in my experience.
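
If you prefer to stay on the command line, the steps you describe can
be scripted. Here is a rough sketch in Python, not a tested tool. It
assumes tesseract is on the PATH and that the pages are named so that
they sort in reading order. The function names, the directory layout
and the resolution check are my own inventions for illustration.

```python
import shutil
import subprocess
from pathlib import Path

MIN_DPI = 300  # the resolution below which I would rescan


def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Resolution implied by the image width and the physical page width."""
    return pixel_width / page_width_inches


def tesseract_command(image: Path, out_base: Path) -> list[str]:
    """Basic invocation: tesseract IMAGE OUTPUTBASE.

    tesseract appends ".txt" to the output base by itself.
    """
    return ["tesseract", str(image), str(out_base)]


def ocr_pages(scan_dir: Path, combined: Path) -> None:
    """Run tesseract on every .ppm file and join the results in one file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    texts = []
    for image in sorted(scan_dir.glob("*.ppm")):
        out_base = image.with_suffix("")  # page1.ppm -> page1
        subprocess.run(tesseract_command(image, out_base), check=True)
        # Read back the page1.txt that tesseract just wrote.
        texts.append(Path(str(out_base) + ".txt").read_text(encoding="utf-8"))
    combined.write_text("\n".join(texts), encoding="utf-8")
```

Calling ocr_pages(Path("scans"), Path("document.txt")) would then leave
the combined text in document.txt, ready for manual cleanup. A letter
size page (8.5 inches wide) scanned 2550 pixels wide comes out at
exactly 300 dpi according to effective_dpi().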

There is also ocrmypdf, which uses tesseract to add a text layer to a
pdf consisting of scanned documents, making the pdf searchable. It also
works very well.
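
The basic ocrmypdf call is just "ocrmypdf input.pdf output.pdf". A
small wrapper could look like the sketch below, assuming ocrmypdf is
installed and on the PATH. The function names and the file names are
placeholders of mine, and "eng" is only an example language.

```python
import shutil
import subprocess
from pathlib import Path


def ocrmypdf_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build the command line: ocrmypdf -l LANG INPUT OUTPUT."""
    return ["ocrmypdf", "-l", language, str(src), str(dst)]


def make_searchable(src: Path, dst: Path, language: str = "eng") -> None:
    """Write a copy of src with an OCR text layer added to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    subprocess.run(ocrmypdf_command(src, dst, language), check=True)
```

make_searchable(Path("scan.pdf"), Path("scan_ocr.pdf")) leaves the
original file untouched and writes the searchable copy next to it.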