Symptoms
You have scanned PDF files but cannot search them for text or copy and paste text from them into another application.
Cause
There is no text layer in the PDF: each page is stored only as an image of the scan. To search or copy text, you need a searchable PDF instead.
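To confirm that a missing text layer is the cause, you can try to extract text from the file. The Python sketch below is an illustration only: it assumes the `pdftotext` tool from the Xpdf command line tools is on the PATH, and the function names `has_text_layer` and `contains_text` are made up for this example.

```python
import subprocess


def contains_text(raw):
    """Return True if pdftotext output holds anything besides whitespace.

    pdftotext separates pages with form feeds, which strip() also removes.
    """
    return bool(raw.decode("latin-1", errors="ignore").strip())


def has_text_layer(pdf_path, pdftotext_exe="pdftotext"):
    """Run pdftotext and report whether any text came back.

    Assumption: pdftotext is on PATH; passing "-" as the output file
    sends the extracted text to stdout instead of a file.
    """
    result = subprocess.run(
        [pdftotext_exe, str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return contains_text(result.stdout)
```

If `has_text_layer("scan.pdf")` returns `False`, the file is most likely image-only and needs OCR. A scan can also carry a poor or partial text layer, so treat the result as a hint rather than proof.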
Resolution
In general, this is best done at the source: configure the scanner so that it performs OCR on the document during the scanning process.
Prerequisites:
- Xpdf command line tools https://www.xpdfreader.com/download.html (choco install xpdf-utils)
- Ghostscript command line https://ghostscript.com/releases/gsdnld.html (choco install ghostscript)
- Tesseract command line https://tesseract-ocr.github.io/tessdoc/Downloads.html (choco install tesseract)
If you already have PDFs and want to make them searchable, try the following example script. Ghostscript and Tesseract must be installed on the computer first; you will also need to add their install folders to the PATH environment variable or amend the script to give the location of the executables.
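A minimal Python sketch of such a script follows. It is one possible approach, not a definitive implementation: Ghostscript renders each page to a grayscale PNG, and Tesseract then reads a list of those images and writes a single searchable PDF. The function names, the 300 dpi setting, the `eng` language code, and the executable names looked up on the PATH (`gswin64c`, `gswin32c`, `gs`, `tesseract`) are assumptions; adjust them to your installation.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def find_tool(*names):
    """Return the first of the given executables found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def gs_command(gs_exe, pdf_path, page_pattern, dpi=300):
    """Build the Ghostscript call that renders every page to a grayscale PNG."""
    return [
        gs_exe, "-dNOPAUSE", "-dBATCH", "-dSAFER", "-q",
        "-sDEVICE=pnggray", f"-r{dpi}",
        f"-sOutputFile={page_pattern}",  # %04d is replaced by the page number
        str(pdf_path),
    ]


def tesseract_command(tess_exe, list_file, output_base, lang="eng"):
    """Build the Tesseract call; the trailing "pdf" selects searchable PDF output."""
    return [tess_exe, str(list_file), str(output_base), "-l", lang, "pdf"]


def make_searchable(pdf_path, output_pdf, lang="eng", dpi=300):
    """OCR an image-only PDF and return the path of the searchable copy."""
    gs_exe = find_tool("gswin64c", "gswin32c", "gs")
    tess_exe = find_tool("tesseract")
    if not gs_exe or not tess_exe:
        raise FileNotFoundError("Ghostscript and Tesseract must be on PATH")

    # Tesseract appends ".pdf" to the output base name itself.
    base = Path(output_pdf).with_suffix("")
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        pattern = tmp_dir / "page_%04d.png"
        subprocess.run(gs_command(gs_exe, pdf_path, pattern, dpi), check=True)

        # A text file with one image path per line makes Tesseract
        # combine all pages into one output document.
        pages = sorted(tmp_dir.glob("page_*.png"))
        list_file = tmp_dir / "pages.txt"
        list_file.write_text("\n".join(str(p) for p in pages), encoding="utf-8")
        subprocess.run(
            tesseract_command(tess_exe, list_file, base, lang), check=True
        )
    return base.with_name(base.name + ".pdf")
```

For example, `make_searchable("scan.pdf", "scan_searchable.pdf")` would write `scan_searchable.pdf` next to the original. The original file is left untouched, so you can compare the two before replacing anything.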