Execute command - Tesseract

Posted: Wed Feb 19, 2025 11:47 pm
by shalavin
Hi Guys,

Still relatively new to Enfocus Switch, and I'm a bit stuck.

I'm trying to build a flow which uses the Tesseract OCR tool (https://github.com/tesseract-ocr/tesseract) via the Execute Command element to scan incoming JPEGs and save the extracted text as a ".txt" or ".doc" file in a specified location.

However, I'm stuck because:
A) the specified "test.bat" file does not run when the job passes through the flow, and
B) I'm not sure which arguments to enter for the Execute Command element, as the names of both the input file and the output file (.txt) are dynamic.

test.bat

Code:

D:
cd "D:\Enfocus\Resources\Tesseract\Images"
tesseract ocrtest2_1.jpeg test

"ocrtest2_1.jpeg" is the input file, and "test" is the output base name: Tesseract creates a "test.txt" containing the scanned text in the same location.

Could anyone point me in the right direction for this? I've had a look at some older forum threads but can't seem to work it out! Perhaps the logic of my flow is also wrong.
viewtopic.php?t=3812
viewtopic.php?t=4151

Thank you in advance! :D

Re: Execute command - Tesseract

Posted: Thu Feb 20, 2025 7:49 am
by jan_suhr
It's easier to read text directly from a PDF file, and there are many ways to do that.
Both PitStop and pdfToolbox can do it.

Re: Execute command - Tesseract

Posted: Thu Feb 20, 2025 8:35 am
by magnussandstrom
Here's an example flow using Tesseract: JPG to PDF (text overlay).
Tesseract_example.zip

Re: Execute command - Tesseract

Posted: Thu Feb 20, 2025 8:42 am
by JimmyHartington
Are the PDFs scans of pages, or were they generated natively by a computer program?
If they are natively generated PDFs, you can extract the text with the PDF Stripper app from the Enfocus Appstore (https://www.enfocus.com/en/appstore/pro ... f-stripper).

If you need to process the JPGs, I would also recommend the free app Run Command (https://www.enfocus.com/en/appstore/product/run-command).
I think it is easier to use when working with command-line programs.
I got it to work with the setup below.

Code:

"C:\Program Files\Tesseract-OCR\tesseract.exe" %%InputFilePath%% [Job.NameProper]
Of course, you need to change the path to the Tesseract executable to match your system.
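
For anyone who would rather script the call than type it into an element, here is a minimal Python sketch of the same idea: take the incoming file name, derive the output base name from it, and pass both to Tesseract. It relies only on Tesseract's basic usage, "tesseract imagename outputbase", where Tesseract adds ".txt" to the output base itself. The function names and the output folder in the usage note are made up for illustration, and the executable path is the one from the command above, so treat this as a starting point under those assumptions rather than a tested Switch setup.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location, copied from the command above; adjust for your system.
TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def build_tesseract_command(input_file, output_dir, exe=TESSERACT_EXE):
    """Return the argument list for: tesseract <image> <output base>."""
    image = PureWindowsPath(input_file)
    # Tesseract appends ".txt" itself, so the output base carries no extension.
    output_base = PureWindowsPath(output_dir) / image.stem
    return [exe, str(image), str(output_base)]


def run_ocr(input_file, output_dir):
    """Run Tesseract on one image and return the path of the expected text file."""
    command = build_tesseract_command(input_file, output_dir)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(command, check=True)
    return command[2] + ".txt"
```

Calling build_tesseract_command(r"D:\Enfocus\Resources\Tesseract\Images\ocrtest2_1.jpeg", r"D:\Enfocus\Resources\Tesseract\Text") gives the executable, the image path, and the output base "D:\Enfocus\Resources\Tesseract\Text\ocrtest2_1" (the "Text" folder is hypothetical). Passing the arguments as a list means paths with spaces need no manual quoting.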