Execute command - Tesseract

shalavin
Newbie
Posts: 3
Joined: Thu Sep 12, 2024 2:55 am

Post by shalavin »

Hi Guys,

Still relatively new to Enfocus Switch, and I'm a bit stuck.

I'm trying to build a flow that uses the Tesseract OCR tool (https://github.com/tesseract-ocr/tesseract) via the Execute Command element to scan incoming JPEGs and save the extracted text as a ".txt" or ".doc" file in a specified location.

However, I'm stuck because:
A) the specified "test.bat" file does not run when the job passes through the flow, and
B) I'm not sure which arguments to enter for the Execute Command element, as both the input file name and the output file name (.txt) are dynamic.

test.bat

Code: Select all

D:
cd "D:\Enfocus\Resources\Tesseract\Images"
tesseract ocrtest2_1.jpeg test
Here "ocrtest2_1.jpeg" is the input file and "test" is the output file name; Tesseract creates "test.txt", containing the scanned text, in the same location.
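
To illustrate that naming convention with a dynamic file name, here is a minimal Python sketch. It only builds the argument list and does not call Tesseract. The function name `tesseract_args` and the folder layout are made up for illustration, and it assumes the output should be named after the input file.

```python
from pathlib import PureWindowsPath


def tesseract_args(input_image, output_dir):
    """Build the arguments for `tesseract <image> <outputbase>`.

    Hypothetical helper: the output base is the output folder plus the
    input file name without its extension. Tesseract itself appends
    ".txt" to the output base for plain text output.
    """
    image = PureWindowsPath(input_image)
    output_base = PureWindowsPath(output_dir) / image.stem
    return ["tesseract", str(image), str(output_base)]


if __name__ == "__main__":
    # Example values taken from the batch file above.
    folder = r"D:\Enfocus\Resources\Tesseract\Images"
    print(tesseract_args(folder + r"\ocrtest2_1.jpeg", folder))
```

With these values the output base ends in "ocrtest2_1", so the text would land in "ocrtest2_1.txt" instead of a fixed "test.txt".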

Could anyone point me in the right direction? I've looked at some older forum threads but can't seem to work it out! Perhaps my logic for the flow is also incorrect.
viewtopic.php?t=3812
viewtopic.php?t=4151

Thank you in advance! :D
jan_suhr
Advanced member
Posts: 687
Joined: Fri Nov 04, 2011 1:12 pm
Location: Nyköping, Sweden

Re: Execute command - Tesseract

Post by jan_suhr »

It's easier to read text directly from a PDF file, and there are many ways to do that.
Both PitStop and pdfToolbox can do that.
Jan Suhr
Color Consult AB
Sweden
magnussandstrom
Advanced member
Posts: 510
Joined: Thu Jul 30, 2020 6:34 pm
Location: Sweden

Re: Execute command - Tesseract

Post by magnussandstrom »

Here's an example flow using Tesseract: JPG to PDF (text overlay).
Tesseract_example.zip
JimmyHartington
Advanced member
Posts: 453
Joined: Tue Mar 22, 2011 7:38 am

Re: Execute command - Tesseract

Post by JimmyHartington »

Are the PDFs scans of pages, or PDFs generated natively by a computer program?
If they are natively generated PDFs, you can extract the text with the program PDF Stripper from the Enfocus Appstore (https://www.enfocus.com/en/appstore/pro ... f-stripper).

If you need to process the JPGs, I would also recommend the free program Run Command (https://www.enfocus.com/en/appstore/product/run-command).
I think it is easier to use when working with command-line programs.
With the setup below I got it to work.

Code: Select all

"C:\Program Files\Tesseract-OCR\tesseract.exe" %%InputFilePath%% [Job.NameProper]
Of course, you need to change the path to the Tesseract executable to match your system.
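
If it is unclear where Tesseract is installed, a small script can check before running it. This is only a rough Python sketch, not part of Switch or Run Command: the helper names `find_tesseract` and `run_ocr` are hypothetical, and the default path is simply the one from the command above.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed install location, copied from the command above; adjust as needed.
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates=DEFAULT_CANDIDATES, search_path=True):
    """Return the first candidate that exists, else a PATH lookup, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    if search_path:
        # shutil.which returns None when nothing named "tesseract" is on PATH.
        return shutil.which("tesseract")
    return None


def run_ocr(executable, image, output_base):
    """Run `tesseract <image> <outputbase>` and raise if it fails."""
    completed = subprocess.run(
        [executable, str(image), str(output_base)],
        capture_output=True,
        text=True,
    )
    if completed.returncode != 0:
        raise RuntimeError("tesseract failed: " + completed.stderr.strip())
    # Tesseract appends ".txt" to the output base for plain text output.
    return Path(str(output_base) + ".txt")


if __name__ == "__main__":
    exe = find_tesseract()
    print(exe if exe else "Tesseract not found; check the install path.")
```

Passing the command as a list avoids quoting problems with spaces in paths such as "Program Files".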