Merge two text files
Posted: Thu Dec 10, 2020 3:15 pm
Hello,
I need to run OCR on PDF files and want to use Tesseract. Tesseract needs images, so I built a flow that converts the PDFs to JPEG and runs OCR on them.
Tesseract outputs txt files. I then want to read those text files in my script, use RegEx to sort them by specific identifiers, and finally upload them to my database via API.
When the PDF has multiple pages, the "Enfocus PitStop Server PDF2Image" module outputs one JPEG per page, with the page number appended to the name, like "documentname_1.pdf, documentname_2.pdf".
How can I combine them into one text file? The problem, as I see it, is that Switch handles the files one by one.
How can I wait for all the txt files that belong to the original PDF file?
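To make the goal concrete, here is a rough sketch of the merge step as a standalone Python script. It does not solve the waiting part inside Switch. It assumes that all page files have already arrived in one folder, that they are named "<documentname>_<page>.txt", and that the merged result goes to a separate output folder. The function names and the folder layout are made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: "<documentname>_<page>.txt", e.g. "invoice_2.txt".
PAGE_FILE = re.compile(r"^(?P<base>.+)_(?P<page>\d+)$")


def group_pages(folder):
    """Map each document name to its page files, ordered by page number."""
    groups = defaultdict(list)
    for path in Path(folder).glob("*.txt"):
        match = PAGE_FILE.match(path.stem)
        if match:  # files without a "_<page>" suffix are ignored
            groups[match["base"]].append((int(match["page"]), path))
    return {
        base: [path for _, path in sorted(pages, key=lambda item: item[0])]
        for base, pages in groups.items()
    }


def merge_pages(folder, out_folder):
    """Write one "<documentname>.txt" per document into out_folder."""
    out_dir = Path(out_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for base, pages in group_pages(folder).items():
        text = "\n".join(page.read_text(encoding="utf-8") for page in pages)
        target = out_dir / f"{base}.txt"
        target.write_text(text, encoding="utf-8")
        merged.append(target)
    return merged
```

Pages are sorted numerically, so "documentname_10.txt" comes after "documentname_2.txt". Calling merge_pages("ocr_out", "merged") would then leave one combined text file per PDF in "merged", ready for the RegEx step.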