Determine files missing from a group

RunDontStop
Member
Posts: 67
Joined: Mon Apr 05, 2021 8:03 pm

Determine files missing from a group

Post by RunDontStop »

I have a flow that uses Split PDF to create single-page PDFs. These go through a third-party app to be processed (I use the "Generic Application" element for this). Once processed, the single-page PDFs are regrouped and I merge them into a single PDF again.

There are occasions where one (or more) of the single-page PDFs fails to process in the third-party app.

For example, a 20-page PDF is submitted to the flow. I end up with 20 single-page PDFs. Page 18 fails to process. The other 19 wait for regrouping. The orphan timeout passes and I end up with a 19-page PDF when all is done.

So I am trying to think of a quick and easy way to know which page failed. My initial idea is extremely cumbersome. I will detail it below and hopefully someone will suggest something more efficient.

1. The multi-page PDF starts in the flow. I capture the unique ID and save it as a private data tag. I also save the page count as a private data tag.
2. After Split PDF, my single pages continue to the third-party app. But they also go into a separate repository, just a temporary holding location. Their names include the unique ID that I captured earlier as a private data tag.
3. After the single pages are processed by the third-party app, they continue on to be regrouped. But they also continue on a different path, where each one injects its matching PDF that was placed in the repository earlier. If a processed page injects its repository copy, all is well, since that page was processed successfully. After injection there is no reason to keep these copies, so they can be recycled.
4. Meanwhile, the processed single-page PDFs have been regrouped, then merged back into a multi-page PDF. If its page count matches the page count I originally captured as private data, all is well: it means all my processed pages are in the resulting multi-page PDF.
5. But if the page count does not match, the fun begins. I need my multi-page PDF to inject any pages that failed to process. These would still be in the repository, never having had a processed single-page PDF to inject them. I could use Inject Wildcard to inject my failures, in case more than one page failed to process.

Anyway, if you follow, this would work: I would have my incomplete multi-page PDF and also my single-page failures. But maybe there is a quicker way to achieve this end result?
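
For illustration, the comparison at the heart of the question (expected page count versus the pages that actually came back) can be sketched outside Switch in a few lines of Python. This is only a minimal sketch, not a Switch script: it assumes the page number of every processed single-page PDF is already known (for example, parsed from its file name), and the function name `find_missing_pages` is made up for this example.

```python
def find_missing_pages(expected_count, processed_pages):
    """Return the sorted page numbers in 1..expected_count that never came back."""
    expected = set(range(1, expected_count + 1))
    # Set difference leaves exactly the pages with no processed counterpart.
    return sorted(expected - set(processed_pages))


if __name__ == "__main__":
    # The example from the post: 20 pages submitted, page 18 fails to process.
    processed = [page for page in range(1, 21) if page != 18]
    print(find_missing_pages(20, processed))
```

An empty result means the merged PDF is complete; anything else is the list of pages to chase.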
freddyp
Advanced member
Posts: 1157
Joined: Thu Feb 09, 2012 3:53 pm

Re: Determine files missing from a group

Post by freddyp »

If I interpret this correctly you assemble the original pages if the third-party application fails to process a page, right?

You use "Generic application" so the third-party application uses hot folders. I assume it has an error output folder where the original file is placed (if it does not, shame on the third-party application). Add that folder to the output of "Generic application" and you can just continue with the original file and all will be merged. You may want to do something extra (send a mail, add something inside the PDF, ...) to alert you of the fact that a certain page has not been processed by the third-party application.

Side remark: there is no need to add private data to remember the unique ID and the number of files; "Ungroup job" does that automatically. Check the documentation.
chris@clvisual.com
Newbie
Posts: 1
Joined: Sun Jul 02, 2023 1:37 pm

Re: Determine files missing from a group

Post by chris@clvisual.com »

I am very new to Switch, but in the first workflow we built, we ungrouped folders into single PDFs and then reassembled the folder. We set a variable that requires all the files to be complete before the folder is reassembled. Maybe the same method can be used with Split PDF as well.
freddyp
Advanced member
Posts: 1157
Joined: Thu Feb 09, 2012 3:53 pm

Re: Determine files missing from a group

Post by freddyp »

To reassemble an ungrouped job you can use "Scheme - Ungrouped job" in "Assemble job" and then you do not have to create any variables (private data), "Ungroup job" does that for you (see the documentation).

The "Split PDF pages" app takes a different approach: instead of outputting a job folder it outputs individual job files and adds the same private data as "Ungroup job" so you can also recombine the job files into a folder with "Assemble job" by using the ungroup job scheme.
oiledsociable
Newbie
Posts: 1
Joined: Wed Jan 17, 2024 10:17 am

Re: Determine files missing from a group

Post by oiledsociable »

Being able to split and merge PDF files through a third party is quite new to me. I don't know whether the saved data is secure or not. Normally the processing certainly takes time. "Scheme - Ungrouped job" in "Assemble job" is a really good suggestion for this kind of advanced conversion.
rhd_ole
Member
Posts: 149
Joined: Mon Jan 24, 2022 5:36 pm

Re: Determine files missing from a group

Post by rhd_ole »

We had a similar issue when we were splitting PDFs by tray for mailing and the files were imposed outside of Switch. There could be hundreds of files, split from the same job, waiting to be assembled back.

We would group the split PDFs (split by a script), then ungroup them to get the private data key of the group, and use Ungroup.NumFiles to add a suffix: '_Parts[Ungroup.NumFiles]'. The file names would contain the page range of the split along with the number-of-files suffix, and the files would then leave Switch for third-party imposition.

File name example : 12345_test_00001-12009_Parts2.pdf ; 12345_test_12010-14000_Parts2.pdf

As the files came back into Switch, I used the number after "_Parts" as the number of files to assemble, and the job number prefix we had added to the file names as the job identifier.

Once the condition was met, the files would assemble, we would remove the "_Parts" suffix and the page range info, and off it went to press.

If not all the parts arrived, the job would go to the fail connection. I would capture the files that came out there and email them, so it was easy to see which part of the group was missing.
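
To make the naming convention concrete, here is a minimal Python sketch of the same completeness check done outside Switch. It is not the Switch configuration described above, only an illustration: the file name pattern is inferred from the two example names, and the helper name `check_groups` and the layout of the report are assumptions made for this example.

```python
import re
from collections import defaultdict

# Pattern assumed from the example names:
# <job>_<label>_<first>-<last>_Parts<N>.pdf
NAME_RE = re.compile(
    r"^(?P<job>\d+)_(?P<label>.+)_(?P<first>\d+)-(?P<last>\d+)"
    r"_Parts(?P<parts>\d+)\.pdf$"
)


def check_groups(file_names):
    """Group files by job number and compare received parts with the suffix."""
    ranges_by_job = defaultdict(list)
    expected_by_job = {}
    for name in file_names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # name does not follow the assumed convention
        job = match["job"]
        expected_by_job[job] = int(match["parts"])
        ranges_by_job[job].append((int(match["first"]), int(match["last"])))

    report = {}
    for job, ranges in ranges_by_job.items():
        report[job] = {
            "expected": expected_by_job[job],
            "received": len(ranges),
            "complete": len(ranges) == expected_by_job[job],
            "ranges": sorted(ranges),  # the page ranges that did arrive
        }
    return report


if __name__ == "__main__":
    returned = ["12345_test_00001-12009_Parts2.pdf"]
    print(check_groups(returned))
```

With only the first of the two example files returned, the report for job 12345 shows one of two parts received, and the page range that did arrive makes it easy to see which part is missing.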
[Attachment: Screenshot 2024-01-22 at 5.42.45 AM.png]
Color Science & Workflow Automation
laurawoods
Newbie
Posts: 1
Joined: Fri Jan 26, 2024 7:48 am

Re: Determine files missing from a group

Post by laurawoods »

Yes, indeed! You can capture the unique ID and page count as private data tags, then use conditional logic and comments during the workflow to track successful and failed processing of individual pages, simplifying the identification of failures and minimizing the need for complex repositories and injections. Finally, validate the page count before merging to easily identify and handle any discrepancies.
sarahlison
Newbie
Posts: 1
Joined: Fri Apr 12, 2024 6:12 am

Re: Determine files missing from a group

Post by sarahlison »

freddyp wrote: Tue Jul 25, 2023 8:52 am To reassemble an ungrouped job you can use "Scheme - Ungrouped job" in "Assemble job" and then you do not have to create any variables (private data), "Ungroup job" does that for you (see the documentation).

The "Split PDF pages" app takes a different approach: instead of outputting a job folder it outputs individual job files and adds the same private data as "Ungroup job" so you can also recombine the job files into a folder with "Assemble job" by using the ungroup job scheme.

I also do it this way. It's not too complicated, and everything is handled correctly. Overall, I think it is the best of the options suggested here.
DylanWalker
Newbie
Posts: 1
Joined: Mon Apr 22, 2024 9:41 am
Location: united states

Re: Determine files missing from a group

Post by DylanWalker »

Thank you both. I was having the same problem and didn't know how to solve it until I read this article.