Routing jobs based on lookup from data file

jstotz · Post by **jstotz** » Tue Oct 31, 2017 9:02 pm

I currently have a flow that routes jobs (PDF files) based on part of the file name (job.name). They all start in one input folder and are routed to one of 9 different output folders. Each connector uses a "condition with variables defined" that contains multiple conditions OR'd together in the form of:
[job.name] contains STRING1
OR [job.name] contains STRING2
OR [job.name] contains STRING3
etc.

A more real-world example would be:
Output folders:
Mammal
Fish
Bird

Input files:
Dog.pdf
Salmon.pdf
Cat.pdf
Robin.pdf
Starling.pdf
Flounder.pdf
Cow.pdf
Bass.pdf
Horse.pdf

Since I have 9 output folders and about 100 strings distributed among them, it is getting hard to maintain. The list of strings keeps growing. Every week I add one or two new strings to look for.

I want to redo the flow in a way that would make it easier to maintain by having one central list of strings. I was thinking of making a text file that would be used to determine which file went to each output folder. I figured that the file would be a list of key/value pairs (string, output folder) that could be read by a script and used to route the files.

The script might look something like this: (not real code)

Method A
read all the lines from the data file into an array
for each element in array
    if jobname.contains( element.key ) then
        output = element.value
        exit loop
    end if
end

sendto( output )
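
To make Method A a bit more concrete, here is a minimal sketch in plain JavaScript. It is not Switch-specific: reading the data file and moving the job are left out, and the "string,folder" line format, the function names parseRules and routeByContains, and the "Unknown" fallback are all assumptions made for illustration.

```javascript
// Parse the data file text: one "string,folder" pair per line (assumed format).
function parseRules(text) {
  var rules = [];
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    var comma = line.indexOf(",");
    if (comma < 1) continue; // skip blank or malformed lines
    var key = line.substring(0, comma).replace(/^\s+|\s+$/g, "");
    var folder = line.substring(comma + 1).replace(/^\s+|\s+$/g, "");
    if (key && folder) rules.push({ key: key, folder: folder });
  }
  return rules;
}

// Return the folder of the first rule whose key occurs in the job name,
// or the fallback value when nothing matches.
function routeByContains(jobName, rules, fallback) {
  for (var i = 0; i < rules.length; i++) {
    if (jobName.indexOf(rules[i].key) !== -1) return rules[i].folder;
  }
  return fallback;
}

// Sample data taken from the example above.
var rules = parseRules("Dog,Mammal\nSalmon,Fish\nRobin,Bird\n");
console.log(routeByContains("Salmon.pdf", rules, "Unknown")); // Fish
```

Because the first matching rule wins, the order of the lines in the data file matters when one string is contained in another.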

Method B
Data file is grouped by output folder. Could be a separate data file for each folder. Would just have strings, no output values for each line.

read all the lines from the data file(s) but put each group into a separate array for each output folder
array1
array2
array3

str = extract the part of the file name that will determine the output folder (for example 4 chars after the first underscore)
for each array
    if array.contains( str ) then
        output = array#
        exit loop
    end if
end

sendto( output )
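
A rough sketch of Method B in plain JavaScript, again outside of Switch. Instead of looping over one array per folder, it merges the groups into a single lookup object, which is a substitution on my part and not exactly the pseudocode above. The group contents, the function names buildLookup, extractKey and routeByKey, and the "Unknown" fallback are hypothetical.

```javascript
// Build one lookup table from groups of strings, one group per output folder.
function buildLookup(groups) {
  var lookup = {};
  for (var folder in groups) {
    if (!Object.prototype.hasOwnProperty.call(groups, folder)) continue;
    var keys = groups[folder];
    for (var i = 0; i < keys.length; i++) lookup[keys[i]] = folder;
  }
  return lookup;
}

// Extract the 4 characters after the first underscore, or "" if there is none.
function extractKey(jobName) {
  var pos = jobName.indexOf("_");
  return pos === -1 ? "" : jobName.substring(pos + 1, pos + 5);
}

// Look the extracted key up directly; no loop over the strings is needed.
function routeByKey(jobName, lookup, fallback) {
  var key = extractKey(jobName);
  return Object.prototype.hasOwnProperty.call(lookup, key) ? lookup[key] : fallback;
}

// Made-up sample data: the keys are 4-character codes.
var lookup = buildLookup({ Mammal: ["DOGS", "CATS"], Fish: ["BASS"] });
console.log(routeByKey("1234_BASS_cover.pdf", lookup, "Unknown")); // Fish
```

With an exact key, the cost per job no longer depends on how many strings are in the list, only on building the table.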

Method B is probably more efficient because it has fewer array elements to check. However, the problem with both methods is that they have to re-read the data file for each PDF that comes through the flow.

Is there a way to read the data file into the flow once when the flow is started and have that info in memory all the time?

Is there a better approach?

Thanks.

loicaigon · Post by **loicaigon** » Wed Nov 01, 2017 11:45 pm

By using regular expressions, you could already narrow the scope and limit the number of outgoing connections.

In your demo case, you would need only 3 or 4 outgoing connections (think of Mammal, Fish, Bird and unknown). With regular expressions you can then filter files based on their names.
For example, the condition on the Mammal connection would be set to:
(Cat|Dog|Horse|Cow)\.pdf

But if you prefer to rely on an external file, you will have to use a script expression (which requires the scripting module):


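// Read the list of names; the file is read again each time the expression is evaluated.
// Assumes one name per line with no blank lines (a blank line adds an empty alternative to the pattern).
var c = File.read("/Users/ozalto/Desktop/mammals.txt", "UTF-8");
var names = c.split("\n");
// Build a pattern such as (Cat|Dog|Horse|Cow) from the names.
var reg = new RegExp("(" + names.join("|") + ")");
var jobName = job.getNameProper();
jobName.match(reg);
// The last statement is the result of the expression: true when a name was found.
reg.matchedLength > 0;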

The latter has the advantage that you can edit the text file outside of Switch and the flow picks up the changes, since the file is read each time the expression runs.
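
One thing to watch with a hand-edited list is that blank lines, Windows line endings or characters such as "." and "+" in a name end up inside the pattern. Below is a small sketch in plain JavaScript (run outside of Switch, so it may need adapting to the Switch script engine) that cleans the list before building the expression. The helper names escapeRegExp and buildNamePattern are made up for this example.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern like (Cat|Dog|Horse|Cow) from the text of the list file.
// Blank lines are dropped; returns null when the list is empty.
function buildNamePattern(text) {
  var lines = text.split(/\r?\n/);
  var parts = [];
  for (var i = 0; i < lines.length; i++) {
    var name = lines[i].replace(/^\s+|\s+$/g, "");
    if (name) parts.push(escapeRegExp(name));
  }
  return parts.length ? new RegExp("(" + parts.join("|") + ")") : null;
}

// Sample list with a blank line and a trailing newline.
var reg = buildNamePattern("Cat\nDog\n\nHorse\nCow\n");
console.log(reg.test("Horse_final")); // true
```

Returning null for an empty list lets the caller decide what to do with the job when the file has no usable names.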

Enfocus Community
