Routing jobs based on lookup from data file
Posted: Tue Oct 31, 2017 9:02 pm
I currently have a flow that routes jobs (PDF files) based on part of the file name (job.name). They all start in one input folder and are routed to one of 9 different output folders. Each connector uses a "condition with variables defined" that contains multiple conditions OR'd together, in the form:
[job.name] contains STRING1
OR [job.name] contains STRING2
OR [job.name] contains STRING3
etc.
A more real-world example:
Output folders:
Mammal
Fish
Bird
Input files:
Dog.pdf
Salmon.pdf
Cat.pdf
Robin.pdf
Starling.pdf
Flounder.pdf
Cow.pdf
Bass.pdf
Horse.pdf
Since I have 9 output folders and about 100 strings distributed among them, the setup is getting hard to maintain. The list of strings keeps growing; every week I add one or two new strings to look for.
I want to redo the flow in a way that makes it easier to maintain, with one central list of strings. I was thinking of making a text file that determines which output folder each file goes to: a list of key/value pairs (string, output folder) that a script could read and use to route the files.
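For example, with the animal categories above, the data file might look like this (comma-separated here, though the exact format is still open):

Dog,Mammal
Cat,Mammal
Cow,Mammal
Salmon,Fish
Robin,Bird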
The script might look something like this: (not real code)
Method A
read all the lines from the data file into an array
for each element in array
    if jobname.contains( element.key ) then
        output = element.value
        exit loop
    end if
end
sendto( output )
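To make Method A concrete, here is a rough TypeScript sketch, assuming a JavaScript-style scripting environment and the comma-separated file above. The path and sendTo() are placeholders, not real Switch API calls. It also covers the case where nothing matches, which the pseudocode glosses over:

import { readFileSync } from "fs";

const DATA_FILE = "/path/to/routing.csv"; // placeholder path

interface Route { key: string; folder: string; }

// Parse "string,folder" lines into an ordered list of routes.
function readRoutes(path: string): Route[] {
  return readFileSync(path, "utf8")
    .split(/\r?\n/)
    .filter((line) => line.includes(","))
    .map((line) => {
      const [key, folder] = line.split(",");
      return { key: key.trim(), folder: folder.trim() };
    });
}

// Method A: the first key that the job name contains wins.
function routeFor(jobName: string): string | undefined {
  for (const route of readRoutes(DATA_FILE)) {
    if (jobName.includes(route.key)) return route.folder;
  }
  return undefined; // no match: route to a problem folder?
}

// sendTo(job, routeFor(job.name)); // sendTo() is hypothetical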
Method B
Data file is grouped by output folder. Could be a separate data file for each folder. Each line would just be a string, with no output value.
read all the lines from the data file(s), putting each group into a separate array per output folder
    array1
    array2
    array3
str = extract the part of the file name that determines the output folder (for example, the 4 chars after the first underscore)
for each array
    if array.contains( str ) then
        output = array#
        exit loop
    end if
end
sendto( output )
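Method B in the same sketch style: once the per-folder lists are loaded, they can be inverted into a single map, so the lookup is one step instead of a scan per array. The file names and the tokenOf() rule are just assumptions for illustration:

import { readFileSync } from "fs";

// One data file per output folder (an assumed layout),
// each file holding one lookup string per line.
const GROUP_FILES: Record<string, string> = {
  Mammal: "/path/to/mammal.txt",
  Fish: "/path/to/fish.txt",
  Bird: "/path/to/bird.txt",
};

// Invert the groups into one token -> folder map.
function buildLookup(): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const [folder, file] of Object.entries(GROUP_FILES)) {
    for (const raw of readFileSync(file, "utf8").split(/\r?\n/)) {
      const token = raw.trim();
      if (token) lookup.set(token, folder);
    }
  }
  return lookup;
}

// The routing token: e.g. the 4 chars after the first underscore.
function tokenOf(jobName: string): string {
  const i = jobName.indexOf("_");
  return i >= 0 ? jobName.substring(i + 1, i + 5) : "";
}

// const folder = buildLookup().get(tokenOf(job.name));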
Method B is probably more efficient because it has fewer array elements to check. The problem with both methods, though, is that the data file has to be re-read for every PDF that comes through the flow.
Is there a way to read the data file once, when the flow starts, and keep that info in memory the whole time?
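What I am picturing is something like lazy initialization, if the scripting environment keeps script-level variables alive between jobs (which is exactly what I don't know):

// Reusing buildLookup() from the Method B sketch above.
let lookup: Map<string, string> | null = null;

function getLookup(): Map<string, string> {
  if (lookup === null) {
    lookup = buildLookup(); // parse the data file(s) only once
  }
  return lookup;
}

// A refinement would be to re-read when the data file's
// modification time changes, so weekly edits get picked up.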
Is there a better approach?
Thanks.