Split XML based on field value

PdFUser5000 · Post by **PdFUser5000** » Wed Jan 12, 2022 4:02 pm

My XML looks like this:

<?xml version="1.0" encoding="ISO-8859-13"?>
<orderinfo>
	<customerID>6070426</customerID>
	<customerPath></customerPath>
	<orderID>XO0016184</orderID>
	<model>
	    <modelNo>M800</modelNo>
	    <modelName>Pants</modelName>
	    <designID>7778888</designID>
	    <orderedSizes>M,L,XL</orderedSizes>
	    <sizeAmount>3</sizeAmount>
	    <DPauthor>email@email.com</DPauthor>
	    <PPauthor>email@email.com</PPauthor>
	    <modelUpdate>2021-01-13T00:00:00.000</modelUpdate>
	    <order>new order</order>
	    <confirmPP>no confirm</confirmPP>
	</model>
</orderinfo>

Is it possible to split the XML on the comma-separated values in <orderedSizes>? In the example, 3 sizes are ordered, and I would like the resulting XML files to be named:

<modelNo> + <designID> + first value of the orderedSizes list (M)
<modelNo> + <designID> + second value of the orderedSizes list (L)
etc.

I have never worked with XSL transformations, so maybe somebody can share some material or tips?

freddyp · Post by **freddyp** » Wed Jan 12, 2022 6:19 pm

There is a tokenize function in XSLT 2.0 which can be used to solve this, but you need a black belt in XSLT. I only have a green belt.

Do you have control over the structure of the XML? A comma-separated list inside a single element is not the correct way of putting repeated values into an XML. The proper way of structuring this in XML is:

Code: Select all

<orderedSizes>
  <orderedSize>M</orderedSize>
  <orderedSize>L</orderedSize>
  <orderedSize>XL</orderedSize>
</orderedSizes>

Then it is much easier to create the XSL.
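
If the program that produces the XML cannot be changed, the comma-separated list can also be restructured by a stylesheet, using the tokenize function mentioned above. The following is a minimal, untested sketch. It assumes an XSLT 2.0 processor (tokenize is not available in XSLT 1.0) and the element names from the example in the first post. It copies everything unchanged and only rewrites an <orderedSizes> element that has no child elements yet:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Identity template: copy every node and attribute as it is -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only an orderedSizes element that still holds plain text is rewritten -->
    <xsl:template match="orderedSizes[not(*)]">
        <orderedSizes>
            <!-- Split the text on commas and emit one element per value -->
            <xsl:for-each select="tokenize(., ',')">
                <orderedSize>
                    <xsl:value-of select="normalize-space(.)"/>
                </orderedSize>
            </xsl:for-each>
        </orderedSizes>
    </xsl:template>
</xsl:stylesheet>
```

For the example input, <orderedSizes> would then contain the three elements M, L and XL. The normalize-space call only guards against spaces after the commas.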

If it is not possible to change the XML structure, then you could try this somewhat convoluted method that does not require XSLT (it is an idea, it is not tested!):
- use the StringSplitter app to split the string of that node. This will add private data to the job with the value of each part and also the number of parts (check the documentation for the correct names)
- duplicate the job with the Job Repeater app. The number of times is available in the private data from StringSplitter.
- create as many connections as the maximum number of values in that node. If the nth part as detected by StringSplitter is not empty, the job goes to an instance of the String Replace app where the node value is replaced by the nth part.
- rename the job with all the available info

Good luck and let us know.

PdFUser5000 · Post by **PdFUser5000** » Fri Jan 14, 2022 2:12 pm

I managed to change the structure. Now the file looks like this:

Code: Select all

<?xml version="1.0" encoding="ISO-8859-13"?>
<orderinfo>
	<customerID>6070426</customerID>
	<customerPath></customerPath>
	<orderID>XO0016184</orderID>
	<model>
	    <modelNo>M800</modelNo>
	    <modelName>Pants</modelName>
	    <designID>7778888</designID>
	    <orderedSizes>
	        <orderedSize>S</orderedSize>
	        <orderedSize>M</orderedSize>
	        <orderedSize>L</orderedSize>
	        <orderedSize>XL</orderedSize>
	        <orderedSize>XXL</orderedSize>
	        <orderedSize>3XL</orderedSize>
	    </orderedSizes>
	    <sizeAmount>3</sizeAmount>
	    <DPauthor>email@email.com</DPauthor>
	    <PPauthor>email@email.com</PPauthor>
	    <modelUpdate>2021-01-13T00:00:00.000</modelUpdate>
	    <order>new order</order>
	    <confirmPP>no confirm</confirmPP>
	</model>
</orderinfo>

I used this code to split the XML:

Code: Select all

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <!-- Group by the value of each orderedSize child -->
        <xsl:for-each-group select="/orderinfo/model/orderedSizes" group-by="orderedSize">
            <!-- Write each group to a file named after the size -->
            <xsl:result-document href="{current-grouping-key()}.xml">
                <orderinfo>
                    <model>
                        <modelNo><xsl:value-of select="/orderinfo/model/modelNo"/></modelNo>
                        <designID><xsl:value-of select="/orderinfo/model/designID"/></designID>
                        <orderedSizes>
                            <xsl:copy-of select="current-grouping-key()"/>
                        </orderedSizes>
                    </model>
                </orderinfo>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>

My grand goal is to check which of the files for the sizes in <orderedSizes> already exist, remove those sizes from the main XML, and then send only the missing ones to printing. For this I would need to merge the XML again after eliminating the existing sizes from <orderedSizes>. I am not sure this is a reasonable thing to do, though, if it takes too much processing time.

freddyp · Post by **freddyp** » Mon Jan 17, 2022 10:23 am

You want to create an XML for each orderedSize, so it is not correct to work with orderedSizes as a group. You have to address the individual orderedSize elements.

Code: Select all

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:for-each select="/orderinfo/model/orderedSizes/orderedSize">
            <!-- Write each ordered size to a file with the size in the name -->
            <xsl:result-document href="items-{current()}.xml">
                <Items>
                    <xsl:copy-of select="current()"/>
                </Items>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
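
To get the file names asked for in the first post (modelNo + designID + size) and to keep the rest of the order data in every output file, the same loop can be extended. This is a sketch under assumptions and has not been tested in Switch. The underscore separator is an illustrative choice, as is copying all other fields into every file. It also assumes a single <model> per order and an XSLT 2.0 processor.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <xsl:for-each select="/orderinfo/model/orderedSizes/orderedSize">
            <!-- The model element this size belongs to -->
            <xsl:variable name="model" select="ancestor::model"/>
            <!-- File name like M800_7778888_XL.xml; the separator is an assumption -->
            <xsl:result-document href="{$model/modelNo}_{$model/designID}_{.}.xml">
                <orderinfo>
                    <!-- Order-level fields that come before the model -->
                    <xsl:copy-of select="$model/preceding-sibling::*"/>
                    <model>
                        <!-- Model fields before orderedSizes, in document order -->
                        <xsl:copy-of select="$model/orderedSizes/preceding-sibling::*"/>
                        <orderedSizes>
                            <!-- Only the current size ends up in this file -->
                            <xsl:copy-of select="."/>
                        </orderedSizes>
                        <!-- Model fields after orderedSizes -->
                        <xsl:copy-of select="$model/orderedSizes/following-sibling::*"/>
                    </model>
                </orderinfo>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Each output file then has the same layout as the input, with one <orderedSize> instead of the full list.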

PdFUser5000 · Post by **PdFUser5000** » Mon Jan 17, 2022 3:03 pm

Thanks!

My last step would be to recombine the files into a single XML again, so that the orderedSizes are together like in the beginning:

Code: Select all

<orderedSizes>
  <orderedSize>M</orderedSize>
  <orderedSize>L</orderedSize>
  <orderedSize>XL</orderedSize>
</orderedSizes>

Do I have to create a job folder for this? Also, how hard is it to combine XML files if the number of files can be different each time?

freddyp · Post by **freddyp** » Mon Jan 17, 2022 4:26 pm

PdFUser5000 wrote: What my grand goal would be is to check which files exist from <orderedSizes>, remove those from the main xml and then send only the missing ones to printing. For this I would need to merge the xml again after eliminating the existing sizes from <orderedSizes>.

I am sure it is not, but this sounds very strange to me. First of all, how can S or L be a file? It must be a combination of different pieces from the XML that makes up a path to a file that could exist. What baffles me completely is that you want to print the missing files. How can you print a missing file?

PdFUser5000 wrote: My last move would be to recombine the files to a single xml again so that orderedSizes would be together like in the beginning

You do not mention it explicitly but based on what you wrote in the first quote I assume you only want the missing "files" to remain in the XML. Is this really necessary? Is it not enough to continue working with the individual files?

Anyhow, I would script it, but here is an attempt to describe a solution that does not require a script:
- chain the Scan hierarchy app as many times as the maximum number of orderedSize nodes. In the variable for the filename use the XPath that ends in orderedSize[1] for the first instance, orderedSize[2] for the second, etc.
- when the file is found, place that XPath in a piece of private data and call it e.g. orderedSize1, orderedSize2, etc. When the file is not found, do not do anything.
- at the end, use the XML Magic app to remove the unwanted nodes. There will be as many lines as the maximum number of orderedSize nodes, and on each line you use one of the private data keys as the XPath. When the node has to be removed, the private data key (orderedSize3, for example) will contain a valid XPath and the node will get removed. When the private data is empty because the key does not exist, I assume XML Magic will do nothing (to be tested!).
IMPORTANT: the removal of the nodes has to be done in reverse order! Otherwise the order gets mixed up: when you remove orderedSize[2], the original orderedSize[5] becomes [4], and if you then remove [5] you are actually removing [6] from the original.
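
If the list of sizes that already exist can be collected first (for example from the Scan hierarchy results), the removal can also be done by value instead of by position, which avoids the index shifting described above. This is a minimal, untested sketch that assumes an XSLT 2.0 processor. The parameter name existing and its comma-separated format are made up for illustration, and how the value is passed in depends on the tool that runs the transformation.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Hypothetical parameter: comma-separated sizes whose PDF already exists -->
    <xsl:param name="existing" select="'M,XL'"/>
    <xsl:variable name="existingSizes"
                  select="for $s in tokenize($existing, ',') return normalize-space($s)"/>

    <!-- Identity template: copy everything as it is -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop every orderedSize whose value is in the list of existing sizes -->
    <xsl:template match="orderedSize[normalize-space(.) = $existingSizes]"/>
</xsl:stylesheet>
```

With the default value 'M,XL' and the six sizes from the restructured example, the result keeps S, L, XXL and 3XL. The <sizeAmount> element is not updated by this sketch.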

laurentd · Post by **laurentd** » Mon Jan 17, 2022 4:55 pm

When you have the ordered sizes like this in your input file (or after some transformation, which is the case here):
<orderedSizes>
  <orderedSize>M</orderedSize>
  <orderedSize>L</orderedSize>
  <orderedSize>XL</orderedSize>
</orderedSizes>

then you can use the Variable XPath Repeater app to loop over each orderedSize without having to split or recombine the XML file.
https://www.enfocus.com/en/appstore/pro ... able-xpath

PdFUser5000 · Post by **PdFUser5000** » Tue Jan 18, 2022 8:42 am

I have one Illustrator file on the server that contains all the orderedSize sizes. The user sends an XML with the selected sizes from Illustrator to Switch. Switch then takes the Illustrator file from the server and creates a separate PDF file for each size via the Illustrator element, using a script. On some occasions, some PDFs have already been made in the past and there is no need to make them again. For this I would like to check beforehand whether any of the sizes already exist, remove them from the orderedSizes node, and then use the Illustrator element to print only the missing size files.

Since I am not very good at scripting, I thought there might be a way to check the files without a script.

freddyp · Post by **freddyp** » Tue Jan 18, 2022 8:45 am

The way to check the existence of a file is with the app Scan hierarchy as in my description of a possible solution that does not involve scripting.
