Hello Mike,
Reading any XML file is much faster with XMLExtract or XMLXPathReader.
However, if you need to split the original XML file into several smaller XML files, you can do it as described below:
For example, if you have a file with the following structure:
<root>
<node>
<node1>
<node2>
<foo>bar1</foo>
</node2>
</node1>
</node>
<node>
<node1>
<node2>
<foo>bar2</foo>
</node2>
</node1>
</node>
<node>
<node1>
<node2>
<foo>bar3</foo>
</node2>
</node1>
</node>
<node>
<node1>
<node2>
<foo>bar4</foo>
</node2>
</node1>
</node>
</root>
Create the following graph:
UniversalDataReader → XSLTransformer → UniversalDataWriter.
The first metadata (on the edge into XSLTransformer) has no delimiter other than “EOF as delimiter”.
The XSLTransformer will use the following transformation code:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:template match="node">
    <xsl:copy-of select="*"/>
  </xsl:template>
</xsl:stylesheet>
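To check what this template emits for each <node>, here is a minimal Python sketch that mimics it outside of CloverETL. It is not part of the graph and uses only the standard library. The helper name split_nodes and the shortened SAMPLE input are hypothetical, and the sketch assumes the <node> elements are direct children of the root, as in the example above.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example file (assumption: two records are enough to illustrate).
SAMPLE = """<root>
<node><node1><node2><foo>bar1</foo></node2></node1></node>
<node><node1><node2><foo>bar2</foo></node2></node1></node>
</root>"""


def split_nodes(xml_text):
    """Return one XML fragment per <node>, holding copies of its child elements.

    Roughly mirrors <xsl:copy-of select="*"/> inside <xsl:template match="node">.
    """
    root = ET.fromstring(xml_text)
    fragments = []
    for node in root.findall("node"):
        # Serialize each child element; the <node> wrapper itself is dropped.
        parts = [ET.tostring(child, encoding="unicode").strip() for child in node]
        fragments.append("".join(parts))
    return fragments


if __name__ == "__main__":
    for fragment in split_nodes(SAMPLE):
        print(fragment)
```

Each fragment starts at <node1>, so every sub-file written later contains the content of one <node> without the <node> or <root> wrappers.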
The second metadata will have only a record delimiter (delete the default delimiter). The record delimiter will be:
In UniversalDataWriter, set Records per file to 1.
The File URL in UniversalDataWriter will be ${DATATMP_DIR}/yoursubxmlfile$$$.xml (the number of $ wildcards must be large enough to cover the number of files created).
The resulting files will be named yoursubxmlfile000.xml to yoursubxmlfile050.xml, for example. The last one may contain only the delimiter (); delete that file if necessary.
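As a rough way to decide how many $ wildcards you need, the following Python sketch imitates the numbering shown above. It is only an illustration, not CloverETL's own implementation: the helpers wildcards_needed and expand are hypothetical, numbering is assumed to start at 0 as in yoursubxmlfile000.xml, and the ${DATATMP_DIR} parameter is assumed to be resolved already, so only the file name is passed in.

```python
import re


def wildcards_needed(file_count):
    """Number of '$' characters needed to number file_count files starting at 0."""
    return len(str(max(file_count - 1, 0)))


def expand(file_name, index):
    """Replace the first run of '$' with index, zero-padded to the width of that run."""
    return re.sub(
        r"\$+",
        lambda match: str(index).zfill(len(match.group())),
        file_name,
        count=1,
    )


if __name__ == "__main__":
    print(wildcards_needed(51))
    print(expand("yoursubxmlfile$$$.xml", 0))
    print(expand("yoursubxmlfile$$$.xml", 50))
```

Using one more wildcard than strictly needed, as in the three-character example for 51 files, does no harm and leaves room if the input grows.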
Then you can use XMLExtract or XMLXPathReader where File URL is ${DATATMP_DIR}/yoursubxml*.xml.
This way, the input files will be read one after another, and the number of your output edges will be smaller than if the original XML file were read directly.
But remember, this will be slower than reading the original XML file directly.
Best regards,
Tomas Waller