Dear Javlin team,
I’m currently trying to write a complex XML output file. I first used XMLWriter to produce a basic XML file and then post-processed it with an XSL transformation to create the final XML output. This works fine for 10’000 records, but beyond that I run into memory issues. I tried increasing the JVM memory parameters, but that is not sufficient, as I need to process millions of records.
Therefore, I’m looking for another solution: I would like to generate the correct XML file directly through the Clover XMLWriter component (or another one).
I absolutely need the XML mapping functionality, as I join two input elements into one and I also need to use different sources.
For example, I would like to do something like:
With 4 different sources (no chance to merge these 3 sources). But this is not a valid mapping and I don’t see a clean solution.
Any help is very welcome.
Jerome Cuendet
Hello Jerome,
See the “OutOfMemoryError with XMLWriter” topic. It could help you solve both the OutOfMemoryError and the mapping problem.
That sounds like a good solution, as I would really like the XML to be written sequentially to avoid any memory issues and ensure scalability.
Thanks a lot
In fact, it is absolutely fine to use the HashJoin component, but my two input elements have a 1-to-n relation.
This means I don’t see how to write a transform function that produces an output such as:
<contract>
  <id>1</id>
  <name>toto</name>
  <specs>
    <spec>
      <idspec>123</idspec>
      <element>abc</element>
    </spec>
    <spec>
      <idspec>456</idspec>
      <element>def</element>
    </spec>
  </specs>
</contract>
Here the contract elements (id, name) come from one input (1 record) and the spec elements (idspec, element) come from another input (2 records).
Do you see a solution for that?
Thanks
I still have an issue, as I need to produce an XML output such as:
<contract>
  <id>1</id>
  <name>toto</name>
  <specs>
    <spec>
      <idspec>1234</idspec>
      <namespec>abcd</namespec>
    </spec>
    <spec>
      <idspec>5678</idspec>
      <namespec>efgh</namespec>
    </spec>
  </specs>
</contract>
From two different inputs:
- contract with one record (id, name)
- specs with two rows (idspec, namespec, and id as the foreign key)
How can I do that with HashJoin or other components?
Thanks
Hello,
Before joining, you have to concatenate the attributes that share the same id, using the Denormalizer component:
<?xml version="1.0" encoding="UTF-8"?>
<Graph author="avackova" created="Thu Jun 11 13:46:15 CEST 2009" guiVersion="0.0.0.devel" id="1244721135669" licenseType="Evaluation Devel" modified="Mon Apr 12 13:06:15 CEST 2010" modifiedBy="avackova" name="xmlJoin" revision="1.48">
<Global>
<Metadata id="Metadata2">
<Record fieldDelimiter="|" name="attributes" recordDelimiter="\n" type="delimited">
<Field name="id" type="integer"/>
<Field name="idspec" type="integer"/>
<Field name="namespec" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata0">
<Record fieldDelimiter="|" name="master" recordDelimiter="\n" type="delimited">
<Field name="id" type="integer"/>
<Field name="name" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata3">
<Record fieldDelimiter="|" name="slave" recordDelimiter="\n" type="delimited">
<Field name="id" type="integer"/>
<Field name="output" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata1">
<Record fieldDelimiter="|" name="xml" recordDelimiter="\n" type="delimited">
<Field name="field1" type="string"/>
</Record>
</Metadata>
<Property fileURL="workspace.prm" id="GraphParameter0"/>
</Global>
<Phase number="0">
<Node id="DATA_GENERATOR0" recordsNumber="1" type="DATA_GENERATOR">
<attr name="generate"><![CDATA[//#TL
// Generates output record.
function generate() {
$0.id := 1;
$0.name := 'toto';
}
// Called during component initialization.
// function init() {}
// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node id="DATA_GENERATOR1" recordsNumber="2" type="DATA_GENERATOR">
<attr name="generate"><![CDATA[//#TL
int counter = -1;
// Generates output record.
function generate() {
counter++;
$0.id := 1;
$0.idspec := iif(counter == 0, 1234, 5678);
$0.namespec := iif(counter == 0, 'abcd', 'efgh');
}
// Called during component initialization.
// function init() {}
// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node id="DENORMALIZER0" key="id" type="DENORMALIZER">
<attr name="denormalize"><![CDATA[//#TL
string output = "";
// This transformation defines the way in which multiple input records
// (with the same key) are denormalized into one output record.
// This function is called for each input record from a group of records
// with the same key.
function append() {
output = output + "\t<spec>\n\t<idspec>" + $idspec + "</idspec>\n\t<namespec>" + $namespec + "</namespec>\n\t</spec>\n";
}
// This function is called once after the append() function was called for all records
// of a group of input records defined by the key.
// It creates a single output record for the whole group.
function transform() {
$0.id := $0.id;
$0.output := output;
}
// Called after transform() to return the resources that have been used to their initial state
// so that next group of records with different key may be parsed.
function clean() {
output = "";
}
// Called to return a user-defined error message when an error occurs.
// function getMessage() {}
// Called during component initialization.
// function init() {}
// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node id="EXT_HASH_JOIN0" joinKey="$id=$id" slaveDuplicates="true" type="EXT_HASH_JOIN">
<attr name="transform"><![CDATA[//#TL
// Transforms input record into output record.
function transform() {
$0.field1 := "<contract>\n<id>" + $id + "</id>\n<name>" + $name + "</name>\n<specs>\n" + $1.output + "</specs>\n</contract>";
}
// Called during component initialization.
// function init() {}
// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node id="TRASH0" type="TRASH"/>
<Edge fromNode="DATA_GENERATOR0:0" id="Edge0" inPort="Port 0 (driver)" metadata="Metadata0" outPort="Port 0 (out)" toNode="EXT_HASH_JOIN0:0"/>
<Edge fromNode="DATA_GENERATOR1:0" id="Edge1" inPort="Port 0 (in)" metadata="Metadata2" outPort="Port 0 (out)" toNode="DENORMALIZER0:0"/>
<Edge fromNode="DENORMALIZER0:0" id="Edge4" inPort="Port 1 (slave)" metadata="Metadata3" outPort="Port 0 (out)" toNode="EXT_HASH_JOIN0:1"/>
<Edge debugMode="true" fromNode="EXT_HASH_JOIN0:0" id="Edge3" inPort="Port 0 (in)" metadata="Metadata1" outPort="Port 0 (out)" toNode="TRASH0:0"/>
</Phase>
</Graph>
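For readers who want to see the data flow of the graph above without opening Clover, here is a minimal sketch of the same idea in plain Python. It is not Clover code: the function names `denormalize` and `join_contracts` are hypothetical, and the records are assumed to be simple dictionaries. It also assumes, as the Denormalizer with key="id" does, that spec rows with the same id arrive next to each other, and it skips contracts without specs (inner-join behaviour, which is an assumption about how the join is configured).

```python
import sys
from itertools import groupby
from operator import itemgetter
from xml.sax.saxutils import escape


def denormalize(spec_rows):
    """Collapse consecutive spec rows with the same id into one XML fragment.

    Plays the role of the Denormalizer: many rows in, one (id, fragment) out.
    Assumes rows with the same id are adjacent in the input.
    """
    for key, group in groupby(spec_rows, key=itemgetter("id")):
        fragment = "".join(
            "<spec><idspec>%s</idspec><namespec>%s</namespec></spec>"
            % (row["idspec"], escape(row["namespec"]))
            for row in group
        )
        yield key, fragment


def join_contracts(contracts, spec_rows):
    """Yield one <contract> element per contract that has specs.

    Plays the role of the hash join: the denormalized (slave) side is held
    in a dictionary, and the contract (driver) side is streamed one by one.
    """
    specs_by_id = dict(denormalize(spec_rows))
    for contract in contracts:
        if contract["id"] not in specs_by_id:
            continue  # assumption: inner join, unmatched contracts are dropped
        yield "<contract><id>%s</id><name>%s</name><specs>%s</specs></contract>" % (
            contract["id"],
            escape(contract["name"]),
            specs_by_id[contract["id"]],
        )


if __name__ == "__main__":
    contracts = [{"id": 1, "name": "toto"}]
    specs = [
        {"id": 1, "idspec": 1234, "namespec": "abcd"},
        {"id": 1, "idspec": 5678, "namespec": "efgh"},
    ]
    # Each element is written as soon as it is built, so the complete
    # output document is never held in memory at once.
    for element in join_contracts(contracts, specs):
        sys.stdout.write(element + "\n")
```

As in the graph, only the denormalized side has to fit in memory; the contract records are written out sequentially, one element at a time.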
Fantastic
It works well. I used two Denormalizer components (as my structure is a bit more complex than the example I posted), one HashJoin, and a StructuredDataWriter for the final XML.
Thanks a lot for your help.