Writing complex XML output

Dear Javlin team,

I’m currently trying to write a complex XML output file. To do that, I first used XMLWriter to produce a basic XML file and then post-processed it with an XSL transformation to create the final XML output. This works fine for 10,000 records, but beyond that I run into memory issues. I tried increasing the JVM memory parameters, but that is not sufficient because I need to process millions of records.

Therefore, I’m looking for another solution, and I would like to generate the correct XML file directly through the Clover XMLWriter component (or another one).
I absolutely need the XML mapping functionality, as I join two input elements into one and I also need to use different sources.

For example, I would like to do something like:

With 4 different sources (no chance to merge these 3 sources). But this is not a valid mapping, and I don’t see a clean solution.

Any help would be very welcome.

Jerome Cuendet

Hello Jerome,
see the topic "OutOfMemoryError with XMLWriter". It could help you solve both the OutOfMemory problem and the mapping problem.

That sounds like a good solution, as I would really like to write the XML sequentially to avoid any memory issues and ensure scalability.
Thanks a lot
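
To illustrate what sequential writing means here, below is a minimal sketch in Python (not Clover), assuming the records arrive as an iterator of dictionaries. The wrapper element `contracts` and the function name `write_contracts` are hypothetical, chosen only for this example. Each record is written to the stream as soon as it is read, so memory use does not grow with the number of records.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_contracts(records, stream):
    """Write one <contract> element per record, one record at a time."""
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("contracts", {})  # hypothetical root element
    for rec in records:
        gen.startElement("contract", {})
        for tag in ("id", "name"):
            gen.startElement(tag, {})
            gen.characters(str(rec[tag]))  # special characters are escaped
            gen.endElement(tag)
        gen.endElement("contract")
    gen.endElement("contracts")
    gen.endDocument()


# Usage: any iterator works, so the full data set is never held in memory.
buf = io.StringIO()
write_contracts(iter([{"id": 1, "name": "toto"}]), buf)
```

In a real run the stream would be a file opened for writing rather than an in-memory buffer.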

In fact, using the hashJoin component is absolutely fine, but my two input elements have a 1-to-n relation.
This means I don’t see how to write a transform function that produces output such as:


<contract>
  <id>1</id>
  <name>toto</name>
  <specs>
    <spec>
      <idspec>123</idspec>
      <element>abc</element>
    </spec>
    <spec>
      <idspec>456</idspec>
      <element>def</element>
    </spec>
  </specs>
</contract>

Here the contract elements (id, name) come from one input (1 record) and the spec elements (idspec, element) come from another input (2 records).
Do you see a solution for that?
Thanks

I still have an issue, as I need to produce XML output such as:


<contract>
 <id>1</id>
 <name>toto</name>
 <specs>
  <spec>
   <idspec>1234</idspec>
   <namespec>abcd</namespec>
  </spec>
  <spec>
   <idspec>5678</idspec>
   <namespec>efgh</namespec>
  </spec>
 </specs>
</contract>

From two different inputs:
- contract with one record (id, name)
- specs with two rows (idspec, namespec, and id as the foreign key)
How can I manage that with hashJoin or other components?
Thanks

Hello,
you have to concatenate the attributes with the same id using the Denormalizer component before joining:

<?xml version="1.0" encoding="UTF-8"?>
<Graph author="avackova" created="Thu Jun 11 13:46:15 CEST 2009" guiVersion="0.0.0.devel" id="1244721135669" licenseType="Evaluation Devel" modified="Mon Apr 12 13:06:15 CEST 2010" modifiedBy="avackova" name="xmlJoin" revision="1.48">
<Global>
<Metadata id="Metadata2">
<Record fieldDelimiter="|" name="attributes" recordDelimiter="\n" type="delimited">
<Field name="id" type="integer"/>
<Field name="idspec" type="integer"/>
<Field name="namespec" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata0">
<Record fieldDelimiter="|" name="master" recordDelimiter="\n" type="delimited">
<Field name="id" type="integer"/>
<Field name="name" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata3">
<Record fieldDelimiter="|" name="slave" recordDelimiter="\n" type="delimited">
<Field name="id" type="integer"/>
<Field name="output" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata1">
<Record fieldDelimiter="|" name="xml" recordDelimiter="\n" type="delimited">
<Field name="field1" type="string"/>
</Record>
</Metadata>
<Property fileURL="workspace.prm" id="GraphParameter0"/>
</Global>
<Phase number="0">
<Node id="DATA_GENERATOR0" recordsNumber="1" type="DATA_GENERATOR">
<attr name="generate"><![CDATA[//#TL

// Generates output record.
function generate() {
	$0.id := 1;
	$0.name := 'toto';
}

// Called during component initialization.
// function init() {}

// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node  id="DATA_GENERATOR1" recordsNumber="2" type="DATA_GENERATOR">
<attr name="generate"><![CDATA[//#TL
int counter = -1;
// Generates output record.
function generate() {
	counter++;
	$0.id := 1;
	$0.idspec := iif(counter == 0, 1234, 5678);
	$0.namespec := iif(counter == 0, 'abcd', 'efgh');
}

// Called during component initialization.
// function init() {}

// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node id="DENORMALIZER0" key="id" type="DENORMALIZER">
<attr name="denormalize"><![CDATA[//#TL
string output = "";
// This transformation defines the way in which multiple input records 
// (with the same key) are denormalized into one output record. 
// This function is called for each input record from a group of records
// with the same key.
function append() {
	output = output + "\t<spec>\n\t<idspec>" + $idspec + "</idspec>\n\t<namespec>" + $namespec + "</namespec>\n\t</spec>\n";
	
}

// This function is called once after the append() function was called for all records
// of a group of input records defined by the key.
// It creates a single output record for the whole group.
function transform() {
	$0.id := $0.id;
	$0.output := output;
}

// Called after transform() to return the resources that have been used to their initial state
// so that next group of records with different key may be parsed.
function clean() {
	output = "";
}

// Called to return a user-defined error message when an error occurs.
// function getMessage() {}

// Called during component initialization.
// function init() {}

// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node id="EXT_HASH_JOIN0" joinKey="$id=$id" slaveDuplicates="true" type="EXT_HASH_JOIN">
<attr name="transform"><![CDATA[//#TL

// Transforms input record into output record.
function transform() {
	$0.field1 := "<contract>\n<id>" + $id + "</id>\n<name>" + $name + "</name>\n<specs>\n" + $1.output + "</specs>\n</contract>";
}

// Called during component initialization.
// function init() {}

// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node id="TRASH0" type="TRASH"/>
<Edge fromNode="DATA_GENERATOR0:0"  id="Edge0" inPort="Port 0 (driver)" metadata="Metadata0" outPort="Port 0 (out)" toNode="EXT_HASH_JOIN0:0"/>
<Edge fromNode="DATA_GENERATOR1:0"  id="Edge1" inPort="Port 0 (in)" metadata="Metadata2" outPort="Port 0 (out)" toNode="DENORMALIZER0:0"/>
<Edge fromNode="DENORMALIZER0:0"  id="Edge4" inPort="Port 1 (slave)" metadata="Metadata3" outPort="Port 0 (out)" toNode="EXT_HASH_JOIN0:1"/>
<Edge debugMode="true" fromNode="EXT_HASH_JOIN0:0"  id="Edge3" inPort="Port 0 (in)" metadata="Metadata1" outPort="Port 0 (out)" toNode="TRASH0:0"/>
</Phase>
</Graph>
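
For readers who do not know Clover, the denormalize-then-join idea of the graph above can be sketched in plain Python. This is only an illustration under assumptions: the function names `denormalize` and `join` are hypothetical, records are plain dictionaries, and the spec rows are assumed to be sorted by `id`. As with a hash join, the denormalized fragments (one string per id) are kept in memory while the contract records stream through.

```python
from itertools import groupby
from operator import itemgetter
from xml.sax.saxutils import escape


def denormalize(specs):
    """Collapse spec rows sharing an id into one XML fragment per id.

    Assumes the rows are already sorted by id.
    """
    fragments = {}
    for key, rows in groupby(specs, key=itemgetter("id")):
        fragments[key] = "".join(
            "<spec><idspec>%s</idspec><namespec>%s</namespec></spec>"
            % (row["idspec"], escape(row["namespec"]))
            for row in rows
        )
    return fragments


def join(contracts, fragments):
    """Yield one <contract> string per contract, looking up its specs by id."""
    for c in contracts:
        yield (
            "<contract><id>%s</id><name>%s</name><specs>%s</specs></contract>"
            % (c["id"], escape(c["name"]), fragments.get(c["id"], ""))
        )


contracts = [{"id": 1, "name": "toto"}]
specs = [
    {"id": 1, "idspec": 1234, "namespec": "abcd"},
    {"id": 1, "idspec": 5678, "namespec": "efgh"},
]
for xml in join(contracts, denormalize(specs)):
    print(xml)
```

The first function plays the role of the Denormalizer (append per row, one output per key), and the second plays the role of the join transform.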

Fantastic :)
It works well. I used two Denormalizer components (as my structure is a bit more complex than the example I posted), one hashJoin, and a structuredDataWriter for the final XML.
Thanks a lot for your help.