How to read mapping with same xmlFields using XMLExtract

Hi,

I am running transformation from XML to Database. The structure of XML can be anything. I want the values of xml tags to get inserted in one single row.
Eg:- the structure of xml is something likes this:

<?xml version="1.0" encoding="UTF-8"?> Item 1 00erer013616 03 Product 1 MC1 Item 2 0004erere3623 03 Product 2 MC2

My mapping is of the form:
Map ID field of Item_number to column1
Map ID field of Mill_Description to column2
Map ID field of ManufactureName to column3

I mean my output should be of form:
Output:
±------±--------±-----------±--------+
|Row |Column1 |Column2 |Column3 |
±------±--------±-----------±--------+
|# 1 |Item 1 |Product 1 | MC1 |
|# 2 |Item 2 |Product 2 | MC2 |
±------±--------±-----------±--------+

Currently I am using the mapping given below (but its not working):




Please suggest the mapping structure which we need to build for the above scenario using XMLExtract.

Thanks in advance

-Pushpendra

Hello,

Your mapping should look like the following:


<Mappings>
<Mapping element="Item_Number">
  <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="Mill_Description">
  <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="ManufactureName">
  <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
</Mappings>

You need to create a sequence (internal or external), XMLExtract will have three output ports and all edges will have identical metadata:


<Metadata id="Metadata0" previewAttachmentCharset="ISO-8859-1">
<Record fieldDelimiter="|" name="recordName1" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\r\n" type="delimited">
<Field name="count" type="integer"/>
<Field name="ID" type="string"/>
</Record>
</Metadata>

Then you need to use ExtMergeJoin with the following Transform attribute:


function transform() {
	$0.field1 := $0.ID;
	$0.field2 := $1.ID;
	$0.field3 := $2.ID;
}

The “count” field will serve as Join key field for each of the three edges.

The output edge of the ExtMergeJoin will have the following metadata:


<Metadata id="Metadata1">
<Record fieldDelimiter="|" name="recordName2" recordDelimiter="\r\n" type="delimited">
<Field name="field1" type="string"/>
<Field name="field2" type="string"/>
<Field name="field3" type="string"/>
</Record>
</Metadata>

Then you will obtain the result you request. And you only need to use any writer you want: file writer, db writer, advanced writer, etc.

Best regards,

Tomas Waller

Note that the elements without outPort:

.
<Mapping element="nameOfTheElement">
  ...
  ...
</Mapping>

serve to specify the parent element for each child element (ID, in our case) that already contains some outPort in its mapping.

Regards,

Tomas Waller

I forgot mention that you need to create a sequence whose ID will be Sequence0. Its values will be used as values of the “count” fields.

Thanks Tomas for your quick reply. Now I am able to run XML to Database transform successfully using clover graph.

But when I am trying to replicate things in Java, I am facing an issue “how to create Mapping element without specifying output port in java”.

I have gone through the XMLExtract.Mapping javadocs API, but I am not able to found any constructor which works only for element.
new XMLExtract.Mapping(java.lang.String element) ;

Also is there any alternate available to do mappings things without the use of Sequence as we need to provide string fileName while creating sequence which can cause IO problem when dealing with concurrent access to the file.

Thanks
Pushpendra

My java code which I am using to run the above XML transformation are as follows:
As I not able to get any constructor in XMLExtract.Mapping which works only for element and not for outPort, I am passing outPort as -ve nos.

// create connection object. Get driver and connect string from cfg file
// specified as a first argument
DBConnection dbCon = new DBConnection("Conn0", "D:/postgre.cfg");		
dbCon.setName("Test");

EngineInitializer.initEngine("plugins", null, null);
EngineInitializer.forceActivateAllPlugins();
GraphRuntimeContext runtimeContext = new GraphRuntimeContext();

TransformationGraph graph = new TransformationGraph();
graph.setDebugMode(true);
graph.addConnection(dbCon);

Phase phase = new Phase(0);

Sequence sequence = new SimpleSequence("Sequence0",graph,"Sequence0" , "C:/misc/seq.dat",0,1,1000);		
graph.addSequence(sequence);
sequence.isInitialized();

DataRecordMetadata metadata1=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
metadata1.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
metadata1.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));		
Edge inEdge1 = new Edge("InEdge1",metadata1);	

DataRecordMetadata metadata2=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
metadata2.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
metadata2.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));		
Edge inEdge2 = new Edge("InEdge2",metadata2);

XMLExtract nodeXML=new XMLExtract("Smallest");
nodeXML.setInputFile("C:/misc/smallest.xml");
nodeXML.setEnabled("enabled");
nodeXML.setUseNestedNodes(true);
nodeXML.addOutputPort(0, inEdge1);
nodeXML.addOutputPort(1, inEdge2);

XMLExtract.Mapping childMap1=nodeXML.new Mapping("Item_Number",-1);
XMLExtract.Mapping childMap2=nodeXML.new Mapping("ID",0);
childMap2.setSequenceField("count");
childMap2.setSequenceId("Sequence0");
childMap1.addChildMapping(childMap2);		
nodeXML.addMapping(childMap1);

XMLExtract.Mapping childMap4=nodeXML.new Mapping("Mill_Description",-2);
XMLExtract.Mapping childMap3=nodeXML.new Mapping("ID",1);
childMap3.setSequenceField("count");
childMap3.setSequenceId("Sequence0");
childMap4.addChildMapping(childMap3);		
nodeXML.addMapping(childMap4);

DataRecordMetadata metadata3=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
metadata3.addField(new DataFieldMetadata("field1", DataFieldMetadata.STRING_FIELD,"\n"));
metadata3.addField(new DataFieldMetadata("field2", DataFieldMetadata.STRING_FIELD,"\n"));		
Edge inEdge3 = new Edge("InEdge3",metadata3);

String[] joinKeys = {"count"};
MergeJoin join = new MergeJoin("join1","count;#count", null,"com.clover.test.Transform","",MergeJoin.Join.LEFT_OUTER,true,true);
join.addInputPort(0, inEdge1);
join.addInputPort(1, inEdge2);
join.setGraph(graph);
join.setPhase(phase);
join.addOutputPort(0, inEdge3);

DBOutputTable dbOutputTable = new DBOutputTable("DBTABLE",dbCon.getId(),"product");
dbOutputTable.setSqlQuery("insert into product (product_id,name) values(?,?)");		
dbOutputTable.addInputPort(0, inEdge3);

String[] cloverFields = {"field1;field2"};
dbOutputTable.setCloverFields(cloverFields);		

//add nodes to phase
try {	
	graph.addPhase(phase);
	phase.addNode(nodeXML);
	phase.addNode(join);
	phase.addNode(dbOutputTable);
	graph.addEdge(inEdge1);
	graph.addEdge(inEdge2);
	graph.addEdge(inEdge3);						
	
	EngineInitializer.initGraph(graph, runtimeContext);
} catch (GraphConfigurationException e) {
	// TODO Auto-generated catch block
	e.printStackTrace();
} catch (ComponentNotReadyException e) {
	// TODO Auto-generated catch block
	e.printStackTrace();
}

IThreadManager threadManager = new SimpleThreadManager();
WatchDog watchDog = new WatchDog(graph,runtimeContext);
runtimeContext.setUseJMX(true);
watchDog.getGraphRuntimeContext().setVerboseMode(true);
threadManager.executeWatchDog(watchDog);

and my Transform method code :

public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords) throws TransformException {
	try {
		// user's code STARTs from here !
		System.err.println("### Running Transformation ");	
		(outputRecords[0].getField(0)).setValue((inputRecords[0].getField(0).toString()));
		(outputRecords[0].getField(1)).setValue((inputRecords[1].getField(0).toString()));
		//(outputRecords[0].getField(2)).setValue((inputRecords[2].getField(0).toString()));
		
		// user's code ENDs here !
	} catch(Exception e) {
		throw new TransformException("Error in transformation class " + Transform.class.getName() + ": " + e.getMessage(), e);
	}
	return ALL;
}

I am getting NullRecord at outPort 1 from XMLExtract to MergeJoin when transform method gets invoked. And also because of NullRecord I am getting exception as parameter number: 2 missing in insert query.

Please suggest some solution for the above issue.

Thanks
Pushpendra

Try this, there was wrong id and maybe something else:



		EngineInitializer.initEngine("plugins", null, null);
		EngineInitializer.forceActivateAllPlugins();
		GraphRuntimeContext runtimeContext = new GraphRuntimeContext();

		TransformationGraph graph = new TransformationGraph();
		graph.setDebugMode(true);

		Phase phase = new Phase(0);

		DataRecordMetadata metadata1=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
		metadata1.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
		metadata1.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));      
		Edge inEdge1 = new Edge("InEdge1",metadata1);   

		DataRecordMetadata metadata2=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
		metadata2.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
		metadata2.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));      
		Edge inEdge2 = new Edge("InEdge2",metadata2);

		XMLExtract nodeXML=new XMLExtract("Smallest");
		nodeXML.setInputFile("c:/Workspaces/wruntime/examples/smallest.xml");
		nodeXML.setEnabled("enabled");
		nodeXML.setUseNestedNodes(true);
		nodeXML.addOutputPort(0, inEdge1);
		nodeXML.addOutputPort(1, inEdge2);

		XMLExtract.Mapping childMap1=nodeXML.new Mapping("Item_Number",-1);
		childMap1.prepareProcessSkipOrNumRecords();
		XMLExtract.Mapping childMap2=nodeXML.new Mapping("ID",0);
		childMap2.setSequenceField("count");
		childMap2.setSequenceId(null);
		childMap2.setSequenceId("Sequence1");
		childMap1.addChildMapping(childMap2);
		childMap2.setParent(childMap1);
		childMap2.prepareProcessSkipOrNumRecords();
		nodeXML.addMapping(childMap1);

		XMLExtract.Mapping childMap4=nodeXML.new Mapping("Mill_Description",-2);
		childMap4.prepareProcessSkipOrNumRecords();
		XMLExtract.Mapping childMap3=nodeXML.new Mapping("ID",1);
		childMap3.setSequenceField("count");
		childMap3.setSequenceId(null);
		childMap3.setSequenceId("Sequence1");
		childMap4.addChildMapping(childMap3);
		childMap3.setParent(childMap4);
		childMap3.prepareProcessSkipOrNumRecords();
		nodeXML.addMapping(childMap4);

		Trash trash1 = new Trash("trash1");
		trash1.setDebugPrint(true);
		trash1.addInputPort(0, inEdge1);

		Trash trash2 = new Trash("trash2");
		trash2.setDebugPrint(true);
		trash2.addInputPort(0, inEdge2);

		//add nodes to phase
		try {   
		   graph.addPhase(phase);
		   phase.addNode(nodeXML);
		   phase.addNode(trash1);
		   phase.addNode(trash2);
		   graph.addEdge(inEdge1);
		   graph.addEdge(inEdge2);
		   
		   EngineInitializer.initGraph(graph, runtimeContext);
		} catch (GraphConfigurationException e) {
		   // TODO Auto-generated catch block
		   e.printStackTrace();
		} catch (ComponentNotReadyException e) {
		   // TODO Auto-generated catch block
		   e.printStackTrace();
		}

		IThreadManager threadManager = new SimpleThreadManager();
		WatchDog watchDog = new WatchDog(graph,runtimeContext);
		runtimeContext.setUseJMX(true);
		watchDog.getGraphRuntimeContext().setVerboseMode(true);
		threadManager.executeWatchDog(watchDog);


Hello Pushpendra,
it may be impossible to configure XMLExtract correctly from java code, while some setters are private (http://bug.cloveretl.org/view.php?id=2819).
As a workaround I can advice you to use fromXML method of XMLExtract object:

		XMLExtract nodeXML = null;
		try {
			nodeXML = (XMLExtract) XMLExtract.fromXML(graph, read(new FileInputStream("data-tmp/Smallest.xml")).getDocumentElement());
		} catch (Exception e1) {
			// TODO Auto-generated catch block
			e1.printStackTrace();
			return;
		} 

with the Tomas Waller’s mapping

Thanks Agata and Jausperger for your ruick reply.

Agata, this workaround won’t work because size of xml which I am using is more than 500MB, so it won’t be feasible to load that much amount of data in memory. I have currently creating object by passing port as negative nos.

Jausperger, actually the thing which I am looking is to merge the outPort 0 & 1 output of XMLExtract to database table in one single row at a time using ExtMergeJoin as suggested by Tomas. I was able to do so using graph but I am not able to do so using clover Java API.
You have given example code for outPort to different trash but I am looking to merge those different port output to one single database row. I have tried this using MergeJoin component as it is the replica component of ExtMergeJoin but not able to do so.

Please suggest some solution to do so. You can refer code for Merge Join in my earlier post.

Thanks
Pushpendra

You don’t need to load the input file to memory. Create the xml file with node definition:

<Node id="Smallest" sourceUri="data-in/smallest.xml" type="XML_EXTRACT" useNestedNodes="true">
<attr name="mapping"><![CDATA[<Mappings>
<Mapping element="Item_Number">
  <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="Mill_Description">
  <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="ManufactureName">
  <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
</Mappings>]]></attr>
</Node>

and this file load in fromXML method

Thanks again Agata for your quick reply.

I have tried your code and it works fine. But still it won’t work in my case as my xml structure is not defined and also mapping elements are not defined, they are changing dynamically. So with your approach I need to write an xml file with logic catering to defined mapping structure.

Please send me some solution for my issue regarding Merge Join in my earlier post.

Thanks in advance

-Pushpendra

Could you send the error stack trace? I can’t reproduce the problem.

Thanks Agata, now I am able to successfully executing XML to Database transformation using clover Java API. I have created logic to create mapping stucture dynamically according to my requirement.

I just want to know why a restrictions is put on Merge Join to have atleast two input port, also if you can give some inputs regarding the transformation language we can use at the time of merging the inputs using Merge Join.

Thanks
Pushpendra

MergeJoin has to have at least two input ports as it merges data from two different streams.
For CTL see CloverETL Transformation Language concept and Clover ETL Transformation Language functions.

Hi,

Using above mapping structure I am not able to read attributes values from xml files. The above mapping structure works fine for reading elements values, but in my XML file I want to read attributes as well. So for reading attributes values as well what are the changes that I need to do in mapping structure.

Thanks
Pushpendra

Hello,

I have an input file with attributes:


<?xml version="1.0" encoding="UTF-8"?>
<Rows>
<Row>
<Item_Number>
<ID myattribute="valueItem1">Item 1</ID>
</Item_Number>
<GTIN_Number>00erer013616</GTIN_Number>
<Mill_Code>03</Mill_Code>
<Mill_Description>
<ID myattribute="valueProduct1">Product 1</ID>
</Mill_Description>
<Style_Number>
<Style>
<ManufactureName>
<ID myattribute="valueMC1">MC1</ID>
</ManufactureName>
</Style>
</Style_Number>
</Row>
<Row>
<Item_Number>
<ID myattribute="valueItem2">Item 2</ID>
</Item_Number>
<GTIN_Number>0004erere3623</GTIN_Number>
<Mill_Code>03</Mill_Code>
<Mill_Description>
<ID myattribute="valueProduct2">Product 2</ID>
</Mill_Description>
<Style_Number>
<Style>
<ManufactureName>
<ID myattribute1="valueMC2">MC2</ID>
</ManufactureName>
</Style>
</Style_Number>
</Row>
</Rows>

And use the following graph:


<?xml version="1.0" encoding="UTF-8"?>
<Graph author="cloveruser" created="Tue Jan 12 11:32:50 CET 2010" guiVersion="0.0.0.devel" id="1263292668555" licenseType="Evaluation Devel" modified="Tue Jan 12 15:59:35 CET 2010" modifiedBy="cloveruser" name="XMLForumAttributes" revision="1.63">
<Global>
<Metadata id="Metadata5" previewAttachmentCharset="ISO-8859-1">
<Record fieldDelimiter="|" name="recordName6" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\r\n" type="delimited">
<Field name="ID" type="string"/>
<Field name="count" type="integer"/>
<Field name="myattribute" type="string"/>
</Record>
</Metadata>
<Property fileURL="workspace.prm" id="GraphParameter0"/>
<Sequence cached="1" fileURL="${SEQ_DIR}/seq.txt" id="Sequence0" name="counting" start="1" step="1" type="SIMPLE_SEQUENCE"/>
</Global>
<Phase number="0">
<Node enabled="enabled" guiHeight="0" guiName="Trash" guiWidth="0" guiX="334" guiY="350" id="TRASH0" type="TRASH"/>
<Node enabled="enabled" guiHeight="0" guiName="Trash" guiWidth="0" guiX="330" guiY="140" id="TRASH1" type="TRASH"/>
<Node enabled="enabled" guiHeight="0" guiName="Trash" guiWidth="0" guiX="335" guiY="255" id="TRASH2" type="TRASH"/>
<Node enabled="enabled" guiHeight="0" guiName="XMLExtract" guiWidth="0" guiX="18" guiY="241" id="XML_EXTRACT0" sourceUri="${DATAIN_DIR}/XMLforumFINALComplex.xml" type="XML_EXTRACT">
<attr name="mapping"><![CDATA[<Mappings>
<Mapping element="Item_Number">
  <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="Mill_Description">
  <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="ManufactureName">
  <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
</Mappings>]]></attr>
</Node>
<Edge debugMode="true" fromNode="XML_EXTRACT0:0" guiBendpoints="" id="Edge1" inPort="Port 0 (in)" metadata="Metadata5" outPort="Port 0 (out)" toNode="TRASH1:0"/>
<Edge debugMode="true" fromNode="XML_EXTRACT0:1" guiBendpoints="" id="Edge6" inPort="Port 0 (in)" metadata="Metadata5" outPort="Port 1 (out)" toNode="TRASH2:0"/>
<Edge debugMode="true" fromNode="XML_EXTRACT0:2" guiBendpoints="" id="Edge0" inPort="Port 0 (in)" metadata="Metadata5" outPort="Port 2 (out)" toNode="TRASH0:0"/>
</Phase>
</Graph>

As the result, both the values and the attributes are sent to output ports.

Or did you have other graph or other mapping?

Best regards,

Tomas Waller