How to read mapping with same xmlFields using XMLExtract

Support/help with CloverETL (4.9) and CloverDX (5.0 or newer) implementation problems

pgargdd
Posts: 19
Joined: Mon Sep 07, 2009 10:55 am

How to read mapping with same xmlFields using XMLExtract

Postby pgargdd » Mon Sep 07, 2009 11:27 am

Hi,

I am running transformation from XML to Database. The structure of XML can be anything. I want the values of xml tags to get inserted in one single row.
Eg:- the structure of xml is something likes this:

<?xml version="1.0" encoding="UTF-8"?>
<Rows>
<Row>
<Item_Number>
<ID>Item 1</ID>
</Item_Number>
<GTIN_Number>00erer013616</GTIN_Number>
<Mill_Code>03</Mill_Code>
<Mill_Description>
<ID>Product 1</ID>
</Mill_Description>
<Style_Number>
<Style>
<ManufactureName>
<ID>MC1</ID>
</ManufactureName>
</Style>
</Style_Number>
</Row>
<Row>
<Item_Number>
<ID>Item 2</ID>
</Item_Number>
<GTIN_Number>0004erere3623</GTIN_Number>
<Mill_Code>03</Mill_Code>
<Mill_Description>
<ID>Product 2</ID>
</Mill_Description>
<Style_Number>
<Style>
<ManufactureName>
<ID>MC2</ID>
</ManufactureName>
</Style>
</Style_Number>
</Row>
</Rows>

My mapping is of the form:
Map ID field of Item_number to column1
Map ID field of Mill_Description to column2
Map ID field of ManufactureName to column3

I mean my output should be of form:
Output:
+-------+---------+------------+---------+
|Row |Column1 |Column2 |Column3 |
+-------+---------+------------+---------+
|# 1 |Item 1 |Product 1 | MC1 |
|# 2 |Item 2 |Product 2 | MC2 |
+-------+---------+------------+---------+

Currently I am using the mapping given below (but its not working):
<Mapping element="Rows">
<Mapping element="Row" outPort="0"
xmlFields="ID;ID;ID"
cloverFields="field2;field1;field3">
</Mapping>
</Mapping>
</Mappings>

Please suggest the mapping structure which we need to build for the above scenario using XMLExtract.

Thanks in advance

-Pushpendra

twaller
Posts: 49
Joined: Mon Feb 23, 2009 4:21 pm

Re: How to read mapping with same xmlFields using XMLExtract

Postby twaller » Tue Sep 08, 2009 8:39 am

Hello,

Your mapping should look like the following:

Code: Select all

<Mappings>
<Mapping element="Item_Number">
  <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="Mill_Description">
  <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="ManufactureName">
  <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
</Mappings>


You need to create a sequence (internal or external), XMLExtract will have three output ports and all edges will have identical metadata:

Code: Select all

<Metadata id="Metadata0" previewAttachmentCharset="ISO-8859-1">
<Record fieldDelimiter="|" name="recordName1" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\r\n" type="delimited">
<Field name="count" type="integer"/>
<Field name="ID" type="string"/>
</Record>
</Metadata>


Then you need to use ExtMergeJoin with the following Transform attribute:

Code: Select all

function transform() {
   $0.field1 := $0.ID;
   $0.field2 := $1.ID;
   $0.field3 := $2.ID;
}


The "count" field will serve as Join key field for each of the three edges.

The output edge of the ExtMergeJoin will have the following metadata:

Code: Select all

<Metadata id="Metadata1">
<Record fieldDelimiter="|" name="recordName2" recordDelimiter="\r\n" type="delimited">
<Field name="field1" type="string"/>
<Field name="field2" type="string"/>
<Field name="field3" type="string"/>
</Record>
</Metadata>


Then you will obtain the result you request. And you only need to use any writer you want: file writer, db writer, advanced writer, etc.

Best regards,

Tomas Waller
Tomas Waller
Javlin, a.s.
wallert@mail.javlin.cz

twaller
Posts: 49
Joined: Mon Feb 23, 2009 4:21 pm

Re: How to read mapping with same xmlFields using XMLExtract

Postby twaller » Tue Sep 08, 2009 8:43 am

Note that the elements without outPort:

Code: Select all

.
<Mapping element="nameOfTheElement">
  ...
  ...
</Mapping>


serve to specify the parent element for each child element (ID, in our case) that already contains some outPort in its mapping.

Regards,

Tomas Waller
Tomas Waller
Javlin, a.s.
wallert@mail.javlin.cz

twaller
Posts: 49
Joined: Mon Feb 23, 2009 4:21 pm

Re: How to read mapping with same xmlFields using XMLExtract

Postby twaller » Tue Sep 08, 2009 8:50 am

I forgot mention that you need to create a sequence whose ID will be Sequence0. Its values will be used as values of the "count" fields.
Tomas Waller
Javlin, a.s.
wallert@mail.javlin.cz

pgargdd
Posts: 19
Joined: Mon Sep 07, 2009 10:55 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby pgargdd » Tue Sep 08, 2009 12:42 pm

Thanks Tomas for your quick reply. Now I am able to run XML to Database transform successfully using clover graph.

But when I am trying to replicate things in Java, I am facing an issue "how to create Mapping element without specifying output port in java".

I have gone through the XMLExtract.Mapping javadocs API, but I am not able to found any constructor which works only for element.
new XMLExtract.Mapping(java.lang.String element) ;

Also is there any alternate available to do mappings things without the use of Sequence as we need to provide string fileName while creating sequence which can cause IO problem when dealing with concurrent access to the file.

Thanks
Pushpendra

pgargdd
Posts: 19
Joined: Mon Sep 07, 2009 10:55 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby pgargdd » Tue Sep 08, 2009 3:45 pm

My java code which I am using to run the above XML transformation are as follows:
As I not able to get any constructor in XMLExtract.Mapping which works only for element and not for outPort, I am passing outPort as -ve nos.

Code: Select all

// create connection object. Get driver and connect string from cfg file
// specified as a first argument
DBConnection dbCon = new DBConnection("Conn0", "D:/postgre.cfg");      
dbCon.setName("Test");

EngineInitializer.initEngine("plugins", null, null);
EngineInitializer.forceActivateAllPlugins();
GraphRuntimeContext runtimeContext = new GraphRuntimeContext();

TransformationGraph graph = new TransformationGraph();
graph.setDebugMode(true);
graph.addConnection(dbCon);

Phase phase = new Phase(0);

Sequence sequence = new SimpleSequence("Sequence0",graph,"Sequence0" , "C:/misc/seq.dat",0,1,1000);      
graph.addSequence(sequence);
sequence.isInitialized();

DataRecordMetadata metadata1=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
metadata1.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
metadata1.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));      
Edge inEdge1 = new Edge("InEdge1",metadata1);   

DataRecordMetadata metadata2=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
metadata2.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
metadata2.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));      
Edge inEdge2 = new Edge("InEdge2",metadata2);

XMLExtract nodeXML=new XMLExtract("Smallest");
nodeXML.setInputFile("C:/misc/smallest.xml");
nodeXML.setEnabled("enabled");
nodeXML.setUseNestedNodes(true);
nodeXML.addOutputPort(0, inEdge1);
nodeXML.addOutputPort(1, inEdge2);

XMLExtract.Mapping childMap1=nodeXML.new Mapping("Item_Number",-1);
XMLExtract.Mapping childMap2=nodeXML.new Mapping("ID",0);
childMap2.setSequenceField("count");
childMap2.setSequenceId("Sequence0");
childMap1.addChildMapping(childMap2);      
nodeXML.addMapping(childMap1);

XMLExtract.Mapping childMap4=nodeXML.new Mapping("Mill_Description",-2);
XMLExtract.Mapping childMap3=nodeXML.new Mapping("ID",1);
childMap3.setSequenceField("count");
childMap3.setSequenceId("Sequence0");
childMap4.addChildMapping(childMap3);      
nodeXML.addMapping(childMap4);

DataRecordMetadata metadata3=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
metadata3.addField(new DataFieldMetadata("field1", DataFieldMetadata.STRING_FIELD,"\n"));
metadata3.addField(new DataFieldMetadata("field2", DataFieldMetadata.STRING_FIELD,"\n"));      
Edge inEdge3 = new Edge("InEdge3",metadata3);

String[] joinKeys = {"count"};
MergeJoin join = new MergeJoin("join1","count;#count", null,"com.clover.test.Transform","",MergeJoin.Join.LEFT_OUTER,true,true);
join.addInputPort(0, inEdge1);
join.addInputPort(1, inEdge2);
join.setGraph(graph);
join.setPhase(phase);
join.addOutputPort(0, inEdge3);

DBOutputTable dbOutputTable = new DBOutputTable("DBTABLE",dbCon.getId(),"product");
dbOutputTable.setSqlQuery("insert into product (product_id,name) values(?,?)");      
dbOutputTable.addInputPort(0, inEdge3);

String[] cloverFields = {"field1;field2"};
dbOutputTable.setCloverFields(cloverFields);      

//add nodes to phase
try {   
   graph.addPhase(phase);
   phase.addNode(nodeXML);
   phase.addNode(join);
   phase.addNode(dbOutputTable);
   graph.addEdge(inEdge1);
   graph.addEdge(inEdge2);
   graph.addEdge(inEdge3);                  
   
   EngineInitializer.initGraph(graph, runtimeContext);
} catch (GraphConfigurationException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
} catch (ComponentNotReadyException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
}

IThreadManager threadManager = new SimpleThreadManager();
WatchDog watchDog = new WatchDog(graph,runtimeContext);
runtimeContext.setUseJMX(true);
watchDog.getGraphRuntimeContext().setVerboseMode(true);
threadManager.executeWatchDog(watchDog);


and my Transform method code :

Code: Select all

public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords) throws TransformException {
   try {
      // user's code STARTs from here !
      System.err.println("### Running Transformation ");   
      (outputRecords[0].getField(0)).setValue((inputRecords[0].getField(0).toString()));
      (outputRecords[0].getField(1)).setValue((inputRecords[1].getField(0).toString()));
      //(outputRecords[0].getField(2)).setValue((inputRecords[2].getField(0).toString()));
      
      // user's code ENDs here !
   } catch(Exception e) {
      throw new TransformException("Error in transformation class " + Transform.class.getName() + ": " + e.getMessage(), e);
   }
   return ALL;
}


I am getting NullRecord at outPort 1 from XMLExtract to MergeJoin when transform method gets invoked. And also because of NullRecord I am getting exception as parameter number: 2 missing in insert query.

Please suggest some solution for the above issue.

Thanks
Pushpendra

jausperger
Posts: 19
Joined: Wed Apr 04, 2007 2:44 pm

Re: How to read mapping with same xmlFields using XMLExtract

Postby jausperger » Wed Sep 09, 2009 10:03 am

Try this, there was wrong id and maybe something else:

Code: Select all


      EngineInitializer.initEngine("plugins", null, null);
      EngineInitializer.forceActivateAllPlugins();
      GraphRuntimeContext runtimeContext = new GraphRuntimeContext();

      TransformationGraph graph = new TransformationGraph();
      graph.setDebugMode(true);

      Phase phase = new Phase(0);

      DataRecordMetadata metadata1=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
      metadata1.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
      metadata1.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));     
      Edge inEdge1 = new Edge("InEdge1",metadata1);   

      DataRecordMetadata metadata2=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
      metadata2.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
      metadata2.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));     
      Edge inEdge2 = new Edge("InEdge2",metadata2);

      XMLExtract nodeXML=new XMLExtract("Smallest");
      nodeXML.setInputFile("c:/Workspaces/wruntime/examples/smallest.xml");
      nodeXML.setEnabled("enabled");
      nodeXML.setUseNestedNodes(true);
      nodeXML.addOutputPort(0, inEdge1);
      nodeXML.addOutputPort(1, inEdge2);

      XMLExtract.Mapping childMap1=nodeXML.new Mapping("Item_Number",-1);
      childMap1.prepareProcessSkipOrNumRecords();
      XMLExtract.Mapping childMap2=nodeXML.new Mapping("ID",0);
      childMap2.setSequenceField("count");
      childMap2.setSequenceId(null);
      childMap2.setSequenceId("Sequence1");
      childMap1.addChildMapping(childMap2);
      childMap2.setParent(childMap1);
      childMap2.prepareProcessSkipOrNumRecords();
      nodeXML.addMapping(childMap1);

      XMLExtract.Mapping childMap4=nodeXML.new Mapping("Mill_Description",-2);
      childMap4.prepareProcessSkipOrNumRecords();
      XMLExtract.Mapping childMap3=nodeXML.new Mapping("ID",1);
      childMap3.setSequenceField("count");
      childMap3.setSequenceId(null);
      childMap3.setSequenceId("Sequence1");
      childMap4.addChildMapping(childMap3);
      childMap3.setParent(childMap4);
      childMap3.prepareProcessSkipOrNumRecords();
      nodeXML.addMapping(childMap4);

      Trash trash1 = new Trash("trash1");
      trash1.setDebugPrint(true);
      trash1.addInputPort(0, inEdge1);

      Trash trash2 = new Trash("trash2");
      trash2.setDebugPrint(true);
      trash2.addInputPort(0, inEdge2);

      //add nodes to phase
      try {   
         graph.addPhase(phase);
         phase.addNode(nodeXML);
         phase.addNode(trash1);
         phase.addNode(trash2);
         graph.addEdge(inEdge1);
         graph.addEdge(inEdge2);
         
         EngineInitializer.initGraph(graph, runtimeContext);
      } catch (GraphConfigurationException e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
      } catch (ComponentNotReadyException e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
      }

      IThreadManager threadManager = new SimpleThreadManager();
      WatchDog watchDog = new WatchDog(graph,runtimeContext);
      runtimeContext.setUseJMX(true);
      watchDog.getGraphRuntimeContext().setVerboseMode(true);
      threadManager.executeWatchDog(watchDog);


avackova
Posts: 841
Joined: Fri Jul 20, 2007 9:28 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby avackova » Wed Sep 09, 2009 10:24 am

Hello Pushpendra,
it may be impossible to configure XMLExtract correctly from java code, while some setters are private (http://bug.cloveretl.org/view.php?id=2819).
As a workaround I can advice you to use fromXML method of XMLExtract object:

Code: Select all

      XMLExtract nodeXML = null;
      try {
         nodeXML = (XMLExtract) XMLExtract.fromXML(graph, read(new FileInputStream("data-tmp/Smallest.xml")).getDocumentElement());
      } catch (Exception e1) {
         // TODO Auto-generated catch block
         e1.printStackTrace();
         return;
      }
with the Tomas Waller's mapping
Agata Vackova
Javlin a.s.
agata.vackova@javlin.eu

pgargdd
Posts: 19
Joined: Mon Sep 07, 2009 10:55 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby pgargdd » Wed Sep 09, 2009 10:47 am

Thanks Agata and Jausperger for your ruick reply.

Agata, this workaround won't work because size of xml which I am using is more than 500MB, so it won't be feasible to load that much amount of data in memory. I have currently creating object by passing port as negative nos.

Jausperger, actually the thing which I am looking is to merge the outPort 0 & 1 output of XMLExtract to database table in one single row at a time using ExtMergeJoin as suggested by Tomas. I was able to do so using graph but I am not able to do so using clover Java API.
You have given example code for outPort to different trash but I am looking to merge those different port output to one single database row. I have tried this using MergeJoin component as it is the replica component of ExtMergeJoin but not able to do so.

Please suggest some solution to do so. You can refer code for Merge Join in my earlier post.

Thanks
Pushpendra

avackova
Posts: 841
Joined: Fri Jul 20, 2007 9:28 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby avackova » Wed Sep 09, 2009 10:55 am

You don't need to load the input file to memory. Create the xml file with node definition:

Code: Select all

<Node id="Smallest" sourceUri="data-in/smallest.xml" type="XML_EXTRACT" useNestedNodes="true">
<attr name="mapping"><![CDATA[<Mappings>
<Mapping element="Item_Number">
  <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="Mill_Description">
  <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="ManufactureName">
  <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
</Mappings>]]></attr>
</Node>

and this file load in fromXML method
Agata Vackova

Javlin a.s.

agata.vackova@javlin.eu

pgargdd
Posts: 19
Joined: Mon Sep 07, 2009 10:55 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby pgargdd » Wed Sep 09, 2009 12:46 pm

Thanks again Agata for your quick reply.

I have tried your code and it works fine. But still it won't work in my case as my xml structure is not defined and also mapping elements are not defined, they are changing dynamically. So with your approach I need to write an xml file with logic catering to defined mapping structure.

Please send me some solution for my issue regarding Merge Join in my earlier post.

Thanks in advance

-Pushpendra

avackova
Posts: 841
Joined: Fri Jul 20, 2007 9:28 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby avackova » Thu Sep 10, 2009 8:27 am

Could you send the error stack trace? I can't reproduce the problem.
Agata Vackova

Javlin a.s.

agata.vackova@javlin.eu

pgargdd
Posts: 19
Joined: Mon Sep 07, 2009 10:55 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby pgargdd » Fri Sep 11, 2009 10:14 am

Thanks Agata, now I am able to successfully executing XML to Database transformation using clover Java API. I have created logic to create mapping stucture dynamically according to my requirement.

I just want to know why a restrictions is put on Merge Join to have atleast two input port, also if you can give some inputs regarding the transformation language we can use at the time of merging the inputs using Merge Join.

Thanks
Pushpendra

avackova
Posts: 841
Joined: Fri Jul 20, 2007 9:28 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby avackova » Fri Sep 11, 2009 12:24 pm

MergeJoin has to have at least two input ports as it merges data from two different streams.
For CTL see CloverETL Transformation Language concept and Clover ETL Transformation Language functions.
Agata Vackova

Javlin a.s.

agata.vackova@javlin.eu

pgargdd
Posts: 19
Joined: Mon Sep 07, 2009 10:55 am

Re: How to read mapping with same xmlFields using XMLExtract

Postby pgargdd » Mon Jan 11, 2010 7:20 pm

Hi,

Using above mapping structure I am not able to read attributes values from xml files. The above mapping structure works fine for reading elements values, but in my XML file I want to read attributes as well. So for reading attributes values as well what are the changes that I need to do in mapping structure.

Thanks
Pushpendra