XMLExtract: Reading elements and attributes with same name 2

The question title is repeated because it fits my case, but my case is in fact different.

I have a simple but complicated case: I am trying to fix a reader after an engine upgrade, and after a couple of days of frustration I finally ended up here. This is what I’m trying to get:

I have this sample input file:


<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.w3.org/2005/Atom" xmlns:csm="http://www.blabla.org/feed/2.0">
  <entry>
    <id>urn:nid:24477</id>
    <csm:review star_rating="3">
      <csm:themes>
        <csm:theme id="1234">A theme value</csm:theme>
      </csm:themes>
    </csm:review>
    <summary><![CDATA[This is the summary of this review]]></summary>
  </entry>
</feed>

I found that the current faulty mapping code is this:


<Mappings>
    <Mapping element="entry" outPort="1" xmlFields="id" cloverFields="csm_id_uri">
        <Mapping element="csm:theme" outPort="0" parentKey="csm_id_uri" generatedKey="csm_id_uri" xmlFields="id" cloverFields="tag_id"/>
    </Mapping>
</Mappings>

This mapping reads nothing. The log says all the records go to port 1, but when I dump them to a file the records are empty; no data is passed.

I think (not sure yet, but…) the expected record should have:

|urn:nid:24477|||1234||

which is:

/feed/entry/id → csm_id_uri = urn:nid:24477
/feed/entry/csm:review/csm:themes/csm:theme/@id → tag_id = 1234

and the rest of the fields are left blank.

The point is that I can’t get both values into a single record; I have only been able to get one value or the other, or two records, one per value.

I also tried this:


<Mapping element="entry" >

    <Mapping element="csm:review">
    <Mapping element="csm:themes">
        <Mapping element="csm:theme" outPort="1"
                 xmlFields="id;../../../id"
                 cloverFields="tag_id;csm_id_uri" />
    </Mapping>
    </Mapping>

</Mapping>

But no matter what combination I try, I cannot refer to a child element of an ancestor using the dots. If the upper id were an attribute instead of a child element, the problem would already be solved. Is there a way to achieve this?

I’m looking for a general way to map N to 1 with this structure. For example, given the input


<feed>
  <entry>
    <id>5555</id>
    <listOfThings>
      <item id="33">item 1</item>
      <item id="44">item 2</item>
    </listOfThings>
  </entry>
  <entry>
    <id>6666</id>
    <listOfThings>
      <item id="77">item 3</item>
      <item id="88">item 4</item>
    </listOfThings>
  </entry>
</feed>

to get these records:


|5555|33|||
|5555|44|||
|6666|77|||
|6666|88|||

All the examples I have found are ones where the same-named input fields are of the same kind, usually attributes.

Can anyone please help?

Thank you in advance.

Hi, jfuentesve,

XMLExtract is not exactly the right component for this task. I would use XMLReader, where it is possible to control the mapping with XPath expressions. The mapping would be as follows:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Context xpath="//n2:theme" outPort="0" namespacePaths='n1="http://www.w3.org/2005/Atom";n2="http://www.blabla.org/feed/2.0"'>
	<Mapping cloverField="field1" xpath="../../../n1:id"/>
	<Mapping cloverField="field4" xpath="@id"/>
</Context>

This way you can process files with many entries or with many themes or both.
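To illustrate what this mapping is meant to produce, here is a minimal sketch in plain Python (standard library only, not Clover code). It emits one record per theme and copies the id of the enclosing entry into it. The names `theme_records` and `SAMPLE` are made up for the illustration, and because ElementTree has no parent axis, the sketch walks down from each entry instead of going up with `../../../`.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this sketch; only the namespace URIs must match the feed.
NS = {
    "n1": "http://www.w3.org/2005/Atom",
    "n2": "http://www.blabla.org/feed/2.0",
}

# Trimmed copy of the sample input from the question.
SAMPLE = """<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:csm="http://www.blabla.org/feed/2.0">
  <entry>
    <id>urn:nid:24477</id>
    <csm:review star_rating="3">
      <csm:themes>
        <csm:theme id="1234">A theme value</csm:theme>
      </csm:themes>
    </csm:review>
  </entry>
</feed>"""


def theme_records(xml_text):
    """Return one (entry id, theme id) pair per csm:theme element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    records = []
    for entry in root.findall("n1:entry", NS):
        # The entry id is a child element, not an attribute.
        entry_id = entry.findtext("n1:id", default="", namespaces=NS)
        # Any theme nested below this entry inherits the entry id.
        for theme in entry.findall(".//n2:theme", NS):
            records.append((entry_id, theme.get("id")))
    return records


print(theme_records(SAMPLE))  # [('urn:nid:24477', '1234')]
```

An entry with several themes yields several records that share the same entry id, which is the N to 1 shape asked for in the question.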

Best regards,

Hi, thank you for your response.

Unfortunately I cannot find a reader of type XMLReader; these are the ones I can choose from:

XMLExtract
XMLXPathReader

I went ahead and implemented a solution using XMLXPathReader, but now I’m dealing with a missing class that should be accessible from the plugins:


Error loading graph:/opt/ds/jetty-cloverServer/sandboxes/csm/graph/tag-association-program.grf Can't create object of type XML_XPATH_READER with reason: net/sf/saxon/sxpath/XPathEvaluator

I can run it locally from Eclipse and it works with a Java test, but it doesn’t work once deployed on the server. Anyway, that is not related to the topic (though some help is welcome).

This is what I’m using with XMLXPathReader:

The mapping:


<Context xpath="/feed/entry" outPort="0" sequenceField="IndexKey" namespacePaths='"http://www.w3.org/2005/Atom";csm="http://www.blahblah.org/feed/2.0"'>
   <Mapping xpath="./id" cloverField="csm_id_uri"/>
   <Context xpath="./csm:review/csm:themes/csm:theme" parentKey="IndexKey" generatedKey="ThemeIndexKey" outPort="1">
    <Mapping xpath="./@id" cloverField="tag_id"/>
   </Context>
</Context>

The node usage:

[Attached screenshot: Screen shot 2013-01-29 at 11.07.03 AM.png]

The corresponding edges have these fields:

csm_id_uri;IndexKey
tag_id;ThemeIndexKey

I had to swap the order of the ports between the reader and the hash join so that the right master key is chosen. Using the keys from the mapping for the matching, it works like a charm.
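For anyone following along, the join step can be pictured with a small Python sketch. This is not Clover code; it assumes the child record's ThemeIndexKey carries the same value as the parent's IndexKey, and the key values 1 and 2 as well as the function name `hash_join` are made up for the illustration. The ids are the ones from the general example in the question.

```python
def hash_join(themes, entries):
    """Inner-join theme records (many side) to entry records (one per key)."""
    # Hash the side that has exactly one record per key.
    by_key = {e["IndexKey"]: e["csm_id_uri"] for e in entries}
    # Stream the many side and attach the matching entry id.
    return [
        (by_key[t["ThemeIndexKey"]], t["tag_id"])
        for t in themes
        if t["ThemeIndexKey"] in by_key
    ]


# Port 0 records: one per entry (key values are hypothetical).
entries = [
    {"csm_id_uri": "5555", "IndexKey": 1},
    {"csm_id_uri": "6666", "IndexKey": 2},
]
# Port 1 records: one per theme, tagged with the parent's key.
themes = [
    {"tag_id": "33", "ThemeIndexKey": 1},
    {"tag_id": "44", "ThemeIndexKey": 1},
    {"tag_id": "77", "ThemeIndexKey": 2},
    {"tag_id": "88", "ThemeIndexKey": 2},
]

for record in hash_join(themes, entries):
    print("|".join(record))
```

The theme stream drives the join because it is the side with many records per key; that is why the port order mattered.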

We are using the Clover 3.2.1 engine and designer.