Return raw XML subset string from an Xpath query

I have an HTML document that I am extracting elements out of with XPath. I’m trying to work out how to extract the full HTML branch of a given set of tags so that I can process them further in subsequent graph nodes.

I found a similar question about doing this in PHP, but I am not familiar with that language:
http://stackoverflow.com/questions/1534 … ery-in-php

In case it helps, this is the Mapping syntax I am currently using, which extracts hyperlinks and descriptions from each DIV where class=‘details’:
<Context
xpath="//DIV[@class='details']"
outPort="0" >
<Mapping
xpath="./DIV[@class='vehicle']//A/@href"
cloverField="Record_TagA_href"/>
<Mapping
xpath="./DIV[@class='vehicle']//A"
cloverField="Record_TagA"/>
<Mapping
xpath="./DIV[@class='vehicle']//H5"
cloverField="Record_TagH5"/>
<Mapping
xpath=".//UL[@class='specifics']"
cloverField="Record_TagULspecifics"/>
</Context>

Instead, I would like the full string of each HTML DIV node where class=‘details’, with everything else stripped out.

I originally tried string manipulation with a regex in a standard transform:

foreach (string item : find($in.0.Document_XHTML,'<DIV class="details">(.*?)</DIV>'))

However, this didn’t return the full outer DIV if there were any nested DIV tags within it.
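
To illustrate why: the non-greedy `(.*?)` stops at the first `</DIV>` it meets, which closes the nested DIV rather than the outer one. A minimal sketch in Python (not CTL, though the regex behaves the same way); the sample markup is invented for illustration:

```python
import re

# Invented sample: an outer "details" DIV containing a nested DIV.
html = (
    '<DIV class="details">'
    '<DIV class="vehicle">Car</DIV>'
    '<UL class="specifics"></UL>'
    '</DIV>'
)

# Same pattern as in the transform above.
match = re.search(r'<DIV class="details">(.*?)</DIV>', html)

# The capture ends at the nested DIV's closing tag, so the UL is lost.
captured = match.group(1)
print(captured)
```

A regex has no notion of matching open and close tags, so any pattern of this shape will cut off at an inner closing tag.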

Thanks

Hello chathaway,

If I understand correctly, the functionality you describe has been available in the XMLExtract component since CloverETL version 3.4.0; see https://bug.javlin.eu/browse/CL-2118

You can extract the whole XML subtree this way and then filter the result using a regexp to have only the DIVs with the desired class.
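
For comparison, the general idea (select nodes with XPath, then serialize each matching subtree back to a string) can be sketched outside CloverETL with Python's standard library. This is only an illustration and not how XMLExtract is implemented; it assumes the input is well-formed XHTML, and `outer_xml` and the sample markup are hypothetical names made up for the example:

```python
import xml.etree.ElementTree as ET


def outer_xml(xhtml, xpath):
    """Return the serialized subtree of every element matching xpath.

    Assumes xhtml is well-formed; ElementTree supports only a limited
    XPath subset, which is enough for a simple attribute predicate.
    """
    root = ET.fromstring(xhtml)
    results = []
    for node in root.findall(xpath):
        # Drop text that follows the closing tag so only the element
        # itself (with all nested children) is serialized.
        node.tail = None
        results.append(ET.tostring(node, encoding="unicode"))
    return results


# Hypothetical sample document with one matching and one non-matching DIV.
sample = (
    '<html><body>'
    '<DIV class="details">'
    '<DIV class="vehicle"><A href="/car/1">Car 1</A></DIV>'
    '<UL class="specifics"><LI>Red</LI></UL>'
    '</DIV>'
    '<DIV class="other">skip</DIV>'
    '</body></html>'
)

details = outer_xml(sample, ".//DIV[@class='details']")
for fragment in details:
    print(fragment)
```

Because the selection happens on the parsed tree, nested DIVs stay inside the returned fragment instead of cutting it short.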

Best regards,