Return raw XML subset string from an Xpath query

chathaway · August 16, 2013, 12:00am

I have an HTML document that I am extracting elements out of with XPath. I’m trying to work out how to extract the full HTML branch of a given set of tags so that I can process them further in subsequent graph nodes.

I see a similar note posted for doing this in PHP but I am not familiar with how to use this language:
http://stackoverflow.com/questions/1534 … ery-in-php

In case it helps, this is the Mapping syntax I am currently using, which extracts hyperlinks and descriptions from each DIV where class='details':
<Context
xpath="//DIV[@class='details']"
outPort="0" >
<Mapping
xpath="./DIV[@class='vehicle']//A/@href"
cloverField="Record_TagA_href"/>
<Mapping
xpath="./DIV[@class='vehicle']//A"
cloverField="Record_TagA"/>
<Mapping
xpath="./DIV[@class='vehicle']//H5"
cloverField="Record_TagH5"/>
<Mapping
xpath=".//UL[@class='specifics']"
cloverField="Record_TagULspecifics"/>
</Context>

Instead, I would like the full string of each HTML DIV node where class=‘details’, stripping out everything else, e.g. just leaving:

…

.

I originally tried in a standard transform via string manipulation with regex:

foreach (string item : find($in.0.Document_XHTML,'<DIV class="details">(.*?)</DIV>'))

However, this didn’t return the full outer DIV if there were any nested DIV tags within it.
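The nested-DIV problem can be reproduced outside CloverETL. The following is only a minimal sketch in Python (not CTL, and not a CloverETL API): it shows that the lazy pattern stops at the first closing tag, and one possible workaround that counts DIV open and close tags to find the matching end. The function name `outer_divs` and the sample markup are made up for illustration, and the sketch assumes the opening tag is written exactly as `<DIV class="details">` with balanced, non-self-closing DIV tags.

```python
import re

# Matches any opening or closing DIV tag; group(1) is "/" for a closing tag.
TAG = re.compile(r'<(/?)DIV\b[^>]*>', re.IGNORECASE)
# Assumption: the target opening tag carries only the class attribute.
OPEN_DETAILS = re.compile(r'<DIV\s+class="details"\s*>', re.IGNORECASE)


def outer_divs(html):
    """Return each <DIV class="details"> block, nested DIVs included."""
    results = []
    pos = 0
    while True:
        start = OPEN_DETAILS.search(html, pos)
        if start is None:
            break
        depth = 1  # we are inside the details DIV
        end = None
        for tag in TAG.finditer(html, start.end()):
            depth += -1 if tag.group(1) else 1
            if depth == 0:  # found the close that balances the outer DIV
                end = tag.end()
                break
        if end is None:  # unbalanced markup: give up rather than guess
            break
        results.append(html[start.start():end])
        pos = end
    return results


# Hypothetical sample input, loosely modelled on the structure in the question.
sample = (
    '<DIV class="details"><DIV class="vehicle"><A href="/x">X</A></DIV>'
    '<UL class="specifics"></UL></DIV><P>other</P>'
)

# The lazy regex from the question ends at the first </DIV> it meets.
lazy = re.findall(r'<DIV class="details">(.*?)</DIV>', sample)
full = outer_divs(sample)
```

Here `lazy` holds only the text up to the inner closing tag, while `full` holds the complete outer DIV. Counting tags like this is fragile on real-world HTML, so a proper parser is usually the safer choice when the input is well-formed.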

Thanks

imriskal · August 19, 2013, 1:43pm

Hello chathaway,

If I understand correctly, the described functionality has been available in the XMLExtract component since CloverETL version 3.4.0; see https://bug.javlin.eu/browse/CL-2118

You can extract the whole XML subtree this way and then filter the result using a regexp to have only the DIVs with the desired class.
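To illustrate the general idea of "select a node, then serialize its whole subtree back to a string", here is a rough standalone sketch in Python using the standard library's `xml.etree.ElementTree`. It is not the XMLExtract configuration itself and says nothing about how that component is set up; the function name `details_subtrees` and the sample document are invented for the example, and the input is assumed to be well-formed XHTML with a wrapper element around the DIVs.

```python
import xml.etree.ElementTree as ET


def details_subtrees(xhtml):
    """Serialize every DIV with class="details" as a raw markup string."""
    root = ET.fromstring(xhtml)  # requires well-formed XML/XHTML
    subtrees = []
    # ElementTree's limited XPath: descendants of the root named DIV
    # whose class attribute equals "details".
    for div in root.findall(".//DIV[@class='details']"):
        # tostring() also emits the text that follows the element (its tail),
        # so drop it to keep only the element's own subtree.
        div.tail = None
        subtrees.append(ET.tostring(div, encoding="unicode"))
    return subtrees


# Hypothetical sample document for illustration only.
doc = (
    '<HTML><BODY>'
    '<DIV class="details"><DIV class="vehicle">'
    '<A href="/x">X</A><H5>Title</H5></DIV></DIV> trailing text '
    '<P>other</P>'
    '</BODY></HTML>'
)

blocks = details_subtrees(doc)
```

Each entry in `blocks` is the full outer DIV, including the nested `vehicle` DIV, with the surrounding markup left out. Because the class test happens during selection, no regexp filtering step is needed in this variant.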

Best regards,
