I have an HTML document that I am extracting elements out of with XPath. I’m trying to work out how to extract the full HTML branch of a given set of
I found a similar question about doing this in PHP, but I am not familiar with that language:
http://stackoverflow.com/questions/1534 … ery-in-php
In case it helps, this is the Mapping syntax I am currently using, which extracts hyperlinks and descriptions from each DIV where class='details':
<Context
    xpath="//DIV[@class='details']"
    outPort="0" >
  <Mapping
      xpath="./DIV[@class='vehicle']//A/@href"
      cloverField="Record_TagA_href"/>
  <Mapping
      xpath="./DIV[@class='vehicle']//A"
      cloverField="Record_TagA"/>
  <Mapping
      xpath="./DIV[@class='vehicle']//H5"
      cloverField="Record_TagH5"/>
  <Mapping
      xpath=".//UL[@class='specifics']"
      cloverField="Record_TagULspecifics"/>
</Context>
Instead, I would like the full HTML string of each DIV node where class='details', with everything else stripped out, e.g. just leaving:
I originally tried this in a standard transform, using string manipulation with a regex:
foreach (string item : find($in.0.Document_XHTML, '<DIV class="details">(.*?)</DIV>'))
However, this didn’t return the full outer DIV if there were any nested DIV tags within it, because the lazy match (.*?) stops at the first closing </DIV> it finds.
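To illustrate the nesting problem outside of CloverETL, here is a minimal Python sketch (not CTL, and not a CloverETL feature). It runs the same lazy pattern and then a simple depth-counting scan that keeps going until the DIV opened by class="details" is actually closed. The sample markup, the function name outer_divs and the pattern names are all hypothetical, and the scan assumes reasonably well-formed markup with the class attribute written exactly as class="details":

```python
import re

# Hypothetical sample document, made up for illustration only.
SAMPLE = (
    '<BODY><DIV class="header">skip me</DIV>'
    '<DIV class="details"><DIV class="vehicle"><A href="/car/1">Car 1</A></DIV>'
    '<UL class="specifics"><LI>Red</LI></UL></DIV>'
    '<DIV class="details"><DIV class="vehicle"><A href="/car/2">Car 2</A></DIV></DIV>'
    '</BODY>'
)

# The lazy pattern from the post: it ends at the first </DIV> it meets.
LAZY = re.compile(r'<DIV class="details">(.*?)</DIV>', re.S)

# Any opening or closing DIV tag; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r'<(/?)DIV\b[^>]*>', re.I)
# An opening DIV tag that carries class="details".
DETAILS_OPEN = re.compile(r'<DIV\b[^>]*\bclass="details"[^>]*>', re.I)


def outer_divs(html):
    """Return the full outer HTML of each class="details" DIV (assumed helper)."""
    results = []
    pos = 0
    while True:
        start = DETAILS_OPEN.search(html, pos)
        if start is None:
            break
        depth = 0
        end = None
        # Walk every DIV tag from the opening one, tracking nesting depth.
        for tag in DIV_TAG.finditer(html, start.start()):
            depth += -1 if tag.group(1) else 1
            if depth == 0:
                end = tag.end()
                break
        if end is None:
            break  # unbalanced markup: stop rather than guess
        results.append(html[start.start():end])
        pos = end
    return results


if __name__ == "__main__":
    print(LAZY.findall(SAMPLE))   # truncated at the inner </DIV>
    for block in outer_divs(SAMPLE):
        print(block)              # full outer DIV, nested DIVs included
```

This is only a sketch of the counting idea: it does not handle self-closing DIV tags, or DIV text inside comments or scripts, so a real HTML/XML parser would be more robust where one is available.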
Thanks