Hello
I am evaluating clover.etl as a replacement aggregation engine for an application. Many of the data input sources are XML streams. I understand that XMLExtract is still under development. I have some ideas about how to handle XML streams and how to break them up into flat representations. However, it appears that accepting the XML mappings as children of Node elements is not implemented as described in the XMLExtract javadoc.
If the mapping elements are entered through the “mapping” property of the GUI, they end up as attributes of the NODE element, having been processed by Component XMLAttributes.
<?xml version="1.0" encoding="UTF-8"?>
When input directly into the graph description, they are ignored.
<?xml version="1.0" encoding="UTF-8"?>
Am I just misinterpreting the DTD, or is this something that is planned for future releases? I see that there is an internal class in XMLExtract for handling "Mapping".
Mike
Thanks
I did figure that out.
I have rewritten XMLExtract to do some other interesting things, like inserting the context of the extracted data segment and some synchronization information when repeating elements terminate. I will submit my changes when I am satisfied with the results.
Mike
I added some extensions to XMLExtract myself.
I had problems with namespaces like
Clover’s metadata only accepts letters and numbers in field names, so I made something similar to the dbFields-cloverFields mapping.
Here is an example:
xmlFields="xml:lang" cloverFields="lang"
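To make the idea concrete, here is a rough Java sketch of how such a name mapping could be resolved. It is not the actual patch: the class `XmlFieldMapping`, its methods, and the use of `;` to separate several names are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Clover: pairs XML names with Clover field names.
public class XmlFieldMapping {

    // Pairs the i-th XML name with the i-th Clover field name.
    // The ';' delimiter is an assumption for this sketch.
    static Map<String, String> parse(String xmlFields, String cloverFields) {
        String[] xml = xmlFields.split(";");
        String[] clover = cloverFields.split(";");
        if (xml.length != clover.length) {
            throw new IllegalArgumentException(
                    "xmlFields and cloverFields must list the same number of names");
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < xml.length; i++) {
            mapping.put(xml[i].trim(), clover[i].trim());
        }
        return mapping;
    }

    // Returns the mapped Clover field name, or the XML name unchanged if unmapped.
    static String resolve(Map<String, String> mapping, String xmlName) {
        return mapping.getOrDefault(xmlName, xmlName);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = parse("xml:lang", "lang");
        System.out.println(resolve(mapping, "xml:lang")); // prints "lang"
        System.out.println(resolve(mapping, "title"));    // prints "title"
    }
}
```

Names without an entry fall through unchanged, so only the ones that are not valid Clover field names need to be listed.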
If you find this useful, I could send you the patch.
BTW, what are the future plans for the XMLExtract component?
Hello Mike.
XMLExtract has a mapping attribute that is an inner XML node. Right now it is the one and only component that needs this functionality. In the future we want to support this case too, but for now you must write the XML code for this component by hand.
OtaSanek
Hi!
We are definitely interested in your changes. Please send them to david.pavlis centrum.cz
As for the future of the component: it was created by Kuan Kou, who does not seem to be maintaining it any more.
We have been thinking about recoding this component, but that is currently somewhere in the future.
David.