I am rather new to CloverETL, using the technology to develop a datastore for an analytic web application (Oracle Endeca Information Discovery 2.3).
I am using the XMLExtract component to read a jagged file, essentially a scrape from a CMS directory. Each record in the file varies in width, carrying a different collection of source properties (anywhere from about 5 to 30).
First, I tried without a schema using the source following source code, but cannot generate a tree structure to be edited in the visual mapping editor:
Also, I am unsure of how best to configure the metadata for this file.
I would like to map each PROP and its PVAL on a source record to a Clover property with the PROP NAME. Is there a way to avoid doing this manually?
I would appreciate any direction or insight anyone could provide on this.
yes this is possible. However since the graph reads the metadata on it’s startup, you have to split the action into 2 graphs. First one generates your metadata through the XMLWriter component and the second graph can then use it. You can chain them together by the RunGraph component (as seen in the example). Also the metadata have to have no whitespaces, so I’ve added an reformat component which replaces all unwanted characters with underscore.
I’ve created a sample solution, which you can view.
If you have a chance, I am wondering about the following error:
ERROR [WatchDog] - Node REMOVE_ALL_WHITE_SPACES finished with status: Error occurred in nested transformation: ERROR caused by: Message: Transform failed! caused by: org.jetel.ctl.TransformLangExecutorRuntimeException: Interpreter runtime exception in function replace on line 6 column 21 - Unexpected null value.
Using your solution I was able to successfully deploy data from a couple of files. However, execution of the transform fails on a large file with the same XML structure.
Does this error indicate malformed XML? The file reads and maps properly in another environment (Endeca InFront). Anything else worth noting here?
if there are null records in your data, you need to strip the data flow rid of them. You could make an extra Reformat for this. I just incorporated this into the one that removes all whitespaces. It throws away all records that don’t have a NAME set (no property name). If you want to do something with these records, you can dump them to the second port for instance or, name them with something generic and include them back in the stream.
Open the RemoveAllWhitespaces component in both of your graphs, switch to the “source” tab and replace the code with the one provided below.
function integer transform() {
if($in.0.Name!=null){ // Is the record OK?
$out.0.Pval = $in.0.Pval;
$out.0.Name = replace($in.0.Name, "[^a-zA-Z0-9_]", "_"); // Get rid of all "bad" characters.
$out.0.id = $in.0.id;
return OK; // This means "pass the record on".
}
else return -1; // This means "throw away this record".
}