Normalising sparse data - best method?

I have un-normalised data held in a string field as optional key value pairs.

Here’s a row example:
‘num=04^author=1^dataizd=^dataizm=^tom=12^link=^kolon=^comment=’

I would like to split this into disparate fields ignoring empty data. I have read much of the docs, and watched most tutorials but so far haven’t achieved what I am looking for. I am sure that this must be a common task that is already catered for in CloverETL but I don’t have the experience or vocabulary to identify the approved method. Hence, I’m reaching out to the community.

This is how I am approaching it (although it isn’t working):

$out.0.created_by = str2integer(find($in.0.data, "author=([0-9]*)\\^", 1)[0] : "0");
$out.0.modified = str2date(find($in.0.data, "dataizd=([0-9][0-9]\\.[0-9][0-9]\\.\\d{4})\\^")[1]:"01.01.1970", "dd.MM.yyyy");
$out.0.publish_down = str2date(find($in.0.data, "dataizm=([0-9][0-9]\\.[0-9][0-9]\\.\\d{4})\\^")[1]:"01.01.1970", "dd.MM.yyyy");
$out.0.attribs = find($in.0.data, "tom=(\\d*)\\^")[1]:"";
$out.0.link_ref = str2integer(find($in.0.data, "kolon=(\\d.*)\\^")[1]:"0");
$out.0.metadesc = find($in.0.data, "comment=(.*)")[1]:"";

I would appreciate some help with my first foray into ETL.
Thanks
Sadler

Hi,
it is a bit tricky… if you want to account for all possible variations/missing values. But following example might help you:


function integer transform() {
	map[string,string] values; //map for holding your "sparse" data
	string[] pairs = split($in.0.data,"\\^");  //split input string into key-value pairs

        //iterate and create real keys&values
	foreach(string pair : pairs){
			string[] def=split(pair,"=");
			values[def[0]]=def[1]:null;   //if there is no value for key, insert null as a key's value
	}
	
        printLog(info,values); //just DEBUG print

	$out.0.modified = str2date(nvl(values["dataizd"],"01.01.1970" ), "dd.MM.yyyy");  //if there is no dataizd value or key does not exist at all, use default value
        $out.0.link_ref = str2integer(nvl(values["kolon], "0"));
         ///.... other assignments....
	
	return ALL;	
}

Thank you very much. :smiley: