I have a text file that is in a normalized format: ID|AddressLine1|AddressLine2|City|State|Zip|Country
with multiple addresses records per ID
I want to transform the data into a denormalized column structure with the key being the “ID” and the Address columns denormalized into separate numbered repeating group columns:
IID|AddressLine1_1|AddressLine2_1|AddressCity_1|AddressState_1|AddressZip_1|AddressCountry_1|AddressLine1_2|AddressLine2_2|AddressCity_2|AddressState_2|AddressZip_2|AddressCountry_2|AddressLine1_3|AddressLine2_3|AddressCity_3|AddressState_3|AddressZip_3|AddressCountry_3|AddressLine1_4|AddressLine2_4|AddressCity_4|AddressState_4|AddressZip_4|AddressCountry_4|AddressLine1_5|AddressLine2_5|AddressCity_5|AddressState_5|AddressZip_5|AddressCountry_5
I am trying to determine if the “Denormalizer” component is the correct component to use for this. It seems like it should be, but the only examples I can find in your documentation concatenate multiple source values into one output column as opposed to mapping to individual numbered output columns like the example I am providing. I am attaching some sample data for each file structure.
Well, this is fairly easily achievable by Denormalizer, as you suggested. If you can rely on number and structure of input records - which won’t exceed projected output record’s width and will have fields specified in correct order, you should be able to use following code (assuming all fields are strings, except ID).
<output record's name> accumulator;
integer cnt = 0; // How many records were processed in this group
integer RECORD_OFFSET = 1; // How many fields should be skipped from reading of incoming record - skip just first field (ID)
function integer append() {
// Iteration number multiplied by number of fields to transfer (whole record, minus offset)
integer outputOffset = cnt * (length($in.0) - RECORD_OFFSET);
// Copy all fields from the record, disregard those at the beginning
for (integer i=0;i<length($in.0)-RECORD_OFFSET;i++) {
integer inFieldIdx = i+RECORD_OFFSET;
integer outFieldIdx = outputOffset+i+RECORD_OFFSET;
// Copy field of to an offset + current field position, skipping global offset (beginning of a record)
// If there's anything to copy
if (!isNull($in.0,inFieldIdx) {
setStringValue(accumulator,outFieldIdx,getStringValue($in.0,inFieldIdx));
}
}
// Increase iteration number
cnt++;
return OK;
}
// Copy accumulation record to the output and set ID of a group
function integer transform() {
$out.0.* = accumulator.*;
$out.0.ID = $in.0.ID;
return OK;
}
// Reset counter and accumulation record
function void clean() {
cnt = 0;
resetRecord(accumulator);
}
There might be some errors in the code, haven’t actually executed it. But the idea is to use CTL reflection and copy fields by order with given offset from “short” to “long” aggregated record.
Thank you Pavel! That worked perfectly! I just added one check to make sure the number of records in each group did not exceed the number I want to load which is 5.
if ( cnt <= 4 ) {
if (isNull($in.0,inFieldIdx) == false ) {
setStringValue(accumulator,outFieldIdx,getStringValue($in.0,inFieldIdx));
}
}