Hi all,
I’m new to Clover and I’ve been tasked to create a chart to compare 2 flat files of similar metadata.
My idea is to have a component that takes in 2 inputs, and do a custom transformation to compare the data from $in.0 with $in.1.
If they are the same, nothing should be outputted, if they are different, I need to output both rows into a flat file (dump it in trash for now).
I tried a number of Transformer components, but none seem to give me what I need.
I also tried a customjavatransformer component, but even the simplest logic below doesn’t work:
@Override
public void execute(){
DataRecord inRec0 = readRecordFromPort(0);
DataRecord inRec1 = redRecordFromPort(1);
writeRecordToPort(0, inRec0);
writeRecordToPort(0, inRec1);
}
Any idea on how I can accomplish this? What am I missing here?
Any help is appreciated! All the best!
Hi Rodrigo,
I’m pretty sure, other members provide better solution than me (as i think, the ideal and universal solution is loop thru the data fields, and compare them, or similar, and not ‘fixed’ solutions).
Also, solutions can depend on what is means exactly ‘same data’ or ‘similar data’, etc.
Eg. solutions may different if you expect only one matching case, but solutions can differ if more than one (1) ‘same’ data rows exist in the different sources, etc.
However, i describe 3 solution, when i assume the two file is ‘pair’ (1:1 checking) on data values in fields.
Sample data from two file: file1 / file2 / file 1+2 data (difference highlighted)
clover_compare_src_data.png
Solution 1:
Gather the content of the two file with ‘SimpleGather’ component. Then sort them by ExtSort on ‘all’ fields where you think about ‘matching’/‘same’. And add ‘DeDup’ dedup component, set the ‘Dedup key’ the sorted fields. Set the Dedup component ‘Keep’ properties to ‘Unique’
And youre done: the ‘Dedup’ send on ‘Out0’ the ‘different’ rows. and send out in the ‘Out1’ the ‘same’ rows.
clover_compare_out-data.png
Regards: Andras,
p.s. solution 2,3 later. But basically same principles: once with DeDup Keep: first, and once use ‘Combine’ component.
Solution 2: DeDup, but with keep: ‘First’ + some little code:
clover_coampare_sol2_graph.png
First we load the datas (files) same as in the previous, Gather them to one, and DeDup (key: key(a);title(a);value(a)).
In the out0, we get the ‘single’ items (4row), in the out1 we get the what is ‘dedupped’ (2row).
I add with reformat to the out0 data a flag: ‘1’ (integer) to ‘Dedup_NR’ field.
For the out1 data i add ‘2’ to the ‘Dedup_NR’.
Gather back the two data.
After that, i sort again, with key: key(a);title(a);value(a);dedup_nr(d)
Please note, i sort by dedup_nr DESC (eg. 2, 1). To get the ‘dedupped’ values first in the order.
clover_compare_sol2_dedup_data.png
In the Reformat in the end, i calculate flag to the ‘combine_logic’ field: ‘SAME’/‘DIFF’
If the value Dedup_nr= ‘2’ => we sure, its ‘dedupped’ => means has ‘pair’ = > ‘SAME’
If the value Dedup_nr = ‘1’ and the previous row Dedup_nr = ‘2’ => we get the ‘pair’ => ‘SAME’
If the value Dedup_nr = ‘1’ and the previous row Dedup_nr = ‘1’ => the actual row is ‘not paired’ => ‘DIFF’
clover_compare_sol2_reformat.png
Please see code (not perfect, but may helps) in below
//#CTL2
string isSAME;
integer prevDEDUP;
string firstRow;
// Transforms input record into output record.
function integer transform() {
$out.0.* = $in.0.*;
if (isEmpty(firstRow) == true)
{
firstRow = "notfirstrow";
prevDEDUP = $in.0.dedup_nr;
if (prevDEDUP == 2)
{
$out.0.combine_logic = "SAME";
}
else
{
$out.0.combine_logic = "DIFF";
}
}
else
{
if ($in.0.dedup_nr == 2)
{
$out.0.combine_logic = "SAME";
}
else
{
if (prevDEDUP == 1)
{
$out.0.combine_logic = "DIFF";
}
else
{
$out.0.combine_logic = "SAME";
}
}
prevDEDUP = $in.0.dedup_nr;
}
return ALL;
}
Andras,
3rd solution, a bit different. Now i use the ‘Combine’ component.
I think its relative less used component, but can great if you like to compare two (or more) dataset, when you know some ‘primary key’ is the same in booth case. Example: you want to compare ‘old’ invoicing system invoice results with ‘new invoicing system’ results. You expect, the two has same amount of records, booth has the some unique ‘key’ value (eg. ‘Invoice Nr/Order Nr.’)
In below, i sort the two data by field:‘key’.
clover_compare_combine_outs.png
I hope its helps. Comparing data some case can be tricky (as may not 1:1, but N:M existing/not exisiting can be done.)
Andras,
And i forgot my favorite component: InterSection… So simple, and i forgot it… I love it… Less complex than the others…
clover_compare_intersect_graph.png
Anyhow, data comparing can be pretty simple, few component, and no coding.
For some reason, i’m a fan of the DeDup, and i use it even its may some case not the most straight forward.
Usually Intersection helps you out most of the cases.
I hope its helps (as it was mention: ‘Any help is appreciated! All the best!’).
All the bests!
Andras,
Hi Rodrigofrb and Andras,
first of all, I would like to thank Andras for the effort to provide a lot of different approaches to this data challenge. Your input is much appreciated. Let me make just a minor contribution to this topic as well, just to demonstrate another possible approach here. I have attached a simple graph to show how to compare 2 very similar flat files. The main idea is this:
-
I am using the Combine component to gather both data flows into a single record. Note that I don’t parse the flat files at all so both data flows are treated as a single field. This is more convenient in this case because the whole record on port 0 needs to coincide with the respective port 1 record entirely anyway.
-
Then, I am using the Reformat component with an if statement. Note the return statements that push the data to the desired output ports.
-
Alternatively, you can use the Filter component here in a similar fashion. Worth noting is that the condition is even simpler to design because the Filter component sends those records that don’t fulfill the filter condition to port 1 by default.
Kind regards,