Comparing records

Hello,
i recently discovered a little problem/challenge/opportunity(?) with an existing graph I wrote. We have some account data arriving in a flat file where the account numbers are shared potentially between multiple individuals. I thought the sortWithinGroups component would be ideal…I could group by account number and sort by last_name and first_name within the same account number, but I need to find a way to distinguish between the number of individuals within a group so sorted…Basically I need a counter of some kind added which will keep track of each individual within a given group for later reference and granularity…There isn’t anything in the input data to identify the number of individuals in the same group in a given file…Sample input

Account#, Last name, First name, Order_amount, etc…
123, Smith, John, 1000,…
123, Smith, Mary, 1000,…
123, Jones, Davey, 50,…
456, Bird, Larry, 10,…
456, Rogers,Ginger, 2000,…
789, Crosby,Bing, 300,…

The files will potentially have 1000’s of records

Needed output
123-1, Jones, Davey,50,…
123-2, Smith, John,1000,…
123-3, Smith, Mary, 1000,…
456-1, Bird, Larry,10
456-2, Rogers, Ginger, 2000,…
789-1, Crosby, Bing,300,…

All ideas are greatly appreciated,
Thanks
--Matt

Hi Matt,

I think that your case is pretty specific so it is not necessary to make a new component for it. You can achieve the same result with a combination of ExtSort and Reformat. ExtSort is capable of sorting by more than one key and Reformat takes care of adding the unique account id. Sort key for ExtSort will be Account(a);Last_name(a);First_name(a) and Reformat will have following CTL code:


string account = '';
string last_name = '';
string first_name = '';
integer counter = 1;
boolean first_record = true;

function integer transform() {
	if ($in.0.Account == null || $in.0.Last_name == null || $in.0.First_name == null || $in.0.Order_amount == null) {
		raiseError('Wrong input file. Some value contains NULL.');
	}
	
	if (first_record) {
		account = $in.0.Account;
		last_name = $in.0.Last_name;
		first_name = $in.0.First_name;
		first_record = false;
	}
	
	if (account == $in.0.Account && last_name == $in.0.Last_name && first_name == $in.0.First_name) { // account number and person are the same as in the previous record
		$out.0.Account = $in.0.Account + '-' + counter;
	} else if (account == $in.0.Account && !(last_name == $in.0.Last_name && first_name == $in.0.First_name)) { // account number is the same but it is a different person
		counter++;
		$out.0.Account = $in.0.Account + '-' + counter;
	} else { // different account number
		counter = 1;
		$out.0.Account = $in.0.Account + '-' + counter;
	}
	
	$out.0.Last_name = $in.0.Last_name;
	$out.0.First_name = $in.0.First_name;
	$out.0.Order_amount = $in.0.Order_amount;
	
	account = $in.0.Account;
	last_name = $in.0.Last_name;
	first_name = $in.0.First_name;

	return OK;
}

However, this is just an example working for data you provided. Real application should use some unique identifier, not the first_name and last_name combination.

Best regards,

Thanks for the info sir!

--Matt

Hi,

more generic solution may look like:


//#CTL2

string currentUuid;
input_csv lastRecord;

string[] checkedFields = ["Account", "Last_name", "First_name"];

function boolean equalRecords(input_csv a, input_csv b, string[] fields) {
	foreach (string field : fields) {
		if (isNull(a, field) || isNull(b, field)) {
			return false;
		}
		
		if (compare(a, field, b, field)!=0) {
			return false;
		}
	}
	
	return true;
}

function integer transform() {
	if (!equalRecords(lastRecord, $in.0, checkedFields)) {
		currentUuid = randomUUID();
	}

	$out.0.* = $in.0.*;
	$out.0.Account = currentUuid;

	lastRecord = $in.0;

	return OK;
}