Remove duplicate records and sorting in DEDUP?

achan · August 21, 2008, 12:00am

Hi,

How do I remove duplicate records without specifying all the fields? My record metadata has 2000 fields.
Here is a subset of my input data, sorted by REFERENCE (the primary key):

"REFERENCE","NAME","NO"
"000000010271 ","WFB ","1"
"000000010271 ","WFB ","1"
"000000010272 ","ABC ","1"
"000000010272 ","ABC ","2"

I want output like this:

"REFERENCE","NAME","NO"
"000000010271 ","WFB ","1"   (duplicate removed)
"000000010272 ","ABC ","1"
"000000010272 ","ABC ","2"

I know I can use DEDUP with dedupKey="REFERENCE;NAME;NO" to get this output, but if my input data has 2000 fields, I don't want to list all 2000 of them in dedupKey. Can dedupKey even hold such a long string? Is there a way to tell CloverETL to remove records that are duplicates across all 2000 fields?
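If typing the field list is the main obstacle, the key string does not have to be written by hand. A minimal plain-Java sketch, assuming the field names are available as a list; the buildDedupKey helper and the FIELD1 .. FIELD2000 names are hypothetical, and the sketch does not answer whether dedupKey accepts a string this long.

```java
import java.util.ArrayList;
import java.util.List;

public class DedupKeyBuilder {
    // Joins field names with ';', the separator used in dedupKey="REFERENCE;NAME;NO".
    static String buildDedupKey(List<String> fieldNames) {
        return String.join(";", fieldNames);
    }

    public static void main(String[] args) {
        System.out.println(buildDedupKey(List.of("REFERENCE", "NAME", "NO"))); // REFERENCE;NAME;NO

        // Hypothetical metadata with 2000 fields named FIELD1 .. FIELD2000;
        // in practice the names would be taken from the record's .fmt file.
        List<String> fieldNames = new ArrayList<>();
        for (int i = 1; i <= 2000; i++) {
            fieldNames.add("FIELD" + i);
        }
        String key = buildDedupKey(fieldNames);
        System.out.println("key has " + key.length() + " characters");
    }
}
```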

I would think DEDUP just needs a flag, say remove_only_if_all_fields_matches, set to true, and could take the list of fields from the FMT metadata file: if the values of every field match, the record is a duplicate and is removed. That way DEDUP would not need a dedupKey listing a large number of field names, right?

Just to confirm: DEDUP does not sort the records, right?

thanks,
al
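Comparing each record only with its predecessor depends on the input order. A different technique, swapped in here for illustration and not a CloverETL component: keep every row seen so far in a hash set keyed on all of its field values, so exact duplicates are dropped wherever they occur and the original order is preserved. A minimal plain-Java sketch with rows as String[]; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashDedup {
    // Keeps the first occurrence of each row, comparing all fields, in input order.
    // No sorting is needed, but every distinct row is held in memory.
    static List<String[]> distinctRows(List<String[]> rows) {
        Set<List<String>> seen = new HashSet<>();
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            // A List view gives value-based equals/hashCode over every field,
            // unlike a raw String[], which compares by identity.
            if (seen.add(Arrays.asList(row))) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample rows from the question, trailing spaces trimmed.
        List<String[]> rows = List.of(
            new String[] {"000000010271", "WFB", "1"},
            new String[] {"000000010271", "WFB", "1"},
            new String[] {"000000010272", "ABC", "1"},
            new String[] {"000000010272", "ABC", "2"});
        for (String[] row : distinctRows(rows)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

This prints the three distinct rows. The trade-off is memory: with many distinct 2000-field rows, sorting and comparing neighbours may scale better than a hash set.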

achan · August 28, 2008, 11:33pm

Does anyone have a better solution than putting all 2000 fields in the key?

This is urgent for me, so any help or suggestion would be greatly appreciated.

al

avackova · September 4, 2008, 7:14am

Hello,
The only idea I have is to use the Partition component instead of Dedup: in the partition function you can compare the current record with the previous one:


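DataRecord previous = null; // last distinct record seen (declared as a field)

int getOutputPort(DataRecord record){
  // first record, or one that differs from the previous distinct record
  if (previous == null || record.compareTo(previous) != 0) {
     previous = record;
     return 0; // port 0: distinct records
  } else {
     return 1; // port 1: duplicates
  }
}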

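Port 0 then receives only the distinct records, and port 1 the duplicates. Because each record is compared only with the previous one, identical records must be adjacent, so the input has to be sorted accordingly.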
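Since the partition function only looks at the previous record, duplicates that are not adjacent slip through. A minimal plain-Java sketch (not the CloverETL API) that mimics the routing to port 0 with String[] rows, using Arrays.compare as a stand-in for record.compareTo(), and showing that sorting on all fields first makes the duplicates adjacent; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortThenPartition {
    // Field-by-field lexicographic comparison, a stand-in for record.compareTo().
    static final Comparator<String[]> ALL_FIELDS = (a, b) -> Arrays.compare(a, b);

    // Mimics the partition function: a row goes to port 0 if it differs
    // from the previous distinct row; otherwise it would go to port 1.
    static List<String[]> port0(List<String[]> rows) {
        List<String[]> distinct = new ArrayList<>();
        String[] previous = null;
        for (String[] row : rows) {
            if (previous == null || ALL_FIELDS.compare(row, previous) != 0) {
                distinct.add(row);
                previous = row;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        // Same REFERENCE, but the two identical rows are not adjacent.
        List<String[]> rows = List.of(
            new String[] {"000000010272", "ABC", "1"},
            new String[] {"000000010272", "ABC", "2"},
            new String[] {"000000010272", "ABC", "1"});
        System.out.println(port0(rows).size()); // 3: the duplicate is missed

        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(ALL_FIELDS); // sort on all fields
        System.out.println(port0(sorted).size()); // 2
    }
}
```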

achan · September 6, 2008, 7:19am

Thanks for the suggestion.

I had to fix one thing: change
"previous = record;" to "previous = record.duplicate();".

Otherwise previous always holds the current record, because both variables refer to the same object (the same "pointer" or "address").
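The aliasing problem can be reproduced outside CloverETL. In this sketch, MutableRecord is a hypothetical stand-in for a record object that is refilled for every input row, and its duplicate() method mimics record.duplicate(): keeping previous by reference makes every later row look like a duplicate, while keeping a copy gives the correct count.

```java
import java.util.Arrays;

public class ReuseDemo {
    // Hypothetical stand-in for a record object that is refilled for each input row.
    static final class MutableRecord {
        String[] fields;
        MutableRecord(String... fields) { this.fields = fields.clone(); }
        void set(String... values) { this.fields = values.clone(); }
        // Independent copy, analogous to record.duplicate() in the thread.
        MutableRecord duplicate() { return new MutableRecord(fields); }
        boolean sameValues(MutableRecord other) { return Arrays.equals(fields, other.fields); }
    }

    // Counts rows routed to "port 0" when previous is kept by reference (copy=false)
    // or as a copy (copy=true).
    static int countDistinct(String[][] rows, boolean copy) {
        MutableRecord record = new MutableRecord(); // one object reused for every row
        MutableRecord previous = null;
        int distinct = 0;
        for (String[] row : rows) {
            record.set(row);
            if (previous == null || !record.sameValues(previous)) {
                distinct++;
                previous = copy ? record.duplicate() : record;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"000000010271", "WFB", "1"},
            {"000000010271", "WFB", "1"},
            {"000000010272", "ABC", "1"},
            {"000000010272", "ABC", "2"}};
        System.out.println(countDistinct(rows, false)); // 1: previous aliases record
        System.out.println(countDistinct(rows, true));  // 3
    }
}
```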
