I am working with a large amount of data (web data). When I load it into the fact table, I would like to check whether a row is already there before I insert it. I use incremental loads, so normally there is no chance of duplicates. Duplicates can only occur when the graph fails for some other reason and I need to rerun it; in that case I should be able to continue from the point where it failed. Using an intersection slows down the process quite a bit, since the fact table already has millions of rows. I was wondering whether there is a transformation that updates the row when it is a duplicate. I would appreciate any ideas on how to avoid the intersection and still accomplish the goal.
Thanks,
Kasturi
Hi,
Since you only need to handle reruns after the graph fails, I believe duplicates shouldn't occur very often.
You can avoid the intersection and simply try to insert all input records. If a record already exists (i.e., it would violate the table's primary key or a unique constraint), DBOutputTable can't insert it again and rejects it.
Rejected records are sent to output port 0.
(Please don't use DBOutputTable in batch mode; otherwise, all records in the batch would be evaluated as rejected.)
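To see why batch mode is a problem, here is a rough analogy using Python's built-in sqlite3 module rather than CloverETL itself; DBOutputTable's exact batch handling may differ in detail. The table `fact`, its `id`/`hits` columns and the helper `load_in_batch` are hypothetical. When a whole batch is sent in one transaction and a single row violates the primary key, the transaction is rolled back, so the loader cannot tell which row was the duplicate and every row in the batch ends up rejected.

```python
import sqlite3


def load_in_batch(conn, rows):
    """Hypothetical batch load into a hypothetical table fact(id, hits).

    All rows go to the database in one transaction. If any row violates the
    primary key, the whole transaction is rolled back, so every row in the
    batch (including the genuinely new ones) is returned as rejected.
    """
    rows = list(rows)
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.executemany("INSERT INTO fact (id, hits) VALUES (?, ?)", rows)
        return []
    except sqlite3.IntegrityError:
        # No per-row information survives: treat the entire batch as rejected.
        return rows
```

With `fact` already holding a row with `id = 1`, loading the batch `[(2, 5), (1, 7), (3, 9)]` rejects all three rows and inserts none, even though only `id = 1` was a duplicate.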
You can also connect output port 0 to another DBOutputTable that updates the existing records.
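The insert-then-update approach can be sketched in Python with the standard sqlite3 module. This is only an analogy to the CloverETL graph, not its API; the `fact` table, its `id`/`hits` columns and the helper `load_fact_rows` are hypothetical. Each row is inserted in its own transaction (the non-batch mode recommended above), primary-key violations are collected as rejects, standing in for output port 0, and the rejects are then applied as updates, standing in for the second DBOutputTable.

```python
import sqlite3


def load_fact_rows(conn, rows):
    """Insert (id, hits) rows one by one into a hypothetical table fact(id, hits).

    Rows that violate the primary key are collected as rejects (the analogue
    of DBOutputTable's output port 0) and then written with UPDATE (the
    analogue of a second DBOutputTable). Returns the rejected rows.
    """
    rejected = []
    for row in rows:
        try:
            with conn:  # one transaction per row, i.e. no batch mode
                conn.execute("INSERT INTO fact (id, hits) VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)  # row already exists

    # Second "output table": update the rows that were already present.
    with conn:
        conn.executemany(
            "UPDATE fact SET hits = ? WHERE id = ?",
            [(hits, row_id) for row_id, hits in rejected],
        )
    return rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (id INTEGER PRIMARY KEY, hits INTEGER)")
    load_fact_rows(conn, [(1, 10), (2, 20)])  # first (failed) run
    load_fact_rows(conn, [(2, 25), (3, 30)])  # rerun: id 2 is updated, id 3 inserted
    print(conn.execute("SELECT id, hits FROM fact ORDER BY id").fetchall())
```

Because existing rows are updated rather than duplicated, running the same input twice leaves the table in the same state, which is what a rerun after a failure needs. If the target database supports a native upsert (for example `INSERT ... ON CONFLICT ... DO UPDATE` in PostgreSQL and SQLite, or `INSERT ... ON DUPLICATE KEY UPDATE` in MySQL), a single statement can do the same job without the reject round-trip.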
Best regards,
Martin