Finding difference between two datasets

Support/help with CloverETL (4.9) and CloverDX (5.0 or newer) implementation problems

sshrestha
Posts: 15
Joined: Tue Jul 11, 2017 6:50 pm

Finding difference between two datasets

Postby sshrestha » Thu Mar 22, 2018 5:37 pm

Hi,

I am not sure how to achieve what I am trying to do. Could you please point me to the right direction.

Let's say I have a table in database called products
---------------------------
# | Name | Quantity |
---------------------------
1 | Bat | 10 |
2 | Ball | 5 |
3 | Chair | 3 |
---------------------------

I get data from a source with following information.
---------------------------
# | Name | Quantity |
---------------------------
1 | Bat | 5 |
2 | Ball | 5 |
3 | Table | 6 |
---------------------------

I want to update my database with this new information where,
    1. if name matches then update ( Update quantity of Bat row)
    2. if its a new name then insert (Insert Table row)
    3. if name is not present then delete. (Delete Chair row)
    4. if information hasn't changed, do nothing (Do not modify Ball row)

I guess I am having trouble finding the difference in the data I have and data that is incoming. Extracting and loading data to database should be straightforward. What are the components that could help me achieve this? I would appreciate any suggestion you could give me.

Thank you,
Shubha

jandikovae
Posts: 41
Joined: Fri Nov 04, 2016 8:51 am

Re: Finding difference between two datasets

Postby jandikovae » Wed Mar 28, 2018 2:11 pm

Hi Shubha,

If I understand it correctly, you need to update the database to whatever is in the second table, is that right? If you don't need to keep track of the changes being done to the database table, I would suggest you delete the table rows and replace it with the new data. It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be intact. In MySQL you can use the following syntax in a DBExecute component:

Code: Select all

DELETE FROM table_name;


However, if you would like to also control what has been changed, or update the quantity differently (for example add the new value to the original one), I would like to suggest you a component called DataIntersection. This component intersects data from two inputs and processes them to three output ports as follows: Such input records that are on both the input port 0 and input port 1 are processed according to the user-defined transformation and the result is sent to the output port 1. Such input records that are only on the input port 0 are sent unchanged to the output port 0. Such input records that are only on the input port 1 are sent unchanged to the output port 2. Please note that the component requires sorted data. Based on your example tables I have created an example graph (see the graph attached, I have also included input data that fit the graph's metadata).
In this example, I have set the transformation within the DataIntersection component to sum the quantities from both inputs if the join key matches (the key, in this case, is Name field). Yet, you can, of course, update the transformation to whatever meets your needs (like send the record to the output unchanged as well).

Please take a look and let me know if you have any additional questions or concerns.
Have a nice day, Eva
Attachments
data2.txt
Input data second table
(30 Bytes) Downloaded 114 times
data1.txt
Input data first table
(31 Bytes) Downloaded 114 times
compare_graph.grf
ETL graph
(3.8 KiB) Downloaded 123 times
---
Eva Jandikova
CloverCARE Support
CloverDX

Visit us online at http://www.cloverdx.com