I am trying to copy data from one DB to another, but DB_OUTPUT_TABLE takes about 30 minutes for 10,000 records.
Total data size is 200 GB.
Extracting the data, on the other hand, is really fast: about 47 s for 10,000 records and a few minutes for a 3.5 GB extract (.txt).
I should be able to do this data transfer in about 3-4 hours.
Can someone please help me with this?
Dear mohit3688,
Could you please provide more information about your issue?
- Which version of CloverETL do you have?
- Which DBs in which versions do you have?
- What do the records look like? Could you also provide a sample of your data as well as the metadata used (or the whole graph, if possible)?
- How much memory did you assign to your graph?
- Does the 200 GB size apply to all of your records or only to the mentioned 10,000?
Thank you in advance.
Kind regards,
- I am using CloverETL 3.1.2.
- Both the DBs are Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production.
- About the DB:
  - about 300 tables
  - lots of joins between tables; some have up to 10
  - the number of records per table varies from a few hundred to tens of millions; one table I know of has 75 million records
- I have attached a sample graph.
- I did not allocate any memory explicitly to the graph.
- 200 GB is all of the data that needs to be transferred.
- I started out with the top 10,000 records of a single table, for testing purposes only, to get an estimate of the time.
Dear mohit3688,
Let’s simplify it for now. Once we find the bottleneck, we can move on to more advanced scenarios.
- We can ignore joins as they are not part of the graph.
- You are writing into one table in your example graph so we can again ignore the rest of the tables.
- We have 10,000 records taking 30 minutes to write. We can ignore the rest of the records.
- DBInputTable works fine according to your description, so we can ignore it. Just read the first 10,000 records, save them into a CSV file, and then read that file with UniversalDataReader (UDR).
So now we have a simple graph with UDR reading a CSV file of 10,000 records and writing them into one DB table. (A rough sketch of producing such a test extract follows below.)
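For completeness, here is a minimal sketch of producing that test extract with plain JDBC, assuming a hypothetical source table SRC_TABLE and placeholder connection details (no quoting or NULL handling; it only needs to create a test file):

    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class ExtractSample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // placeholder
            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 Statement st = con.createStatement();
                 // ROWNUM limits the extract to the first 10,000 rows (Oracle syntax)
                 ResultSet rs = st.executeQuery(
                         "SELECT * FROM SRC_TABLE WHERE ROWNUM <= 10000");
                 PrintWriter out = new PrintWriter("sample.csv")) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder line = new StringBuilder();
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        if (i > 1) line.append(';'); // match UDR's field delimiter
                        line.append(rs.getString(i));
                    }
                    out.println(line);
                }
            }
        }
    }

Point UDR at sample.csv with the same metadata and delimiter, and the whole graph reduces to UDR → DBOutputTable.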
Things that may help (please try them one by one so we can see which one, if any, improves things):
- Assign as much memory as possible to your graph; it will come in handy, especially with all 200 GB of your data. Run → Run Configurations… → -Xmx.
- Set batch mode to true, with a batch size of e.g. 1,000 and a commit interval of e.g. 5,000 records (see the sketch after this list).
- Run the same graph in the evaluation version of our latest release, 3.5.M1, to see whether we have already fixed a bug that is slowing down your processing.
- Run the same graph but save the data into some other DB, e.g. MySQL.
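For intuition about the batch-mode suggestion: it roughly corresponds to JDBC statement batching combined with less frequent commits, as in this minimal sketch (hypothetical target table DST_TABLE with two columns, placeholder connection details; the constants mirror the values suggested above):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchedInsert {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // placeholder
            int batchSize = 1000;      // rows sent to the DB in one round trip
            int commitInterval = 5000; // rows per transaction

            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO DST_TABLE (ID, NAME) VALUES (?, ?)")) {
                con.setAutoCommit(false); // per-row commits are a common cause of slow loads
                for (int row = 1; row <= 10000; row++) {
                    ps.setInt(1, row);
                    ps.setString(2, "record-" + row);
                    ps.addBatch();
                    if (row % batchSize == 0) ps.executeBatch(); // one round trip per 1,000 rows
                    if (row % commitInterval == 0) con.commit(); // one transaction per 5,000 rows
                }
                ps.executeBatch(); // flush the remainder, if any
                con.commit();
            }
        }
    }

With per-row statements and autocommit on, each of the 10,000 inserts pays its own network round trip and commit, which alone can explain runtimes of tens of minutes.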
Additional questions:
- Do you also have CloverETL Server (if so, how much memory did you assign to it?), or do you have just a local project in Designer?
- How big is the file with the 10,000 records, in MB?
- Which version of the Oracle JDBC driver are you using? (The sketch after this list shows one way to check.)
- How long did the graph run take with each suggested improvement?
- Can you watch the graph runs in jvisualvm during the tests? Are there any unusual spikes or values?
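Regarding the driver version: if you are not sure, this minimal sketch prints it via the standard JDBC metadata API (connection details are placeholders):

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;

    public class DriverVersion {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // placeholder
            try (Connection con = DriverManager.getConnection(url, "user", "password")) {
                DatabaseMetaData md = con.getMetaData();
                System.out.println(md.getDriverName() + " " + md.getDriverVersion());
            }
        }
    }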
Thank you.
Kind regards,