I am running a simple graph with a "DB_INPUT_TABLE" node going to a "DELIMITED_DATA_WRITER_NIO" node.
I am using the default JDBC driver for MySQL, i.e. Connector/J. This runs fine if the number of records being pumped out is small. For large data sets, I see the memory consumed go up to 512 MB, at which point I get a java.lang.OutOfMemoryError. I am pumping out about 300 MB of data (~7 million records).
I don’t have any issues with Oracle; I could successfully do the above in Oracle with good performance.
Thanks for your help.
Akhil
Output from the failed graph run:
*** CloverETL framework/transformation graph runner… ***
Graph definition file: …/graph/akhil_graphDBUnloadFromMysql.grf
Initializing DB connection: jdbc:mysql://localhost/test … OK
[Clover] starting WatchDog thread …
[WatchDog] Thread started.
[WatchDog] Running on 2 CPU(s) max available memory for JVM 1257 KB
[Clover] Initializing phase: 0
initializing edges: all edges initialized successfully…
initializing nodes:
INPUTjava.lang.OutOfMemoryError
[Clover] WatchDog thread finished - total execution time: 73 (sec)
[Clover] !!! Graph execution finished with errors !!!
Failed starting graph !
Setting fetchSize to Integer.MIN_VALUE (i.e. fetchSize="-2147483648") would work, except for one little problem.
The property is set in the DBInputTable component but is never passed on to the SQLDataParser, which continues to use the default fetch size.
A simple fix would be to change the setFetchSize method in DBInputTable:
public void setFetchSize(int fetchSize) {
    this.fetchSize = fetchSize;
    // propagate the fetch size to the parser as well
    parser.setFetchSize(fetchSize);
}
This would also explain why setFetchSize doesn’t seem to work with other drivers either.
P
The fetchSize is now set by DBInputTable within the init() method, just before the parser's open() method is called. This way, the fetch size should be applied to the new JDBC statement created internally by the parser:
public void init() {
    // set fetch size (if defined)
    if (fetchSize != 0) parser.setFetchSize(fetchSize);
    // try to open file & initialize data parser
    parser.open(this.graph.getDBConnection(dbConnectionName),
                getOutputPort(WRITE_TO_PORT).getMetadata());
}
This change will be part of Clover starting with version 1.1.7 (it is currently in CVS).
Thanks, this helps. You are probably right, since Oracle works fine!
Regards
Akhil
I can’t say much about this, as I don’t have experience using Clover with MySQL.
I have used Clover with Oracle, PostgreSQL, Sybase ASA, MS SQL, MS Access (through the JDBC-ODBC bridge) and Firebird DB.
In general, Clover does no massive memory allocation when loading or unloading data through JDBC. This must be a problem in the MySQL JDBC driver.
You mentioned Oracle - with Oracle, you can use batch processing (the "batchMode" option). It significantly speeds up loading data into target tables.
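To illustrate what batch processing does under the hood, here is a minimal plain-JDBC sketch (not Clover's actual code; the connection URL, table and column names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        // hypothetical connection details - replace with your own
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:orcl", "user", "password");
        conn.setAutoCommit(false);

        PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO target_table (id, name) VALUES (?, ?)");

        for (int i = 0; i < 10000; i++) {
            stmt.setInt(1, i);
            stmt.setString(2, "row " + i);
            stmt.addBatch();            // queue the row instead of executing it
            if (i % 500 == 499) {
                stmt.executeBatch();    // send the queued rows in one round trip
            }
        }
        stmt.executeBatch();            // flush any remaining rows
        conn.commit();
        stmt.close();
        conn.close();
    }
}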
David.
Hi,
I’ve also run into this issue when querying large data-sets using DB_INPUT_TABLE and mysql.
I couldn’t find an easy way to extend CloverETL to provide this support (without having to re-implement the rest of DB_INPUT_TABLE), so I’ve taken the interim step of patching a local copy of the source to use a different SQLParser instance that creates the statement with the options MySQL requires (based on a new parameter to DB_INPUT_TABLE).
Although this isn’t a problem with CloverETL itself, I wonder whether it would be possible to include a similar change, or refactor the source so that such a customisation is possible without modifying the source? I’d be happy to assist with any changes / testing.
Regards,
Peter
Hi Parth
Thanks a bunch for your reply…I did not actually have any resolution! What you are saying makes a lot of sense!
Since I am working with a DW, I decided to spit the results out to a file and bulk-load them – a really fast option!
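In case it helps anyone, the load step could look roughly like this for a MySQL target (a sketch only; that the target is MySQL is my assumption, and the file path, table name and delimiters are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // hypothetical connection; table name and file path are placeholders
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/warehouse", "user", "password");
        Statement stmt = conn.createStatement();
        // load the delimited file written by DELIMITED_DATA_WRITER_NIO in one shot
        stmt.execute("LOAD DATA LOCAL INFILE '/tmp/export.dat' "
                + "INTO TABLE fact_table "
                + "FIELDS TERMINATED BY ';' LINES TERMINATED BY '\\n'");
        stmt.close();
        conn.close();
    }
}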
Thanks
Akhil
Hello Peter and others !
As usual, I am ready to accept any patch/modification which helps prevent bad things from happening.
If you could send me the fix/change, I may try to fit it into Clover and ensure that it does not clobber other stuff.
David.
PS: Nonetheless, the fetch size can be set with the new release of Clover, although it does not work consistently with all JDBC drivers.
Akhil,
I don’t know if you ever resolved this one. I ran into the problem the first time I used Clover with a 1 GB file.
Turns out the problem _is_ with the MySQL JDBC driver. From the MySQL Connector/J documentation:
--------------------------------------------
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to ‘stream’ the results back one row at-a-time.
To enable this functionality, you need to create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
----------------------------------------------
Unfortunately, the JDBC driver doesn’t implement the setFetchSize call correctly. Apparently the call has little effect (other than switching to fetching one row at a time) in driver versions earlier than 3.2.1, and version 3.2.1 is currently in alpha. The workaround mentioned above does work, though at the cost of speed.
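For completeness, here is a self-contained sketch of that streaming workaround in plain JDBC (the URL, credentials and query are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingReadSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "password");

        // TYPE_FORWARD_ONLY + CONCUR_READ_ONLY + fetchSize = Integer.MIN_VALUE
        // is the combination Connector/J needs to stream rows one at a time
        // instead of buffering the whole ResultSet in memory.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);

        ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
        long rows = 0;
        while (rs.next()) {
            rows++;                     // process one row at a time here
        }
        System.out.println("read " + rows + " rows");

        rs.close();
        stmt.close();
        conn.close();
    }
}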
P