Memory expectation for large table copy

Hi,

are there any special memory expectations (JVM heap size) for this kind of simple graph:

DB_TABLE_INPUT —> DB_TABLE_OUTPUT

where the select from the first node returns about 10 million tuples?

Thanks,
Franck

Hi,

I have investigated to find out where my out-of-heap-space error comes from and found that the PostgreSQL JDBC driver has a few requirements that must be met to enable cursor-based fetching.

This is quoted from the PostgreSQL JDBC documentation:

"By default the driver collects all the results for the query at once. [snip]
Cursor based ResultSets cannot be used in all situations. There are a number of restrictions which will make the driver silently fall back to fetching the whole ResultSet at once.

* The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later. ====> OK
* The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.
* The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.
* The query given must be a single statement, not multiple statements strung together with semicolons. ====> OK

So Clover must :

  1. set Autocommit to false
  2. create a statement with TYPE_FORWARD_ONLY resultset.
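
For reference, those two requirements translate into plain JDBC roughly like this (a minimal sketch of the technique; the class and method names are mine, not Clover's):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorFetch {
    // Configures a connection so the PostgreSQL JDBC driver can stream
    // results through a server-side cursor instead of materializing the
    // whole ResultSet in the heap.
    public static Statement openCursorStatement(Connection conn, int fetchSize)
            throws SQLException {
        // 1. The backend closes cursors at the end of a transaction, so
        //    autocommit must be off for the cursor to survive between fetches.
        conn.setAutoCommit(false);

        // 2. The statement must be TYPE_FORWARD_ONLY (the JDBC default).
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);

        // A non-zero fetch size tells the driver how many rows to pull per
        // round trip; heap use is then bounded by this, not by table size.
        stmt.setFetchSize(fetchSize);
        return stmt;
    }
}
```

The single SELECT would then run through this statement, and iterating the ResultSet keeps only about fetchSize rows in memory at a time.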

In the code (org.jetel.data.parser.SQLDataParser.java), the code related to ResultSet type, concurrency and holdability is commented out. So I guess this is why PostgreSQL behaves poorly.

Is there a reason why this code is commented out?

Thanks,

Franck

Hi David,

this is good news to hear; at least in theory, Clover should scale well…

I nevertheless have a few problems… but from what you say, they must be PostgreSQL JDBC driver problems. This results in:

Exception in thread “WatchDog” java.lang.OutOfMemoryError: Java heap space

during the initialization phase of Clover…

Hi !

If you get the latest version from the CVS repository, it already contains the suggested fixes.
It will be part of the next release (1.1.8).
You may try to compile it yourself.

David.

Hi !

Clover in essence allocates only a fixed amount of memory for a particular graph topology (set of components).
This means that no matter how many records you are processing (1 or 100,000), the amount of memory needed stays the same.
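
This fixed-memory model can be illustrated with a generic sketch (this is just the principle, not Clover's actual internals): records flow through the pipeline one at a time, so heap use depends on record size, never on record count.

```java
import java.util.Iterator;
import java.util.function.Consumer;

public class StreamingCopy {
    // Copies records from a source to a sink one record at a time.
    // Only a single record is ever held in memory, so copying 1 record
    // or 10 million takes the same fixed amount of heap.
    public static long copy(Iterator<String> source, Consumer<String> sink) {
        long copied = 0;
        while (source.hasNext()) {
            sink.accept(source.next()); // one record in flight at a time
            copied++;
        }
        return copied;
    }
}
```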

Nonetheless, you may slightly adjust memory requirements by increasing several internal buffers, but even then the requirements do not depend on how many records are processed.

That said, there is actually one exception which specifically relates to your situation: JDBC drivers. Some of them (for example MySQL's) try to read the whole table into memory when records are fetched, which may cause an out-of-memory exception.
But even in this situation, there is usually a way of preventing it by setting the “fetchSize” parameter of the DBInputTable component.

There have been no problems of this kind reported for PostgreSQL so far.

David.