Heya,
We are trying to upgrade from Clover v3.0.1 to Clover v3.3.0, and we’re running into issues with our I/O- and memory-intensive graphs (e.g., lots of source files, many sorts, etc.). We had worked around this in v3.0.1 by:
1. Setting Record.MAX_RECORD_SIZE = 65536
2. Setting DEFAULT_INTERNAL_IO_BUFFER_SIZE = 131072 (2*MAX_RECORD_SIZE as recommended in defaultProperties)
3. Having all our “Source” nodes (Readers, Joins, Dedup, Sorts, etc) in one phase and the “Target” (Transform, Sort, Writer) in another phase. This seemed to release all the memory held by the “Source” nodes after the phase completed and allowed the target output to be generated.
We are now using the new defaultProperties settings (RECORD_LIMIT_SIZE, RECORD_INITIAL_SIZE, etc.) with their default values, except that we set DEFAULT_INTERNAL_IO_BUFFER_SIZE = 131072 as before. Now these graphs either (1) never complete because they are stuck in a garbage-collection loop that reclaims *just* enough memory to keep limping along, (2) run for a very long time before completing, or (3) run for a very long time before running out of memory and dying.
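For reference, here is a sketch of the overrides described above as they would appear in an engine properties file. This is only an illustration of our configuration, not a recommendation; the exact override mechanism depends on how the engine is launched, and the v3.0.1 and v3.3.0 property names are the ones mentioned in this message:

```properties
# --- v3.0.1 settings that worked for us (old property names) ---
Record.MAX_RECORD_SIZE=65536
# sized at 2*MAX_RECORD_SIZE, as recommended in defaultProperties
DEFAULT_INTERNAL_IO_BUFFER_SIZE=131072

# --- v3.3.0 setup ---
# RECORD_LIMIT_SIZE and RECORD_INITIAL_SIZE are left at their
# shipped defaults; only the I/O buffer is carried over as before
DEFAULT_INTERNAL_IO_BUFFER_SIZE=131072
```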
The JVM runs with -client -XX:MaxPermSize=1024m -Xmx1024m -Xms512m.
When we take a heap dump and analyze it, the Sort nodes appear to be the biggest culprits; at one point a single target Sort node was consuming ~85% of all memory.
Do you have any general suggestions/guidelines on how to tune RECORD_LIMIT_SIZE, RECORD_INITIAL_SIZE, DEFAULT_INTERNAL_IO_BUFFER_SIZE, etc. for larger data sets?
Thanks,
Anna