Clover exceeding Java max memory using

Hi,

We are executing a Clover graph with many edges and using the defaultProperties file to override the default Clover settings. The graph fails because it runs out of memory. Before the graph is executed, we have over 6 GB of free memory available. Once the graph starts, free memory drops and the graph eventually stops because it has run out of memory, even though we’ve set the max Java heap size to 3 GB.

Here’s the command that I’m using to execute the graph:

java -Xmx3072m -Xss2048k -Xmso2048k -XX:+UseParallelGC -server -Djava.io.tmpdir=clover/graphs/data-tmp -classpath … -Dclover.home="cloveretl" org.jetel.main.runGraph -cfg clover/graphs/workspace.prm clover/graphs/graph/test.grf -plugins cloveretl/plugins -config clover/defaultProperties -noJMX -v -loglevel TRACE -tracking 600 -P:fileInput=data/samples.dat

I’ve also attached the defaultProperties file that is being used.

Why does the Clover graph exceed the Java max memory setting of 3072m (3 GB)? How do we keep the Clover graph from using more than 3 GB of memory?

Thanks,
Ken

Hello Ken,
you probably have components that require a lot of memory. There are only a few general pieces of advice for decreasing memory requirements. First, try not to use external jars in JDBC and JMS connections; add them to the classpath when running the graph instead. Also, some components are memory gluttons and can often be replaced by others that don’t require as much memory (e.g. HashJoin vs. MergeJoin). Some components can be tuned through their attribute values (see e.g. ExtSort vs. FastSort – which one is better for me?). But when you suspect that a graph requires too much memory, that particular graph has to be analyzed.

Based on the log output attached, is there anything that sticks out that is eating up all the memory?

Can you attach the log again? No attachment is visible in the post.

Clover logs attached, including vmstat logs showing memory utilization.

Hello Ken,
the main problem in your graph is the huge number of buffered edges. Each buffered edge allocates two buffers of size Record.MAX_RECORD_SIZE * 10, and Record.MAX_RECORD_SIZE is 12 KB (in CloverETL 2.9; in 3.1 it is 64 KB by default). Each direct edge allocates two buffers of size Record.MAX_RECORD_SIZE * 4. So I would advise using direct edges rather than buffered ones and decreasing Record.MAX_RECORD_SIZE in the default properties file (see Changing Default CloverETL Settings).
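
To get a feel for how quickly edge buffers add up, the arithmetic above can be sketched in a few lines of Java. This is only a rough estimate and not CloverETL code: the class and method names are made up for illustration, the edge count of 1000 is a hypothetical figure (the actual number of edges in Ken's graph is not stated in the thread), and the multipliers are simply the ones quoted in the reply.

```java
public class EdgeBufferEstimate {
    // Figures quoted in the reply: every edge allocates two buffers.
    static final int BUFFERS_PER_EDGE = 2;
    // A buffered edge buffer holds MAX_RECORD_SIZE * 10 bytes.
    static final int BUFFERED_EDGE_FACTOR = 10;
    // A direct edge buffer holds MAX_RECORD_SIZE * 4 bytes.
    static final int DIRECT_EDGE_FACTOR = 4;

    /** Bytes allocated by one buffered edge for the given Record.MAX_RECORD_SIZE. */
    static long bufferedEdgeBytes(long maxRecordSize) {
        return BUFFERS_PER_EDGE * BUFFERED_EDGE_FACTOR * maxRecordSize;
    }

    /** Bytes allocated by one direct edge for the given Record.MAX_RECORD_SIZE. */
    static long directEdgeBytes(long maxRecordSize) {
        return BUFFERS_PER_EDGE * DIRECT_EDGE_FACTOR * maxRecordSize;
    }

    public static void main(String[] args) {
        long edgeCount = 1000; // hypothetical; substitute the real edge count of your graph
        long[] recordSizes = {12 * 1024, 64 * 1024}; // 12 KB (2.9) and 64 KB (3.1 default)

        for (long size : recordSizes) {
            long bufferedTotal = edgeCount * bufferedEdgeBytes(size);
            long directTotal = edgeCount * directEdgeBytes(size);
            System.out.printf(
                "MAX_RECORD_SIZE=%d KB: buffered edges ~%d MB, direct edges ~%d MB%n",
                size / 1024,
                bufferedTotal / (1024 * 1024),
                directTotal / (1024 * 1024));
        }
    }
}
```

With a 12 KB record size, one buffered edge comes to 240 KB and one direct edge to 96 KB, so switching edge types alone cuts the per-edge buffer cost by 60%; lowering Record.MAX_RECORD_SIZE scales both figures down proportionally.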

Thanks, Agata. Would using direct edges change the functionality of the graph in any way compared to buffered edges? For example, does it affect grouping, sorting, etc.?

--Ken

Hello Ken,
you can even use directFastPropagate edges, which allocate buffers for one record only. If a buffered edge is required, CloverETL uses one regardless of the edge type set by the user.

Thanks for your help, Agata.

--Ken