Heya,
I am looking at various ways (small and big) to tune the performance of our Clover runs. We are using the runGraph class manually for our tests. Are there any general suggestions for optimization? This is what I’ve looked at so far:
1. Refactoring our graph (we have some nodes that could be combined).
2. Making sure the JVM is in server mode (Note: I tried adding the -XX:+UseParallelGC flag, but it seemed to slow it down a little or at least not help depending on the run).
3. Using runGraph’s switches “-loglevel OFF”, “-noJMX”, and “-nodebug” to turn off any tracing.
4. Changing the defaultProperties size/buffer attributes to larger values.
5. Bumping up the amount of RAM allocated to the JVM (which doesn’t seem to do much, even when I doubled it - should I set the min and max to the same value to reduce garbage collection?).
6. Bugging our system admins about allocation on the box (we're using a 4 CPU box, but I think we're only utilizing 2 CPUs at most).
Thanks!
Anna
Hi!
Of the items on your list, 1) and 2) should help the most. The "ParallelGC" option is useful only if you have enough CPUs. Since you mentioned that only 2 are available and your graph is probably complex, the option may not help. Increasing the sizes of the internal buffers usually does not help much. As for giving your JVM more memory, that will help only if you are sorting data in your transformation and need to increase the size of the sort component's buffer for in-memory sorting.
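If you want to confirm what the JVM actually sees before tuning further, a small check like the one below may help. It is only a sketch using standard JDK calls; the class name JvmCheck is made up for illustration and is not part of Clover.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Clover): reports what the JVM sees.
public class JvmCheck {

    // Number of CPUs the JVM is allowed to use.
    static int cpuCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Maximum heap the JVM will try to use, in MiB (reflects -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Names of the garbage collectors active in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("CPUs visible to the JVM: " + cpuCount());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Garbage collectors: " + collectorNames());
    }
}
```

Run it with the same JVM options you pass when launching runGraph (for example the same -Xms/-Xmx values and -XX:+UseParallelGC), so the output reflects the settings your graph really runs with.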
Some general rules which you may apply during the graph refactoring:
- when parsing data, try to convert them immediately into the "native" type - int, long, date, etc.
- process only the data fields you need - i.e. if you have 15 fields on input, but only 6 are needed down the road, drop the rest as soon as possible (using Reformat, for instance)
- if you are sorting data, make sure you give the sorter enough memory (so it can sort as much data as possible in memory without swapping to disk)
- filter first, then sort (not the other way around, as we see done)
- prefer hash joins, unless your input data are already sorted according to your join key
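
To make the rules above more concrete, here is a rough sketch in plain Java (16 or newer, because of records). It is not Clover code and does not use any Clover API; the class, record and method names are all invented for illustration. It shows parsing straight into native types while keeping only the needed fields, filtering before sorting, and a hash join that needs no sorted input.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration only: plain Java, not Clover components.
public class PipelineSketch {

    // Hypothetical record types holding only the fields needed downstream.
    record Order(int customerId, long amountCents) {}
    record Customer(int id, String name) {}
    record Joined(String customerName, long amountCents) {}

    // Parse once into native types; fields after the first two are dropped here.
    static Order parse(String line) {
        String[] f = line.split(",");
        return new Order(Integer.parseInt(f[0].trim()), Long.parseLong(f[1].trim()));
    }

    // Filter first, then sort: the sort only sees rows that survive the filter.
    static List<Order> filterThenSort(List<Order> orders, long minAmountCents) {
        return orders.stream()
                .filter(o -> o.amountCents() >= minAmountCents)
                .sorted(Comparator.comparingLong(Order::amountCents))
                .collect(Collectors.toList());
    }

    // Hash join: build a lookup table from one input (ideally the smaller one)
    // and stream the other through it. Neither input has to be sorted.
    static List<Joined> hashJoin(List<Customer> customers, List<Order> orders) {
        Map<Integer, String> nameById = new HashMap<>();
        for (Customer c : customers) {
            nameById.put(c.id(), c.name());
        }
        List<Joined> out = new ArrayList<>();
        for (Order o : orders) {
            String name = nameById.get(o.customerId());
            if (name != null) {            // inner join: unmatched orders are skipped
                out.add(new Joined(name, o.amountCents()));
            }
        }
        return out;
    }
}
```

In a graph the same ideas map onto component order: the filter sits upstream of the sorter, and the smaller input of the join is the one held in memory.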
Heya,
Thank you so much for your reply! I will keep your tips in mind when refactoring the graph - I really appreciate the help…
Anna