Tweaking Memory Settings in Clover 3.3.0?

Heya,

We are trying to upgrade from Clover v3.0.1 to Clover v3.3.0 and we're bumping into issues with our I/O- and memory-intensive graphs (e.g., lots of source files, many sorts, etc.). We had gotten around this in v3.0.1 with:

1. Setting Record.MAX_RECORD_SIZE = 65536
2. Setting DEFAULT_INTERNAL_IO_BUFFER_SIZE = 131072 (2*MAX_RECORD_SIZE as recommended in defaultProperties)
3. Having all our “Source” nodes (Readers, Joins, Dedup, Sorts, etc) in one phase and the “Target” (Transform, Sort, Writer) in another phase. This seemed to release all the memory held by the “Source” nodes after the phase completed and allowed the target output to be generated.

We are now using the new defaultProperties (RECORD_LIMIT_SIZE, RECORD_INITIAL_SIZE, etc.) with default values, except that we set DEFAULT_INTERNAL_IO_BUFFER_SIZE = 131072 as before. Now these graphs are (1) never completing because they are stuck in a garbage-collection loop that reclaims *just* enough memory to keep limping along, (2) running a really long time before they complete, or (3) running a really long time before they run out of memory and die.
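
To be concrete, the overrides boil down to something like this in defaultProperties (a sketch using the key names mentioned above, not a verbatim copy of our file):

# v3.0.1 overrides
Record.MAX_RECORD_SIZE = 65536
DEFAULT_INTERNAL_IO_BUFFER_SIZE = 131072

# v3.3.0: only this override; RECORD_INITIAL_SIZE / RECORD_LIMIT_SIZE stay at their defaults
DEFAULT_INTERNAL_IO_BUFFER_SIZE = 131072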

The JVM runs with -client -XX:MaxPermSize=1024m -Xmx1024m -Xms512m.

When we take a heap dump and analyze it, the Sort nodes seem to be the biggest culprit; at one point the single target sort node was consuming ~85% of all memory.

Do you have any general suggestions/guidelines on how to tune RECORD_LIMIT_SIZE, RECORD_INITIAL_SIZE, DEFAULT_INTERNAL_IO_BUFFER_SIZE, etc. for larger data sets?

Thanks,
Anna

Hello, Anna,

Are you using FastSort or ExtSort components? Have you modified any of their properties? How many sort components do you have in your graph? Thank you in advance for your answers.

Best regards,

Heya,

We are using ExtSort components. We are not modifying their properties that I know of - typically they are defined in the graph along these lines (the id and sort key below are just placeholders for our real names):
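
<!-- illustrative only: the id and sortKey are placeholders for our real names -->
<Node id="SORT_CUSTOMERS" type="EXT_SORT" sortKey="customer_id(a)"/>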

There are 51 in the “source” phase and 1 in the “target” phase.

Thanks,
Anna

Heya,

As a further follow-up, we had another example that fails within 5 minutes with an out-of-memory error. It has only two sorts - one in the "source" phase and one in the "target" phase. The "target" sort is the one failing, and the stack trace is:

[2013-04-17 09:15:02,529] FATAL - java.lang.OutOfMemoryError: Java heap space
[2013-04-17 09:15:02,530] ERROR - Graph execution finished with error
[2013-04-17 09:15:02,530] ERROR - Node SORT_1 finished with status: ERROR caused by: Java heap space
[2013-04-17 09:15:06,361] ERROR - Node SORT_1 error details:
java.lang.OutOfMemoryError: Java heap space
at org.jetel.data.StringDataField.duplicate(StringDataField.java:104)
at org.jetel.data.DataRecord.duplicate(DataRecord.java:113)
at org.jetel.data.InternalSortDataRecord$DataRecordCol.put(InternalSortDataRecord.java:309)
at org.jetel.data.InternalSortDataRecord.put(InternalSortDataRecord.java:158)
at org.jetel.data.ExternalSortDataRecord.put(ExternalSortDataRecord.java:127)
at org.jetel.component.ExtSort.execute(ExtSort.java:221)
at org.jetel.graph.Node.run(Node.java:465)
at java.lang.Thread.run(Thread.java:662)
[2013-04-17 09:15:20,924] ERROR - !!! Phase finished with error - stopping graph run !!!
[2013-04-17 09:15:21,615] ERROR - Execution of graph failed !

I did notice that this target had 4,221 fields. When I tested removing fields to a bare minimum (in this case, 33 fields), I no longer got the error. I am working to refactor our code, but this graph did not fail in the older version of Clover and in theory all 4,221 fields might someday be in use. I thought I’d post this in case it might give you any ideas of how we can tune the memory settings. I appreciate any assistance you can give me - I know that these sorts of nebulous issues can be difficult to track down… :slight_smile:

Thanks,
Anna

Hi, Anna,

It is quite surprising that you were able to successfully run graphs with 51 ExtSort components, or with records of 4,221 fields, with only 1 GB of Xmx. But after quite a lot of testing we have not found any difference between versions 3.0 and 3.3 in this respect. The memory requirements should be pretty much the same.

Now you have several options:
1. It would be best if you could raise Xmx for your JVM - the more the better. If you do not have enough free memory, you can lower the PermSize limit, which usually needs less than Xmx. In your case, you could set PermSize to 128 MB or 256 MB and give the rest to Xmx.
2. You can try to optimize your graphs as much as possible to lower the memory requirements, for example by splitting the functionality into a few smaller graphs. The jobflow functionality introduced in version 3.3 should help you with this task if you also have CloverETL Server. You can also try to split the ExtSort components into more phases.
3. You can simulate the 3.0 settings in the new version. Version 3.3 introduces dynamic buffers. If you know that your biggest input record is, say, 100 kB, you can set RECORD_INITIAL_SIZE to 100 kB; the buffers then never need to grow dynamically, which is a partial substitute for MAX_RECORD_SIZE from version 3.0 (which also could not grow). Alternatively, you can set RECORD_LIMIT_SIZE to 100 kB instead of changing RECORD_INITIAL_SIZE; buffers can then grow up to 100 kB, which again resembles MAX_RECORD_SIZE in that no record can be bigger than 100 kB. Either way you have to be sure that no record exceeds the limit, otherwise it will cause errors. See the sketch after this list.
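
For illustration (the numbers here are examples only, not recommendations - pick values that match your real record sizes and available memory; I am assuming the property values are in bytes, like DEFAULT_INTERNAL_IO_BUFFER_SIZE):

# option 1: keep the same ~2 GB total as the current flags, but shift memory from PermSize to heap
-client -XX:MaxPermSize=256m -Xmx1792m -Xms512m

# option 3: cap dynamic buffers at a known maximum record size, e.g. 100 kB (102400 bytes)
RECORD_INITIAL_SIZE = 102400
# ...or alternatively cap only the upper limit:
# RECORD_LIMIT_SIZE = 102400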

I hope something from above will help you.

Best regards,

Heya,

Just thought I'd post an update - splitting each source into its own phase (which broke up the sorts) got the graph to complete, so I have made progress. Now I'm tweaking the memory settings and looking to see whether anything we added would account for the time difference. The graph with 51 ExtSort components (one "source" phase and one "target" phase) takes 5 hours on our PRODUCTION box - the new split-up graph takes 8 hours on PRODUCTION.

We have found that it helps to run these sorts of jobs on a physical server (as opposed to a virtual machine) with very good I/O. In the past, things that took quite a long time on our STAGING box (which is a VM) would complete much faster on our PRODUCTION box. With Clover v3.3.0, I've found that STAGING and PRODUCTION take about the same time.

I will post again if I find out anything else that improves our performance in case it helps someone else!

Thanks,
Anna

Hello again,

Thank you for sharing your results. The time difference you mention is easy to explain: components running in the same phase can run simultaneously, whereas components in separate phases have to wait until the previous phase is over. Generally speaking, lower memory requirements mean higher time requirements and vice versa.

Best regards,

Heya,

I have another update and a question. :slight_smile:

We found our sweet spot: splitting the graph so that we had 10 inputs (and their various SORT nodes) per phase, with the target in its own phase.

To recap:

Our PRODUCTION box
With 1 input phase and 1 target phase - v3.0.1 was 5 hours, v3.3.0 never finished
With each input in its own phase and 1 target phase - v3.3.0 finished in 8 hours
With 10 inputs (and their various SORT nodes) per phase and 1 target phase - v3.3.0 finished in 5 hours

So we are now getting comparable times on PRODUCTION, but I have an additional memory-vs-performance question. I was looking at a heap dump from a v3.0.1 run vs. a heap dump from a v3.3.0 run. In both cases, it was in our target phase, and the final ExtSort (the only SORT node in the target phase) was the biggest memory user. But the v3.0.1 ExtSort footprint wasn't as big as the v3.3.0 one, and that seemed to be for 2 reasons:

1. DataRecord is set for 2000 instances in v3.0.1 and 8000 in v3.3.0 (the InternalSortDataRecord.DEFAULT_INTERNAL_SORT_BUFFER_CAPACITY property?)
2. StringDataField now uses CloverString instead of StringBuilder

With (2), it seems to cause a bigger footprint because StringBuilder's internal char[] always seems to be exactly the size of the contained string, while CloverString's char[] is often bigger. It's particularly noticeable when a field is null. In v3.0.1, I see a char[0] in my null StringDataField fields; in v3.3.0, I see a char[16] populated entirely with '\u0000' (i.e., "null" characters). In another v3.3.0 example I have a DataField with a char[82] holding only 51 characters of value in one instance, and a char[64] holding 50 characters in another. Is there a reason why CloverString does this? Was it a trade-off for speed? Would this affect performance? When I started this exercise, our targets had 4,000 fields with most of them null (we've managed to cut out most of the null fields for most of our graphs). Would this cause a memory/performance difference between v3.0.1 (where all those nulls would be char[0]) and v3.3.0 (where all those nulls would be char[16])?
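
Just to put rough numbers on it (a back-of-the-envelope estimate only, ignoring object headers and array overhead, and using the 8000-record buffer from point 1 above):

// rough estimate of the char[] data for null fields held by one in-memory sort buffer
int fields = 4221;            // fields in our wide target record, most of them null
int nullChars = 16;           // v3.3.0: char[16] backing each null StringDataField
int bufferedRecords = 8000;   // DataRecord instances held by the sort in v3.3.0
long bytes = (long) fields * nullChars * 2 * bufferedRecords;  // 2 bytes per char
// ~1.0 GB of char[] data alone, versus roughly nothing in v3.0.1 where nulls were char[0]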

If the answer is yes and we should try to cut down on the size, is there a way to reduce the size of the CloverString? It looks as though the only way a CloverString's size can be fixed is if my metadata describes a fixed-length record. Although our target metadata is delimited, for this SORT we also know the maximum length of every field, because our target transform makes sure the field values are no longer than certain lengths. For example, I have several fields that I know will never be longer than 3 characters - but each of 'em is a char[19] when sorting. I guess I'm asking whether limiting the footprint of the DataField elements (DataField used by DataRecord used by InternalSortDataRecord used by ExternalSortDataRecord used by ExtSort - whew!) would give us a boost, or whether I am misinterpreting what I see…

Thanks,
Anna

Your observation about CloverString is interesting, since CloverString is almost a deep copy of the formerly used StringBuilder. If you look at the implementation, we added just a couple of new methods, but the memory-allocation algorithm has been preserved. Even StringBuilder uses a 16-char array to represent an empty string. Are you really sure about your observation?

My guess is that the higher memory footprint is caused by the increase of the InternalSortDataRecord.DEFAULT_INTERNAL_SORT_BUFFER_CAPACITY variable. It has indeed been increased from 2000 to 8000, which means 8000 data records are now sorted in memory; that significantly enlarges the memory footprint and, of course, improves overall performance. If you are not happy with this new default, just use the ExtSort attribute 'bufferCapacity' to set your preferred value.
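
For example, on the target sort (the id and sortKey below are just placeholders):

<!-- restore the 3.0-era buffer size on a single ExtSort -->
<Node id="SORT_TARGET" type="EXT_SORT" sortKey="customer_id(a)" bufferCapacity="2000"/>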

Martin

Heya,

I went back to look at CloverString vs. StringBuilder - it took me a while and I'm not entirely sure of all the usage, but I think the difference is:

v3.0.1
---


private StringDataField(DataFieldMetadata _metadata, CharSequence _value) {
    super(_metadata);
    this.value = new StringBuilder(_value.length());
    this.value.append(_value);
}

This constructor (which is used when duplicating fields) will create a char[] exactly the size of the value. So if I have no value, it creates a char[0] - and if I never set that field it stays char[0]. That also seems to be why my char[3] fields always remain char[3] - the builder never needs to expand its capacity.

In v3.3.0
---


private StringDataField(DataFieldMetadata _metadata, CharSequence _value){
   super(_metadata);
   this.value = new CloverString(_value);
}


public CloverString(CharSequence seq) {
   this(seq.length() + 16);
   append(seq);
}

From this code, when a field is duplicated, my empty fields will be char[16] and my char[3] fields will always be char[19].
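
You can see the same sizing rule with plain java.lang.StringBuilder, whose behaviour the CloverString constructor above mirrors (the +16 padding comes from the JDK builder itself), so the snippet below only uses the JDK classes:

// v3.0.1 style: capacity requested explicitly, so it matches the value's length exactly
StringBuilder exact = new StringBuilder("abc".length());
exact.append("abc");
System.out.println(exact.capacity());   // 3

// v3.3.0 style: the String/CharSequence constructor pads by 16, like new CloverString("abc")
StringBuilder padded = new StringBuilder("abc");
System.out.println(padded.capacity());  // 19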

I'm playing around with reducing InternalSortDataRecord.DEFAULT_INTERNAL_SORT_BUFFER_CAPACITY - we have a couple of very wide test counties that are failing on SORT nodes. I'll update if this gets us around the issue.

Thanks,
Anna

Good point, Anna,

this change really could easily increase the memory footprint of our sorters. A duplicated string field should be optimized for memory usage rather than for data manipulation.

See the new issue: https://bug.javlin.eu/browse/CLO-761

Thanks a lot for your investigation, good work.

Heya,

I can see you have this item marked as fixed for the next release of Clover - do you have a timeline for that release, or could I get a patched v3.3.0 to test with? We were able to get the failing graphs to finish by reducing InternalSortDataRecord.DEFAULT_INTERNAL_SORT_BUFFER_CAPACITY from 8000 back to 2000 (the v3.0.1 default), but the one with a lot of null fields is taking almost an hour longer to run. This is just one of the avenues I'm exploring - we discovered that there is a version difference in our Linux OS that might affect the running times, but I'm trying to chase down every lead. :slight_smile:

Another note on CloverString vs. StringBuilder (one that may or may not cause performance issues in these higher-boundary graphs): I noticed that CloverString tends to 'reset' or 'clean' its buffer (fill it up with '\0') whenever the value changes - StringBuilder does not do that. For example,


StringBuilder temp = new StringBuilder("hi there");
temp.setLength(3);
temp.append("hi");

shows the internal char[] of StringBuilder looking like this:


[h, i,  , t, h, e, r, e, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0]
[h, i,  , t, h, e, r, e, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0]
[h, i,  , h, i, e, r, e, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0]

as you step through the code. From what I can see of CloverString, this would be:


[h, i,  , t, h, e, r, e, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0]
[h, i,  , \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0]
[h, i,  , h, i, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0, \0]

It does make debugging a little easier - it was somewhat distracting, when inspecting StringBuilder objects that are supposed to be null, to see "junk" in them - but could there be some overhead from doing that cleanup when we're talking about large datasets? Just thought I'd point it out - you'd probably have to run a lot of tests to see whether it makes a difference with large data.

I really appreciate all the assistance. We really like Clover and this sort of memory/processing analysis is always the hardest…

Thanks,
Anna

Hi Anna,

Clover 3.4 is actually done and will be released this week. A backport into the 3.3 branch is not planned, at least for now. Regarding your observation about the CloverString behaviour, you scared me a bit :wink: An unnecessary cleanup of the backing char array could definitely hurt overall performance. As I said, CloverString is essentially a deep copy of StringBuilder, so the approach in the setLength() method should be identical. I had a look at the code of that method and it seems to be the same. I also debugged the suggested code snippet with a CloverString instance and did not observe the reported behaviour. Can you review your observation and/or send me steps to reproduce it?

Thank you very much for digging into this - every suggestion in this area can only help our product. Your suggestions and work are really appreciated.

Martin

Heya Martin,

I took another look, and it seems I misinterpreted the CloverString behaviour - closer examination shows it mimics StringBuilder, and I got mixed up by fields whose values were all longer than the original value in the field. Sorry for the worry. :oops:

If Clover v3.4.0 is released this week, that will be great for me. We pulled the Clover upgrade out of our release going out next week, so that gives us time to test with Clover v3.4.0 for our next release.

I'm glad my research can be of assistance - we really like Clover as a product and think it works very well. You've been very kind to help us with the memory issues we've been having with our larger graphs; they are always the trickiest to solve! That's why we're looking into things such as the OS (direct buffers depend on it somewhat), the Java configuration, etc., and not just assuming it is Clover causing the slowness.

Thanks,
Anna