MAX_RECORD_SIZE

Heya,

I am using a source file where one of the fields has up to 750 characters. When this field uses all 750 characters and there is data in the other fields, I get a MAX_RECORD_SIZE error and the graph fails when it tries to write the file. I looked on the wiki, and I *think* this makes sense because 750 * 16 bits per character = 12000, which is not much under 12288, so the other data pushes it over the limit. So:

  1. Am I doing the math right?
  2. If yes, why is the data able to be parsed into memory, but not written?
  3. If yes, why didn’t I get a FIELD_BUFFER_LENGTH error since it is over 4096?

The wiki says “You may change the values stored in the file [defaultProperties] by using any plain-text editor. In order to do it, just unzip/un-jar the clover.engine.jar file, modify the defaultProperties resource file and return it to the archive.”

  1. Is there any other way to do that (command line, classpath, etc)? I’d hate to make a change in the jar and then forget about it the next time I upgrade!

  2. Are there any performance implications for upping the limits? I see the recommended max is 64K - will this cause a slowdown?

Thanks!
Anna

Hello,
you are right: the graph should fail during reading already.
Could you send the error stack? It is hard to say anything without it.
If you want to change the default properties but don’t want to change the jar file, you can create your own defaultProperties file and use the -config switch when running the graph.
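For example (a rough sketch only: the resource path inside the jar, the exact property names, and the runGraph launcher class are from memory, so please verify them against your installation):

```
# copy the stock defaults out of the engine jar, so the jar itself stays untouched
# (run "unzip -l clover.engine.jar" first if the resource sits under a package path)
unzip -p clover.engine.jar defaultProperties > myDefaultProperties

# edit myDefaultProperties and raise the limits you need (the entries containing
# MAX_RECORD_SIZE and FIELD_BUFFER_LENGTH), then run the graph with:
java -cp clover.engine.jar org.jetel.main.runGraph -config myDefaultProperties yourGraph.grf
```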

Heya,

Thank you for your reply! I will try the config switch.
This is the error stack (snipped for brevity’s sake - it doesn’t really tell you much):

INFO [main] - *** CloverETL framework/transformation graph runner ver 2.5, (c) 2002-06 D.Pavlis, released under GNU Lesser General Public License ***
INFO [main] - Running with framework version: 2.5 build#devel compiled 16/09/2008 16:56:32
INFO [main] - Running on 2 CPU(s) max available memory for JVM 1040512 KB
.
.
WARN [main] - [LOG_EXACT_DUP_0] - Component is of type FIXLEN_DATA_WRITER, which is deprecated
WARN [main] - [LOG_EXACT_DUP_1] - Component is of type DELIMITED_DATA_WRITER, which is deprecated
WARN [main] - [OUTPUT_0] - Component is of type DELIMITED_DATA_WRITER, which is deprecated
WARN [main] - [OUTPUT_1] - Component is of type FIXLEN_DATA_WRITER, which is deprecated
.
.
ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node TRANSFORM_0 finished with status: ERROR caused by: The size of data buffer is only 12288. Set appropriate parameter in defautProperties file.
DEBUG [WatchDog] - Node TRANSFORM_0 error details:
java.lang.RuntimeException: The size of data buffer is only 12288. Set appropriate parameter in defautProperties file.
at org.jetel.data.NumericDataField.serialize(NumericDataField.java:524)
at org.jetel.data.DataRecord.serialize(DataRecord.java:453)
at org.jetel.graph.DirectEdge.writeRecord(DirectEdge.java:238)
at org.jetel.graph.Edge.writeRecord(Edge.java:342)
at org.jetel.graph.Node.writeRecord(Node.java:705)
at org.jetel.component.Reformat.execute(Reformat.java:202)
at org.jetel.graph.Node.run(Node.java:379)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.nio.BufferOverflowException
at java.nio.Buffer.nextPutIndex(Buffer.java:425)
at java.nio.DirectByteBuffer.putDouble(DirectByteBuffer.java:918)
at org.jetel.data.NumericDataField.serialize(NumericDataField.java:522)
… 7 more
ERROR [WatchDog] - !!! Phase finished with error - stopping graph run !!!

Because we have some special input handling (variable length rows, custom quote characters, etc), we do initially parse the input file as a single field per line, then divvy up the contents amongst the fields for the next component. I’d still think that this field should cause an error since it’s over 4096, but maybe it’s only a problem when you try to write it out, and as long as it is held in memory it is ok? I can send the graph file if you are interested. For me, as long as I can bump up the memory and not cause performance issues, I’m happy…

Thanks again,
Anna :slight_smile:

Hello,
from the stack trace I suspect your Reformat output metadata is wrong. The error arises during serialization of a NumericDataField, which is produced by the Reformat component. Try printing out the value you suspect is causing the problem.

Heya,

I will try to print it out when I get the chance - the error happens after the transform() method returns true, and DEBUG mode doesn’t report which field caused the issue (maybe that info should be added to the exception message??). Since we’re not trimming fields automatically, I threw a trim on the 750 character field (which mostly had whitespace at the end), and there was no error thrown. I also increased the max record and field size and the issue went away. Perhaps writing out the 750 character field put it over the max, and writing the next field (which is numeric) caused the complaint?

Your suggestion makes me wonder, though. We are initially parsing this file as a single field, which with this 750 character field is waaaay over the 4096 FIELD_BUFFER_LENGTH, but there is no error. Is FIELD_BUFFER_LENGTH in use at all? Or am I mistaken about its use?

Thanks again for your help!
Anna

Hi,
you can print out the suspected value in the transform method (System.err.println(…) in Java or print_err(…) in CTL).
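For example, something like this inside your Java transform, just before it returns true (this assumes the usual DataRecord[] target parameter; the field names are placeholders, so use whatever your output metadata calls them):

```java
// debug print of the suspect output fields (field names are placeholders)
System.err.println("numeric field value:  " + target[0].getField("MY_NUMERIC_FIELD").getValue());
System.err.println("long text field size: " + target[0].getField("MY_LONG_TEXT_FIELD").toString().length());
```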
FIELD_BUFFER_LENGTH is used only during parsing (DelimitedDataParser) or formatting, not during record serialization; for serialization we use MAX_RECORD_SIZE. It can happen that one string field serializes into the buffer fine, but the same data split across more fields does not, because the buffer stores each field’s length alongside its value. If the string arrives as a single field, the buffer only has to hold the value plus one length; once you split it into several fields, the total value size stays the same, but the extra length headers can push the record past the buffer’s capacity.
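To make that concrete, here is a small standalone sketch. It only mimics the idea of a per-field length header written next to each value; it is not CloverETL’s actual serialization format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: each "field" is written as <length><UTF-16 bytes>, so the
// same characters split across more fields cost extra length headers.
public class SplitOverheadDemo {

    static void writeField(ByteBuffer buf, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_16BE);
        buf.putInt(bytes.length); // per-field length header
        buf.put(bytes);           // the value itself
    }

    public static void main(String[] args) {
        char[] chars = new char[750];
        Arrays.fill(chars, 'x');
        String big = new String(chars);                  // one 750-character value

        ByteBuffer asOneField = ByteBuffer.allocate(4096);
        writeField(asOneField, big);
        System.out.println("one field uses    " + asOneField.position() + " bytes");

        ByteBuffer asThreeFields = ByteBuffer.allocate(4096);
        writeField(asThreeFields, big.substring(0, 250));
        writeField(asThreeFields, big.substring(250, 500));
        writeField(asThreeFields, big.substring(500));
        System.out.println("three fields use  " + asThreeFields.position() + " bytes");

        // Same character count, but two extra headers; a buffer sized just for
        // the single-field case could throw java.nio.BufferOverflowException here.
    }
}
```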