Large map object in dictionary fails with persistence truncation error in Designer, Clover engine (command line) works?

We are using a dictionary entry of type “map”, to pass data from one element in one phase to another element in a subsequent phase. This data is used iteratively in the 2nd element, so we can’t use a join-type connection between them.

We noticed that this approach works, but only for a limited number of key-value pairs in the dictionary. Once we get to about 5,000 or so pairs, we get the following error:


14:47:19,262 INFO  [JobFinalizer_1] Dictionary output: { *testMap={KEYFORID2=VALUEFORID2, ....
14:47:19,267 WARN  [JobFinalizer_1] Updating modified RunRecord ...
14:47:19,315 WARN  [JobFinalizer_1] SQL Error: 20000, SQLState: 22001
14:47:19,316 ERROR [JobFinalizer_1] A truncation error was encountered trying to shrink LONG VARCHAR 'UEsDBBQACAgIAOl1kUcAAAAAAAAAAAAAAAAAAAAAZN2/7i5qfpb3hREYYych&' to length 32700.
14:47:19,316 WARN  [JobFinalizer_1] org.hibernate.exception.DataException: could not execute statement
javax.persistence.PersistenceException: org.hibernate.exception.DataException: could not execute statement
	at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1387)
	at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1310)
	at org.hibernate.ejb.AbstractEntityManagerImpl.throwPersistenceException(AbstractEntityManagerImpl.java:1397)
...
Caused by: org.hibernate.exception.DataException: could not execute statement
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:71)
	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:110)
...
Caused by: java.sql.SQLDataException: A truncation error was encountered trying to shrink LONG VARCHAR 'UEsDBBQACAgIAOl1kUcAAAAAAAAAAAAAAAAAAAAAZN2/7i5qfpb3hREYYych&' to length 32700.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
...
14:47:19,318 WARN  [JobFinalizer_1] Couldn't update RunRecord; org.hibernate.exception.DataException: could not execute statement
javax.persistence.PersistenceException: org.hibernate.exception.DataException: could not execute statement
...
	at org.hibernate.ejb.AbstractQueryImpl.executeUpdate(AbstractQueryImpl.java:111)
	at com.cloveretl.server.dao.RunRecordDao.update(Unknown Source)
	at com.cloveretl.server.dao.PersistenceServiceImpl.updateRunRecord(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...

I can provide more details and a sample graph to reproduce this error if needed.

Interestingly, running this same graph through the command-line engine works without any problems, it’s only in the Designer (4.1.1) that we see this failure.

Hi

This behavior is a bug and I reported it.

However in your case it should be possible to avoid using dictionary, which is causing this issue. You can declare a whole new map in the component where you fill it with data and then send this map over the edge to the other component. The filling and reading functions will stay almost the same, you just need to replace the dictionary.map with your new map.

If such solution is not possible, please send me your graph and some input file with a representative sample of the data you are working with.

Also, graph created in Designer shouldn’t be run by the open-source engine as it could behave unexpectedly. E.g. the set of components is not exactly the same and that could cause issues.

Best regards.

Thanks for confirming/creating the bug report!

I learned something new from your note regarding the ability to define a container (list, map) as a metadata field type. That is really powerful. Thank you for that!

I think I understand what you meant by sending the map “over the edge” to another component: you mean adding a container field to the metadata and then setting the output port’s value for that field to the generated map value:


map[string,string] theMap;
function integer transform() {
	theMap[$in.0.KEY] = $in.0.VALUE;
	$out.0.MAPFIELD = theMap;
	return ALL;
}

However, implementing it this way would be very memory intensive, wouldn’t it? If the map has many elements, then each output row would contain a deep-copy of the (partial) map up to that point and the very last output row would contain the final map.

So, I started experimenting with using two Reformat components, one that is simply counting the number of rows for the map and storing that count in the dictionary (integer). The next Reformat executes at a later phase and uses that count to SKIP all output rows except the last one:


map[string, string] theMap;
integer rowCount = 0;
function integer transform() {
	theMap[$in.0.KEY] = $in.0.VALUE;
	rowCount++;
	if (rowCount == dictionary.mapCount) {
		$out.0.MAPFIELD = theMap;
		return ALL;
	}
	return SKIP;
}

The reduces the output of the “read the map” phase to a single row.

The next challenge is to pass this map to the “using the map” phase. In the phase where we need to use the map, the metadata is very different from the previous phase. I could not find a suitable component that can combine that single input row that contains the map with an input containing many rows with a different metadata.

So I resorted to using an “intermediate” metadata that is the target metadata with that map field added. Then, “read the map” phase creates an output with one single row using this metadata, containing our map. I then use a SimpleGather to combine this “fake” single row with the actual data in the subsequent rows. A Reformat then treats that first row differently from the remaining rows, pulls the map off the first row only and uses that map for processing the remaining “real” rows:


map[string,string] theMap;
integer rowCounter = 0;
function integer transform() {
	if (rowCounter == 0) {
		theMap = $in.0.MAPFIELD;
		printLog(info, "The map size: " + length(theMap));
		rowCounter++;
		return SKIP;
	} else {
		$out.0.ORIGORDER = $in.0.ORIGORDER;
		$out.0.CHILD = theMap["SOME_KEY_ON_THE_MAP"];
	}
	rowCounter++;
	return ALL;
}

It does work, but it’s a lot of moving parts compared to being able to use a dictionary map, so I appreciate any tips or comments you may have!

Hi

You don’t need to use two reformats as CloverETL software has a component to do this kind of job, it’s called Denormalizer. It takes all the input records you send to it and creates one output record, a map in this case.

The workaround you developed to get the map to the second component is ok. Alternatively you can use lookup table to store the data instead of map. This way you can avoid sending the map over the edge completely.

I attached a sample graph, where the aforementioned is shown.

map_lookup_sample.grf
Hope this helps.