Issue upgrading from 3.0.1 to 3.3.0

Heya,

We autogenerate a graph and use the Clover engine to run it as part of our application. We are testing an upgrade of Clover from version 3.0.1 to 3.3.0 and we’re having an issue with one of our smoke tests. The attached graph works fine with Clover 3.0.1, but crashes in Clover 3.3.0 with the following stack trace:

[2013-03-04 16:14:34,166] ERROR - Graph execution finished with error
[2013-03-04 16:14:34,166] ERROR - Node JOIN_3 finished with status: ERROR caused by: Data input 0 is not sorted in ascending order. Record #51: Key fields=“FIELD_CMERGE_Prep_Key_Prep_Parcel:FIELD_CMERGE_Prep_Key_SECNBR:FIELD_CMERGE_Prep_Key_BLD_ID”. Current=“FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:1”; Previous=“FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:2”.
[2013-03-04 16:14:34,166] ERROR - Node JOIN_3 error details:
java.lang.IllegalStateException: Data input 0 is not sorted in ascending order. Record #51: Key fields=“FIELD_CMERGE_Prep_Key_Prep_Parcel:FIELD_CMERGE_Prep_Key_SECNBR:FIELD_CMERGE_Prep_Key_BLD_ID”. Current=“FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:1”; Previous=“FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:2”.
at org.jetel.component.MergeJoin.loadNext(MergeJoin.java:305)
at org.jetel.component.MergeJoin.execute(MergeJoin.java:439)
at org.jetel.graph.Node.run(Node.java:465)
at java.lang.Thread.run(Thread.java:662)
[2013-03-04 16:14:34,194] ERROR - 588: thread forcibly aborted
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.jetel.graph.DirectEdge.fillReadBuffer(DirectEdge.java:216)
at org.jetel.graph.DirectEdge.readRecordDirect(DirectEdge.java:182)
at org.jetel.graph.Edge.readRecordDirect(Edge.java:377)
at org.jetel.component.Trash$InputReader.run(Trash.java:494)
.
.
[2013-03-04 16:14:36,213] ERROR - !!! Phase finished with error - stopping graph run !!!
[2013-03-04 16:14:36,808] ERROR - Execution of graph failed !

I’ve verified that the graph looks the same up to the point where this exception is thrown. Looking at the graph, SORT_71 (which goes into JOIN_0:1) is sorting by the correct fields, and nothing re-sorts the port 0 path before it hits JOIN_3:0. Do you know of anything that might’ve changed that could cause the records to get out of order? With this upgrade we did make some changes in our custom nodes, so I’m investigating whether something we did might cause this, but thought I’d ask if there is anything that springs up as obvious from the Clover side.

Thanks,
Anna

Hello, Anna,

I am sorry, but the attached graph is too big for someone not involved in the project to follow. It also uses external metadata, which is not attached. Can you please reduce the graph to a minimal graph that still produces this error, and attach all metadata files (and any other necessary files)? Thank you.

Best regards,

Heya,

I will ask if we can cut it down and see if the problem is still there. What other files would you require? What I’m looking for is to see if you might have some general ideas if something changed in v3.3.0 that could cause rows to go out-of-order (multithreading? reallocating buffers?), because the relevant nodes leading up to JOIN_3 are:


SORT_70-----
            |
SORT_71-- JOIN_0
            |
        FILTER_0
            |
---------JOIN_1  
            |
        FILTER_2
            |
---------JOIN_3

SORT_70 is

sortKey=“FIELD_FILE_Main_file_txt_parcelid;FIELD_FILE_Main_file_txt_CLOVER_ROW_NUM;”
sortOrder=“A;A;”

SORT_71 is

sortKey=“FIELD_CMERGE_Prep_Key_Prep_Parcel;FIELD_CMERGE_Prep_Key_SECNBR;FIELD_CMERGE_Prep_Key_BLD_ID;FIELD_CMERGE_Prep_Key_IN_PORT;FIELD_CMERGE_Prep_Key_CLOVER_ROW_NUM;”
sortOrder=“A;A;A;A;A;”

and JOIN_0 is

joinKey=“FIELD_FILE_Main_file_txt_parcelid=FIELD_CMERGE_Prep_Key_Prep_Parcel”

so it should be in the right order. There are no further sorts further down the port 0 path (and the port 1 sorts shouldn’t put records out of order), so what could cause JOIN_3 to throw the error:

Data input 0 is not sorted in ascending order. Record #51: Key fields=“FIELD_CMERGE_Prep_Key_Prep_Parcel:FIELD_CMERGE_Prep_Key_SECNBR:FIELD_CMERGE_Prep_Key_BLD_ID”. Current=“FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:1”; Previous=“FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:2”.

I’m trying to track it down myself, just hoping to get some pointers on where to look…
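To make the check in that error message concrete, here is a minimal sketch of a sortedness test on a composite key. It is not Clover's MergeJoin code. The field names are shortened, hypothetical stand-ins for the FIELD_CMERGE_Prep_Key_* fields, the 1-based record numbering is an assumption, and keys are compared as plain strings.

```python
from typing import Optional, Sequence, Tuple

def first_unsorted(records: Sequence[dict], key_fields: Sequence[str]) -> Optional[Tuple[int, tuple, tuple]]:
    """Return (record number, previous key, current key) for the first record
    whose key is smaller than its predecessor's key, or None if none is."""
    previous = None
    for number, record in enumerate(records, start=1):
        current = tuple(record[field] for field in key_fields)
        if previous is not None and current < previous:
            return number, previous, current
        previous = current
    return None

# Hypothetical short names for the three key fields in the error message.
KEY = ("Prep_Parcel", "SECNBR", "BLD_ID")

# The two records quoted in the error: BLD_ID 2 arrives before BLD_ID 1.
rows = [
    {"Prep_Parcel": "0100036", "SECNBR": "000", "BLD_ID": "2"},
    {"Prep_Parcel": "0100036", "SECNBR": "000", "BLD_ID": "1"},
]

violation = first_unsorted(rows, KEY)
print(violation)  # (2, ('0100036', '000', '2'), ('0100036', '000', '1'))
```

The first two key fields are equal, so the comparison is decided by BLD_ID alone, which is what the Current/Previous values in the message show.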

Thanks,
Anna

Hi, Anna,

Are you sure you posted the right graph? As far as I can see, the graph looks a little bit different:


SORT_70-----
            |
SORT_71-- JOIN_0
            |
        FILTER_0
            |
---------JOIN_17 
            |
        FILTER_1
            |
---------JOIN_18

JOIN_3 is not even listed among the components in the graph outline in the bottom left pane, so I do not understand how JOIN_3 can throw any exception.

Regarding your question about files: if we are supposed to reproduce the issue on the minimal graph, we also need all externalized files you have in your project (metadata, connections, Java transformations, sequences, …). And of course a sample of your input data.

Thanks and best regards,

Heya,

I double-checked that I posted the right graph and took the time to hand-draw it edge to edge (we don’t have the Designer - the graph is autogenerated by code). Following the edges, it is the way I drew it, not the way you drew it. Perhaps there are so many nodes that it does not show up in the Designer correctly (although that makes me worry that if it is showing up wrong in the Designer, something *is* going on)?

We are not going to be able to cut it down or provide sample data for you to reproduce, so I will try and run it with full debug on to see if I can see what’s going on, node-wise. When I find a solution, I will post the results…

Thanks,
Anna

Hi Anna,

We checked your graph in Designer once again, and it really does show something different from what is saved in the graph file. We suspect that duplicate IDs in the XML cause this problem. Unfortunately, your graph is too big to verify this by changing it manually.

Can you please generate your graph so that all IDs used are unique? At the very least, we see duplicates between Metadata and Component elements. Both the Engine and the Designer require that the IDs used in a graph be unique.

Sample of problem:


<Metadata id="JOIN_1" fileURL="/home/xxx/clover_staging/work/config/NC117/E6/V1/PFA/202547/JOIN_1.fmt"/>
....
<Node id="JOIN_1" type="EXT_MERGE_JOIN" joinKey="FIELD_CMERGE_Prep_Key_Prep_Parcel=FIELD_FILE_ComPCL_txt_parcel_id" transformClass="com.facorelogic.core.etl.transform.LinkFiles" joinType="fullOuter" joinMetadataID="JOIN_2" slaveKeyFields="FIELD_FILE_ComPCL_txt_parcel_id" includeInOrphanRowReport="true" />
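A graph of this size can be scanned for such clashes with a short script. The following is a minimal sketch using only the Python standard library. The sample XML is a trimmed, hypothetical stand-in for a real graph file (the root element name and the shortened attributes are assumptions), and the script simply treats every `id` attribute in the document as belonging to one shared namespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed, hypothetical graph fragment; a real graph file has many more
# elements and attributes.
SAMPLE = """<Graph>
  <Metadata id="JOIN_1" fileURL="JOIN_1.fmt"/>
  <Metadata id="JOIN_2" fileURL="JOIN_2.fmt"/>
  <Node id="JOIN_1" type="EXT_MERGE_JOIN" joinMetadataID="JOIN_2"/>
</Graph>"""

def duplicate_ids(xml_text: str) -> dict:
    """Map every id that is used more than once to the tags that use it."""
    tags_by_id = defaultdict(list)
    for element in ET.fromstring(xml_text).iter():
        element_id = element.get("id")
        if element_id is not None:
            tags_by_id[element_id].append(element.tag)
    return {i: tags for i, tags in tags_by_id.items() if len(tags) > 1}

duplicates = duplicate_ids(SAMPLE)
print(duplicates)  # {'JOIN_1': ['Metadata', 'Node']}
```

To check a file on disk, `ET.parse(path).getroot().iter()` can replace the `ET.fromstring(...)` call.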

Heya,

I think I am bumping up against this one: https://bug.javlin.eu/browse/CLD-4137

Clover v3.0.1 apparently does not check the sort order of the master input of an ExtMergeJoin. BUT if the master records are out of order, it silently drops the slave records for the out-of-order master records. For example, I had sample records:

Driver


ID,D_Seq
1,1
1,2

Secondary1


ID,S1_Seq
1,1
1,2

Secondary2


ID,S2_Seq
1,1
1,2

I linked Driver to Secondary1 on ID, then linked Secondary1 to Secondary2 on ID and S1_Seq=S2_Seq, with a graph that looks like:


DRIVER_INPUT --> SORT ON Driver.ID -----------------------------------
                                                                     |  
SECONDARY_1_INPUT --->SORT ON Secondary1.ID, Secondary1.S1_Seq------JOIN_0
                                                                     |      
SECONDARY_2_INPUT --->SORT ON Secondary2.ID, Secondary2.S2_Seq------JOIN_1

When I run this with Clover 3.0.1, I get:


Driver.ID,Driver.D_Seq,Secondary1.ID,Secondary1.S1_Seq,Secondary2.ID,Secondary2.S2_Seq
1,1,1,1,1,1
1,1,1,2,1,2
1,2,1,1
1,2,1,2

This is bad: the last two rows are missing their Secondary2 values. In Clover v3.3.0, I get an exception on the ordering of Secondary1 going into JOIN_1. If I add a sort:


DRIVER_INPUT --> SORT ON Driver.ID -----------------------------------
                                                                     |  
SECONDARY_1_INPUT --->SORT ON Secondary1.ID, Secondary1.S1_Seq------JOIN_0
                                                                     |      
                                                           SORT ON Secondary1.ID, Secondary1.S1_Seq
                                                                      |
SECONDARY_2_INPUT --->SORT ON Secondary2.ID, Secondary2.S2_Seq------JOIN_1

I get the correct results:


Driver.ID,Driver.D_Seq,Secondary1.ID,Secondary1.S1_Seq,Secondary2.ID,Secondary2.S2_Seq
1,1,1,1,1,1
1,2,1,1,1,1
1,1,1,2,1,2
1,2,1,2,1,2

I noticed that this bug item is marked as Unresolved, but it looks like the behaviour changed in v3.3.0? An exception is much better than silently dropping records, so I’m glad something was done!

I will change our autogenerator to add a sort for this kind of condition. It is an extra sort (which is expensive), but I assume that from now on the master port will always have to be sorted, too?
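For anyone else who hits this, here is a minimal sketch of why the output of JOIN_0 is not sorted on the JOIN_1 key, using the sample records above. It is not Clover's ExtMergeJoin implementation. It assumes the join emits, for each master record, every matching slave record in slave order, which is consistent with the row order I saw from 3.0.1. It only handles matching records (no outer join handling).

```python
from itertools import groupby
from operator import itemgetter

driver = [(1, 1), (1, 2)]      # (ID, D_Seq), sorted on ID
secondary1 = [(1, 1), (1, 2)]  # (ID, S1_Seq), sorted on ID, S1_Seq

def join_on_id(master, slave):
    """Assumed join behaviour: for each master record, emit every slave
    record with the same ID, in slave order."""
    slaves_by_id = {key: list(group) for key, group in groupby(slave, key=itemgetter(0))}
    joined_rows = []
    for master_row in master:
        for slave_row in slaves_by_id.get(master_row[0], []):
            joined_rows.append(master_row + slave_row)
    return joined_rows

# Columns: Driver.ID, Driver.D_Seq, Secondary1.ID, Secondary1.S1_Seq
joined = join_on_id(driver, secondary1)
print(joined)  # [(1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 1, 1), (1, 2, 1, 2)]

# JOIN_1 needs its master input ascending on (Secondary1.ID, Secondary1.S1_Seq).
join1_key = itemgetter(2, 3)
is_sorted = all(join1_key(a) <= join1_key(b) for a, b in zip(joined, joined[1:]))
print(is_sorted)  # False: S1_Seq runs 1, 2, 1, 2

# The extra sort restores the order; Python's sort is stable.
resorted = sorted(joined, key=join1_key)
print(resorted)  # [(1, 1, 1, 1), (1, 2, 1, 1), (1, 1, 1, 2), (1, 2, 1, 2)]
```

Because Driver has two records with the same ID, the S1_Seq values repeat once per Driver record, so the extra sort is only needed when the master key of the first join can contain duplicates.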

Thanks,
Anna

Hi, Anna,

You are right, this bug is already fixed. Thanks for noticing the inconsistency. The fix is now also recorded in our bug tracking system.

Best regards,

Heya,

Is it now a requirement that IDs must be unique across ALL components? Our understanding when we started using Clover (back in 2007, so our understanding may be very old!) was that the ID had to be unique within a type (e.g. you could not have two nodes with an ID of “JOIN_1”, but you COULD have a metadata with an ID of “JOIN_1” and a node with an ID of “JOIN_1”). We normally have “checkconfig” turned off, but when it is on the engine does not complain about it and the graphs run fine. Is this globally unique ID a requirement of the Engine or the Designer? We will most likely make the change either way in case we use the Designer at some point, but if it’s not critical it can wait until our next release cycle. If you think it could cause a nasty side effect, we may need to fix it right away…

Thanks,
Anna

Hello again,

Officially, we support just one way of creating graphs: via CloverETL Designer. This is why the requirement for unique IDs is not documented: Designer handles it automatically. So the only way to be (relatively) sure when autogenerating a graph is to copy the behavior of Designer as closely as possible. We do not know what happens if IDs are not globally unique; this situation cannot occur in supported use cases.
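One possible way for a graph generator to guarantee globally unique IDs is to route every ID through a single registry shared by all element types. This is only a sketch: the `IdRegistry` class, the `META_` prefix and the numeric suffix scheme are hypothetical and are not how Designer names things.

```python
class IdRegistry:
    """Hands out IDs that are unique across every element type in one graph."""

    def __init__(self):
        self._used = set()

    def allocate(self, preferred: str) -> str:
        """Return `preferred` if it is still free, else the first free
        `preferred_<n>` variant."""
        candidate = preferred
        suffix = 1
        while candidate in self._used:
            candidate = f"{preferred}_{suffix}"
            suffix += 1
        self._used.add(candidate)
        return candidate

registry = IdRegistry()
node_id = registry.allocate("JOIN_1")           # 'JOIN_1'
metadata_id = registry.allocate("META_JOIN_1")  # 'META_JOIN_1' (hypothetical prefix)
clash_id = registry.allocate("JOIN_1")          # 'JOIN_1_1', since 'JOIN_1' is taken
print(node_id, metadata_id, clash_id)
```

Any attribute that refers to an element by ID (for example joinMetadataID) has to be written with the allocated ID rather than the preferred one.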

Best regards,