Lookup Join on Persistent Table missing records that should be joined

Hello,

I have a series of graphs responsible for propagating ID relationships between data rows, so that all relevant normalized data is correctly updated according to those IDs. To do this, a mapping of the ID values (internal vs. external) is created in a persistent lookup table, which is later accessed by subgraphs that handle the ancillary information using those relationships.

The issue I’m running into is that records that should be joined by ID are not being joined. For example, I double-checked (using debug) that a row with ID 123456 is present on the edge feeding the LookupTableReaderWriter component. I then ran the subgraph that uses a LookupJoin against the same table. The input edge of the join component contains a record with ID 123456 that I expect to be joined to the lookup table record with the same ID, but it is instead sent to the second output port as an unjoined record.

Here is the current configuration of the lookup table:


#Tue Jun 02 16:57:49 EDT 2015
pageSize=16
commitInterval=100
fileURL=${DATATMP_DIR}/EANToUniqueId
name=EANToUniqueId
metadata=${META_DIR}/EANToUniqueId.fmt
key=EANHotelId
type=persistentLookup
cacheSize=60000

Some more information:

  • The persistent lookup table stores five fields in its metadata, one of which is a long-form string.

  • On average, we store about 100k records in the lookup table per session before it is reset.

  • Field replacement and transactions are both enabled.

  • I’m currently seeing anywhere from 300 to 2,000 records per graph execution passed through as unjoined, even though the lookup table should contain the matching ID, and the dropped records are not consistent from run to run.

  • Increasing the page and cache sizes has not had a visible effect.

  • There are no errors or warnings during graph execution that would indicate a failure to write the rows to the lookup table.

Any idea what might be causing this? Is there an upper limit to the size of Persistent Lookup Tables that I’m not aware of, such that I might be losing that data? Is there a way I could actually see what data is stored in the lookup table?

Thanks in advance for any feedback on the issue :smiley:

Hi,

The reason it’s not working is that lookup tables are initialized at the start of a graph’s run. Since the subgraph runs as a separate process, the data written by the parent graph is not yet available for the LookupJoin to compare against. I would recommend using Jobflows, or, as a simpler solution, eliminating the subgraph and placing the LookupJoin in the same graph.

Hi Mr. Rosario,

Thanks very much for the response, but I’m not sure I understand.

I thought the point of the persistent lookup table was that it could be made available to other graph processes, since the data is stored in a .db file. If the data were not available to any child processes, wouldn’t that mean that none of the records would be joined? I’m still able to fetch most of the data from the persistent lookup table using the LookupJoin component, except for the seemingly random records that are lost.

Hi,

The reason you are able to fetch a certain number of records is the commit interval on your lookup table. Say you have 150 records and the commit interval is 100 (the default): the first 100 records are committed to the lookup table, but the remaining 50 are only committed at the end of the graph run, so they are not visible to the subgraph in the meantime. In other words, CloverETL buffers incoming records in a temporary buffer and commits them into the lookup table in batches (100 at a time here) to improve performance. You have two possible workarounds in this case:

  • Decrease the commit interval to 1 (this may cause performance issues)

  • Use CTL code to get the first value from the lookup table, which triggers the commit (see the sketch below)
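For the second workaround, here is a rough, untested CTL2 sketch of the idea: a Reformat placed downstream of the LookupTableReaderWriter performs one read against the lookup, which triggers the commit of the buffered records. The choice of a Reformat and the use of $in.0.EANHotelId as the key are assumptions based on your configuration; adjust the names to match your actual metadata.

// Hypothetical CTL2 transform for a Reformat in the writer graph.
// Assumes EANHotelId is present on the input edge and is the lookup key.
function integer transform() {
    // Any get() against the persistent lookup triggers the commit,
    // making the buffered records visible outside this graph.
    lookup(EANToUniqueId).get($in.0.EANHotelId);

    // Pass the record through unchanged.
    $out.0.* = $in.0.*;
    return ALL;
}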

I have taken the liberty of creating a ticket in our system to possibly improve this in a future version; you can see more details here.

That helps a lot, thank you very much for your input!