MergeJoin - how to stop the graph

Hello!

I read two files with two data readers, merge join them, then write the result to a file with a data writer. The driver file contains 10 million records, the slave 60 million. I use an inner join. The problem is this: when the joiner finds out there are no more records on input port 0, it broadcasts the EOF message to the components further down the graph and then stops. The data writer connected to the joiner stops as it should. But the data reader which feeds the slave file into the joiner keeps pumping away until the entire input file is consumed. As a result, the execution time increases significantly if all the records from the driver file have their matches somewhere near the beginning of the slave file.
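To make the expected behaviour concrete, here is a minimal sketch in Python (not CloverETL code) of an inner merge join over two key-sorted streams that stops pulling slave records as soon as the driver is exhausted. The names `inner_merge_join` and `counted`, and the `(key, value)` record shape, are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # hypothetical record shape: (join key, payload)


def counted(records: Iterable[Record], counter: List[int]) -> Iterator[Record]:
    """Pass records through unchanged, counting how many were actually read."""
    for record in records:
        counter[0] += 1
        yield record


def inner_merge_join(driver: Iterable[Record],
                     slave: Iterable[Record]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two streams that are both sorted ascending by key.

    The slave stream is pulled lazily, so nothing more is read from it
    once the driver stream has ended.
    """
    slave_it = iter(slave)
    pending = next(slave_it, None)  # next slave record not yet used
    group_key, group = None, []     # slave payloads sharing the current key
    for d_key, d_val in driver:
        if d_key != group_key:
            # Skip slave records whose key is smaller than the driver key.
            while pending is not None and pending[0] < d_key:
                pending = next(slave_it, None)
            # Collect every slave record that carries the driver key.
            group_key, group = d_key, []
            while pending is not None and pending[0] == d_key:
                group.append(pending[1])
                pending = next(slave_it, None)
        for s_val in group:
            yield d_key, d_val, s_val


if __name__ == "__main__":
    reads = [0]
    driver = [(1, "a"), (2, "b")]
    slave = counted(((k, f"s{k}") for k in range(1, 1000)), reads)
    print(list(inner_merge_join(driver, slave)))
    print("slave records read:", reads[0])
```

In this sketch the join reads only one slave record beyond the last matching key, which is the behaviour I would like to get from the graph.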

So the question is:
Is there a way to stop the reader or graph after the joiner has finished its job in this particular case?

I think we need a more generic solution to this problem: when a Node is about to stop, it must broadcast a message not only to its child nodes, but to its parent nodes as well. The message essentially means “I don’t need your services anymore”. Each parent Node must then decide for itself what to do. If nobody needs its services, it must stop and notify its own parents.
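A rough sketch of that idea in Python, only to illustrate the proposal. The `Node` class and its methods here are hypothetical and are not the CloverETL engine API; the usual downstream EOF broadcast is left out so that only the upstream message is shown.

```python
class Node:
    """Hypothetical graph node that tells its parents when it stops."""

    def __init__(self, name):
        self.name = name
        self.parents = []
        self.children = []
        self.running = True

    def connect(self, child):
        """Add an edge from this node to a downstream child."""
        self.children.append(child)
        child.parents.append(self)

    def stop(self):
        """Stop this node and send "I don't need your services" upstream."""
        if not self.running:
            return
        self.running = False
        for parent in self.parents:
            parent.on_consumer_released(self)

    def on_consumer_released(self, child):
        """Stop as well once no child needs this node's output anymore."""
        if self.running and all(not c.running for c in self.children):
            self.stop()


if __name__ == "__main__":
    driver_reader = Node("driver reader")
    slave_reader = Node("slave reader")
    joiner = Node("merge join")
    writer = Node("writer")
    driver_reader.connect(joiner)
    slave_reader.connect(joiner)
    joiner.connect(writer)

    joiner.stop()  # the driver input has ended, so the joiner is done
    print(slave_reader.running)  # False: the reader was released and stopped
```

A reader that feeds several consumers keeps running in this sketch until every one of them has stopped, which is the "decide for itself" part.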

Hello Daniel,
I’ve created a request in our bug tracking system (http://bug.cloveretl.org/view.php?id=5165) for such component behavior.

If the slave file is re-usable for different master files, you could create a lookup table (a Persistent lookup table for a flat file, or a Database lookup table if the data can be inserted into a database) and use Lookup Join instead of MergeJoin. But if the slave data changes from one graph run to another, I haven’t found any workaround, unfortunately.
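The lookup approach can be sketched in plain Python as follows. This is a conceptual illustration under assumed names (`build_lookup`, `lookup_join`, `(key, value)` records) and not the CloverETL lookup table API: the slave data is indexed once, and each master record then probes the index by key.

```python
from collections import defaultdict


def build_lookup(slave):
    """Index the slave records by key; done once and reusable across runs."""
    table = defaultdict(list)
    for key, value in slave:
        table[key].append(value)
    return table


def lookup_join(master, table):
    """Inner-join master records against a prebuilt lookup table."""
    for key, value in master:
        # .get() avoids creating empty entries for keys without a match
        for slave_value in table.get(key, ()):
            yield key, value, slave_value


if __name__ == "__main__":
    table = build_lookup([(1, "s1"), (2, "s2"), (2, "s2b"), (5, "s5")])
    for row in lookup_join([(2, "m2"), (3, "m3"), (5, "m5")], table):
        print(row)
```

With this shape the work per run depends on the size of the master file, not on how far into the slave data the matches sit. The cost is building and storing the table up front, which only pays off when the slave data is reused.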

Hello, Agata!

Thank you for the advice. I think a Database lookup table may well be an option. I’ll give it a try. The Persistent one is not available to us because we use the Community Edition. That’s the whole point of using CloverETL for us: an open source ETL engine, like a small, convenient Swiss Army knife.

Daniel