Problems regarding ExtMergeJoin component

Hi,

I once mentioned problems with ExtMergeJoin but was unable to pinpoint them. After a few trials, here they are:

  1. if not set, allowSlavesDuplicates defaults to TRUE

This behaviour contradicts the documentation, differs from HashJoin, and is not what users would expect.

  2. if a master record has a null value for the join key, the record is filtered out, even if the join type is set to LeftOuterJoin

This is not consistent with HashJoin behaviour and, above all, goes against SQL left outer join behaviour.
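To make the expected SQL behaviour concrete, here is a minimal sketch in plain Java. It is not CloverETL code: the class and method names are hypothetical, and records are reduced to a single string key. It only illustrates that a left outer join keeps every master row, including one whose key is null, and pairs it with a null slave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in CloverETL.
class LeftOuterJoinSketch {

    // Joins master keys against a slave lookup with SQL LEFT OUTER JOIN semantics.
    // Each output row is "masterKey->slaveValue"; a missing match shows up as "null".
    static List<String> leftOuterJoin(List<String> masterKeys, Map<String, String> slaves) {
        List<String> out = new ArrayList<>();
        for (String key : masterKeys) {
            // A null key never equals any slave key, so it has no match...
            String slave = (key == null) ? null : slaves.get(key);
            // ...but the master row is still emitted, as in SQL.
            out.add(key + "->" + slave);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> slaves = new HashMap<>();
        slaves.put("a", "A");

        List<String> masters = new ArrayList<>();
        masters.add("a");
        masters.add(null);
        masters.add("b");

        // Prints [a->A, null->null, b->null]
        System.out.println(leftOuterJoin(masters, slaves));
    }
}
```

With an inner join, only the first row would survive; the behaviour reported above drops the second row even in left outer mode.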

If my findings are correct, I think this should be fixed, but the fix will affect compatibility for graphs already in production, so it should be carefully documented.

Franck

I will now study the code :)

Thanks a lot for your work!

Franck

Hi,
you are right. The bug has been fixed, and in the next release the component will work as I described before.

Agata

Hi,

  1. if not set, allowSlavesDuplicates defaults to TRUE
    The documentation has just been updated. The behaviour differs from the HashJoin component because HashJoin stores slave records in a hash table and, when a new record with the same key is put in, the old record is replaced by the new one. In MergeJoin, records are processed sequentially and there is no reason to skip any of them.
  2. if a master record has a null value for join key, record will be filtered out, even if join type is set to LeftOuterJoin
    LeftOuterJoin means that a master record is sent to the output port even if it has no slave. So when a master record has a null value for the join key, there is no slave to join with, and the master record is sent to the transformation with a null record to join. Master records with a null value in the key are not sent to the output if InnerJoin is set.
    If you expect null values on the master, you have to use a filter component after the join.
    Agata
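The difference in duplicate handling described above can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions (records are `{key, value}` string pairs and all names are hypothetical); it is not how HashJoin or MergeJoin are actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not CloverETL code.
class DuplicateSlavesSketch {

    // Hash-style join: slaves are indexed by key, so a later slave with the
    // same key replaces the earlier one and only one match is produced.
    static List<String> hashStyle(String masterKey, String[][] slaves) {
        Map<String, String> table = new HashMap<>();
        for (String[] slave : slaves) {
            table.put(slave[0], slave[1]); // overwrites any previous value
        }
        List<String> out = new ArrayList<>();
        String value = table.get(masterKey);
        if (value != null) {
            out.add(masterKey + "+" + value);
        }
        return out;
    }

    // Merge-style join: slaves are read one after another, so every slave
    // with a matching key is joined and duplicates are kept.
    static List<String> mergeStyle(String masterKey, String[][] sortedSlaves) {
        List<String> out = new ArrayList<>();
        for (String[] slave : sortedSlaves) {
            if (slave[0].equals(masterKey)) {
                out.add(masterKey + "+" + slave[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] slaves = { { "k", "first" }, { "k", "second" } };
        System.out.println(hashStyle("k", slaves));  // [k+second]
        System.out.println(mergeStyle("k", slaves)); // [k+first, k+second]
    }
}
```

In this sketch the hash variant silently keeps only the last duplicate, while the sequential variant emits one joined record per duplicate slave, which matches a default of TRUE for allowSlavesDuplicates.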

Hi,
the problem was that in the loadNext() method of the MergeJoin component, the master record is compared with the records from all readers (reader[minIdx].compare(reader[i])), which means it is also compared with itself. But the compare(…) method of the RecordKey class returned -1 whenever the key field was null, regardless of the second record. Now, when the compare method compares a record with itself, it returns 0.
Agata
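As a rough illustration of that fix, here is a simplified sketch in plain Java. It does not reproduce the real RecordKey class: records are modelled as one-element string arrays, the method names are hypothetical, and the handling of a null key on the right-hand side is an assumption made only to keep the example complete.

```java
// Hypothetical illustration only; not the actual RecordKey implementation.
class NullKeyCompareSketch {

    // Behaviour as described before the fix: a null key on the left always
    // yields -1, even when the record is compared with itself.
    static int compareOld(String[] a, String[] b) {
        if (a[0] == null) {
            return -1;
        }
        if (b[0] == null) {
            return 1; // assumption: the source does not describe this case
        }
        return a[0].compareTo(b[0]);
    }

    // Behaviour after the fix: comparing a record with itself returns 0,
    // whatever its key contains.
    static int compareFixed(String[] a, String[] b) {
        if (a == b) {
            return 0; // same record instance
        }
        return compareOld(a, b);
    }

    public static void main(String[] args) {
        String[] masterWithNullKey = { null };
        System.out.println(compareOld(masterWithNullKey, masterWithNullKey));   // -1
        System.out.println(compareFixed(masterWithNullKey, masterWithNullKey)); // 0
    }
}
```

The identity check is just one way to get the described result; the actual change in RecordKey may be implemented differently.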

Hi,

  1. ok, the behaviour is fine once you know about it, so updating the doc is a solution

  2. what you describe is not what I see in my tests…

If you set joinType to LeftOuterJoin, a master record with null in the join key is NOT sent to the transformation class or to output0.

So I still think there is a problem here.

Do you see that as well?

Great! Thank you.

Would you mind pointing me to the place (file name / method) where you made the fix, so that next time I can submit a patch instead of just a bug report?

Thanks again,

Franck