Hi,
I once mentioned problems with ExtMergeJoin but was unable to pinpoint them. After a few trials, here they are:
- if not set, allowSlavesDuplicates defaults to TRUE
This behaviour contradicts the documentation, differs from HashJoin's behaviour, and is not what users would expect.
- if a master record has a null value for the join key, the record will be filtered out, even if the join type is set to LeftOuterJoin
This is not consistent with HashJoin's behaviour and, above all, goes against the behaviour of an SQL left outer join.
If my findings are correct, I think this should be fixed, but the fix will affect compatibility for graphs already in production, so it should be carefully documented.
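To make the second point concrete, here is a minimal Java sketch of what I would expect from a left outer join when the master key is null. It is not CloverETL code: the `Rec` class and the `leftOuterJoin` method are hypothetical names, and it uses a plain nested loop rather than a sorted merge, only to show the expected output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeftOuterJoinSketch {
    // Hypothetical record: a nullable join key plus a payload.
    public static final class Rec {
        final String key;
        final String value;
        public Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Left outer join: every master record is emitted at least once.
    // A null key never matches anything (SQL semantics), but the master
    // record is still kept and paired with a null slave.
    public static List<String> leftOuterJoin(List<Rec> masters, List<Rec> slaves) {
        List<String> out = new ArrayList<>();
        for (Rec m : masters) {
            boolean matched = false;
            if (m.key != null) {
                for (Rec s : slaves) {
                    if (m.key.equals(s.key)) {
                        out.add(m.value + "|" + s.value);
                        matched = true;
                    }
                }
            }
            if (!matched) {
                out.add(m.value + "|null"); // unmatched master is not filtered out
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> masters = Arrays.asList(new Rec("a", "m1"), new Rec(null, "m2"));
        List<Rec> slaves = Arrays.asList(new Rec("a", "s1"), new Rec(null, "s2"));
        for (String row : leftOuterJoin(masters, slaves)) {
            System.out.println(row);
        }
    }
}
```

The master record with the null key is paired with a null slave instead of being dropped, which is the behaviour I would expect from ExtMergeJoin with joinType set to LeftOuterJoin.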
Franck
I will now study the code
Thanks a lot for your work!
Franck
Hi,
You are right. The bug has been fixed, and in the next release the component will work as I described before.
Agata
Hi,
The problem was that in the loadNext() method of the MergeJoin component, the master record is compared with the records from all readers (reader[minIdx].compare(reader[i])), which means it is also compared with itself. But the compare(…) method of the RecordKey class returned -1 whenever the key field was null, regardless of the second record. Now, when the compare method compares a record with itself, it returns 0.
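A simplified sketch of the idea, not the actual RecordKey source: the class `Rec` and the methods `compareBefore` and `compareAfter` are hypothetical names, and the key is assumed to be a single nullable string.

```java
public class NullKeyCompareSketch {
    // Hypothetical record holding one nullable key field.
    public static final class Rec {
        final String key;
        public Rec(String key) {
            this.key = key;
        }
    }

    // Behaviour before the fix as described above: a null key yields -1
    // no matter what the other record is, even when it is the same record.
    public static int compareBefore(Rec a, Rec b) {
        if (a.key == null || b.key == null) {
            return -1;
        }
        return Integer.signum(a.key.compareTo(b.key));
    }

    // Behaviour after the fix: a record compared with itself is equal,
    // so a master record with a null key is no longer skipped.
    public static int compareAfter(Rec a, Rec b) {
        if (a == b) {
            return 0;
        }
        if (a.key == null || b.key == null) {
            return -1;
        }
        return Integer.signum(a.key.compareTo(b.key));
    }

    public static void main(String[] args) {
        Rec nullKey = new Rec(null);
        System.out.println("before: " + compareBefore(nullKey, nullKey));
        System.out.println("after:  " + compareAfter(nullKey, nullKey));
    }
}
```

Two different records with null keys still do not compare as equal in this sketch, so null keys never join with each other.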
Agata
Hi,
- OK, the behaviour is fine as long as you know about it, so updating the documentation is a solution.
- What you describe is not what I see in my tests.
If you set joinType to LeftOuterJoin, a master record with null in the join key is NOT sent to the transformation class or to output0.
So I still think there is a problem here.
Do you see that as well?
Great! Thank you.
Would you mind pointing me to the place (file name / method) where you made the fix, so that next time I can submit a patch instead of just a bug report?
Thanks again,
Franck