Multiple small joins vs. one big join: which performs better?

Hello,

Has it ever been tested, or is it known by reasoning, which performs better in CloverETL: multiple smaller joins or one huge join?
(Assume the ExtMergeJoin component.)

Many Thanks,
Parsa

Hi Parsa,

ExtMergeJoin processes record streams from its input ports that are already sorted by the join key. Because of this, it is extremely fast and memory efficient, no matter whether it processes one large dataset or many smaller ones.
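To illustrate why sorted input keeps memory use low, here is a minimal sketch of a merge join in Python. It is not CloverETL's actual implementation; the function name `merge_join` and the sample records are made up for the example, and it assumes both inputs are already sorted by the join key.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0)):
    """Inner-join two iterables that are ALREADY sorted by `key`.

    Both inputs are read once, front to back. Only the current run of
    equal keys from the right side is buffered, so memory use does not
    grow with the total size of the inputs.
    """
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)
    end = (None, None)  # marks an exhausted input
    left_key, left_rows = next(left_groups, end)
    right_key, right_rows = next(right_groups, end)

    while left_rows is not None and right_rows is not None:
        if left_key < right_key:
            # No match possible for this left key: skip ahead on the left.
            left_key, left_rows = next(left_groups, end)
        elif left_key > right_key:
            # No match possible for this right key: skip ahead on the right.
            right_key, right_rows = next(right_groups, end)
        else:
            # Same key on both sides: emit every left/right combination.
            buffered_right = list(right_rows)
            for left_row in left_rows:
                for right_row in buffered_right:
                    yield left_row, right_row
            left_key, left_rows = next(left_groups, end)
            right_key, right_rows = next(right_groups, end)


# Hypothetical sample data, sorted by the first field (the join key).
customers = [(1, "Ann"), (2, "Bob"), (4, "Dan")]
orders = [(1, "book"), (1, "pen"), (3, "cup"), (4, "ink")]
joined = list(merge_join(customers, orders))
```

Each input is scanned exactly once, so the cost of the join itself grows linearly with the number of records. The expensive part is getting the inputs sorted in the first place.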

Most of the work is done before ExtMergeJoin: reading records into memory, sorting them, etc. Here I recommend reading data that is already sorted (ORDER BY in SQL, sorted text files, …) rather than sorting it as part of the ETL.
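As a rough sketch of pushing the sort down to the source, the Python snippet below lets the database return rows already ordered by the join key. The in-memory SQLite database, the `orders` table, and the `read_sorted` helper are assumptions made up for the example; in a real graph the same idea would be an ORDER BY in the query that feeds the join.

```python
import sqlite3

# Hypothetical source table, filled in arbitrary (unsorted) order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(4, "ink"), (1, "pen"), (3, "cup"), (1, "book")],
)


def read_sorted(connection):
    """Return a cursor that yields rows already sorted by the join key.

    The database does the sorting, so the downstream join step does not
    need its own sort.
    """
    return connection.execute(
        "SELECT customer_id, item FROM orders ORDER BY customer_id, item"
    )


rows = list(read_sorted(conn))
```

A database can often sort more cheaply than the ETL process, for example when an index on the join key already exists.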

I hope this helps.