hashJoin in community edition

stargazer
Posts: 2
Joined: Tue Dec 06, 2011 7:37 pm

Post by stargazer » Wed Dec 07, 2011 12:32 am

I have two files with 10 million+ records and a couple of hundred columns each, both sorted by a common key. The slave file may contain records where the key is not populated (I'd like an outer join if possible; otherwise I'll just not carry those). I get the impression that HashJoin is very memory intensive: it's failing with a heap error. Is there any way I can leverage my files' pre-sorted status to optimize this? It seems like it would run very fast with that assumption in mind. Or is that another joiner that only exists in the non-community editions?

If the answer is the latter, and I were to buy the Desktop edition, could I still use the runGraph feature with graphs I create in the Desktop edition? Or is that an enterprise function?

Thanks for your help.

Jeff

jurban
Posts: 163
Joined: Fri Jul 20, 2007 9:25 am

Re: hashJoin in community edition

Post by jurban » Mon Dec 12, 2011 10:57 am

Hi Jeff,

you're correct that HashJoin can be very memory intensive: it caches the slave records in memory. To join large data sets, you need to use the MergeJoin component, which takes advantage of the pre-sorted status of your data. Please see Joining Data for more info on joiners. However, the MergeJoin component is not available in CloverETL Community, which contains only the HashJoin joiner component.
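
To illustrate why the two strategies differ so much in memory use, here is a minimal Python sketch. It is not CloverETL code and makes no claim about how the components are implemented internally; the function names `hash_join` and `merge_join`, the helper `_next_group`, and the dict-based records are assumptions made purely for illustration. Both functions perform a left outer join, skip slave records whose key is empty, and assume every master record has a key.

```python
from itertools import groupby
from operator import itemgetter


def hash_join(master, slave, key):
    """Hash join: the whole slave input is cached in a dict before joining."""
    cache = {}
    for rec in slave:
        if rec[key] is not None:          # slave records without a key are dropped
            cache.setdefault(rec[key], []).append(rec)
    for m in master:
        # [None] marks "no match", which gives left outer join behaviour
        for s in cache.get(m[key], [None]):
            yield m, s


def _next_group(groups):
    """Return (key, records) for the next slave key, or (None, None) at the end."""
    k, g = next(groups, (None, None))
    return k, (list(g) if g is not None else None)


def merge_join(master, slave, key):
    """Merge join: both inputs must already be sorted ascending by key.

    Only the slave records sharing the current key are held in memory.
    """
    keyed_slave = (s for s in slave if s[key] is not None)
    groups = groupby(keyed_slave, key=itemgetter(key))
    s_key, s_recs = _next_group(groups)
    for m in master:
        # advance the slave side until its key catches up with the master key
        while s_recs is not None and s_key < m[key]:
            s_key, s_recs = _next_group(groups)
        if s_recs is not None and s_key == m[key]:
            for s in s_recs:
                yield m, s
        else:
            yield m, None                 # unmatched master record


# Hypothetical sample data, sorted by "id"; one slave record has no key.
master = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "z"}]
slave = [{"id": None, "b": "?"}, {"id": 1, "b": "p"}, {"id": 1, "b": "q"},
         {"id": 3, "b": "r"}, {"id": 4, "b": "s"}]

for m, s in merge_join(master, slave, "id"):
    print(m, s)
```

In the hash variant, memory grows with the size of the whole slave input, which is where a heap error on 10 million+ wide records would come from. In the merge variant, memory is bounded by the largest group of slave records that share one key, but the result is only correct if both inputs really are sorted on that key.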

CloverETL Desktop contains the RunGraph component - is this the runGraph feature you meant?

Best regards,
Jaro
Jaroslav Urban
CloverCARE Support
CloverETL | Rapid Data Integration

Visit us online at http://www.cloveretl.com

Re: hashJoin in community edition

Post by stargazer » Sat Dec 17, 2011 12:53 am

I just want to verify: if I buy the Desktop edition for Windows, can I take my graph over to my Linux box and run it from the command line, as I've been able to do with the Community edition? I don't have any kind of GUI desktop set up on my more powerful Linux box.

Thanks!

Re: hashJoin in community edition

Post by jurban » Tue Dec 20, 2011 1:57 pm

Hi Jeff,

to be able to answer your question, I need to know how exactly you were running the graphs on Linux.

Best regards,
Jaro