Parallelism in CloverETL

Does CloverETL have layout parallelism like Ab Initio? In Ab Initio we can run as many processes of the same component as the depth of the layout allows. For example, if a Reformat component reads a multifile that is 4-way partitioned, then Reformat also runs 4-way, which means it internally forks 4 Reformat processes on the same server.

Looking at the help file, it seems that parallelism is built in at the server level and not at the component/layout/flow level.

I would appreciate it if someone could explain whether this is possible in CloverETL. Also, is there any chance it will be implemented in future versions?

The CloverETL Cluster offers a similar kind of parallelism to Ab Initio - look at this presentation on Slideshare.

Thank you for your reply. However, I still see a significant difference. Can Clover spawn multiple processes on a single server, utilising multiple cores? I am just trying to compare it with our current Ab Initio graphs, which run 8-way parallel.

I have an 8-way parallel file on MFS, i.e. a multifile system, and the whole graph itself runs 8-way. In Clover, if I need to do the same thing, it needs node allocation, which means that unless I have a cluster environment, the whole graph cannot run in parallel mode.

Please correct me if I am wrong.

Regards,
Sameer

Clover uses an “Allocation” to define how many copies of a given component will be started. An allocation can be defined:

- through a “Partitioned sandbox” reference - one copy is run per partition of the sandbox, on the node that holds that partition - similar to an MFS layout
- through a list of cluster nodes - one instance of the component(s) is run on each listed node (a particular node can be listed more than once, which increases the n-way factor)
- through a number - the number specifies how many instances will be run
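
To make the three options concrete, here is a rough Python model of how each kind of allocation turns into a count of component instances. This is only an illustration of the idea and not CloverETL's API or configuration syntax: the `SANDBOXES` table, the `resolve_allocation` function, the node names and the `"any"` placeholder are all made up for this sketch.

```python
from typing import Dict, List, Union

# Hypothetical registry: partitioned sandbox name -> node holding each partition.
# Eight partitions on one machine mimics an 8-way MFS on a single server.
SANDBOXES: Dict[str, List[str]] = {
    "mfs8": ["node1"] * 8,
}

def resolve_allocation(spec: Union[int, str, List[str]]) -> List[str]:
    """Return one entry per component instance, naming the node it runs on."""
    if isinstance(spec, int):
        # Allocation by number: only the instance count is given.
        if spec < 1:
            raise ValueError("allocation must be at least 1")
        return ["any"] * spec
    if isinstance(spec, str):
        # Allocation by partitioned sandbox: one instance per partition.
        return list(SANDBOXES[spec])
    # Allocation by node list: a node listed twice gets two instances.
    return list(spec)

if __name__ == "__main__":
    for spec in (4, "mfs8", ["node1", "node1", "node2"]):
        placement = resolve_allocation(spec)
        print(spec, "->", len(placement), "instances on", placement)
```

In all three cases the n-way factor is simply the length of the resulting list.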

So you don’t actually need a cluster of CloverETL nodes. One server (with the clustering option enabled) is enough, and you can define (even dynamically, through a parameter) how many instances of a component or set of components should be executed. Of course, you need to use components that partition the data at the start and gather it at the end, but your main processing flow can run n-way parallel even on a single machine - utilising more cores and possibly writing in parallel to multiple attached disks (through the setup of a Partitioned sandbox - à la MFS, which can also exist just as individual disks/subdirs on a single machine).
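
The partition / n-way processing / gather shape described above can be sketched in plain Python. This is a conceptual sketch only, not how CloverETL implements it: `partition`, `reformat` and `run_n_way` are hypothetical names, the round-robin split is just one possible partitioning strategy, and a thread pool is used purely to show the data flow (it does not demonstrate a multi-core speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def partition(records: List[str], n: int) -> List[List[str]]:
    """Split records round-robin into n partitions."""
    return [records[i::n] for i in range(n)]

def reformat(record: str) -> str:
    """Stand-in for the per-record transformation (hypothetical)."""
    return record.upper()

def run_n_way(records: List[str], n: int,
              transform: Callable[[str], str]) -> List[str]:
    """Partition the input, transform each partition in its own worker, gather."""
    parts = partition(records, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # One worker per partition; map() returns results in partition order.
        results = list(pool.map(lambda part: [transform(r) for r in part], parts))
    # Gather: concatenate the partition outputs.
    return [r for part in results for r in part]

if __name__ == "__main__":
    print(run_n_way(["a", "b", "c", "d", "e"], 2, reformat))
```

Note that the gathered output is grouped by partition, so the original record order is not preserved unless the gather step merges or sorts.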

So essentially CloverETL offers MPP capabilities similar to those of Ab Initio. If you are serious about Clover and its MPP, you can contact us at info@cloveretl.com and we can arrange a demo of the mentioned functionality.