Parallelism in CloverETL

sameerj365 · April 22, 2014, 12:00am

Does CloverETL have layout parallelism like Ab Initio? In Ab Initio we can run as many processes of the same component as the depth of the layout allows. For example, if a Reformat component reads a multifile that is 4-way partitioned, then Reformat also runs 4-way, which means it internally forks 4 Reformat processes on the same server.

Looking at the help file, it seems the parallelism is built in at the server level and not at the component/layout/flow level.

I would appreciate it if someone could explain whether this is possible in CloverETL. Also, is there any chance it will be implemented in future versions?

dpavlis · April 22, 2014, 10:04am

The CloverETL Cluster offers a similar kind of parallelism to Ab Initio. Have a look at this presentation on SlideShare.

sameerj365 · May 2, 2014, 3:06am

Thank you for your reply. However, I still see a huge difference. Can Clover spawn multiple processes on a single server, utilising multiple cores? I am just trying to compare it with our current Ab Initio graphs, which run 8-way parallel.

I have an 8-way parallel file on MFS (i.e. a multifile system), and the whole graph itself runs 8-way. In Clover, doing the same thing requires node allocation, which means that unless I have a cluster environment, the whole graph cannot run in parallel mode.

Please correct me if I am wrong.

Regards,
Sameer

dpavlis · May 2, 2014, 12:55pm

Clover uses an “Allocation” to define how many copies of a given component will be started. An allocation can be defined:

- through a “Partitioned sandbox” reference: one copy of the component runs for each partition of the sandbox, on the node where that partition resides (similar to an MFS layout)
- through a list of cluster nodes: one instance of the component(s) runs on each listed node (a node can be listed more than once, which increases the n-way factor)
- through a number: the number specifies how many instances will be run
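To make the three options concrete, here is a rough sketch in Python of how each allocation style would translate into a number of component instances. This only models the rules described in the list above; the class and function names (`SandboxAllocation`, `NodeListAllocation`, `NumberAllocation`, `instance_count`) are invented for illustration and are not CloverETL's API or configuration syntax.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical stand-ins for the three allocation styles described above.
# None of these names come from CloverETL itself.


@dataclass
class SandboxAllocation:
    """Allocation by partitioned sandbox: one entry per partition."""
    partition_nodes: List[str]  # node hosting each partition


@dataclass
class NodeListAllocation:
    """Allocation by node list: a node may be listed more than once."""
    nodes: List[str]


@dataclass
class NumberAllocation:
    """Allocation by a plain number of instances."""
    count: int


Allocation = Union[SandboxAllocation, NodeListAllocation, NumberAllocation]


def instance_count(allocation: Allocation) -> int:
    """Return how many copies of a component the allocation implies."""
    if isinstance(allocation, SandboxAllocation):
        # One copy per partition of the sandbox.
        return len(allocation.partition_nodes)
    if isinstance(allocation, NodeListAllocation):
        # One copy per list entry, so repeating a node raises the n-way factor.
        return len(allocation.nodes)
    if isinstance(allocation, NumberAllocation):
        return allocation.count
    raise TypeError(f"unknown allocation type: {type(allocation).__name__}")
```

For instance, `instance_count(NodeListAllocation(["node1", "node1", "node2"]))` gives 3 under this model, because `node1` is listed twice.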

So you don’t actually need a cluster of CloverETL nodes. One server (with the clustering option enabled) is enough, and you can define (even dynamically, through a parameter) how many instances of a component or set of components should be executed. Of course, you need to use components that partition the data at the start and gather it at the end, but your main processing flow can run n-way parallel even on a single machine, utilising more cores and possibly writing in parallel to multiple attached disks (through the setup of a Partitioned sandbox, à la MFS, which can also exist as individual disks or subdirectories on a single machine).
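The partition, n-way process, gather pattern on a single machine can be sketched generically in Python as below. This is not CloverETL code: `partition`, `transform`, `run_worker` and `run_n_way` are made-up names, the doubling transform is a placeholder, and threads merely stand in for the parallel component instances (in Python, CPU-bound work would need processes to actually occupy several cores).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(records: Sequence[int], ways: int) -> List[List[int]]:
    """Split records round-robin into `ways` partitions."""
    return [list(records[i::ways]) for i in range(ways)]


def transform(record: int) -> int:
    """Placeholder per-record transformation (an assumption for this sketch)."""
    return record * 2


def run_worker(chunk: List[int]) -> List[int]:
    """One instance of the processing flow, working on a single partition."""
    return [transform(r) for r in chunk]


def run_n_way(records: Sequence[int], ways: int) -> List[int]:
    """Partition the input, process each partition in parallel, gather results."""
    chunks = partition(records, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        # pool.map returns results in the same order as `chunks`.
        results = list(pool.map(run_worker, chunks))
    # Gather: concatenate partition by partition. The original record
    # order is not preserved, as with a simple gather step.
    return [r for chunk in results for r in chunk]
```

Calling `run_n_way(list(range(8)), 4)` processes four partitions of two records each; the gathered output holds the same values as a serial run, only in a different order.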

So essentially CloverETL offers MPP capabilities similar to those of Ab Initio. If you are serious about Clover and its MPP, you can contact us at info@cloveretl.com and we can arrange a demo of the functionality mentioned above.
