Partition instance or partition number

etl_1234 · September 24, 2015, 12:00am

If we run a particular component in parallel using allocation workers, say a Reformat with an allocation of 10 workers, I can see the PARTITION column (which indicates the worker number) when viewing the data in debug mode. Why doesn’t CloverETL have a this.partition concept that can be used in a CTL2 transformation? We have a specific case where I need to intelligently route output to a particular partition in the flow (not writing to a file) based on a condition, and I need “this.partition” or the partition number to be available in CTL2.

etl_1234 · September 24, 2015, 6:29am

Let me rephrase my question to avoid confusion. If I have a Reformat with allocation set to 3 workers, is there a way in the Java or CTL2 code to fetch the current worker instance number, i.e. 0, 1 or 2?

I am using a Cluster environment and am able to set the component allocation, i.e. the number of workers is set properly. I can also see the number of workers being executed. However, I need the worker instance number in my transformation.

dpavlis · September 24, 2015, 9:07am

It looks like you are mixing this up with the concept of partitioning records in a data flow into different output ports/edges in a graph - for that, the Partition component can be used. It may be perceived as an “intelligent” CASE statement. You may send records out through different ports of the component based on some defined key or ranges of values, or use a CTL function to decide.
See the Partition component help.

Then there is a different concept - parallelizing the processing of data, where you can use the ClusterPartition component to split the data flow into chunks which are then processed in parallel. These chunks undergo the same processing but on different nodes of a cluster. The distribution of records into individual chunks can be influenced by the user to a certain extent (as far as it makes sense). You may use round-robin distribution, hashing (based on a key), intervals or, again, user-defined CTL, but without an explicit link to a physical node.
That said - you can actually write a CTL partitioning function and “sort of” link it to a physical node. If ClusterPartition has CTL-defined partitioning, then the init(integer partitionCount) function is called first. The partitionCount is the number of “output ports” - the different worker nodes which will be used. Their order (and thus number) should correspond to the allocation definition, if you have listed cluster nodes there. But be warned - if you change your allocation definition, then you need to update your partitioning function.
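To make the warning above concrete, here is a minimal plain-Java sketch of the idea. It is not CloverETL API code: the class name, the node IDs ("node01" and so on) and the string argument to getOutputPort are all hypothetical, and only the init(partitionCount) step mirrors what is described above. It assumes the output port order really does follow the order of nodes in the allocation definition.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java stand-in for a partitioning function that "sort of" links
 * output ports to physical nodes. Not a CloverETL class; all names are
 * made up for illustration.
 */
final class NodeAwarePartitioner {
    // Hypothetical node IDs, in the same order as the allocation definition.
    private final List<String> allocationOrder;
    private int partitionCount; // 0 until init() has been called

    NodeAwarePartitioner(List<String> allocationOrder) {
        this.allocationOrder = new ArrayList<>(allocationOrder);
    }

    /** Mirrors init(integer partitionCount): called once before any record. */
    void init(int partitionCount) {
        // Fail fast if the allocation changed but this function was not updated.
        if (partitionCount != allocationOrder.size()) {
            throw new IllegalStateException("Allocation has " + partitionCount
                    + " partitions, but " + allocationOrder.size() + " nodes are listed here");
        }
        this.partitionCount = partitionCount;
    }

    /** Returns the 0-based output port for the node a record should go to. */
    int getOutputPort(String targetNode) {
        if (partitionCount == 0) {
            throw new IllegalStateException("init() has not been called");
        }
        int port = allocationOrder.indexOf(targetNode);
        if (port < 0) {
            throw new IllegalArgumentException("Unknown node: " + targetNode);
        }
        return port;
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        nodes.add("node01");
        nodes.add("node02");
        nodes.add("node03");
        NodeAwarePartitioner partitioner = new NodeAwarePartitioner(nodes);
        partitioner.init(3);
        System.out.println("node02 -> port " + partitioner.getOutputPort("node02"));
    }
}
```

The check in init() is the design point: a hard-coded node list silently goes stale when the allocation changes, so comparing it against partitionCount at least surfaces a changed node count early. It cannot detect nodes that were merely reordered.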

I have asked our developers to check whether, if the partitioning function were written in Java, we could get an exact “match” between output port and worker node in the ClusterPartition component. In that case you could have “robust” partitioning that is resistant to changes in the allocation structure. We will see.

Anyway, check the documentation of ClusterPartition; you may also check the presentation about CloverETL’s clustering concept.

etl_1234 · September 24, 2015, 9:20am

Thank you David. Let me explain the scenario again. Sorry for the confusion.

I have a ClusterCopy followed by a Reformat (allocation of 3 workers), which means I am copying the same set of records to 3 Reformat components. Now, in the Reformat transformation I need the worker instance, i.e. at run time it should be able to tell which worker instance number it is. I went through the Java code and saw there is a CloverWorker.class.

Is there any way to get the particular worker instance number within the transformation?

Log :

Starting parallel worker on node “**” (1 of 3)
Starting parallel worker on node “**” (2 of 3)
Starting parallel worker on node “**” (3 of 3)

I am looking for the numbers which are highlighted in the above log. Is there any way to get the instance number?

dpavlis · September 29, 2015, 6:08pm

Hi,

After consulting with our R&D guys: you can use the following two parameters, which are set globally for any graph running in clustered mode:

WORKER_ID - ID (number) of the current worker, e.g. 1, 2, 3 … up to WORKER_COUNT
WORKER_COUNT - how many workers in total are running/processing the particular graph

However, these parameters are set just “once” for the graph. If you use multiple allocations with different setups, the numbers might be misleading (they would be valid for only one of the cases).
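As a rough illustration of how the two values could be used once they are available inside a transformation, here is a plain-Java sketch. It is not CloverETL API code: the WorkerSlice class and its methods are hypothetical, and it simply assumes the resolved values of WORKER_ID and WORKER_COUNT arrive as strings. How they reach the transformation (for example through a graph parameter reference) is not shown. The scenario assumed is the one from this thread: every worker receives the same records and each worker keeps only its own share.

```java
/**
 * Decides which records "belong" to the current worker, given the resolved
 * values of WORKER_ID (1-based) and WORKER_COUNT. Hypothetical helper, not
 * part of CloverETL.
 */
final class WorkerSlice {
    private final int workerIndex; // 0-based: WORKER_ID - 1
    private final int workerCount;

    WorkerSlice(String workerId, String workerCount) {
        int id = Integer.parseInt(workerId.trim());
        int count = Integer.parseInt(workerCount.trim());
        if (count < 1 || id < 1 || id > count) {
            throw new IllegalArgumentException(
                    "Expected 1 <= WORKER_ID <= WORKER_COUNT, got " + id + " of " + count);
        }
        this.workerIndex = id - 1;
        this.workerCount = count;
    }

    /** 0-based index of this worker, convenient for modulo arithmetic. */
    int workerIndex() {
        return workerIndex;
    }

    /** True if the record with this key should be handled by this worker. */
    boolean owns(String key) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), workerCount) == workerIndex;
    }

    public static void main(String[] args) {
        // Values as they might look on the second of three workers.
        WorkerSlice slice = new WorkerSlice("2", "3");
        for (String key : new String[] {"alpha", "beta", "gamma", "delta"}) {
            System.out.println(key + " handled here: " + slice.owns(key));
        }
    }
}
```

Because every worker applies the same rule with a different WORKER_ID, each key is claimed by exactly one worker. The caveat above still applies: if the graph uses several allocations with different worker counts, the single pair of values will only be right for one of them.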

Hope this helps.

etl_1234 · October 4, 2015, 3:26pm

Thank you David. Can you please let me know how this WORKER_ID can be used in the transformation?

etl_1234 · October 5, 2015, 12:17pm

It's working fine and resolving to the correct value.

Thank you David for your valuable inputs.
