Hi - Can a Partition component be used to separate data into three output ports based on the key value? I have incoming data with a field called TypeID (values 1, 2, or 3). I want records with value 1 to go to port 0, records with value 2 to go to port 1, and records with value 3 to go to port 2. All I did was set the partition key to the TypeID metadata column. Is there something else that needs to be done? Doing only that seems to assign records to the output ports randomly.
Anyone have a simple example graph for this?
thanks.
Hello pintail,
as you rightly said, the Partition component does assign the filtered records to the output ports randomly when using the Partition key as the only property definition. This is, in fact, the simplest way to use this component to fork data into multiple output ports. If you need to send each TypeID to a specific output port, you can define the partitioning in CTL code (the Partition property of the component). In the situation you described, the definition could be as simple as this:
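// Returns the index of the output port for the current record.
function integer getOutputPort() {
    if ($in.0.TypeID == 1) {
        return 0;
    } else if ($in.0.TypeID == 2) {
        return 1;
    } else {
        // Any other value (here, TypeID 3) goes to port 2.
        return 2;
    }
}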
Regards,
that worked great. thanks!
Just a small note:
"Partition component does assign the filtered records to the output ports randomly when using the Partition key"
That is not really true. In that case (partition key defined), the Partition component calculates a hash of the key (a 32-bit number) and, based on that hash value, sends the data record out through the port that represents the “bucket” the value belongs to, much like a hash table works.
Why is this important? Simple: the same key value always gets sent out through the same output port, which means you are essentially grouping records with the same partition key value. That may become important in certain cases. However, this does not guarantee that, for example, the value “A” would be sent out through the first port and “B” through the second. It only guarantees that all "A"s would be sent through the same port.
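To illustrate the bucketing idea, here is a minimal Java sketch. It is not the component's actual implementation: the class name, the portFor helper, and the use of Java's own hashCode with a modulo are all assumptions made for the example, and the real hash function used by the Partition component may differ. It only shows that equal keys always land on the same port, while the port number itself need not follow the order of the key values.

```java
import java.util.Objects;

class HashPartitionSketch {
    // Hypothetical helper: maps a key to one of numPorts "buckets".
    // Uses Java's hashCode as a stand-in for the component's hash (an assumption).
    public static int portFor(Object key, int numPorts) {
        // floorMod keeps the result in [0, numPorts) even for negative hash values.
        return Math.floorMod(Objects.hashCode(key), numPorts);
    }

    public static void main(String[] args) {
        int numPorts = 3;
        // TypeID values as in the question; repeated keys show the grouping.
        Integer[] typeIds = {1, 2, 3, 1, 3, 2};
        for (Integer typeId : typeIds) {
            System.out.println("TypeID " + typeId + " -> port " + portFor(typeId, numPorts));
        }
    }
}
```

With this stand-in hash, TypeID 3 ends up on port 0 rather than port 2. Every record with the same TypeID still shares a port, but if a particular value must go to a particular port, an explicit getOutputPort() definition is the way to get it.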