Multithreaded Transformer Component?

I’m trying to create a transformer component that runs a Java method against each record that passes through the component. The problem is that this method is slow, taking about 1-10 seconds per invocation.

Is there a way to process multiple records simultaneously (maybe set a max number of threads that can be executing it at a time)?

Thanks!

Hi,
currently this is not possible with the Reformat component. What you could do is use the Partition component to split the records into multiple parallel “flows” (by a round robin algorithm, which is the default), and process each flow with its own Reformat, each of them doing the same transformation. Finally, you would merge the flows with a SimpleGather component. Each Reformat runs in a separate thread, so you would achieve something very similar to what you need.

Jaro
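
To illustrate the Partition / Reformat / SimpleGather idea outside of a graph, here is a minimal sketch in plain Java using only java.util.concurrent. It does not use any CloverETL classes; the class name RoundRobinFanOut and the method process are made up for this example, and the three commented stages only mimic what the components would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not part of any ETL library.
public class RoundRobinFanOut {

    /** Splits records round-robin into `flows` lists, transforms each list on its own thread, then gathers. */
    public static <I, O> List<O> process(List<I> records, int flows, Function<I, O> transform)
            throws InterruptedException, ExecutionException {
        // "Partition" stage: record i goes to flow (i % flows).
        List<List<I>> partitions = new ArrayList<>();
        for (int f = 0; f < flows; f++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            partitions.get(i % flows).add(records.get(i));
        }

        ExecutorService pool = Executors.newFixedThreadPool(flows);
        try {
            // "Reformat" stage: one task per flow, all applying the same transformation.
            List<Future<List<O>>> pending = new ArrayList<>();
            for (List<I> partition : partitions) {
                pending.add(pool.submit(() -> {
                    List<O> out = new ArrayList<>();
                    for (I record : partition) {
                        out.add(transform.apply(record));
                    }
                    return out;
                }));
            }
            // "Gather" stage: concatenate the flows; the original record order is not preserved.
            List<O> gathered = new ArrayList<>();
            for (Future<List<O>> flow : pending) {
                gathered.addAll(flow.get());
            }
            return gathered;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> result = process(List.of("a", "b", "c", "d", "e"), 2, s -> s.toUpperCase());
        System.out.println(result);
    }
}
```

The number of flows plays the role of the maximum number of threads asked about in the question. As with a gather step, the output comes back grouped by flow rather than in input order.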

Sorry for the off-topic question. I’m reading transformation_concept and I’m not sure I understand this correctly:

There can be only one graph object created / running at the same time (singleton pattern is used).

Does it mean that I can run only one graph at a time?

“schabluk”

Hi,
sorry, but the documentation is outdated; we will fix that. To see how a graph is created, initialized and run, please see the “main” method in the “runGraph” class (note the lowercase first letter of the class name).

Jaro

Hi,

I managed to accomplish something like this by creating my own custom Reformat node. It has similar functionality to the standard Reformat node, except that a ‘threads’ parameter specifies the required number of threads.

In summary:
1. I called the node created by the graph the master node. All access to edges (and other shared resources) is coordinated (synchronized) through this node. This involved overriding a number of methods in the base Node class.

2. When the graph engine calls “setEnabled” on my node (which happens after the phase is known but before the graph is initialised; once initialised, you cannot change anything in the graph), I create the required number of child nodes, register them with the master node, and add each child node to the same phase as the master node.

I have created infrastructure to manage conditions like EOF on edges across all the child nodes.
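
The coordination described above can be sketched roughly as follows. This is not the actual custom node: it is plain Java with no CloverETL classes, and the names MasterWorkerSketch, Master, read, write and run are hypothetical. It assumes records are never null, so that null can stand in for EOF on the shared input.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of a master coordinating child workers; not the real node.
public class MasterWorkerSketch {

    /** The "master": the only object that touches the shared input and output. */
    static final class Master<I, O> {
        private final Iterator<I> input;
        private final List<O> output = new ArrayList<>();

        Master(Iterator<I> input) {
            this.input = input;
        }

        /** Returns the next record, or null once the input is exhausted (EOF seen by every child). */
        synchronized I read() {
            return input.hasNext() ? input.next() : null;
        }

        synchronized void write(O record) {
            output.add(record);
        }

        synchronized List<O> snapshot() {
            return new ArrayList<>(output);
        }
    }

    /** Starts `threads` children that pull from and push to the master until EOF. */
    public static <I, O> List<O> run(List<I> records, int threads, Function<I, O> transform)
            throws InterruptedException {
        Master<I, O> master = new Master<>(records.iterator());
        List<Thread> children = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread child = new Thread(() -> {
                I record;
                while ((record = master.read()) != null) {
                    // The slow transformation runs before write() takes the lock.
                    master.write(transform.apply(record));
                }
            }, "child-" + t);
            children.add(child);
            child.start();
        }
        for (Thread child : children) {
            child.join(); // every child stops on its own once read() returns null
        }
        return master.snapshot();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of(1, 2, 3, 4, 5, 6), 3, x -> x * 10));
    }
}
```

Only the short read and write calls are synchronized, so the slow per-record method can run on several threads at once. The output order depends on thread scheduling.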

One thing I never sorted out was the part of the graph log that displays the stats for each node. These stats are collected per edge but reported per node. Because I create X child nodes of my master, every instance reports the same records/bytes in/out, which is not what an individual node actually processed.

Of course, this could all break in the next version, because it depends on the stage in the graph execution sequence at which ‘setEnabled’ is called.