I’m considering three different ways of performing multiple tasks in succession. The first way is to create a separate graph file for each task and then run them one at a time. The second way is to use a single graph file and implement one or more nodes to perform each task and run them all at once in one phase. The third way is to use a single graph file with multiple phases consisting of one or more nodes each to perform the tasks. I’d like to hear opinions on the pros or cons of any or all of these approaches. What criteria would you use to decide when to use any of these methods? Does it matter, as long as the overall process gets done? Thanks for your input.
--Joe
The first and third options are equivalent in terms of performance. In fact, using multiple phases can be a bit faster, since you don’t need to restart the JVM several times. The second option can produce results faster, especially if your computer has multiple CPUs or a single transformation does not consume all the resources.
Another way of looking at this is from the maintenance point of view: in the first case, you need to keep three (or more) separate files.
That said, all of this assumes that you don’t need extra checks for dependencies or timing among those transformations.
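To make the performance point concrete, here is a minimal plain-Java sketch of the difference between running independent tasks one after another (as separate graphs or phases do) and starting them all together (as a single phase does). It does not use the CloverETL API at all; the class `PhaseSketch`, the helper `task`, and the 200 ms sleep are made up purely for illustration, and it assumes the tasks do not depend on each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PhaseSketch {

    // Hypothetical stand-in for one independent transformation.
    static Callable<String> task(String name) {
        return () -> {
            Thread.sleep(200); // simulated work, not a real transformation
            return name + " done";
        };
    }

    // Phase-style execution: each task starts only after the previous one finishes.
    static List<String> runInPhases(List<Callable<String>> tasks) throws Exception {
        List<String> results = new ArrayList<>();
        for (Callable<String> t : tasks) {
            results.add(t.call());
        }
        return results;
    }

    // Single-phase style: all tasks are submitted at once and run concurrently.
    static List<String> runTogether(List<Callable<String>> tasks) throws Exception {
        // One thread per task; at least one so an empty list is still valid.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> tasks = List.of(task("A"), task("B"), task("C"));

        long t0 = System.nanoTime();
        runInPhases(tasks);
        long t1 = System.nanoTime();
        runTogether(tasks);
        long t2 = System.nanoTime();

        System.out.printf("in phases: %d ms, together: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same results in the same order; only the elapsed time should differ. If the tasks did depend on each other's output, the sequential version (or separate phases) would be the safer choice.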
Thanks for the feedback, David. I do have multiple processors available and I’ve tried to make each node failsafe so that an error on one record doesn’t stop the whole process. I’m also using edges to pipe the data from node to node so there shouldn’t be any timing or dependence issues. Yesterday, I ran a process with 8 nodes connected end-to-end and pushed 1,000,000 records through with no problems. The more I use CloverETL, the more impressed I am with the work you’ve done. Thanks.