Hello,
We are working with a large amount of data and have a number of people working on the project. There is a logical piece of processing that we want to put in a subgraph to promote reuse. It performs a large number of generic transformations, and we would like to apply them to several separate data streams.
Our current idea is to have a series of specialized graphs that handle the loading concerns specific to each data source, and then run the data through a transformation graph that takes the place of the data-mapping portion of the processing in a generic, centrally defined way.
My immediate concern is that the only way I can see to pass data to the subgraph is via the graph name and some parameters. So if I have 300 GB of data to pass through, I'm likely going to have to write it to a temporary Clover Metadata file and pass that filename to the subgraph. That incurs an extra 300 GB write/read cycle, something we are understandably loath to do.
More than once I've felt that I'm not understanding the Tao of CloverETL. How should I approach this problem in a Clover-y way?
Thanks,
Brad