We are using a CloverETL graph to upload CSV data feeds to database tables. While testing the CSV load with a performance tool, we noticed that the CloverETL load process uses only about 20% of the CPU for the whole run, from the start to the end of the load. Is CloverETL restricted to using only one CPU?
Please share info related to this.
Hello,
CloverETL can use more CPUs, but the DataReader runs as a single thread, so it uses only one CPU. The commercial versions of the Clover family include a ParallelReader component, on which you can set the number of threads used for reading.
I guess the ParallelReader component uses the defined number of threads (the level of parallelism) to read the given input file. But my question is: total CPU usage stays between 20% and 50% when we run the Clover graph (which contains a reader, a Reformat, and a universal writer component), even when memory is mostly free. Is there any particular reason behind this?
Each component runs in its own thread, and the JVM decides how the work is distributed across the CPUs. I ran a more complicated graph and monitored both of my processors: both were working, and for a while both ran at about 80%.
As already mentioned, each component runs in a separate thread and the JVM takes care of allocating threads to physical processors. In a runtime environment with more processors than graph components, it is very likely that you won't exploit all the processors, because the IO operations are probably the bottleneck of the whole processing, which is why the processors are idle. In our experience, using a ParallelReader should be very helpful for improving the performance of this type of graph. Try it and let us know the result.
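The execution model described above can be pictured with a small sketch. This is my own illustration, not CloverETL source code: each "component" is a thread, connected to the next by a bounded queue (like a Clover edge). If the reader is IO-bound, the downstream threads spend most of their time waiting, which is why extra CPUs look idle.

```java
import java.util.*;
import java.util.concurrent.*;

// Conceptual sketch of a reader -> reformat -> writer graph.
// Each stage is a thread; bounded queues model the edges between components.
public class PipelineSketch {
    static final String EOF = "\u0000EOF"; // sentinel marking end of data

    public static List<String> run(List<String> input) throws Exception {
        BlockingQueue<String> readToTransform = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> transformToWrite = new ArrayBlockingQueue<>(16);
        List<String> written = Collections.synchronizedList(new ArrayList<>());

        // Single-threaded reader: at most one CPU busy reading,
        // and if it is IO-bound, the stages below mostly wait on take().
        Thread reader = new Thread(() -> {
            try {
                for (String rec : input) readToTransform.put(rec);
                readToTransform.put(EOF);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread reformat = new Thread(() -> {
            try {
                for (String rec; !(rec = readToTransform.take()).equals(EOF); )
                    transformToWrite.put(rec.toUpperCase()); // stand-in transformation
                transformToWrite.put(EOF);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread writer = new Thread(() -> {
            try {
                for (String rec; !(rec = transformToWrite.take()).equals(EOF); )
                    written.add(rec);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        reader.start(); reformat.start(); writer.start();
        reader.join(); reformat.join(); writer.join();
        return written;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Arrays.asList("a", "b", "c")));
    }
}
```

With three components and, say, eight cores, at most three threads can be busy at once, and fewer when the pipeline is stalled on IO; that matches the low overall CPU usage reported above.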
Hi,
Thanks,
I started using the ParallelReader component in my graph instead of the universal data reader.
Currently I am facing one issue: whenever I load a CSV file of 150 records with the ParallelReader, it throws the error "ERROR - [PARALLEL_READER0].fileURL - Input file 'CSV FILE NAME' is too small and/or level of parallelism is too high." (The default parallelism value of 2 is used for testing.)
Please let me know if there is a way to configure the ParallelReader to read small files.
Unfortunately, this is one of the limitations of the ParallelReader. We have already removed this limitation, but no public release contains the update yet. In fact, there is no performance improvement from the ParallelReader for such a small input file; I assumed you had a large amount of data when you were experimenting with processor load.
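The small-file limitation can be sketched roughly like this. The names and the minimum-chunk threshold below are my own illustration, not Clover's actual API: a parallel reader splits the file's byte range into one chunk per thread, and if the file is smaller than some minimum chunk size times the level of parallelism, the split is rejected.

```java
// Hypothetical illustration of why a tiny file fails with parallelism > 1.
// MIN_CHUNK_BYTES is an assumed threshold, not a documented Clover value.
public class ChunkSplitter {
    static final long MIN_CHUNK_BYTES = 1024;

    // Returns chunk boundary offsets; chunk i covers [offsets[i], offsets[i+1]).
    public static long[] split(long fileBytes, int parallelism) {
        if (fileBytes < MIN_CHUNK_BYTES * parallelism)
            throw new IllegalArgumentException(
                "Input file is too small and/or level of parallelism is too high.");
        long[] offsets = new long[parallelism + 1];
        for (int i = 0; i <= parallelism; i++)
            offsets[i] = fileBytes * i / parallelism;
        return offsets;
    }
}
```

Under this model, a 150-record CSV of a few kilobytes cannot be split into two worthwhile chunks, which matches the error message above; reading such a file with a single-threaded reader is just as fast anyway.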