Sorting in the DB instead of the graph

I am new to CloverETL and I have inherited an existing ETL process which was built by Clover consultants. To improve performance on our ETL process we are planning to eliminate FastSorts in our graphs by pushing the sorting to the DB. I have noticed the following comment in one of the graphs:

“//Intersection needs to have sorted inputs, different from DB sort unfortunately”

Can someone explain the reason why we should sort the data in the graph rather than push this task to the DB? What are the implications of pushing the sort to the DB instead of the graph?

Thanks in advance,
- Rama

Hi Rama,

CloverETL defines order of items based on Java comparator (http://docs.oracle.com/javase/6/docs/api/java/lang/Comparable.html). So for string there are rules like http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#compareTo(java.lang.String)

On the other hand database may define another order. For example MySQL compare by default in case-insensitive way - see http://dev.mysql.com/doc/refman/5.0/en/ … -rows.html . That causes that CloverETL sees data unsorted. Usage of “BINARY” operator should fix that.

In general, I would recommend storing keys ordered by database into file and separately by CloverETL into another file. Then diff this files to see differences. Only in case the files will be equal CloverETL accepts the order.

I hope that helps.

Hi Jaroslav,
Thank you for the explanation. In our instance we only use numeric keys for sorts and hence based on your explanation and also based on the testing that I have done thus far, I presume we are safe even though we are pushing the sort’s to the DB layer.

BTW can you please elaborate more on what you meant by the “BINARY” operator?

Thanks,
- Rama

Hi Rama,

please see link in my answer - http://dev.mysql.com/doc/refman/5.0/en/sorting-rows.html - and its part:

On character type columns, sorting—like all other comparison operations—is normally performed in a case-insensitive fashion. This means that the order is undefined for columns that are identical except for their case. You can force a case-sensitive sort for a column by using BINARY like so: ORDER BY BINARY col_name.