I have some experience with Pentaho and Talend ETL tools and just getting started trying out Clover. Trying to understand if something is possible. I’ve written a small test transform but no luck so far
I’m sure the completed transformation will be possible
if so, how to accomplish it.
Basically I have a 2 column input file (tab delimited) in my data input folder and referenced in Data reader which seems ok in itself:
I wanted to read in row by row and use CTL to generate a prepared SQL statement (unless there is a better way)…this is what I have so far but being completely new not sure if it will work (it’s not so far):
// Transforms input record into output record.
function integer transform() {
string SQL;
$out.0.SQL = concat("SELECT COUNT(DISTINCT ", $in.0.ColName,") FROM ", $in.0.Tablename);
return ALL;
}
Ideally the end state I am trying to achieve is to return the SQL query result back to the column/field value of “SQL” but started only with trying to produce the dynamic SQL statement.
This should be quite straghtforward, see the example graph. The bottom line is that SQL does not have to be defined in CTL as a local variable, it should be a metadata field.
thanks, I got the result(sort of). However, one small issue which I’m sure is a noob issue, but I don’t understand why it’s happening. The file written back out looks like this:
1||
1||
1||
10||
So, it wrote the value retieved in SQL, but I lost my other two columns and values, any idea why that would be? Do you have to explicitly write out columns from a previous step? I thought the auto propogation would do that.
I did figure out one small fix according to the docs…didn’t realize I needed to explicitly assign the result back to the “SQL” metadata column which I did by changing my string to (it seemed to work):
$out.0.SQL = concat("SELECT ","$SQL:=","COUNT(DISTINCT ", $in.0.Column,") FROM ", $in.0.Table);
However, I’m not sure still how to avoid losing the original Table and Column name values from the input port data?
OK, I managed with your suggestion to copy port 0 Table and column and Port 1 SQL but this is concerning to me as there is no key matching happening between data sets is there? How certain can I be that the original table and column name will be matched correctly to the SQL value returned? Like if one entry fails then I assume it’s working only off an internal row sequence number- how reliable is this in terms of retaining the proper order and row counts?
Combine component should be used in cases where you are hundred percent sure that data comes in tuples. However, without any further check data are not checked to be in required order etc. Therefore, if you need to introduce joining based on a key you might simply use a key in the query and then use ExtMergeJoin instead of Combine. This is very simple to use and prevent any “incorrect” data joining. See the example for more information.