This is my third day evaluating CloverETL Server, and I am looking for help implementing some basic file operation tasks using FTP and other remote file protocols.
The basic scenario is:
1. List files using input pattern filter or expression.
2. Process/copy every file over to the target remote location.
3. Rename each output file using an expression, e.g. appending the date and time to the filename.
4. Move the original remote file to a done directory.
The JobFlow design patterns in the user guide seem to suggest a similar pattern here: http://doc.cloveretl.com/documentation/ … rence.html
However, I am not sure how to achieve the steps above, in particular renaming output files using expressions. Camel has short semantics to achieve this (http://camel.apache.org/file2.html). Is there something similar in CloverETL?
Regards,
M. Elshami
Hi M. Elshami,
What you need can be done using 3 components in CloverETL. Please see the attached JobFlow. It works like this:
- ListFiles lists files using a pattern, in my case *.csv.
- In the output mapping of ListFiles I map the URL of each matched file to custom metadata which contains only the from and to fields.
- In Reformat I fill in the to field. In my case it just replaces the folder csvs with csvs2 in the URL, but you can apply any logic you want: point to another server, change the filename, add a timestamp to the name, …
- So in the output metadata we have both the original URL of the file and the desired new URL.
- Now we can use CopyFiles or MoveFiles with the proper input mapping to copy/move the files.
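As a rough illustration of the Reformat step above, here is a minimal CTL2 sketch that appends a date stamp to the file name. It assumes the from/to metadata described above and that the files end in .csv; the yyyyMMdd stamp format is only an assumption, not something taken from the attached JobFlow.

```
// Reformat transformation (CTL2 sketch): keep the source URL, build the target URL.
function integer transform() {
    // Assumption: a yyyyMMdd stamp of the current date is what is wanted.
    string stamp = date2str(today(), "yyyyMMdd");

    $out.0.from = $in.0.from;
    // Insert "_<stamp>" in front of the trailing ".csv" (second argument is a regular expression).
    $out.0.to = replace($in.0.from, "\\.csv$", "_" + stamp + ".csv");

    return ALL;
}
```

With this mapping, a source URL ending in DailyTrans.csv would get a target URL ending in DailyTrans_<stamp>.csv, which CopyFiles or MoveFiles can then use.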
If you need to process the files on the Server, the process would be:
Alternatively, use the URL formats described in Supported File URL Formats for Readers and Supported File URL Formats for Writers to access files directly from Readers/Writers. You would load the remote file for processing and save the results to a remote file, so no local copy would be necessary.
You can use JobFlow to simplify processing logic, for example:
- Prepare a master JobFlow which only lists the files and then executes a sub JobFlow for each one (passing the file URL as a parameter).
- In the sub JobFlow, download the file from its original location to a local file, execute the processing graph, and move/copy the result to the destination server/folder.
- In the processing graph, you just process the local file and save the result to a local file.
I hope this helps.
Thanks a lot Jaroslav, this was helpful.
I am very new to CloverETL and not sure how the binding works in the Reformat component. It looks like you’ve assigned out.0.from and out.0.to in the ListFiles component?
function integer transform() {
    $out.0.from = $in.0.from;
    $out.0.to = $in.0.from.replace("DailyTrans", "DailyTrans_20150102.csv");
    printLog(info, "out.0.from: " + $out.0.from);
    return ALL;
}
Also, how do I specify a more dynamic input pattern? Instead of plain wildcards, I would like to specify an input pattern based on the date, e.g. DailyTrans_*.csv
Regards,
Mohamed
Hi Mohamed,
As you can see, Clover components have input and/or output ports. To work with the data coming through these ports in CTL, you use the following values:
$in.0 - Represents the first input port. The word in tells Clover it is an input port, and zero is the port index. Port numbering begins at 0 (0 is the first port, 1 is the second port, etc.).
$out.0 - Similarly, this stands for the first (index 0) output port (out).
Regarding the pattern you would like to use, there are two ways:
- For simple patterns (like yours) you can still use wildcards, e.g. DailyTrans_????-??-??*.csv, which handles names like DailyTrans_2012-11-27_Monaco.csv.
- For more complicated patterns, you can first list all files in a folder (using ListFiles) and then use the ExtFilter component to filter out unwanted records by comparing against a regular expression.
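As a sketch of the second option, an ExtFilter filter expression along these lines could keep only date-stamped files. It assumes the record coming from ListFiles carries a string field called name with the bare file name; that field name depends on your own output mapping and metadata, so treat it as hypothetical.

```
//#CTL2
// Assumption: the incoming record has a string field "name" holding the bare file name.
// ~= requires the whole value to match the regular expression.
$in.0.name ~= "DailyTrans_\\d{4}-\\d{2}-\\d{2}.*\\.csv"
```

Unlike the ? wildcard, \d only accepts digits, so a name such as DailyTrans_abcd-ef-gh.csv would be filtered out.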
Hope this helps.
Jan
Thanks a lot Jan
I made another attempt to create basic ListFiles → CopyFiles flow.
I can’t figure out how the URL metadata is propagated. I thought it was recognised automatically, but looking at your example it seems that I have to define both an input mapping and an output mapping.
I’ve tried the CTL2 below as the CopyFiles input mapping:
// Transforms input record into output record.
function integer transform() {
$out.0.sourceURL = $in.0.URL;
return ALL;
}
I am getting the following error:
Caused by: java.lang.IllegalArgumentException: Copy source is empty
at org.jetel.component.fileoperation.FileManager.copy(FileManager.java:271)
I’ve attached the jobflow example.
Regards,
Mohamed
Hi Mohamed,
As you can see, you are getting the “Copy source is empty” message, which means the URL string is empty. If you enable debug on the edge between ListFiles and CopyFiles, you can view the data that goes through it. In your case it is only an empty record. The record contains no data because you haven’t defined an Output mapping in the ListFiles component. I’ve prepared a short example for you (it copies all files from data-in to data-out).
local-files-copy.jbf
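For readers without the attachment, a ListFiles Output mapping of the kind described above might look roughly like the sketch below. It is not taken from the attached jobflow. It assumes that in this mapping $in.1 refers to the component's result record with a URL field, and that the edge metadata has a URL field, as the CopyFiles input mapping quoted earlier expects; please verify both in the mapping dialog.

```
//#CTL2
// ListFiles Output mapping (sketch).
// Assumption: $in.1 is the ListFiles result record and exposes a URL field.
// Assumption: the output edge metadata has a string field named URL.
function integer transform() {
    $out.0.URL = $in.1.URL;
    return ALL;
}
```

Once the edge carries a non-empty URL, the CopyFiles input mapping $out.0.sourceURL = $in.0.URL has a source to copy.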
Hi - do you have a jobflow example that runs a series of graphs in sequence, where each graph runs only if the previous one finished successfully? I’m looking to automate a series of graphs that take a long time to run, so I’m trying to break them up into smaller parts so the memory can flush itself out, and to join small partitions of data to speed things up a little. I can’t find a good example of how to use a jobflow to call the next graph once the previous one has finished without error.
thanks for any help!
Hi pintail,
CloverETL Server comes with a set of examples in which you may find the answers to your questions. You might want to start with the jobflows in the JobflowExamples sandbox.
Hope this helps.
It does thanks - I didn’t even think to look in the example sandboxes…completely slipped my mind. thanks!