Code to pass file names into metadata from data reader

Hi

I would like to read the filename(s) of input file(s) used in my data reader component and pass the filename through as an additional field using a subsequent reformat component - could anyone provide me with a simple example of how this can be done? Example of the filename structure would be FILENAME_20130823.txt

Thanks in advance

Nikki

Hi Nikki,

This can be easily done using the metadata connected to reader’s output port. All you need to do is to edit the metadata, add new field and set its Autofilling property to “source_name” (please see attached screenshot). This will extract the source name (the File URL from your reader) and place it to a particular record. I have created a short sample for you, so please find it attached below.

Hi Jan,

Many thanks for this.

Is there a way to reformat the file_name value to remove the file directory details from the field so that you are left with only the file name and not the whole route path plus the file name?

Thanks again
Nikki

Hi Nikki,

Unfortunately, we do not provide this option directly. You can either split the value by “/” character (or “\” in case of OS Windows) and take only the value after the last slash or (if you have CloverETL Server and jobflows) you can use ListFiles component which provides information about both filename and filename + path.

Kind regards,

Hi Lubos,

I’m sorry but my clover skills aren’t that good so I don’t know how to strip out the file path using the “/” or “\” values?

If an example path and filename were “C:\NH\KPI2\Requests\Reactive\New\Filename123.csv” could you provide an example of what the function would be to strip out the “\” values and get to the last part?

Thanks
Nikki

Dear Nikki,

you may extract filename by this CTL2 code:


$out.0.field1 = find('C:\NH\KPI2\Requests\Reactive\New\Filename123.csv', '([^/\\]+)$')[0];

It should be platform independent. If you need this functionality in more places you can define function (http://doc.cloveretl.com/documentation/ … -ctl2.html) and import (http://doc.cloveretl.com/documentation/ … -ctl2.html) into your transformations.

Function would look like:


//#CTL2

function string extractFileName(string path) {
	string[] res = find(path, '([^/\\]+)$');
	
	if (res.length()<1) {
		raiseError("Cannot find filename in path: "+path);
	}
	
	return res[0];
}

function integer generate() {

	$out.0.field1 = extractFileName('C:\NH\KPI2\Requests\Reactive\New\Filename123.csv');

	return ALL;
}

I hope this helps.

That’s perfect!

Thank you for your help!

Nikki

Hello, I tried to use the example and it worked really well. I would like to ask you about some explanation of usage of the last part - ‘([\\]+)$’)[0]
I understand the first part but I’m not able to reuse this type of regular expression and I was not able to find deeper detail in manual. Thanks s lot.

Hi,

the regex was actually:

([^/\\]+)$

And its semantic is: take one or more non-slash and non-backslash chars from end of string. So from string “/my/path/to/file.txt” or “c:\my\other\path\file.txt” is extracted “file.txt”.

[^/\\] = all chars except / or \ (so the ^ inverts meaning)

Missing ^ is probably issue in your regex. Its semantics actually is: Take one or more \ from end of string and that is not what you are looking for, I guess. Try to rewrite it as

'([^\\]+)$')[0]

CTL uses Java regexes format, so besides of our manual (http://doc.cloveretl.com/documentation/ … sions.html) and links there you can find plenty of examples online, e.g. http://www.vogella.com/articles/JavaReg … ticle.html

I hope this helps.

Please help me
I used to always use source_name and sheet_name in metadata for excel files, but past source codes are unreacheable, it was 2 years ago.
Now I’ve changed direct file name setting from ${data_in}\1.xls to port:$0.Field1:source and source_name autofilling stopped working, while sheet_name continues doing it.

As I remember, source_name always worked regardless of filename setting way. Am I wrong and what should I do?

And on other hand, getting file names through port fails on one of three XLSreaders in my graph with very scanty message

ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node XLS_READER2 finished with status: Component has finished and input port 0 still contains some unread records.
ERROR [WatchDog] - Node XLS_READER2 error details:
INFO  [exNode_0_1390551613757_LEVEL1_LINES] - Number of commited records: 0

while direct filename setting is also OK

Version: 3.3.0.021P commercial

Hello,

Unfortunately, there is an issue in your version of CloverETL related to this functionality, please see the issue report here: https://bug.javlin.eu/browse/CL-2626

It is fixed since CloverETL version 3.3.2, so if you upgrade your CloverETL instalation, it should work as you expect.