Having a lot of trouble getting started with cloveretl

Hi,
I’m really new to ETL and am trying to evaluate a bunch of different tools to see what fits our needs.

One of the things I’m trying to do is modify a CSV file, but so far I’m running into some issues.

I have the CSV file (about 700MB) loaded in via universaldatareader. I’m trying to use extfilter (I think this is what I want) to search one column - named domain, for a particular domain name - lets say test.com.

I can’t figure out how to use extfilter to do this. I tried using the string find(string,string) function, but I can’t really find any documentation that tells me how to really use it. I tried doing something similar to below but this didn’t work.

//#CTL2
$0.Domain=find(TEST.COM, )

Is there any documentation that specifically deals with how to do string manipulation?

Totally stuck in ETL land…

Cheers,
-Derek

Hi Derek,

the CTL documentation is available at http://www.cloveretl.com/documentation/ … /ctl2.html. In particular, the documentation for string functions is available at http://www.cloveretl.com/documentation/ … -ctl2.html .

The code that should work for you is:


//#CTL2
indexOf(upperCase(nvl($0.Domain, "")), "TEST.COM") > -1

In case you need to be able to use regular expression, find function would be a better option:


//#CTL2
length(find(upperCase(nvl($0.Domain, "")), "TEST.COM")) > 0

nvl() and upperCase() functions are used just to ensure that there are no issues with empty values and cases hen domain is in mixed- or lower-case.

Regards,
Jan

Hi Jan,
Thanks.
I’m surprised that find can’t do it on it’s own, or perhaps I’m just using it wrong. The first one is what you suggested, and it works, but finds things like blahblahrim.com instead of just rim.com.

length(find(upperCase(nvl($0.Domain, “”)), “RIM.COM”)) > 0

I tried removing length and uppercase/nvl but neither seemed to work.

find(upperCase(nvl($0.Domain, “”)), “RIM.COM”)
find($0.Domain, “RIM.COM”)

I also tried to add a regex since you mentioned using find would be better for regex but ^RIM.COM$ doesn’t work.

What am I doing wrong?

Ok, I see the issue. I expected that you want to match anything that contains rim.com. If you just want to test whether the value is rim.com or not, you can use simple expression:


upperCase(nvl($0.Domain, "")) == "RIM.COM"

If all the data is in lowercase, you can also use just


$0.Domain == "rim.com"

For the sake of completeness, function find() is used to search for an occurrences of a string (regular expression) within text. That is the reason why it matches blahblahrim.com for search term rim.com.

How are regex’s defined? I would expect the below to work, but it doesn’t. The query is valid, but doesn’t filter any results?

//#CTL2
length(find(upperCase(nvl($0.Domain, “”)), “^RIM.COM$”)) > 0

also since you mentioned…
$0.Domain == “rim.com
does that mean that if it’s uppercase RIM.COM, I need to use upperCase?

Hi Derek,

Regular expressions used in CloverETL use the same syntax as those in java. Here is a link to documentation: http://docs.oracle.com/javase/6/docs/ap … ttern.html .

For what you are trying to achieve, I would recommend using function matches(string, regex), instead of find(). Function matches() returns true if whole input matches regular expression.

matches(upperCase(nvl($0.Domain, "")), "RIM.COM")

You can also use case insensitive regular expression matching by specifying case-insensitive flag:

matches(nvl($0.Domain, ""), "(?i)RIM.COM")

Using regular expressions for matching against fixed strings has potentially little lower performance that comparison using equality sign. However, unless you are working on millions of records you won’t really see any difference.

Equality operator == is case sensitive. Therefore, if the value is lowercase, you can use

$0.Domain == "rim.com"

If you know that data you are testing against are uppr-case, you can use

$0.Domain == "RIM.COM"

in other cases (mixed case),

upperCase(nvl($0.Domain, "")) == "RIM.COM"