Hi,
I’m really new to ETL and am trying to evaluate a bunch of different tools to see what fits our needs.
One of the things I’m trying to do is modify a CSV file, but so far I’m running into some issues.
I have the CSV file (about 700MB) loaded in via universaldatareader. I’m trying to use extfilter (I think this is what I want) to search one column, named domain, for a particular domain name, let’s say test.com.
I can’t figure out how to use extfilter to do this. I tried using the string find(string,string) function, but I can’t really find any documentation that tells me how to really use it. I tried doing something similar to below but this didn’t work.
Hi Jan,
Thanks.
I’m surprised that find can’t do it on its own, or perhaps I’m just using it wrong. The first one is what you suggested, and it works, but it finds things like blahblahrim.com instead of just rim.com.
Ok, I see the issue. I assumed you wanted to match anything that contains rim.com. If you just want to test whether the value is exactly rim.com, you can use a simple expression:
upperCase(nvl($0.Domain, "")) == "RIM.COM"
If all the data is in lowercase, you can also use just
$0.Domain == "rim.com"
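To make the logic of the two expressions above concrete, here is a minimal sketch in Python rather than CTL. The helper name is_exact_domain is hypothetical, and the `value or ""` step stands in for nvl() on the assumption that a missing field arrives as None.

```python
from typing import Optional


def is_exact_domain(value: Optional[str], target: str = "rim.com") -> bool:
    """Null-safe, case-insensitive equality test (hypothetical helper)."""
    # Replace a missing value with an empty string, like nvl($0.Domain, "").
    normalized = (value or "").upper()
    # Compare the whole value, not a substring.
    return normalized == target.upper()


print(is_exact_domain("RIM.com"))
print(is_exact_domain("blahblahrim.com"))
print(is_exact_domain(None))
```

Because the comparison covers the whole value, blahblahrim.com is rejected, and a missing value simply fails the test instead of raising an error.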
For the sake of completeness: the function find() searches for occurrences of a string (regular expression) within the text. That is why it matches blahblahrim.com for the search term rim.com.
For what you are trying to achieve, I would recommend using the function matches(string, regex) instead of find(). The function matches() returns true only if the whole input matches the regular expression. Keep in mind that an unescaped dot in a regular expression matches any character, so escape it if you need to match a literal dot only.
matches(upperCase(nvl($0.Domain, "")), "RIM.COM")
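The difference between searching and whole-input matching can be sketched in Python, whose re.search and re.fullmatch are used here as stand-ins for find() and matches(). This is an analogy under that assumption, not CTL code, and the function names are hypothetical.

```python
import re

# The dot is escaped so that it matches a literal "." only.
DOMAIN = re.compile(r"RIM\.COM")


def contains_domain(value: str) -> bool:
    """Substring-style search: true if the pattern occurs anywhere."""
    return DOMAIN.search(value.upper()) is not None


def is_domain(value: str) -> bool:
    """Whole-input match: true only if the entire value fits the pattern."""
    return DOMAIN.fullmatch(value.upper()) is not None


print(contains_domain("blahblahrim.com"), is_domain("blahblahrim.com"))
print(contains_domain("rim.com"), is_domain("rim.com"))

# An unescaped dot matches any character, so "RIMXCOM" slips through.
print(re.fullmatch("RIM.COM", "RIMXCOM") is not None)
```

The search variant accepts blahblahrim.com while the whole-input variant does not, which is the behaviour described above.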
You can also use case-insensitive regular expression matching by specifying the case-insensitive flag:
matches(nvl($0.Domain, ""), "(?i)RIM.COM")
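The inline (?i) flag works the same way in Python's re module, so the idea can be tried outside the graph. This is a sketch only: matches_ignoring_case is a hypothetical name, and `value or ""` again plays the role of nvl().

```python
import re
from typing import Optional

# (?i) at the start of the pattern turns on case-insensitive matching.
CASE_INSENSITIVE_DOMAIN = re.compile(r"(?i)rim\.com")


def matches_ignoring_case(value: Optional[str]) -> bool:
    """Whole-input, case-insensitive match without changing the value's case."""
    return CASE_INSENSITIVE_DOMAIN.fullmatch(value or "") is not None


print(matches_ignoring_case("Rim.Com"))
print(matches_ignoring_case("blahblahRIM.com"))
```

With the flag in the pattern, no upperCase() style normalization of the input is needed.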
Using regular expressions to match against fixed strings is potentially a little slower than a comparison with the equality operator. However, unless you are working with millions of records, you won’t really see any difference.
The equality operator == is case-sensitive. Therefore, if the value is lowercase, you can use
$0.Domain == "rim.com"
If you know that the data you are testing against is upper-case, you can use