In filter nodes, regular expression classes such as \w, \s and so on do not support multi-language character sets.
For example:
$Field1 ~= "^[0-9]\w.*"
substring($Field2,0,2) == "黄威"
Can this be supported?
Can you try using Unicode escape sequences in place of the characters (both in the regex and in the substring)?
Also, I am not sure what problem you are describing. Is it that 黄威 are not recognized as \w?
Sorry, I am not familiar with Asian alphabets and need a hint here.
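A minimal Java sketch of one likely cause, assuming the filter hands its pattern to `java.util.regex` (the later replies in this thread point that way): by default `\w` only matches ASCII `[a-zA-Z_0-9]`, so 黄威 are not word characters. The `Pattern.UNICODE_CHARACTER_CLASS` flag exists only in Java 7 and later, not in the JDK 1.5 used in this thread, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the default and the Unicode-aware \w.
class WordCharDemo {
    // Default mode: \w is [a-zA-Z_0-9], so Han characters never match.
    static boolean asciiWord(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    // Java 7+ only: with UNICODE_CHARACTER_CLASS, \w follows Unicode properties,
    // and Han ideographs count as word characters.
    static boolean unicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "\u9ec4\u5a01"; // the two characters of the name in the question
        System.out.println(asciiWord("huangwei")); // true
        System.out.println(asciiWord(name));       // false
        System.out.println(unicodeWord(name));     // true
    }
}
```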
Table structure:
create table t1 (f1 varchar(50), f2 varchar(50));
Record content:
黄威 20071976北京
huangwei 20071976beijing
extFilter node expression:
$f2 ~= '^[0-9]{8}[a-z]*'
outPort (0) output records:
huangwei 20071976beijing
outPort (1) output records:
黄威 20071976北京
----------------------------------------------------------
I want outPort (0) to output the record below:
黄威 20071976北京
----------------------------------------------------------
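The routing above is consistent with `~=` requiring the whole field to match, as Java's `Matcher.matches()` does: `[a-z]*` only covers ASCII letters, so 北京 prevents a full match. If `~=` only searched for a match, both records would pass, because `[a-z]*` may match nothing. The Java sketch below assumes `~=` behaves like `matches()`; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the test $f2 ~= '^[0-9]{8}[a-z]*',
// assuming ~= requires the entire field to match.
class FullMatchDemo {
    static final Pattern ASCII_ONLY = Pattern.compile("^[0-9]{8}[a-z]*");

    static boolean fullMatch(String f2) {
        return ASCII_ONLY.matcher(f2).matches();
    }

    // For contrast: a partial search succeeds on both records,
    // because [a-z]* is allowed to match an empty string.
    static boolean partialMatch(String f2) {
        return ASCII_ONLY.matcher(f2).find();
    }

    public static void main(String[] args) {
        String latin = "20071976beijing";
        String chinese = "20071976\u5317\u4eac"; // digits followed by Beijing in Chinese
        System.out.println(fullMatch(latin));      // true  -> outPort (0)
        System.out.println(fullMatch(chinese));    // false -> outPort (1)
        System.out.println(partialMatch(chinese)); // true
    }
}
```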
extFilter node expression:
$f2 ~= '^[0-9]{8}\p{InHanzi}*'
Error output:
ERROR [WatchDog] - EXT_FILTER_0 …FAILED !
Parser error when parsing expression: Encountered “\'^[0-9]{8}” at line 1, column
Was expecting:
<STRING_LITERAL> …
extFilter node expression:
substring($f2,8,2)=='北京'
outPort (0) output: 0 records
outPort (1) output: 2 records
黄威 20071976北京
huangwei 20071976beijing
If you use \ (backslash) in a regex string in the transform language, you have to escape it, like this:
$f2 ~= '^[0-9]{8}\\\\p{InHanzi}*'
The reason is that the backslash gets preprocessed twice: first when the expression is read from XML, where \\ is turned into \, and then again by the TL parser, which also turns \\ into \. Only after that does the string reach Java's regex evaluator.
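To make the double preprocessing concrete, here is a Java sketch that applies a simplified unescape pass twice. The `unescapeBackslashes` helper is hypothetical: it only collapses each `\\` pair into `\`, which is all this explanation needs, and it is not Clover's actual XML or TL parser code.

```java
// Hypothetical sketch of the two preprocessing passes described above.
class DoubleUnescapeDemo {
    // Simplified unescape: each pair of backslashes becomes a single backslash.
    // Other characters, including a lone backslash, are copied unchanged.
    static String unescapeBackslashes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == '\\') {
                out.append('\\');
                i++; // skip the second backslash of the pair
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String typed = "^[0-9]{8}\\\\\\\\p{InHanzi}*";  // as typed: four backslashes
        String afterXml = unescapeBackslashes(typed);    // two backslashes left
        String afterTl = unescapeBackslashes(afterXml);  // one backslash left
        System.out.println(typed);
        System.out.println(afterXml);
        System.out.println(afterTl); // ^[0-9]{8}\p{InHanzi}*
    }
}
```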
We will try to fix this nuisance (present in 2.3.x and earlier) in the next release of Clover.
I will look into the rest of the problem too, but please try the updated expression above first.
ERROR [WatchDog] - EXT_FILTER_0 …FAILED !
Error when parsing expression: Illegal repetition near index 11
^[0-9]{8}\\p{InHanzi}*
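The description "Illegal repetition" is consistent with two backslashes reaching Java's regex engine, as the pattern in the log shows: `\\` is then an escaped literal backslash, `p` is a plain letter, and `{InHanzi}` is read as a malformed `{n}` repetition count. The Java sketch below reproduces the message; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo: compile the pattern exactly as it appears in the error log.
class IllegalRepetitionDemo {
    // Returns the regex engine's error description, or null if the pattern compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Java literal for the regex ^[0-9]{8}\\p{InHanzi}* (two backslashes).
        System.out.println(compileError("^[0-9]{8}\\\\p{InHanzi}*")); // Illegal repetition
    }
}
```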
--------------------------------
substring($f2,8,2)=='北京'
Why doesn't the substring function support "北京"?
Well, that is an interesting problem with the regex. I will look into it.
As for the substring, try using Unicode escapes (\uxxxx) in place of the two characters; you will have to look up their Unicode code points.
I've found that the following regex does not throw an exception:
"^[0-9]{8}[\\p{InHanzi}]*"
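Separately from the escaping, note that in plain Java `InHanzi` is not a Unicode block name that `java.util.regex` recognizes, so `\p{InHanzi}` is rejected even when exactly one backslash reaches the engine. Java's name for the block holding most common Chinese characters is `CJKUnifiedIdeographs`, a form accepted by `Character.UnicodeBlock.forName`, which is what `\p{In...}` relies on. A sketch with a hypothetical class name, not tested against Clover itself:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo: which Unicode block names java.util.regex accepts.
class BlockNameDemo {
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String beijing = "\u5317\u4eac"; // the two characters of Beijing
        System.out.println(compiles("\\p{InHanzi}*")); // false: no block by that name
        System.out.println(Pattern.matches("^[0-9]{8}\\p{InCJKUnifiedIdeographs}*",
                "20071976" + beijing));                // true
    }
}
```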
Thank you for your response. The substring function issue has been resolved.
Cool, can I ask how you solved it?
The solution is a bit inconvenient; the process is as follows:
D:\javasoft\Jdk1.5.0_04\bin>native2ascii
北京
\u5317\u4eac
extFilter node expression:
substring($f2,8,2)=='\u5317\u4eac'
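If native2ascii is not at hand, the escapes can also be computed with a few lines of Java. The `toUnicodeEscapes` helper below is a hypothetical sketch, not part of the JDK, and it handles only characters in the Basic Multilingual Plane, which covers 北京. Judging by the expected result, the TL call `substring($f2,8,2)` takes a start index and a length, whereas Java's `substring(begin, end)` takes an end index, so the sketch uses `8, 8 + 2`.

```java
// Hypothetical helper: a small stand-in for what native2ascii printed above.
class UnicodeEscapeDemo {
    // Escapes every non-ASCII char as backslash-u plus 4 lowercase hex digits.
    // Assumes BMP characters only (no surrogate-pair handling).
    static String toUnicodeEscapes(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String f2 = "20071976\u5317\u4eac";      // sample value of $f2 from this thread
        String city = f2.substring(8, 8 + 2);    // Java takes (begin, end), not (from, length)
        System.out.println(toUnicodeEscapes(city)); // same escapes that native2ascii printed
        System.out.println(city.equals("\u5317\u4eac")); // true
    }
}
```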
====================================
extFilter node expression:
$f2 ~= '^[0-9]{8}[\u4e00-\u9fa5]*'
[\u4e00-\u9fa5] stands for the range of Chinese (CJK) characters. Doing it this way is somewhat troublesome.
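For reference, the range `[\u4e00-\u9fa5]` (U+4E00 to U+9FA5) covers the CJK Unified Ideographs as originally encoded, which include the common Chinese characters. Below is a Java sketch of the final filter condition, assuming as before that `~=` requires a full match; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the final filter condition: eight digits followed only by
// characters in the range U+4E00..U+9FA5, assuming ~= means a full-string match.
class HanRangeDemo {
    // The regex engine itself decodes the backslash-u escapes in this pattern.
    static final Pattern DIGITS_THEN_HAN = Pattern.compile("^[0-9]{8}[\\u4e00-\\u9fa5]*");

    static boolean toPort0(String f2) {
        return DIGITS_THEN_HAN.matcher(f2).matches();
    }

    public static void main(String[] args) {
        System.out.println(toPort0("20071976\u5317\u4eac")); // true
        System.out.println(toPort0("20071976beijing"));       // false
    }
}
```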