Replacing non-ASCII/non-printable characters

Hello,

I have some records to consume that may contain accent marks and non-ASCII (non-printable) characters.
I was using something like this:

function integer transform() {
    $out.0.content = removeDiacritic(removeNonAscii($in.0.content));
    return OK;
}

but while it replaces accented letters with their unaccented equivalents, it also removes non-ASCII characters outright rather than letting me substitute a replacement character…

I was hoping to tweak things a little to replace anything outside a character set, using '!' as a negation indicator…

function integer transform() {
    $out.0.content = replace($in.0.content, "![a-zA-Z 0-9]", "*");
    return OK;
}

Is there a pattern I can use as the regex to match non-ASCII and non-printable characters? That would probably be the best choice for me. I tried:

"[\w]" - "[^\W]" - "[^\w]"

Am I close? :)

Hi,

I think you applied the functions in the wrong order :) It should be:


$out.0.content = removeNonAscii(removeDiacritic($in.0.content));

So removeDiacritic first transforms "Žluťoučký kůň ☯" to "Zlutoucky kun ☯". Then removeNonAscii transforms "Zlutoucky kun ☯" to "Zlutoucky kun ".
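
To see why the order matters, here is a rough sketch in plain Java rather than CTL, since it can be run outside a graph. The two helper methods are hypothetical stand-ins for removeDiacritic and removeNonAscii, written on the assumption that they behave as described above; they are not CloverETL's actual implementation.

```java
import java.text.Normalizer;

final class OrderOfOperations {
    // Hypothetical stand-in for removeDiacritic: decompose each accented
    // letter into base letter + combining mark, then drop the marks.
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Hypothetical stand-in for removeNonAscii: drop everything above 0x7F.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // The sample string from this thread, written with Unicode escapes so the
        // source file encoding does not matter; \u262F is the yin-yang symbol.
        String input = "\u017Dlu\u0165ou\u010Dk\u00FD k\u016F\u0148 \u262F";

        // Diacritics first: accented letters become plain ASCII and survive.
        System.out.println("[" + stripNonAscii(stripDiacritics(input)) + "]"); // [Zlutoucky kun ]

        // Non-ASCII first: accented letters are deleted before they can be converted.
        System.out.println("[" + stripDiacritics(stripNonAscii(input)) + "]"); // [luouk k ]
    }
}
```

In the second ordering every accented letter is already gone by the time the diacritic step runs, so there is nothing left for it to convert.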

My input content is of fixed length…
$out.0.content = removeNonAscii(removeDiacritic($in.0.content));

Is there a way to remove the non-ASCII characters and still maintain the spacing?

So, if the content is a name field with an extra non-ASCII character, I need the field length to stay the same, or it will throw off the rest of the record field assignments in the metadata.
In effect, I need "Zlutoucky kun ☯" to change to "Zlutoucky kun  " (a space in place of the symbol) or "Zlutoucky kun *". Basically, I just need to maintain the spacing.

Any ideas? Is replace() my best bet?

Hello, Shakespeare101,

I still do not see why you want to use the remove functions. They delete characters without any replacement, so the resulting strings will obviously be shorter. If I understand you correctly, the function replace(string, string, string) would suit you much better. Its second string argument is a regular expression.

For example, this call replaces every non-ASCII character with an asterisk:

$out.0.content = replace($in.0.content,"[^\\x00-\\x7F]","*");

Try searching for something like "regexp non ascii characters" for more examples. Also, keep in mind that backslashes in a regular expression have to be escaped in a CTL string literal, which is why the pattern above is written with "\\x00" rather than "\x00".
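
If the goal is to cover non-printable characters as well while keeping the field length, one option is to negate the printable ASCII range (space through tilde) instead of the whole ASCII range. The sketch below tests that pattern in plain Java; it assumes CTL's replace() accepts the same regex syntax as Java, and the class and method names are made up for illustration.

```java
final class FixedWidthMask {
    // Replace every character outside printable ASCII (0x20 space .. 0x7E tilde)
    // with '*'. Negation inside a character class is written '^', not '!'.
    static String maskNonPrintableAscii(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "*");
    }

    public static void main(String[] args) {
        // \u262F is the yin-yang symbol used as the sample character in this thread.
        String field = "Zlutoucky kun \u262F";
        String masked = maskNonPrintableAscii(field);

        // One character in, one character out, so the field width is unchanged.
        System.out.println(masked + " (" + masked.length() + " chars)"); // Zlutoucky kun * (15 chars)
    }
}
```

If that assumption holds, the same pattern string, "[^\\x20-\\x7E]", could be passed as the second argument of replace() in CTL. One caveat: in Java a character outside the Basic Multilingual Plane (many emoji, for example) occupies two string positions but is replaced by a single '*', so the length would shrink by one in that case.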

One last thing: space is an ASCII character, see http://www.asciitable.com/index/asciifull.gif
Therefore removeNonAscii(string) leaves existing spaces untouched.

I hope this helps.

Best regards,