Emoji Unicode Characters in a File

hneff1
Posts: 16
Joined: Fri May 22, 2015 7:53 pm

Emoji Unicode Characters in a File

Post by hneff1 » Mon Nov 14, 2016 5:48 pm

Does Clover have a function to determine if a file contains emoji Unicode characters? I want to raise an error if a file contains such characters.

http://apps.timwhitlock.info/emoji/tables/unicode

I cannot find anything about this in the documentation.

Thanks!

imriskal
Posts: 397
Joined: Wed Aug 15, 2012 8:18 am

Re: Emoji Unicode Characters in a File

Post by imriskal » Wed Nov 16, 2016 4:16 pm

Unfortunately, CloverETL does not support emojis by default, and no CTL function can help you with them directly. Also, I am not sure how the emojis are represented in your files, which encoding is used, etc.

1) Do you have any sample file containing emojis, please? Could you post it here or send via email?
2) How big are your input files?
3) Do the input files contain just text or also some structured data?
4) Are they binary files or plain text files?

There is a Java library that could be useful if you decide to write your own Java transformation.
You can also use the find function of CTL and search with regular expressions.
---
Lubos Imriska
CloverCARE Support
CloverDX

Visit us online at http://www.cloverdx.com

hneff1
Posts: 16
Joined: Fri May 22, 2015 7:53 pm

Re: Emoji Unicode Characters in a File

Post by hneff1 » Sat Nov 19, 2016 7:02 pm

Attachment: ClientFile_20161102.txt (7.7 KiB)

Thanks for the response.

In regard to your questions:

1) Do you have any sample file containing emojis, please? Could you post it here or send via email?
   I will attach it here.
2) How big are your input files?
   Input files can range from KB to a few MB.
3) Do the input files contain just text or also some structured data?
   The files are structured, typically pipe-delimited. The emoji characters have been found in various fields, and other emoji characters have been found in other fields in other files. Our goal is just to raise an error when any such character is found and return the file to the client, who can correct it as they see fit and send it back to us for reprocessing.
4) Are they binary files or plain text files?
   Delimited text files.

Thanks!
Heather

imriskal
Posts: 397
Joined: Wed Aug 15, 2012 8:18 am

Re: Emoji Unicode Characters in a File

Post by imriskal » Mon Nov 21, 2016 2:15 pm

Thanks for the responses. Do you really want to focus only on emojis? Are other non-ASCII characters allowed?

If you want to check for all non-ASCII characters, we have CTL functions like isAscii(string arg) or even removeNonAscii(string arg) that could be useful.
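
For anyone who ends up doing this check in a custom Java transformation instead, here is a minimal plain-Java sketch of the same idea. The class AsciiCheck is hypothetical, and its two methods only borrow the names of the CTL functions for readability; this is not how CloverETL implements them, and the CTL versions may handle edge cases differently.

```java
// Hypothetical helper, not part of CloverETL: a plain-Java take on an
// "all characters are 7-bit ASCII" check and a "drop non-ASCII" cleanup.
final class AsciiCheck {

    /** True when every UTF-16 code unit is below 128, i.e. plain 7-bit ASCII. */
    static boolean isAscii(String text) {
        return text.chars().allMatch(c -> c < 128);
    }

    /** Returns a copy of the text with every non-ASCII code unit dropped. */
    static String removeNonAscii(String text) {
        StringBuilder kept = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // A pipe-delimited record with only ASCII content.
        System.out.println(isAscii("Smith|John|42"));
        // An accented letter and an emoji (written as a surrogate pair) are both removed.
        System.out.println(removeNonAscii("caf\u00E9 \uD83D\uDE00 ok"));
    }
}
```

Note that a check like this rejects accented letters as well as emojis, so it only fits if the files are expected to be pure ASCII.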

If other non-ASCII characters are allowed and you want to remove only emojis, I am afraid the only reasonable options that come to my mind are the suggested Java library or CTL functions like find(string arg, string regex) or replace(string arg, string regex, string replacement) with regular expressions.
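
To illustrate the regular-expression route, here is a rough sketch in plain Java using java.util.regex. The class EmojiCheck, its method names, and above all the code point ranges are assumptions made for this example: emojis are scattered over many Unicode blocks, so these three ranges will miss some emojis and will also match a few symbols that are not emojis. I have not verified whether the same pattern behaves identically inside CTL's find and replace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CloverETL.
final class EmojiCheck {

    // Assumed, deliberately incomplete set of code point ranges.
    private static final Pattern EMOJI = Pattern.compile(
            "[\\x{1F1E6}-\\x{1F1FF}"    // regional indicator symbols (used for flags)
            + "\\x{1F300}-\\x{1FAFF}"   // pictographs, emoticons, transport symbols and neighbouring blocks
            + "\\x{2600}-\\x{27BF}]");  // miscellaneous symbols and dingbats

    /** True when the text contains at least one code point from the ranges above. */
    static boolean containsEmoji(String text) {
        return EMOJI.matcher(text).find();
    }

    /** Returns a copy of the text with every matching code point removed. */
    static String removeEmoji(String text) {
        return EMOJI.matcher(text).replaceAll("");
    }

    /** Throws if the line contains a matching code point, naming it and the line number. */
    static void requireNoEmoji(String line, int lineNumber) {
        Matcher m = EMOJI.matcher(line);
        if (m.find()) {
            String hex = Integer.toHexString(line.codePointAt(m.start())).toUpperCase();
            throw new IllegalArgumentException(
                    "Emoji U+" + hex + " found on line " + lineNumber);
        }
    }

    public static void main(String[] args) {
        // The emoji is written as a surrogate pair so the source file stays ASCII.
        String record = "Smith|John|\uD83D\uDE00";
        System.out.println(containsEmoji(record));
        System.out.println(removeEmoji(record));
    }
}
```

Calling requireNoEmoji once per input line would fit the "raise an error and return the file" use case, while other non-ASCII text such as accented letters passes through untouched.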
---
Lubos Imriska
CloverCARE Support
CloverDX
