Reading special characters from input file

Hi,

Could you please tell me how to read special characters from an input file, for example " 'snarasimhamurthy " or " #abcd"?
Is there any functionality in CloverETL to read special characters?
We have tried setting the character set to US-ASCII, UTF-8 and ISO-8859-1, but it is not working, and we get an error when we set the character set to UTF-8. Please find the error below.

ERROR [DATA_READER0_0] - An error occured while skipping records in file C:\TOF_nullvalues.csv, the file will be ignored
org.jetel.exception.JetelException: Can not find a record delimiter. caused by: java.io.IOException: MALFORMED[1] when converting from UTF-8
at org.jetel.data.parser.DataParser.findFirstRecordDelimiter(DataParser.java:695)
at org.jetel.data.parser.DataParser.skip(DataParser.java:862)
at org.jetel.util.MultiFileReader.skip(MultiFileReader.java:342)
at org.jetel.util.MultiFileReader.nextSource(MultiFileReader.java:286)
at org.jetel.util.MultiFileReader.init(MultiFileReader.java:119)
at org.jetel.component.DataReader.preExecute(DataReader.java:231)
at org.jetel.graph.Node.run(Node.java:416)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: MALFORMED[1] when converting from UTF-8
at org.jetel.data.parser.DataParser.readChar(DataParser.java:587)
at org.jetel.data.parser.DataParser.findFirstRecordDelimiter(DataParser.java:687)
… 7 more
ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node DATA_READER0 finished with status: ERROR caused by: MALFORMED[1] when converting from UTF-8 when parsing record #1 field USER_NAME
ERROR [WatchDog] - Node DATA_READER0 error details:
java.lang.RuntimeException: MALFORMED[1] when converting from UTF-8 when parsing record #1 field USER_NAME
at org.jetel.data.parser.DataParser.parseNext(DataParser.java:445)
at org.jetel.data.parser.DataParser.getNext(DataParser.java:168)
at org.jetel.util.MultiFileReader.getNext(MultiFileReader.java:415)
at org.jetel.component.DataReader.execute(DataReader.java:261)
at org.jetel.graph.Node.run(Node.java:425)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: MALFORMED[1] when converting from UTF-8
at org.jetel.data.parser.DataParser.readChar(DataParser.java:587)
at org.jetel.data.parser.DataParser.parseNext(DataParser.java:368)
… 5 more
INFO [WatchDog] - [Clover] Post-execute phase finalization: 0
INFO [WatchDog] - [Clover] phase: 0 post-execute finalization successfully.
INFO [WatchDog] - Execution of phase [0] finished with error - elapsed time(sec): 0
ERROR [WatchDog] - !!! Phase finished with error - stopping graph run !!!
INFO [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB)
INFO [WatchDog] - 0 ERROR 0 430680
INFO [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO [WatchDog] - WatchDog thread finished - total execution time: 0 (sec)
INFO [main] - Freeing graph resources.
ERROR [main] - Execution of graph failed !

Regards,
Prerana

I’m afraid that if those special characters (byte values) are not valid under the selected encoding, you will get such errors.

For example, any byte value above 127 is not valid in US-ASCII. In UTF-8, only certain combinations of byte values above 127 form a valid character. ISO-8859-1, however, accepts all byte values and should not produce the error you’ve shown.
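To see which encodings a given byte sequence is valid under, a small standalone check in plain Java can help. This is only a sketch and not CloverETL code: the class name, method name and sample bytes are made up for illustration. It uses a strict decoder, so invalid bytes are reported instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Returns true if 'data' decodes without any malformed or unmappable
    // byte sequence under the given charset.
    public static boolean decodesCleanly(byte[] data, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: "caf" followed by the single byte 0xE9,
        // which is "é" in ISO-8859-1 but an incomplete sequence in UTF-8.
        byte[] sample = {'c', 'a', 'f', (byte) 0xE9};

        System.out.println(decodesCleanly(sample, StandardCharsets.US_ASCII));   // false
        System.out.println(decodesCleanly(sample, StandardCharsets.UTF_8));      // false
        System.out.println(decodesCleanly(sample, StandardCharsets.ISO_8859_1)); // true
    }
}
```

Running the same check on the bytes of the problematic file (for example, read with java.nio.file.Files.readAllBytes) would show whether the file really is UTF-8. Note that the apostrophe and # characters are plain ASCII and decode fine under all three encodings, so the error is probably triggered by some other byte in the file.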

Another option would be to parse such strings as a byte array in CloverETL and then process them further, for example in a Reformat component with one of the Conversion Functions (e.g. byte2str() on the preprocessed byte array) or String Functions (removeNonAscii(), removeNonPrintable(), etc.).
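
The idea behind the byte-array approach can be illustrated outside CloverETL with a few lines of plain Java. This is a rough sketch under assumptions, not the CTL functions themselves: the class and method names are hypothetical, the decoding step uses ISO-8859-1 because it maps every byte to a character, and the cleanup step only approximates what a function like removeNonAscii() would do.

```java
import java.nio.charset.StandardCharsets;

public class ByteFieldCleanup {

    // Decode raw bytes without ever failing: ISO-8859-1 maps each of the
    // 256 byte values to exactly one character.
    public static String decodeLenient(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Drop every character outside the 7-bit ASCII range.
    public static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Hypothetical field value: "na", byte 0xEF, "ve".
        byte[] field = {'n', 'a', (byte) 0xEF, 'v', 'e'};

        String decoded = decodeLenient(field);
        System.out.println(stripNonAscii(decoded)); // prints "nave"
    }
}
```

The trade-off is that stripping characters loses data, so this only makes sense when the non-ASCII bytes are noise rather than part of the value.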