DATA_READER0: MALFORMED[1] when converting from US-ASCII

Hello all,

I am a complete CloverETL noob. I’ve gotten a graph that works when I run it on a dataset from a flat file of 100 records.

However, when I try to run it on a larger flat file with 20996684 records (its first 100 records are identical to the sample file mentioned above), the graph errors out on me with:

LOG [main_file.grf-55-1363103297772.log]

ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node DATA_READER0 finished with status: ERROR caused by: MALFORMED[1] when converting from US-ASCII when parsing record #715030 field ID
ERROR [WatchDog] - Node DATA_READER0 error details:
java.lang.RuntimeException: MALFORMED[1] when converting from US-ASCII when parsing record #715030 field ID
at org.jetel.data.parser.DataParser.parseNext(DataParser.java:446)
at org.jetel.data.parser.DataParser.getNext(DataParser.java:169)
at org.jetel.util.MultiFileReader.getNext(MultiFileReader.java:321)
at org.jetel.component.DataReader.execute(DataReader.java:217)
at org.jetel.graph.Node.run(Node.java:388)
at java.lang.Thread.run(Thread.java:736)
Caused by:
java.io.IOException: MALFORMED[1] when converting from US-ASCII
at org.jetel.data.parser.DataParser.readChar(DataParser.java:588)
at org.jetel.data.parser.DataParser.parseNext(DataParser.java:383)
… 5 more
ERROR [WatchDog] - !!! Phase finished with error - stopping graph run !!!
INFO [WatchDog] - ----------------------** Final tracking Log for phase [0] **---------------------
INFO [WatchDog] - Time: 12/03/13 10:48:29
INFO [WatchDog] - Node Status Port #Records #KB aRec/s aKB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - DATA_READER0 ERROR
INFO [WatchDog] - %cpu:0.52 Out:0 368424 51927 40936 5769
INFO [WatchDog] - DATA_WRITER1 ABORTED
INFO [WatchDog] - %cpu:… In:0 0 0 0 0
INFO [WatchDog] - EXT_MERGE_JOIN0 ABORTED
INFO [WatchDog] - %cpu:… In:0 0 0 0 0
INFO [WatchDog] - In:1 0 0 0 0
INFO [WatchDog] - Out:0 0 0 0 0
INFO [WatchDog] - EXT_SORT0 ABORTED
INFO [WatchDog] - %cpu:0.58 In:0 368424 51927 40936 5769
INFO [WatchDog] - Out:0 0 0 0 0
INFO [WatchDog] - MEMSEARCH0 ABORTED
INFO [WatchDog] - %cpu:… In:0 0 0 0 0
INFO [WatchDog] - Out:0 0 0 0 0
INFO [WatchDog] - REFORMAT0 ABORTED
INFO [WatchDog] - %cpu:… In:0 0 0 0 0
INFO [WatchDog] - Out:0 0 0 0 0
INFO [WatchDog] - SIMPLE_COPY0 ABORTED
INFO [WatchDog] - %cpu:… In:0 0 0 0 0
INFO [WatchDog] - Out:0 0 0 0 0
INFO [WatchDog] - Out:1 0 0 0 0
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - Execution of phase [0] finished with error - elapsed time(sec): 9
INFO [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB)
INFO [WatchDog] - 0 ERROR 9 48603
INFO [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO [WatchDog] - WatchDog thread finished - total execution time: 10 (sec)
INFO [main] - Freeing graph resources.
ERROR [main] - Execution of graph failed !

Here’s where it might get tricky to help me.
I don’t know what version of CloverETL I’m running; it is the version that came packaged with IBM Initiate version 9.5. I’m creating the graphs within the Clover portion of the Workbench product from IBM (based on Eclipse), then moving the job over to our server (RHEL 5) and firing off the job to be run by the Initiate engine, which must contain the Clover engine too? Anyway, that is a batch process.

The person that developed this graph is a noob to Clover too, and is unfortunately out of town for the next few weeks, and I need to try to get this to run.

This graph ran fine for him on the small sample, but barfed on the larger one, originally with an error about the ExtMergeJoin not receiving sorted data. I researched and added an ExtSort object into the flow right after the data read, and that seemed to clear that up. I set the temp directories to /tmp on the Linux server this is run on.

It now seems to run and stop on record # 715030

I’ve manually examined record 715030 in the flat file and it doesn’t seem to be malformed in any way…just a number. I’ve looked at the records before and after it, they look fine too. I’ve done a hex dump of these records and all look fine with \n as the line terminator, and that is defined in the graph as well. This flat file is pipe delimited.

Anyway, I tried a run on this same flat file, but with record #715030 removed…it stopped with same error complaining about 715030 which would be a different record this time, leading me to believe this error has more to do with file size than data problems?
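One way to separate the "file size" theory from the "bad data" theory is to scan the whole file for bytes outside the 7-bit ASCII range, instead of inspecting only line 715030. The reader's record counter may not line up with the file's line numbers (that mismatch is an assumption; the log only shows that 368424 records had left DATA_READER0 when it failed). The following is a minimal Python sketch, not part of CloverETL; the function name `find_non_ascii` is made up for illustration, and `bytes.isascii()` needs Python 3.7 or later.

```python
def find_non_ascii(path, limit=20):
    """Return up to `limit` tuples (line_number, column, byte_value)
    for bytes >= 0x80, using 1-based line and column numbers."""
    hits = []
    # Binary mode: nothing is decoded, so the scan itself cannot fail
    # on the same bytes the reader is complaining about.
    with open(path, "rb") as fh:
        for line_no, line in enumerate(fh, start=1):  # splits on b"\n"
            if line.isascii():
                continue  # fast path: the whole line is plain 7-bit ASCII
            for col, byte in enumerate(line, start=1):
                if byte >= 0x80:
                    hits.append((line_no, col, byte))
                    if len(hits) >= limit:
                        return hits
    return hits
```

Calling `find_non_ascii` on the large flat file returns an empty list if every byte is plain ASCII, which would point away from the data. Any hits give the real line and column to look at in the hex dump.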

If anyone has suggestions on where I might look to continue to troubleshoot, it would be greatly appreciated.

Thank you,

cayenne

Hi Cayenne,

It seems you are trying to read characters that are not valid under US-ASCII - a similar issue was discussed in another thread (http://forum.cloveretl.com/viewtopic.php?f=4&t=5080). However, since you said the issue occurs even when you remove the record from the batch, you might want to send the graph and data to our email (support@cloveretl.com) so that we can look closer at this issue.
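To illustrate what "not valid under US-ASCII" means in practice: US-ASCII only defines byte values 0 to 127, so a strict decoder rejects any single byte of 0x80 or above. The `MALFORMED[1]` text appears to be how Java prints a malformed-input decoder result of length 1. Below is a small Python sketch of the same behaviour. The sample record is invented for illustration and is not taken from the poster's file, and whether the real file is Latin-1, UTF-8, or something else is unknown.

```python
def first_ascii_violation(raw: bytes):
    """Return (offset, byte_value) of the first byte that a strict
    US-ASCII decoder rejects, or None if the data is pure ASCII."""
    try:
        raw.decode("ascii")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


# Made-up pipe-delimited record containing one Latin-1 byte (0xE9).
sample = b"715030|Jos\xe9|NY\n"

print(first_ascii_violation(sample))        # offset and value of the bad byte
print(first_ascii_violation(b"715030|NY"))  # None: nothing to reject
print(sample.decode("latin-1"))             # Latin-1 maps every byte, so this succeeds
```

If a scan of the real file turns up such bytes, changing the charset on the reader to the file's actual encoding is the usual remedy. Which encoding that is would have to be confirmed against the data.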

“slechtaj”

Would the graph or anything I sent be posted publicly? Some information (fields, etc) are proprietary in the graph and if this is something that would be posted on the public facing internet, I’d have to go through and sanitize field names, etc…

Thank you,

C

Hi C,

We prefer to keep things on the forum public, so that other users can benefit from our answers. Therefore, please adjust your graph so it can be published.

If you would be interested in solving this in private, we also offer paid support. You can contact sales@cloveretl.com.

Thank you.

“kubosj”

I’m trying to clean this graph up…but as I go along changing fields, it almost seems to be causing more errors to show up. I thought I was pretty close and went into the actual .grf file, and WOW…that thing had all kinds of personally identifiable information in there, including connect information (user/pass) and server names. I’ve been trying to correct that by hand, but I’m afraid I’ll mess so much up there that you won’t be able to tell what is broken in the actual graph vs. what I’m breaking by changing parameter names, etc.
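Since a .grf file is XML text, connection details can be masked mechanically instead of by hand, which leaves the rest of the graph untouched. The sketch below is only a rough illustration: the attribute names `user`, `password`, and `dbURL` are assumptions about what the file contains, and the helper `mask_attributes` is hypothetical. It is a plain regular-expression pass, so it would also rewrite matching `name="value"` text inside embedded code, and the result should be diffed against the original before posting.

```python
import re

# Assumed attribute names; adjust to whatever the .grf actually uses.
SENSITIVE = ("user", "password", "dbURL")


def mask_attributes(text, names=SENSITIVE, placeholder="REDACTED"):
    """Replace the values of the named XML attributes with a placeholder,
    leaving every other character of the document unchanged."""
    pattern = re.compile(
        r"(?<![\w:-])(" + "|".join(map(re.escape, names)) + r")"  # attribute name
        r"(\s*=\s*)"                                              # equals sign
        r"([\"'])(.*?)\3",                                        # quoted value
        re.DOTALL,
    )

    def _mask(match):
        name, equals, quote = match.group(1), match.group(2), match.group(3)
        return f"{name}{equals}{quote}{placeholder}{quote}"

    return pattern.sub(_mask, text)
```

Reading the graph file as text, passing it through `mask_attributes`, and writing the result to a new file keeps the original intact for comparison. Field names and server names that appear elsewhere in the file would still need separate handling.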

I guess I’ll contact sales. Again, we got this install of Clover through our purchase of IBM Initiate. Is there not some support that comes through that purchase and licensing from ya’ll?

Thank you,

C

Hi C,

When CloverETL is embedded in another application, support is provided by the provider of that application. Therefore, in this case, support should primarily be handled by IBM.