EOF marker x1A breaking CSV input

I’ve downloaded a pkzip’d file from the mainframe and unzipped it. The final record is terminated with \r\n followed by x1A. That end-of-file marker is causing the delimited CSV parser to fail: the first field of the record is a number, and UniversalDataReader says it cannot process the x1A in a numeric field.

If I strip off the x1A, everything processes correctly. But stripping it by hand is not practical when my file contains 2.4 million records.

Thank you for having a really good parser. Some of my numbers have trailing spaces, which your parser handles. Other products choke on them, forcing me to make a pre-pass over the data to strip off the whitespace.

Thanks again for a good product,
dvn

Hello,

Thank you for the praise; I’ll forward it to our development team.

As for your problem, I’ve compiled an example graph that shows two different workarounds. I think the best approach is to set the data policy on the DataReader to lenient and then, optionally, dump the bad records into an extra file so you can review them manually later.
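Roughly, the relevant pieces of the graph XML look like this. This is only a simplified sketch rather than a copy of the attached example, so the attribute names, file URLs, ids and the error-port number below are placeholders you should double-check against the example graph:

<!-- UniversalDataReader with a relaxed data policy, so one bad row does not stop the whole run -->
<Node dataPolicy="lenient" fileURL="${DATAIN_DIR}/cdi_keys.csv" id="DATA_READER0" type="DATA_READER"/>

<!-- a flat-file writer that collects the rejected records for manual review -->
<Node fileURL="${DATAOUT_DIR}/cdi_keys_rejects.txt" id="DATA_WRITER_REJECTS" type="DATA_WRITER"/>

<!-- the reader's error output wired to the rejects writer -->
<Edge fromNode="DATA_READER0:1" id="EDGE_REJECTS" toNode="DATA_WRITER_REJECTS:0"/>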

I tried your suggestions this morning. The failure is not treated as an error record: it is not written to the error output, and the error count is zero. Below is the log output.

It appears that none of the records in the final batch get committed. 2,349,304 rows are read, the last one is in error, and only 2,346,701 are committed.

One more piece of praise for you. I’ve been testing this load against dm/express (v4.4). It has no driver for SQLite and is unable to bulk load the way Clover does. What took Clover about two minutes was still running in dm/express after 20 minutes, at which point I killed it. It must have been trying to commit after every read. Completely unacceptable. Kudos to you, again.


INFO  [main] - ***  CloverETL framework/transformation graph, (c) 2002-2012 Javlin a.s, released under GNU Lesser General Public License  ***
INFO  [main] - Running with CloverETL library version 3.3.0.M2 build#074 compiled 02/05/2012 13:21:39
INFO  [main] - Running on 2 CPU(s), OS Windows 2003, architecture x86, Java version 1.6.0_20, max available memory for JVM 253440 KB
INFO  [main] - Loading default properties from: defaultProperties
INFO  [main] - Graph definition file: graph/load_cdi_keys.grf
INFO  [main] - Graph revision: 1.13 Modified by: dnielsen Modified: Fri May 25 06:55:45 EDT 2012
INFO  [main] - Checking graph configuration...
INFO  [main] - Graph configuration is valid.
WARN  [main] - Incompatible Clover & JDBC field types - field seqno. Clover type: integer, sql type: VARCHAR
INFO  [main] - Graph initialization (load_cdi_keys)
INFO  [main] - Initializing connection:
INFO  [main] - DBConnection driver[org.jetel.connection.jdbc.driver.JdbcDriver@1185844]:jndi[null]:url[jdbc:sqlite:d:/cloveretl/projects/test_ii/data-out/test.db]:user[null] ... OK
INFO  [main] - [Clover] Initializing phase: 0
INFO  [main] - drop table if exists cdi_keys
INFO  [main] - 
create table cdi_keys (
  seqno integer,
  addr integer,
  resi integer,
  hhld integer,
  indv integer
)
INFO  [main] - [Clover] phase: 0 initialized successfully.
INFO  [main] - [Clover] Initializing phase: 1
INFO  [main] - [Clover] phase: 1 initialized successfully.
INFO  [main] - register MBean with name:org.jetel.graph.runtime:type=CLOVERJMX_1337799034753_0
INFO  [WatchDog] - Pre-execute initialization of connection:
INFO  [WatchDog] - DBConnection driver[org.jetel.connection.jdbc.driver.JdbcDriver@1185844]:jndi[null]:url[jdbc:sqlite:d:/cloveretl/projects/test_ii/data-out/test.db]:user[null] ... OK
INFO  [WatchDog] - Starting up all nodes in phase [0]
INFO  [WatchDog] - Successfully started all nodes in phase!
INFO  [WatchDog] - [Clover] Post-execute phase finalization: 0
INFO  [WatchDog] - [Clover] phase: 0 post-execute finalization successfully.
...

ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node DATA_READER0 finished with status: ERROR caused by: Parsing error: Unexpected end of file in record 2349304, field 1 ("seqno"), metadata "cdi_keys_csv"; value: ''
ERROR [WatchDog] - Node DATA_READER0 error details:
org.jetel.exception.BadDataFormatException: Parsing error: Unexpected end of file in record 2349304, field 1 ("seqno"), metadata "cdi_keys_csv"; value: ''
	at org.jetel.data.parser.DataParser.parsingErrorFound(DataParser.java:560)
	at org.jetel.data.parser.DataParser.parseNext(DataParser.java:538)
	at org.jetel.data.parser.DataParser.getNext(DataParser.java:179)
	at org.jetel.util.MultiFileReader.getNext(MultiFileReader.java:416)
	at org.jetel.component.DataReader.execute(DataReader.java:269)
	at org.jetel.graph.Node.run(Node.java:416)
	at java.lang.Thread.run(Thread.java:619)
INFO  [exNode_0_1337799034753_SQLITE_CDI_KEYS] - Number of commited records: 2346701
INFO  [WatchDog] - [Clover] Post-execute phase finalization: 1
INFO  [WatchDog] - [Clover] phase: 1 post-execute finalization successfully.
INFO  [WatchDog] - Execution of phase [1] finished with error - elapsed time(sec): 217
ERROR [WatchDog] - !!! Phase finished with error - stopping graph run !!!
INFO  [WatchDog] - Post-execute finalization of connection:
INFO  [WatchDog] - DBConnection driver[org.jetel.connection.jdbc.driver.JdbcDriver@1185844]:jndi[null]:url[jdbc:sqlite:d:/cloveretl/projects/test_ii/data-out/test.db]:user[null] ... OK
INFO  [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO  [WatchDog] - Phase#            Finished Status         RunTime(sec)    MemoryAllocation(KB)
INFO  [WatchDog] - 0                 FINISHED_OK                        0              5145
INFO  [WatchDog] - 1                 ERROR                            216              9476
INFO  [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO  [WatchDog] - WatchDog thread finished - total execution time: 217 (sec)
INFO  [main] - Freeing graph resources.
ERROR [main] - Execution of graph failed !

I’ve attached a small file that trips the problem, along with the graph I’m using to load it.

dm/express also tripped on the character, but it only flashed a warning and completed the processing without dropping any records.

Notepad++ loads the file and identifies the character. It labels it SUB, but its value is x1A. Now I am starting to wonder whether there are multiple instances of that character in the file.

dvn

Hello,

You need to select the last field of your metadata, scroll to the very bottom of the properties on the right, and set “EOF as delimiter” to true. You can see that in the example as well. Otherwise the line is not recognized as a record, since it doesn’t contain any delimiter.
The omitted commit is correct: if there’s an error in the batch, the batch is not committed. That’s what happened.
If you want to commit after every record, you have to check the “Atomic SQL query” option on the DBOutputTable component.

This is weird. It has been set; it is set in the graph I sent you.

<Field eofAsDelimiter="true" label="individual" name="indv" type="integer">
  <attr name="description"><![CDATA[individual level key]]></attr>
</Field>
</Record>

The only way I can get this to load successfully is to edit the file from the mainframe, make sure the last record ends with CRLF, and delete the EOF marker. Any other variation fails.

Is the EOF marker not used anymore? I’ve looked at other files on the network and they do not end with x1A.

Hello,

I see the problem: I made a typo in my last post, sorry for that. You need to turn on “EOF as delimiter” on the FIRST field of your metadata, not the last, as you have it now.
“EOF as delimiter” provides an alternative to the field’s regular delimiter. If the UniversalDataReader finds no delimiter at all on the input, it rejects the line as “not a record”; the option is only evaluated once at least one delimiter has been recognized. It’s a subtle difference. In your case the first field’s delimiter is a semicolon, which is not there, but if the first field’s delimiter is either a semicolon or EOF, the record will go through.
I turned on the “EOF as delimiter” option on the “seqno” field, set the data policy to “controlled”, and your example graph runs well.
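For completeness, the corrected metadata looks roughly like this. I am reconstructing it from the field names in your create table statement and the semicolon delimiter mentioned above, so the Record attributes and delimiters below are assumptions that may not match your file exactly; the point is only the placement of eofAsDelimiter on the first field:

<Record fieldDelimiter=";" name="cdi_keys_csv" recordDelimiter="\r\n" type="delimited">
  <!-- EOF accepted as an alternative delimiter on the FIRST field -->
  <Field eofAsDelimiter="true" name="seqno" type="integer"/>
  <Field name="addr" type="integer"/>
  <Field name="resi" type="integer"/>
  <Field name="hhld" type="integer"/>
  <Field label="individual" name="indv" type="integer">
    <attr name="description"><![CDATA[individual level key]]></attr>
  </Field>
</Record>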