Quoting field and universal data reader

I am using 2.4.1. In this version, reading a delimited file with quoted strings through the universal data reader always fails with an error like this:

“WARN - Parsing error: Bad quote format in record # 19 in field # 1”

If I remove the quotes from the strings and remove the quotedStrings="true" attribute from the Reader, the same file loads with no errors. Can you please check this out? I can send sample data if you need it, but it’s really trivial to reproduce.

Thanks!
Peter

Hello Peter!

The warning is probably caused by an invalid quoting format. According to the current parsing algorithm, the closing quote character must be followed immediately by the appropriate field delimiter. I suppose your data file doesn’t satisfy this condition. Can you please confirm this?

For instance, a whitespace gap between the closing quote and the delimiter:

"Martin" ;"Zatopek"
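To illustrate the rule, here is a minimal Python sketch of a strict splitter in which a closing quote must be followed immediately by the delimiter or by the end of the line. It is not the CloverETL parser; the function name `split_quoted`, the error text, and the assumption that every field is quoted are hypothetical and chosen only for this example.

```python
def split_quoted(line, delimiter=";", quote='"'):
    """Split a line in which every field is quoted (assumption of this sketch).

    A closing quote must be followed at once by the delimiter or by the
    end of the line; anything else is reported as a quoting error.
    """
    fields = []
    i = 0
    n = len(line)
    while i < n:
        field_no = len(fields) + 1
        if line[i] != quote:
            raise ValueError(f"bad quote format in field {field_no}")
        end = line.find(quote, i + 1)  # position of the closing quote
        if end == -1:
            raise ValueError(f"unterminated quote in field {field_no}")
        fields.append(line[i + 1:end])
        i = end + 1
        if i == n:
            break  # closing quote was the last character of the line
        if not line.startswith(delimiter, i):
            # e.g. a space between the closing quote and the delimiter
            raise ValueError(f"bad quote format in field {field_no}")
        i += len(delimiter)
    return fields


print(split_quoted('"Martin";"Zatopek"'))
try:
    split_quoted('"Martin" ;"Zatopek"')
except ValueError as err:
    print("rejected:", err)
```

With this rule the first line splits into two fields, while the second is rejected because of the space after the first closing quote.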

Thanks, Martin.

Hello Martin,

Thanks for moving this conversation from email to the forum. Here are some additional details. I do not believe it is invalid or incorrect data.

Let me paste my file definition below, and a sample:

File Definition:


<Record name="PersonCSVWithProfession" type="delimited">
	<Field delimiter="," name="Profession" nullable="true" size="50" type="string"/>
	<Field delimiter="," name="LastName" nullable="false" size="35" type="string"/>
	<Field delimiter="," name="FirstName" nullable="false" size="15" type="string"/>
	<Field delimiter="," name="MiddleName" nullable="true" size="12" type="string"/>
	<Field delimiter="," name="Suffix" nullable="true" size="3" type="string"/>
	<Field delimiter="," name="Address1" nullable="false" size="100" type="string"/>
	<Field delimiter="," name="Address2" nullable="true" size="100" type="string"/>
	<Field delimiter="," name="Address3" nullable="true" size="100" type="string"/>
	<Field delimiter="," name="City" nullable="false" shift="0" size="25" type="string"/>
	<Field delimiter="," name="StateAbbrev" nullable="false" size="2" type="string"/>
	<Field delimiter="\n" name="ZipCode" nullable="false" shift="0" size="5" type="string"/>
</Record>

Sample Record:


"XY","SAMPLE","SAMPLE","J","","123 ANYWHERE ST","","","FALMOUTH","MA","02540"

Reader Definition:


<Node id="inputFile" type="DATA_READER"
	fileURL="${INPUT_FILE}"
	dataPolicy="controlled"
	skipFirstLine="true"
	quotedStrings="true"
/>

As you can see, the delimiter for every field directly follows the closing quote. If I remove the quotes from the input file and remove quotedStrings="true" from the reader, everything works as expected. Thus, I believe it’s a bug. I am using CloverETL 2.4.1.
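As an independent sanity check on the sample record (outside CloverETL), here is a small sketch that parses it with Python's standard `csv` module in strict mode and maps the values onto the field names from the record definition. The variable names are only for illustration.

```python
import csv
import io

# Sample record from this post, terminated by the "\n" record delimiter.
sample = '"XY","SAMPLE","SAMPLE","J","","123 ANYWHERE ST","","","FALMOUTH","MA","02540"\n'

# Field names in the order given in the record definition.
names = [
    "Profession", "LastName", "FirstName", "MiddleName", "Suffix",
    "Address1", "Address2", "Address3", "City", "StateAbbrev", "ZipCode",
]

# strict=True makes the csv module raise csv.Error on malformed quoting.
reader = csv.reader(io.StringIO(sample), delimiter=",", quotechar='"', strict=True)
row = next(reader)
record = dict(zip(names, row))

print(len(row))
print(record["Address1"], record["ZipCode"])
```

The line parses into eleven fields without a quoting error, which matches the eleven fields in the definition.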

Thanks in advance!

Peter, just a quick response. It seems it really is a bug. If you need a quick workaround, try using the Delimited Data Reader instead. Tomorrow I’ll give you more information about this issue.

Martin

Thanks. Unfortunately, the delimited data reader didn’t work either. I don’t recall the reason now, but it was a different one. Let me know if you need any further information and I can supply it. Thanks!

So, as I said, it really is a bug in the Universal Data Reader component. We will release a bug-fix update, 2.4.2, next week. If you need to solve this issue sooner, you can download the 2.4 branch from our public SVN repository, or I can send you a binary package as a “pre-release” of 2.4.2. It’s up to you.

Martin

We are also very interested in the Delimited Data Reader bug you mentioned. If you run into this issue again, please send me a bug report. Thanks.

Thanks Martin, I will confirm the fix in SVN and then wait for the official 2.4.2 release. I appreciate the quick follow-up :)

I will try to narrow this down and report a bug.

Just to follow up - the quotes bug with the universal data reader is indeed fixed in 2.4.2 - thank you!

Well, hopefully you are still reading this. I ran into an issue with the quoted characters today - it turns out that a single quote is considered a quote character, so if I have a field like this:

"O'Neil"

The single quote is considered the “end of string” and the file parsing falls apart. I have never seen a product behave like this - is it a bug, and/or is there any way to alter this behavior?
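To make the difference concrete, here is a small Python sketch of the two behaviours. It is only an illustration of what I am describing, not the actual CloverETL code; the function `read_quoted` and its parameters are hypothetical.

```python
def read_quoted(text, quote_chars, match_opening=True):
    """Return the content of the quoted string at the start of text.

    quote_chars   -- characters allowed to open a quoted string
    match_opening -- True: only the character that opened the string closes it
                     False: any character in quote_chars closes it
    """
    if not text or text[0] not in quote_chars:
        raise ValueError("text does not start with a quote character")
    opening = text[0]
    closers = opening if match_opening else quote_chars
    for i in range(1, len(text)):
        if text[i] in closers:
            return text[1:i]
    raise ValueError("unterminated quoted string")


field = '"O\'Neil"'

# Any quote character ends the string: the apostrophe cuts the value short.
print(read_quoted(field, quote_chars='"\'', match_opening=False))

# Only the opening character ends the string: the apostrophe is plain data.
print(read_quoted(field, quote_chars='"\'', match_opening=True))
```

The first call returns only `O`, which is the behaviour I am seeing; the second returns the full name, which is what I would expect.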

Hello,
It really was a bug. It has been fixed, and the fix will be in today’s release.