DataReader: error parsing data that contains double quotes

Hi,

If my data looks like this:

“ref”,“buyer”,“amount”
“1000-1"a”,“albert”,“10.00”
“1000-2"b”,“tony”,“20.00”

looks like DataReader cannot parse the record correctly since i have an embedded double-quote in the my first field “ref”, e.g 1000-1"a…

i read from this posting (http://forum.cloveretl.org/viewtopic.php?t=358) that the embedded single quote in the data was fixed, but how about double-quotes?

i thought Clover just need to read the start of a quote to the quote that is right before the delimiter and treat the string read as input data, right?

or is there workaround?

thanks,
al

Hello,
only different type of quote can be embedded into field value (double quote can be embedded in single quoted string and vice versa) in other cases value is regarded as imparseable.
The work around is to set quotedStrings attribute to false and strip the quotes in Reformat.

Hi Agata,

I cannot set quotedString to false because I have embedded deliimiter in my quoted data also… something like this:

“ref”,“buyer”,“amount”
“10,00-1"“a”,“albert”,“10.00”
“10,00-2”'b”,“tony”,“20.00”

This type of data file is created when I export from Excel to a CSV file…

Is there any other option to handle this case?

Thanks,
al

What about setting “,” as delimiter and stripping quotation in Reformat?

Hi Agata,

setting “,” as delimiter would not work if my numeric fields are non-quoted, right?

say my data is:

“ref”,“buyer”,“amount”
“10,00-1"“a”,“albert”,10
“10,00-2”'b”,“tony”,20

i will still have a problem with reading the data in “amount” field…

any other suggestions?

Thanks,
al

Hi,
my last idea is to set default delimiter as “,” and for numeric fields only , and for string fields before numeric field ",; something like that:

<Record fieldDelimiter="&quot;,&quot;" name="quotedData" recordDelimiter="\n" type="delimited">
<Field name="field1" type="string"/>
<Field delimiter="&quot;," name="field2" type="string"/>
<Field delimiter="," name="field3" type="numeric"/>
<Field name="field4" type="string"/>
</Record>

Hi Agata,

your suggestions would all be good if there is a strict format for the data…

for my case, setting individual field delimiter would not work if the data i am getting can have these possibilities:

1. for numeric fields, they can be quoted or unquoted… e.g “200” or 200…
2. for string fields, they might contain a embedded quotes or embedded comma (which is also my field delimiter)… e.g “abc"def” or “abc’def” or “abcdef” or “abc,def”

is my only option read the each record as 1 big field and parse it myself??

Thanks,
al

Hi,
we talked about it in our development team and conceded, that there couldn’t be universal parser for such data.
If you don’t have strict format, you have to parse it yourself :frowning:

Hi Agata,

Ok. I will parse it.

Thanks for all your suggestions.

al