Applying Conditions to Delimiter

megreddy · November 2, 2010, 12:00am

I’m attempting to use the “Extract from flat file” Metadata creator on a set of flat, colon-delimited data. The automatic field parsing works for the most part, but I’m running into an issue:

Some of the entries in my Field #10 are phrases enclosed in double quotes. When such a string contains a colon (e.g. “The time is now 12:00pm”), the parser incorrectly splits the phrase across two fields:

Field10             | Field11
"The time is now 12 | 00pm"

As a result, it detects too many fields in the record. If all entries in Field #10 were quoted strings, I would simply set the delimiters as:

 #  | Name    | Type   | Delimiter
 9  | Field9  | string | :"
 10 | Field10 | string | ":

But since only SOME entries of Field #10 are double-quoted phrases (the rest are blank/null), that won’t work. So, is there a way to set up an exclusion rule that basically says “Ignore the delimiter if it’s contained in the RegEx of double quotes”?

Thanks for the help!

avackova · November 3, 2010, 8:26am

Hello,
The “Extract from flat file” Metadata creator can’t recognize this correctly, but the DataReader component can parse such data properly. Just correct the number of fields detected by the wizard and set the quotedStrings attribute in DataReader to true.
I’ve also created a request for improving the Metadata wizard (http://bug.cloveretl.org/view.php?id=5351).

megreddy · November 3, 2010, 4:54pm

Thanks again for the help! That did the trick.

I had actually read the manual’s definition of UniversalDataReader->Quoted Strings, but from the description it just sounded like it removes the single/double quotes from a phrase. I didn’t realize that setting it to TRUE would also ignore any delimiters found inside.
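
For readers who want to see the same behaviour outside Clover, here is a minimal sketch using Python's standard csv module. It is not CloverETL code and says nothing about how DataReader is implemented; the sample records and the parse helper are made up for illustration. It only shows the general idea that a quote-aware parser both strips the quotes and ignores delimiters found inside them, while blank entries still parse as empty fields.

```python
import csv
import io

# Hypothetical colon-delimited records modeled on the thread:
# the third field is either a quoted phrase containing a colon, or blank.
quoted_record = 'a:b:"The time is now 12:00pm":c\n'
blank_record = 'a:b::c\n'

def parse(text, quoted):
    # QUOTE_NONE treats the quote character as ordinary data;
    # QUOTE_MINIMAL honors quotes and strips them from the value.
    quoting = csv.QUOTE_MINIMAL if quoted else csv.QUOTE_NONE
    reader = csv.reader(io.StringIO(text), delimiter=":", quotechar='"', quoting=quoting)
    return next(reader)

naive = parse(quoted_record, quoted=False)   # splits inside the phrase
aware = parse(quoted_record, quoted=True)    # keeps the phrase in one field
blank = parse(blank_record, quoted=True)     # blank entry is an empty field

print(len(naive), naive)
print(len(aware), aware)
print(len(blank), blank)
```

The quote-unaware parse yields one field too many, which matches the symptom described at the top of the thread.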

sam · July 28, 2012, 1:23pm

Hi, other than cleaning up the data manually at the source before bringing it into Clover, how do I deal with quotes inside a field that have no other characters either before the opening quote or after the closing quote? E.g.

"John Smith",""Middletown"","Petersborough",

or

"CXAWAY","Customer "Gone Away"",

This issue is, of course, currently causing Clover to misinterpret the number of fields within a row, giving inconsistent row lengths, which causes a parse error.

I could use regular expressions in Notepad++ to clean up the data file before loading it into Clover. However, this is a non-ideal solution: some of the files I have to work with are too big to open fully in memory, and it’s an additional manual step. I would really like all the data manipulation to be done within Clover.

Many thanks in advance!

Sam
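
As a rough illustration of the pre-cleaning approach described above, the sketch below processes a file one line at a time, so it never needs the whole file in memory. It runs outside Clover, in plain Python, and rests on an assumption that may not hold for every file: a double quote is treated as a real field boundary only when it sits at the start of a line, next to a comma, or at the end of a line, and every other quote is treated as embedded data and doubled. The names EMBEDDED_QUOTE, escape_embedded_quotes and clean_stream are hypothetical helpers, not part of any library.

```python
import csv
import io
import re

# A quote counts as "embedded" when it is neither at the start of the line,
# nor right after a comma, nor right before a comma or the end of the line.
EMBEDDED_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')

def escape_embedded_quotes(line):
    """Double embedded quotes so a standard CSV parser reads them as literals."""
    return EMBEDDED_QUOTE.sub('""', line)

def clean_stream(src, dst):
    # Iterating a file object reads one line at a time, so memory use
    # does not grow with the size of the input.
    for line in src:
        dst.write(escape_embedded_quotes(line.rstrip("\r\n")) + "\n")

# The two sample rows from the post, held in memory here for the demo;
# real use would pass open file handles instead.
raw = io.StringIO(
    '"John Smith",""Middletown"","Petersborough",\n'
    '"CXAWAY","Customer "Gone Away"",\n'
)
cleaned = io.StringIO()
clean_stream(raw, cleaned)

cleaned.seek(0)
rows = list(csv.reader(cleaned))
for row in rows:
    print(row)
```

The heuristic breaks if a field legitimately contains a quote next to a comma (for example the text `","` inside a value), so the output is worth spot-checking on real data before relying on it.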
