I’m attempting to use the “Extract from flat file” Metadata creator, on a set of flat, column-delimited data. The automatic field parsing works for the most part, but I’m running into an issue:
Some of the entries in my Field #10 are phrases contained in double quotes. So when this string contains a column (e.g. “The time is now 12:00pm”), the parser incorrectly breaks the phrase into a new field:
Field10 | Field11
"The time is now 12 | 00pm"
And consequently detects too many fields in the record. If all entries in Field #10 were quoted strings, I would simply set the delimiters as:
# | Name | Type | Delimiter
9 | Field9 | string | :"
10 | Field10| string | ":
But since only SOME entries of Field #10 are double-quoted phrases (the rest are blank/null), that won’t work. So, is there a way to set up an exclusion rule that basically says “Ignore the delimiter if it’s contained in the RegEx of double quotes”?
Hello,
“Extract from flat file” Metadata creator can’t recognize this correctly, but the DataReader component can parse such data properly. Just modify the number of fields detected by Wizard and set quotedStrings attribute in DataReader to true.
I’ve also created a request for improving the Metadata wizard (http://bug.cloveretl.org/view.php?id=5351).
I had actually read the manual’s definition of UniversalDataReader->Quoted Strings, but from the description it just sounded like it removes the single/double quotes from a phrase. I didn’t realize that setting it to TRUE would also ignore any delimiters found inside.
Hi, other than cleaning up the data manually in source before bringing into clover how do I deal with quotes inside a field without any other characters either before the opening quote or after the closing quote? E.g. “John Smith”,““Middletown””,“Petersborough”,
or “CXAWAY”,“Customer “Gone Away””,
This issue is of course currently causing clover to misinterpret the number of fields within a row giving inconsistent row lengths which causes a parse error.
I could use regular expressions in notepad++ to clean up the data file before loading into clover however this is a non ideal solution because some files I have to work with are too big to open in fully in memory and also it’s an additional manual step that needs to be done and really would like all the data manipulations to be done within clover.
Hi, other than cleaning up the data manually in source before bringing into clover how do I deal with quotes inside a field without any other characters either before the opening quote or after the closing quote? E.g.
This issue is of course currently causing clover to misinterpret the number of fields within a row giving inconsistent row lengths which causes a parse error.
I could use regular expressions in notepad++ to clean up the data file before loading into clover however this is a non ideal solution because some files I have to work with are too big to open in fully in memory and also it’s an additional manual step that needs to be done and really would like all the data manipulations to be done within clover.