Read in a line of text into each record

Paulhbartosik
Posts: 20
Joined: Wed Sep 06, 2017 4:14 pm

Read in a line of text into each record

Post by Paulhbartosik » Mon Apr 08, 2019 5:58 pm

This seems a simple question. Sorry if this has been answered elsewhere.

I want to load lines of text from a flat file. The data is delimited, but is not clean. So, I want to load each line as a record, and then use some reformats to clean it up and parse it into fields.

Can I do this with the UniversalDataReader component? It looks like I must set up my metadata with the correct combination of field and record delimiters, maybe?

I keep getting one of these errors:
Component [UniversalDataReader:UNIVERSAL_DATA_READER] finished with status ERROR. (Out0: 0 recs)
Parsing error: Field delimiter was not found in record 1, field 1 ("TextLine"), metadata "rawText"; value: 'SimpleDataParser does not provide raw record.'

or

Component [UniversalDataReader:UNIVERSAL_DATA_READER] finished with status ERROR. (Out0: 0 recs)
Parsing error: Unexpected default field delimiter, probably record has too many fields. in record 1, field 1 ("TextLine"), metadata "rawText"; value: '<Raw record data is not available, please turn on verbose mode.>'

darvehng
Posts: 12
Joined: Mon Apr 08, 2019 4:40 pm

Re: Read in a line of text into each record

Post by darvehng » Tue Apr 09, 2019 4:22 pm

Hi Paul,

Thank you for reaching out to us. Yes, you can use the Universal Data Reader to accomplish this task. I've attached a graph that roughly replicates your use case. In it, I read a simple text file with FlatFileReader (the current name of the Universal Data Reader) using user-defined metadata with a single field (field1). Because each line should land in field1 as a whole, I changed the default delimiter property from "|" to a blank value.

I also set the field's "EOF as delimiter" property to "true". By default the reader expects every record, including the last one, to end with the record delimiter, and a plain text file does not necessarily end with a line break. Setting EOF as an alternative record delimiter lets the last line be parsed correctly.

Please change your metadata configuration to match the one in the attached graph and let me know if that fixes your issue.
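
For reference, metadata along those lines might look roughly like this in the graph's XML source. This is only a sketch, not the contents of the attached read_by_lines.grf: the record name lineRecord and the "\n" record delimiter are assumptions, and eofAsDelimiter is assumed to be the attribute behind the EOF setting.

```xml
<!-- Sketch only: record name and recordDelimiter are assumptions, not taken from the attachment. -->
<Metadata id="Metadata0">
<!-- No fieldDelimiter attribute: the single field is meant to run to the end of the line. -->
<Record name="lineRecord" recordDelimiter="\n" type="delimited">
<!-- eofAsDelimiter (assumed attribute name) lets the last line end at EOF without a trailing line break. -->
<Field eofAsDelimiter="true" name="field1" type="string"/>
</Record>
</Metadata>
```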

Best Regards,
George Darvehn
Attachments: input.txt, read_by_lines.grf
---
George Darvehn
CloverCARE Support
CloverDX

Visit us online at http://www.cloverdx.com

Re: Read in a line of text into each record - Please Help. Still failing

Post by Paulhbartosik » Thu Dec 12, 2019 6:31 pm

I have been banging away at this for a day. This has got to be something stupid that I am missing.

My goal is to read a file whose lines end with a bare newline (\n) rather than CR/LF (\r\n). This should be simple. Every line in the input file should come in as a single record. I will parse these single records into fields later.

I can't get past the first step. Please see the included nonworking example. Any help would be greatly appreciated.

Code:

<?xml version="1.0" encoding="UTF-8"?>
<Graph author="phb05" created="Thu Dec 12 09:39:35 EST 2019" guiVersion="5.2.0.30" id="1576163409713" licenseCode="CLP1DHEALT19932837BY" name="readnewlinedelimited" showComponentDetails="true">
<Global>
<Metadata id="Metadata1">
<Record fieldDelimiter="\r\n" name="inputFormat" previewAttachmentCharset="UTF-8" recordDelimiter="\r\n" type="delimited">
<Field delimiter=" " name="outputStream" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata0">
<Record fieldDelimiter="\n" name="outputFormat" previewAttachmentCharset="UTF-8" recordDelimiter="\r\n" type="delimited">
<Field delimiter=" " name="outputStream" type="string"/>
</Record>
</Metadata>
<GraphParameters>
<GraphParameterFile fileURL="workspace.prm"/>
</GraphParameters>
<RichTextNote backgroundColor="FAF6D6" folded="false" fontSize="medium" height="275" id="Note0" textColor="444444" width="202" x="433" y="108">
<attr name="text"><![CDATA[h3. Generate a single line of text with multiple new line characters
]]></attr>
</RichTextNote>
<RichTextNote backgroundColor="FAF6D6" folded="false" fontSize="medium" height="275" id="Note1" textColor="444444" width="275" x="712" y="98">
<attr name="text"><![CDATA[h3. Try to read this in as multiple records]]></attr>
</RichTextNote>
<Dictionary/>
</Global>
<Phase number="0">
<Node fileURL="port:$0.outputStream:discrete" guiName="FlatFileReader" guiX="752" guiY="211" id="FLAT_FILE_READER" type="FLAT_FILE_READER"/>
<Node guiName="Single record" guiX="456" guiY="222" id="SINGLE_RECORD" type="GET_JOB_INPUT">
<attr name="mapping"><![CDATA[//#CTL2

// Transforms input record into output record.
function integer transform() {
    $out.0.outputStream = "Line One"+"\n"+ "Line Two"+ "\n"+ "Line Four"+ "\n";

   return ALL;
}

// Called during component initialization.
// function boolean init() {}

// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
// required by the transform. All resources allocated within this method should be released
// by the postExecute() method.
// function void preExecute() {}

// Called only if transform() throws an exception.
// function integer transformOnError(string errorMessage, string stackTrace) {}

// Called during each graph run after the entire transform was executed. Should be used to free any resources
// allocated within the preExecute() method.
// function void postExecute() {}

// Called to return a user-defined error message when an error occurs.
// function string getMessage() {}
]]></attr>
</Node>
<Node guiName="Trash" guiX="1084" guiY="171" id="TRASH" type="TRASH"/>
<Edge fromNode="FLAT_FILE_READER:0" guiBendpoints="" guiRouter="Manhattan" id="Edge3" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (output)" toNode="TRASH:0"/>
<Edge fromNode="SINGLE_RECORD:0" guiBendpoints="" guiRouter="Manhattan" id="Edge0" inPort="Port 0 (input)" metadata="Metadata1" outPort="Port 0 (out)" toNode="FLAT_FILE_READER:0"/>
</Phase>
</Graph>

cholastal
Posts: 137
Joined: Tue Sep 01, 2015 1:22 pm

Re: Read in a line of text into each record

Post by cholastal » Tue Dec 17, 2019 3:39 pm

Hi Paul,

Yes, the devil is in the details. The issue was that you defined "\n" as the field delimiter when you really need it as the record delimiter. That's why the parser complained about too many fields, etc. So you need to set the record delimiter to "\n" and the field delimiter to nothing (empty string), as you don't want to separate any fields. Please find attached your modified graph to see what I changed in the metadata, and let me know if it works for you.
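
Applied to the reader's output metadata in the posted graph (Metadata0), that change might look like the following. This is a sketch based on the description only, not the contents of the attached file; in particular, dropping the fieldDelimiter and per-field delimiter attributes altogether is an assumption about how an empty field delimiter is stored.

```xml
<Metadata id="Metadata0">
<!-- Was: fieldDelimiter="\n" recordDelimiter="\r\n"; "\n" now serves as the record delimiter. -->
<Record name="outputFormat" previewAttachmentCharset="UTF-8" recordDelimiter="\n" type="delimited">
<!-- Assumption: no delimiter attribute here, so the single field ends at the record delimiter. -->
<Field name="outputStream" type="string"/>
</Record>
</Metadata>
```

With metadata like this, the test string built in the graph ("Line One", "Line Two", "Line Four", each followed by "\n") should come through as three separate records.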

Also, as a side note, please have a look at https://doc.cloverdx.com/latest/designer/defining-non-default-delimiter-for-field.html#defining-non-default-delimiter-for-field, in particular the "Important" block, since this behavior can be confusing.

Best regards.
Attachments: readnewlinedelimited.grf

---
Lukas Cholasta
CloverCARE Support
CloverDX

Re: Read in a line of text into each record

Post by Paulhbartosik » Wed Dec 18, 2019 4:11 pm

Lukas,
This fixed the problem. Thank you for your help.

