ASCII file with two levels and optional delimiters

devmaha · April 12, 2014, 12:00am

I am trying to parse an ASCII file (example below) that has two levels of grouping. At the first level are headers like ‘HEADING’, ‘TITLE’, ‘PARA’ and ‘FOOTNOTE’, and within each of these groups there are sub-headers like ‘author’, ‘date’, etc.
Each record always starts with ‘HEADING’. The sequence of the headers is fixed, but the others may or may not be present (so after HEADING, TITLE could be missing, and the next heading would then be PARA).


HEADING
data

TITLE
anchor data
file data

PARA
author data
date data
file data

FOOTNOTE
citation data

HEADING
data

TITLE
anchor data
file data
...
...
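The ordering rule can be stated precisely: a record begins with HEADING, and the remaining headings follow in the fixed order, each at most once, with any of them allowed to be skipped. Here is a minimal Python sketch of that rule. It assumes the four heading names in the example are the complete set, and the function name is hypothetical.

```python
# First-level headings in the fixed order described above (assumed complete).
ORDER = ["HEADING", "TITLE", "PARA", "FOOTNOTE"]


def valid_heading_sequence(headings):
    """Return True if `headings` starts with HEADING and is an
    in-order subsequence of ORDER (each heading at most once)."""
    if not headings or headings[0] != "HEADING":
        return False
    position = -1
    for name in headings:
        if name not in ORDER:
            return False
        index = ORDER.index(name)
        # Each heading must come strictly after the previous one.
        if index <= position:
            return False
        position = index
    return True


print(valid_heading_sequence(["HEADING", "TITLE", "PARA", "FOOTNOTE"]))  # True
print(valid_heading_sequence(["HEADING", "PARA"]))                      # True (TITLE, FOOTNOTE skipped)
print(valid_heading_sequence(["HEADING", "PARA", "TITLE"]))             # False (out of order)
```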

Here is how I am processing this:
Step 1: <> reads the file and creates records with each first-level header’s content as a separate field. Since each record always starts with ‘HEADING’, I am using that as the record-level delimiter (and the other headings, such as FOOTNOTE, as field-level delimiters).
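For illustration only, here is Step 1 as plain Python rather than a CloverETL component. HEADING starts a new record, and every heading line starts a new first-level field. The sketch assumes that a heading is a line consisting only of one of the four names and that the file begins with HEADING. The sample is a shortened copy of the example above with FOOTNOTE omitted.

```python
HEADINGS = {"HEADING", "TITLE", "PARA", "FOOTNOTE"}


def split_records(lines):
    """Group lines into records (one per HEADING) and, within each
    record, into first-level fields keyed by heading name.
    Assumes the first non-blank line is HEADING."""
    records = []
    current_field = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines only separate groups
        if line == "HEADING":
            records.append({})  # HEADING acts as the record delimiter
        if line in HEADINGS:
            current_field = line
            records[-1][current_field] = []  # any heading acts as a field delimiter
        else:
            records[-1][current_field].append(line)
    return records


sample = """HEADING
data

TITLE
anchor data
file data

PARA
author data
date data
file data

HEADING
data
""".splitlines()

for record in split_records(sample):
    print(record)
```

A missing heading simply produces no key in that record's dict, so nothing fails at this stage.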

Step 2: <> feeds multiple <> that extract the level-2 data from each input record. For example, one of the readers will break field ‘TITLE’ down into a record containing the fields ‘anchor’ and ‘file’, and another reader will break field ‘PARA’ down into a record containing ‘author’, ‘date’ and ‘file’.
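The second-level split in Step 2 amounts to turning the lines of one first-level field into named sub-fields. This minimal Python sketch assumes each content line has the layout "sub-header, whitespace, value", as in `anchor data`. The function name is hypothetical.

```python
def split_subfields(field_lines):
    """Turn the content of one first-level field, e.g.
    ["anchor data", "file data"], into a second-level record
    {"anchor": "data", "file": "data"}."""
    record = {}
    for line in field_lines:
        # Assumed layout: sub-header name, whitespace, then the value.
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        record[name] = value
    return record


title = split_subfields(["anchor data", "file data"])
para = split_subfields(["author data", "date data", "file data"])
print(title)  # {'anchor': 'data', 'file': 'data'}
print(para)   # {'author': 'data', 'date': 'data', 'file': 'data'}
```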

Step 3: Each <> writes its data to the appropriate table using <>.
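Step 3 writes one table per first-level heading. The sketch below uses Python's built-in `sqlite3` as a stand-in for whatever database the writer component targets. The table and column names are hypothetical, and sub-fields missing from a record are written as NULL.

```python
import sqlite3

# Hypothetical schema: one table per first-level heading.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title (anchor TEXT, file TEXT)")
conn.execute("CREATE TABLE para (author TEXT, date TEXT, file TEXT)")


def write_record(conn, table, columns, record):
    """Insert one second-level record; sub-fields missing from the
    record are stored as NULL. Table/column names must come from
    trusted code, since they are interpolated into the SQL."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.execute(sql, [record.get(col) for col in columns])


write_record(conn, "title", ["anchor", "file"], {"anchor": "data", "file": "data"})
write_record(conn, "para", ["author", "date", "file"], {"author": "data", "file": "data"})
conn.commit()
print(conn.execute("SELECT * FROM para").fetchall())  # [('data', None, 'data')]
```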

Now I have two questions.
First: some of the headings, such as ‘FOOTNOTE’, may or may not be present, and when they are absent the processing fails with a ‘too few records’ error. I tried setting ‘nullable’ to true for such fields in the metadata, but the error persisted. How do I tell Clover metadata that a field is optional?
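The Clover metadata setting itself is product-specific and not shown here. The behaviour being asked for can be sketched in Python, though: a record with a fixed set of first-level fields in which a missing heading is stored as null instead of causing an error. The field tuple and function name are illustrative assumptions.

```python
FIELDS = ("HEADING", "TITLE", "PARA", "FOOTNOTE")


def to_fixed_record(groups):
    """Map {heading: content_lines} onto a fixed tuple of fields.
    Headings that are absent become None instead of raising an error."""
    return tuple(
        "\n".join(groups[name]) if name in groups else None
        for name in FIELDS
    )


# A record with no TITLE and no FOOTNOTE section.
print(to_fixed_record({"HEADING": ["data"], "PARA": ["author data", "date data"]}))
# ('data', None, 'author data\ndate data', None)
```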

Second: my second-level Universal Readers have their input set to something like this:

port:$0.field2:discrete

but in each of these readers I also want to pick up

port:$0.field1

(which is the unique identifier), so that each second-level record carries an ID identifying its parent record. How can I do this?
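Carrying the parent ID is a fan-out: every child row produced from one first-level record gets a copy of that record's identifier. In this Python sketch, the HEADING value is assumed to be the unique identifier (field1 in the question), and the record values ("rec-1", "a1", and so on) are made-up placeholders.

```python
def fan_out(record):
    """Split one first-level record into (table, row) pairs, copying the
    parent's identifier into every child row as a foreign key."""
    parent_id = record["HEADING"]  # assumed unique per record (field1)
    children = []
    for heading, subfields in record.items():
        if heading == "HEADING":
            continue
        row = {"parent_id": parent_id, **subfields}
        children.append((heading.lower(), row))
    return children


# Placeholder record, already split into second-level sub-fields.
record = {
    "HEADING": "rec-1",
    "TITLE": {"anchor": "a1", "file": "f1"},
    "PARA": {"author": "someone", "date": "d1", "file": "f2"},
}
for table, row in fan_out(record):
    print(table, row)
```

Absent optional sections (here FOOTNOTE) simply produce no child row.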

Maybe my whole approach is incorrect; would you be able to suggest a better one?
CloverETL Designer Community Version: 3.5.0.058

imriskal · April 17, 2014, 12:23pm

Hi,

The commercial Designer has a special component, ComplexDataReader, for non-homogeneous data like this. It serves exactly this purpose.

http://doc.cloveretl.com/documentation/ … eader.html

It would be quite difficult and inconvenient to emulate the functionality of this component with UniversalDataReaders and Reformats. Instead, I would read the input file using metadata with a single string field and the newline character as the record delimiter, and then parse the lines with a set of rules in a Reformat. You will probably also need a few global variables to store the headers you have already read.
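The line-by-line approach can be sketched in Python (not CTL, and not ComplexDataReader). Each line is one input record, and two state variables play the role of the global variables that remember which headers have been read. The sketch assumes the HEADING section is a single line that serves as the parent ID, and that sub-header and value are separated by a space. "id-1" and "id-2" are placeholder IDs.

```python
HEADINGS = {"HEADING", "TITLE", "PARA", "FOOTNOTE"}


def parse_lines(lines):
    """Single pass over the file, one line per input record, yielding
    flat rows: (parent ID, first-level header, sub-header, value)."""
    current_heading = None  # last first-level header seen
    current_id = None       # content of the most recent HEADING section
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line in HEADINGS:
            current_heading = line
            continue
        if current_heading == "HEADING":
            current_id = line  # remember the parent identifier
            continue
        name, _, value = line.partition(" ")
        yield (current_id, current_heading, name, value)


sample = ["HEADING", "id-1", "", "PARA", "author data", "date data", "",
          "FOOTNOTE", "citation data", "", "HEADING", "id-2", "",
          "TITLE", "anchor data"]
for row in parse_lines(sample):
    print(row)
```

Each emitted row can then be routed on its first-level header to the matching output table, and optional sections need no special handling because a missing header just emits no rows.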

Regards,
