I am trying to parse an ASCII file (example below) that has two levels of grouping. The first level consists of headers such as ‘HEADING’, ‘TITLE’, ‘PARA’ and ‘FOOTNOTE’, and within each of these groups there are sub-headers such as ‘author’, ‘date’, etc.
Each record always starts with ‘HEADING’. The sequence of the headers is fixed, but individual headers may be absent (so after HEADING, TITLE could be missing and the next header would then be PARA).
HEADING
data
TITLE
anchor data
file data
PARA
author data
date data
file data
FOOTNOTE
citation data
HEADING
data
TITLE
anchor data
file data
...
...
Here is how I am processing this:
Step 1: <> reads the file and creates records with each first-level header’s content as a separate field. Since each record always starts with ‘HEADING’, I am using that as the record-level delimiter (and the other headers, like FOOTNOTE, as field-level delimiters).
Step 2: <> feeds multiple <> that extract the level-2 data from each input record (so one of the readers will break down, say, the field ‘TITLE’ into a record containing the fields ‘anchor’ and ‘file’; another reader will break down the field ‘PARA’ into a record containing ‘author’, ‘date’ and ‘file’).
Step 3: Each <> writes data to the appropriate table using <>.
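To make the intent of Step 1 concrete, here is a minimal Python sketch of the grouping logic, written outside CloverETL. It is only an illustration of the behaviour I am after, not Clover code: the function name `split_records` and the dict-of-lists record shape are my own assumptions. The point is that a header that is missing from a record simply does not appear in it, instead of causing a failure.

```python
# Hypothetical sketch (not CloverETL API): group lines into records on HEADING
# and collect the data lines under each first-level header.
HEADERS = ("HEADING", "TITLE", "PARA", "FOOTNOTE")


def split_records(lines):
    """Return a list of records; each record maps a header to its data lines."""
    records = []
    current = None  # record currently being built
    section = None  # first-level header whose lines are being collected
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "HEADING":
            # HEADING always opens a new record
            current = {}
            records.append(current)
        elif current is None:
            raise ValueError("input must start with HEADING")
        if line in HEADERS:
            # start collecting data for this header
            section = line
            current[section] = []
        else:
            # plain data line: attach it to the header seen most recently
            current[section].append(line)
    return records
```

With this shape, an absent header can be checked with `"FOOTNOTE" in record` (or `record.get("FOOTNOTE")`), which is roughly the "optional field" behaviour I would like from the metadata.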
Now I have 2 questions:
Some of the headings, like ‘FOOTNOTE’, may or may not be present. When they are not present, processing fails with the error ‘too few records’. In the metadata, I tried setting ‘nullable’ to true for such fields, but the error persisted. How do I tell the Clover metadata that a field is optional?
My second-level Universal Readers have their input set to something like this:
port:$0.field2:discrete
but in each of these readers I also want a portion of
port:$0.field1
(which is the unique identifier). This is to ensure that each second-level record has a unique ID identifying its parent record.
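As an illustration of what I mean by carrying the parent ID into each second-level record, here is a small Python sketch, again outside CloverETL. Everything in it is an assumption made for the example: the function name `child_row`, a parent record represented as a dict mapping each first-level header to its list of data lines, the unique identifier being the first token of the HEADING data, and each sub-header line having the form "name value".

```python
# Hypothetical sketch (not CloverETL API): build a second-level row that
# carries the identifier of its parent record.
def child_row(record, section, fields):
    """Return one row for `section` tagged with the parent ID, or None if absent."""
    if section not in record:
        return None  # optional first-level header is missing from this record
    # Assumption: the first token of the HEADING data is the unique identifier.
    parent_id = record["HEADING"][0].split()[0]
    row = {"parent_id": parent_id}
    for name in fields:
        row[name] = None  # sub-headers that do not occur stay empty
    for line in record[section]:
        # Assumption: each line is "<sub-header> <value>"
        key, _, value = line.partition(" ")
        if key in fields:
            row[key] = value
    return row
```

For example, `child_row(record, "PARA", ("author", "date", "file"))` would give a row with `parent_id`, `author`, `date` and `file`, which is the shape I would like each second-level reader to produce.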
Maybe my whole approach is incorrect; would you be able to suggest a better one?
CloverETL Designer Community Version: 3.5.0.058