Heya,
It has been a while since we used it, but I believe “Lenient” will skip the row with formatting errors, “Controlled” will log the record but still skip it, and “Strict” aborts the graph.
The common example given to me by our business folks is when they get a comma separated file (CSV) that was created via Excel (not sure if it’s an older version or not); it tends to not put commas in for empty cells, so when they get data like:
CUSTOMER_ID|LAST_NAME |FIRST_NAME |MIDDLE_INITIAL |ADDRESS
001 |Smith |Joe | |
002 |Jones |Mary |P |
003 |Jackson |Ken |W |Mr.
004 |Hill |Rose | |Mrs.
it gets converted to a CSV file like:
CUSTOMER_ID,LAST_NAME,FIRST_NAME,MIDDLE_INITIAL,ADDRESS
001,Smith,Joe
002,Jones,Mary,P
003,Jackson,Ken,W,Mr.
004,Hill,Rose,Mrs.
With all three policies, rows 001 and 002 get rejected with formatting errors.
Our legacy ETL tool allows us to mark MIDDLE_INITIAL and ADDRESS fields as “not required/treat them as null” - we tend to mark all fields that way. The rows are not rejected and carry through the transform. For our Clover graph, we use a custom transform to parse the record as a single string, break it up into the fields, and then treat the missing fields as NULL (it does log an error and abort the graph if the field in question is required). We get the same results as if the CSV file had been:
CUSTOMER_ID,LAST_NAME,FIRST_NAME,MIDDLE_INITIAL,ADDRESS
001,Smith,Joe,
002,Jones,Mary,P,
003,Jackson,Ken,W,Mr.
004,Hill,Rose,Mrs.
That is the way we use it - I looked at our legacy ETL tool and they provide the following actions:
Global “Missing Columns” Actions:
-
Pad with “null” value: Missing columns are initialized to zero or spaces, depending on the data type.
-
Pad with empty value: Missing columns are initialized to an empty string.
-
Pad with NULL: Missing columns are initialized to NULL.
For each column, you can specify:
Column Contains Insufficient Data to Match the Meta Data
-
Error: Abort the job as soon as such a row is found.
-
Discard + Warn: Discard the current data row and issue a warning.
-
Replace + Warn: Pad the short column in accordance with missing columns action if missing; write a warning to the log file.
-
Retain + Warn: Pass the data on as it is, but issue a warning.
-
Replace: Pad a short column in accordance with missing columns action if missing .
-
Retain: Pass the data on as it is.
Thanks,
Anna