Hello,
The following question came up regarding a ‘Character Encoding’ error in the FlatFileReader component.
As far as I can see, if it encounters a ‘character encoding’ problem, it stops.
Example error message:
‘Error when parsing record #22746 field XYZ
Character decoding error occurred. Set correct charset. Current charset is windows-1257’
I know the charset needs to be correct. But in real life, sometimes it is not…
Question 1.) It stops even when set to ‘Controlled mode’ (and the record is not sent to the ‘error’ port)
Even when the ‘Data policy’ of FlatFileReader is set to ‘Controlled’, it stops, and the record is not sent out to the ‘Error’ port.
But in some cases it would be useful to pass these records through, to know the ‘rows’ where the problem was detected.
Is there any way to pass these rows through to the Error port (in ‘Controlled mode’)?
Question 2.) If I set the FlatFileReader ‘Data Policy’ to ‘Lenient’ (in this char encoding case), what happens?
What exactly happens?
As I see it, all the rows ‘pass through’, and it tries to do the charset decoding anyway (‘forced’).
Am I right?
Note: the help does not provide much information about lenient mode: (‘Lenient. This data policy means that incorrect records are only skipped and data parsing continues.’) …
For example, in the ‘char encoding problem’ case it definitely does not skip the record (but maybe that is the right behavior in this case)…
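To illustrate the two behaviors being asked about, here is a small plain-Java sketch. It is not CloverDX's FlatFileReader code, and the class and method names are hypothetical. A java.nio CharsetDecoder configured with CodingErrorAction.REPORT throws on the first bad byte, which resembles a reader that stops; decoding with replacement substitutes U+FFFD and carries on, which is roughly what ‘forced’ decoding would mean. The sketch uses UTF-8 because a lone 0xC0 byte is never valid there; the same decoder settings apply to windows-1257.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeModes {

    // Strict decoding: the first malformed or unmappable byte sequence raises
    // a CharacterCodingException instead of producing text.
    static String decodeStrict(byte[] raw, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    // "Forced" decoding: the String(byte[], Charset) constructor always
    // substitutes the charset's replacement string (U+FFFD) and never throws.
    static String decodeForced(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // True if the bytes decode cleanly under strict rules.
    static boolean decodesCleanly(byte[] raw, Charset cs) {
        try {
            decodeStrict(raw, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 'A', a lone 0xC0 byte (never valid in UTF-8), 'B'
        byte[] raw = {0x41, (byte) 0xC0, 0x42};
        System.out.println(decodesCleanly(raw, StandardCharsets.UTF_8)); // false
        // The bad byte becomes U+FFFD; "A" and "B" survive.
        String forced = decodeForced(raw, StandardCharsets.UTF_8);
        System.out.println(forced.length());
    }
}
```

Whether CloverDX's ‘Lenient’ policy does something like decodeForced internally is exactly the open question here; the sketch only shows what the two options mean at the Java level.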
Thanks, Andrase,
Hello Andrase,
I do agree that the behavior of the system is not correct. Therefore, I have logged a new ticket in our JIRA describing the bug you reported. Our development team will review and address this issue in a future release of CloverDX. Thank you for bringing this to our attention.
Kind regards,
I would like to add a note:
I think it would be good (and in real life very, very helpful!) to have a ‘forced’ checkbox feature for char decoding
(working in both the ‘Controlled’ and ‘Lenient’ cases).
The main reason for this:
Usually the problematic cases are limited to a few fields (usually just one field), and they do not happen all the time (usually in less than 1-2% of all rows) => and mostly the other fields are just fine.
Example fields:
‘Price’: OK; ‘Color’: OK; ‘Size’: OK; ‘Name of item’: OK; ‘Short description of item’: OK;
‘Long description of the item’: wrong decoding.
It is better to load all the rows and all the fields (even with the wrong encoding), as the other fields are fine.
‘Long description of the item’ may contain strange characters… => that is less of a problem; a ‘description’ field is just a description, life goes on… => the important fields (like ‘price’, ‘color’, ‘size’) are fine.
And with one additional FlatFileReader (not in ‘Forced’ mode) we can also give the customer a log of where the problems are (and they can fix them manually in ‘post-processing’, or however they like).
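A minimal sketch of what the proposed per-field ‘forced’ decoding could look like, in plain Java. This is not an existing CloverDX option; ForcedFieldDecoder, decodeRecord and the log format are hypothetical, and the sketch assumes semicolon-delimited records. Each field is decoded strictly; only a field that fails is re-decoded with replacement characters and recorded in a problem log, so ‘price’, ‘color’ and ‘size’ still load cleanly and the customer gets a list of rows to fix.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForcedFieldDecoder {

    // Splits one raw record on a single-byte delimiter before decoding,
    // so a bad byte in one field cannot affect the others.
    static List<byte[]> splitFields(byte[] record, byte delimiter) {
        List<byte[]> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= record.length; i++) {
            if (i == record.length || record[i] == delimiter) {
                fields.add(Arrays.copyOfRange(record, start, i));
                start = i + 1;
            }
        }
        return fields;
    }

    // Decodes each field strictly; a field that fails is decoded again with
    // replacement characters and its position is appended to problemLog.
    static List<String> decodeRecord(byte[] record, int rowNumber, Charset cs, List<String> problemLog) {
        List<String> values = new ArrayList<>();
        List<byte[]> fields = splitFields(record, (byte) ';');
        for (int i = 0; i < fields.size(); i++) {
            byte[] raw = fields.get(i);
            try {
                values.add(cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(raw))
                        .toString());
            } catch (CharacterCodingException e) {
                problemLog.add("row " + rowNumber + ", field " + i);
                values.add(new String(raw, cs)); // forced: bad bytes become U+FFFD
            }
        }
        return values;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] goodPart = "12.50;red;M;Oak table;".getBytes(StandardCharsets.UTF_8);
        out.write(goodPart, 0, goodPart.length);
        // Long description: a stray windows-1257 byte (0xC0) that is invalid in UTF-8
        byte[] badPart = {(byte) 0xC0, 'u', 'o', 'l', 'a', 's'};
        out.write(badPart, 0, badPart.length);

        List<String> problems = new ArrayList<>();
        List<String> values = decodeRecord(out.toByteArray(), 22746, StandardCharsets.UTF_8, problems);
        System.out.println(values.subList(0, 4)); // [12.50, red, M, Oak table]
        System.out.println(problems);             // [row 22746, field 4]
    }
}
```

Splitting on the raw delimiter byte before decoding is only safe for encodings in which that byte cannot occur inside a multi-byte character, such as UTF-8 or single-byte charsets like windows-1257; it would not work for UTF-16.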
Andrase,
Hi Andrase,
thank you for the suggestion. I will pass it on to our development team for consideration.
Best,