Flatfile Reader - Character decoding error - Data Policy - Controlled and Lenient mode question

andras.csore · November 9, 2018, 12:00am

Hello,

The following question came up regarding a 'character decoding' error in the FlatFileReader component.
As far as I can see, when it encounters a character decoding problem, it stops.
Example error message:
‘Error when parsing record #22746 field XYZ
Character decoding error occurred. Set correct charset. Current charset is windows-1257’

I know the charset needs to be correct. But in real life, sometimes it isn't…

Question 1.) It stops even in 'Controlled' mode (and the record is not sent to the 'Error' port)
Even when the FlatFileReader 'Data policy' is set to 'Controlled', the reader stops and the record is not sent out to the 'Error' port.
But in some cases it would be useful to pass such rows through, so we know which rows the problem was detected in.
Is there any way to pass these rows through to the Error port (in 'Controlled' mode)?

Question 2.) If I set the FlatFileReader 'Data Policy' to 'Lenient' (in this character decoding case), what happens?
What happens exactly?
As far as I can see, all the rows pass through, and it tries to decode them with the charset anyway ('forced').
Am I right?

Note: the help does not provide much info about Lenient mode ('Lenient. This data policy means that incorrect records are only skipped and data parsing continues.')…
For example, in the character decoding case it definitely does not skip the record (though maybe that is actually good in this case)…

Thanks, Andrase

Vladimir_Barton · November 16, 2018, 3:32pm

Hello Andrase,
I agree that the behavior of the system is not correct. Therefore, I have logged a new ticket in our JIRA describing the bug you reported. Our development team will review and address this issue in a future release of CloverDX. Thank you for bringing this to our attention.
Kind regards,

andras.csore · November 22, 2018, 9:16am

I'd like to add a note:

I think it would be good (and in real life very, very helpful!) to have a 'forced' checkbox feature for character decoding
(working in both 'Controlled' and 'Lenient' modes).

The main reason for that:
Usually the problem is limited to only a few fields (usually just one field), and it does not happen all the time (usually in less than 1-2% of all the rows) => and the other fields are mostly just fine.

Example fields:
'Price': OK; 'Color': OK; 'Size': OK; 'Name of item': OK; 'Short description of item': OK;
'Long description of the item': wrong decoding.

It is better to load all the rows and all the fields (even with the wrong encoding), since the other fields are just fine.
'Long description of the item' may contain strange characters => a smaller problem, since the 'description' field is just a description, and life goes on => the important fields (like 'price', 'color', 'size') are fine.
And with one additional FlatFileReader (not in 'forced' mode) we can also provide a log to the customer showing where the problems are (and they can fix them manually in post-processing, or however they like).

Andrase,

Vladimir_Barton · November 23, 2018, 7:48am

Hi Andrase,
thank you for the provided suggestion. I will pass this forward to our development team to consider.
Best,
