Some problems with EXT_SORT

I have a problem with EXT_SORT: records whose sortKey value repeats are not sorted correctly. Can it be changed to handle duplicate keys?
DELIMITED_DATA_READER → EXT_SORT → DELIMITED_DATA_WRITER

source.txt has two fields, field0 and field1, and I sort on field0. However, field0 contains duplicate values, such as 0000000.

Below are the contents of source.txt:
0000000;005
0000001;006
0000000;007
0000002;004

Output in target_sort.txt:
0000000;007
0000001;006
0000002;004
0000000;005

But the result I expected is:
0000000;005
0000000;007
0000001;006
0000002;004
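For comparison, the expected ordering above is simply a stable sort on field0. The plain-Java sketch below is not CloverETL's EXT_SORT implementation (the class and method names are made up for illustration); it parses the four sample records and sorts them by field0 only. Because `List.sort` is guaranteed to be stable, records with the same key keep their input order, so 0000000;005 stays ahead of 0000000;007.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortWithDuplicateKeys {
    /** One delimited record: field0;field1. */
    record Rec(String field0, String field1) {
        @Override
        public String toString() {
            return field0 + ";" + field1;
        }
    }

    /** Parses "field0;field1" lines and sorts them by field0 only. */
    static List<Rec> sortByField0(List<String> lines) {
        List<Rec> records = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(";", 2);
            records.add(new Rec(parts[0], parts[1]));
        }
        // List.sort is stable: records with equal field0 keep their input order.
        records.sort(Comparator.comparing(Rec::field0));
        return records;
    }

    public static void main(String[] args) {
        List<String> source = List.of("0000000;005", "0000001;006", "0000000;007", "0000002;004");
        sortByField0(source).forEach(System.out::println);
    }
}
```

With clean input, duplicate keys are no obstacle: they just end up next to each other.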

I tried 2.5.0 version, this issue has been resolved :slight_smile:

1) But I also found other problems. Metadata does not support field names containing Unicode (non-ASCII) characters. The name is validated here:

    public void setName(String _name) {
        if (!StringUtils.isValidObjectName(_name)) {
            throw new InvalidGraphObjectNameException(_name, "FIELD");
        }
        this.name = _name;
    }

    private final static String OBJECT_NAME_PATTERN = "[_A-Za-z]+[_A-Za-z0-9]*";
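To see why a non-ASCII field name is rejected, the sketch below tests a few names against the quoted OBJECT_NAME_PATTERN and against a hypothetical Unicode-aware variant built on the `\p{L}` (any letter) and `\p{Nd}` (decimal digit) classes of `java.util.regex`. It assumes the validator requires the whole name to match; the variant is only an illustration, not a pattern CloverETL ships.

```java
import java.util.regex.Pattern;

public class FieldNamePatterns {
    // The pattern quoted above: ASCII letters, digits and underscore only.
    static final Pattern ASCII_ONLY = Pattern.compile("[_A-Za-z]+[_A-Za-z0-9]*");

    // Hypothetical Unicode-aware variant: a letter (\p{L}) or underscore first,
    // then letters, decimal digits (\p{Nd}) or underscores.
    static final Pattern UNICODE_AWARE = Pattern.compile("[_\\p{L}][_\\p{L}\\p{Nd}]*");

    /** Assumes the validator requires the whole name to match. */
    static boolean isValid(Pattern pattern, String name) {
        return pattern.matcher(name).matches();
    }

    public static void main(String[] args) {
        // The second name is two Chinese characters plus a digit; "0field" starts with a digit.
        String[] names = {"field0", "\u5b57\u6bb50", "0field"};
        for (String name : names) {
            System.out.println(name + " ascii=" + isValid(ASCII_ONLY, name)
                    + " unicode=" + isValid(UNICODE_AWARE, name));
        }
    }
}
```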

2) Access databases do not support the setHoldability method:

connection.setHoldability(ResultSet.CLOSE_CURSORS_AT_COMMIT);
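One defensive pattern is to treat `setHoldability` as optional rather than required. The sketch below is a hypothetical helper, not CloverETL code: it catches `SQLException` (which includes `SQLFeatureNotSupportedException`) and, as an extra assumption about how some drivers may signal this, the unchecked `UnsupportedOperationException`. A `java.lang.reflect.Proxy` stands in for a driver that refuses the call, so the example runs without an Access database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

public class HoldabilityGuard {
    /** Requests CLOSE_CURSORS_AT_COMMIT; returns false if the driver rejects it. */
    static boolean trySetHoldability(Connection connection) {
        try {
            connection.setHoldability(ResultSet.CLOSE_CURSORS_AT_COMMIT);
            return true;
        } catch (SQLException | UnsupportedOperationException e) {
            // SQLFeatureNotSupportedException is a subclass of SQLException;
            // the unchecked case covers drivers that report it differently (assumption).
            return false;
        }
    }

    /** Fake connection for the demo: every method throws SQLFeatureNotSupportedException. */
    static Connection unsupportedConnection() {
        return (Connection) Proxy.newProxyInstance(
                HoldabilityGuard.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, methodArgs) -> {
                    throw new SQLFeatureNotSupportedException(method.getName());
                });
    }

    public static void main(String[] args) {
        System.out.println(trySetHoldability(unsupportedConnection()));
    }
}
```

The graph can then keep running with the driver's default holdability instead of failing on the call.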

Sorry, the DELIMITED_DATA_READER → EXT_SORT → DELIMITED_DATA_WRITER problem has not actually been resolved.
===========================================
I found that the problem appears when DELIMITED_DATA_READER reads a UTF-8 encoded file. The source file is encoded in UTF-8; the affected record is 0000000;007.

I ran a test:
DELIMITED_DATA_READER (source file encoding: UTF-8) → DELIMITED_DATA_WRITER (target file encoding: GB2312)

The output contains an extra question mark:
UNMAPPABLE[1] when converting to GB2312: ‘?0000000;007’
=============================================
I think the problem may be in how EXT_SORT reads the encoded data.

Don’t you have an “invisible” character before the record? I can’t reproduce the problem.
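Since the file was saved as UTF-8 from Windows Notepad, one likely candidate for such an invisible character is the byte order mark: the bytes EF BB BF at the start of the file. Java's UTF-8 decoder does not drop it; it decodes to the character U+FEFF, which then sits in front of the first field0 value. The sketch below (illustrative class and method names) shows both symptoms under that assumption: the key gains an extra leading character, and because U+FEFF compares greater than '0', that record sorts after 0000002 instead of next to the other 0000000 record. GB2312 has no mapping for U+FEFF, which would also be consistent with the '?' in the UNMAPPABLE message.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    /** Prepends the UTF-8 byte order mark (EF BB BF), as Notepad's "Save As... UTF-8" does. */
    static byte[] withBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    /** Decodes a "field0;field1" line as UTF-8 and returns field0. */
    static String decodedField0(byte[] line) {
        // The UTF-8 decoder keeps the BOM as the character U+FEFF.
        return new String(line, StandardCharsets.UTF_8).split(";", 2)[0];
    }

    public static void main(String[] args) {
        String field0 = decodedField0(withBom("0000000;005"));
        System.out.printf("first char U+%04X, length %d%n", (int) field0.charAt(0), field0.length());
        // U+FEFF compares greater than '0', so this key sorts after "0000002".
        System.out.println(field0.compareTo("0000002") > 0);
    }
}
```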

I have sent the relevant information to your mailbox; could you please take a look?

Hi,
there are some “invisible” characters in your source file, and they cause both problems. To make the graph work properly, put a Reformat component after the reader. The transformation should look like this:

function transform() {
	$0.Field0 := replace($0.Field0, '[^\p{ASCII}]*', '');
	$0.Field1 := $0.Field1;
}

This removes the characters that cause the problem.
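Outside CloverETL, the same cleanup can be written with `String.replaceAll`, which supports the same `\p{ASCII}` POSIX class. This sketch (illustrative names) also shows a narrower, hypothetical alternative that removes only a leading U+FEFF. The difference matters if field values may legitimately contain non-ASCII text, because the ASCII filter deletes that as well.

```java
public class StripNonAscii {
    /** Java counterpart of the CTL replace($0.Field0, '[^\p{ASCII}]*', ''). */
    static String stripNonAscii(String value) {
        return value.replaceAll("[^\\p{ASCII}]+", "");
    }

    /** Hypothetical narrower fix: remove only a leading U+FEFF. */
    static String stripLeadingBom(String value) {
        return value.startsWith("\uFEFF") ? value.substring(1) : value;
    }

    public static void main(String[] args) {
        String field0 = "\uFEFF0000000"; // key as read from the file, BOM included
        System.out.println(stripNonAscii(field0));   // 0000000
        System.out.println(stripLeadingBom(field0)); // 0000000

        // The two differ on legitimate non-ASCII text (two Chinese characters + "A"):
        String chinese = "\u5b57\u6bb5A";
        System.out.println(stripNonAscii(chinese).length());   // 1: only "A" is left
        System.out.println(stripLeadingBom(chinese).length()); // 3: unchanged
    }
}
```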

The original file was created in Windows Notepad with Save As, choosing UTF-8 encoding. I think this should be handled by an attribute of the DELIMITED_DATA_READER node: similar to the trim option, it would remove all “invisible” characters, for example:

<DELIMITED_DATA_READER removeInvisibleCharacters="true"/>
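The proposed attribute does not say exactly what it would strip. Assuming the only invisible character is a leading byte order mark, a reader-level version could look like the plain-Java sketch below (not a CloverETL component; `BomSkippingReader` and its methods are hypothetical names): wrap the UTF-8 reader in a `PushbackReader`, read the first character, and push it back unless it is U+FEFF. Unlike the Reformat workaround, this leaves legitimate non-ASCII data untouched.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BomSkippingReader {
    /** Opens a UTF-8 reader that silently drops a leading U+FEFF, if present. */
    static Reader openUtf8(InputStream in) throws IOException {
        PushbackReader reader =
                new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != -1 && first != '\uFEFF') {
            reader.unread(first); // not a BOM: put the character back
        }
        return reader;
    }

    /** Reads all lines through the BOM-skipping reader. */
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(openUtf8(in))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "\uFEFF0000000;005\n0000001;006\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(file))) {
            System.out.println(line + " (length " + line.length() + ")");
        }
    }
}
```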