EXT_SORT has a problem: it cannot sort records when the sort key value repeats across records. Can it be changed so that sorting on duplicate key values works?
DELIMITED_DATA_READER → EXT_SORT → DELIMITED_DATA_WRITER
source.txt has two fields, field0 and field1. I sort on field0, but field0 contains duplicate values, such as 0000000.
The file contents are:
0000000;005
0000001;006
0000000;007
0000002;004
I tried version 2.5.0, and this issue has been resolved there.
1) But I also found other problems. Field names in the metadata document cannot contain Unicode (non-ASCII) characters:
public void setName(String _name) {
    if (!StringUtils.isValidObjectName(_name)) {
        throw new InvalidGraphObjectNameException(_name, "FIELD");
    }
    this.name = _name;
}
private final static String OBJECT_NAME_PATTERN = "[_A-Za-z]+[_A-Za-z0-9]*";
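To illustrate why a non-ASCII field name is rejected, here is a minimal standalone sketch that applies the pattern quoted above with `java.util.regex`. It assumes the name must match the pattern as a whole; the class `FieldNameCheck` and the Unicode-aware pattern `UNICODE_NAME` are hypothetical and not part of CloverETL.

```java
import java.util.regex.Pattern;

public class FieldNameCheck {
    // Pattern quoted in the post: ASCII letters, digits and underscore only.
    static final Pattern ASCII_NAME = Pattern.compile("[_A-Za-z]+[_A-Za-z0-9]*");

    // Hypothetical Unicode-aware variant (any letter, then letters or digits).
    // This is an illustration only, not something CloverETL provides.
    static final Pattern UNICODE_NAME = Pattern.compile("[_\\p{L}]+[_\\p{L}\\p{N}]*");

    static boolean isAsciiName(String name) {
        return name != null && ASCII_NAME.matcher(name).matches();
    }

    static boolean isUnicodeName(String name) {
        return name != null && UNICODE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "\u5b57\u6bb50" is a two-character Chinese field name followed by the digit 0.
        for (String name : new String[] {"field0", "\u5b57\u6bb50", "0field"}) {
            System.out.println(name + " ascii=" + isAsciiName(name)
                    + " unicode=" + isUnicodeName(name));
        }
    }
}
```

With these patterns, `field0` passes both checks, the Chinese name passes only the Unicode-aware check, and `0field` fails both because it starts with a digit.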
2) ACCESS databases do not support the setHoldability method.
Sorry, the DELIMITED_DATA_READER → EXT_SORT → DELIMITED_DATA_WRITER problem has not been resolved after all.
===========================================
I found that the problem is in how DELIMITED_DATA_READER reads UTF-8 encoded documents. The source file is encoded as UTF-8 and contains, for example, 0000000;007
I ran a test:
DELIMITED_DATA_READER (source file encoding UTF-8) → DELIMITED_DATA_WRITER (target file encoding GB2312)
The output contains an extra question mark:
UNMAPPABLE[1] when converting to GB2312: '?0000000;007'
=============================================
I think the problem may be in how EXT_SORT reads the encoded data.
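One possible explanation for both symptoms can be sketched in plain Java. The sketch assumes the file starts with a UTF-8 byte order mark (bytes EF BB BF, decoded as U+FEFF), which Windows Notepad commonly writes when saving as UTF-8; the thread does not confirm this. The class `BomSortDemo` and its helpers are hypothetical and only mimic the reader and sort steps.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BomSortDemo {
    // UTF-8 byte order mark as it appears at the start of a file (assumed here).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Java's UTF-8 decoder keeps the BOM as the character U+FEFF.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Sort "key;value" records by key; List.sort is stable, so equal keys keep input order.
    static List<String> sortByKey(List<String> records) {
        List<String> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing((String r) -> r.split(";", 2)[0]));
        return sorted;
    }

    // The sample data from the post, prefixed with the assumed BOM.
    static byte[] sampleFile() {
        byte[] body = "0000000;005\n0000001;006\n0000000;007\n0000002;004"
                .getBytes(StandardCharsets.UTF_8);
        byte[] file = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, file, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, file, UTF8_BOM.length, body.length);
        return file;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(decode(sampleFile()).split("\n"));
        for (String r : sortByKey(records)) {
            // Make the invisible character visible in the output.
            System.out.println(r.replace("\uFEFF", "<U+FEFF>"));
        }
    }
}
```

Under this assumption the first record's key is U+FEFF followed by 0000000, which is not equal to the plain 0000000 key and compares greater than every digit-only key, so that record ends up last instead of next to its duplicate. The same stray character would also have no counterpart in a legacy target encoding, which would be consistent with the question mark in the writer output.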
Hi,
There are some “invisible” characters in your source file, and they cause both problems. To make the graph work properly, you need to put a Reformat component after the reader. The transformation should look like:
The original documents were created in Windows Notepad, using Save As with UTF-8 encoding. Removing these characters should be an attribute setting of the DELIMITED_DATA_READER node.
The attribute would work like a trim function and remove all “invisible” characters.
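The trim-like cleanup requested here could be sketched in plain Java as follows. This is not CloverETL code and not the transformation from the reply; the class `InvisibleCharCleaner` and both method names are hypothetical, and it assumes the "invisible" characters are Unicode format characters such as the byte order mark U+FEFF.

```java
public class InvisibleCharCleaner {
    // Remove a single leading U+FEFF (byte order mark), if present.
    static String stripLeadingBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    // Remove every Unicode format character (category Cf), wherever it occurs.
    // This covers U+FEFF and zero-width characters such as U+200B.
    static String removeInvisible(String s) {
        return s == null ? null : s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String key = "\uFEFF0000000";
        System.out.println(stripLeadingBom(key).equals("0000000"));
        System.out.println(removeInvisible("00\u200B00000").equals("0000000"));
    }
}
```

Stripping only a leading U+FEFF is the more conservative choice, since it leaves format characters inside field values untouched; the broader `removeInvisible` variant is closer to the "remove all invisible characters" behaviour described above.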