EXT_SORT some problems

hwhwhw · August 26, 2008, 12:00am

I have a problem with EXT_SORT: records whose sortKey value is repeated are not sorted correctly. Can it be changed so that records with duplicate key values are sorted as well? The graph is:
DELIMITED_DATA_READER → EXT_SORT → DELIMITED_DATA_WRITER

source.txt has two fields, field0 and field1. I sort on field0, but field0 contains duplicate values, such as 0000000.

Contents of source.txt:
0000000;005
0000001;006
0000000;007
0000002;004

Actual output in target_sort.txt:
0000000;007
0000001;006
0000002;004
0000000;005

The result I expect is:
0000000;005
0000000;007
0000001;006
0000002;004

hwhwhw · September 5, 2008, 1:09am

I tried version 2.5.0 and this issue has been resolved.

1) However, I found other problems. The metadata does not support Unicode (non-ASCII) field names:
public void setName(String _name) {
    if (!StringUtils.isValidObjectName(_name)) {
        throw new InvalidGraphObjectNameException(_name, "FIELD");
    }
    this.name = _name;
}

private final static String OBJECT_NAME_PATTERN = "[_A-Za-z]+[_A-Za-z0-9]*";
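
For illustration only, here is a minimal sketch of how such a check could accept non-ASCII letters by using Java's Unicode property classes. The class `FieldNameCheck` and the relaxed pattern are hypothetical and not part of the engine; whether the rest of the engine would cope with such field names is a separate question.

```java
import java.util.regex.Pattern;

public class FieldNameCheck {
    // Pattern quoted above: ASCII letters, digits and underscore only.
    static final Pattern ASCII_NAME = Pattern.compile("[_A-Za-z]+[_A-Za-z0-9]*");

    // Hypothetical relaxed pattern: a Unicode letter or underscore first,
    // then any mix of Unicode letters, Unicode digits and underscores.
    static final Pattern UNICODE_NAME = Pattern.compile("[_\\p{L}][_\\p{L}\\p{N}]*");

    static boolean isAsciiName(String name) {
        return ASCII_NAME.matcher(name).matches();
    }

    static boolean isUnicodeName(String name) {
        return UNICODE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Two CJK letters followed by a digit.
        String name = "\u5B57\u6BB50";
        System.out.println("ASCII pattern accepts it:   " + isAsciiName(name));
        System.out.println("Unicode pattern accepts it: " + isUnicodeName(name));
    }
}
```

The relaxed pattern still rejects names that start with a digit, so the intent of the original rule is kept.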

2) Access databases do not support the setHoldability method:

connection.setHoldability(ResultSet.CLOSE_CURSORS_AT_COMMIT);
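
One possible way to tolerate drivers that lack this call is to treat a failure as "not supported" and carry on. The sketch below is an assumption about how that could look, not the engine's actual code; `HoldabilityGuard` and `trySetHoldability` are hypothetical names, and the exact exception a given driver throws may differ.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HoldabilityGuard {
    /**
     * Tries to set CLOSE_CURSORS_AT_COMMIT and reports whether it worked.
     * A SQLException or UnsupportedOperationException from the driver is
     * treated as "holdability not supported" instead of failing the caller.
     */
    static boolean trySetHoldability(Connection connection) {
        try {
            connection.setHoldability(ResultSet.CLOSE_CURSORS_AT_COMMIT);
            return true;
        } catch (SQLException | UnsupportedOperationException e) {
            return false;
        }
    }
}
```

A caller could log the `false` result and continue with the driver's default holdability.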

hwhwhw · September 8, 2008, 6:46am

Sorry, the DELIMITED_DATA_READER → EXT_SORT → DELIMITED_DATA_WRITER problem has not been resolved after all.
===========================================
I found that the problem appears when DELIMITED_DATA_READER reads UTF-8 encoded files. The source file is encoded in UTF-8 and contains, for example, 0000000;007

I ran a test:
DELIMITED_DATA_READER (source file encoding UTF-8) -> DELIMITED_DATA_WRITER (target file encoding GB2312)

The output contains an extra question mark:
UNMAPPABLE[1] when converting to GB2312: '?0000000;007'
=============================================
I think the problem may be in the way EXT_SORT reads the encoded data.

avackova · September 8, 2008, 8:15am

Don’t you have an “invisible” character before the record? I can’t reproduce the problem.

hwhwhw · September 8, 2008, 9:35am

I have sent the relevant files to your mailbox. Please take a look.

avackova · September 9, 2008, 7:36am

Hi,
there are some “invisible” characters in your source file, and they cause both problems. To make the graph work properly, you need to put a Reformat component after the reader. The transformation should look like this:

function transform() {
	$0.Field0 := replace($0.Field0, '[^\p{ASCII}]*', '');
	$0.Field1 := $0.Field1;
}

- this removes the characters that cause the problem.
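
To see why a single invisible character disturbs the sort, the following Java sketch reproduces the effect outside the graph. It assumes the invisible character is U+FEFF in front of the first record (the thread does not name the character), and it sorts whole record strings rather than only field0 to keep the example short. `InvisibleCharDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InvisibleCharDemo {
    // Same regular expression as in the transformation above:
    // drop every character outside the ASCII range.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\p{ASCII}]*", "");
    }

    // Returns a sorted copy using natural String order (UTF-16 code units).
    static List<String> sortedCopy(List<String> records) {
        List<String> copy = new ArrayList<>(records);
        copy.sort(null);
        return copy;
    }

    public static void main(String[] args) {
        // Assumption: U+FEFF precedes the first record of the file.
        List<String> raw = Arrays.asList(
            "\uFEFF0000000;005", "0000001;006", "0000000;007", "0000002;004");

        // U+FEFF compares greater than any digit, so the first record sorts last.
        System.out.println(sortedCopy(raw));

        List<String> cleaned = new ArrayList<>();
        for (String record : raw) {
            cleaned.add(stripNonAscii(record));
        }
        // After cleaning, both 0000000 records come first.
        System.out.println(sortedCopy(cleaned));
    }
}
```

Note that this regular expression also removes legitimate non-ASCII text, so it is only suitable for fields that are expected to be pure ASCII.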

hwhwhw · September 9, 2008, 8:21am

The original file was created in Windows Notepad using Save As with UTF-8 encoding. I think this handling should be an attribute of the DELIMITED_DATA_READER node, similar to the trim attribute, that removes all “invisible” characters, for example:

<DELIMITED_DATA_READER removeInvisibleCharacters="true"/>
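
Files saved from Notepad as UTF-8 usually begin with a byte order mark (bytes EF BB BF), which Java decodes as the character U+FEFF rather than dropping it. Assuming that is the invisible character here, a narrower fix than removing all non-ASCII text is to strip only a leading U+FEFF after decoding. This is a sketch in plain Java, not an existing reader option; `BomStrip` and `decodeWithoutBom` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class BomStrip {
    static final char BOM = '\uFEFF';

    // Decodes UTF-8 bytes and drops a single leading byte order mark, if present.
    static String decodeWithoutBom(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // EF BB BF followed by the ASCII text "0000000;007".
        byte[] body = "0000000;007".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);

        System.out.println(decodeWithoutBom(withBom));
    }
}
```

Because only the first character is inspected, any non-ASCII data in the fields themselves is left untouched.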
