I have a flat file with content like this (CSV):
field_A;field_B
1;a
2;b
3;c
4;d
5;e
6;f
7;g
I need to convert this file into a flat file with (for instance) three sets of columns:
field_A1;field_B1;field_A2;field_B2;field_A3;field_B3
1;a;4;d;7;g
2;b;5;e;;
3;c;6;f;;
Normally, to achieve this I perform the following steps:
1. Check the number of records/lines in the file.
2. If the number of records/lines is not divisible by 3 (in my case, but it can be any number), append the missing blank records/lines (in my case 2).
3. Take the first third of the records (lines) and put them into column set no. 1, take the second third and put them into column set no. 2, and take the last third and put them into column set no. 3.
Now we have a structure like this (copy of the example above):
field_A1;field_B1;field_A2;field_B2;field_A3;field_B3
1;a;4;d;7;g
2;b;5;e;;
3;c;6;f;;
Does anybody have an idea how to do this in CloverETL? At present I have to do this in an external tool.
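For reference, the three steps above can be sketched in plain Java outside of CloverETL. This is only an illustration of the reshaping itself, not a CloverETL graph: the class name ColumnSets, the method reshape and the emptyRecord parameter (";" for a blank two-field record) are hypothetical names chosen for the example, and the header line is assumed to be handled separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CloverETL: reshapes data lines (no header)
// into `sets` side-by-side column sets.
final class ColumnSets {

    static List<String> reshape(List<String> lines, int sets, String emptyRecord) {
        // Steps 1 and 2: count the records and pad with blank records
        // until the count is divisible by the number of column sets.
        List<String> padded = new ArrayList<>(lines);
        while (padded.size() % sets != 0) {
            padded.add(emptyRecord);
        }
        // Step 3: output row r joins record r of each consecutive block.
        int rows = padded.size() / sets;
        List<String> out = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            StringBuilder row = new StringBuilder();
            for (int s = 0; s < sets; s++) {
                if (s > 0) {
                    row.append(';');
                }
                row.append(padded.get(s * rows + r));
            }
            out.add(row.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1;a", "2;b", "3;c", "4;d", "5;e", "6;f", "7;g");
        for (String row : reshape(input, 3, ";")) {
            System.out.println(row);
        }
    }
}
```

With the seven sample records and three column sets this prints the three data rows of the target layout shown above.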
Uff, I’ve got it, although it was not easy at all.
The attached graph implements the following algorithm:
- read the input data
- add a column number to each input record
- sort the records according to this number
- format each group of records into one record
  - add the empty record if needed
- store the data in the new format

Parallizing.grf
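The attached graph itself is not reproduced here, so the following is only a rough plain-Java sketch of the number, sort and group idea from the list above. It assumes the number added to each record is the record index modulo the number of output rows; the class KeyedDenormalize and the method denormalize are hypothetical names, not code from Parallizing.grf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the transformation from the attached graph.
final class KeyedDenormalize {

    static List<String> denormalize(List<String> records, int sets, String emptyRecord) {
        // Number of output rows: record count divided by column sets, rounded up.
        int rows = (records.size() + sets - 1) / sets;
        // "Add a number" and "sort": the TreeMap orders groups by key,
        // and each group keeps its records in input order.
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (int i = 0; i < records.size(); i++) {
            int key = i % rows; // assumed numbering scheme
            List<String> group = groups.get(key);
            if (group == null) {
                group = new ArrayList<>();
                groups.put(key, group);
            }
            group.add(records.get(i));
        }
        // "Format each group to one record", adding empty records if needed.
        List<String> out = new ArrayList<>();
        for (List<String> group : groups.values()) {
            while (group.size() < sets) {
                group.add(emptyRecord);
            }
            out.add(String.join(";", group));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1;a", "2;b", "3;c", "4;d", "5;e", "6;f", "7;g");
        for (String row : denormalize(input, 3, ";")) {
            System.out.println(row);
        }
    }
}
```

Grouping by key avoids padding the input up front: only groups that end up short are filled with empty records.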
Thank you for your fast response, but I get an error when I execute the graph:
INFO [main] - *** CloverETL framework/transformation graph, (c) 2002-2011 Javlin a.s, released under GNU Lesser General Public License ***
INFO [main] - Running with CloverETL library version 3.1.0 build#17 compiled 16/06/2011 16:06:35
INFO [main] - Running on 2 CPU(s), OS Windows XP, architecture x86, Java version 1.6.0_26, max available memory for JVM 253440 KB
INFO [main] - Loading default properties from: defaultProperties
INFO [main] - Graph definition file: graph/Parallizing.grf
INFO [main] - Graph revision: 1.48 Modified by: user Modified: Thu Jun 30 17:32:23 CEST 2011
INFO [main] - Checking graph configuration...
INFO [main] - Graph configuration is valid.
INFO [main] - Graph initialization (Parallizing)
INFO [main] - [Clover] Initializing phase: 0
INFO [main] - Compiling dynamic class FormatInput...
ERROR [main] - Error during graph initialization !
Element [1309427766212:Parallizing]-Phase 0 can't be initilized.
at org.jetel.graph.TransformationGraph.init(TransformationGraph.java:458)
at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:202)
at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:165)
at org.jetel.main.runGraph.runGraph(runGraph.java:364)
at org.jetel.main.runGraph.main(runGraph.java:328)
Caused by: DENORMALIZER0 ...FATAL ERROR !
Reason: Used Java Platform doesn't provide any java compiler!
at org.jetel.graph.Phase.init(Phase.java:174)
at org.jetel.graph.TransformationGraph.init(TransformationGraph.java:456)
... 4 more
Caused by: java.lang.IllegalStateException: Used Java Platform doesn't provide any java compiler!
at org.jetel.util.compile.DynamicCompiler.compile(DynamicCompiler.java:109)
at org.jetel.util.compile.DynamicJavaClass.instantiate(DynamicJavaClass.java:66)
at org.jetel.component.Denormalizer.createDenormalizerDynamic(Denormalizer.java:216)
at org.jetel.component.Denormalizer.createRecordDenormalizer(Denormalizer.java:269)
at org.jetel.component.Denormalizer.init(Denormalizer.java:241)
at org.jetel.graph.Phase.init(Phase.java:165)
... 5 more
Hi,
are you running CloverETL with a JRE or a JDK? A JDK is required to run Java transformations, and such a transformation is used in the graph provided by Agata.
Best regards,
Jaro
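A quick way to see whether a given Java installation ships a compiler is the standard javax.tools API. This is a general JDK check, shown under the assumption that it reflects what the engine needs; it is not CloverETL's own code, and the class name CompilerCheck is made up for the example.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical stand-alone check, not part of CloverETL.
final class CompilerCheck {

    static boolean hasSystemCompiler() {
        // getSystemJavaCompiler() returns null when the running platform
        // provides no compiler, which is typical for a plain JRE.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("compiler available: " + hasSystemCompiler());
    }
}
```

Running it with the same java executable that starts CloverETL shows which installation is actually being used.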
I set the path to the JDK and it works. I tried to understand the Java code in the given example, in the object named ‘Format many to one’ (component type: Denormalizer), and I think the documentation is lacking. For instance, I can’t find references for classes like DataFormatter or ByteArrayOutputStream. I googled for cloveretl DataFormatter and couldn’t find any information.
I am dealing mainly with UTF-8, and when I use non-‘English’ characters in the source file (encoded as UTF-8) I get an error (I set ‘Denormalize source set’ to utf-8):
ERROR [WatchDog] - Node DENORMALIZER0 finished with status: Error occurred in nested transformation: ERROR caused by: Message: Denormalization failed! caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)
Below is the full content of my example data file:
field_A;field_B
1;a
2;b
3;c
4;d
5;e
6;f
7;g zażółć gęślą jaźń a koń pędź
Well, I checked the input data (debug) on the DENORMALIZER0 object and it is OK, but there is no data at the output. To me it seems to be a problem with a class that can’t handle multibyte characters (when I remove all the ‘Polish’ characters it works properly).
Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: ‘field_B’) to ISO-8859-1. (original cause: Input length = 1)
Please change the charset on the Writer (see UniversalDataWriter.png).
The charset in the Denormalizer is used just for decoding the external source of the transformation.
Of course I did that, and the error still exists. In my opinion the problem is in node DENORMALIZER0. Please try with my input file.
INFO [main] - *** CloverETL framework/transformation graph, (c) 2002-2011 Javlin a.s, released under GNU Lesser General Public License ***
INFO [main] - Running with CloverETL library version 3.1.0 build#17 compiled 16/06/2011 16:06:35
INFO [main] - Running on 2 CPU(s), OS Windows XP, architecture x86, Java version 1.6.0_21, max available memory for JVM 253440 KB
INFO [main] - Loading default properties from: defaultProperties
INFO [main] - Graph definition file: graph/Parallizing.grf
INFO [main] - Graph revision: 1.66 Modified by: informatyk Modified: Wed Jul 13 13:23:27 CEST 2011
INFO [main] - Checking graph configuration...
INFO [main] - Graph configuration is valid.
INFO [main] - Graph initialization (Parallizing)
INFO [main] - [Clover] Initializing phase: 0
INFO [main] - Compiling dynamic class FormatInput...
INFO [main] - Dynamic class FormatInput successfully compiled and instantiated.
INFO [main] - [Clover] phase: 0 initialized successfully.
INFO [main] - register MBean with name:org.jetel.graph.runtime:type=CLOVERJMX_1309427766212_0
INFO [WatchDog] - Starting up all nodes in phase [0]
INFO [WatchDog] - Successfully started all nodes in phase!
ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node DENORMALIZER0 finished with status: Error occurred in nested transformation: ERROR caused by: Message: Denormalization failed! caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)
Record: #0|field_A|S->7
#1|field_B|S->g zażółć gęślą jaźń a koń pędź
#2|key|i->0
ERROR [WatchDog] - Node DENORMALIZER0 error details:
org.jetel.exception.TransformException: Message: Denormalization failed! caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)
Record: #0|field_A|S->7
#1|field_B|S->g zażółć gęślą jaźń a koń pędź
#2|key|i->0
at org.jetel.component.denormalize.DataRecordDenormalize.appendOnError(DataRecordDenormalize.java:54)
at org.jetel.component.Denormalizer.processInput(Denormalizer.java:381)
at org.jetel.component.Denormalizer.execute(Denormalizer.java:452)
at org.jetel.graph.Node.run(Node.java:425)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)
Record: #0|field_A|S->7
#1|field_B|S->g zażółć gęślą jaźń a koń pędź
#2|key|i->0
at org.jetel.data.formatter.DataFormatter.write(DataFormatter.java:263)
at FormatInput.append(FormatInput.java from JavaSourceFileObject:57)
at org.jetel.component.Denormalizer.processInput(Denormalizer.java:379)
... 3 more
Caused by: java.nio.charset.UnmappableCharacterException: Input length = 1
at java.nio.charset.CoderResult.throwException(Unknown Source)
at java.nio.charset.CharsetEncoder.encode(Unknown Source)
at org.jetel.data.DataField.toByteBuffer(DataField.java:278)
at org.jetel.data.formatter.DataFormatter.write(DataFormatter.java:228)
... 5 more
INFO [WatchDog] - [Clover] Post-execute phase finalization: 0
INFO [WatchDog] - [Clover] phase: 0 post-execute finalization successfully.
INFO [WatchDog] - Execution of phase [0] finished with error - elapsed time(sec): 0
ERROR [WatchDog] - !!! Phase finished with error - stopping graph run !!!
INFO [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB)
INFO [WatchDog] - 0 ERROR 0 15867
INFO [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO [WatchDog] - WatchDog thread finished - total execution time: 5 (sec)
INFO [main] - Freeing graph resources.
ERROR [main] - Execution of graph failed !
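The bottom of the stack trace (CharsetEncoder.encode throwing UnmappableCharacterException) can be reproduced with the JDK alone. The sketch below is independent of CloverETL; the class name EncodeCheck and the method encodes are hypothetical, and the Polish sample text is written with Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical stand-alone check, not part of CloverETL.
final class EncodeCheck {

    /** Returns true if every character of text can be encoded in the given charset. */
    static boolean encodes(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            // UnmappableCharacterException is a subclass of CharacterCodingException.
            return false;
        }
    }

    public static void main(String[] args) {
        String polish = "za\u017c\u00f3\u0142\u0107"; // "zażółć"
        System.out.println("ISO-8859-1: " + encodes(polish, "ISO-8859-1"));
        System.out.println("UTF-8: " + encodes(polish, "UTF-8"));
    }
}
```

ISO-8859-1 has no mapping for characters such as ż or ł, so encoding fails there while UTF-8 succeeds, which matches the "to ISO-8859-1" part of the error message.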
Yes, you’re right. Change line 22 of the transformation to:
DataFormatter formatter = new DataFormatter("UTF-8");
Now I have no errors, but the result file contains data that looks double-encoded as UTF-8. When I set the encoding to UTF-8 in my editor I see:
1;a;4;d;7;g zażółć gęślą jaźń a koń pędź
2;b;5;e;;
3;c;6;f;;
To me the text looks as if no UTF-8 encoding were used, but when I copied the text above into a text editor (with no UTF-8 encoding), saved it and viewed it as UTF-8, it is OK:
1;a;4;d;7;g zażółć gęślą jaźń a koń pędź
Do you have the same charset everywhere? Attached graph works for me.
I still get wrong results when I execute your graph.
My input file (ANSI Windows, code page 1250; when I switch the encoding to UTF-8 the content is displayed properly):
field_A;field_B
1;a
2;b
3;c
4;d
5;e
6;f
7;g zażółć gęślą jaźń a koń pędź
output:
1;a
;4;d
;7;g zażółć gęślą jaźń a koń pędź
2;b
;5;e
;;
3;c
;6;f
;;
To me the problem is in the DENORMALIZER: the input is correct (I can see all the characters properly in debug mode) but the output is wrong.
I think I’ve found where the problem is: in the Denormalizer we need to format the data with the same charset that we use to convert it from bytes before sending it to the next Writer (and it doesn’t matter what charset is set on the Reader or Writer), or we can send it as bytes. The first solution means that the charset used with DataFormatter (line 22: DataFormatter formatter = new DataFormatter("UTF-8");) needs to be the same as the charset used for converting the ByteArrayOutputStream to a string (line 75: value = output.toString("UTF-8");).
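The effect of mismatched charsets on the two sides of a ByteArrayOutputStream can be shown with the JDK alone. In this sketch String.getBytes stands in for the formatter writing bytes, which is an assumption made for illustration; the class name CharsetRoundTrip and the method roundTrip are hypothetical and are not taken from the graph's transformation.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical stand-alone demo, not the code from the Denormalizer.
final class CharsetRoundTrip {

    /** Encodes text with writeCharset, then decodes the bytes with readCharset. */
    static String roundTrip(String text, String writeCharset, String readCharset) {
        try {
            byte[] bytes = text.getBytes(writeCharset); // stands in for the formatter
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            output.write(bytes, 0, bytes.length);
            return output.toString(readCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String polish = "za\u017c\u00f3\u0142\u0107"; // "zażółć"
        System.out.println(polish.equals(roundTrip(polish, "UTF-8", "UTF-8")));
        System.out.println(polish.equals(roundTrip(polish, "UTF-8", "ISO-8859-1")));
    }
}
```

When both charsets are the same the text survives unchanged. When the bytes are written as UTF-8 but read back as ISO-8859-1, each multibyte character turns into two characters, and writing that string out as UTF-8 again gives the double-encoded look described earlier in the thread.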