Parallelizing data in a flat file

blekota74
Posts: 11
Joined: Thu Jun 23, 2011 9:46 am

Postby blekota74 » Thu Jun 30, 2011 9:58 am

I have a flat file with content like this (CSV):

field_A;field_B
1;a
2;b
3;c
4;d
5;e
6;f
7;g


I need to convert this file into a flat file with (for instance) three sets of columns:

field_A1;field_B1;field_A2;field_B2;field_A3;field_B3
1;a;4;d;7;g
2;b;5;e;;
3;c;6;f;;


Normally, to achieve this I perform the following steps:
1. Count the records (lines) in the file.
2. If the count is not divisible by 3 (3 in my case, but it can be any number), append the missing blank records (2 in my case).
3. Put the first third of the records into column set 1, the second third into column set 2, and the last third into column set 3.

Now we have a structure like this (a copy of the example above):

field_A1;field_B1;field_A2;field_B2;field_A3;field_B3
1;a;4;d;7;g
2;b;5;e;;
3;c;6;f;;


Does anybody have an idea how to do this in CloverETL? At present I have to do it in an external tool.
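
For reference, the three manual steps above can be sketched in plain Java. This is only an illustration of the index arithmetic, not a CloverETL graph; the class name `ColumnSets`, the method `reshape`, and the `emptyRecord` parameter are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ColumnSets {
    /** Lays the input lines out side by side in the given number of column sets. */
    static List<String> reshape(List<String> lines, int sets, String emptyRecord) {
        // Steps 1 and 2: rows per set = ceil(count / sets); cells past the end are padded.
        int rowsPerSet = (lines.size() + sets - 1) / sets;
        List<String> out = new ArrayList<>();
        for (int row = 0; row < rowsPerSet; row++) {
            StringBuilder sb = new StringBuilder();
            for (int set = 0; set < sets; set++) {
                // Step 3: column set k holds input lines k*rowsPerSet .. (k+1)*rowsPerSet - 1.
                int idx = set * rowsPerSet + row;
                if (set > 0) {
                    sb.append(';');
                }
                sb.append(idx < lines.size() ? lines.get(idx) : emptyRecord);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("1;a", "2;b", "3;c", "4;d", "5;e", "6;f", "7;g");
        // ";" is the blank record for a two-field layout (two empty fields).
        for (String line : reshape(data, 3, ";")) {
            System.out.println(line);
        }
    }
}
```

With the seven sample records and three column sets, this prints the three data lines of the target layout shown above (without the header line).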

avackova
Posts: 841
Joined: Fri Jul 20, 2007 9:28 am

Re: Parallelizing data in a flat file

Postby avackova » Thu Jun 30, 2011 2:45 pm

Uff, I've got it :wink:, although it was not easy at all.
The attached graph implements the following algorithm:
  • read input data
  • add column number to each input record
  • sort the records according to this number
  • format each group of records to one record
    • add the empty record if needed
  • store data in new format
Attachment: Parallizing.grf (6.41 KiB)
Agata Vackova
Javlin a.s.
[email protected]
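
The attached graph is not reproduced in the thread, so here is a rough plain-Java sketch of the listed steps (number each record, sort by that number, fold each group into one record, pad with empty records). It assumes the number added to each record is its target output row, computed as the record index modulo the rows per column set; the graph may compute it differently. `KeyedDenormalize` and `Keyed` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class KeyedDenormalize {
    /** An input line tagged with the output row it belongs to (hypothetical helper). */
    static final class Keyed {
        final int key;
        final String line;
        Keyed(int key, String line) { this.key = key; this.line = line; }
    }

    static List<String> denormalize(List<String> lines, int sets, String emptyRecord) {
        int rows = (lines.size() + sets - 1) / sets;
        // Add a key to each input record (assumption: key = index modulo rows per set).
        List<Keyed> keyed = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            keyed.add(new Keyed(i % rows, lines.get(i)));
        }
        // List.sort is stable, so records sharing a key keep their input order.
        keyed.sort(Comparator.comparingInt(k -> k.key));
        // Fold each run of equal keys into one output record.
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < keyed.size()) {
            int key = keyed.get(i).key;
            List<String> group = new ArrayList<>();
            while (i < keyed.size() && keyed.get(i).key == key) {
                group.add(keyed.get(i++).line);
            }
            // Add empty records if the group is shorter than the number of column sets.
            while (group.size() < sets) {
                group.add(emptyRecord);
            }
            out.add(String.join(";", group));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("1;a", "2;b", "3;c", "4;d", "5;e", "6;f", "7;g");
        for (String line : denormalize(data, 3, ";")) {
            System.out.println(line);
        }
    }
}
```

Padding only at the end of a group is enough here because, with this key, only the last column sets can be short.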

Postby blekota74 » Fri Jul 01, 2011 3:43 pm

Thank you for your fast response, but I get an error when I execute the graph:

INFO  [main] - ***  CloverETL framework/transformation graph, (c) 2002-2011 Javlin a.s, released under GNU Lesser General Public License  ***
INFO  [main] - Running with CloverETL library version 3.1.0 build#17 compiled 16/06/2011 16:06:35
INFO  [main] - Running on 2 CPU(s), OS Windows XP, architecture x86, Java version 1.6.0_26, max available memory for JVM 253440 KB
INFO  [main] - Loading default properties from: defaultProperties
INFO  [main] - Graph definition file: graph/Parallizing.grf
INFO  [main] - Graph revision: 1.48 Modified by: user Modified: Thu Jun 30 17:32:23 CEST 2011
INFO  [main] - Checking graph configuration...
INFO  [main] - Graph configuration is valid.
INFO  [main] - Graph initialization (Parallizing)
INFO  [main] - [Clover] Initializing phase: 0
INFO  [main] - Compiling dynamic class FormatInput...
ERROR [main] - Error during graph initialization !
Element [1309427766212:Parallizing]-Phase 0 can't be initilized.
   at org.jetel.graph.TransformationGraph.init(TransformationGraph.java:458)
   at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:202)
   at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:165)
   at org.jetel.main.runGraph.runGraph(runGraph.java:364)
   at org.jetel.main.runGraph.main(runGraph.java:328)
Caused by: DENORMALIZER0 ...FATAL ERROR !
Reason: Used Java Platform doesn't provide any java compiler!
   at org.jetel.graph.Phase.init(Phase.java:174)
   at org.jetel.graph.TransformationGraph.init(TransformationGraph.java:456)
   ... 4 more
Caused by: java.lang.IllegalStateException: Used Java Platform doesn't provide any java compiler!
   at org.jetel.util.compile.DynamicCompiler.compile(DynamicCompiler.java:109)
   at org.jetel.util.compile.DynamicJavaClass.instantiate(DynamicJavaClass.java:66)
   at org.jetel.component.Denormalizer.createDenormalizerDynamic(Denormalizer.java:216)
   at org.jetel.component.Denormalizer.createRecordDenormalizer(Denormalizer.java:269)
   at org.jetel.component.Denormalizer.init(Denormalizer.java:241)
   at org.jetel.graph.Phase.init(Phase.java:165)
   ... 5 more

jurban
Posts: 163
Joined: Fri Jul 20, 2007 9:25 am

Postby jurban » Thu Jul 07, 2011 9:51 am

Hi,

are you running CloverETL with a JRE or a JDK? A JDK is required to run Java transformations, and such a transformation is used in the graph provided by Agata.

Best regards,
Jaro
Jaroslav Urban
CloverCARE Support
CloverETL | Rapid Data Integration

Visit us online at http://www.cloveretl.com
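
A quick way to tell whether a given Java installation provides a compiler is the standard `javax.tools.ToolProvider` API, which returns `null` when none is available. The snippet below is a minimal stand-alone check; it is not CloverETL code, and CloverETL's own detection may work differently. `CompilerCheck` is a made-up class name.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class CompilerCheck {
    /** Returns true when the running Java platform provides a compiler. */
    static boolean hasCompiler() {
        // getSystemJavaCompiler() returns null when no compiler is provided, e.g. on a plain JRE.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println(hasCompiler()
                ? "Java compiler found (JDK)"
                : "No Java compiler found (JRE only)");
    }
}
```

Run it with the same `java` executable that launches CloverETL to see which installation is actually in use.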

Postby blekota74 » Wed Jul 13, 2011 10:43 am

I set the path to the JDK and it works. I tried to understand the Java code in the component named 'Format many to one' (component type: Denormalizer) and I think documentation is lacking. For instance, I can't find references for classes like DataFormatter or ByteArrayOutputStream. I googled for cloveretl DataFormatter and couldn't find any information.
I deal mainly with UTF-8, and when I use non-English characters in the source file (encoded as UTF-8) I get an error (I set 'Denormalize source set' to utf-8):

ERROR [WatchDog] - Node DENORMALIZER0 finished with status: Error occurred in nested transformation: ERROR caused by: Message: Denormalization failed! caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)


Below is the full content of my example data file:

field_A;field_B
1;a
2;b
3;c
4;d
5;e
6;f
7;g zażółć gęślą jaźń a koń pędź

Postby avackova » Wed Jul 13, 2011 11:12 am

Hello,
  1. to handle Polish characters you need to set the proper charset on the Writer
  2. Javadoc and source files (of the open source part of the CloverETL Engine) can be downloaded from the CloverETL page on SourceForge

Postby blekota74 » Wed Jul 13, 2011 11:57 am

Well, I checked the input data (debug) on the DENORMALIZER0 component and it is OK, but there is no data at the output. To me it looks like a problem with a class that can't handle multibyte characters (when I remove all Polish characters it works properly).
Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)
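
The message says the value is being converted to ISO-8859-1, a charset that has no code points for most Polish letters. The same failure can be reproduced with the standard `java.nio.charset` API alone. This is a sketch independent of CloverETL; `EncodeCheck` and `encodes` are made-up names.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

final class EncodeCheck {
    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean encodes(String text, String charsetName) {
        try {
            // A fresh encoder reports unmappable characters by throwing.
            Charset.forName(charsetName).newEncoder().encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            // For ISO-8859-1 and Polish letters this is an UnmappableCharacterException.
            return false;
        }
    }

    public static void main(String[] args) {
        // First Polish word of the sample value, written with Unicode escapes
        // so that the encoding of this source file does not matter.
        String value = "za\u017C\u00F3\u0142\u0107";
        System.out.println("ISO-8859-1: " + encodes(value, "ISO-8859-1"));
        System.out.println("UTF-8:      " + encodes(value, "UTF-8"));
    }
}
```

The check fails for ISO-8859-1 and succeeds for UTF-8, which points at the charset used for encoding rather than at multibyte handling as such.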

Postby avackova » Wed Jul 13, 2011 12:14 pm

Please change the charset on Writer:
[Screenshot: UniversalDataWriter.png]

The charset in the Denormalizer is used only for decoding an external transformation source file.

Postby blekota74 » Wed Jul 13, 2011 12:26 pm

Of course I did that, and the error still exists. In my opinion the problem is in node DENORMALIZER0. Please try with my input file.

INFO  [main] - ***  CloverETL framework/transformation graph, (c) 2002-2011 Javlin a.s, released under GNU Lesser General Public License  ***
INFO  [main] - Running with CloverETL library version 3.1.0 build#17 compiled 16/06/2011 16:06:35
INFO  [main] - Running on 2 CPU(s), OS Windows XP, architecture x86, Java version 1.6.0_21, max available memory for JVM 253440 KB
INFO  [main] - Loading default properties from: defaultProperties
INFO  [main] - Graph definition file: graph/Parallizing.grf
INFO  [main] - Graph revision: 1.66 Modified by: informatyk Modified: Wed Jul 13 13:23:27 CEST 2011
INFO  [main] - Checking graph configuration...
INFO  [main] - Graph configuration is valid.
INFO  [main] - Graph initialization (Parallizing)
INFO  [main] - [Clover] Initializing phase: 0
INFO  [main] - Compiling dynamic class FormatInput...
INFO  [main] - Dynamic class FormatInput successfully compiled and instantiated.
INFO  [main] - [Clover] phase: 0 initialized successfully.
INFO  [main] - register MBean with name:org.jetel.graph.runtime:type=CLOVERJMX_1309427766212_0
INFO  [WatchDog] - Starting up all nodes in phase [0]
INFO  [WatchDog] - Successfully started all nodes in phase!
ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node DENORMALIZER0 finished with status: Error occurred in nested transformation: ERROR caused by: Message: Denormalization failed! caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)

Record: #0|field_A|S->7
#1|field_B|S->g zażółć gęślą jaźń a koń pędź
#2|key|i->0

ERROR [WatchDog] - Node DENORMALIZER0 error details:
org.jetel.exception.TransformException: Message: Denormalization failed! caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)

Record: #0|field_A|S->7
#1|field_B|S->g zażółć gęślą jaźń a koń pędź
#2|key|i->0

   at org.jetel.component.denormalize.DataRecordDenormalize.appendOnError(DataRecordDenormalize.java:54)
   at org.jetel.component.Denormalizer.processInput(Denormalizer.java:381)
   at org.jetel.component.Denormalizer.execute(Denormalizer.java:452)
   at org.jetel.graph.Node.run(Node.java:425)
   at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Exception when converting the field value: g zażółć gęślą jaźń a koń pędź (field name: 'field_B') to ISO-8859-1. (original cause: Input length = 1)

Record: #0|field_A|S->7
#1|field_B|S->g zażółć gęślą jaźń a koń pędź
#2|key|i->0

   at org.jetel.data.formatter.DataFormatter.write(DataFormatter.java:263)
   at FormatInput.append(FormatInput.java from JavaSourceFileObject:57)
   at org.jetel.component.Denormalizer.processInput(Denormalizer.java:379)
   ... 3 more
Caused by: java.nio.charset.UnmappableCharacterException: Input length = 1
   at java.nio.charset.CoderResult.throwException(Unknown Source)
   at java.nio.charset.CharsetEncoder.encode(Unknown Source)
   at org.jetel.data.DataField.toByteBuffer(DataField.java:278)
   at org.jetel.data.formatter.DataFormatter.write(DataFormatter.java:228)
   ... 5 more
INFO  [WatchDog] - [Clover] Post-execute phase finalization: 0
INFO  [WatchDog] - [Clover] phase: 0 post-execute finalization successfully.
INFO  [WatchDog] - Execution of phase [0] finished with error - elapsed time(sec): 0
ERROR [WatchDog] - !!! Phase finished with error - stopping graph run !!!
INFO  [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO  [WatchDog] - Phase#            Finished Status         RunTime(sec)    MemoryAllocation(KB)
INFO  [WatchDog] - 0                 ERROR                              0             15867
INFO  [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO  [WatchDog] - WatchDog thread finished - total execution time: 5 (sec)
INFO  [main] - Freeing graph resources.
ERROR [main] - Execution of graph failed !

Postby avackova » Wed Jul 13, 2011 12:59 pm

Yes, you're right. Change line 22 of the transformation to:

   DataFormatter formatter = new DataFormatter("UTF-8");

Postby blekota74 » Wed Jul 13, 2011 1:47 pm

Now I get no errors, but the result file looks as if it were UTF-8 encoded twice. When I set the encoding to UTF-8 in my editor I see:
1;a;4;d;7;g zażółć gęślą jaźń a koń pędź
2;b;5;e;;
3;c;6;f;;

To me the text looks as if no UTF-8 encoding were used.

But when I copy the text above into a text editor (without UTF-8 encoding), save it, and view it as UTF-8, it is OK:
1;a;4;d;7;g zażółć gęślą jaźń a koń pędź

Postby avackova » Wed Jul 13, 2011 2:31 pm

Do you have the same charset everywhere? Attached graph works for me.
Attachment: Parallizing.grf (6.45 KiB)

Postby blekota74 » Thu Jul 14, 2011 10:10 am

I still get wrong results when I execute your graph.

My input file (ANSI Windows, code page 1250; when I switch the encoding to UTF-8 the content is displayed properly):
field_A;field_B
1;a
2;b
3;c
4;d
5;e
6;f
7;g zażółć gęślą jaźń a koń pędź


output:
1;a
;4;d
;7;g zażółć gęślą jaźń a koń pędź
2;b
;5;e
;;
3;c
;6;f
;;


To me the problem is in the DENORMALIZER: the input is correct (I can see all the characters properly in debug mode) but the output is wrong.

Postby avackova » Thu Jul 14, 2011 12:05 pm

I think I've found the problem: in the Denormalizer we need to format the data with the same charset that we later use to convert the bytes back to a string before sending it to the Writer (it doesn't matter what charset is set on the Reader or Writer), or else we can send the data as bytes. The first solution means that the charset used with DataFormatter (line 22: DataFormatter formatter = new DataFormatter("UTF-8");) must be the same as the charset used for converting the ByteArrayOutputStream to a string (line 75: value = output.toString("UTF-8");).
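
The effect of a charset mismatch between those two lines can be shown with standard Java classes only. In this sketch an `OutputStreamWriter` stands in for CloverETL's `DataFormatter` (whose API is not shown in the thread); `CharsetRoundTrip` and `roundTrip` are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class CharsetRoundTrip {
    /** Encodes text into a byte buffer with one charset, then decodes it with another. */
    static String roundTrip(String text, String encodeCharset, String decodeCharset) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try {
            // Stand-in for the formatter: writes the text as bytes in encodeCharset.
            try (Writer writer = new OutputStreamWriter(output, encodeCharset)) {
                writer.write(text);
            }
            // Same kind of call as line 75 of the transformation.
            return output.toString(decodeCharset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample Polish word written with Unicode escapes (source-encoding independent).
        String value = "za\u017C\u00F3\u0142\u0107";
        // Matching charsets: the text survives the round trip.
        System.out.println(value.equals(roundTrip(value, "UTF-8", "UTF-8")));
        // Mismatched charsets: each multibyte character is decoded as several wrong ones.
        System.out.println(value.equals(roundTrip(value, "UTF-8", "ISO-8859-1")));
    }
}
```

When the two charsets match, the text survives unchanged; when they differ, the decoded string is garbled before it ever reaches the Writer, which is why the Writer's own charset setting could not fix it.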

Postby blekota74 » Thu Jul 14, 2011 1:22 pm

Now it is OK :)
Thank you.

