Can I integrate CloverETL's functionality into my Java app?

Hi,

I am new to this tool. I understand that the CloverGUI can be used to perform data transformations. I am working on an application that processes a number of flat files containing comma-delimited records. The app would take one record at a time, transform it into an XML file and send it to a servlet for further processing. How can I accomplish this without using the CloverETL GUI tool? Since the tool is Java-based, are there JAR files I need to add to my classpath to start working on this? I just want to know how CloverETL can be integrated into my Java application; I don’t want to use the standalone version. Any comments would be greatly appreciated :)

thanks much!
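For reference, the per-record step described in this question (one comma-delimited record in, one XML fragment out) can be sketched in plain Java without any CloverETL classes. This only illustrates the transformation itself and is not CloverETL's API; the class name, the field names and the `<record>` element are hypothetical, and the naive `split` does not handle quoted fields that contain commas.

```java
import java.util.Arrays;
import java.util.List;

public class CsvRecordToXml {

    // Escapes the XML special characters so a field value is safe as element text.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;") // must run first so later entities are not double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Turns one comma-delimited record into a flat <record> element.
    // Naive split: quoted fields containing commas are NOT supported.
    static String toXml(String line, List<String> fieldNames) {
        String[] values = line.split(",", -1); // -1 keeps trailing empty fields
        if (values.length != fieldNames.size()) {
            throw new IllegalArgumentException(
                "expected " + fieldNames.size() + " fields but got " + values.length);
        }
        StringBuilder xml = new StringBuilder("<record>");
        for (int i = 0; i < values.length; i++) {
            String name = fieldNames.get(i);
            xml.append('<').append(name).append('>')
               .append(escapeXml(values[i].trim()))
               .append("</").append(name).append('>');
        }
        return xml.append("</record>").toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name", "bonus");
        // prints <record><id>1001</id><name>Smith &amp; Co</name><bonus>250</bonus></record>
        System.out.println(toXml("1001,Smith & Co,250", fields));
    }
}
```

Sending the resulting XML to the servlet (for example with an HTTP POST) is left out of this sketch.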

Hi,
Clover can indeed be embedded into your Java application. Some starting points are documented on our wiki (http://wiki.clovergui.net/doku.php?id=embedding_clover). You should download the CloverETL Engine (http://www.cloveretl.com/download/clover-etl/) and add all of its JARs to the classpath. Please note that in the Java code described on the wiki:

EngineInitializer.initEngine(pluginsRootDirectory, configFileName, logHost);

you need to set the path to the engine’s plugins directory (the pluginsRootDirectory argument).
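A small pre-flight check can turn a wrong plugins path into a clear error message: confirm that the directory intended for `pluginsRootDirectory` exists and contains plugin sub-directories before calling `initEngine`. This is a minimal sketch using only the JDK; the class and method names are hypothetical, and it assumes each plugin lives in its own sub-directory (as the `org.jetel.*` plugin paths later in this thread suggest). It does not verify anything else the engine may require.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PluginsDirCheck {

    // Returns the sorted names of the sub-directories of pluginsRootDirectory.
    // ASSUMPTION: every engine plugin sits in its own sub-directory (e.g. org.jetel.component).
    // Fails fast if the path is missing, is not a directory, or has no sub-directories.
    static List<String> listPluginDirs(String pluginsRootDirectory) {
        File root = new File(pluginsRootDirectory);
        if (!root.isDirectory()) {
            throw new IllegalStateException("plugins directory not found: " + root.getAbsolutePath());
        }
        List<String> names = new ArrayList<String>();
        File[] children = root.listFiles();
        if (children != null) { // null only on an I/O error
            for (File child : children) {
                if (child.isDirectory()) {
                    names.add(child.getName());
                }
            }
        }
        if (names.isEmpty()) {
            throw new IllegalStateException("no plugin sub-directories in " + root.getAbsolutePath());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real location of the engine's plugins directory as args[0].
        String pluginsRootDirectory = args.length > 0 ? args[0] : "plugins";
        System.out.println("Plugin directories: " + listPluginDirs(pluginsRootDirectory));
        // Only after this check would the path be passed on, e.g.
        // EngineInitializer.initEngine(pluginsRootDirectory, configFileName, logHost);
    }
}
```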

Jaro

I have created an Eclipse project with my transformations set up and all working. The next step for me is to create a Java project and integrate these graph files into it. I have looked around the wiki and the mailing list, but the information appears a bit disjointed and out of date. Is there a decent explanation of all the classes and methods used somewhere?

The basic steps outlined in most of the text seem to be as shown below. The pieces that are missing in the explanations are:

  1. Why do the parameters of EngineInitializer.initEngine not match those in the JavaDocs?
  2. Is there an example showing how to link everything done in a CloverETL project into the code below? Things like workspace.prm parameters, linked DB connections and metadata, etc.
  3. Is there a working example that shows how this is done? The code below has no definition for “in”, pluginsRootDirectory, configFileName, etc.

I know the information is all there; I’m just asking for a working example, if anyone has one.

Thanks
Des

// engine customization
GraphRuntimeContext runtimeContext = new GraphRuntimeContext();

// engine initialization - should be called only once
EngineInitializer.initEngine(pluginsRootDirectory, configFileName, logHost);

// graph loading
TransformationGraph graph = TransformationGraphXMLReaderWriter.loadGraph(in, runtimeContext.getAdditionalProperties());

// graph initialization
EngineInitializer.initGraph(graph, runtimeContext);

// graph running
IThreadManager threadManager = new SimpleThreadManager();
WatchDog watchDog = new WatchDog(graph, runtimeContext);
threadManager.executeWatchDog(watchDog);

Hello Des,
if you download the examples, you will find an Eclipse project called javaExamples there. After adding the following JARs to the classpath, the examples are ready to run:

	<classpathentry kind="lib" path="$CLOVER_HOME/lib/lib/cloveretl.engine.jar"/>
	<classpathentry kind="lib" path="$CLOVER_HOME/lib/lib/log4j-1.2.12.jar"/>
	<classpathentry kind="lib" path="$CLOVER_HOME/lib/lib/commons-logging.jar"/>
	<classpathentry kind="lib" path="$CLOVER_HOME/lib/lib/javolution.jar"/>
	<classpathentry kind="lib" path="$CLOVER_HOME/lib/plugins/org.jetel.lookup/cloveretl.lookup.jar"/>
	<classpathentry kind="lib" path="$CLOVER_HOME/lib/plugins/org.jetel.connection/cloveretl.connection.jar"/>
	<classpathentry kind="lib" path="$CLOVER_HOME/lib/lib/icu4j-normalizer.jar"/>
	<classpathentry kind="lib" path="$CLOVER_HOME/lib/plugins/org.jetel.component/cloveretl.component.jar"/>
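Outside Eclipse, the same JARs have to be on the runtime classpath. The sketch below builds a `java -cp` string from a `CLOVER_HOME` environment variable and the relative paths listed above. The class and method names and the use of an environment variable are assumptions for illustration, and JAR file names such as log4j-1.2.12.jar may differ between releases.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CloverClasspath {

    // JAR paths copied from the javaExamples .classpath entries, relative to CLOVER_HOME.
    // File names are release-specific and may differ in other versions.
    static final List<String> RELATIVE_JARS = Arrays.asList(
        "lib/lib/cloveretl.engine.jar",
        "lib/lib/log4j-1.2.12.jar",
        "lib/lib/commons-logging.jar",
        "lib/lib/javolution.jar",
        "lib/plugins/org.jetel.lookup/cloveretl.lookup.jar",
        "lib/plugins/org.jetel.connection/cloveretl.connection.jar",
        "lib/lib/icu4j-normalizer.jar",
        "lib/plugins/org.jetel.component/cloveretl.component.jar");

    // Prefixes each relative path with cloverHome and joins them with the platform
    // path separator (':' on Unix-like systems, ';' on Windows) for use with "java -cp".
    static String buildClasspath(String cloverHome, List<String> relativeJars) {
        List<String> entries = new ArrayList<String>();
        for (String jar : relativeJars) {
            entries.add(new File(cloverHome, jar).getPath());
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String cloverHome = System.getenv("CLOVER_HOME"); // assumed to be set by the user
        if (cloverHome == null) {
            System.err.println("Please set CLOVER_HOME first.");
            return;
        }
        System.out.println(buildClasspath(cloverHome, RELATIVE_JARS));
    }
}
```

Jaro's earlier note still applies: having the plugin JARs on the classpath does not replace passing the plugins directory to `EngineInitializer.initEngine`.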

Agata

Thanks for the help. I downloaded the examples and am testing with the Java examples. I have a few issues, though, and was wondering if you could clarify them.

I ran the example and it works fine. I then adopted the convention used in workspace.prm and moved those parameters into the params.txt file used in the example, as shown below. The intention is to configure the DataReader in the example.grf graph with a generic input of the form ${DATAIN_DIR}/bonus.csv instead of the hard-coded path used before.

PROJECT= .
# (Please use slash ‘/’ character as a path delimiter in all path specifications, e.g. C:/Users/username/workspace/project)
#Project properties
#Fri May 15 08:30:33 EST 2009
CONN_DIR=${PROJECT}/conn
SEQ_DIR=${PROJECT}/seq
DATAOUT_DIR=${PROJECT}/data-out
GRAPH_DIR=${PROJECT}/graph
TRANS_DIR=${PROJECT}/trans
DATATMP_DIR=${PROJECT}/data-tmp
META_DIR=${PROJECT}/meta
LOOKUP_DIR=${PROJECT}/lookup
DATAIN_DIR=${PROJECT}/data-in

plugins=plugins
dataFile = data-in/bonus.csv
outputFile = data-out/bonus.sorted.csv
metadata = meta/bonus.fmt
graph_file = graph/example.grf
connection = postgre.cfg
query = select * from employee where department_id = ?
key = 11
sortKey = Contract_nr

However, I get the following error. It looks like the parameter name is not resolved before the file is read.

INFO [main] - *** END OF GRAPH LIST ***
WARN [WatchDog] - Graph element [0] is not checked by checkConfig() method. Please call TransformationGraph.checkConfig() first.
INFO [WatchDog] - [Clover] Initializing phase: 0
DEBUG [WatchDog] - initializing edges:
WARN [WatchDog] - Graph element [Edge0] is not checked by checkConfig() method. Please call TransformationGraph.checkConfig() first.
DEBUG [WatchDog] - all edges initialized successfully…
DEBUG [WatchDog] - initializing nodes:
WARN [WatchDog] - Graph element [DATA_READER0] is not checked by checkConfig() method. Please call TransformationGraph.checkConfig() first.
DEBUG [WatchDog] - Opening input file ${DATAIN_DIR}/bonus.csv
ERROR [WatchDog] - Phase initialization failed with reason: DATA_READER0 …FAILED !
Reason: FileURL attribute (${DATAIN_DIR}/bonus.csv) doesn’t contain valid file url.
DATA_READER0 …FAILED !
Reason: FileURL attribute (${DATAIN_DIR}/bonus.csv) doesn’t contain valid file url.

Is there a way to do this?

Thanks
Des
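The log above shows the reader receiving the literal string ${DATAIN_DIR}/bonus.csv, i.e. no substitution took place. The references in params.txt also chain into one another (DATAIN_DIR is built from PROJECT), so the value must be expanded recursively. The following plain-Java sketch illustrates the expansion one would expect once the graph knows about the parameter file; it is NOT CloverETL's own resolver, and the class and method names are hypothetical. Unknown names are deliberately left as they are, which reproduces the unresolved path from the log.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamResolver {

    private static final Pattern PARAM_REF = Pattern.compile("\\$\\{([A-Za-z0-9_.-]+)\\}");

    // Expands ${NAME} references recursively using the given parameters.
    // Names that are not defined are kept literally.
    static String resolve(String value, Properties params) {
        return resolve(value, params, 0);
    }

    private static String resolve(String value, Properties params, int depth) {
        if (depth > 20) {
            throw new IllegalStateException("reference cycle while expanding: " + value);
        }
        Matcher m = PARAM_REF.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String raw = params.getProperty(m.group(1));
            String expanded = (raw == null) ? m.group(0) : resolve(raw, params, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(expanded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties params = new Properties();
        // Two lines taken from the params.txt above; Properties.load skips the space after '='.
        params.load(new StringReader("PROJECT= .\nDATAIN_DIR=${PROJECT}/data-in\n"));
        System.out.println(resolve("${DATAIN_DIR}/bonus.csv", params));           // ./data-in/bonus.csv
        System.out.println(resolve("${DATAIN_DIR}/bonus.csv", new Properties()));  // ${DATAIN_DIR}/bonus.csv
    }
}
```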

Hello Des, have you added this property file to your graph?

<Property fileURL="params.txt" id="GraphParameter0"/>

Set the directories correctly (be careful with DATAIN_DIR):

PROJECT=.

#Mon Jun 01 13:20:20 CEST 2009
CONN_DIR=${PROJECT}/conn
SEQ_DIR=${PROJECT}/seq
DATAOUT_DIR=${PROJECT}/data-out
GRAPH_DIR=${PROJECT}/graph
TRANS_DIR=${PROJECT}/trans
DATATMP_DIR=${PROJECT}/data-tmp
META_DIR=${PROJECT}/meta
LOOKUP_DIR=${PROJECT}/lookup
DATAIN_DIR=${PROJECT}/data

Then it should work properly.

Agata

Thanks for the prompt reply. I’m not sure I understand how this works, then. Do I add this to the graph source as a separate line, or is it the URL of the reader component?

Apologies for the silly questions; I’m still trying to get my head around the setup.

Thanks
Des

Hi Des,
if you define graph parameters in an external file (e.g. params.txt), you have to pass that file to the graph. You can do this in CloverETL Designer (see External (Shared) Parameters), by editing the graph source manually (see Property and property file), or by passing the parameter file when running the graph (the -cfg switch; see Command line).
Once the graph “knows” the parameters, you can use them, e.g. in a component’s properties:

<Node fileURL="${DATAIN_DIR}/bonus.csv"  id="DATA_READER0" type="DATA_READER"/>

or anywhere else in the graph:

<LookupTable dbConnection="Connection0" id="LookupTable0" name="test" type="dbLookup">
<attr name="sqlQuery"><![CDATA[${query}]]></attr>
</LookupTable>
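To check which parameter files a graph declares and which ${...} names it uses, the graph XML can be scanned with the JDK's DOM parser and a regular expression. This is a sketch under stated assumptions: the graph string in `main` is a hypothetical, simplified fragment assembled from the snippets in this thread (a real .grf file has more structure), and the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class GraphParamCheck {

    private static final Pattern PARAM_REF = Pattern.compile("\\$\\{([A-Za-z0-9_.-]+)\\}");

    // Returns the fileURL of every <Property> element, i.e. the parameter files the graph declares.
    static List<String> propertyFiles(String graphXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(graphXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("Property");
            List<String> files = new ArrayList<String>();
            for (int i = 0; i < props.getLength(); i++) {
                String url = ((Element) props.item(i)).getAttribute("fileURL");
                if (!url.isEmpty()) { // getAttribute returns "" when the attribute is absent
                    files.add(url);
                }
            }
            return files;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("graph XML could not be parsed", e);
        }
    }

    // Returns every ${NAME} referenced anywhere in the graph text (attributes and CDATA alike).
    static Set<String> referencedParams(String graphXml) {
        Set<String> names = new TreeSet<String>();
        Matcher m = PARAM_REF.matcher(graphXml);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL simplified graph built from the snippets above.
        String grf = "<Graph>"
            + "<Property fileURL=\"params.txt\" id=\"GraphParameter0\"/>"
            + "<Node fileURL=\"${DATAIN_DIR}/bonus.csv\" id=\"DATA_READER0\" type=\"DATA_READER\"/>"
            + "<LookupTable dbConnection=\"Connection0\" id=\"LookupTable0\" name=\"test\" type=\"dbLookup\">"
            + "<attr name=\"sqlQuery\"><![CDATA[${query}]]></attr></LookupTable>"
            + "</Graph>";
        System.out.println("parameter files: " + propertyFiles(grf));    // [params.txt]
        System.out.println("referenced:      " + referencedParams(grf)); // [DATAIN_DIR, query]
    }
}
```

Any referenced name that is missing from every declared parameter file would reach the components unresolved, as in the DATA_READER0 error earlier in the thread.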