DB_INPUT_TABLE execution

Hi guys,

Hope everything is going well.

I’ve run into another issue that I’d appreciate your input on.

I have a graph phase that takes input from two different DB_INPUT_TABLE components (reading from two different databases). Both queries can take quite a while to execute (the DB is MySQL, 10m-30m rows). This can cause a problem because each Statement.executeQuery() call is made from the component's init() method, and the init() methods are called serially as the phase initializes.

While the second query is executing, the first DB_INPUT_TABLE's result set is held open, but unread.

Occasionally the second query takes too long, and by the time the phase starts the database has timed out the first connection (because no-one has processed the result set), so the graph fails.

My solution to this problem is to move the SQLDataParser.open() call from DBInputTable.init() to DBInputTable.run(), so that executeQuery() happens in each Node's own thread. This allows both queries to execute in parallel (and may even be quicker).
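To make the idea concrete, here is a minimal, self-contained sketch rather than the actual patch. InputNode and runPhase are hypothetical stand-ins (not the real DBInputTable or graph runner), and Thread.sleep() stands in for the slow Statement.executeQuery() call. It only shows that init() stays cheap and serial, while the query is issued from each node's own thread.

```java
public class DeferredQuerySketch {

    /** Hypothetical stand-in for a DB input component; NOT the real DBInputTable. */
    static class InputNode implements Runnable {
        final String name;
        final long queryMillis;
        volatile String initThread;   // name of the thread that ran init()
        volatile String queryThread;  // name of the thread that ran the "query"

        InputNode(String name, long queryMillis) {
            this.name = name;
            this.queryMillis = queryMillis;
        }

        /** Cheap set-up only: no query is issued here any more. */
        void init() {
            initThread = Thread.currentThread().getName();
        }

        /** The slow query now starts here, on the node's own thread. */
        @Override
        public void run() {
            queryThread = Thread.currentThread().getName();
            try {
                Thread.sleep(queryMillis); // stand-in for Statement.executeQuery()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Hypothetical phase runner: serial init(), then one thread per node. */
    static void runPhase(InputNode... nodes) {
        for (InputNode node : nodes) {
            node.init();
        }
        Thread[] workers = new Thread[nodes.length];
        for (int i = 0; i < nodes.length; i++) {
            workers[i] = new Thread(nodes[i], "node-" + nodes[i].name);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted waiting for " + worker.getName(), e);
            }
        }
    }

    public static void main(String[] args) {
        InputNode a = new InputNode("A", 300);
        InputNode b = new InputNode("B", 300);
        runPhase(a, b);
        System.out.println(a.name + " queried on " + a.queryThread);
        System.out.println(b.name + " queried on " + b.queryThread);
    }
}
```

With this shape, neither result set sits open and unread while the other query is still running, since each node starts reading as soon as its own query returns.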

I was wondering what you thought of this solution, and whether you could foresee any potential problems or suggest alternatives?

Regards,
Peter

Hello Peter !

Your solution seems to be OK. The only potential problem is that if your connection info (or anything else) is set up wrong, the component fails not at the beginning (as it would now) but only when the second phase is executed.

We are actually thinking about shifting the time when init() gets called. It is currently also used to verify that the component is set up correctly. This will change in the future, since there has long been a checkConfig() method, which so far always returns true.

The goal is to call the init() method on components just before they are executed in their particular phase. This should also fix your problem with the DB connection timing out.
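As a rough sketch of that planned lifecycle (an illustration under assumptions, not the engine's actual code): checkConfig() validates every component cheaply up front, and init() is called only just before a component's own phase runs. Component, execute, and the validation rules below are all hypothetical, and the components run sequentially here to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

public class PhaseLifecycleSketch {

    /** Hypothetical component; the fields and checks are illustrative only. */
    static class Component {
        final String name;
        final int phase;
        final String jdbcUrl;
        final String sql;

        Component(String name, int phase, String jdbcUrl, String sql) {
            this.name = name;
            this.phase = phase;
            this.jdbcUrl = jdbcUrl;
            this.sql = sql;
        }

        /** Cheap validation: no connection is opened, no query is run. */
        boolean checkConfig() {
            return jdbcUrl != null && jdbcUrl.startsWith("jdbc:")
                    && sql != null && !sql.trim().isEmpty();
        }

        /** Expensive set-up (for example, opening the query) would go here. */
        void init(List<String> log) {
            log.add("init " + name);
        }

        void run(List<String> log) {
            log.add("run " + name);
        }
    }

    /** Checks everything first, then initializes and runs phase by phase. */
    static List<String> execute(List<Component> components, int phaseCount) {
        List<String> log = new ArrayList<>();
        for (Component c : components) {
            if (!c.checkConfig()) {
                throw new IllegalArgumentException("bad configuration: " + c.name);
            }
            log.add("check " + c.name);
        }
        for (int phase = 0; phase < phaseCount; phase++) {
            for (Component c : components) {
                if (c.phase == phase) c.init(log); // only this phase's components
            }
            for (Component c : components) {
                if (c.phase == phase) c.run(log);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Component> graph = new ArrayList<>();
        graph.add(new Component("A", 0, "jdbc:mysql://host-a/db", "SELECT 1"));
        graph.add(new Component("B", 1, "jdbc:mysql://host-b/db", "SELECT 1"));
        System.out.println(execute(graph, 2));
    }
}
```

In this arrangement a misconfigured component in a later phase is still reported before any phase starts, while connections and queries are opened only when their phase is about to run.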