Incremental data replication job from DB Views to DB Tables

Hi All,

Just curious if some best practice is available out there in regard to my topic title:

Description: I want to replicate data provided through multiple Oracle-VIEWS into multiple PostgreSQL-TABLES. (e.g per Oracle-VIEW, respectively one identical PostgreSQL-TABLE at the end over the day)

This etl-process shall be done INCREMENTALLY and as fast as possible in a short TIME INTERVAL ( e.g between 3 and 5 seconds)

Thoughts I have are:

  1. While reading data from Oracle-Views through DBInputTable, shall I set a simple query “SELECT * FROM Oracle-View” Or it is better to take-over the SQL Code of the View itself?

  2. How to determine if I shall create a BIG Graph, which does all the job or it is better to implement multiple graphs? ( e.g one graph per Oracle-View)

Rationale: I want to create a PostgreSQL Warehouse, which is used to supply a Reporing-Engine and a Real-time Dasboard-Engine to monitor activities and results. Therefore it is required to replicate the data as fast as possible.

Many thanks in advance for your inputs and experiences

Sabelm_m

Hi Sabelm_m,

CloverETL seem like great tool for what you want:

  1. Incremental replication. If your views contains increasing fields (like ID, timestamp, …) then you can use Incremental reading. If you combine this with limitation of number of records read from database you can easily control number of items in each turn. And therefore its time span.

  2. Because CloverETL access database through JDBC there is almost no difference between views and SQL query results. So you may pick way which is faster on DB server or the one you like more.

  3. Big graph or more smaller? Well, typically is better to create one purpose, smaller, graphs. There is a few good reasons for that
    * clarity - big graphs are hard to maintain, debug, …
    * error handling - in big graphs with many concurrent tasks is hard to handle all possible error states correctly
    * resources - big graphs may consume more memory than is available

It may also help to use Subgraphs for reused functionality. Please take a look on CloverETL Server which can help a lot with scheduling and controlling of your tasks.

I hope this helps. Please let me know if I should elaborate more on some points.

Many thanks Jaroslav for the inputs.

I will follow your recommendations and make some testings.
The request is satisfactory answered for me and can be declared as resolved.

THX