I noticed in the postgres logs that DBOutputTable issues individual inserts rather than multi-row inserts, even with batch mode enabled.
With a test example and batch size = 5, I see something like the following in the logs:
LOG: execute S_1: BEGIN
LOG: execute <unnamed>: SAVEPOINT "svpnt"
LOG: execute S_2: insert into test_out values($1,$2)
DETAIL: parameters: $1 = '1', $2 = 'TEST'
LOG: execute S_2: insert into test_out values($1,$2)
DETAIL: parameters: $1 = '2', $2 = 'TEST'
LOG: execute S_2: insert into test_out values($1,$2)
DETAIL: parameters: $1 = '3', $2 = 'TEST'
LOG: execute S_2: insert into test_out values($1,$2)
DETAIL: parameters: $1 = '4', $2 = 'TEST'
LOG: execute S_2: insert into test_out values($1,$2)
DETAIL: parameters: $1 = '5', $2 = 'TEST'
LOG: execute <unnamed>: SAVEPOINT "svpnt"
... etc.
LOG: execute S_3: COMMIT
So, we get a savepoint every 5 records in this example (to match the batch size), but it would be nice (i.e., generally faster) if there were an option for multi-row inserts (e.g., insert into test_out values ($1, $2), ($3, $4), ($5, $6), ($7, $8), ($9, $10))… or perhaps this is already supported and I'm just not understanding how to make use of it?
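For illustration only, here is a minimal JDBC sketch of the kind of statement I mean, one multi-row INSERT per batch instead of one execute per record. This is not how DBOutputTable is implemented (I haven't looked at its code); the connection URL, credentials, and table name are just placeholders matching my test above:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MultiRowInsertSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password")) {
            // One statement carrying a whole batch of 5 rows (2 columns each),
            // so the server logs a single execute instead of 5.
            String sql = "insert into test_out values (?,?),(?,?),(?,?),(?,?),(?,?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int row = 0; row < 5; row++) {
                    ps.setInt(2 * row + 1, row + 1);    // first column: 1..5
                    ps.setString(2 * row + 2, "TEST");  // second column
                }
                ps.executeUpdate();  // one round trip for the whole batch
            }
        }
    }
}
```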
I do realize that the postgres data writer is available and makes use of \copy (which will be fast), but that’s more of an “all or nothing” component.
See the following for a discussion of the efficacy of multi-row inserts:
http://www.depesz.com/2007/07/05/how-to … -possible/