고속 insert

DB 2013. 4. 3. 12:23


http://www.easysoft.com/products/data_access/odbc_odbc_bridge/performance_white_paper.html#3_1_2

3.1.2 Inserting data

Unsurprisingly, much of the examples in the previous section apply to inserting data.

These examples use the same table structure as in the previous section but this time we want to populate the table with 50000 rows.

The simplest pseudo code to populate our table is:

SQLCHAR sql[1024]
for (i = 1; i <= 50000; i++)
{
    sql =  "insert into perf values " +
          "(" + i + ", 'this is a char thirty', " +
          "'this is a varchar 255', " +
          "{ts '2002-08-01 11:12:13.123'}, " + i * 1.1 + ")"
    SQLExecDirect(sql);
}

Here we construct the SQL insert statement once for each row and call SQLExecDirect once for each row. When we run this code it takes 5 minutes and 10 seconds. Why? The first problem with this code is the database has to parse and prepare the insert statement once per row; this can be time consuming.

A much more efficient method is to used a parameterized insert. Here parameters are using in place of the column data. Instead of the above SQL we would use:

insert into perf values (?,?,?,?,?)

We prepare this SQL once (by calling SQLPrepare), bind the parameters with SQLBindParameter and then just keep calling SQLExecute, changing the data we bound each time. e.g.

SQLINTEGER idata
SQLCHAR cdata1[256]
SQLCHAR cdata2[256]
SQL_TIMESTAMP_STRUCT ts
SQLREAL fdata

SQLPrepare(insert into perf values (?,?,?,?,?)

SQLBindParameter(1, idata)
SQLBindParameter(2, cdata1)
SQLBindParameter(3, cdata2)
SQLBindParameter(4, ts)
SQLBindParameter(5, fdata)

for (i = 0; i <= 50000; i++)
{
    set idata, cdata1, cdata2, ts, fdata to whatever values you
    want to insert
    SQLExecute()
}

However, if you run this through OOB you'll find it does not make any real difference. The reason for this is that in the first case with straight forward SQL insert you are making one network call per insert whereas with the parameterised inserts there are two network calls; one for SQLExecute and one to send the parameters (OOB also has some extra work to pack up the parameters).

With many databases a very easy speed up if you are doing bulk inserts, say copying large quantities of data from one place to another is to turn autocommit off and commit the inserts at the end. This can half the insert time.

e.g.

Add the above pseudo code:

SQLSetConnectAttr(SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF)
<pseduo code above>
SQLEndTran(SQL_HANDLE_DBC, connection_handle, SQL_COMMIT)

When we apply this change and run to MS SQL Server, DB2 or Oracle (others may be the same) the time comes down to around 2 minutes 30 seconds; this is nearly half of what we started with. There are two things to remember about this:

  1. If you only commit at the end of all your inserts then if you abort when any SQLExecute fails none of your inserts are committed.
  2. Individual inserts will not be visible to other users until you have finished all your inserts and committed them. If this bothers you then perhaps you can still get improved speed by committing less often.

One final change we can make is to use arrays of parameters (as we did in row fetching). To do this you call SQLSetStmtAttr() and set the SQL_ATTR_PARAMSET_SIZE to a number > 1. e.g.

#define ROWS 10
SQLINTEGER idata[ROWS]
SQLCHAR cdata1[ROWS][256]
SQLCHAR cdata2{ROWS][256]
SQL_TIMESTAMP_STRUCT ts[ROWS]
SQLREAL fdata[ROWS]

SQLSetStmtAttr(SQL_ATTR_PARAMSET_SIZE, ROWS)

SQLPrepare(insert into perf values (?,?,?,?,?)

SQLBindParameter(1, idata)
SQLBindParameter(2, cdata1)
SQLBindParameter(3, cdata2)
SQLBindParameter(4, ts)
SQLBindParameter(5, fdata)

for (i = 0; i <= 5000; i++)
{
    set idata, cdata1, cdata2, ts, fdata to whatever values you
    want to insert (* 10)
    SQLExecute()
}

Note this code only does 5000 iterations as you are inserting 10 rows at a time. The time for this version was 34 seconds. If you take this further inserting 50 rows at a time it falls again to 18 seconds but there is usually a limit at which the speed increase bottoms out. This final figure works out as 2777 inserts a second.




http://msdn.microsoft.com/en-us/library/ms709287(v=vs.85).aspx