The above approach is slow for a couple of reasons:
- If you are not wrapping the loop in a transaction then each insert happens in its own transaction. That is going to be really slow!
- There is a decent amount of Python logic getting in your way, and each InsertQuery must be generated and parsed into SQL.
- That’s a lot of data (in terms of raw bytes of SQL) you are sending to your database to parse.
- We are retrieving the last insert id, which causes an additional query to be executed in some cases.
You can get a significant speedup by simply wrapping this in a transaction with atomic().
# This is much faster.
with db.atomic():
    for data_dict in data_source:
        MyModel.create(**data_dict)
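Wrapping the loop in a transaction removes the per-commit overhead, but the other costs listed above (query-building overhead, raw SQL volume, and the last-insert-id lookup) still apply, because every row is issued as its own INSERT. The insert_many() method referenced below batches many rows into a single query. A minimal, self-contained sketch; the database, model, and data here are hypothetical stand-ins for whatever the surrounding examples define:

from peewee import SqliteDatabase, Model, CharField

db = SqliteDatabase(':memory:')      # stand-in database

class MyModel(Model):                # stand-in model with two text fields
    field1 = CharField()
    field2 = CharField()

    class Meta:
        database = db

db.create_tables([MyModel])

data_source = [
    {'field1': 'val1-1', 'field2': 'val1-2'},
    {'field1': 'val2-1', 'field2': 'val2-2'},
]

# One multi-row INSERT instead of one statement per row.
MyModel.insert_many(data_source).execute()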
Depending on the number of rows in your data source, you may need to break it up into chunks:
# Insert rows 100 at a time.
with db.atomic():
    for idx in range(0, len(data_source), 100):
        MyModel.insert_many(data_source[idx:idx+100]).execute()
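If you would rather not do the slicing arithmetic by hand, newer peewee releases also ship a chunked() helper that splits any iterable into fixed-size batches (check that it exists in your installed version). The same loop using it, with the stand-in names from the sketch above:

from peewee import chunked

# One INSERT per 100-row batch, all inside a single transaction.
with db.atomic():
    for batch in chunked(data_source, 100):
        MyModel.insert_many(batch).execute()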
If insert_many() won’t work for your use-case, you can also use the batch_commit() helper to process chunks of rows inside transactions.
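batch_commit() takes an iterable of rows and a batch size, yields the rows back one at a time, and wraps every batch in its own transaction, so each created row still gets its primary key back. A minimal sketch, reusing the stand-in names from above and assuming your installed peewee provides Database.batch_commit():

# Rows are created one at a time but committed 100 per transaction.
for row in db.batch_commit(data_source, 100):
    MyModel.create(**row)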
SQLite users should be aware of some caveats when using bulk inserts. Specifically, your SQLite3 version must be 3.7.11.0 or newer to take advantage of the bulk insert API. Additionally, by default SQLite limits the number of bound variables in a SQL query to 999. This value can be modified by setting the SQLITE_MAX_VARIABLE_NUMBER flag.
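Every bound value in a multi-row INSERT counts against that limit, so the largest safe batch depends on how many fields each row has. A small sketch of the arithmetic, assuming the default 999-variable limit and the two-field stand-in model from the earlier sketch:

SQLITE_MAX_VARS = 999            # default limit on bound variables per query
fields_per_row = 2               # e.g. field1 and field2
batch_size = SQLITE_MAX_VARS // fields_per_row   # 499 rows per INSERT

with db.atomic():
    for idx in range(0, len(data_source), batch_size):
        MyModel.insert_many(data_source[idx:idx + batch_size]).execute()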
If the data you would like to bulk load is stored in another table, you can also create INSERT queries whose source is a SELECT query. Use the insert_from() method:
# Copy user/message from Tweet into TweetArchive with a single INSERT INTO ... SELECT.
res = (TweetArchive
       .insert_from(
           Tweet.select(Tweet.user, Tweet.message),
           fields=[TweetArchive.user, TweetArchive.message])
       .execute())