SQLite is a popular, relational database that you embed into your application. With an increasing amount of data in your database, you need to apply SQLite performance tuning. This article discusses indices and its pitfalls, the use of the query planner, the Write-Ahead-Logging (WAL) journal mode and increasing the cache size. It also elaborates on the importance of measuring the impact of your tweaks, using automated tests.
Introduction
SQLite is a popular, relational database (DB) system. Unlike its bigger, client-server based brothers, such as MySQL, SQLite can be embedded into your application as a library. SQLite has a very similar feature set and can also handle millions of rows, given that you know a few tips and tricks about performance tuning. As the following sections will show, there is more to know about SQLite performance tuning than just creating indices.
Create indices, but with caution
The basic idea of an index is to speed up reading of specific data, that is, SELECT
statements with a WHERE
clause. Indices also accelerate sorting data (ORDER BY
), or JOIN
ing tables. Unfortunately, indices are a double-edged sword, because they consume additional disk space and they slow down data manipulation (INSERT
, UPDATE
, DELETE
).
The general advice is to create as few indices as possible, but as many as necessary. Also, indices only make sense for larger databases, with thousands or millions of rows.
Use the query planner to analyze your queries
The way indices are used internally by SQLite are documented, but not very easy to understand. As further elaborated by this article, it is a good idea to analyze a query by prefixing it with EXPLAIN QUERY PLAN
. Take a look at every output line, of which there are three basic variants:
SEARCH table ...
lines are a good sign – SQLite uses one of your indices!SCAN table ... USING INDEX
is a bad sign,SCAN table ...
is even worse!
Try to avoid SCAN table [using index]
entries in the output of EXPLAIN QUERY PLAN
whenever possible, because you will run into performance problems on larger databases. Use EXPLAIN QUERY PLAN
to iterative add or modify your indices until no more SCAN table
entries appear.
Optimize queries that involve IS NOT
Checking for IS NOT ...
is expensive because SQLite will have to scan all rows of the table, even if the affected column has an index. Indices are only useful if you look for specific values, i.e. comparisons involving < (smaller), > (greater), or = (equal), but they don’t apply for != (unequal).
A neat little trick is that you can replace WHERE column != value
with WHERE column > value OR column < value
. This will use the column’s index and effectively affect all rows whose value are not equal to value
. Similarly, a WHERE stringColumn != ''
can be replaced by WHERE stringColumn > ''
, because strings are sortable. When applying this trick however, make sure you know how SQLite handles NULL
comparisons. For instance, SQLite evaluates NULL > ''
as FALSE
.
If you do use such a comparison trick, there is another caveat in case your query contains WHERE
and ORDER BY
, each with a different column: this will make the query inefficient again. If possible, use the same column in WHERE
and ORDER BY
, or build a covering index that involves both the WHERE
and ORDER BY
column.
Improve write speed with the Write-Ahead-Log
The Write-Ahead-Logging (WAL) journal mode significantly improves the write/update performance, compared to the default rollback journal mode. However, as documented here, there are a few caveats. For instance, WAL mode is not available on certain operating systems. Also, there are reduced data consistency guarantees in case of hardware failure. Make sure to read the first few pages to understand what you are doing.
I found that the command PRAGMA synchronous = NORMAL
provides a 3-4x speedup. Setting journal_mode
to WAL
then improves performance again significantly (approximately 10x or more, depending on the operating system).
Apart from the caveats I already mentioned, you should also be aware of the following:
- Using the WAL journal mode, there will be two additional files next to the database file on your filesystem, which have the same name as the database, but suffixed “-shm” and “-wal”. Normally you don’t need to care, but if you were to send the database to another machine while your application is running, do not forget to include those two files. SQLite will compact these two files into the main file whenever you ordinarily closed all open database connections.
- The insert or update performance will drop occasionally, whenever the query triggers the merging of the WAL log file content into the main database file. This is called checkpointing, see here.
- I found that
PRAGMA
s that changejournal_mode
andsynchronous
do not seem to be persistently stored in the database. Thus, I always re-execute them whenever I open a new database connection, rather than just executing them when creating the tables for the first time.
Measure everything
Whenever you add performance tweaks, make sure to measure the impact. Automated (unit) tests are a great approach for this. They help to document your findings for your team, and they will uncover deviating behavior over time, e.g. when you update to a newer SQLite version. Examples for what you can measure:
- What is the effect of using the WAL journal mode over the rollback mode? What is the effect of other (supposedly) performance-enhancing
PRAGMA
s? - Once you add/change/remove an index, how much faster do
SELECT
statements become? How much slower doINSERT/DELETE/UPDATE
statements become? - How much additional disk space do the indices consume?
For any of these tests, consider repeating them with varying database sizes. E.g. run them on an empty database, and also on a database that already contains thousands (or millions) of entries. You should also run the tests on different devices and operating systems, especially if your development and production environment is substantially different.
Tune the cache size
SQLite stores temporary information in a cache (in the RAM), e.g. while building the results of a SELECT
query, or when manipulating data which has not yet been committed. By default this size is a measly 2 MB. Modern desktop machines can spare much more. Run PRAGMA cache_size = -kibibytes
to increase this value (mind the minus sign in front of the value!). See here for further information. Again, measure what impact this setting has on performance!
Use REPLACE INTO to create or update a row
This may not be so much of a performance tweak as it is a neat little trick. Suppose you need to update a row in table t
, or create a row if it doesn’t exist yet. Rather than using two queries (SELECT
followed by INSERT
or UPDATE
), use the REPLACE INTO
(official docs).
For this to work, add an additional dummy column (e.g. replacer
) to table t
, which has a UNIQUE
constrain. The column’s declaration could e.g. be ... replacer INTEGER UNIQUE ...
that is part of your CREATE TABLE
statement. Then use a query such as
REPLACE INTO t (col1, col2, ..., replacer) VALUES (?,?,...,1)
Code language: SQL (Structured Query Language) (sql)
When this query runs for the first time, it will simply perform an INSERT
. When it is run the second time, the UNIQUE
constraint of the replacer
column will trigger, and the conflict resolution behavior will cause the old row to be dropped, creating a new one automatically. You may also find the related UPSERT command to be useful.
Conclusion
Once the number of rows in your database grows, performance tweaks become a necessity. Indices are the most common solution. They trade improved time complexity for decreased space-complexity, improving read speeds, while negatively affecting data modification performance. A I’ve demonstrated, you need to be extra careful when comparing for inequality in SELECT
statements, because SQLite cannot use indices for such kinds of comparisons. I generally recommend using the query planner that explains what happens internally for each SQL query. Whenever you tweak something, measure the impact!