SQLite is a popular, relational database that you embed into your application. With an increasing amount of data in your database, you need to apply SQLite performance tuning. This article discusses indices and its pitfalls, the use of the query planner, the Write-Ahead-Logging (WAL) journal mode and increasing the cache size. It also elaborates on the importance of measuring the impact of your tweaks, using automated tests.
SQLite is a popular, relational database (DB) system. Unlike its bigger, client-server based brothers, such as MySQL, SQLite can be embedded into your application as a library. SQLite has a very similar feature set and can also handle millions of rows, given that you know a few tips and tricks about performance tuning. As the following sections will show, there is more to know about SQLite performance tuning than just creating indices.
Create indices, but with caution
The basic idea of an index is to speed up reading of specific data, that is,
SELECT statements with a
WHERE clause. Indices also accelerate sorting data (
ORDER BY), or
JOINing tables. Unfortunately, indices are a double-edged sword, because they consume additional disk space and they slow down data manipulation (
The general advice is to create as few indices as possible, but as many as necessary. Also, indices only make sense for larger databases, with thousands or millions of rows.
Use the query planner to analyze your queries
The way indices are used internally by SQLite are documented, but not very easy to understand. As further elaborated by this article, it is a good idea to analyze a query by prefixing it with
EXPLAIN QUERY PLAN. Take a look at every output line, of which there are three basic variants:
SEARCH table ...lines are a good sign – SQLite uses one of your indices!
SCAN table ... USING INDEXis a bad sign,
SCAN table ...is even worse!
Try to avoid
SCAN table [using index] entries in the output of
EXPLAIN QUERY PLAN whenever possible, because you will run into performance problems on larger databases. Use
EXPLAIN QUERY PLAN to iterative add or modify your indices until no more
SCAN table entries appear.
Optimize queries that involve
IS NOT ... is expensive because SQLite will have to scan all rows of the table, even if the affected column has an index. Indices are only useful if you look for specific values, i.e. comparisons involving < (smaller), > (greater), or = (equal), but they don’t apply for != (unequal).
A neat little trick is that you can replace
WHERE column != value with
WHERE column > value OR column < value. This will use the column’s index and effectively affect all rows whose value are not equal to
value. Similarly, a
WHERE stringColumn != '' can be replaced by
WHERE stringColumn > '', because strings are sortable. When applying this trick however, make sure you know how SQLite handles
NULL comparisons. For instance, SQLite evaluates
NULL > '' as
If you do use such a comparison trick, there is another caveat in case your query contains
ORDER BY, each with a different column: this will make the query inefficient again. If possible, use the same column in
ORDER BY, or build a covering index that involves both the
ORDER BY column.
Improve write speed with the Write-Ahead-Log
The Write-Ahead-Logging (WAL) journal mode significantly improves the write/update performance, compared to the default rollback journal mode. However, as documented here, there are a few caveats. For instance, WAL mode is not available on certain operating systems. Also, there are reduced data consistency guarantees in case of hardware failure. Make sure to read the first few pages to understand what you are doing.
I found that the command
PRAGMA synchronous = NORMAL provides a 3-4x speedup. Setting
WAL then improves performance again significantly (approximately 10x or more, depending on the operating system).
Apart from the caveats I already mentioned, you should also be aware of the following:
- Using the WAL journal mode, there will be two additional files next to the database file on your filesystem, which have the same name as the database, but suffixed “-shm” and “-wal”. Normally you don’t need to care, but if you were to send the database to another machine while your application is running, do not forget to include those two files. SQLite will compact these two files into the main file whenever you ordinarily closed all open database connections.
- The insert or update performance will drop occasionally, whenever the query triggers the merging of the WAL log file content into the main database file. This is called checkpointing, see here.
- I found that
PRAGMAs that change
synchronousdo not seem to be persistently stored in the database. Thus, I always re-execute them whenever I open a new database connection, rather than just executing them when creating the tables for the first time.
Whenever you add performance tweaks, make sure to measure the impact. Automated (unit) tests are a great approach for this. They help to document your findings for your team, and they will uncover deviating behavior over time, e.g. when you update to a newer SQLite version. Examples for what you can measure:
- What is the effect of using the WAL journal mode over the rollback mode? What is the effect of other (supposedly) performance-enhancing
- Once you add/change/remove an index, how much faster do
SELECTstatements become? How much slower do
- How much additional disk space do the indices consume?
For any of these tests, consider repeating them with varying database sizes. E.g. run them on an empty database, and also on a database that already contains thousands (or millions) of entries. You should also run the tests on different devices and operating systems, especially if your development and production environment is substantially different.
Tune the cache size
SQLite stores temporary information in a cache (in the RAM), e.g. while building the results of a
SELECT query, or when manipulating data which has not yet been committed. By default this size is a measly 2 MB. Modern desktop machines can spare much more. Run
PRAGMA cache_size = -kibibytes to increase this value (mind the minus sign in front of the value!). See here for further information. Again, measure what impact this setting has on performance!
Use REPLACE INTO to create or update a row
This may not be so much of a performance tweak as it is a neat little trick. Suppose you need to update a row in table
t, or create a row if it doesn’t exist yet. Rather than using two queries (
SELECT followed by
UPDATE), use the
REPLACE INTO (official docs).
For this to work, add an additional dummy column (e.g.
replacer) to table
t, which has a
UNIQUE constrain. The column’s declaration could e.g. be
... replacer INTEGER UNIQUE ... that is part of your
CREATE TABLE statement. Then use a query such as
REPLACE INTO t (col1, col2, ..., replacer) VALUES (?,?,...,1)
When this query runs for the first time, it will simply perform an
INSERT. When it is run the second time, the
UNIQUE constraint of the
replacer column will trigger, and the conflict resolution behavior will cause the old row to be dropped, creating a new one automatically. You may also find the related UPSERT command to be useful.
Once the number of rows in your database grows, performance tweaks become a necessity. Indices are the most common solution. They trade improved time complexity for decreased space-complexity, improving read speeds, while negatively affecting data modification performance. A I’ve demonstrated, you need to be extra careful when comparing for inequality in
SELECT statements, because SQLite cannot use indices for such kinds of comparisons. I generally recommend using the query planner that explains what happens internally for each SQL query. Whenever you tweak something, measure the impact!