SQLite performance tuning

SQLite is a popular, relational database that you embed into your application. With an increasing amount of data in your database, you need to apply SQLite performance tuning. This article discusses indices and its pitfalls, the use of the query planner, the Write-Ahead-Logging (WAL) journal mode and increasing the cache size. It also elaborates on the importance of measuring the impact of your tweaks, using automated tests.

Introduction

SQLite is a popular, relational database (DB) system. Unlike its bigger, client-server based brothers, such as MySQL, SQLite can be embedded into your application as a library. SQLite has a very similar feature set and can also handle millions of rows, given that you know a few tips and tricks about performance tuning. As the following sections will show, there is more to know about SQLite performance tuning than just creating indices.

Create indices, but with caution

The basic idea of an index is to speed up reading of specific data, that is, SELECT statements with a WHERE clause. Indices also accelerate sorting data (ORDER BY), or JOINing tables. Unfortunately, indices are a double-edged sword, because they consume additional disk space and they slow down data manipulation (INSERT, UPDATE, DELETE).

The general advice is to create as few indices as possible, but as many as necessary. Also, indices only make sense for larger databases, with thousands or millions of rows.

Use the query planner to analyze your queries

The way indices are used internally by SQLite are documented, but not very easy to understand. As further elaborated by this article, it is a good idea to analyze a query by prefixing it with EXPLAIN QUERY PLAN. Take a look at every output line, of which there are three basic variants:

  • SEARCH table ... lines are a good sign – SQLite uses one of your indices!
  • SCAN table ... USING INDEX is a bad sign,
  • SCAN table ... is even worse!

Try to avoid SCAN table [using index] entries in the output of EXPLAIN QUERY PLAN whenever possible, because you will run into performance problems on larger databases. Use EXPLAIN QUERY PLAN to iterative add or modify your indices until no more SCAN table entries appear.

Optimize queries that involve IS NOT

Checking for IS NOT ... is expensive because SQLite will have to scan all rows of the table, even if the affected column has an index. Indices are only useful if you look for specific values, i.e. comparisons involving < (smaller), > (greater), or = (equal), but they don’t apply for != (unequal).

A neat little trick is that you can replace WHERE column != value with WHERE column > value OR column < value. This will use the column’s index and effectively affect all rows whose value are not equal to value. Similarly, a WHERE stringColumn != '' can be replaced by WHERE stringColumn > '', because strings are sortable. When applying this trick however, make sure you know how SQLite handles NULL comparisons. For instance, SQLite evaluates NULL > '' as FALSE.

If you do use such a comparison trick, there is another caveat in case your query contains WHERE and ORDER BY, each with a different column: this will make the query inefficient again. If possible, use the same column in WHERE and ORDER BY, or build a covering index that involves both the WHERE and ORDER BY column.

Improve write speed with the Write-Ahead-Log

The Write-Ahead-Logging (WAL) journal mode significantly improves the write/update performance, compared to the default rollback journal mode. However, as documented here, there are a few caveats. For instance, WAL mode is not available on certain operating systems. Also, there are reduced data consistency guarantees in case of hardware failure. Make sure to read the first few pages to understand what you are doing.

I found that the command PRAGMA synchronous = NORMAL provides a 3-4x speedup. Setting journal_mode to WAL then improves performance again significantly (approximately 10x or more, depending on the operating system).

Apart from the caveats I already mentioned, you should also be aware of the following:

  • Using the WAL journal mode, there will be two additional files next to the database file on your filesystem, which have the same name as the database, but suffixed “-shm” and “-wal”. Normally you don’t need to care, but if you were to send the database to another machine while your application is running, do not forget to include those two files. SQLite will compact these two files into the main file whenever you ordinarily closed all open database connections.
  • The insert or update performance will drop occasionally, whenever the query triggers the merging of the WAL log file content into the main database file. This is called checkpointing, see here.
  • I found that PRAGMAs that change journal_mode and synchronous do not seem to be persistently stored in the database. Thus, I always re-execute them whenever I open a new database connection, rather than just executing them when creating the tables for the first time.

Measure everything

Whenever you add performance tweaks, make sure to measure the impact. Automated (unit) tests are a great approach for this. They help to document your findings for your team, and they will uncover deviating behavior over time, e.g. when you update to a newer SQLite version. Examples for what you can measure:

  • What is the effect of using the WAL journal mode over the rollback mode? What is the effect of other (supposedly) performance-enhancing PRAGMAs?
  • Once you add/change/remove an index, how much faster do SELECT statements become? How much slower do INSERT/DELETE/UPDATE statements become?
  • How much additional disk space do the indices consume?

For any of these tests, consider repeating them with varying database sizes. E.g. run them on an empty database, and also on a database that already contains thousands (or millions) of entries. You should also run the tests on different devices and operating systems, especially if your development and production environment is substantially different.

Tune the cache size

SQLite stores temporary information in a cache (in the RAM), e.g. while building the results of a SELECT query, or when manipulating data which has not yet been committed. By default this size is a measly 2 MB. Modern desktop machines can spare much more. Run PRAGMA cache_size = -kibibytes to increase this value (mind the minus sign in front of the value!). See here for further information. Again, measure what impact this setting has on performance!

Use REPLACE INTO to create or update a row

This may not be so much of a performance tweak as it is a neat little trick. Suppose you need to update a row in table t, or create a row if it doesn’t exist yet. Rather than using two queries (SELECT followed by INSERT or UPDATE), use the REPLACE INTO (official docs).

For this to work, add an additional dummy column (e.g. replacer) to table t, which has a UNIQUE constrain. The column’s declaration could e.g. be ... replacer INTEGER UNIQUE ... that is part of your CREATE TABLE statement. Then use a query such as


REPLACE INTO t (col1, col2, ..., replacer) VALUES (?,?,...,1)

When this query runs for the first time, it will simply perform an INSERT. When it is run the second time, the UNIQUE constraint of the replacer column will trigger, and the conflict resolution behavior will cause the old row to be dropped, creating a new one automatically. You may also find the related UPSERT command to be useful.

Conclusion

Once the number of rows in your database grows, performance tweaks become a necessity. Indices are the most common solution. They trade improved time complexity for decreased space-complexity, improving read speeds, while negatively affecting data modification performance. A I’ve demonstrated, you need to be extra careful when comparing for inequality in SELECT statements, because SQLite cannot use indices for such kinds of comparisons. I generally recommend using the query planner that explains what happens internally for each SQL query. Whenever you tweak something, measure the impact!

Leave a Comment