Python and SQLite caveats

SQLite is a popular, relational database that you embed into your application. Python comes with official bindings to SQLite. This article examines caveats of using SQLite in Python. It demonstrates problems different versions of linked SQLite libraries can cause, how datetime objects are not properly stored, and how you need to be extra careful when relying on Python’s with connection context manager to commit your data.

Introduction

SQLite is a popular, relational database (DB) system. Unlike its bigger, client-server based brothers, such as MySQL, SQLite can be embedded into your application as a library. Python officially supports SQLite via bindings (official docs). However, working with those bindings is not always straightforward. Apart from the generic SQLite caveats I discussed previously, there are several Python-specific issues we will examine in this article.

Version incompatibilities with deployment target

It is quite common that developers create and test code on a machine that is (very) different to the one where the code is deployed, in terms of operating system (OS) and hardware. This causes three kinds of problems:

  • The application behaves differently due to OS or hardware differences. For instance, you may run into performance problems when the target machine has less memory than your machine. Or SQLite might execute some operations slower on one OS than on others, because the underlying low-level OS APIs it uses are different.
  • The SQLite version on the deployment target differs from the developer machine’s version. This can cause problems in both directions, because new features are added (and behavior changes) over time, see the official changelog. For instance, an outdated deployed SQLite version may lack features that worked fine in development. Also, a newer SQLite version in deployment maybe behave differently than an older version you use on your development machine, e.g. when the SQLite team changes some default values.
  • Either the Python bindings of SQLite, or the C library, may be missing entirely on the deployment target. This is a Linux-distribution-specific problem. Official Windows and macOS distributions will contain a bundled version of the SQLite C library. On Linux, the SQLite library is a separate package. If you compile Python yourself, e.g. because you use a Debian/Raspbian/etc. distribution which ships with ancient feature versions, the Python make build script will only build Python’s SQLite bindings if an installed SQLite C library was detected during Python’s compilation process. If you do such a re-compilation of Python yourself, then you should make sure that the installed SQLite C library is recent. This, again, is not the case for Debian etc. when installing SQLite via apt, so you may also have to build and install SQLite yourself, prior to building Python.

To find out which version of the SQLite C library is used by your Python interpreter, run this command:


python3 -c "import sqlite3; print(sqlite3.sqlite_version)"

Replacing sqlite3.sqlite_version with sqlite3.version will give you the version of Python’s SQLite bindings.

Updating the underlying SQLite C library

If you want to profit from features or bug fixes of the most recent SQLite version, you are in luck. The SQLite C library is typically linked at run-time and thus can be replaced without any changes to your installed Python interpreter. The concrete steps depend on your OS (tested for Python 3.6+):

1) Windows: Download the x86 or x64 precompiled binaries from the SQLite download page and replace the sqlite3.dll file found in the DLLs folder of your Python installation with the one you just downloaded.

2) Linux: from the SQLite download page get the autoconf sources, extract the archive, and run ./configure && make && make install which will install the library into /usr/local/lib by default.
Then add the line export LD_LIBRARY_PATH=/usr/local/lib at the beginning of the shell script that starts your Python script, which forces your Python interpreter to use the self-built library.

3) macOS: from my analysis, it seems the SQLite C library is compiled into the Python bindings binary (_sqlite3.cpython-36m-darwin.so). If you want to replace it, you will likely need to get the Python source code matching your installed Python installation (e.g. 3.7.6 or whatever version you use). Compile Python from source, using the macOS build script. This script includes downloading and building SQLite’s C library, so make sure to edit the script to reference the most recent SQLite version. Finally, use the compiled bindings file (e.g. _sqlite3.cpython-37m-darwin.so), to replace the outdated one.

Working with timezone-aware datetime objects

Most Python developers typically use datetime objects when working with timestamps. There are naive datetime objects which do not know about their timezone, and non-naive ones, which are timezone-aware. It is well-known that Python’s datetime module is quirky, making it difficult to even create timezone-aware datetime.datetime objects. For instance, the call datetime.datetime.utcnow() creates a naive object, which is counter-intuitive for developers who are new to the datetime APIs, expecting Python to use the UTC timezone! Third-party libraries, such as python-dateutil, facilitate this task. To create a timezone-aware object, you can use code such as this:

from dateutil.tz import tzutc import datetime timezone_aware_dt = datetime.datetime.now(tzutc())

Unfortunately, the official Python documentation of the sqlite3 module is misleading when it comes to handling of timestamps. As described here, datetime objects are automatically converted when using PARSE_DECLTYPES (and declaring a TIMESTAMP column). While this is technically correct, the conversion will lose the timezone information! Consequently, if you are actually using timezone-aware datetime.datetime objects, you must register your own converters, which retain timezone information, as follows:

def convert_timestamp_to_tzaware(timestamp: bytes) -> datetime.datetime: # sqlite3 provides the timestamp as byte-string return dateutil.parser.parse(timestamp.decode("utf-8")) def convert_timestamp_to_sqlite(dt: datetime.datetime) -> str: return dt.isoformat() # includes the timezone information at the end of the string sqlite3.register_converter("timestamp", convert_timestamp_to_tzaware) sqlite3.register_adapter(datetime.datetime, convert_timestamp_to_sqlite)

As you can see, the timestamp is just stored as TEXT in the end. There is no real “date” or “datetime” data type in SQLite.

Transactions and auto-commit

Python’s sqlite3 module does not automatically commit data that is modified by your queries. When you perform queries that somehow change the database, you either have to issue an explicit COMMIT statement, or you use the connection as context manager object, as shown in the following example:

with connection: # this uses the connection as context manager # do something with it, e.g. connection.execute("SOME QUERY")

Once the above block was exited, sqlite3 implicitly calls connection.commit(), but only does so if a transaction is ongoing. DML (Data Modification Language) statements automatically start a transaction, but queries involving DROP or CREATE TABLE / INDEX statements do not, because they don’t count as DML according to the documentation. This is counter-intuitive, because these statements clearly do modify data.

Thus, if you run any DROP or CREATE TABLE / INDEX statements inside the context manager, it’s good practice to explicitly execute a BEGIN TRANSACTION statement first, so that the context manager will actually call connection.commit() for you.

Handling of 64-bit integers

In a previous article I already discussed that SQLite has issues with large integers that are smaller than -2^63, or greater or equal than 2^63. If you try to use them in query parameters (with the ? symbol), Python’s sqlite3 module will raise an OverflowError: Python int too large to convert to SQLite INTEGER, protecting you from accidental data loss.

To properly handle very large integers, you must:

  1. Use the TEXT type for the corresponding table column, and
  2. Convert the number to str already in Python, prior to using it as a parameter.
  3. Convert the strings back to int in Python, when SELECTing data

Conclusion

Python’s official sqlite3 module is an excellent binding to SQLite. However, developers new to SQLite need to understand that there is a difference between the Python bindings and the underlying SQLite C library. There is danger lurking in the shadows, because of version differences of SQLite. These can happen even if you run the same Python version on two different machines, because the SQLite C library might still be using a different version. I also discussed other problems such as handling datetime objects and persistently changing data using transactions. I was not aware of them myself, which caused data loss for users of my applications, so I hope that you can avoid the same mistakes I made.

Leave a Comment