SQLite is a popular relational database that you can embed into your application, and Python ships with official bindings to it. This article examines caveats of using SQLite in Python. It demonstrates problems that different versions of the linked SQLite library can cause, how datetime objects are not properly stored, and why you need to be extra careful when relying on Python's with-connection context manager to commit your data.
SQLite is a popular relational database (DB) system. Unlike its bigger, client-server-based siblings, such as MySQL, SQLite can be embedded into your application as a library. Python officially supports SQLite via bindings (official docs). However, working with those bindings is not always straightforward. Apart from the generic SQLite caveats I discussed previously, there are several Python-specific issues we will examine in this article.
Version incompatibilities with deployment target
It is quite common that developers create and test code on a machine that is (very) different from the one where the code is deployed, in terms of operating system (OS) and hardware. This causes three kinds of problems:
- The application behaves differently due to OS or hardware differences. For instance, you may run into performance problems when the target machine has less memory than your machine. Or SQLite might execute some operations slower on one OS than on others, because the underlying low-level OS APIs it uses are different.
- The SQLite version on the deployment target differs from the developer machine’s version. This can cause problems in both directions, because new features are added (and behavior changes) over time, see the official changelog. For instance, an outdated deployed SQLite version may lack features that worked fine in development. Also, a newer SQLite version in deployment may behave differently than an older version you use on your development machine, e.g. when the SQLite team changes some default values.
- Either the Python bindings for SQLite, or the SQLite C library itself, may be missing entirely on the deployment target. This is a Linux-distribution-specific problem. Official Windows and macOS distributions of Python contain a bundled version of the SQLite C library, whereas on Linux, the SQLite library is a separate package. If you compile Python yourself, e.g. because you use a Debian/Raspbian/etc. distribution which ships with ancient feature versions, Python’s build script will only build Python’s SQLite bindings if an installed SQLite C library was detected during Python’s compilation process. If you do such a re-compilation of Python yourself, then you should make sure that the installed SQLite C library is recent. This, again, is not the case for Debian etc. when installing SQLite via apt, so you may also have to build and install SQLite yourself, prior to building Python.
To find out which version of the SQLite C library is used by your Python interpreter, run this command:
```shell
python3 -c "import sqlite3; print(sqlite3.sqlite_version)"
```
Similarly, sqlite3.version will give you the version of Python’s SQLite bindings (note that this attribute is deprecated since Python 3.12).
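Because behavior depends on the version of the linked C library, it can help to fail fast at startup instead of hitting a confusing syntax error later. Here is a minimal sketch of such a guard; the 3.24.0 threshold refers to the SQLite release that introduced UPSERT (INSERT ... ON CONFLICT), picked purely as an example:

```python
import sqlite3

# SQLite gained UPSERT (INSERT ... ON CONFLICT) in version 3.24.0.
# sqlite_version_info is a tuple such as (3, 37, 2), which supports
# lexicographic comparison.
if sqlite3.sqlite_version_info < (3, 24, 0):
    raise RuntimeError(
        f"SQLite {sqlite3.sqlite_version} is too old, need >= 3.24.0"
    )
print(f"Linked SQLite C library: {sqlite3.sqlite_version}")
```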
Updating the underlying SQLite C library
Working with timezone-aware datetime objects
Most Python developers typically use datetime objects when working with timestamps. There are naive datetime objects, which do not know about their timezone, and non-naive ones, which are timezone-aware. It is well known that Python’s datetime module is quirky, making it difficult even to create timezone-aware datetime.datetime objects. For instance, the call datetime.datetime.utcnow() creates a naive object, which is counter-intuitive for developers who are new to the datetime APIs and expect Python to attach the UTC timezone! Third-party libraries, such as python-dateutil, facilitate this task. To create a timezone-aware object, you can use code such as this:
```python
import datetime

from dateutil.tz import tzutc

timezone_aware_dt = datetime.datetime.now(tzutc())
```
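As an aside (not part of the original example): since Python 3.2 the standard library itself ships a UTC timezone object, so dateutil is not strictly required for this particular case:

```python
import datetime

# datetime.timezone.utc has been part of the standard library since
# Python 3.2, so no third-party dependency is needed to get an aware
# "now" in UTC.
aware = datetime.datetime.now(datetime.timezone.utc)
print(aware.tzinfo)  # prints: UTC
```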
Unfortunately, the official Python documentation of the sqlite3 module is misleading when it comes to the handling of timestamps. As described there, datetime objects are automatically converted when using PARSE_DECLTYPES (and declaring a TIMESTAMP column). While this is technically correct, the conversion loses the timezone information! Consequently, if you actually use timezone-aware datetime.datetime objects, you must register your own converters, which retain the timezone information, as follows:
```python
import datetime
import sqlite3

import dateutil.parser


def convert_timestamp_to_tzaware(timestamp: bytes) -> datetime.datetime:
    # sqlite3 provides the timestamp as a byte string
    return dateutil.parser.parse(timestamp.decode("utf-8"))


def convert_timestamp_to_sqlite(dt: datetime.datetime) -> str:
    # isoformat() includes the timezone information at the end of the string
    return dt.isoformat()


sqlite3.register_converter("timestamp", convert_timestamp_to_tzaware)
sqlite3.register_adapter(datetime.datetime, convert_timestamp_to_sqlite)
```
As you can see, the timestamp is ultimately just stored as TEXT. There is no real “date” or “datetime” data type in SQLite.
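To illustrate, here is a self-contained round-trip sketch. It uses only the standard library (datetime.fromisoformat, which parses the UTC offset since Python 3.7) instead of dateutil, but is otherwise equivalent to the converters above; the table and column names are made up for illustration:

```python
import datetime
import sqlite3


def to_tzaware(raw: bytes) -> datetime.datetime:
    # fromisoformat() understands the "+00:00" offset suffix produced
    # by isoformat() on timezone-aware objects (Python >= 3.7)
    return datetime.datetime.fromisoformat(raw.decode("utf-8"))


def to_sqlite(dt: datetime.datetime) -> str:
    return dt.isoformat()


sqlite3.register_converter("timestamp", to_tzaware)
sqlite3.register_adapter(datetime.datetime, to_sqlite)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE events (created_at TIMESTAMP)")

original = datetime.datetime(2021, 5, 1, 12, 0, tzinfo=datetime.timezone.utc)
with conn:
    conn.execute("INSERT INTO events VALUES (?)", (original,))

restored = conn.execute("SELECT created_at FROM events").fetchone()[0]
print(restored == original)  # True: the offset survived the round trip
```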
Transactions and auto-commit
Python’s sqlite3 module does not automatically commit data that is modified by your queries. When you perform queries that somehow change the database, you either have to issue an explicit COMMIT statement, or you use the connection as a context manager object, as shown in the following example:
```python
with connection:  # this uses the connection as a context manager
    # do something with it, e.g.:
    connection.execute("SOME QUERY")
```
Once the above block is exited, sqlite3 implicitly calls connection.commit(), but only does so if a transaction is ongoing. DML (Data Manipulation Language) statements such as INSERT, UPDATE and DELETE automatically start a transaction, but queries involving INDEX statements (e.g. CREATE INDEX) do not, because they don’t count as DML according to the documentation. This is counter-intuitive, because these statements clearly do modify data.
Thus, if you run any INDEX statements inside the context manager, it’s good practice to explicitly execute a BEGIN TRANSACTION statement first, so that the context manager will actually call connection.commit() for you.
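A minimal sketch of this workaround; the table and index names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")  # hypothetical table

with conn:
    # CREATE INDEX alone does not start an implicit transaction, so the
    # context manager would have nothing to commit on exit. Beginning a
    # transaction explicitly ensures conn.commit() actually runs.
    conn.execute("BEGIN TRANSACTION")
    conn.execute("CREATE INDEX idx_items_name ON items (name)")

print(conn.in_transaction)  # False: committed when the block exited
```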
Handling of 64-bit integers
In a previous article I already discussed that SQLite has issues with large integers that are smaller than -2^63, or greater than or equal to 2^63. If you try to use them in query parameters (with the ? symbol), Python’s sqlite3 module will raise an OverflowError: Python int too large to convert to SQLite INTEGER, protecting you from accidental data loss.
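Here is a quick sketch of both the failure and the TEXT-based workaround described in this section; the table name is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (value TEXT)")  # hypothetical table

big = 2 ** 63  # one past the largest signed 64-bit integer
try:
    conn.execute("INSERT INTO nums VALUES (?)", (big,))
except OverflowError as exc:
    print(exc)  # Python int too large to convert to SQLite INTEGER

# The workaround: convert to str before binding, and back to int after reading
with conn:
    conn.execute("INSERT INTO nums VALUES (?)", (str(big),))
restored = int(conn.execute("SELECT value FROM nums").fetchone()[0])
print(restored == big)  # True
```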
To properly handle very large integers, you must:
- Use the TEXT type for the corresponding table column,
- Convert the number to str already in Python, prior to using it as a parameter, and
- Convert the string back to int in Python, when reading it back from the database.
Conclusion
Python’s sqlite3 module is an excellent binding to SQLite. However, developers new to SQLite need to understand that there is a difference between the Python bindings and the underlying SQLite C library. There is danger lurking in the shadows because of version differences of SQLite. These can occur even if you run the same Python version on two different machines, because the linked SQLite C library might still differ. I also discussed other problems, such as handling datetime objects and persistently changing data using transactions. I was not aware of them myself, which caused data loss for users of my applications, so I hope that you can avoid the same mistakes I made.