Advanced Python link collection

This article is a link collection for Python developers who want to go beyond the basics. It presents my personal favorite articles and libraries for various use cases.

Introduction

Whenever you fall in love with a programming language, you will wonder what you can do with it, beyond the basics. Most modern programming languages come with a considerable runtime library full of functionality for tasks such as string manipulation, concurrency or I/O handling. Usually, there is also a large ecosystem of third-party libraries which extend what you can do with the language. Unless you are familiar with this ecosystem, you cannot really consider yourself proficient in that language.

In this article I’m focusing on Python, presenting my personal favorite libraries, articles and learning resources. I hope that they will help you solve your daily problems faster.

Links by topic

Python’s standard library

Python has a huge standard library. You have undoubtedly used quite a few popular modules, such as sys, os, or json. However, there are many more modules with features you’d never have thought are part of Python. For instance, did you know that Python has a module to process audio files, or one to diff strings?
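To make the "diff strings" example concrete: the standard library module for this is difflib. The following is a minimal sketch using two made-up sentences; the variable names are just for illustration.

```python
import difflib

old = "Python has a large standard library"
new = "Python has a huge standard library"

# SequenceMatcher computes a similarity ratio between 0.0 and 1.0.
ratio = difflib.SequenceMatcher(None, old, new).ratio()

# ndiff works on sequences of strings; here every word is one item.
delta = list(difflib.ndiff(old.split(), new.split()))
removed = [item[2:] for item in delta if item.startswith("- ")]
added = [item[2:] for item in delta if item.startswith("+ ")]

print(f"similarity: {ratio:.2f}")
print("removed:", removed, "added:", added)
```

Splitting into words keeps the output short. For multi-line texts you would typically pass lists of lines instead.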

While the official library reference site has the full list, I recommend the Python 3 Module Of The Week site. It is well structured and offers an introduction to many of the standard library modules, with extensive tutorials which illustrate their use.

As for sqlite3, take a look at my SQLite-related articles, where I discuss pitfalls and performance improvements for SQLite.

Static type hints

While Python is a dynamically typed language, it is possible to add static type hints to your code. This makes it more readable and allows static analyzers to check your code for type errors, e.g. during development or in a CI pipeline. In my Static Python type hints article I present an introduction to the topic.

Once you have a code base with static type annotations, there are further libraries which make use of them. For instance, you can use typer to build a Command Line Interface (CLI) with minimal effort, because the provided CLI arguments are parsed according to your type annotations. Also, if you use Sphinx to generate documentation (I discuss Sphinx in this article), then projects like sphinx-autodoc-typehints can make use of your type hints for the generated docs.
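To give a rough idea of how such libraries can benefit from annotations, here is a small sketch that reads a function's type hints at run time and uses them to convert string arguments. It relies only on typing.get_type_hints from the standard library. The helper convert_args and the function greet are hypothetical names made up for this example, and this is not how typer is actually implemented.

```python
from typing import get_type_hints


def greet(name: str, times: int = 1, shout: bool = False) -> str:
    """Example function whose annotations describe its arguments."""
    text = f"Hello {name}"
    if shout:
        text = text.upper()
    return " ".join([text] * times)


def convert_args(func, raw_args: dict[str, str]) -> dict[str, object]:
    """Convert raw string arguments to the types annotated on func.

    Hypothetical helper for illustration; real CLI libraries do far more.
    """
    hints = get_type_hints(func)
    hints.pop("return", None)
    converted = {}
    for key, raw in raw_args.items():
        target = hints[key]
        if target is bool:
            # bool("false") would be True, so parse booleans explicitly.
            converted[key] = raw.lower() in {"1", "true", "yes"}
        else:
            converted[key] = target(raw)
    return converted


kwargs = convert_args(greet, {"name": "Ada", "times": "2"})
print(greet(**kwargs))
```

The point is only that annotations are available as data at run time, which is what lets a library derive parsing rules from them.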

Web development – performance & security

Python is a very popular language to build websites and backends, with popular frameworks such as Django (+DRF), Flask or FastAPI. There are two important topics which are often neglected, but which you should understand well and get right – at least in your production environment: performance and security.

As for performance, there are numerous articles which compare different frameworks, or the WSGI servers used to serve them (the two are often decoupled). You should be familiar with the different approaches to configuring WSGI servers. This article explains common options, such as spawning multiple processes, threads or green threads. This article makes a case for a possible optimization, by using PyPy instead of plain (C)Python. Recently, asyncio support has made its way into several frameworks via ASGI, which is essentially similar to using green threads. However, be wary of benchmarks made by others. I highly recommend this article which not only presents a large list of WSGI/ASGI servers and Python web frameworks, but also illustrates how easy it is to tune the benchmark parameters to get completely different benchmark results. Don’t get me wrong, benchmarking your own system is still worth your while, because it is specific to your hardware and code. And of course, you only need to do it if you actually run into performance problems. Avoid premature optimization ;).
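If the decoupling of framework and server sounds abstract: a WSGI application is just a callable that takes an environ dict and a start_response function, and the server decides how many processes or threads call it. The sketch below builds such a callable and invokes it directly with a fake request from the standard library's wsgiref.util, without starting a server. The names application, captured and the /ping path are made up for this example.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI application: a callable taking environ and start_response."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Build a fake request environment, the way a WSGI server would for a real one.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/ping"

captured = {}


def start_response(status, headers):
    # A real server would send these to the client; here we just record them.
    captured["status"] = status
    captured["headers"] = headers


response_body = b"".join(application(environ, start_response))
print(captured["status"], response_body)
```

Because the application knows nothing about processes or threads, the same code can be served with very different concurrency settings, which is one reason why benchmark results vary so much.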

Regarding security, new problems arise all the time. Apart from following general practices such as OWASP Top 10 or OWASP API Top 10, I recommend following conferences closely related to the technology you use, e.g. DjangoCon (schedule, videos) if you are working with Django.

Handling names of people

Whenever you build an application that takes the real name of a person as input, there are many things that you can do wrong. A common pitfall is that your validation routines are too strict, causing dismay when the user is told by your application that their name is supposedly “invalid”. If you want to learn more, I recommend the fantastic talk “Your name is invalid” by Miroslav Šedivý (video, slides) given during EuroPython 2020.
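As a small illustration of the pitfall, the sketch below compares an overly strict validator (ASCII letters and spaces only) with a lenient one that merely rejects empty input. Both functions and the sample names other than the speaker's are my own examples, not taken from the talk.

```python
import re

# An overly strict pattern: ASCII letters and spaces only.
STRICT_NAME = re.compile(r"[A-Za-z ]+")


def strict_is_valid(name: str) -> bool:
    """Reject anything outside A-Z, a-z and spaces (too strict for real names)."""
    return STRICT_NAME.fullmatch(name) is not None


def lenient_is_valid(name: str) -> bool:
    """Accept any name that is not empty after trimming whitespace."""
    return bool(name.strip())


for name in ["Miroslav Šedivý", "O'Neil", "Jean-Luc", "李小龍"]:
    print(name, strict_is_valid(name), lenient_is_valid(name))
```

The strict variant rejects all four names, even though each of them is a perfectly plausible real name.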

Profiling & debugging

Profiling your application is a common technique to understand which parts of your code are slow, and why. Python’s (c)Profile capabilities are a good start, and some IDEs such as PyCharm offer good visualizations of the collected data. If you want to get even more out of it, there are tools which let you analyze visual traces. Consider taking a look at viztracer, FunctionTrace (which supports macOS and Linux) or py-spy. For an elaboration on tools based on Python’s basic profiling features (such as timeit, profile or cProfile), see this article. Other great profilers are scalene and austin. If you want to continuously profile your application in production, then check out Pyroscope!
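As a starting point, this is roughly what profiling with the standard library looks like. The sketch profiles a deliberately slow toy workload (slow_sum and work are made-up functions) with cProfile and prints the entries sorted by cumulative time using pstats.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Toy workload: sum of squares computed in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def work() -> int:
    return sum(slow_sum(10_000) for _ in range(20))


profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Write the statistics into a string instead of directly to stdout.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

For a whole script you can get a similar report without touching the code by running python -m cProfile -s cumulative your_script.py.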

Another common technique is debugging. You most likely use the graphical debugger of your IDE, place breakpoints and step through your code manually. Or you may prefer the old-school CLI approach, using PDB and remembering all those fancy commands by heart. Either way, this approach has a few problems. For instance, you only get to shed light on the code path that you decided to step through, ignoring all other paths. Stepping through code is also not possible if it is timing-dependent, because pausing at a breakpoint for several seconds will cause incorrect behavior. Instead of plastering your code with log statements, consider Cyberbrain, which traces all variable changes inside a function over time, and presents them in a visual graph. The project is still in an early stage, but you may already benefit from it.

Package managers

Did you ever wonder why the Python community came up with so many package managers and tools to isolate virtual environments? Off the top of my head, there is pip, pyenv, pipenv, pipx, poetry, easy_install, and virtualenv. What do these tools do, how do they differ, and how do they depend on each other? There is a talk by Michał Wodyński which answers these questions (video, slides).

Testing

Writing automated tests that actually improve your code’s stability is a craft – and not an easy one. For good reasons there are specialized job positions for this kind of task, e.g. QA engineers.

If you ever find yourself in the position of writing tests in Python, then you should definitely use pytest instead of Python’s built-in unittest module. Pytest has way more features and has been extended by many third-party plug-ins, making it even more powerful. This introduction article will get you started.

Building the right tests is a difficult task. You have to choose suitable criteria for test case selection, such as statement/branch/(basis) path coverage in case of whitebox testing, or functional/input/output coverage for blackbox testing. You also have to apply further techniques (and mixtures thereof) such as Equivalence Class Testing or Boundary Value Testing. To make this job a bit easier, I found two very interesting techniques:

  1. Mutation testing: a mutation tester modifies the code of the system under test, then runs your test suite. Since the behavior of the tested code was changed, your (unadapted) test suite should no longer pass. If it does still pass, the mutation tester reports this as a problem, because your tests failed to detect the change. There are Python libraries such as mutmut (with links to further details) which do this.
  2. Property-based testing: a generator builds random test inputs (where you provide some guidance, e.g. upper/lower limits on the input length), and you think of generic properties that always hold, irrespective of the concrete values of the generated inputs. For instance, if you implement a container structure (similar to a dict) and add N randomly generated objects to the container, then the property (1 <= len(container) <= N) should always hold. Check out the hypothesis framework, which has excellent documentation and many introductory articles. You can also take a look at this advanced article demonstrating how to generate objects of a more complex type.
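The container property from the second item can be sketched with nothing but the standard library. Note that this is a hand-rolled illustration using the random module, not the API of hypothesis, which additionally shrinks failing inputs to minimal examples. A plain dict stands in for the container under test, and random_key and check_container_size_property are hypothetical helper names.

```python
import random
import string


def random_key(rng: random.Random, max_len: int = 5) -> str:
    """Generate a random lowercase key with 1 to max_len characters."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def check_container_size_property(runs: int = 200, seed: int = 0) -> int:
    """Check 1 <= len(container) <= N for many randomly generated inputs."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _run in range(runs):
        n = rng.randint(1, 50)
        container = {}  # stand-in for the container under test
        for _item in range(n):
            container[random_key(rng)] = rng.random()
        # Duplicate keys may collapse, but the size can never exceed N.
        assert 1 <= len(container) <= n, (len(container), n)
    return runs


checked = check_container_size_property()
print(f"property held for {checked} random inputs")
```

The test never states a concrete expected value. It only asserts a relationship that must hold for every generated input, which is the core idea of the technique.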

Finally, there’s the option to have a tool analyze your code and generate test code from it. Check out Pynguin, which does this for Python. Naturally, you have to treat the result with care, and check each generated test function manually!

Functional programming

Python is typically used as an object-oriented language. However, you can mix it with constructs from functional programming. There are various helper libraries, such as toolz, more-itertools or funcy.
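Even without third-party libraries, the standard library already offers a few functional building blocks. The sketch below uses functools, itertools and operator. The compose helper is written by hand for illustration; libraries such as toolz ship ready-made helpers of this kind.

```python
from functools import partial, reduce
from itertools import accumulate
from operator import add, mul

numbers = [1, 2, 3, 4, 5]

# map/filter: square only the even numbers.
squares_of_even = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, numbers)))

# reduce: fold the list into a single value.
product = reduce(mul, numbers, 1)

# accumulate: like reduce, but yields every intermediate result.
running_totals = list(accumulate(numbers, add))

# partial: fix the first argument of mul to get a "double" function.
double = partial(mul, 2)


def compose(*funcs):
    """Compose functions right to left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


double_then_increment = compose(lambda x: x + 1, double)

print(squares_of_even, product, running_totals, double_then_increment(5))
```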

If your application is processing data streams, consider learning about reactive programming. In particular, ReactiveX is a general paradigm with a set of API definitions that has been implemented in many programming languages, including Python with RxPY.

Plotting / data visualization

Unless you’re completely new to data visualization in Python, you probably have used libraries such as matplotlib (or a higher-level API, like Seaborn), or Pandas. One of the most powerful frameworks I have encountered is Altair, which is a declarative statistical visualization library, based on Vega and Vega-Lite.

Before choosing and studying any particular library, however, it makes sense to become familiar with the different kinds of graphs that exist, and which ones to use, depending on the circumstances. There are some great catalogues such as https://datavizcatalogue.com, https://datavizproject.com or http://chartmaker.visualisingdata.com which provide a good overview.

Writing decorators

Python decorators are a great mechanism to add functionality to functions, methods or classes, without modifying their code. However, understanding them is not an easy task. To get started, I recommend this recent introduction article that both explains the concepts and provides practical examples.
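For a first impression, here is a minimal sketch of a decorator that measures how long each call takes. The names timed and last_duration are made up for this example. functools.wraps makes sure the decorated function keeps its original name and docstring.

```python
import functools
import time


def timed(func):
    """Decorator that records the duration of the most recent call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Store the duration on the wrapper, even if func raised.
            wrapper.last_duration = time.perf_counter() - start

    wrapper.last_duration = None
    return wrapper


@timed
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b


result = add(2, 3)
print(result, add.__name__, add.last_duration)
```

The function add itself contains no timing code at all. The @timed line is equivalent to writing add = timed(add) after the definition.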

Performance via native code

Sometimes Python’s run-time performance is not sufficient. This is especially true for algorithms that process large amounts of data. To get more speed, you can instead write the functionality in a language that compiles to machine code, such as C or C++, and then call it from Python via a wrapper. This is also what some libraries such as numpy do to improve performance.

However, you don’t need to learn a new language like C++ to speed up your own algorithms. Cython is a great tool that lets you write Python-like code (with a few minor modifications) that is translated to C code and can then easily be called from Python. Take a look at this excellent introduction article, or the official documentation. If you’re interested in wrapping existing C++ libraries, this article will be helpful. An alternative library to create C++ wrappers is pybind11.

Configuration management

Configuration management refers to the ability of your application to take in configuration arguments at launch (or even at run-time), rather than using hard-coded values. There is a plethora of available options, such as environment variables, files (e.g. .env, JSON or INI), or databases. Things quickly become complex when you need to implement several options, error handling (e.g. when parsing types), default values, or cascaded configuration options (e.g. look for environment variables first which have the highest priority, then an .env file, then a database). One of the most capable Python packages that support you with these tasks is Dynaconf, which supports many input (file) formats.
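To illustrate what "cascaded" means, here is a small standard-library sketch of such a lookup: environment variables win, then values from an .env file, then a default. It leaves out the database layer and does not use Dynaconf's API. parse_env_file, load_setting and all setting names are hypothetical, and the environment is passed in as a plain dict so the example does not depend on your real environment.

```python
import os


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_setting(name, cast, default, env_file_values, environ=os.environ):
    """Look up name in environ first, then the .env values, then default."""
    for source in (environ, env_file_values):
        if name in source:
            raw = source[name]
            try:
                return cast(raw)
            except ValueError as exc:
                raise ValueError(f"invalid value for {name}: {raw!r}") from exc
    return default


env_file = parse_env_file("# example .env content\nPORT=8000\nDEBUG=0\n")
fake_environ = {"PORT": "9000"}  # stands in for os.environ

port = load_setting("PORT", int, 5000, env_file, fake_environ)
debug = load_setting("DEBUG", lambda v: v == "1", False, env_file, fake_environ)
workers = load_setting("WORKERS", int, 4, env_file, fake_environ)
print(port, debug, workers)
```

PORT comes from the (fake) environment, DEBUG from the .env content, and WORKERS falls back to its default. Even this toy version needs type casting and error handling, which hints at why a dedicated package pays off.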

Interesting blogs and video sites

Since technology evolves constantly, I recommend that you subscribe to text- and video-based sites which continuously post new articles about the hottest frameworks, standard library additions, etc.

Regarding text-based content, such as blogs and newsletters:

  • Python Weekly, a newsletter featuring curated news, articles, new releases of libraries, and events
  • Real Python, a huge collection of tutorials for Python’s standard features and third party libraries
  • Itamar’s Blog, which focuses on Python performance and memory optimization, and best practices for packaging Python applications using Docker containers
  • Chris’ blog, with a focus on data analysis (e.g. using Pandas)
  • Hynek’s Blog, which has been a bit inactive for the past year, but there is a lot of good content!

If you prefer watching videos:

  • PyVideo lists all videos of the latest Python-related conferences
  • calmcode is a collection of video tutorials for various Python libraries

Conclusion

Python has a huge community which keeps building great functionality. Reusing software made by others takes a lot of work off your shoulders. But you need to keep in mind that software reuse is not free. You have to evaluate the available options, which becomes harder the more options there are. Once you have settled on a library, you need to invest time to learn it, and staying up to date with new libraries is no picnic either.

I hope that this article helped you find tools and libraries for specific use cases. My default advice for “further reading” is to look at awesome lists. These are large, categorized lists maintained by the community, for any kind of popular technology, be it a programming language or library. Just search for “awesome <technology name>” on the Internet. For Python, there are multiple ones, such as this one. Good luck!
