Alpine Linux has become quite popular as a Docker base image because of its small size. However, instead of the common glibc it uses musl libc, which can affect you depending on what software you run on it.
Here’s a list of reasons to use a mainstream distribution for your Python applications instead of Alpine:
- Many libraries publish no Alpine-compatible (musl) wheels, so you have to build them from source (e.g. cryptography, pandas, numpy). Even if you can make it work, the behavior may differ from what the library authors intend, test, and experience.
- Image size has no bearing on your compute bill here. Anecdotally, Alpine packages are usually slower (they require more compute, not less).
- It is highly unlikely that the difference in storage cost between one image and the other exceeds the cost of a couple of hours of engineering work.
- Musl has a different malloc, a different libm, and a different pthreads implementation. All of these will affect performance in some way (maybe better, maybe worse), and at some point they will expose bugs in libraries that were coded too specifically against glibc. Combined with the extra development time, this risk doesn’t seem worthwhile, especially in data science, where many libraries are needed, library quality varies wildly, and some libraries aren’t even portable beyond one specific version of one specific Linux distribution.
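Whether pip can skip a source build comes down to the platform tag at the end of a wheel's filename: `manylinux` wheels target glibc, while `musllinux` wheels (where a project publishes them) target musl. The following is a minimal sketch of that distinction, not pip's actual resolution logic; the function name `libc_family` and the example filenames are illustrative assumptions.

```python
def libc_family(wheel_filename: str) -> str:
    """Classify a wheel by the libc its platform tag targets."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_filename}")
    # Wheel names end in -{python tag}-{abi tag}-{platform tag}.whl
    platform_tag = wheel_filename[: -len(".whl")].rsplit("-", 1)[-1]
    # One wheel may carry several platform tags separated by dots.
    tags = platform_tag.split(".")
    if any(tag.startswith("musllinux") for tag in tags):
        return "musl"
    if any(tag.startswith("manylinux") for tag in tags):
        return "glibc"
    if platform_tag == "any":
        return "pure-python"
    return "other"


# Illustrative filenames (assumed for the example, not taken from an index listing).
examples = [
    "pandas-2.2.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "cryptography-42.0.5-cp39-abi3-musllinux_1_2_x86_64.whl",
    "requests-2.31.0-py3-none-any.whl",
]
for name in examples:
    print(f"{libc_family(name):12} {name}")
```

On Alpine, only the "musl" and "pure-python" kinds are installable as-is; anything available solely as a "glibc" wheel falls back to a source build.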
Read “Using Alpine can make Python Docker builds 50× slower” for an in-depth explanation.
Let’s look at a real-world example:
    $ docker run --rm -it python:alpine
    apk add alpine-sdk libffi-dev
    time pip install pandas cryptography
    ...
    real 16m13.104s
    user 16m9.311s
    sys 0m7.847s
and then repeat with python:slim:
    $ docker run --rm -it --entrypoint=bash python:slim
    time pip install pandas cryptography
    ...
    real 0m12.107s
    user 0m6.410s
    sys 0m0.947s
This was on a 3990X (64C/128T, 256 GB RAM). Is your CI this fast? And which image is bigger now? The Alpine one is now much bigger!
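To put a number on the gap, the two `real` timings reported above can be converted to seconds and divided. This is a small sketch; the helper name `parse_time` is an assumption, and it only handles the `XmY.YYYs` format that the shell's `time` printed here.

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` duration such as '16m13.104s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times from the two runs above.
alpine = parse_time("16m13.104s")
slim = parse_time("0m12.107s")
slowdown = alpine / slim

print(f"alpine: {alpine:.1f}s, slim: {slim:.1f}s, slowdown: {slowdown:.0f}x")
```

For these two runs the Alpine install took roughly 80 times as long as the slim one.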
See, you are much better off using a Debian-based distro like -slim: it’s more or less the same size and far more compatible. It saves build time, debug time, and hassle.
- python:alpine is 107MB
- python:slim is 193MB
That 86MB difference becomes a negligible share of the image once you install your other packages. But you will almost certainly spend hours debugging why it’s behaving differently, yielding negative cost savings unless you are running thousands of instances or need to spin up in the absolute lowest time in some highly dynamic serverless environment.
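The cost argument can be made concrete with a back-of-the-envelope calculation. In the sketch below only the 86MB gap comes from the sizes listed above; the number of stored copies, the storage price, the debugging hours, and the hourly rate are made-up illustrative values, so substitute your own.

```python
# python:slim minus python:alpine, from the sizes listed above.
BASE_GAP_MB = 193 - 107


def months_to_break_even(copies: int, price_per_gb_month: float,
                         debug_hours: float, hourly_rate: float) -> float:
    """Months of storage savings needed to repay the extra debugging time."""
    # Treat 1 GB as 1000 MB for simplicity.
    monthly_saving = BASE_GAP_MB / 1000 * copies * price_per_gb_month
    debugging_cost = debug_hours * hourly_rate
    return debugging_cost / monthly_saving


# All four inputs are hypothetical assumptions, not measurements.
months = months_to_break_even(
    copies=50,                # stored image copies
    price_per_gb_month=0.10,  # storage price per GB per month
    debug_hours=2,            # "a couple of hours" of engineering work
    hourly_rate=75.0,         # cost of one engineering hour
)
print(f"{months:.0f} months to break even")
```

With these assumed inputs the storage saving takes about 349 months, or roughly 29 years, to pay for two hours of debugging.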
Still not convinced? There is no complete list of security flaws (CVEs) maintained for Alpine, so tools like Clair (SAST) don’t scan it properly. The closest is the alpine-secdb, which describes itself like this:
“The purpose of this database is to make it possible to know what packages has backported fixes. It is not a complete database of all security issues in Alpine, and it should be used in combination with another more complete CVE database.”
Unless you want massively slower build times, larger images, more work, and the potential for obscure bugs, you’ll want to avoid Alpine Linux as a base image.