Docker python image with psycopg2 installed

Will Barillon
7 min read · Apr 18, 2022

Situation: you’re a Django web developer about to create an application with a PostgreSQL backend. You’ve learnt all the pros of containerizing your app with Docker and are looking to do so. However, installing psycopg2 is not working, and the first workarounds from the top results on the internet just say “install psycopg2-binary”. You know they are wrong, yet you still can’t find an easy way to get a decent Dockerfile that installs psycopg2 on your Python image.

If your situation is close to the one above, you can skip to the end of this article, where the Dockerfiles are waiting for you. You can also read the whole article and learn how to write those Dockerfiles step by step.

Enough with the introduction and let’s get started !

Set-up the environment

First of all, you need to install Docker. For Windows and Mac users, I recommend installing Docker Desktop: it makes managing your containers easier and it also installs docker-compose, another useful tool that complements Docker.

Installation for Windows : https://docs.docker.com/desktop/windows/install/

Installation for Mac : https://docs.docker.com/desktop/mac/install/

Once it is installed, you need to set up your environment like so:

  • A new folder in which we’re going to create our image.
  • Two files in the new folder: a requirements.txt file containing psycopg2==<insert last-version> and an empty Dockerfile.
  • A terminal opened with your new folder as the working directory.
  • Docker Desktop started (you folks on Linux should know the workaround, come on, you know your environment better than me ;) ).
End result of a working environment set-up
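To illustrate, the set-up could look like this (a sketch; the folder name is just an example):

```
psycopg2-docker/
├── Dockerfile          # empty for now
└── requirements.txt    # contains: psycopg2==<insert last-version>
```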

We’re now ready to write our Dockerfile !

Naive approach : use the official python image

Let’s first install the python image and see what happens.

Disclaimer: do not pay attention to the build time. Even though I live in France, my internet connection is painfully slow.
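The naive Dockerfile might look like this (a sketch; the Python tag and file names are assumptions):

```dockerfile
# Official full Python image as the base
FROM python:3.10

# Copy the dependency list and install it
COPY requirements.txt .
RUN pip install -r requirements.txt
```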

docker build is the command that creates an image from a Dockerfile, followed by the -t flag in order to add a tag name, and the tag name itself. Finally, we want our image to be built from the location of the Dockerfile, so we add a dot at the end.

docker build -t <tagname> .

As we can see on the screenshot, everything went fine. It wasn’t that hard after all ! Let’s check the size of our image.

Almost 1 GB for two installed dependencies and Python? That’s a lot. As a comparison, in a local environment this wouldn’t weigh more than 250 MB at most.

My guess as to why the official Python image is so big is that a lot of packages, many of them unnecessary for our case, have been installed. There should be a way to install a lighter version of Python and thus reduce the size of our Docker image.

Lighten the python image with python-slim

Python slim is another Python image, but without every package included. Let’s hope that even this slim version has everything needed to install our dependencies.
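Swapping the base image is a one-line change (again a sketch; the tag is an assumption):

```dockerfile
# Slim variant: same Python, far fewer preinstalled packages
FROM python:3.10-slim

COPY requirements.txt .
# This step will fail: the slim image ships neither a C compiler
# nor the PostgreSQL client headers that psycopg2 needs
RUN pip install -r requirements.txt
```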

It appears not. Building the slim version of Python generates an error. As we feared, packages we need in order to install psycopg2 are missing.

We need to install them “manually”. Fortunately, even if a Python image suggests we only get Python, that is not exactly the case. We actually have an extra-light OS derived from Debian working in the background. Thus we can use Linux commands to install the packages we need: libpq-dev, the very light and minimal package used to interact with a PostgreSQL database, and gcc, a C compiler used to build psycopg2 from source.
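Concretely, the Dockerfile could become (a sketch, assuming the same tag and file layout as before):

```dockerfile
FROM python:3.10-slim

# Install the build dependencies psycopg2 needs:
# gcc to compile the C extension, libpq-dev for the PostgreSQL headers
RUN apt-get update && apt-get install -y libpq-dev gcc

COPY requirements.txt .
RUN pip install -r requirements.txt
```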

We have our image with a successful installation! Let’s check its size.

We’re getting closer to a reasonable size. But we can do better thanks to a multi-stage build.

Leave the unnecessary behind and keep only the essential

gcc is a package whose sole purpose here is to install psycopg2. Once that’s done, it is merely a burden to our Docker image.

A multi-stage build allows us to create an image from another image. In this paradigm, we no longer use the word “image” but “stage” instead: multiple stages are built in sequence to produce the final image.

The plan is to create a first stage, the builder, that installs every dependency we need. Then we copy those dependencies into the next stage, which is used to run our Django application. The question is: how are we supposed to locate and copy only the files of our dependencies?

A python virtual environment !

Let’s rewrite our Dockerfile to create a virtual environment, activate it, and install the dependencies listed in requirements.txt. Then we’re going to copy the virtual environment into the next Docker stage, leaving gcc behind and sparing us its share of the final Docker image’s size.
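The multi-stage Dockerfile could be sketched like this (the tag and the /opt/venv path are assumptions; putting the venv first on the PATH is what “activates” it inside the image):

```dockerfile
# ---- builder stage: compile the dependencies in a virtual environment ----
FROM python:3.10-slim AS builder

RUN apt-get update && apt-get install -y libpq-dev gcc

# Create the venv and "activate" it by prepending it to the PATH
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install -r requirements.txt

# ---- final stage: gcc stays behind in the builder ----
FROM python:3.10-slim

# libpq is still needed at runtime to talk to PostgreSQL
RUN apt-get update && apt-get install -y libpq-dev

# Copy only the virtual environment from the builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```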

Not bad at all, I must say! Maybe we can do better by improving a few details. The first, and only one that comes to my mind, is that during apt-get update our Linux OS generates a list of every package that can be updated. It’s not much, but we’re starting to get a small image, so every MB spared is welcome.

To do so, we’re going to add the following command after the installation of our packages: rm -rf /var/lib/apt/lists/*. Remember to do this only in our last stage, since the previous one goes into oblivion once the Docker image is built; there is no need to worry about its size.
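In the final stage, the cleanup chains onto the same RUN instruction so the package lists never land in a layer (a sketch):

```dockerfile
# Install the runtime dependency and remove the apt package lists
# in the same layer, so they never add to the image size
RUN apt-get update && apt-get install -y libpq-dev \
    && rm -rf /var/lib/apt/lists/*
```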

Now, how many MB did this little maneuver spare us?

18 MB, good enough. I think we’ve reached a reasonable size for our Python image. Some people would also add the --no-install-recommends flag when installing packages. I don’t do that for security reasons. It would be a shame if our image missed security or performance updates only to spare a few MB, wouldn’t it?

But Alpine is lighter!

Alpine is a different Linux distribution. Python-slim is based on a Debian distribution and python-alpine is based on an Alpine distribution. It is indeed lighter than the Debian one, but there are some fundamental differences that have historically made Python developers very reluctant to use an Alpine image.

Even if Alpine is better for most projects, when we’re dealing with a Python or C project we need to be careful. I’m not an expert on the topic, but the general idea is that Alpine uses musl libc instead of the glibc that Debian uses, so prebuilt Python wheels don’t apply and C extensions like psycopg2 must be compiled from source. For example, you’ll notice a noticeable difference in build speed for the same image, depending on whether it has an Alpine or a Debian base.

As far as I’m concerned, I’m afraid that even nowadays, in 2022, the compilation differences might lead to very dark and weird bugs, so I won’t use an Alpine image in a professional setting. But hey, let’s give it a try and write our Dockerfile with a python-alpine base.
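An Alpine equivalent could be sketched like this (the tag is an assumption; note that Alpine uses apk instead of apt-get, and the package names differ):

```dockerfile
FROM python:3.10-alpine

# Alpine equivalents of the Debian build dependencies:
# postgresql-dev for the libpq headers, gcc and musl-dev to compile
RUN apk add --no-cache postgresql-dev gcc musl-dev

COPY requirements.txt .
RUN pip install -r requirements.txt
```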

Indeed, Alpine is keeping its promise of being lighter. But the different syntax and its different behavior toward C code might be significant cons if you don’t know what’s going on with C code, like me.

Final Dockerfiles

Our images are fine but not useful yet. We need to add a few things. PYTHONDONTWRITEBYTECODE and PYTHONUNBUFFERED are almost mandatory when working with Python; you’ll avoid some weird bugs by setting them in your environment. Finally, as I wrote this Dockerfile for myself with the intention of developing my Django application within a Docker container, I need to copy all the content of my directory into the container with a COPY clause.

You may also want to reduce the number of layers by stacking as many commands as possible into a single one. It works for RUN clauses but also for ENV or ARG clauses.
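Putting everything together, the final Debian-based Dockerfile could look like this (a sketch; the tag, the /opt/venv path and the /app working directory are assumptions):

```dockerfile
# ---- builder stage ----
FROM python:3.10-slim AS builder

RUN apt-get update && apt-get install -y libpq-dev gcc

RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install -r requirements.txt

# ---- final stage ----
FROM python:3.10-slim

# Don't write .pyc files, don't buffer stdout/stderr,
# and keep the venv on the PATH -- one ENV clause, one layer
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PATH="/opt/venv/bin:$PATH"

# Runtime dependency only, with the apt lists cleaned in the same layer
RUN apt-get update && apt-get install -y libpq-dev \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /opt/venv /opt/venv

# Copy the Django project into the container
WORKDIR /app
COPY . .
```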

Thank you for your time, and stay tuned to read my next article about developing your Django application inside your Docker container.


Will Barillon

A Python developer / data scientist who wants to provide useful information to everyone.