When learning Docker, an interesting question you may run into is: what happens to commands that install software when an image is rebuilt?
Let's look at a scenario. You create a simple Dockerfile like this:
FROM python:2.7-slim
# Update the system
RUN apt-get update -y && apt-get upgrade -y
# add some source files
ADD . /app
# expose a port
EXPOSE 80
# Run app.py when the container launches
CMD ["python", "app.py"]
(This is not a complete Dockerfile, just the snippets we are interested in.)
If we build an image based on this file, we get something like this:
Step 1/5 : FROM python:2.7-slim
...
Step 2/5 : RUN apt-get update -y && apt-get upgrade -y
...
Step 3/5 : ADD . /app
...
Step 4/5 : EXPOSE 80
...
Step 5/5 : CMD ["python", "app.py"]
...
Successfully built 3609163bdce5
(the output is truncated to save space).
The important one is step 2, where we update the system. We can see that this step downloads and installs new versions of the libraries already present on the system.
This can pose a problem, because we may rely on a specific version of a library. In other words, the next time we build the image (imagine we updated the codebase and need to create a new image to deploy), the apt-get update -y && apt-get upgrade -y step will run again and may ruin the image (our application may not handle the new library versions).
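One way to guard against this kind of breakage is to pin the exact package versions you depend on, so an upgrade cannot silently swap them out. A minimal sketch (the package name and version string below are hypothetical placeholders, not from the setup above):

```dockerfile
FROM python:2.7-slim
# Pin an exact version so a rebuild cannot silently upgrade it.
# "libssl1.0.0" and its version are placeholder examples -- run
# `apt-cache madison <package>` to see the versions available to you.
RUN apt-get update -y && \
    apt-get install -y libssl1.0.0=1.0.1t-1+deb8u7
```

The trade-off is that pinned versions must be bumped by hand, but a build will now fail loudly instead of drifting to whatever version the mirror currently serves.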
UnionFS to the rescue
One thing to keep in mind is that images are built, stored, and transferred using a union file system that works in layers. If a layer is already present, it will not be "compiled" again. This means that if we add a new file to our directory with echo "new dummy file" >> dummy, our current directory will look like this:
~ $ ls
Dockerfile dummy
When running a new build:
~ $ docker build .
Sending build context to Docker daemon 3.072kB
Step 1/5 : FROM python:2.7-slim
---> b0259cf63993
Step 2/5 : RUN apt-get update -y && apt-get upgrade -y
---> Using cache
---> 00414512ae60
Step 3/5 : ADD . /app
---> 7c8266369b33
Step 4/5 : EXPOSE 80
---> Running in 6bc4c5a16d2f
Removing intermediate container 6bc4c5a16d2f
---> ae8b02e34ced
Step 5/5 : CMD ["python", "app.py"]
---> Running in a78e73b6e1b9
Removing intermediate container a78e73b6e1b9
---> 96dd2f890cbf
Successfully built 96dd2f890cbf
we can see that step 2 was skipped and the cached version of that layer (the one that runs RUN apt-get update -y && apt-get upgrade -y) was used.
This is an important distinction to make: if we build this image on the same machine (environment) where it was previously built, the RUN command will not be executed. This helps us keep the same OS-level libraries and binaries across builds, but it opens up a new problem: security.
One reason why you may want to update the libraries is to make sure you have the latest security patches.
The build command helpfully allows us to disable the cache with the --no-cache flag. Running the build this way, we can see that the updates are executed:
docker build . --no-cache
Sending build context to Docker daemon 3.072kB
Step 1/5 : FROM python:2.7-slim
---> b0259cf63993
Step 2/5 : RUN apt-get update -y && apt-get upgrade -y
---> Running in cce3fb472a18
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
Get:2 http://security.debian.org jessie/updates/main amd64 Packages [588 kB]
...
...
---> 1dc0f9d842b1
Step 3/5 : ADD . /app
---> 597a83647250
Step 4/5 : EXPOSE 80
---> Running in 4e0da110e6f4
Removing intermediate container 4e0da110e6f4
---> 8159270ed233
Step 5/5 : CMD ["python", "app.py"]
---> Running in 8c3928f25d7a
Removing intermediate container 8c3928f25d7a
---> 4eb0161a51ab
Successfully built 4eb0161a51ab
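If you only want to invalidate the update layer on demand, while keeping the rest of the cache, a common alternative to --no-cache is the build-arg trick. A sketch (the CACHEBUST name is just a convention, not a Docker built-in):

```dockerfile
FROM python:2.7-slim
# Changing the value of this ARG invalidates the cache for every
# instruction that follows it, forcing the update/upgrade to re-run.
ARG CACHEBUST=1
RUN apt-get update -y && apt-get upgrade -y
ADD . /app
EXPOSE 80
CMD ["python", "app.py"]
```

Building with docker build --build-arg CACHEBUST=$(date +%s) . forces the update on that run, while a plain docker build . still reuses the cached layers.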
Another problem with updating images is the amount of unnecessary updates it brings in; a Docker image should be light and contain only the required dependencies. Thankfully, most containers are not, and should not be, exposed directly to the internet, which minimizes the attack vectors a bad actor can use.
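To keep the image light, it is common to remove apt's downloaded package lists in the same RUN instruction that created them. A sketch of the usual pattern:

```dockerfile
FROM python:2.7-slim
# Update, upgrade, and delete the package lists in one layer.
# Removing them in a *later* RUN would not shrink the image, because
# the files would still exist in this earlier layer.
RUN apt-get update -y && \
    apt-get upgrade -y && \
    rm -rf /var/lib/apt/lists/*
```

This works because each RUN produces its own layer; files deleted in a subsequent layer are hidden but still shipped with the image.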
Caching
Another problem with apt-get in a container is caching. You should not write a Dockerfile that looks like this:
RUN apt-get update -y
RUN apt-get upgrade -y
The first command produces its own cached layer that the second command does not share; on a later rebuild, apt-get upgrade may run against stale package lists from that cache. This is a good reason why you should chain them using shell && in a single RUN instruction.
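The chained form keeps both commands in a single layer, so they are always cached (or re-run) together:

```dockerfile
# Both commands share one layer: the cache can never pair a stale
# `update` with a fresh `upgrade`.
RUN apt-get update -y && apt-get upgrade -y
```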