Yesterday, we learnt about Docker multi-stage builds and how awesome they are. Today, we’re pushing them even further by combining them with the ONBUILD instruction. Fasten your seat belts!
The copy-pasted Dockerfile of hell
At the moment we have more than 50 Node.js microservices at Busbud, and each of them has a Dockerfile. Except for a couple of microservices, this Dockerfile is basically the same: it installs dependencies (including system build tools if we have native dependencies) and adds our app.
We managed to do that in both a time-efficient and space-efficient manner by using multi-stage builds. But we know that having many Dockerfiles that are basically identical is a pain to maintain. We’ve experienced this pain first hand over the last two years’ worth of changes.
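To give an idea of what we were duplicating, each microservice carried a multi-stage Dockerfile along these lines (a simplified sketch, not our exact file):
# simplified per-microservice Dockerfile, pre-refactoring
FROM node:x.x.x-alpine AS builder
WORKDIR /app/
# build tools for native dependencies
RUN apk add --no-cache --virtual .build-deps python make g++
COPY ./package.json ./yarn.lock /app/
RUN yarn --production
# runtime stage
FROM node:x.x.x-alpine
WORKDIR /app/
COPY --from=builder /app/ /app/
COPY . /app/
CMD ["node", "."]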
In the beginning we were editing the Dockerfiles manually, but when we had more than 20 it started to be a real pain.
I started bufdo-ing Vim macros to ease that for a bit, but by the time we had 40 or more Dockerfiles I was editing them all at once through sed -i scripts and checking the git diff to make sure the change applied properly across all the Dockerfiles, which can have slight variations (some have build dependencies, others don’t, some have additional dependencies, some require extra build steps, etc.).
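Those bulk edits looked something like this (the sed expression and glob are illustrative, not a command we actually ran):
# change the base image tag in every Dockerfile at once (GNU sed),
# then review the diff before committing
sed -i 's/^FROM node:old-tag/FROM node:new-tag/' */Dockerfile
git diff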
This was too much work, and since I’m lazy, I decided to refactor as much as I could out of the Dockerfiles, so that we have a single “top-level” Dockerfile to edit when we want to change the way we build our microservices.
Using ONBUILD instructions
That’s where ONBUILD saves us: we can make a base image that refactors out not only the common parts (installing a specific version of npm or Yarn, installing build dependencies), but also the instructions that are specific to each build, like adding files to the image and any RUN instruction that depends on those files being added:
# node:x.x.x-alpine-onbuild
FROM node:x.x.x-alpine
WORKDIR /app/
# build dependencies for native modules, grouped so they can be removed later
RUN apk add --no-cache --virtual .build-deps python make g++
RUN rm /usr/local/bin/yarn && npm install -g yarn
# these instructions only run when a downstream image is built FROM this one
ONBUILD COPY ./package.json ./yarn.lock /app/
ONBUILD RUN yarn --production
ONBUILD COPY . /app/
ONBUILD RUN apk del .build-deps
CMD ["node", "."]
Then we can extend that base image in all our Dockerfiles, and have the
FROM
be the only line needed:
FROM node:x.x.x-alpine-onbuild
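The base image has to exist locally (or in a registry) before the microservices that extend it are built; with illustrative paths and tags, that looks like:
# build and tag the ONBUILD base image once...
docker build -t node:x.x.x-alpine-onbuild ./base-images/node-onbuild
# ...then each microservice build runs the ONBUILD triggers automatically
docker build -t my-service ./my-service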
But we also saw how multi-stage builds allowed us to separate the build steps from the runtime image to push the smallest image possible to the registry. What if we could have both?
Multi-stage + ONBUILD = ❤️
I’m not sure if it’s a bug, a feature, or undefined behavior, but it turns out that you can reference a build stage from the ONBUILD instructions of a base image. Sounds confusing? It’ll be clearer with an example.
Let’s start by making a base image only for building our app:
# node:x.x.x-builder
FROM node:x.x.x-alpine
WORKDIR /app/
RUN apk add --no-cache --virtual .build-deps python make g++
RUN rm /usr/local/bin/yarn && npm install -g yarn
ONBUILD COPY ./package.json ./yarn.lock /app/
ONBUILD RUN yarn --production
ONBUILD RUN apk del .build-deps
So far so good, nothing fancy. We can extend it and add a second stage to make a small production image:
FROM node:x.x.x-builder AS builder
FROM alpine:x.x
WORKDIR /app/
COPY --from=builder /usr/local/bin/node /usr/local/bin/
COPY --from=builder /usr/lib/ /usr/lib/
COPY --from=builder /app/ /app/
COPY . /app/
CMD ["node", "."]
That’s great, but we can go deeper. Let’s extract the second stage, the one that defines the runtime image, into a base image too, turning its instructions into ONBUILD triggers:
# node:x.x.x-runtime
FROM alpine:x.x
WORKDIR /app/
# copy the node binary, its shared libraries and the installed node_modules
# from the downstream build's stage named builder, then add the app's files
ONBUILD COPY --from=builder /usr/local/bin/node /usr/local/bin/
ONBUILD COPY --from=builder /usr/lib/ /usr/lib/
ONBUILD COPY --from=builder /app/ /app/
ONBUILD COPY . /app/
CMD ["node", "."]
And modify our Dockerfile to use it:
FROM node:x.x.x-builder AS builder
FROM node:x.x.x-runtime
There’s no way this would work, right?
$ docker build -t multi-stage-onbuild-test .
Sending build context to Docker daemon 1.295MB
Step 1/2 : FROM node:6-builder as builder
# Executing 3 build triggers...
Step 1/1 : COPY ./package.json ./yarn.lock /app/
Step 1/1 : RUN yarn --production
---> Running in 2af6b63fe907
yarn install v0.24.4
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 1.32s.
Step 1/1 : RUN apk del .build-deps
---> Running in d4d220dec219
(1/26) Purging .build-deps (0)
...
OK: 6 MiB in 13 packages
---> ab798b07ef77
Removing intermediate container fcb4e828e943
Removing intermediate container 2af6b63fe907
Removing intermediate container d4d220dec219
Step 2/2 : FROM node:6-runtime
# Executing 4 build triggers...
Step 1/1 : COPY --from=builder /usr/local/bin/node /usr/local/bin/
Step 1/1 : COPY --from=builder /usr/lib/ /usr/lib/
Step 1/1 : COPY --from=builder /app/ /app/
Step 1/1 : COPY . /app/
---> ba7bd1900b34
Removing intermediate container b6dd4d376ef6
Removing intermediate container 9292baa3d600
Removing intermediate container 2e7364e2c000
Removing intermediate container 37b69f6ddd38
Successfully built ba7bd1900b34
Successfully tagged multi-stage-onbuild-test:latest
As surprising as it may be, this image builds and does exactly what we want!
Now all our Node.js microservices’ Dockerfiles are those exact two lines, with additional steps in the builder and runtime stages when needed. The common part is now factored away from the microservice images; it sits cleanly in the base builder and runtime images!
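For instance, a service that needs an extra runtime package still only takes a few lines. This is an illustrative sketch (imagemagick stands in for whatever the service actually needs); note that a base image’s ONBUILD triggers run right after its FROM line, before any instructions added here:
FROM node:x.x.x-builder AS builder

FROM node:x.x.x-runtime
# service-specific addition, executed after the runtime base image's ONBUILD triggers
RUN apk add --no-cache imagemagick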
Can we go even deeper?
Combining ONBUILD and two base images using multi-stage builds allowed us to have trivial Dockerfiles and keep common logic in only one place, but it requires maintaining two Dockerfiles, the same way the builder pattern did until multi-stage builds were introduced. We also need two lines in the main Dockerfile, and we have to give the build stage the name expected by the runtime image.
This is definitely a hack, and it could be made cleaner by giving a base image a way to define multiple stages in the context of the downstream build, the same way ONBUILD does for commands. That would allow the community to make “buildpack” images that could build and package applications for production in a generic way, while keeping all the benefits of multi-stage builds.
We could even get rid of Dockerfiles entirely, when the buildpack already supports everything we need. Imagine the following:
docker build --buildpack=node:6 -t myapp .
The buildpack would know how to add your package.json, figure out if it needs to run npm install or yarn, and add everything to a small runtime image that doesn’t include any build dependencies.
Turns out this is nearly possible… you can use the --file option to point to a custom Dockerfile path (which can be the two-line one we just made):
docker build -f /path/to/buildpacks/node/6 -t myapp .
The downside is that the file can’t be remote and can’t be managed like base images. It also doesn’t offer a way to customize the build out of the box, so it’s maybe not that great a solution.
Even if it looks hacky, our current solution of having two FROM instructions, one for the builder and one for the runtime, allows us to customize the build process and the runtime image without any special syntax, and those two base images can be pulled from the registry like any other image.
You’re now all caught up on how we use Docker at Busbud to ship fast and build out the world’s largest bus supply. If you’re interested in these challenges and more, be sure to reach out; we’re hiring!
Photo by Thomas Kelley.