AWS CodeBuild boilerplate for cached Docker image builds

Posted on Feb 15, 2021

How to cache docker builds on AWS CodeBuild?

I recently moved a lot of my spin-off projects to AWS. One of the components I love is AWS CodeBuild, a true godsend for the lazy :) CodeBuild is a cost-effective way to manage your builds, especially if you build smaller images infrequently. You do not have to spin up a new EC2 host and wait for provisioning; you just hit the Build button (or automate the pipeline) and the magic happens. Since Amazon gives you 100 free build minutes a month, chances are you will stay in the free tier forever, and you can also automate the push to your private ECR repository.

My expectations of a good build pipeline are modest:

  • Build fast so it stays cheap, which means I want the Docker cache. This matters to me because I work mostly with R, and downloading and installing packages is a royal pain (even binary ones)
  • Make a nice pipeline that can be copy-pasted into multiple projects, because I am lazy

Caching Docker layers is a bit tricky. CodeBuild directly supports 2 cache modes:

  • S3 - AWS explicitly states this is not recommended for Docker (also, you will be accumulating S3 GET requests like crazy)
  • local - this actually means the build host. CodeBuild will ‘try’ to cache your stuff on the host, and if a subsequent build happens to run on the same host, it will use that cache. I never managed to hit this cache unless I ran builds within the same minute, and my time between builds is measured in days or weeks, not minutes.

Officially, there are no other solutions, but there is a way to use Docker itself: first pull the image you are about to rebuild, then let the build reuse its layers as the cache. This works well for me because:

  • My builds and my ECR repository happen to be in the same region, so pulling from my own repository is actually cheaper than hitting rstudio :)
  • It is damn fast
  • Since my base R image stays the same and I very rarely change it, I am guaranteed to hit the cache for everything but the last few steps, which copy the program and set up my Docker entrypoint.

Sweet :)
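If you want to try the trick outside CodeBuild first, here is a minimal shell sketch of the same idea. The function name cached_build and the DOCKER override are my own illustration, not part of CodeBuild or Docker; setting DOCKER=echo gives a dry run that only prints the commands. It assumes the classic (non-BuildKit) builder, where a pulled image can be used directly with --cache-from.

```shell
# Sketch of the pull-then-build trick.
# DOCKER is a hypothetical override (for example DOCKER=echo) that allows
# a dry run on a machine without a Docker daemon.
DOCKER="${DOCKER:-docker}"

# cached_build IMAGE [CONTEXT]
# Pulls IMAGE if it already exists in the registry, then rebuilds it
# using the pulled layers as the cache source.
cached_build() {
  image="$1"
  context="${2:-.}"
  # A missing image (for example on the first build) must not abort the run.
  $DOCKER pull "$image" || true
  $DOCKER build --cache-from "$image" --tag "$image" "$context"
}
```

Something like cached_build "$REPO_URL/$IMAGE_NAME:$IMAGE_TAG" would then do the pull and the cached rebuild in one go, after you have logged in to the registry.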

The complete build spec leveraging this looks like this:

version: 0.2
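# You need to have these environment variables set on the build project
# REPO_URL = link to private ECR (registry host, without the image name)
# IMAGE_NAME = name of image to build
# IMAGE_TAG = tag to build by default (latest)
# REGION = AWS region of the ECR registry
#          (CodeBuild's built-in AWS_DEFAULT_REGION holds the build's own region)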
phases:
  pre_build:
    commands:
      - REPO=$REPO_URL/$IMAGE_NAME
      - REPO_TAG=$REPO:$IMAGE_TAG
      - REPO_GIT=$REPO:$(git log -1 --format=%h)
      - echo Building $REPO
      - echo Region set to $REGION
      - echo Logging in to Amazon ECR...
      - aws --version
      - docker --version
      - aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPO_URL
      - docker pull $REPO_TAG || true
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build --cache-from $REPO_TAG --tag $REPO_TAG --tag $REPO_GIT .
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $REPO_TAG
      - docker push $REPO_GIT

Note: Use docker pull ... || true so the build continues if there is no image present yet, for example on the very first build.

Note 2: As of now, docker on CodeBuild is version 19.x and does not support docker push -a, so I have to push both tags separately.
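For when the Docker version on the build host moves on, here is a hedged sketch of how a script could pick between the two push styles. It relies on the --all-tags (-a) flag of docker push having arrived with Docker 20.10. The helper names supports_push_all_tags and push_images, and the DOCKER override for dry runs, are hypothetical and only here for illustration.

```shell
# Hypothetical helper: succeeds when a Docker CLI version string
# (for example "20.10.7") is 20.10 or newer.
supports_push_all_tags() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  [ "$major" -gt 20 ] || { [ "$major" -eq 20 ] && [ "$minor" -ge 10 ]; }
}

# push_images CLI_VERSION REPO TAG...
# DOCKER is a hypothetical override (for example DOCKER=echo) for dry runs.
push_images() {
  version="$1"
  repo="$2"
  shift 2
  if supports_push_all_tags "$version"; then
    # Pushes every local tag of the repository in one command.
    ${DOCKER:-docker} push --all-tags "$repo"
  else
    # Older CLI: push each tag separately, as the buildspec does.
    for tag in "$@"; do
      ${DOCKER:-docker} push "$repo:$tag"
    done
  fi
}
```

A call could look like push_images "$(docker version --format '{{.Client.Version}}')" "$REPO" "$IMAGE_TAG" "$(git log -1 --format=%h)". Keep in mind that --all-tags pushes every local tag of that repository, not only the two built here.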