Rolling, Concurrent, Repeatable Deploys

Apr 5, 2020

Deploying your services should be rolling, concurrent, and repeatable. Let's break down why these are desirable properties:

Rolling deploys

up is a little tool that helps achieve these goals. It lets you specify shell commands to run sequentially and applies them across a group of servers. Using it requires two things: an Upfile and an inventory.json file. Your Upfile describes the commands to run, and its syntax deliberately resembles a Makefile's.

deploy_myapp
	rsync -chazP myapp myapp.service $remote:myapp
	ssh $remote 'sudo mv myapp.service /etc/systemd/system/'
	ssh $remote 'sudo chmod 440 /etc/systemd/system/myapp.service'
	ssh $remote 'sudo systemctl enable myapp.service'
	ssh $remote 'sudo systemctl daemon-reload'
	ssh $remote 'sudo systemctl restart myapp'
	ssh $remote 'curl -s --max-time 1 http://localhost:80/health'
	sleep 5 && $check_health

remote
	$UP_USER@$server
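
The fixed `sleep 5` before the health check is a guess at how long the service takes to boot. If that guess proves flaky, one option is to poll instead. Here's a minimal sketch of a retry helper in plain shell. It isn't part of up: `retry` is a hypothetical function you'd keep in a script your Upfile calls, and the attempt count and delay are placeholder values. Also note that `curl -s` exits 0 even when the server answers with an HTTP error; adding `-f` makes curl exit non-zero on responses of 400 and above.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, trying at most
# ATTEMPTS times and sleeping DELAY seconds between tries.
# (Hypothetical helper, not something up provides.)
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    # Give up once the final attempt has failed.
    [ "$i" -lt "$attempts" ] || return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Hypothetical usage, with "$host" standing in for one of your servers:
#   retry 5 1 ssh "$host" 'curl -sf --max-time 1 http://localhost:80/health'
```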

Your inventory.json file maps each server's IP address to the services that run on it, like so. Notice that you can host many services on the same box. Think of these as "tags" more so than services, so you can also group boxes with helpful descriptors, like "debian," in case you want to do rolling updates across all of them.

{
	"1.1.1.1": ["myapp", "debian"],
	"1.1.1.2": ["myapp", "debian"]
}
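
Before deploying, it can help to check which boxes a given tag will hit. Here's a minimal sketch, assuming you have jq installed. `hosts_with_tag` is a hypothetical helper for inspecting the file, and it says nothing about how up itself reads the inventory.

```shell
# hosts_with_tag TAG [FILE]: print every IP in FILE (default: inventory.json)
# whose tag list contains TAG, one per line. Assumes jq is installed.
hosts_with_tag() {
  jq -r --arg tag "$1" \
    'to_entries[] | select(any(.value[]; . == $tag)) | .key' \
    "${2:-inventory.json}"
}

# Hypothetical usage:
#   hosts_with_tag debian
```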

We don't want to rebuild our app for each machine, so we'll use a little shell wrapper to build once, deploy to each machine, then clean up after we're done.

#!/usr/bin/env bash
# Exit on errors (-e), disable globbing (-f), and reject unset variables (-u).
set -efu

name=myapp
tmpdir="/tmp/$name"

echo "deploying ${name} at $(date)"
mkdir -p "$tmpdir"

echo "compiling"
GOOS=linux GOARCH=amd64 go build -o "$tmpdir/$name" "./cmd/$name"

echo "deploying"
up -c "deploy_${name}" -t "${name}"

echo "cleaning up"
rm -r "$tmpdir"

echo "completed ${name} deploy at $(date)"

That's it! Just execute our shell script with `./deploy.sh` and it'll compile our service, deploy sequentially to each of our servers, then clean up build artifacts.
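
One caveat with the wrapper: because of `set -e`, a failed build or deploy exits the script before it reaches the `rm -r` line, so the build directory is left behind. If that bothers you, a trap handles it. Here's a minimal sketch; `with_tmpdir` and `build_and_deploy` are hypothetical names, and using `mktemp -d` instead of a fixed /tmp path is my own substitution.

```shell
# with_tmpdir CMD...: create a scratch directory, run CMD with that directory
# appended as its last argument, then remove the directory whether CMD
# succeeded or failed. The parentheses run the body in a subshell, so the
# EXIT trap fires as soon as the function finishes.
with_tmpdir() (
  tmpdir=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmpdir"' EXIT
  "$@" "$tmpdir"
)

# Hypothetical usage in deploy.sh:
#   build_and_deploy() {
#     GOOS=linux GOARCH=amd64 go build -o "$1/myapp" ./cmd/myapp
#     up -c deploy_myapp -t myapp
#   }
#   with_tmpdir build_and_deploy
```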

Concurrent deploys

As you scale, you may have enough identical servers that it makes sense to deploy to several at the same time. up makes this easy as well. We just modify our up line above to add "-n 2", where 2 is the number of boxes to deploy to at once:

up -c deploy_${name} -t ${name} -n 2

Bulk server management

Needing to update dozens of servers at once can be a real pain, but it doesn't need to be. Let's extend the Upfile we wrote above to add update commands for Debian and OpenBSD:

update_debian
	ssh $remote 'sudo apt update'
	ssh $remote 'sudo DEBIAN_FRONTEND=noninteractive apt -y upgrade'

update_openbsd
	ssh $remote 'doas syspatch'
	ssh $remote 'doas pkg_add -u'
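
Running these works like the deploys above: pass the command with -c and the tag with -t. Here's a small sketch of a loop over operating systems. It assumes your inventory tags boxes with "debian" and "openbsd" (only "debian" appears in the inventory above) and that each tag has a matching update_<tag> command. `update_all` is a hypothetical wrapper, and the `UP` variable exists only so you can swap in `echo` for a dry run.

```shell
# update_all TAG...: run the matching update_<TAG> command against every box
# carrying TAG, stopping at the first failure.
# Set UP=echo to print the arguments instead of invoking up.
update_all() {
  local os
  for os in "$@"; do
    "${UP:-up}" -c "update_${os}" -t "$os" || return 1
  done
}

# Hypothetical usage:
#   update_all debian openbsd
```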

Jumpboxes

You may not manage your boxes through ssh directly and instead configure them through a jumpbox or a bastion host. Adding support is simple. Let's modify our Upfile once more to add an ssh variable:

ssh
	ssh -J user@1.1.1.3

deploy_myapp
	$ssh $remote '...'

Dependency graphs

Some services may depend on others to boot. For instance, all of Thankful's servers depend on a log server, and most of them also depend on our reverse proxy, which acts as a kind of internal service directory.

Luckily we already have the perfect tool to manage dependency graphs: make! Let's write a little Makefile that encodes these dependencies, then execute our deploys in parallel:

deploy_logger:
	./deploy/logger.sh
.PHONY: deploy_logger

deploy_myapp: deploy_logger
	./deploy/myapp.sh
.PHONY: deploy_myapp

deploy: deploy_logger deploy_myapp
.PHONY: deploy

Then it's as easy as running `make -j4 deploy`, which brings up all of our services in parallel while respecting their dependency graph.
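
If you want to double-check the ordering before touching any servers, make's -n flag prints the commands it would run without running them. Here's a tiny sketch; `deploy_plan` is just a hypothetical wrapper name.

```shell
# deploy_plan [TARGET]: print the commands make would run for TARGET
# (default: deploy) without executing any of them.
deploy_plan() {
  make -n "${1:-deploy}"
}
```

With the Makefile above, the logger's script should be listed before myapp's.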

So with a 1.2k LOC, dead-simple tool (up), we can deploy updates in seconds, easily besting Heroku and Google App Engine's deployment times with a tool that anyone can fully understand and debug.

Neat!

Parts:

  1. Modern Infra and DevOps
  2. Concurrent, Rolling, and Repeatable Deploys
  3. Self-healing infrastructure and service mesh with TLS
  4. Infrastructure stored in git, managed on the command line
  5. Binpacking to minimize costs
  6. Better secret management
  7. Log aggregation, broadcasting, and observability
  8. Safe and repeatable database migrations at scale

This will be updated as more parts are released.