Serverless code pipelines on AWS: Terraform

Alternative title: “How I built Jenkins with a few lines of Terraform and some Christmas Cake”.

Almost a year ago I wrote up my comparison of Wasabi and AWS S3, and hinted at the end that I was going to use Wasabi as part of a personal system for archiving photos. Well, 2020 got fully up to speed around then, and my plans were thrown into disarray.

[Image: Ferris Bueller’s Day Off]

Like many people working in technology, my work got far busier in 2020, and I looked enviously at anyone who suddenly had time to learn to crochet, bake croissants, or play guitar. I finally got some time around Christmas to return to the Terraform project (which is almost finished), and am quite pleased with the result.

Increasingly I’m a big fan of X-as-a-Service as a paradigm, when married to a security agreement like the AWS Shared Responsibility Model: by removing a server that I need to directly manage, a whole universe of security and maintenance overhead just goes away and I can focus on the solution, rather than on the constant gardening.

Since I knew that I would be using an AWS Lambda function — or functions — to provide the computation in my solution, I knew that I needed a code build pipeline. Well, I don’t really need a code build pipeline. I could just build the lambda on my desktop and move it around by hand, but this is 2021, and there’s no reason not to always try to do things the right way instead of the half-arsed way. So it was clear to me from the outset that I would want to wire together several AWS services for my solution:

  • CodeCommit
  • CodeBuild
  • CodePipeline

Anyone who has worked in IT for the last decade should be familiar with the Continuous Integration (CI) paradigm: commit and push the code, a build kicks off somewhere, and if all goes well a deployable artefact pops out the other side. The default choice is still Jenkins/Hudson (despite the existence of much better software such as CircleCI, Bamboo and GoCD), but I really do not want to commit to building a server that will see light use, and yet require constant maintenance and tuning. Life’s too short, and server maintenance should be someone else’s problem.

“The Somebody Else’s Problem field is much simpler [to invisibility] and more effective, and what’s more can be run for over a hundred years on a single torch battery. This is because it relies on people’s natural disposition not to see anything they don’t want to, weren’t expecting, or can’t explain.” (Douglas Adams: Life, The Universe and Everything)

So, for me the choice was obvious — CodeCommit as a code repository, CodeBuild to do the build, and CodePipeline to coordinate the process.

The documentation for all of these services — both from AWS and for the corresponding Terraform provider bits and pieces — is excellent and complete… but I quickly discovered that while the documentation for each individual service was outstanding, there are subtle gaps and pitfalls when wiring them together to a full solution.

As an aside, while all of this can be done via the AWS console, I have a niggling annoyance with how the console presents CodePipeline and other complex services. As you will see below, there are some subtleties around permissions and logging for these services that are not well explained. If you work through the console, very often IAM and CloudWatch resources are just magically (and silently) created on your behalf, rather than being resources that are directly under your control.

I’m going to use this article to walk through the Terraform code that I needed to get this working, and then follow up in a future article on how the Lambda solution does its work. This will be pretty code-heavy from here on down, but unfortunately I won’t link you directly to a GitHub project for it. The solution is all part of a larger project that maintains a bunch of related personal services on my AWS account, and I’m loath to publish that for privacy and security reasons.

Let’s start with CodeCommit. Since its introduction in 2015 as a simple drop-in replacement for your existing Git server, it’s been polished into a very nice service offering all the expected features for managing merge requests and performing code reviews, and for editing code directly in the repository if desired. What it does not have is direct support for building the code on some event (such as a check-in). That’s completely reasonable, because that’s handled by CodeBuild, by design.

From the point of view of checking code in and out, and generally working with it via git, you and your developers are in familiar territory — it’s just another Git repository:

% git remote -v
origin ssh:// (fetch)
origin ssh:// (push)

The big difference is that in order to use a web interface to do reviews and handle merge requests, you and your developers will need to work through the AWS console (actually, you could rebuild all of the facilities using the AWS API, but that’s just crazy). If that’s a problem for you, well, to be entirely blunt you need to ask why your security posture is opting to make life harder for your developers. It’s 2021, the cloud is safe, just bite the bullet and create the IAM entities for your devs and give them console access.

Now, you can use HTTPS and authenticate on each git pull and push, but like I said, it’s 2021, so you should choose to do the right thing, and use SSH instead with SSH public/private keys for proof of identity. Here’s where the first pitfall arises: how does the developer get their public SSH key registered? Simple answer: via the IAM console, they choose IAM, then Users, then pick their own account, then finally use the Security credentials tab. On there they will see SSH keys for AWS CodeCommit, and can upload a public key:

[Screenshot: IAM Security Credentials]

Simple, but what’s not immediately obvious is that you need to grant them permission to be able to do this at all. The base level permission needed to be able to see anything at all in IAM is read permission, but then there are additional permissions needed. I manage this by using some pre-baked policies that are attached to a “readonly” group, and then assign users into and out of that group as required. In Terraform:

resource "aws_iam_group" "readonly" {
  name = "readonly"
  path = "/"
}

resource "aws_iam_group_policy_attachment" "readonly" {
  for_each   = toset([...])
  group      =
  policy_arn = each.value
}

That’s enough to allow members of the group to go into the IAM console, and update their own SSH key (yes, the pre-baked policy ensures they can only update their own details, but you are always free to create your own policies based on those provided).
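A minimal sketch of how such an attachment could be filled in, assuming the AWS managed policies IAMReadOnlyAccess and IAMUserSSHKeys are suitable (an illustrative choice; substitute whichever policies fit your account). The resource labels and group name are hypothetical, chosen so they do not clash with the listing above:

```hcl
# Illustrative only: the choice of managed policies is an assumption.
locals {
  readonly_policy_arns = [
    "arn:aws:iam::aws:policy/IAMReadOnlyAccess", # browse IAM in the console
    "arn:aws:iam::aws:policy/IAMUserSSHKeys",    # manage your own SSH public keys
  ]
}

resource "aws_iam_group" "readonly_example" {
  name = "readonly-example"
  path = "/"
}

# One attachment per policy ARN; each.value is the ARN itself.
resource "aws_iam_group_policy_attachment" "readonly_example" {
  for_each   = toset(local.readonly_policy_arns)
  group      = aws_iam_group.readonly_example.name
  policy_arn = each.value
}
```

Using for_each over a set means adding or removing a policy later only touches that one attachment, rather than reshuffling a counted list.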

Ok, I can get my public key in, and I’m ready to work against a repository. Time for the repository. I use a project name throughout to keep the names and tags on various resources consistent, but otherwise running up the repository is very simple:

locals {
  lambda_name = "photo-lambda"
}

resource "aws_codecommit_repository" "photo_lambda" {
  repository_name = local.lambda_name
  description     = "Source for lambda for photo archive"
  tags            = merge({ "Name" = local.lambda_name }, var.tags)
}
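If it helps when setting up the git remote, the repository resource exposes its clone URLs as attributes, so they can be surfaced as outputs. A small sketch (the output names are hypothetical):

```hcl
# Expose the clone URLs so they can be read back with `terraform output`.
output "photo_lambda_clone_url_ssh" {
  description = "SSH clone URL to use as the git remote"
  value       = aws_codecommit_repository.photo_lambda.clone_url_ssh
}

output "photo_lambda_clone_url_http" {
  description = "HTTPS clone URL, for tooling that cannot use SSH"
  value       = aws_codecommit_repository.photo_lambda.clone_url_http
}
```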

Let’s move on to building the code from that archive, rather than building it on my desktop. First I am going to need an S3 bucket to store build artefacts in. Building buckets is easy, so you should feel comfortable having lots of buckets, rather than re-using buckets. It makes it much easier to provide fine-grained access for different purposes, and even for your own toy projects it’s a good idea to be rigorous about access control:

resource "aws_s3_bucket" "build" {
  bucket_prefix = "rahookbuild"
  acl           = "private"

  versioning {
    enabled = true
  }

  lifecycle {
    prevent_destroy = true
  }

  tags = merge({ "Name" = "rahook-build" }, var.tags)
}

resource "aws_s3_bucket_public_access_block" "build" {
  bucket =

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

This gives us a locked-down bucket that’s only accessible from inside the AWS account, and only accessible by users who have been explicitly granted permission (I won’t cover user access to that bucket, that’s an exercise for the student).

You might also notice that I have a default set of tags (var.tags) that I extend with just the Name tag. It makes it easier for me to make sure that everything related to a particular project has consistent base tags for cost monitoring, among other things.
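For anyone following along, the tagging approach could be sketched like this. The tag keys and values are placeholders, and the local name is hypothetical:

```hcl
# Hypothetical base tags; the keys and values are placeholders.
variable "tags" {
  description = "Base tags shared by every resource in the project"
  type        = map(string)
  default = {
    Project = "photo-archive"
    Owner   = "me"
  }
}

locals {
  # Later arguments to merge() win on key clashes, so a "Name" key inside
  # var.tags would override the one given here.
  build_bucket_tags = merge({ "Name" = "rahook-build" }, var.tags)
}
```

If you would rather the per-resource Name always win, swap the argument order so that var.tags comes first.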

Next step is to define a build. There are two parts to this: the build specification that is defined in the source code, and configuration of the CodeBuild project. Coffee break before going on.

Refreshed? Good, let’s go.

There’s a lot of configuration in the CodeBuild project, most of which is obvious when you look at it in small pieces, but there are some prerequisites. First, an IAM role that is adopted to allow the build to run. Note that this role needs a role policy that allows CodeBuild to assume the role…

# policy allowing CodeBuild to adopt the role
data "aws_iam_policy_document" "codebuild_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = [""]
    }
  }
}

# Role that CodeBuild will adopt
resource "aws_iam_role" "codebuild" {
  name                  = "${local.lambda_name}-codebuild"
  description           = "role allowing codebuild access to build ${local.lambda_name}"
  assume_role_policy    = data.aws_iam_policy_document.codebuild_assume.json
  force_detach_policies = true
  tags                  = merge({ "Name" = local.lambda_name }, var.tags)
}

# Policy controlling the permissions the role has
data "aws_iam_policy_document" "codebuild" {
  statement {
    sid       = "createlogs"
    actions   = ["logs:CreateLogGroup", ...]
    resources = [...]
  }

  statement {
    sid       = "s3"
    resources = ["arn:aws:s3:::codepipeline-${var.aws_region}-*"]
    actions   = ["s3:PutObject", ...]
  }

  statement {
    sid       = "gitpull"
    resources = [aws_codecommit_repository.photo_lambda.arn]
    actions   = ["codecommit:GitPull"]
  }

  statement {
    sid       = "push"
    resources = [...]
    actions   = ["s3:PutObject", ...]
  }

  statement {
    sid       = "read"
    resources = [...]
    actions   = ["s3:GetObject", ...]
  }
}

# Attaching the policy to the role
resource "aws_iam_role_policy" "codebuild" {
  name   = "${local.lambda_name}-codebuild"
  role   =
  policy = data.aws_iam_policy_document.codebuild.json
}

Whew, that’s a lot to digest. We need to give the CodeBuild job permission to:

  • write logs in CloudWatch;
  • use S3 for its internal purposes;
  • pull the code from CodeCommit;
  • push the built artefact into our build bucket;
  • read the artefact from our build bucket.
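To make that list concrete, here is one way the permissions could be spelled out in full. This is a sketch under assumptions, not the policy used in this project: the statement contents are a guess at a reasonable minimum, the data source label is hypothetical, and it leaves out the statement for CodePipeline's own internal bucket. It refers to the repository and build bucket defined earlier and to the log group defined in the next listing.

```hcl
# Sketch only: statement contents are assumptions, scoped to the resources
# defined elsewhere in this article.
data "aws_iam_policy_document" "codebuild_example" {
  # Write build logs into the pre-created log group.
  statement {
    sid = "logs"
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = [
      aws_cloudwatch_log_group.lambda.arn,
      "${aws_cloudwatch_log_group.lambda.arn}:*", # log streams in the group
    ]
  }

  # Pull the source from the CodeCommit repository.
  statement {
    sid       = "gitpull"
    actions   = ["codecommit:GitPull"]
    resources = [aws_codecommit_repository.photo_lambda.arn]
  }

  # Push and read artefacts in the build bucket (object-level only).
  statement {
    sid = "artefacts"
    actions = [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:PutObject",
    ]
    resources = ["${aws_s3_bucket.build.arn}/*"]
  }
}
```

Scoping each statement to a specific ARN rather than a wildcard is what keeps the role usable only for this one build.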

Ah, yes. CloudWatch logs — better create those so we can specify the retention period. Three months of logs is probably overkill, but:

resource "aws_cloudwatch_log_group" "lambda" {
  name              = "/aws/codebuild/lambda"
  retention_in_days = 90

  tags = merge({ "Name" = "lambda-codebuild" }, var.tags)
}

That should be all the prerequisites taken care of, now for the CodeBuild job. It looks confronting, but it makes sense — we have:

  • some general configuration;
  • a definition of where the code is coming from;
  • a definition of where we put the generated artefact;
  • the type of ephemeral instance to build on;
  • instructions on where to put our logs.

resource "aws_codebuild_project" "photo_lambda" {
  name         = local.lambda_name
  description  = "project to build the ${local.lambda_name} lambda functions"
  service_role = aws_iam_role.codebuild.arn

  build_timeout  = 15
  badge_enabled  = true
  source_version = "refs/heads/master"

  source {
    git_clone_depth     = 1
    insecure_ssl        = false
    location            =
    report_build_status = false
    type                = "CODECOMMIT"

    git_submodules_config {
      fetch_submodules = false
    }
  }

  artifacts {
    encryption_disabled    = false
    location               =
    name                   = local.lambda_name
    namespace_type         = "NONE"
    override_artifact_name = true
    packaging              = "ZIP"
    path                   = local.lambda_name
    type                   = "S3"
  }

  environment {
    compute_type                = "BUILD_GENERAL1_SMALL"
    image                       =
    image_pull_credentials_type = "CODEBUILD"
    privileged_mode             = false
    type                        = "LINUX_CONTAINER"
  }

  logs_config {
    cloudwatch_logs {
      status     = "ENABLED"
      group_name =
    }

    s3_logs {
      encryption_disabled = false
      status              = "DISABLED"
    }
  }

  tags = merge({ "Name" = local.lambda_name }, var.tags)
}

As I said earlier, there are two parts to the build definition. You can see that the CodeBuild resources defined in our Terraform script describe the build environment, but not how to actually build the artefact. That relies on there being a buildspec.yml file in the top level of our source code. This is a YAML file with a rich DSL; refer to the AWS documentation for the (considerable) detail on your options here, but it’s likely you’re going to need a simple definition like mine:

version: 0.2

phases:
  install:
    runtime-versions:
      golang: latest
  build:
    commands:
      - echo Build started
      - go test
      - go build
      - echo Build completed
artifacts:
  files:
    - photo-lambda
  name: photo-lambda-$(date +%Y%m%d.%H%M%s).zip

You can see what the instructions are: install the latest Go, test the code, build the code, and finally bundle up the compiled code as a ZIP file with a timestamp.

This combination of the CodeCommit repository, CodeBuild definition, buildspec.yml and S3 bucket is enough for you to be able to log in to the AWS console, go to the CodeBuild project, and hit the big orange “Start Build” button to initiate a build and see the artefact eventually pop out in the S3 bucket.

Hooray for being able to do a serverless build, but there’s still a bit missing: we want the build to initiate automatically when we push code. This is where CodePipeline steps in, although it’s capable of a lot more than simply starting a build. So let’s start in on that. Like almost any service on AWS, we first have to permit CodePipeline to operate on our behalf:

# policy to allow CodePipeline to adopt a role
data "aws_iam_policy_document" "pipeline_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = [""]
    }
  }
}

# the role for CodePipeline to adopt
resource "aws_iam_role" "pipeline" {
  name                  = "${local.lambda_name}-pipeline"
  description           = "role allowing pipeline access to relevant resources"
  assume_role_policy    = data.aws_iam_policy_document.pipeline_assume.json
  force_detach_policies = true
  tags                  = merge({ "Name" = local.lambda_name }, var.tags)
}

# The permissions CodePipeline has to act on our behalf
resource "aws_iam_role_policy" "pipeline" {
  name   = "${local.lambda_name}-pipeline"
  role   =
  policy = data.aws_iam_policy_document.pipeline.json
}

The policy attached to this is, for me, currently pretty extensive and gives wider permissions than I’m comfortable with. There’s an inbuilt policy that AWS recommends, but it’s even wider in its scope, so this is a partially trimmed down one that I want to finesse:

data "aws_iam_policy_document" "pipeline" {
  statement {
    actions   = ["iam:PassRole"]
    resources = ["*"]

    condition {
      test     = "StringEqualsIfExists"
      variable = "iam:PassedToService"
      values   = [...]
    }
  }

  statement {
    actions   = [...]
    resources = ["*"]
  }

  statement {
    actions   = [...]
    resources = ["*"]
  }

  statement {
    actions   = ["codestar-connections:UseConnection"]
    resources = ["*"]
  }

  statement {
    actions   = [...]
    resources = ["*"]
  }

  statement {
    actions   = ["lambda:InvokeFunction", "lambda:ListFunctions"]
    resources = ["*"]
  }

  statement {
    actions   = ["opsworks:CreateDeployment", ...]
    resources = ["*"]
  }

  statement {
    actions   = ["cloudformation:CreateStack", ...]
    resources = ["*"]
  }

  statement {
    actions   = ["codebuild:BatchGetBuilds", ...]
    resources = ["*"]
  }

  statement {
    actions   = ["devicefarm:ListProjects", ...]
    resources = ["*"]
  }

  statement {
    actions   = ["servicecatalog:ListProvisioningArtifacts", ...]
    resources = ["*"]
  }

  statement {
    actions   = ["cloudformation:ValidateTemplate"]
    resources = ["*"]
  }

  statement {
    actions   = ["ecr:DescribeImages"]
    resources = ["*"]
  }

  statement {
    actions   = ["states:DescribeExecution", ...]
    resources = ["*"]
  }

  statement {
    actions   = ["appconfig:StartDeployment", ...]
    resources = ["*"]
  }
}
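For comparison, a policy narrowed to just the resources in this article might look something like the following. Treat it as an untested sketch rather than a drop-in replacement: the action lists are an assumption about what a CodeCommit source, CodeBuild build and S3 deploy need, it assumes the build bucket doubles as the pipeline's artifact store, and the data source label is hypothetical. A real pipeline may well need more.

```hcl
# Sketch only, untested against a real pipeline: a pipeline policy narrowed
# to the three resources this article creates.
data "aws_iam_policy_document" "pipeline_narrow" {
  # Source stage: read the branch and hand the source over to the pipeline.
  statement {
    sid = "source"
    actions = [
      "codecommit:GetBranch",
      "codecommit:GetCommit",
      "codecommit:GetRepository",
      "codecommit:UploadArchive",
      "codecommit:GetUploadArchiveStatus",
      "codecommit:CancelUploadArchive",
    ]
    resources = [aws_codecommit_repository.photo_lambda.arn]
  }

  # Build stage: start the CodeBuild project and poll its status.
  statement {
    sid = "build"
    actions = [
      "codebuild:StartBuild",
      "codebuild:BatchGetBuilds",
    ]
    resources = [aws_codebuild_project.photo_lambda.arn]
  }

  # Artifact store and deploy stage: object-level access to the bucket.
  statement {
    sid = "artefacts"
    actions = [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:PutObject",
    ]
    resources = ["${aws_s3_bucket.build.arn}/*"]
  }

  # Bucket-level read needed alongside the object-level permissions.
  statement {
    sid       = "bucket"
    actions   = ["s3:GetBucketVersioning"]
    resources = [aws_s3_bucket.build.arn]
  }
}
```

The idea is simply one statement per stage, each pinned to a single ARN, so that anything the pipeline does not touch is not granted.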

At some stage I will go back and rip out actions that are not needed for my pipeline, and be more specific about resources.

Right. The pipeline itself. Again, there’s a fair bit of configuration, but it’s simpler if you look at it as consisting of some general setup stuff, followed by a set of steps (or stages) run one after another. You might notice that the steps are chained together by specifying the input and output artifacts, rather than by specifying a run order. As I said, CodePipeline can be configured to support very complex and sophisticated build operations, including running stages in parallel.

resource "aws_codepipeline" "photo_lambda" {
  name     = local.lambda_name
  role_arn = aws_iam_role.pipeline.arn
  tags     = merge({ "Name" = local.lambda_name }, var.tags)

  artifact_store {
    location =
    type     = "S3"
  }

  stage {
    name = "Source"

    action {
      category = "Source"
      owner    = "AWS"
      name     = "Source"
      provider = "CodeCommit"
      version  = "1"
      configuration = {
        "BranchName"           = "master"
        "OutputArtifactFormat" = "CODEBUILD_CLONE_REF"
        "PollForSourceChanges" = "false"
        "RepositoryName"       =
      }
      input_artifacts  = []
      output_artifacts = ["SourceArtifact"]
      namespace        = "SourceVariables"
      region           = var.aws_region
      run_order        = 1
    }
  }

  stage {
    name = "Build"

    action {
      category = "Build"
      owner    = "AWS"
      name     = "Build"
      provider = "CodeBuild"
      version  = "1"
      configuration = {
        "ProjectName" =
      }
      input_artifacts  = ["SourceArtifact"]
      output_artifacts = ["BuildArtifact"]
      namespace        = "BuildVariables"
      region           = var.aws_region
      run_order        = 1
    }
  }

  stage {
    name = "Deploy"

    action {
      category = "Deploy"
      owner    = "AWS"
      name     = "Deploy_to_S3"
      provider = "S3"
      version  = "1"
      configuration = {
        "BucketName" =
        "Extract"    = "false"
        "ObjectKey"  =
      }
      input_artifacts  = ["BuildArtifact", ]
      output_artifacts = []
      region           = var.aws_region
      run_order        = 1
    }
  }
}

Hang on, doesn’t this just do the same as the CodeBuild project? Get the source, build it, and shove it into the bucket? You’re right, and well spotted. This is one of the things not entirely made clear in the documentation: CodePipeline makes use of things like the runtime environment configuration from the CodeBuild project, but otherwise does things its own way, somewhat disconnected from what CodeBuild is defined to do. In particular, it makes use of the S3 bucket to temporarily store code and artifacts before moving them around.

You can use CodeBuild without CodePipeline, but it’s a pain to use CodePipeline without CodeBuild.

You might also have spotted that there’s a PollForSourceChanges property in the get-the-source configuration. This is mainly intended for cases where your source is not held in CodeCommit, and is not the recommended way of kicking off a build on commit. Instead, we want to use a CloudWatch Event, so let’s do that.

First, an event rule that watches for updates in the master branch (and yes, I do intend to rename that to main):

resource "aws_cloudwatch_event_rule" "lambda" {
  name        = "${local.lambda_name}-build"
  description = "Rule to start pipeline on commit into ${local.lambda_name} CodeCommit"
  is_enabled  = true

  event_pattern = jsonencode(
    {
      detail = {
        event         = ["referenceCreated", "referenceUpdated"]
        referenceName = ["master"]
        referenceType = ["branch"]
      }
      detail-type = ["CodeCommit Repository State Change"]
      resources   = [aws_codecommit_repository.photo_lambda.arn]
      source      = ["aws.codecommit"]
    }
  )

  tags = merge({ "Name" = local.lambda_name }, var.tags)
}

And an instruction to send the notification to CodePipeline:

resource "aws_cloudwatch_event_target" "lambda" {
  target_id = local.lambda_name
  rule      =
  arn       = aws_codepipeline.photo_lambda.arn
  role_arn  = aws_iam_role.event.arn
}

with the inevitable permissions we need to grant to CloudWatch to be able to perform that action on our behalf (this, by the way, is the least well covered pitfall in the documentation!):

# policy that allows CloudWatch to adopt the role
data "aws_iam_policy_document" "event_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = [""]
    }
  }
}

# Role for CloudWatch to adopt
resource "aws_iam_role" "event" {
  name                  = "${local.lambda_name}-pipeline-trigger"
  description           = "this allows EventBridge to trigger the pipeline when it notices a commit"
  assume_role_policy    =
  force_detach_policies = true
  tags                  = merge({ "Name" = "${local.lambda_name}-pipeline-trigger" }, var.tags)
}

# Permissions for CloudWatch to start the pipeline on our behalf
data "aws_iam_policy_document" "event" {
  statement {
    sid       = "start"
    resources = [aws_codepipeline.photo_lambda.arn]
    actions   = ["codepipeline:StartPipelineExecution"]
  }
}

# And attach those permissions to the adopted role
resource "aws_iam_role_policy" "event" {
  name   = "${local.lambda_name}-pipeline-trigger"
  role   =
  policy = data.aws_iam_policy_document.event.json
}

This is a common pattern in non-trivial AWS event-driven services — create a role and attach a policy that grants the service limited permissions to act on our behalf. It’s also one of the main things that is silently and obscurely done on your behalf if you set up these things via the AWS console. One of the benefits of using infrastructure as code, such as via Terraform, is that you gain much greater visibility of all the subtle configuration bits and pieces.
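The trust half of that pattern is identical for all three services used here and differs only in the service principal. One way to sketch it, assuming the standard principals codebuild.amazonaws.com, codepipeline.amazonaws.com and events.amazonaws.com (the local and data source names are hypothetical):

```hcl
locals {
  # Service principals for the three services wired together in this article.
  service_principals = {
    codebuild = "codebuild.amazonaws.com"
    pipeline  = "codepipeline.amazonaws.com"
    event     = "events.amazonaws.com"
  }
}

# One trust policy document per service, keyed by the map above.
data "aws_iam_policy_document" "assume" {
  for_each = local.service_principals

  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = [each.value]
    }
  }
}

# A role then picks its trust policy by key, for example:
#   assume_role_policy = data.aws_iam_policy_document.assume["event"].json
```

The permission policies still need to be written out per service, since that is where the roles genuinely differ.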

So there you have it. A lot of Terraform code with a bunch of subtlety that achieves one thing for me: I can make changes to my Lambda code, commit it to master, and then magically a minute or so later, there’s a new deployable waiting for me in my S3 bucket.

All without any server being managed, no extra software installed or configured, and a vanishingly small cost involved — I would have to be doing hundreds of builds a week before the cost of all this even added up to a cup of coffee.

Yes, the TCO is theoretically more than that, as it took quite a few hours to initially create this. However, iterating on it to add more pipelines (which I will probably do via a Terraform module) is going to take very little time, and I get a solid, enterprise-ready solution for my own simple projects. That’s worth my time.

And the cake was not a lie.
