Litiverse

Part 1: Lambda via Docker - Deploying an AWS Lambda function as a container image

Until very recently, I tried to stay as far away from AWS Lambda as possible, especially when the function required anything other than built-ins. This was because of the cumbersome need to upload a compressed bundle of the function code together with all of its dependencies. With the newly available option to package and deploy Lambda functions as container images, I've become a convert. Why? Because running a memory-hungry NLP pipeline (up to 8GB, depending on the size of the documents and the models used) on a set of documents can cost as little as a few cents.

Assuming there are other people who might want to do the same, I'm writing a two-part post:

  1. on how to deploy an AWS Lambda function using a Python-based Docker image; and
  2. more specifically, how to deploy AllenNLP as a Docker image-based Lambda function.

This post is the more general one in the series, about deploying a container-image-based Lambda function. Click here for the more specific post about running AllenNLP on Lambda.

Steps

  1. Setting up the Lambda function repo/directory
  2. Building the Docker image
  3. Creating the Lambda function
  4. Automatically updating the function

1. Setting up the Lambda function repo/directory

The bare minimum you'll need from your Python code is an importable function that accepts the Lambda event and context arguments. The file structure I use, to allow for testing and for serving multiple Lambda functions from the same image, is as follows:

your_repo/
    ...
    your_module_name/
        __init__.py
        main.py  # contains the handler functions
        tests/
            __init__.py
            test_file.py
    .dockerignore
    Dockerfile
    handlers.py  # imports handler functions
    terraform_setup/
        ...  # terraform files

The lambda_handler function in main.py is pretty simple, and looks like this:

# your_module_name/main.py

from typing import Any, Dict

def lambda_handler(event: Dict, context: Any):
    # `event` carries the invocation payload; `context` is the
    # LambdaContext object supplied by the Lambda runtime.
    ...

The handlers.py file is really just a way to reference the handler function from the Lambda task root folder: when the image is built, the repo's files are copied into the folder from which Lambda functions are run (the LAMBDA_TASK_ROOT, which resolves to /var/task).

# handlers.py
from your_module_name.main import lambda_handler

You'll see that when I configure the Lambda function within AWS, I refer to the handler function as handlers.lambda_handler, which points to the handlers.py file that imports the handler function lambda_handler.
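
Because the file structure includes a tests/ directory, you can unit-test the handler directly, without any AWS machinery. A minimal sketch (the event payload here is hypothetical; its shape depends on whatever triggers your function):

# your_module_name/tests/test_file.py

from your_module_name.main import lambda_handler

def test_lambda_handler_runs():
    # Hypothetical payload; real events depend on the trigger
    # (API Gateway, S3, direct invoke, etc.)
    event = {"foo": "bar"}
    # Stub the context with None for handlers that don't use it
    lambda_handler(event, None)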

2. Building the Docker image

Most commonly, the Lambda function configuration will refer to a Docker image stored in an Elastic Container Registry (ECR) repository. So as part of the deploy process, you'll need to build a Docker image, upload it to ECR, and update the Lambda function (assuming you've already created it).

The Dockerfile can be very simple.

# Dockerfile

FROM public.ecr.aws/lambda/python:3.8

COPY . .

# If your repo only serves a single Lambda function,
# then you can uncomment the next line, and omit the
# `image_config` value in the Terraform/Lambda function
# config.
# I leave the `CMD` out because different functions point
# to different handlers.
# CMD ["handlers.lambda_handler"]

One wrinkle I ran into when first building Docker images for use in Lambda functions is that the root of the function will be /var/task, because that's where the Lambda executor looks for the function files (unless you modify the ENTRYPOINT value). In my example, the handlers.py file must end up in /var/task, as must the module from which handlers.py imports the Lambda handler function.
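
Relatedly, if your function needs third-party packages (the whole reason to use container images in the first place), you'd install them in the image as well. A minimal sketch, assuming a requirements.txt at the repo root:

# Dockerfile (with third-party dependencies)

FROM public.ecr.aws/lambda/python:3.8

# Copy and install requirements first so Docker caches this layer
# across builds that only change application code.
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .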

You'll then need to build the image and upload it to your ECR repo. There are many ways of doing this - I use the AWS CLI and Docker CLI after creating the repo (either through Terraform or the AWS console):

aws configure
aws ecr get-login-password --region YOUR_REGION | \
    docker login --username AWS --password-stdin YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com
docker build -t ECR-REPO-URI:IMAGE-TAG .
docker push ECR-REPO-URI:IMAGE-TAG

Your image should now be in ECR and ready to be referenced in an AWS Lambda function.
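
Before creating the function, you can also exercise the image locally: the AWS Lambda base images bundle the Runtime Interface Emulator, so something like the following should work (using the handler path from Step 1):

# Run the image locally, overriding CMD with the handler path
docker run -p 9000:8080 ECR-REPO-URI:IMAGE-TAG handlers.lambda_handler

# In another terminal, invoke the emulated function with a test payload
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'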

3. Creating the Lambda function

Because creating the Lambda function via Terraform is relatively simple, I'll only highlight the Docker image part, which is slightly more involved. I've omitted a lot of the other resources because there's nothing peculiar to the Image package type - you can see the other required resources here. The important parts of the aws_lambda_function resource config are the image_uri, package_type and image_config fields:

# /terraform_setup/lambda.tf

resource "aws_lambda_function" "main" {
  image_uri     = "${aws_ecr_repository.main_repo.repository_url}:latest"  # repo and tag
  package_type  = "Image"
  function_name = "your-function-name"
  role          = aws_iam_role.lambdas_iam_role.arn

  image_config {
    # use this to point to different handlers within
    # the same image, or omit `image_config` entirely
    # if only serving a single Lambda function
    command = ["handlers.lambda_handler"]
  }

  depends_on = [
    aws_cloudwatch_log_group.log-group,
    aws_iam_role.lambdas_iam_role,
    aws_iam_role_policy_attachment.attach_execution_policy
  ]

  environment {
    variables = {
      FOO = "BAR
    }
  }
}
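
For reference, the aws_ecr_repository resource that image_uri points at can be as minimal as this (a sketch; adjust the name to your setup):

# /terraform_setup/ecr.tf

resource "aws_ecr_repository" "main_repo" {
  name = "your-repo-name"
}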

Your function is now ready to be used.
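
To sanity-check the deployed function, you can invoke it from the CLI (a sketch; the payload is hypothetical, and the --cli-binary-format flag is needed to pass raw JSON with AWS CLI v2):

aws lambda invoke \
    --function-name your-function-name \
    --cli-binary-format raw-in-base64-out \
    --payload '{"foo": "bar"}' \
    response.json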

4. Automatically updating the function

Updating the function is done by building a new Docker image, uploading the new version to ECR and updating the Lambda function to point to the newly uploaded image.
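
For a one-off manual update, the final step is a single CLI call (a sketch, assuming the new image has already been pushed as in Step 2):

aws lambda update-function-code \
    --function-name your-function-name \
    --image-uri ECR-REPO-URI:IMAGE-TAG \
    --publish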

All of this can be done in the AWS Console or with the AWS and Docker CLI commands described in Step 2 above. Alternatively, you can set up a continuous delivery (CD) pipeline using GitHub Actions. That way, any time changes are committed to the main branch of your repo, the GitHub Actions runner will handle the deployment automatically.

To do this, you'll need to enable GitHub Actions builds within your repo and commit a workflow file in the .github/workflows directory of your repo (eg, .github/workflows/deploy.yml). The following file deploys on pushes to the main branch (eg, after a PR is merged):

name: Deploy
on:
  push:
    branches:
      - main
jobs:
  build_and_push:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    outputs:
      image_tag: ${{ steps.build_image_step.outputs.image_tag }}

    steps:
      - name: Check out repo
        uses: actions/checkout@v2
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_DEFAULT_REGION }}
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build, tag, and push image to Amazon ECR
        id: build_image_step
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: your-repo-name
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REGISTRY/$ECR_REPOSITORY:latest .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
          echo "::set-output name=image_tag::$IMAGE_TAG"
  deploy:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    needs: build_and_push

    steps:
      - uses: actions/checkout@v2
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_DEFAULT_REGION }}
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1
      - name: Update Lambda and Publish
        run: |
          aws lambda update-function-code --function-name your-func-name --image-uri ${{ steps.login-ecr.outputs.registry }}/your-repo-name:${{ needs.build_and_push.outputs.image_tag }} --publish

The secrets.* references point to encrypted secrets stored in my GitHub repository settings, which helps me avoid hard-coding those values in the repo.

If all goes well, GitHub Actions will build and upload a Docker image on pushes to your specified branch(es) and update the Lambda function to point to that image.

Hopefully this worked for you. If I've missed anything, let me know.

Kelvin