Litiverse

Part 2: Lambda via Docker - Running AllenNLP on AWS Lambda

Before it was possible to deploy an AWS Lambda function larger than 250MB in package size or needing more than 3GB of memory, I ran memory-hungry NLP pipelines on EC2 instances, scaling them up and down with the number of documents to process and paying too much to keep servers, even one, on standby. Now that Lambda supports container images of up to 10GB in size and up to 10GB of memory, I can process a set of documents for as little as a few cents and not worry about EC2 instances sitting idle.

Assuming there are other people who might want to do the same, I've written a two-part post:

  1. on how to deploy an AWS Lambda function using a Docker image; and
  2. more specifically, how to deploy allennlp as a Docker image-based Lambda function.

This post is the more specific one in the series about running AllenNLP within a Lambda function. Read here for the more general post about using a Docker image to package and deploy a Lambda function.

Problems running AllenNLP in an AWS Lambda function

AllenNLP needs a few things to run, and some of these things cause issues in a Lambda function:

  1. Not insignificant amounts of memory, depending on your model
  2. Ability to write to the Lambda filesystem when saving/extracting pretrained models
  3. Not insignificant amounts of storage to save and extract retrieved model files

Solutions to these problems are below.

1. Insufficient memory allocation

By default, AWS Lambda runs with a maximum memory allocation of 128MB, which isn't going to be enough for your AllenNLP model. You'll know this is the case if you see your function being killed prematurely with the log message Error: Runtime exited with error: signal: killed Runtime.ExitError (assuming you've enabled logging). That message, combined with an end-of-run report showing that all available memory was used (eg, Memory Size: 2048 MB Max Memory Used: 2048 MB), is a pretty good sign you need to allocate more memory (or use less of it).

Because Lambda is serverless, increasing the memory allocation is as simple as changing the configuration in the AWS Console or wherever else you configure the function. In Terraform, for example, increase the memory_size value on the aws_lambda_function resource and apply the change.
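A minimal sketch of that change (the 4096MB figure is purely illustrative - size it to your model):

    resource "aws_lambda_function" "func" {
      package_type  = "Image"
      function_name = "your-func-name"
      ...

      # Memory in MB. The 128MB default is far too small for most AllenNLP
      # models; 4096 here is an illustrative value, not a recommendation.
      memory_size = 4096
    }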

2. Inability to write to the read-only filesystem

AWS prevents functions from writing to any location other than /tmp during function invocation. That means if any of your code attempts to write to /usr/tmp or some other folder, it will raise an OSError, most likely: IOError: [Errno 30] Read-only file system. As AllenNLP will attempt to save and then extract model files that aren't already on the filesystem (eg, those referenced by HTTP or S3 URIs), it's very likely you'll see this error.

You have two options here. The first is to ensure any models are saved and extracted within the Lambda package itself. Assuming the Lambda is container-image based, you could ensure models are extracted and available during the Docker build stage of deployment. This avoids storing/extracting models during function invocation, along with the time and cost of the data transfer.

Your other option is to ensure that models saved and extracted during invocation go to the /tmp folder. Lambda function containers are re-used, and directory content (including /tmp content) survives between re-uses. Despite this, Lambda function containers are ultimately ephemeral, so you'd be saving/extracting files whenever the container is not reused (which will likely happen far more often than the Docker image is built, depending on how frequently you invoke your function).

Another issue arises if you need more than 512MB of storage within /tmp, which is the maximum AWS allows. This is very likely given the size of uncompressed NLP model files.

To overcome both these issues - the storage size limit and the ephemeral nature of /tmp - we can mount an Elastic File System (EFS). The EFS filesystem can hold more than 512MB and is persistent - storing/extracting a model once makes it available to all subsequent invocations. It also means you avoid downloading and extracting models during the Docker image build, which slows down deploys and bloats image sizes.

Mounting an EFS filesystem

There are many tutorials available about setting up EFS. The important points for using it with your Lambda function are as follows:

  1. Ensure that the EFS filesystem and the Lambda function have access to the same VPC and subnet: add the EFS mount target to the subnet, and attach a security group belonging to the same VPC as that subnet (a Terraform sketch of the EFS resources follows the config below)
  2. Connect the EFS filesystem to your Lambda function, either in the AWS console for Lambda or in Terraform. The local mount path is the path at which the EFS filesystem is mounted and accessible (eg, with the config below, code running in the Lambda function could save to /mnt/elasticfs):
    resource "aws_lambda_function" "func" {
    image_uri     = "123.some.aws.uri:latest-tag"
    package_type  = "Image"
    function_name = "your-func-name"
    ...

    file_system_config {
        arn = aws_efs_access_point.access_point_for_lambda.arn

        # Local mount path inside the lambda function. Must start with '/mnt/'.
        local_mount_path = "/mnt/elasticfs"
  }
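For completeness, here's a rough Terraform sketch of the EFS side of that setup. The resource names (aws_subnet.subnet, aws_security_group.sg) and the POSIX user/permissions are placeholders for whatever VPC resources and ownership you actually use:

    resource "aws_efs_file_system" "fs" {}

    # Mount target in the same subnet as the Lambda function, attached to a
    # security group from the same VPC (placeholder resource names).
    resource "aws_efs_mount_target" "mount" {
      file_system_id  = aws_efs_file_system.fs.id
      subnet_id       = aws_subnet.subnet.id
      security_groups = [aws_security_group.sg.id]
    }

    # The access point referenced by file_system_config above.
    resource "aws_efs_access_point" "access_point_for_lambda" {
      file_system_id = aws_efs_file_system.fs.id

      posix_user {
        gid = 1000
        uid = 1000
      }

      root_directory {
        path = "/lambda"
        creation_info {
          owner_gid   = 1000
          owner_uid   = 1000
          permissions = "755"
        }
      }
    }

Remember that the Lambda function itself also needs a vpc_config block pointing at the same subnet and security group, otherwise it won't be able to reach the mount target.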

Pointing AllenNLP to the EFS filesystem

Once EFS is provisioned, you'll need to tell AllenNLP to save and extract models in EFS (and not /tmp or any other directory).

Interestingly, AllenNLP makes liberal use of tempfile to create ephemeral temporary files/dirs (a great built-in util, by the way), but that doesn't help you here: those files/dirs end up on a read-only part of the filesystem, and there's no argument you can set to control where they live.

Thankfully, the authors of the package use an environment variable ALLENNLP_CACHE_ROOT to determine where model archives should be saved and extracted. It's not very well documented, but you can see its use here. You'll see that in the absence of the env var, CACHE_ROOT resolves to .allennlp.

So, with EFS set up, I set ALLENNLP_CACHE_ROOT to /mnt/elasticfs, and AllenNLP now saves and extracts archives to /mnt/elasticfs instead of /usr/.allennlp. You can check this in the logs, as AllenNLP does a good job of logging directory information at the INFO level during model extraction.

If you use a model that requires PyTorch, and especially if PyTorch attempts to download models or other files, you'll also need to set the TORCH_HOME environment variable to /mnt/elasticfs. That variable specifies where models will be saved (see here). Otherwise you'll hit the same OSError when it tries to write outside of /tmp.
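One way to set both variables (a sketch - you could equally set them in the Dockerfile, in the handler, or via the Lambda console) is on the function resource itself:

    resource "aws_lambda_function" "func" {
      ...

      environment {
        variables = {
          # Point AllenNLP's cache and PyTorch's download directory at the
          # writable, persistent EFS mount.
          ALLENNLP_CACHE_ROOT = "/mnt/elasticfs"
          TORCH_HOME          = "/mnt/elasticfs"
        }
      }
    }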

You should now be set up so that any model extraction takes place within the EFS filesystem and persists across subsequent Lambda function invocations, shortening the duration of requests and making it even cheaper to run documents through your NLP pipeline.

Kelvin