A crucial limitation of Container Image Support for AWS Lambda

At AWS re:Invent 2020, AWS Lambda released Container Image Support for Lambda functions. This new feature allows engineers to package and deploy Lambda functions as container images of up to 10 GB in size. It offers technical teams an exciting opportunity to unify development and deployment processes and maximise investments in container-centric pipelines and tooling.

So we decided to adopt it into our solution and remove S3 bucket access from our deployment pipelines. Happy days!

Until I saw the statement highlighted below in the Lambda documentation.

Fig 1. an unexpected restriction

Considering that IAM provides sophisticated and powerful access controls for Amazon ECR, I could not believe this limitation was real, so I showed it to the team. We concluded that it is commercially understandable for AWS to accept only ECR as the image registry for Lambda, but it does not make much sense to limit it to ECR repositories in the same AWS account as the Lambda function.

At Adatree, we run multiple AWS accounts to ensure our customers’ CDR data is completely isolated, and we believe this is the best way to run a SaaS in a highly regulated environment like CDR. We do not want to duplicate our ECR repositories into every single account just to use this new feature. It has been a long-awaited feature for us, but this limitation would be an unfortunate show stopper if it turned out to be true.

So I set off on a technical spike to prove one of two things:

  • either the documentation is inaccurate (fingers crossed)
  • or this new feature is not ready for our use case yet

Step 1:

I picked a Java 11 Lambda function that migrates database schemas with Liquibase without risk of deadlock, and updated its build to push a Docker image to ECR in our management account rather than upload a zip file to an S3 bucket in the management account.

Below is the Dockerfile, which is much more concise than the equivalent SAM template YAML file. Local testing with the Runtime Interface Emulator (RIE) is smooth and pleasant. This feature looks really promising!

#Dockerfile

FROM public.ecr.aws/lambda/java:11

COPY build/libs/*-all.jar /var/task/lib/

CMD [ "au.com.adatree.liquibase.App::handleRequest" ]
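For anyone who wants to script the local RIE test, here is a minimal sketch in Python. It assumes the container was started with the emulator's port 8080 published on localhost port 9000, and the event payload is made up for illustration; it is not the exact test I ran.

```python
import json
import urllib.request

# Default invocation path exposed by the Runtime Interface Emulator.
RIE_PATH = "/2015-03-31/functions/function/invocations"


def build_rie_request(event, host="localhost", port=9000):
    """Build a POST request that sends `event` to a locally running RIE."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{RIE_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_locally(event, host="localhost", port=9000, timeout=30):
    """Send the event to the emulator and return the decoded response body."""
    request = build_rie_request(event, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running, `invoke_locally({"action": "migrate"})` would return whatever the handler responds with; the `action` field is a hypothetical payload, not something the function requires.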

Step 2:

Then I updated the CloudFormation template to use an image instead of a zip for the Lambda function and added permission policies for Lambda to access ECR. I am showing the diff here just to make it obvious how few changes are needed.

Fig 2. Very few changes are needed to use a container image. Amazing!!
Fig 3. ECR permissions
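For readers who cannot see the screenshots, here is a rough sketch of the kind of ECR repository policy involved. It is not a copy of Fig 3: it is built from the statement AWS documents for letting the Lambda service pull an image, and the `Sid` value is only a label.

```python
import json


def lambda_image_pull_statement(sid="LambdaECRImageRetrievalPolicy"):
    """Return a statement letting the Lambda service pull images from a repository."""
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
    }


def repository_policy(statements):
    """Wrap statements in a policy document and serialise it as JSON text."""
    document = {"Version": "2012-10-17", "Statement": list(statements)}
    return json.dumps(document, indent=2)


policy_text = repository_policy([lambda_image_pull_statement()])
print(policy_text)
```

The resulting JSON text is what would go into the repository policy of the ECR repository that holds the image.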

Step 3:

My merge request triggered a CloudFormation stack update, which finished as a successful UPDATE. Awesome! So the documentation was inaccurate, which made sense to me. I then clicked the Feedback button on the documentation page and, out of goodwill, submitted feedback that now seems embarrassing.

Fig 4. Premature feedback (out of goodwill)

Step 4:

Then I made a coffee for myself and went back to do some more testing. It did not take long for me to realise that something was not right.

  • CloudWatch logs showed that the deployed Lambda function was running stale code
  • the Lambda console displayed the function as a Zip package instead of an Image
  • there was an error in the CloudFormation (CFN) events even though the CloudFormation deployment was “successful”

Fig 5. it takes some effort to see the error
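Digging through stack events by hand is tedious, so a small filter can help. The sketch below assumes the hidden error shows up as an event whose status ends in `_FAILED`, which may not hold for every kind of failure. The sample events are invented for illustration, and `fetch_stack_events` needs boto3 and AWS credentials.

```python
def failed_events(events):
    """Return the stack events whose ResourceStatus ends with '_FAILED'."""
    return [e for e in events if e.get("ResourceStatus", "").endswith("_FAILED")]


def fetch_stack_events(stack_name):
    """Fetch all events for a stack; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the filter above works without AWS access

    client = boto3.client("cloudformation")
    paginator = client.get_paginator("describe_stack_events")
    events = []
    for page in paginator.paginate(StackName=stack_name):
        events.extend(page["StackEvents"])
    return events


# Invented sample data shaped like CloudFormation stack events.
sample = [
    {
        "LogicalResourceId": "MigrationFunction",
        "ResourceStatus": "UPDATE_FAILED",
        "ResourceStatusReason": "example reason",
    },
    {"LogicalResourceId": "my-stack", "ResourceStatus": "UPDATE_COMPLETE"},
]

for event in failed_events(sample):
    print(event["LogicalResourceId"], event["ResourceStatus"],
          event.get("ResourceStatusReason"))
```

Running `failed_events(fetch_stack_events("my-stack"))` after every deployment, with the real stack name in place of the hypothetical `my-stack`, would surface this kind of error without clicking through the console.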

I do expect one or two “hiccups” from technical spikes, especially when adopting new features, so I made some changes to the Lambda function and deployed again.

This time I got an error saying “Please do not provide imageUrl when packageType is Zip”.

Fig 6. That is ugh… some progress, maybe…

I checked the deployed CloudFormation template, and packageType was definitely Image. Time for a hypothesis: it was probably the SAM transformation that failed to transform AWS::Serverless::Function into AWS::Lambda::Function. So I removed the transformation from the CloudFormation template, switched to AWS::Lambda::Function directly, and deployed again.

Fig 7. use AWS::Lambda::Function instead
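As a rough text equivalent of that change, the sketch below builds a plain AWS::Lambda::Function resource that deploys from a container image and prints it as a JSON template fragment. The image URI, role ARN, logical ID and timeout are hypothetical placeholders, not values from our accounts. With the Image package type there is no Handler or Runtime property, because the image itself supplies them.

```python
import json

# Hypothetical placeholder values, not taken from real accounts.
IMAGE_URI = "111122223333.dkr.ecr.ap-southeast-2.amazonaws.com/liquibase-migrator:latest"
ROLE_ARN = "arn:aws:iam::444455556666:role/liquibase-migrator-role"


def image_function_resource(image_uri, role_arn, timeout_seconds=300):
    """Build an AWS::Lambda::Function resource that deploys from a container image."""
    return {
        "Type": "AWS::Lambda::Function",
        "Properties": {
            "PackageType": "Image",
            "Code": {"ImageUri": image_uri},
            "Role": role_arn,
            "Timeout": timeout_seconds,
        },
    }


resource = image_function_resource(IMAGE_URI, ROLE_ARN)
print(json.dumps({"Resources": {"MigrationFunction": resource}}, indent=2))
```

Note that the placeholder image lives in a different account from the role, which is exactly the cross-account layout this spike was testing.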

Then what I feared proved to be true: the deployment failed with a 400 error, “Image repository must be in the same account as the caller”. So the documentation was correct and I was wrong.

Fig 8. when fear comes true
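In hindsight, a pre-flight check in the pipeline would have caught this before CloudFormation did. The sketch below assumes the usual private ECR URI shape, `<account>.dkr.ecr.<region>.amazonaws.com/<repository>`, and simply compares the registry's account ID with the account doing the deployment. The URI and account IDs are placeholders.

```python
import re

# Matches private ECR image URIs and captures the 12-digit registry account ID.
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
)


def registry_account(image_uri):
    """Extract the AWS account ID that owns the ECR registry in `image_uri`."""
    match = ECR_URI.match(image_uri)
    if match is None:
        raise ValueError(f"not a private ECR image URI: {image_uri}")
    return match.group("account")


def same_account(image_uri, caller_account):
    """Return True when the image lives in the caller's own account."""
    return registry_account(image_uri) == caller_account


# Placeholder values for illustration only.
uri = "111122223333.dkr.ecr.ap-southeast-2.amazonaws.com/liquibase-migrator:latest"
print(same_account(uri, "111122223333"))
print(same_account(uri, "444455556666"))
```

In a real pipeline the caller's account ID could come from STS, for example `boto3.client("sts").get_caller_identity()["Account"]`, and the build would fail fast whenever `same_account` returns False.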

That marks the end of the technical spike and I have parked all my changes in a feature branch for another day.

Lessons learnt from this spike:

  1. It is common practice to have multiple AWS accounts and one or two centralised ECR repositories in a dedicated management account, so it is a big surprise that Container Image Support for AWS Lambda was released with such a limitation. I know this will be a show stopper for many teams. We will keep an eye on this feature and jump on it when this changes.
  2. New features are often released as an MVP, so it can be risky to adopt them straight away. A fit-for-purpose technical spike is very important.
  3. Every layer of abstraction can obscure the root cause of a problem. A simpler solution not only makes troubleshooting easier but, more often than not, is also the better solution if the functionality is the same. Once this limitation is removed, I will remove the SAM transformation from that CloudFormation template.

I hope this post saves some time for fellow innovators who are keen to use this feature in a similar environment.

[Update on 27 Feb 2021]

I raised a customer support ticket with AWS and they responded within 24 hours, saying that AWS has an existing internal feature request for cross-account ECR support for Lambda container images, but they are unable to provide an ETA for when it will be implemented.

That is great news! Let’s hope we will see it LIVE soon.

Father of two giggly girls; a technical problem solver who focuses on both delivery and growth
