Cross-region pull for a Lambda function

Having got cross-account pull working for my lambda function in staging, I had foolishly assumed that running the same CDK script for prod would be easy (as that is the account where the ECR repository lives).

Instead, the build failed with a confusing message:

14:52:31  MyStack |  4/11 | 1:52:29 PM | CREATE_FAILED        | AWS::Lambda::Function                       | MyLambda (MyLambda...) Resource handler returned message: "Source image ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/REPO:latest is not valid. Provide a valid source image. (Service: Lambda, Status Code: 400, Request ID: ...) (SDK Attempt Count: 1)" (RequestToken: ..., HandlerErrorCode: InvalidRequest)

Obviously, the ECR URI is valid, or it wouldn’t have worked from the other account. I assumed the problem was permissions related, but the permissions I had added for cross-account pull seemed to be a superset of those necessary within the same account.

When I tried to create the lambda in the console, a slightly more useful error was returned. It seems that Lambda is unable to pull from ECR in another region (even though Fargate has no trouble). The easiest solution is to enable replication.

You can do this in the CloudFormation template that creates the repo:

Resources:
    RepositoryReplicationConfig:
        Type: AWS::ECR::ReplicationConfiguration
        Properties:
            ReplicationConfiguration:
                Rules:
                    - Destinations:
                        - Region: ...
                          RegistryId: ... # account id
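To check that the replication rules are in place, you can query the registry configuration (a sketch, using the same dockerised AWS CLI as the commands elsewhere in these notes):

```shell
# describe-registry returns the registry id and any replication configuration
docker run --rm -it -v ~/.aws:/root/.aws -e AWS_PROFILE amazon/aws-cli ecr describe-registry
```

Note that replication only applies to images pushed after the rule is created; you may need to re-push before the Lambda deploy succeeds.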

Cross-account pull for a Lambda function

I have been trying to set up a Lambda function, using the CDK, that uses a docker image from a different account (because reasons). I felt stuck in a chicken-and-egg situation: the IAM role to be used was created by the stack, which then failed (because it couldn’t pull the image) and rolled back, deleting the role.

I tried using a wildcard for the principal:

                Statement:
                -
                    Sid: AllowCrossAccountPull
                    Effect: Allow
                    Principal:
                        AWS: "arn:aws:iam::$ACCOUNT_ID:role/$STACK-LambdaServiceRole*"

but that was rejected:

Resource handler returned message: "Invalid parameter at 'PolicyText' failed to satisfy constraint: 'Principal not found'

After some digging around in the CDK source code, I was able to create the role first and set up the cross-account permissions before creating the lambda. But the stack was still failing:

Resource handler returned message: "Lambda does not have permission to access the ECR image. Check the ECR permissions. (Service: Lambda, Status Code: 403, ...

At this point, I did what I should have done originally, and actually read the docs. It turns out that, just as Fargate tasks use separate roles to start the task and to run it, so does Lambda; but in Lambda’s case, one of those roles is played by the service itself.

After a bit more flopping around, I finally had something that worked 🥳

                Statement:
                -
                    Sid: CrossAccountPermission
                    Effect: Allow
                    Principal:
                        AWS: "arn:aws:iam::$ACCOUNT_ID:root"
                    Action:
                        - "ecr:BatchGetImage"
                        - "ecr:GetDownloadUrlForLayer"
                -
                    Sid: LambdaECRImageCrossAccountRetrievalPolicy
                    Effect: Allow
                    Principal:
                        Service: "lambda.amazonaws.com"
                    Action:
                        - "ecr:BatchGetImage"
                        - "ecr:GetDownloadUrlForLayer"
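The statements above can also be applied directly to the repository with the CLI (a sketch; this assumes they have been converted to a standard JSON policy document saved as policy.json, and the repository name is a placeholder):

```shell
# attach the policy to the ECR repository in the account that owns the image
docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli ecr set-repository-policy --repository-name REPO --policy-text file://policy.json
```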

Triggering a cron lambda

Once you have a lambda ready to run, you need an EventBridge rule to trigger it:

docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli events put-rule --name foo --schedule-expression 'cron(0 4 * * ? *)'

You can either run it at a regular rate or at a specific time.
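For the regular-rate case, the schedule expression looks like this instead (a hypothetical five-minute schedule):

```shell
# rate() expressions take a value and a unit: minute(s), hour(s), or day(s)
docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli events put-rule --name foo --schedule-expression 'rate(5 minutes)'
```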

And your lambda needs the right permissions:

aws-cli lambda add-permission --function-name foo --statement-id foo --action 'lambda:InvokeFunction' --principal events.amazonaws.com --source-arn arn:aws:events:region:account:rule/foo

Finally, you need a targets file:

[{
    "Id": "1",
    "Arn": "arn:aws:lambda:region:account:function:foo"
}]

to add to the rule:

aws-cli events put-targets --rule foo --targets file://targets.json
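You can confirm the wiring afterwards with list-targets-by-rule, which should echo back the target you just added:

```shell
# lists the lambda ARN(s) attached to the rule
aws-cli events list-targets-by-rule --rule foo
```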

Cron lambda (Python)

For a simple task in Redshift, such as refreshing a materialized view, you can use a scheduled query; but sometimes you really want a proper scripting language, rather than SQL.

You can use a docker image as a lambda now, but I still find uploading a zip easier. And while it’s possible to set up the db creds as env vars, it’s better to use temp creds:

import boto3
import psycopg2

def handler(event, context):
    client = boto3.client('redshift')

    cluster_credentials = client.get_cluster_credentials(
        DbUser='user',
        DbName='db',
        ClusterIdentifier='cluster',
    )

    conn = psycopg2.connect(
        host="foo.bar.region.redshift.amazonaws.com",
        port="5439",
        dbname="db",
        user=cluster_credentials["DbUser"],
        password=cluster_credentials["DbPassword"],
    )

    with conn.cursor() as cursor:
        ...
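For the materialized-view example mentioned above, the body of that with block might look something like this (a sketch; the view name is hypothetical, and conn is the psycopg2 connection from the handler):

```python
# Hypothetical view name, for illustration only
REFRESH_SQL = "REFRESH MATERIALIZED VIEW my_view"

def run_refresh(conn):
    # autocommit keeps each statement in its own transaction
    conn.autocommit = True
    with conn.cursor() as cursor:
        cursor.execute(REFRESH_SQL)
```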

To get the bundle ready:

pip install -r requirements.txt -t ./package
cd package && zip -r ../foo.zip . && cd ..
zip -g foo.zip app.py

You need a trust policy to allow lambda to assume the role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Effect": "Allow",
            "Sid": ""
        }
    ]
}

And a policy for the redshift creds:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "GetClusterCredsStatement",
        "Effect": "Allow",
        "Action": [
            "redshift:GetClusterCredentials"
        ],
        "Resource": [
"arn:aws:redshift:region:account:dbuser:cluster/user",
            "arn:aws:redshift:region:account:dbname:cluster/db"
        ]
    }]
}

Then create the IAM role, and attach the necessary policies:

docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli iam create-role --role-name role --assume-role-policy-document file://trust-policy.json
docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli iam attach-role-policy --role-name role --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli iam attach-role-policy --role-name role --policy-arn arn:aws:iam::aws:policy/service-role/AWSXRayDaemonWriteAccess
docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli iam put-role-policy --role-name role --policy-name GetClusterCredentials --policy-document file://get-cluster-credentials.json

And, finally, the lambda itself:

docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli lambda create-function --function-name foo --runtime python3.7 --zip-file fileb://foo.zip --handler app.handler --role arn:aws:iam::account:role/role --timeout 900

If you need to update the code afterwards:

docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli lambda update-function-code --function-name foo --zip-file fileb://foo.zip

You can test the lambda in the console. Next time, we’ll look at how to trigger it, using EventBridge.
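You can also invoke it from the CLI rather than the console (the function result is written to out.json):

```shell
# synchronous invocation; the response metadata is printed to stdout
aws-cli lambda invoke --function-name foo out.json
```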

No module named ‘psycopg2._psycopg’

I was trying to set up a python lambda, and fell at the first hurdle: the error above.

What made it confusing was that I had copied an existing lambda that was working fine. I checked a few things that were different (the python version, 3.7, and even the name of the module/function), with no effect.

I was using psycopg2-binary, and the zip file structure looked right. Eventually, I found an SO answer suggesting it could be arch related, at which point I realised that I had run pip install using docker, rather than a venv.

I have no idea why that mattered (uname showed the same arch from python:3.7 as my laptop), but onwards to the next problem! 🤷
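For what it’s worth, one way to sidestep this class of problem is to ask pip explicitly for wheels matching the Lambda runtime (a sketch; assumes an x86_64 runtime, and only works when every dependency ships a matching wheel):

```shell
# --only-binary forces wheels, so pip won't silently build for the host arch
pip install --platform manylinux2014_x86_64 --only-binary=:all: --target ./package psycopg2-binary
```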