Converting an AWS Lambda to an ECS Task

I have a Python script (removing duplicates from a Redshift database) that currently runs as a Lambda, triggered by EventBridge. Unfortunately, it has been known to take longer than 15 minutes, the maximum timeout for a Lambda function.

The alternative seems to be running it as an ECS task (on Fargate). I was uploading the Lambda function as a zip, but it’s pretty easy to convert that to a Docker image and push it to ECR:

FROM python:3

COPY requirements.txt  .
RUN  pip3 install -r requirements.txt

COPY app.py .

CMD [ "python", "-c", "import app; app.foo(None, None)" ]
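If you would rather not inline Python in the CMD, a small wrapper can make the same call and also report how long the run took, which is handy given the 15-minute limit that prompted the move. This is only a sketch: `run` is a made-up name, and the `foo` body here is a stand-in for the real handler, which in the image would be imported from `app` instead.

```python
import sys
import time


def foo(event, context):
    """Stand-in for the existing handler; in the image use `from app import foo`."""
    return None


def run(handler=foo):
    """Call the handler the way the CMD does and return a process exit code."""
    started = time.monotonic()
    try:
        handler(None, None)  # same arguments as the CMD above
    except Exception as exc:
        print(f"task failed: {exc!r}", file=sys.stderr)
        return 1
    finally:
        # Always report the duration, whether the handler succeeded or not
        elapsed = time.monotonic() - started
        print(f"handler ran for {elapsed:.1f}s", file=sys.stderr)
    return 0
```

Assuming the wrapper were saved as entry.py (a hypothetical file name) and copied into the image, the CMD would pass the result of `entry.run()` to `sys.exit`, so a failed run shows up as a non-zero exit code on the stopped task.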

I took the easy way out, and just set the CMD to the existing lambda entrypoint. You then need an ECS cluster (or you may be able to use the default cluster):

docker run --rm -it -v ~/.aws:/root/.aws -v $PWD:/data -w /data -e AWS_PROFILE amazon/aws-cli ecs create-cluster --cluster-name foo

And an IAM role to execute the task:

... iam create-role --role-name FooExecution --assume-role-policy-document file://aws/ecs/TrustPolicy.json

With a trust policy for ECS to assume the role:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Action": "sts:AssumeRole",
        "Principal": {
            "Service": "ecs-tasks.amazonaws.com"
        },
        "Effect": "Allow"
    }]
}

And permissions to pull the image from ECR:

... iam put-role-policy --role-name FooExecution --policy-name ECR --policy-document file://aws/iam/ECR.json

Where ECR.json contains:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ecr:BatchCheckLayerAvailability",
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer"
        ],
        "Resource": [
            "arn:aws:ecr:$region:$account:repository/$repo"
        ]
    }, {
        "Effect": "Allow",
        "Action": [
            "ecr:GetAuthorizationToken"
        ],
        "Resource": [
            "*"
        ]
    }]
}
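The `$region`, `$account` and `$repo` placeholders have to be replaced with real values before the policy is uploaded. One way to do that, sketched here with Python's `string.Template`, is to render the file and parse the result as JSON, so a missing value or a stray comma is caught locally. The `render_policy` helper and the example values are made up for illustration.

```python
import json
from string import Template

# Shortened copy of the ECR policy above, placeholders left as-is
ECR_POLICY_TEMPLATE = """
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
        "Resource": ["arn:aws:ecr:$region:$account:repository/$repo"]
    }]
}
"""


def render_policy(template_text, **values):
    """Fill in the $placeholders and return the policy as a dict."""
    # substitute() raises KeyError if any placeholder has no value
    rendered = Template(template_text).substitute(**values)
    # json.loads() raises ValueError if the result is not valid JSON
    return json.loads(rendered)


policy = render_policy(
    ECR_POLICY_TEMPLATE,
    region="eu-west-1",      # hypothetical region
    account="123456789012",  # hypothetical account id
    repo="foo",
)
print(json.dumps(policy, indent=4))
```

The printed document can be written to a file and passed to `--policy-document` as before.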

If you want logs in CloudWatch, the execution role also needs a policy allowing that:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
        ],
        "Resource": [
            "*"
        ]
    }]
}

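That’s enough to run the task, but if the task itself needs to call any AWS API (e.g. via boto3), then you need another role, the task role (FooTask below), with the same ecs-tasks.amazonaws.com trust policy and a policy covering whatever the script calls: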

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "redshift:GetClusterCredentials"
        ],
        "Resource": [
            "arn:aws:redshift:$region:$account:dbuser:$cluster/$user",
            "arn:aws:redshift:$region:$account:dbname:$cluster/$db"
        ]
    }]
}
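For context, this is roughly how a script would use that permission from inside the task. It is a sketch, not the actual code from app.py: `get_db_credentials` is a made-up helper, and the client is passed in so the function can be exercised without AWS access. Inside the container, `boto3.client("redshift")` should pick up the task role's credentials without any extra configuration.

```python
def get_db_credentials(redshift_client, cluster, user, database, duration_seconds=900):
    """Ask Redshift for temporary database credentials for the given user."""
    response = redshift_client.get_cluster_credentials(
        ClusterIdentifier=cluster,
        DbUser=user,
        DbName=database,
        DurationSeconds=duration_seconds,  # 900 is the minimum AWS accepts
    )
    # The returned DbUser is what you log in as; DbPassword is temporary
    return response["DbUser"], response["DbPassword"]
```

In the task this would be called as `get_db_credentials(boto3.client("redshift"), cluster, user, db)`, with the cluster, user and database matching the ARNs in the policy above.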

If you want logs, you need a log group:

... logs create-log-group --log-group-name /aws/fargate/foo
... logs put-retention-policy --log-group-name /aws/fargate/foo --retention-in-days 7

Phooooo… nearly there. Now you need a task definition:

... ecs register-task-definition --family foo --cpu 256 --memory 512 --network-mode awsvpc --requires-compatibilities FARGATE --execution-role-arn arn:aws:iam::$account:role/$executionRole --task-role-arn arn:aws:iam::$account:role/$taskRole --container-definitions "[{\"name\":\"foo\",\"image\":\"$image:latest\",\"logConfiguration\":{\"logDriver\":\"awslogs\",\"options\":{\"awslogs-region\":\"$region\",\"awslogs-group\":\"/aws/fargate/foo\",\"awslogs-stream-prefix\":\"foo\"}}}]"
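The hand-escaped JSON in `--container-definitions` is easy to get wrong. A small sketch that generates the same string with `json.dumps` follows; the `container_definitions` helper and the example image URI are assumptions for illustration, not something the CLI provides.

```python
import json


def container_definitions(name, image, region, log_group):
    """Build the JSON string expected by --container-definitions."""
    return json.dumps([{
        "name": name,
        "image": image,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-region": region,
                "awslogs-group": log_group,
                "awslogs-stream-prefix": name,
            },
        },
    }])


print(container_definitions(
    name="foo",
    # hypothetical image URI; use your own repository
    image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/foo:latest",
    region="eu-west-1",
    log_group="/aws/fargate/foo",
))
```

The output can be saved to a file and passed with `file://`, which avoids the shell quoting altogether.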

And you should be in a position to test that the task can run, without the cron trigger:

... ecs run-task --launch-type FARGATE --cluster foo --task-definition foo:1 --network-configuration "awsvpcConfiguration={subnets=['foo',...],securityGroups=['sg-...'],assignPublicIp='ENABLED'}" --count 1

The task needs a public IP to pull the image from ECR (unless you want to jump through some hoops, such as a NAT gateway or VPC endpoints).
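If you want the test run to tell you whether the script actually succeeded, the same call can be made from boto3, followed by a wait for the task to stop. This is a sketch under a few assumptions: `run_task_and_wait` is a made-up helper, the ECS client is passed in, and the waiter settings are a guess at what a long-running task needs (the default settings may give up before it finishes).

```python
def run_task_and_wait(ecs, cluster, task_definition, subnets, security_groups):
    """Start one Fargate task, wait for it to stop, return the container exit code."""
    response = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "ENABLED",
            }
        },
    )
    if response.get("failures"):
        raise RuntimeError(f"run_task failed: {response['failures']}")
    task_arn = response["tasks"][0]["taskArn"]

    # Poll every 30 seconds, for up to an hour (assumed values)
    ecs.get_waiter("tasks_stopped").wait(
        cluster=cluster,
        tasks=[task_arn],
        WaiterConfig={"Delay": 30, "MaxAttempts": 120},
    )
    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
    # exitCode is absent if the container never started, hence .get()
    return task["containers"][0].get("exitCode")
```

It would be called with `boto3.client("ecs")` plus the same cluster, task definition, subnets and security group as the CLI command above; an exit code of 0 means the script finished cleanly.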

If that went well, you can proceed to set up the cron trigger (EventBridge):

... events put-rule --name foo --schedule-expression 'cron(0 4 * * ? *)'
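EventBridge cron expressions have six fields rather than the five of classic cron: minutes, hours, day of month, month, day of week and year, evaluated in UTC. So the rule above fires at 04:00 UTC every day. The snippet below is just an illustration of how the fields line up; `parse_schedule` is a made-up helper and it only checks the one rule I'm sure of, that day of month and day of week can't both be specified (at least one of them has to be `?`).

```python
FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_schedule(expression):
    """Split an EventBridge cron(...) expression into its six named fields."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError(f"not a cron expression: {expression!r}")
    parts = expression[len("cron("):-1].split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    fields = dict(zip(FIELDS, parts))
    # At least one of the two day fields has to be '?'
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("one of day_of_month / day_of_week must be '?'")
    return fields


print(parse_schedule("cron(0 4 * * ? *)"))
```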

You need yet another role, which EventBridge assumes to start the task and pass it the execution and task roles:

... iam create-role --role-name FooEvents --assume-role-policy-document file://aws/events/TrustPolicy.json
... iam put-role-policy --role-name FooEvents --policy-name Ecs --policy-document file://aws/iam/Ecs.json

This time the trust policy needs to be for EventBridge:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Action": "sts:AssumeRole",
        "Principal": {
            "Service": "events.amazonaws.com"
        },
        "Effect": "Allow"
    }]
}

And the permissions for the target:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "iam:PassRole"
        ],
        "Resource": [
            "arn:aws:iam::$account:role/FooExecution",
            "arn:aws:iam::$account:role/FooTask"
        ]
    }, {
        "Effect": "Allow",
        "Action": [
            "ecs:RunTask"
        ],
        "Resource": "*",
        "Condition": {
            "ArnEquals": {
                "ecs:cluster": "arn:aws:ecs:$region:$account:cluster/foo"
            }
        }
    }]
}

And finally, you need a target for the rule:

... events put-targets --rule foo --targets file://aws/events/Targets.json

Where Targets.json contains:

[{
    "Id": "1",
    "Arn": "arn:aws:ecs:$region:$account:cluster/foo",
    "RoleArn": "arn:aws:iam::$account:role/FooEvents",
    "EcsParameters": {
        "TaskDefinitionArn": "arn:aws:ecs:$region:$account:task-definition/foo",
        "LaunchType": "FARGATE",
        "NetworkConfiguration": {
            "awsvpcConfiguration": {
                "Subnets": ["subnet-***",...],
                "SecurityGroups": ["sg-***"],
                "AssignPublicIp": "ENABLED"
            }
        }
    }
}]

I included all 3 subnets from the default VPC, and the default SG.
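Since the targets file needs the account, region and a list of subnets filled in, it can be less error-prone to generate it. A sketch, assuming a made-up `ecs_target` helper and placeholder IDs (none of these are real values):

```python
import json


def ecs_target(region, account, cluster, family, events_role, subnets, security_groups):
    """Build the list expected by `events put-targets --targets`."""
    return [{
        "Id": "1",
        "Arn": f"arn:aws:ecs:{region}:{account}:cluster/{cluster}",
        "RoleArn": f"arn:aws:iam::{account}:role/{events_role}",
        "EcsParameters": {
            # No revision number, as in the file above
            "TaskDefinitionArn": f"arn:aws:ecs:{region}:{account}:task-definition/{family}",
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": list(subnets),
                    "SecurityGroups": list(security_groups),
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }]


targets = ecs_target(
    region="eu-west-1",                # hypothetical
    account="123456789012",            # hypothetical
    cluster="foo",
    family="foo",
    events_role="FooEvents",
    subnets=["subnet-aaa", "subnet-bbb", "subnet-ccc"],  # placeholder IDs
    security_groups=["sg-aaa"],                          # placeholder ID
)
print(json.dumps(targets, indent=4))
```

The same list can also be passed directly to boto3 as `events.put_targets(Rule="foo", Targets=targets)`.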

That was a lot! But hopefully you now have a working cron job that can take as long as it wants to complete.
