Add newlines to Firehose transform lambda

If you want to use a lambda to transform data being sent to a Firehose, the recommended “blueprint” looks something like this:

import base64


def lambda_handler(event, context):
    output = []

    for record in event['records']:
        # Firehose hands each record over base64-encoded
        payload = base64.b64decode(record['data']).decode('utf-8')

        # Do custom processing on the payload here

        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(payload.encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)

    return {'records': output}

which works perfectly, unless you later want to query the records stored in S3 using Athena. Firehose writes the transformed records back to back, exactly as you return them, so the serialized JSON is not newline-separated. Athena expects one JSON record per line, and you’re up shit creek.
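To see the problem without touching AWS, here is a small sketch. It assumes a reader that wants one JSON document per line, which is roughly how line-oriented tools treat these files. The sample records and the `parse_lines` helper are made up for illustration.

```python
import json

# Made-up sample records, standing in for whatever the lambda emits
records = [{"id": 1}, {"id": 2}]

# Records written back to back, with no delimiter: one long line
concatenated = "".join(json.dumps(r) for r in records)

# Records written with a trailing newline each: one document per line
delimited = "".join(json.dumps(r) + "\n" for r in records)


def parse_lines(blob):
    """Parse a blob the way a line-oriented JSON reader would."""
    return [json.loads(line) for line in blob.splitlines() if line]


try:
    parse_lines(concatenated)
    concatenated_ok = True
except json.JSONDecodeError:
    # '{"id": 1}{"id": 2}' is not a single valid JSON document
    concatenated_ok = False

parsed = parse_lines(delimited)
```

The concatenated blob fails to parse as a line, while the delimited one round-trips to the original records.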

The secret is to add the newline, after processing:

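output_record = {
    'recordId': record['recordId'],
    'result': 'Ok',
    # data is the processed payload (e.g. a dict); this needs "import json" at the top
    'data': base64.b64encode(json.dumps(data).encode('utf-8') + b'\n').decode('utf-8')
}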
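Put together, the whole handler might look something like the sketch below. It assumes every incoming record is a JSON document, which the blueprint does not guarantee, and it leaves the processing step as a placeholder.

```python
import base64
import json


def lambda_handler(event, context):
    output = []

    for record in event['records']:
        # Assumption: each incoming record decodes to a JSON document
        data = json.loads(base64.b64decode(record['data']))

        # Do custom processing on data here

        # Serialize, then append the newline before base64-encoding
        line = json.dumps(data).encode('utf-8') + b'\n'

        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(line).decode('utf-8'),
        })

    return {'records': output}
```

The newline has to go on the raw bytes, before the base64 step, so that it ends up in the object written to S3.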

Using Docker instead of virtualenv

If you fancy a change, you just need a Dockerfile:

FROM python:3.7

RUN pip install pytest
COPY requirements.txt .
RUN pip install -r requirements.txt

and a pytest.ini in the project root, so the tests can import your modules from there (the pythonpath option needs pytest 7 or later):

[pytest]
pythonpath = .

Then you can build the image:

docker build -t foo .

And run the tests:

docker run -it --rm -v "$PWD":/app -w /app foo pytest tests/

Or run the app locally, e.g. a lambda func:

docker run -it --rm -v "$PWD":/app -w /app -e PGHOST=... -e PGUSER=... -e PGPASSWORD=... foo python -c 'import app; app.bar(None, None)'
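For illustration only, here is a guess at what such an `app.py` could look like. The module and the `bar` function are just the names from the one-liner above, the `connection_settings` helper is hypothetical, and a real handler would actually connect to the database.

```python
# app.py (hypothetical module matching `import app; app.bar(None, None)`)
import os


def connection_settings(env=os.environ):
    """Collect the Postgres settings passed in with `docker run -e ...`."""
    return {
        'host': env.get('PGHOST'),
        'user': env.get('PGUSER'),
        'password': env.get('PGPASSWORD'),
    }


def bar(event, context):
    """Lambda-style entry point; ignores its arguments, as in the one-liner."""
    settings = connection_settings()
    # A real handler would open a database connection here
    return {'host': settings['host'], 'user': settings['user']}
```

The point is only that whatever you pass with -e shows up in os.environ inside the container, so the code runs the same way it would with those variables set on a deployed lambda.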

Is it better? Probably not, you’re just swapping one set of problems for a different set 🤷