We have a PR build that uploads the generated (html) output to a public s3 bucket, so you can check the results before merging. This is useful, but the output has grown over time, and is now ~6GB; so the job takes a long time to run, and re-uploads a lot of files that haven't actually changed.
I recently switched the trunk build to use sync from the AWS CLI (rather than s3cmd), which was noticeably faster; so I thought I'd try using the --dryrun flag to generate a diff against the production bucket.
docker run --rm -v $PWD:/app -w /app -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY amazon/aws-cli s3 sync output/ s3://foo-prod --dryrun --size-only
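The dry-run output looks something like this (the file names here are hypothetical); the local path is the third whitespace-separated field, which is what the awk below picks out:

(dryrun) upload: output/index.html to s3://foo-prod/index.html
(dryrun) upload: output/posts/example.html to s3://foo-prod/posts/example.html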
Unfortunately, there's no machine-readable output option for that command, so we need to get our awk on. My first attempt was to generate a cp command for each line of the diff:
docker run ... | awk '{sub(/output\//, ""); gsub(/&/, "\\\\&"); print "docker run --rm -v $PWD:/app -w /app -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY amazon/aws-cli s3 cp output/"$3" s3://foo-pr/"ENVIRON["GIT_COMMIT"]"/"$3}'
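For a hypothetical changed file posts/example.html, with GIT_COMMIT set to abc1234, that emits a line like this ($PWD is left for bash to expand when the line is eventually run):

docker run --rm -v $PWD:/app -w /app -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY amazon/aws-cli s3 cp output/posts/example.html s3://foo-pr/abc1234/posts/example.html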
Once you're satisfied the incantation looks correct, you can pipe the whole lot to bash:
docker run ... | awk ... | bash
With this working locally, it seemed simple to just run that command as a pipeline step. It was not. Trying to escape the combination of quotes in groovy proved fruitless, and in the end I just threw in a bash script, and called that from the Jenkinsfile.
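The Jenkinsfile step then reduces to a one-liner; a minimal sketch, assuming the script below is saved as upload-changed.sh (a name I've made up):

stage('Upload changed files') {
    steps {
        sh './upload-changed.sh'
    }
}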
While this solved the quoting problem, it still launched a fresh container for every changed file. Copying the changed files into a local directory, and syncing that in a single pass, avoids all that overhead:

mkdir changed
# turn each line of the dry-run diff into a local cp, preserving directory structure
docker run --rm -v $PWD:/app -w /app -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY amazon/aws-cli s3 sync output/ s3://foo-prod --dryrun --size-only | awk '{gsub(/&/, "\\\\&"); print "cp --parents "$3" changed/"}' | bash
# then a single sync of only the changed files, into a per-commit prefix
docker run --rm -v $PWD:/app -w /app -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY amazon/aws-cli s3 sync changed/output/ s3://foo-pr/$GIT_COMMIT --size-only
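If you want to sanity-check what actually landed in the PR bucket, a recursive listing does the job:

docker run --rm -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY amazon/aws-cli s3 ls s3://foo-pr/$GIT_COMMIT/ --recursive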
Finally, a build that is quick(er), and only uploads the changed files!