Cloudkrunch

Securing ECS deployments with GitHub Actions

GitHub Actions, AWS, ECS, Terraform, Testing, Infrastructure · 9 min read

Pipelines

One of the biggest responsibilities of a DevOps engineer is building great CI/CD pipelines. I've worked on many in my career: some deployed services to Kubernetes clusters, a few published secure base images for internal teams to use when deploying their microservices to cloud environments, and I even use a pipeline to publish this blog, which I wrote about in cloudkrunch.com/the-cloudkrunch-blog and cloudkrunch.com/cicd-improvements.

So to show off my experience with creating pipelines, I decided to set up a project that highlights just that, along with some of my security background.

Project Overview

This project consists of two parts:

  1. ECS service that is served traffic by an ALB
VPC Diagram
  2. GitHub Actions pipeline that deploys changes to ECS and runs secrets scans, unit tests, integration tests, and SAST/DAST scans
CICD Diagram

Creating the AWS infrastructure

All of the infrastructure components in this project are created with Terraform, and I specifically use Terragrunt to manage my Terraform state and providers. Here's a list of the components.
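As a rough illustration of the Terragrunt side (the bucket name and region here are placeholders, not my actual configuration), a root `terragrunt.hcl` like this generates the AWS provider and keeps remote state in S3 for every child module:

```hcl
# Hypothetical root terragrunt.hcl: child modules that include this file
# inherit the generated provider and the S3 remote-state backend.
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket  = "my-terraform-state-bucket" # placeholder bucket name
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-west-2"
    encrypt = true
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<EOF
provider "aws" {
  region = "us-west-2"
}
EOF
}
```

Each component's directory then only needs a small `terragrunt.hcl` that includes this root file, which keeps state keys and provider versions consistent across the stack.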

VPC with public and private subnets

module "vpc" {
source = "cloudposse/vpc/aws"
namespace = "eg"
stage = "dev"
name = "projects"
ipv4_primary_cidr_block = "10.0.0.0/16"
assign_generated_ipv6_cidr_block = false
}
module "dynamic_subnets" {
source = "cloudposse/dynamic-subnets/aws"
namespace = "eg"
stage = "dev"
name = "services"
availability_zones = ["us-west-2a","us-west-2b","us-west-2c"]
vpc_id = module.vpc.vpc_id
igw_id = [module.vpc.igw_id]
ipv4_cidr_block = ["10.0.0.0/16"]
nat_gateway_enabled = true
}

ECR with on-push scanning enabled and lifecycle policy

locals {
  repo_name = "sample-service"
}

resource "aws_ecr_repository" "sample_service_repo" {
  name                 = local.repo_name
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }
}

resource "aws_ecr_lifecycle_policy" "lifecycle" {
  repository = aws_ecr_repository.sample_service_repo.name
  policy     = <<EOF
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep last 5 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 5
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
EOF
}
ECR Repo

ECS cluster with cloudwatch logs encrypted with Customer Managed Key

resource "aws_kms_key" "cluster_logs" {
description = "cluster logs KMS key"
deletion_window_in_days = 7
enable_key_rotation = true
}
resource "aws_cloudwatch_log_group" "cluster_logs" {
name = "cluster-logs"
}
resource "aws_ecs_cluster" "general" {
name = "general-cluster"
configuration {
execute_command_configuration {
kms_key_id = aws_kms_key.cluster_logs.arn
logging = "OVERRIDE"
log_configuration {
cloud_watch_encryption_enabled = true
cloud_watch_log_group_name = aws_cloudwatch_log_group.cluster_logs.name
}
}
}
}

ECS service IAM roles

resource "aws_iam_role" "ecs_task_execution_role" {
name = "sample_service_execution_role"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_iam_role" "ecs_task_role" {
name = "ecs_sample_service_task_role"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}
resource "aws_iam_role_policy_attachment" "ecs-task-execution-role-policy-attachment" {
role = aws_iam_role.ecs_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

ECS task definition

resource "aws_ecs_task_definition" "definition" {
family = "sample_service"
task_role_arn = "${aws_iam_role.ecs_task_role.arn}"
execution_role_arn = "${aws_iam_role.ecs_task_execution_role.arn}"
network_mode = "awsvpc"
cpu = "256"
memory = "512"
requires_compatibilities = ["FARGATE"]
runtime_platform {
operating_system_family = "LINUX"
cpu_architecture = "X86_64"
}
container_definitions = jsonencode([
{
image = "${local.sample_service.image_url}",
name = "${local.sample_service.name}",
logConfiguration = {
"logDriver": "awslogs",
"options": {
"awslogs-region" : "us-west-2",
"awslogs-group" : "ecs-service-logs",
"awslogs-stream-prefix" : "${local.sample_service.name}"
}
}
portMappings = [
{
containerPort = 8080
hostPort = 8080
}
]
essential = true
}
])
}

ECS service

resource "aws_ecs_service" "sample_service" {
name = local.sample_service.name
cluster = var.cluster_arn
task_definition = aws_ecs_task_definition.definition.arn
desired_count = 1
depends_on = [aws_iam_role.ecs_task_role]
launch_type = "FARGATE"
load_balancer {
target_group_arn = module.alb.default_target_group_arn
container_name = local.sample_service.name
container_port = 8080
}
network_configuration {
subnets = var.dynamic_subnets.private_subnet_ids
security_groups = [ aws_security_group.sample_service.id ]
assign_public_ip = false
}
}

Here's what it looks like in the AWS console.

Cloud Console ECS

Fine-grained policy for GitHub Actions deployment

There's a lot going on here, but I tried to be as specific as possible to allow for least privilege.

data "aws_iam_policy_document" "ecs_ecr_sample_service_doc" {
# Be able to get a token for ECR
statement {
actions = [
"ecr:GetAuthorizationToken",
"ecs:DescribeTaskDefinition",
"ecs:RegisterTaskDefinition",
]
resources = [ "*" ]
}
# ECR repository related tasks
statement {
actions = [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:BatchGetImage",
"ecr:ListTagsForResource",
"ecr:DescribeImageScanFindings",
"ecr:CompleteLayerUpload",
"ecr:UploadLayerPart",
"ecr:InitiateLayerUpload",
"ecr:BatchCheckLayerAvailability",
"ecr:PutImage"
]
resources = [
var.ecr_sample_service_arn,
]
}
statement {
actions = [
"ecs:DescribeServices",
"ecs:CreateTaskSet",
"ecs:UpdateServicePrimaryTaskSet",
"ecs:DeleteTaskSet",
"elasticloadbalancing:DescribeTargetGroups",
]
resources = [
"*"
]
}
statement {
actions = [
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:ModifyListener",
"elasticloadbalancing:DescribeRules",
"elasticloadbalancing:ModifyRule",
]
resources = [
"arn:aws:elasticloadbalancing:us-west-2:${data.aws_caller_identity.current.account_id}:listener-rule/app/${module.alb.alb_name}/*"
]
}
statement {
actions = [ "ecs:UpdateService" ]
resources = [ aws_ecs_service.sample_service.id ]
}
statement {
actions = [
"elasticloadbalancing:DescribeTargetGroups",
]
resources = [
"arn:aws:elasticloadbalancing:us-west-2:${data.aws_caller_identity.current.account_id}:targetgroup/${module.alb.alb_name}*"
]
}
statement {
actions = [ "iam:PassRole" ]
resources = [
aws_iam_role.ecs_task_execution_role.arn,
aws_iam_role.ecs_task_role.arn
]
}
}

Once all of that is stood up, we just need to add credentials for a deployment user as secrets in our GitHub repository.

Set up the pipeline

I ended up using the GitHub Action provided by each tool to construct my pipeline.

Here are all of the tools I used to set it up and the jobs in my deploy.yml workflow:

TruffleHog - Secrets scan

secrets_scan:
  name: Trufflehog secrets scan
  runs-on: ubuntu-latest
  defaults:
    run:
      shell: bash
  steps:
    - name: Checkout code
      uses: actions/checkout@v3
      with:
        fetch-depth: 0
    - name: TruffleHog OSS
      id: trufflehog
      uses: trufflesecurity/trufflehog@v3.63.2
      continue-on-error: true
      with:
        path: ./
        base: "${{ github.event.repository.default_branch }}"
        head: HEAD
        extra_args: --debug --only-verified
    - name: Scan Results Status
      if: steps.trufflehog.outcome == 'failure'
      run: exit 1 # Exit if secrets are found

Built-in Go tests

unit_tests:
  name: Go Unit Tests
  runs-on: ubuntu-latest
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Setup Go
      uses: actions/setup-go@v4
      with:
        go-version: '1.20.x'
    - name: Install dependencies
      run: go mod download
    - name: Run Unit Tests
      run: go test ./...

Semgrep - SAST scan

semgrep:
  name: Semgrep SAST scan
  runs-on: ubuntu-latest
  container:
    image: returntocorp/semgrep
  steps:
    - name: clone application source code
      uses: actions/checkout@v3
    - name: run scan
      run: |
        semgrep \
          --sarif --output report.sarif \
          --metrics=off \
          --config="p/default"
    - name: save report as pipeline artifact
      uses: actions/upload-artifact@v3
      with:
        name: report.sarif
        path: report.sarif
    - name: Download report
      uses: actions/download-artifact@v3
      with:
        name: report.sarif

Build Docker image and deploy to ECS dev

After the unit tests, secrets scan, and SAST scan have passed, we run the dev deployment. This job also inspects the scan results from ECR and checks that there are no critical or high vulnerabilities in our container.

deploy_dev:
  name: Deploy Development
  runs-on: ubuntu-latest
  environment: dev
  needs:
    - secrets_scan
    - semgrep
    - unit_tests
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.DEV_AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.DEV_AWS_SECRET_ACCESS_KEY }}
        aws-region: ${{ env.AWS_REGION }}
    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v2
    - name: Build, tag, and push image to Amazon ECR
      id: build-image
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        IMAGE_TAG: ${{ github.sha }}
      run: |
        # Build a docker container and push it to ECR
        # so that it can be deployed to ECS.
        docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REGISTRY/$ECR_REPOSITORY:latest .
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
        echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
    - name: Scan Docker image
      id: docker-scan
      uses: alexjurkiewicz/ecr-scan-image@v1.7.1
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        IMAGE_TAG: ${{ github.sha }}
      with:
        repository: ${{ env.ECR_REPOSITORY }}
        tag: ${{ env.IMAGE_TAG }}
        fail_threshold: high
    - name: Download task def
      run: |
        aws ecs describe-task-definition --region ${{ env.AWS_REGION }} --task-definition ${{ env.ECS_TASK_DEFINITION }} --query taskDefinition > dev-task-definition.json
    - name: Fill in the new image ID in the Amazon ECS task definition
      id: task-def
      uses: aws-actions/amazon-ecs-render-task-definition@v1
      with:
        task-definition: dev-task-definition.json
        container-name: ${{ env.CONTAINER_NAME }}
        image: ${{ steps.build-image.outputs.image }}
    - name: Deploy to Amazon ECS service
      uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      with:
        task-definition: ${{ steps.task-def.outputs.task-definition }}
        service: ${{ env.ECS_SERVICE }}
        cluster: ${{ env.ECS_CLUSTER }}
        wait-for-service-stability: true
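The severity gate in the ecr-scan-image step boils down to reading the `findingSeverityCounts` map that `aws ecr describe-image-scan-findings` returns and failing when CRITICAL or HIGH keys are present. A minimal sketch of that logic (the function name and the sample JSON are mine, not the action's):

```shell
# Sketch of the severity gate. In the real pipeline the JSON would come from:
#   aws ecr describe-image-scan-findings --repository-name sample-service \
#     --image-id imageTag=$IMAGE_TAG \
#     --query imageScanFindings.findingSeverityCounts
gate_on_severity() {
  counts_json=$1
  # Fail the gate if any CRITICAL or HIGH severity keys appear in the counts.
  if printf '%s' "$counts_json" | grep -Eq '"(CRITICAL|HIGH)"'; then
    echo "blocking vulnerabilities found"
    return 1
  fi
  echo "no critical/high findings, safe to deploy"
  return 0
}

# Example: a scan with only MEDIUM/LOW findings passes the gate.
gate_on_severity '{"MEDIUM": 2, "LOW": 7}'
```

The real action is more thorough (it waits for the scan to complete and supports an ignore list), but this is the decision that `fail_threshold: high` encodes.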

After the deployment to dev succeeds, we concurrently run DAST and integration tests against the dev service.

ZAP - DAST

dast_scan:
  name: ZAP DAST scan
  runs-on: ubuntu-latest
  needs:
    - deploy_dev
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: ZAP Scan
      uses: zaproxy/action-full-scan@v0.8.0
      with:
        target: ${{ env.DEV_ECS_SERVICE_URL }}
        fail_action: true # Fail if there are any alerts
        allow_issue_writing: false # Don't create GitHub Issues
        rules_file_name: ./.github/files/ignore-rules-list.txt
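The rules file referenced above uses ZAP's tab-separated format of rule ID, action (IGNORE/WARN/FAIL), and an optional name. A hypothetical `ignore-rules-list.txt` (these rule IDs are illustrative, not the ones I actually suppressed) looks like:

```
10096	IGNORE	(Timestamp Disclosure)
10035	IGNORE	(Strict-Transport-Security Header)
```

Anything not listed falls back to the scan's default behavior, so the file only needs entries for alerts you've reviewed and accepted.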

Newman - Integration Tests imported from Postman

newman_integration_test_dev:
  name: Dev Integration Tests
  runs-on: ubuntu-latest
  needs:
    - deploy_dev
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - uses: matt-ball/newman-action@v2.0.0
      with:
        collection: ./integration/sample-service.postman_collection.json
        environment: ./integration/dev/sample-service-dev.postman_environment.json

Build Docker image and deploy to ECS production

After all of the previous steps pass, we are confident that the service is ready to be deployed to production!

deploy_prod:
  name: Deploy Production
  runs-on: ubuntu-latest
  environment: prod
  needs:
    - dast_scan
    - newman_integration_test_dev
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.PROD_AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.PROD_AWS_SECRET_ACCESS_KEY }}
        aws-region: ${{ env.AWS_REGION }}
    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v2
    - name: Build, tag, and push image to Amazon ECR
      id: build-image
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        IMAGE_TAG: ${{ github.sha }}
      run: |
        # Build a docker container and push it to ECR
        # so that it can be deployed to ECS.
        docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REGISTRY/$ECR_REPOSITORY:latest .
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
        echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
    - name: Download task def
      run: |
        aws ecs describe-task-definition --region ${{ env.AWS_REGION }} --task-definition ${{ env.ECS_TASK_DEFINITION }} --query taskDefinition > prod-task-definition.json
    - name: Fill in the new image ID in the Amazon ECS task definition
      id: task-def
      uses: aws-actions/amazon-ecs-render-task-definition@v1
      with:
        task-definition: prod-task-definition.json
        container-name: ${{ env.CONTAINER_NAME }}
        image: ${{ steps.build-image.outputs.image }}
    - name: Deploy to Amazon ECS service
      uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      with:
        task-definition: ${{ steps.task-def.outputs.task-definition }}
        service: ${{ env.ECS_SERVICE }}
        cluster: ${{ env.ECS_CLUSTER }}
        wait-for-service-stability: true

Outside of project scope and potential future improvements

  1. TLS connections on traffic from the ALB to the ECS service. This would include adding self-signed certificates on the service itself, and I felt that it wasn't necessary for this specific project. It would be necessary if there were a compliance requirement for end-to-end encryption (E2EE), like HIPAA. This technically isn't pure E2EE, since the ALB decrypts and re-encrypts traffic, but HIPAA deems this acceptable.
  2. Performance testing in the GitHub Actions pipeline. While performance testing is important, this application isn't used by the public, so I don't think the time spent on performance tests would be beneficial here. In the past I've used JMeter and K6.
  3. Breaking up the pipeline for PRs. Since I'm the only one who worked on this project, I'm not using branching. If this were a real production project, code would have to be peer reviewed before deploying, and I would have PRs run the unit tests, SAST, and secrets scan before allowing merges.
  4. Alerting on pipeline failure. This pipeline already sends alerts to the email account linked to my GitHub account, but I've always found that to be an inefficient way of doing alerts. In the past I've added Slack alerts to specific company channels, which I feel is an appropriate approach, possibly alongside alerting on normal paging channels (e.g. PagerDuty, OpsGenie).
  5. Using an OIDC provider to authenticate the GitHub Actions pipeline instead of AWS IAM users. This is a better option for pipeline credentials because it uses short-lived STS credentials, but it requires some additional setup.
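For reference, the OIDC approach would look roughly like this in the workflow. The role ARN below is a placeholder, and the role's trust policy must allow GitHub's OIDC provider (`token.actions.githubusercontent.com`) for your repository:

```yaml
permissions:
  id-token: write # required to request the OIDC token
  contents: read

jobs:
  deploy_dev:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # Placeholder role ARN; no long-lived access keys needed
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
          aws-region: us-west-2
```

With this in place the `DEV_AWS_ACCESS_KEY_ID`/`DEV_AWS_SECRET_ACCESS_KEY` secrets go away entirely, which removes the risk of a leaked static credential.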

Conclusion

I really enjoyed working on this project. It touches a lot of the topics I like to work on: CI/CD, security, and service management. Thank you for reading; please share the article if you found it interesting or useful.