Securing ECS deployments with GitHub Actions
— GitHub Actions, AWS, ECS, Terraform, Testing, Infrastructure — 9 min read
Pipelines
One of the biggest responsibilities of a DevOps engineer is building great CI/CD pipelines. I've worked on many in my career: some that deployed services to Kubernetes clusters, and a few that published secure base images for internal teams to use when deploying their microservices to cloud environments. I even use a pipeline to publish this blog, which I wrote about at cloudkrunch.com/the-cloudkrunch-blog and cloudkrunch.com/cicd-improvements.
To show off that experience with building pipelines, I decided to set up a project that highlights just that, along with some of my security background.
Project Overview
This project consists of two parts:
- ECS service that is served traffic by an ALB
- GitHub Actions pipeline that deploys changes to ECS and runs secrets scans, unit tests, integration tests, and SAST/DAST scans
Creating the AWS infrastructure
All of the infrastructure components in this project are created with Terraform, and I specifically use Terragrunt to manage my Terraform state and providers. Here's a list of the components.
VPC with public and private subnets
module "vpc" { source = "cloudposse/vpc/aws"
namespace = "eg" stage = "dev" name = "projects"
ipv4_primary_cidr_block = "10.0.0.0/16"
assign_generated_ipv6_cidr_block = false}
module "dynamic_subnets" { source = "cloudposse/dynamic-subnets/aws"
namespace = "eg" stage = "dev" name = "services" availability_zones = ["us-west-2a","us-west-2b","us-west-2c"] vpc_id = module.vpc.vpc_id igw_id = [module.vpc.igw_id] ipv4_cidr_block = ["10.0.0.0/16"] nat_gateway_enabled = true}
ECR with on-push scanning enabled and lifecycle policy
```hcl
locals {
  repo_name = "sample-service"
}

resource "aws_ecr_repository" "sample_service_repo" {
  name                 = local.repo_name
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }
}

resource "aws_ecr_lifecycle_policy" "lifecycle" {
  repository = aws_ecr_repository.sample_service_repo.name

  policy = <<EOF
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep last 5 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 5
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
EOF
}
```
ECS cluster with cloudwatch logs encrypted with Customer Managed Key
resource "aws_kms_key" "cluster_logs" { description = "cluster logs KMS key" deletion_window_in_days = 7 enable_key_rotation = true}
resource "aws_cloudwatch_log_group" "cluster_logs" { name = "cluster-logs"}
resource "aws_ecs_cluster" "general" { name = "general-cluster"
configuration { execute_command_configuration { kms_key_id = aws_kms_key.cluster_logs.arn logging = "OVERRIDE"
log_configuration { cloud_watch_encryption_enabled = true cloud_watch_log_group_name = aws_cloudwatch_log_group.cluster_logs.name } } }}
ECS service IAM roles
resource "aws_iam_role" "ecs_task_execution_role" { name = "sample_service_execution_role" assume_role_policy = <<EOF{ "Version": "2012-10-17", "Statement": [ { "Action": "sts:AssumeRole", "Principal": { "Service": "ecs-tasks.amazonaws.com" }, "Effect": "Allow", "Sid": "" } ]}EOF}
resource "aws_iam_role" "ecs_task_role" { name = "ecs_sample_service_task_role" assume_role_policy = <<EOF{ "Version": "2012-10-17", "Statement": [ { "Action": "sts:AssumeRole", "Principal": { "Service": "ecs-tasks.amazonaws.com" }, "Effect": "Allow", "Sid": "" } ]}EOF}
resource "aws_iam_role_policy_attachment" "ecs-task-execution-role-policy-attachment" { role = aws_iam_role.ecs_task_execution_role.name policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"}
ECS task definition
resource "aws_ecs_task_definition" "definition" { family = "sample_service" task_role_arn = "${aws_iam_role.ecs_task_role.arn}" execution_role_arn = "${aws_iam_role.ecs_task_execution_role.arn}" network_mode = "awsvpc" cpu = "256" memory = "512" requires_compatibilities = ["FARGATE"]
runtime_platform { operating_system_family = "LINUX" cpu_architecture = "X86_64" }
container_definitions = jsonencode([ { image = "${local.sample_service.image_url}", name = "${local.sample_service.name}", logConfiguration = { "logDriver": "awslogs", "options": { "awslogs-region" : "us-west-2", "awslogs-group" : "ecs-service-logs", "awslogs-stream-prefix" : "${local.sample_service.name}" } } portMappings = [ { containerPort = 8080 hostPort = 8080 } ] essential = true } ])}
ECS service
resource "aws_ecs_service" "sample_service" { name = local.sample_service.name cluster = var.cluster_arn task_definition = aws_ecs_task_definition.definition.arn desired_count = 1 depends_on = [aws_iam_role.ecs_task_role] launch_type = "FARGATE"
load_balancer { target_group_arn = module.alb.default_target_group_arn container_name = local.sample_service.name container_port = 8080 }
network_configuration { subnets = var.dynamic_subnets.private_subnet_ids security_groups = [ aws_security_group.sample_service.id ] assign_public_ip = false }}
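The service references a security group that isn't shown above. A minimal sketch of what it could look like, assuming the ALB module exposes its security group ID as `module.alb.security_group_id` (that output name is an assumption, not necessarily the exact module attribute):

```hcl
# Hypothetical security group for the sample service.
# Only the ALB may reach the container port; all egress is allowed
# so the task can pull images and ship logs.
resource "aws_security_group" "sample_service" {
  name   = "sample-service"
  vpc_id = module.vpc.vpc_id

  ingress {
    description     = "HTTP from the ALB only"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [module.alb.security_group_id] # assumed ALB module output
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Scoping ingress to the ALB's security group, rather than a CIDR range, keeps the task unreachable from anywhere else in the VPC.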
What it looks like in the cloud console.
Fine-grain policy for Github Actions deployment
There's a lot going on here, but I tried to be as specific as possible to allow for least privilege.
data "aws_iam_policy_document" "ecs_ecr_sample_service_doc" {
# Be able to get a token for ECR statement { actions = [ "ecr:GetAuthorizationToken", "ecs:DescribeTaskDefinition", "ecs:RegisterTaskDefinition", ] resources = [ "*" ] }
# ECR repository related tasks statement { actions = [ "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:ListImages", "ecr:DescribeImages", "ecr:BatchGetImage", "ecr:ListTagsForResource", "ecr:DescribeImageScanFindings", "ecr:CompleteLayerUpload", "ecr:UploadLayerPart", "ecr:InitiateLayerUpload", "ecr:BatchCheckLayerAvailability", "ecr:PutImage" ] resources = [ var.ecr_sample_service_arn, ] }
statement { actions = [ "ecs:DescribeServices", "ecs:CreateTaskSet", "ecs:UpdateServicePrimaryTaskSet", "ecs:DeleteTaskSet", "elasticloadbalancing:DescribeTargetGroups", ] resources = [ "*" ] }
statement { actions = [ "elasticloadbalancing:DescribeListeners", "elasticloadbalancing:ModifyListener", "elasticloadbalancing:DescribeRules", "elasticloadbalancing:ModifyRule", ] resources = [ "arn:aws:elasticloadbalancing:us-west-2:${data.aws_caller_identity.current.account_id}:listener-rule/app/${module.alb.alb_name}/*" ] }
statement { actions = [ "ecs:UpdateService" ] resources = [ aws_ecs_service.sample_service.id ] }
statement { actions = [ "elasticloadbalancing:DescribeTargetGroups", ] resources = [ "arn:aws:elasticloadbalancing:us-west-2:${data.aws_caller_identity.current.account_id}:targetgroup/${module.alb.alb_name}*" ] }
statement { actions = [ "iam:PassRole" ] resources = [ aws_iam_role.ecs_task_execution_role.arn, aws_iam_role.ecs_task_role.arn ] }}
Once all of that is stood up, we just need to add credentials for a deployment user as secrets in our GitHub repository.
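If you want to keep even that step in code, the GitHub Terraform provider can manage repository secrets. A rough sketch, assuming the deployment user is also managed in Terraform (the user and repository names here are placeholders, not values from this project):

```hcl
# Create an access key for the hypothetical deployment user and
# store it as repository secrets for the workflow to consume.
resource "aws_iam_access_key" "deployer" {
  user = aws_iam_user.deployer.name # assumes the deployment user exists in Terraform
}

resource "github_actions_secret" "dev_access_key_id" {
  repository      = "sample-service" # placeholder repository name
  secret_name     = "DEV_AWS_ACCESS_KEY_ID"
  plaintext_value = aws_iam_access_key.deployer.id
}

resource "github_actions_secret" "dev_secret_access_key" {
  repository      = "sample-service"
  secret_name     = "DEV_AWS_SECRET_ACCESS_KEY"
  plaintext_value = aws_iam_access_key.deployer.secret
}
```

The trade-off is that the access key then lives in your Terraform state, so the state backend needs to be encrypted and tightly access-controlled.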
Setting up the pipeline
I ended up using the GitHub Action provided by each tool to construct my pipeline.
Here are all of the tools I used to set it up and the jobs in my deploy.yml workflow:
Trufflehog - Secrets scan
```yaml
secrets_scan:
  name: Trufflehog secrets scan
  runs-on: ubuntu-latest
  defaults:
    run:
      shell: bash
  steps:
    - name: Checkout code
      uses: actions/checkout@v3
      with:
        fetch-depth: 0
    - name: TruffleHog OSS
      id: trufflehog
      uses: trufflesecurity/trufflehog@v3.63.2
      continue-on-error: true
      with:
        path: ./
        base: "${{ github.event.repository.default_branch }}"
        head: HEAD
        extra_args: --debug --only-verified
    - name: Scan Results Status
      if: steps.trufflehog.outcome == 'failure'
      run: exit 1 # Exit if secrets are found
```
Built-in Go tests
```yaml
unit_tests:
  name: Go Unit Tests
  runs-on: ubuntu-latest
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Setup Go
      uses: actions/setup-go@v4
      with:
        go-version: '1.20.x'
    - name: Install dependencies
      run: go mod download
    - name: Run Unit Tests
      run: go test ./...
```
Semgrep - SAST scan
```yaml
semgrep:
  name: Semgrep SAST scan
  runs-on: ubuntu-latest
  container:
    image: returntocorp/semgrep
  steps:
    - name: clone application source code
      uses: actions/checkout@v3
    - name: run scan
      run: |
        semgrep \
          --sarif --output report.sarif \
          --metrics=off \
          --config="p/default"
    - name: save report as pipeline artifact
      uses: actions/upload-artifact@v3
      with:
        name: report.sarif
        path: report.sarif
    - name: Download report
      uses: actions/download-artifact@v3
      with:
        name: report.sarif
```
Build Docker image and deploy to ECS dev
After the unit tests, secrets scan, and SAST scan have run, we run the dev deployment. This job also inspects the scan results from ECR and checks that there are no critical or high vulnerabilities in our container.
```yaml
deploy_dev:
  name: Deploy Development
  runs-on: ubuntu-latest
  environment: dev
  needs:
    - secrets_scan
    - semgrep
    - unit_tests
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.DEV_AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.DEV_AWS_SECRET_ACCESS_KEY }}
        aws-region: ${{ env.AWS_REGION }}
    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v2
    - name: Build, tag, and push image to Amazon ECR
      id: build-image
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        IMAGE_TAG: ${{ github.sha }}
      run: |
        # Build a docker container and push it to ECR
        # so that it can be deployed to ECS.
        docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REGISTRY/$ECR_REPOSITORY:latest .
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
        echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
    - name: Scan Docker image
      id: docker-scan
      uses: alexjurkiewicz/ecr-scan-image@v1.7.1
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        IMAGE_TAG: ${{ github.sha }}
      with:
        repository: ${{ env.ECR_REPOSITORY }}
        tag: ${{ env.IMAGE_TAG }}
        fail_threshold: high
    - name: Download task def
      run: |
        aws ecs describe-task-definition --region ${{ env.AWS_REGION }} --task-definition ${{ env.ECS_TASK_DEFINITION }} --query taskDefinition > dev-task-definition.json
    - name: Fill in the new image ID in the Amazon ECS task definition
      id: task-def
      uses: aws-actions/amazon-ecs-render-task-definition@v1
      with:
        task-definition: dev-task-definition.json
        container-name: ${{ env.CONTAINER_NAME }}
        image: ${{ steps.build-image.outputs.image }}
    - name: Deploy to Amazon ECS service
      uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      with:
        task-definition: ${{ steps.task-def.outputs.task-definition }}
        service: ${{ env.ECS_SERVICE }}
        cluster: ${{ env.ECS_CLUSTER }}
        wait-for-service-stability: true
```
After deployment to dev is successful, we concurrently run DAST and integration tests against the dev service.
ZAP - DAST
```yaml
dast_scan:
  name: ZAP DAST scan
  runs-on: ubuntu-latest
  needs: deploy_dev
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: ZAP Scan
      uses: zaproxy/action-full-scan@v0.8.0
      with:
        target: ${{ env.DEV_ECS_SERVICE_URL }}
        fail_action: true # Fail if there are any alerts
        allow_issue_writing: false # Don't create GitHub Issues
        rules_file_name: ./.github/files/ignore-rules-list.txt
```
Newman - Integration Tests imported from Postman
```yaml
newman_integration_test_dev:
  name: Dev Integration Tests
  runs-on: ubuntu-latest
  needs:
    - deploy_dev
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - uses: matt-ball/newman-action@v2.0.0
      with:
        collection: ./integration/sample-service.postman_collection.json
        environment: ./integration/dev/sample-service-dev.postman_environment.json
```
Build Docker image and deploy to ECS production
After all of the previous steps pass, we are confident that the service is ready to be deployed to production!
```yaml
deploy_prod:
  name: Deploy Production
  runs-on: ubuntu-latest
  environment: prod
  needs:
    - dast_scan
    - newman_integration_test_dev
  steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.PROD_AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.PROD_AWS_SECRET_ACCESS_KEY }}
        aws-region: ${{ env.AWS_REGION }}
    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v2
    - name: Build, tag, and push image to Amazon ECR
      id: build-image
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        IMAGE_TAG: ${{ github.sha }}
      run: |
        # Build a docker container and push it to ECR
        # so that it can be deployed to ECS.
        docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REGISTRY/$ECR_REPOSITORY:latest .
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
        echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
    - name: Download task def
      run: |
        aws ecs describe-task-definition --region ${{ env.AWS_REGION }} --task-definition ${{ env.ECS_TASK_DEFINITION }} --query taskDefinition > prod-task-definition.json
    - name: Fill in the new image ID in the Amazon ECS task definition
      id: task-def
      uses: aws-actions/amazon-ecs-render-task-definition@v1
      with:
        task-definition: prod-task-definition.json
        container-name: ${{ env.CONTAINER_NAME }}
        image: ${{ steps.build-image.outputs.image }}
    - name: Deploy to Amazon ECS service
      uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      with:
        task-definition: ${{ steps.task-def.outputs.task-definition }}
        service: ${{ env.ECS_SERVICE }}
        cluster: ${{ env.ECS_CLUSTER }}
        wait-for-service-stability: true
```
Outside of project scope and potential future improvements
- TLS connections on traffic from the ALB to the ECS service. This would involve adding self-signed certificates on the service itself, and I felt it wasn't necessary for this specific project. It would be necessary if there was a compliance requirement for end-to-end encryption (E2EE), like HIPAA. This technically isn't pure E2EE, since the ALB decrypts and re-encrypts traffic, but HIPAA deems this acceptable.
- Performance testing in the GitHub Actions pipeline. While performance testing is important, this application isn't serving public traffic, so I don't think the time spent on performance tests would pay off here. In the past I've used JMeter and k6.
- Breaking up the pipeline for PRs. Since I'm the only one who worked on this project, I'm not using branching. In a real production project, code would have to be peer reviewed before deploying, and I would have PRs run the unit tests, SAST, and secrets scan before allowing merges.
- Alerting on pipeline failure. This pipeline already sends alerts to the email account linked to my GitHub account, but I've always found that to be an inefficient way of doing alerts. In the past I've added Slack alerts to specific company channels, which I feel is an appropriate approach, ideally alongside alerting on normal paging channels (e.g. PagerDuty, OpsGenie).
- Using an OIDC provider to authenticate the GitHub Actions pipeline instead of AWS users. This is a better credential option for your pipeline because it uses short-lived STS credentials, but it takes more effort to set up.
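For reference, the OIDC option from the list above looks roughly like this in Terraform. The role and repository names are placeholders, and the thumbprint should be verified against GitHub's current documentation:

```hcl
# Trust GitHub's OIDC issuer so workflow runs can assume a role
# via short-lived STS credentials instead of static access keys.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

data "aws_iam_policy_document" "github_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Limit which repository (and optionally branch) can assume the role.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:my-org/sample-service:*"] # placeholder repository
    }
  }
}

resource "aws_iam_role" "github_deploy" {
  name               = "github-actions-deploy" # placeholder role name
  assume_role_policy = data.aws_iam_policy_document.github_assume.json
}
```

The workflow would then use `aws-actions/configure-aws-credentials` with `role-to-assume` instead of the static access key secrets.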
Overview
I really enjoyed working on this project. It touches a lot of the topics I like to work on: CI/CD, security, and service management. Thank you for reading the article, and please share it if you found it interesting or useful.