DevOps

March 21, 2025
in AWS, Terraform, DevOps, Career Development, Cloud Resume Challenge
6 min read

Cloud Resume Challenge with Terraform: Final Reflections & Future Directions 🎯

Journey Complete: What We've Built 🏗️

Over the course of this blog series, we've successfully completed the Cloud Resume Challenge using Terraform as our infrastructure-as-code tool. Let's recap what we've accomplished:

Set up our development environment with Terraform and AWS credentials
Deployed a static website using S3, CloudFront, Route 53, and ACM
Built a serverless backend API with API Gateway, Lambda, and DynamoDB
Implemented CI/CD pipelines with GitHub Actions for automated deployments
Added security enhancements like OIDC authentication and least-privilege IAM policies

The final architecture we've created looks like this:

Basic Project Diagram

The most valuable aspect of this project is that we've built a completely automated, production-quality cloud solution. Every component is defined as code, enabling us to track changes, rollback if needed, and redeploy the entire infrastructure with minimal effort.

Key Learnings from the Challenge 🧠

Technical Skills Gained 💻

Throughout this challenge, I've gained significant technical skills:

Terraform expertise: I've moved from basic understanding to writing modular, reusable infrastructure code
AWS service integration: Learned how multiple AWS services work together to create a cohesive system
CI/CD implementation: Set up professional GitHub Actions workflows for continuous deployment
Security best practices: Implemented OIDC, least privilege, encryption, and more
Serverless architecture: Built and connected serverless components for a scalable, cost-effective solution

Unexpected Challenges & Solutions 🔄

The journey wasn't without obstacles. Here are some challenges I faced and how I overcame them:

1. State Management Complexity

Challenge: As the project grew, managing Terraform state became more complex, especially when working across different environments.

Solution: I restructured the project to use workspaces and remote state with careful output references between modules. This improved state organization and made multi-environment deployments more manageable.

2. CloudFront Cache Invalidation

Challenge: Updates to the website weren't immediately visible due to CloudFront caching.

Solution: Implemented proper cache invalidation in the CI/CD pipeline and set appropriate cache behaviors for different file types.

3. CORS Configuration

Challenge: The frontend JavaScript couldn't connect to the API due to CORS issues.

Solution: Added comprehensive CORS handling at both the API Gateway and Lambda levels, ensuring proper headers were returned.

4. CI/CD Authentication Security

Challenge: Initially used long-lived AWS credentials in GitHub Secrets, which posed security risks.

Solution: Replaced with OIDC for keyless authentication between GitHub Actions and AWS, eliminating credential management concerns.

Real-World Applications of This Project 🌐

The skills demonstrated in this challenge directly translate to real-world cloud engineering roles:

1. Infrastructure as Code Expertise

The ability to define, version, and automate infrastructure is increasingly essential in modern IT environments. This project showcases expertise with Terraform that can be applied to any cloud provider or on-premises infrastructure.

2. DevOps Pipeline Creation

Setting up CI/CD workflows that automate testing and deployment demonstrates key DevOps skills that organizations need to accelerate their development cycles.

3. Serverless Architecture Design

The backend API implementation shows understanding of event-driven, serverless architecture patterns that are becoming standard for new cloud applications.

4. Security Implementation

The security considerations throughout the project - from IAM roles to OIDC authentication - demonstrate the ability to build secure systems from the ground up.

Maintaining Your Cloud Resume 🔧

Now that your resume is live, here are some tips for maintaining it:

1. Regular Updates

Set a schedule to update both your resume content and the underlying infrastructure. I recommend:

Monthly content refreshes to keep your experience and skills current
Quarterly infrastructure reviews to apply security patches and update dependencies
Annual architecture reviews to consider new AWS services or features

2. Cost Management

While this solution is relatively inexpensive, it's good practice to set up AWS Budgets and alerts to monitor costs. My current monthly costs are approximately:

S3: ~$0.10 for storage
CloudFront: ~$0.50 for data transfer
Route 53: $0.50 for hosted zone
Lambda: Free tier covers typical usage
DynamoDB: Free tier covers typical usage
API Gateway: ~$1.00 for API calls
Total: ~$2.10/month

3. Monitoring and Alerting

I've set up CloudWatch alarms for:

API errors exceeding normal thresholds
Unusual traffic patterns that might indicate abuse
Lambda function failures

Consider adding application performance monitoring tools like AWS X-Ray for deeper insights.

Future Enhancements 🚀

There are many ways to extend this project further:

1. Content Management System Integration

Add a headless CMS like Contentful or Sanity to make resume updates easier without needing to edit HTML directly:

module "contentful_integration" {
  source = "./modules/contentful"

  api_key     = var.contentful_api_key
  space_id    = var.contentful_space_id
  environment = var.environment
}

resource "aws_lambda_function" "content_sync" {
  function_name = "resume-content-sync-${var.environment}"
  handler       = "index.handler"
  runtime       = "nodejs14.x"
  role          = aws_iam_role.content_sync_role.arn

  environment {
    variables = {
      CONTENTFUL_API_KEY = var.contentful_api_key
      CONTENTFUL_SPACE_ID = var.contentful_space_id
      S3_BUCKET = module.frontend.website_bucket_name
    }
  }
}

2. Advanced Analytics

Implement sophisticated visitor analytics beyond simple counting:

resource "aws_kinesis_firehose_delivery_stream" "visitor_analytics" {
  name        = "resume-visitor-analytics-${var.environment}"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = aws_iam_role.firehose_role.arn
    bucket_arn = aws_s3_bucket.analytics.arn

    processing_configuration {
      enabled = "true"

      processors {
        type = "Lambda"

        parameters {
          parameter_name  = "LambdaArn"
          parameter_value = aws_lambda_function.analytics_processor.arn
        }
      }
    }
  }
}

resource "aws_athena_workgroup" "analytics" {
  name = "resume-analytics-${var.environment}"

  configuration {
    result_configuration {
      output_location = "s3://${aws_s3_bucket.analytics_results.bucket}/results/"
    }
  }
}

3. Multi-Region Deployment

Enhance reliability and performance by deploying to multiple AWS regions:

module "frontend_us_east_1" {
  source = "./modules/frontend"

  providers = {
    aws = aws.us_east_1
  }

  # Configuration for US East region
}

module "frontend_eu_west_1" {
  source = "./modules/frontend"

  providers = {
    aws = aws.eu_west_1
  }

  # Configuration for EU West region
}

resource "aws_route53_health_check" "primary_region" {
  fqdn              = module.frontend_us_east_1.cloudfront_domain_name
  port              = 443
  type              = "HTTPS"
  resource_path     = "/"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "global" {
  zone_id = data.aws_route53_zone.selected.zone_id
  name    = var.domain_name
  type    = "CNAME"

  failover_routing_policy {
    type = "PRIMARY"
  }

  health_check_id = aws_route53_health_check.primary_region.id
  set_identifier  = "primary"
  records         = [module.frontend_us_east_1.cloudfront_domain_name]
  ttl             = 300
}

4. Infrastructure Testing

Add comprehensive testing using Terratest:

package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestResumeFrontend(t *testing.T) {
    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../modules/frontend",
        Vars: map[string]interface{}{
            "environment": "test",
            "domain_name": "test.example.com",
        },
    })

    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify outputs
    bucketName := terraform.Output(t, terraformOptions, "website_bucket_name")
    assert.Contains(t, bucketName, "resume-website-test")
}

Career Impact & Personal Growth 📈

Completing this challenge has had a significant impact on my career development:

Technical Growth

I've moved from basic cloud knowledge to being able to architect and implement complex, multi-service solutions. The hands-on experience with Terraform has been particularly valuable, as it's a highly sought-after skill in the job market.

Portfolio Enhancement

This project now serves as both my resume and a demonstration of my cloud engineering capabilities. I've included the GitHub repository links on my resume, allowing potential employers to see the code behind the deployment.

Community Engagement

Sharing this project through blog posts has connected me with the broader cloud community. The feedback and discussions have been invaluable for refining my approach and learning from others.

Final Thoughts 💭

The Cloud Resume Challenge has been an invaluable learning experience. By implementing it with Terraform, I've gained practical experience with both AWS services and infrastructure as code - skills that are directly applicable to professional cloud engineering roles.

What makes this challenge particularly powerful is how it combines so many aspects of modern cloud development:

Front-end web development
Back-end serverless APIs
Infrastructure as code
CI/CD automation
Security implementation
DNS configuration
Content delivery networks

If you're following along with this series, I encourage you to customize and extend the project to showcase your unique skills and interests. The foundational architecture we've built provides a flexible platform that can evolve with your career.

For those just starting their cloud journey, this challenge offers a perfect blend of practical skills in a realistic project that demonstrates end-to-end capabilities. It's far more valuable than isolated tutorials or theoretical knowledge alone.

The cloud engineering field continues to evolve rapidly, but the principles we've applied throughout this project - automation, security, scalability, and operational excellence - remain constants regardless of which specific technologies are in favor.

What's Next? 🔮

While this concludes our Cloud Resume Challenge series, my cloud learning journey continues. Some areas I'm exploring next include:

Kubernetes and container orchestration
Infrastructure testing frameworks
Cloud cost optimization
Multi-cloud deployments
Infrastructure security scanning
Service mesh implementations

I hope this series has been helpful in your own cloud journey. Feel free to reach out with questions or to share your own implementations of the challenge!

This post concludes our Cloud Resume Challenge with Terraform series. Thanks for following along!

Want to see the Cloud Resume Challenge in action? Visit my resume website and check out the GitHub repositories for the complete code.

Share on Share on

March 20, 2025
in AWS, Terraform, DevOps, CI/CD, Cloud Resume Challenge
14 min read

Cloud Resume Challenge with Terraform: Automating Deployments with GitHub Actions ⚡

In our previous posts, we built the frontend and backend components of our cloud resume project. Now it's time to take our implementation to the next level by implementing continuous integration and deployment (CI/CD) with GitHub Actions.

Why CI/CD Is Critical for Cloud Engineers 🛠️

When I first started this challenge, I manually ran terraform apply every time I made a change. This quickly became tedious and error-prone. As a cloud engineer, I wanted to demonstrate a professional approach to infrastructure management by implementing proper CI/CD pipelines.

Automating deployments offers several key benefits:

Consistency: Every deployment follows the same process
Efficiency: No more manual steps or waiting around
Safety: Automated tests catch issues before they reach production
Auditability: Each change is tracked with a commit and workflow run

This approach mirrors how professional cloud teams work and is a crucial skill for any cloud engineer.

CI/CD Architecture Overview 🏗️

Here's a visual representation of our CI/CD pipelines:

┌─────────────┐          ┌─────────────────┐          ┌─────────────┐
│             │          │                 │          │             │
│  Developer  ├─────────►│  GitHub Actions ├─────────►│  AWS Cloud  │
│  Workstation│          │                 │          │             │
└─────────────┘          └─────────────────┘          └─────────────┘
       │                          │                          ▲
       │                          │                          │
       ▼                          ▼                          │
┌─────────────┐          ┌─────────────────┐                 │
│             │          │                 │                 │
│   GitHub    │          │  Terraform      │                 │
│ Repositories│          │  Plan & Apply   ├─────────────────┘
│             │          │                 │
└─────────────┘          └─────────────────┘

We'll set up separate workflows for:

Frontend deployment: Updates the S3 website content and invalidates CloudFront
Backend deployment: Runs Terraform to update our API infrastructure
Smoke tests: Verifies that both components are working correctly after deployment

Setting Up GitHub Repositories 📁

For this challenge, I've created two repositories:

cloud-resume-frontend: Contains HTML, CSS, JavaScript, and frontend deployment workflows
cloud-resume-backend: Contains Terraform configuration, Lambda code, and backend deployment workflows

Repository Structure

Here's how I've organized my repositories:

Frontend Repository:

cloud-resume-frontend/
├── .github/
│   └── workflows/
│       └── deploy.yml
├── website/
│   ├── index.html
│   ├── styles.css
│   ├── counter.js
│   └── error.html
├── tests/
│   └── cypress/
│       └── integration/
│           └── counter.spec.js
└── README.md

Backend Repository:

cloud-resume-backend/
├── .github/
│   └── workflows/
│       └── deploy.yml
├── lambda/
│   └── visitor_counter.py
├── terraform/
│   ├── modules/
│   │   ├── backend/
│   │   │   ├── api_gateway.tf
│   │   │   ├── dynamodb.tf
│   │   │   ├── lambda.tf
│   │   │   ├── variables.tf
│   │   │   └── outputs.tf
│   ├── environments/
│   │   ├── dev/
│   │   │   └── main.tf
│   │   └── prod/
│   │       └── main.tf
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── tests/
│   └── test_visitor_counter.py
└── README.md

Securing AWS Authentication in GitHub Actions 🔒

Before setting up our workflows, we need to address a critical security concern: how to securely authenticate GitHub Actions with AWS.

In the past, many tutorials recommended storing AWS access keys as GitHub Secrets. This approach works but has significant security drawbacks:

Long-lived credentials are a security risk
Credential rotation is manual and error-prone
Access is typically overly permissive

Instead, I'll implement a more secure approach using OpenID Connect (OIDC) for keyless authentication between GitHub Actions and AWS.

Setting Up OIDC Authentication

First, create an IAM OIDC provider for GitHub in your AWS account:

# oidc-provider.tf
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

Then, create an IAM role that GitHub Actions can assume:

# oidc-role.tf
resource "aws_iam_role" "github_actions" {
  name = "github-actions-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.github.arn
        }
        Condition = {
          StringEquals = {
            "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
          }
          StringLike = {
            "token.actions.githubusercontent.com:sub" = "repo:${var.github_org}/${var.github_repo}:*"
          }
        }
      }
    ]
  })
}

# Attach policies to the role
resource "aws_iam_role_policy_attachment" "terraform_permissions" {
  role       = aws_iam_role.github_actions.name
  policy_arn = aws_iam_policy.terraform_permissions.arn
}

resource "aws_iam_policy" "terraform_permissions" {
  name        = "terraform-deployment-policy"
  description = "Policy for Terraform deployments via GitHub Actions"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "s3:*",
          "cloudfront:*",
          "route53:*",
          "acm:*",
          "lambda:*",
          "apigateway:*",
          "dynamodb:*",
          "logs:*",
          "iam:GetRole",
          "iam:PassRole",
          "iam:CreateRole",
          "iam:DeleteRole",
          "iam:PutRolePolicy",
          "iam:DeleteRolePolicy",
          "iam:AttachRolePolicy",
          "iam:DetachRolePolicy"
        ]
        Effect   = "Allow"
        Resource = "*"
      }
    ]
  })
}

For a production environment, I would use more fine-grained permissions, but this policy works for our demonstration.

Implementing Frontend CI/CD Workflow 🔄

Let's create a GitHub Actions workflow for our frontend repository. Create a file at .github/workflows/deploy.yml:

name: Deploy Frontend

on:
  push:
    branches:
      - main
    paths:
      - 'website/**'
      - '.github/workflows/deploy.yml'

  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    name: 'Deploy to S3 and Invalidate CloudFront'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-role
          aws-region: us-east-1

      - name: Deploy to S3
        run: |
          aws s3 sync website/ s3://${{ secrets.S3_BUCKET_NAME }} --delete

      - name: Invalidate CloudFront Cache
        run: |
          aws cloudfront create-invalidation --distribution-id ${{ secrets.CLOUDFRONT_DISTRIBUTION_ID }} --paths "/*"

  test:
    name: 'Run Smoke Tests'
    needs: deploy
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Install Cypress
        uses: cypress-io/github-action@v5
        with:
          install-command: npm install

      - name: Run Cypress Tests
        uses: cypress-io/github-action@v5
        with:
          command: npx cypress run
          config: baseUrl=${{ secrets.WEBSITE_URL }}

This workflow:

Authenticates using OIDC
Syncs website files to the S3 bucket
Invalidates the CloudFront cache
Runs Cypress tests to verify the site is working

Creating a Cypress Test for the Frontend

Let's create a simple Cypress test to verify that our visitor counter is working. First, create a package.json file in the root of your frontend repository:

{
  "name": "cloud-resume-frontend",
  "version": "1.0.0",
  "description": "Frontend for Cloud Resume Challenge",
  "scripts": {
    "test": "cypress open",
    "test:ci": "cypress run"
  },
  "devDependencies": {
    "cypress": "^12.0.0"
  }
}

Then create a Cypress test at tests/cypress/integration/counter.spec.js:

describe('Resume Website Tests', () => {
  beforeEach(() => {
    // Visit the home page before each test
    cy.visit('/');
  });

  it('should load the resume page', () => {
    // Check that we have a title
    cy.get('h1').should('be.visible');

    // Check that key sections exist
    cy.contains('Experience').should('be.visible');
    cy.contains('Education').should('be.visible');
    cy.contains('Skills').should('be.visible');
  });

  it('should load and display the visitor counter', () => {
    // Check that the counter element exists
    cy.get('#count').should('exist');

    // Wait for the counter to update (should not remain at 0)
    cy.get('#count', { timeout: 10000 })
      .should('not.contain', '0')
      .should('not.contain', 'Loading');

    // Verify the counter shows a number
    cy.get('#count').invoke('text').then(parseFloat)
      .should('be.gt', 0);
  });
});

Implementing Backend CI/CD Workflow 🔄

Now, let's create a GitHub Actions workflow for our backend repository. Create a file at .github/workflows/deploy.yml:

name: Deploy Backend

on:
  push:
    branches:
      - main
    paths:
      - 'lambda/**'
      - 'terraform/**'
      - '.github/workflows/deploy.yml'

  pull_request:
    branches:
      - main

  workflow_dispatch:

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  test:
    name: 'Run Python Tests'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install Dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest boto3 moto

      - name: Run Tests
        run: |
          python -m pytest tests/

  validate:
    name: 'Validate Terraform'
    runs-on: ubuntu-latest
    needs: test

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.2.0

      - name: Terraform Format
        working-directory: ./terraform
        run: terraform fmt -check

      - name: Terraform Init
        working-directory: ./terraform
        run: terraform init -backend=false

      - name: Terraform Validate
        working-directory: ./terraform
        run: terraform validate

  plan:
    name: 'Terraform Plan'
    runs-on: ubuntu-latest
    needs: validate
    if: github.event_name == 'pull_request' || github.event_name == 'push' || github.event_name == 'workflow_dispatch'
    environment: dev

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-role
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.2.0

      - name: Terraform Init
        working-directory: ./terraform
        run: terraform init -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" -backend-config="key=${{ secrets.TF_STATE_KEY }}" -backend-config="region=us-east-1"

      - name: Terraform Plan
        working-directory: ./terraform
        run: terraform plan -var="environment=dev" -var="domain_name=${{ secrets.DOMAIN_NAME }}" -out=tfplan

      - name: Comment Plan on PR
        uses: actions/github-script@v6
        if: github.event_name == 'pull_request'
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Plan 📖\`${{ steps.plan.outcome }}\`

            <details><summary>Show Plan</summary>

            \`\`\`terraform
            ${{ steps.plan.outputs.stdout }}
            \`\`\`

            </details>`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })

      - name: Upload Plan Artifact
        uses: actions/upload-artifact@v3
        with:
          name: tfplan
          path: ./terraform/tfplan

  apply:
    name: 'Terraform Apply'
    runs-on: ubuntu-latest
    needs: plan
    if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'workflow_dispatch'
    environment: dev

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-role
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.2.0

      - name: Terraform Init
        working-directory: ./terraform
        run: terraform init -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" -backend-config="key=${{ secrets.TF_STATE_KEY }}" -backend-config="region=us-east-1"

      - name: Download Plan Artifact
        uses: actions/download-artifact@v3
        with:
          name: tfplan
          path: ./terraform

      - name: Terraform Apply
        working-directory: ./terraform
        run: terraform apply -auto-approve tfplan

  test-api:
    name: 'Test API Deployment'
    runs-on: ubuntu-latest
    needs: apply
    environment: dev

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-role
          aws-region: us-east-1

      - name: Fetch API Endpoint
        run: |
          API_ENDPOINT=$(aws cloudformation describe-stacks --stack-name resume-backend-dev --query "Stacks[0].Outputs[?OutputKey=='ApiEndpoint'].OutputValue" --output text)
          echo "API_ENDPOINT=$API_ENDPOINT" >> $GITHUB_ENV

      - name: Test API Response
        run: |
          response=$(curl -s "$API_ENDPOINT/count")
          echo "API Response: $response"

          # Check if the response contains a count field
          echo $response | grep -q '"count":'
          if [ $? -eq 0 ]; then
            echo "API test successful"
          else
            echo "API test failed"
            exit 1
          fi

This workflow is more complex and includes:

Running Python tests for the Lambda function
Validating Terraform syntax and formatting
Planning Terraform changes (with PR comments for review)
Applying Terraform changes to the environment
Testing the deployed API to ensure it's functioning

Implementing Multi-Environment Deployments 🌍

One of the most valuable CI/CD patterns is deploying to multiple environments. Let's modify our backend workflow to support both development and production environments:

# Additional job for production deployment after dev is successful
  promote-to-prod:
    name: 'Promote to Production'
    runs-on: ubuntu-latest
    needs: test-api
    environment: production
    if: github.event_name == 'workflow_dispatch'

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-role
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.2.0

      - name: Terraform Init
        working-directory: ./terraform/environments/prod
        run: terraform init -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" -backend-config="key=${{ secrets.TF_STATE_KEY_PROD }}" -backend-config="region=us-east-1"

      - name: Terraform Plan
        working-directory: ./terraform/environments/prod
        run: terraform plan -var="environment=prod" -var="domain_name=${{ secrets.DOMAIN_NAME_PROD }}" -out=tfplan

      - name: Terraform Apply
        working-directory: ./terraform/environments/prod
        run: terraform apply -auto-approve tfplan

      - name: Test Production API
        run: |
          API_ENDPOINT=$(aws cloudformation describe-stacks --stack-name resume-backend-prod --query "Stacks[0].Outputs[?OutputKey=='ApiEndpoint'].OutputValue" --output text)
          response=$(curl -s "$API_ENDPOINT/count")
          echo "API Response: $response"

          # Check if the response contains a count field
          echo $response | grep -q '"count":'
          if [ $? -eq 0 ]; then
            echo "Production API test successful"
          else
            echo "Production API test failed"
            exit 1
          fi

Terraform Structure for Multiple Environments

To support multiple environments, I've reorganized my Terraform configuration:

terraform/
├── modules/
│   ├── backend/
│   │   ├── api_gateway.tf
│   │   ├── dynamodb.tf
│   │   ├── lambda.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf

Each environment directory contains its own Terraform configuration that references the shared modules.

Implementing GitHub Security Best Practices 🔒

To enhance the security of our CI/CD pipelines, I've implemented several additional measures:

1. Supply Chain Security with Dependabot

Create a file at .github/dependabot.yml in both repositories:

version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10

  # For frontend
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10

  # For backend
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10

This configuration automatically updates dependencies and identifies security vulnerabilities.

2. Code Scanning with CodeQL

Create a file at .github/workflows/codeql.yml in the backend repository:

name: "CodeQL"

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * 0'  # Run weekly

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python', 'javascript' ]

    steps:
    - name: Checkout repository
      uses: actions/checkout@v3

    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: ${{ matrix.language }}

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2

This workflow scans our code for security vulnerabilities and coding problems.

3. Branch Protection Rules

I've set up branch protection rules for the main branch in both repositories:

Require pull request reviews before merging
Require status checks to pass before merging
Require signed commits
Do not allow bypassing the above settings

Adding Verification Tests to the Workflow 🧪

In addition to unit tests, I've added end-to-end integration tests to verify that the frontend and backend work together correctly:

1. Frontend-Backend Integration Test

Create a file at tests/integration-test.js in the frontend repository:

const axios = require('axios');
const assert = require('assert');

// URLs to test - these should be passed as environment variables
const WEBSITE_URL = process.env.WEBSITE_URL || 'https://resume.yourdomain.com';
const API_URL = process.env.API_URL || 'https://api.yourdomain.com/count';

// Test that the API returns a valid response
async function testAPI() {
  try {
    console.log(`Testing API at ${API_URL}`);
    const response = await axios.get(API_URL);

    // Verify the API response contains a count
    assert(response.status === 200, `API returned status ${response.status}`);
    assert(response.data.count !== undefined, 'API response missing count field');
    assert(typeof response.data.count === 'number', 'Count is not a number');

    console.log(`API test successful. Count: ${response.data.count}`);
    return true;
  } catch (error) {
    console.error('API test failed:', error.message);
    return false;
  }
}

// Test that the website loads and contains necessary elements
async function testWebsite() {
  try {
    console.log(`Testing website at ${WEBSITE_URL}`);
    const response = await axios.get(WEBSITE_URL);

    // Verify the website loads
    assert(response.status === 200, `Website returned status ${response.status}`);

    // Check that the page contains some expected content
    assert(response.data.includes('<html'), 'Response is not HTML');
    assert(response.data.includes('id="count"'), 'Counter element not found');

    console.log('Website test successful');
    return true;
  } catch (error) {
    console.error('Website test failed:', error.message);
    return false;
  }
}

// Run all tests
async function runTests() {
  const apiResult = await testAPI();
  const websiteResult = await testWebsite();

  if (apiResult && websiteResult) {
    console.log('All integration tests passed!');
    process.exit(0);
  } else {
    console.error('Some integration tests failed');
    process.exit(1);
  }
}

// Run the tests
runTests();

Then add a step to the workflow:

- name: Run Integration Tests
  run: |
    npm install axios
    node tests/integration-test.js
  env:
    WEBSITE_URL: ${{ secrets.WEBSITE_URL }}
    API_URL: ${{ secrets.API_URL }}

Implementing Secure GitHub Action Secrets 🔐

For our GitHub Actions workflows, I've set up the following repository secrets:

AWS_ACCOUNT_ID: The AWS account ID used for OIDC authentication
S3_BUCKET_NAME: The name of the S3 bucket for the website
CLOUDFRONT_DISTRIBUTION_ID: The ID of the CloudFront distribution
WEBSITE_URL: The URL of the deployed website
API_URL: The URL of the deployed API
TF_STATE_BUCKET: The bucket for Terraform state
TF_STATE_KEY: The key for Terraform state (dev)
TF_STATE_KEY_PROD: The key for Terraform state (prod)
DOMAIN_NAME: The domain name for the dev environment
DOMAIN_NAME_PROD: The domain name for the prod environment

These secrets are protected by GitHub and only exposed to authorized workflow runs.

Managing Manual Approvals for Production Deployments 🚦

For production deployments, I've added a manual approval step using GitHub Environments:

Go to your repository settings
Navigate to Environments
Create a new environment called "production"
Enable "Required reviewers" and add yourself
Configure "Deployment branches" to limit deployments to specific branches

Now, production deployments will require explicit approval from an authorized reviewer.

Monitoring Deployment Status and Notifications 📊

To stay informed about deployment status, I've added notifications to the workflow:

- name: Notify Deployment Success
  if: success()
  uses: rtCamp/action-slack-notify@v2
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
    SLACK_TITLE: Deployment Successful
    SLACK_MESSAGE: "✅ Deployment to ${{ github.workflow }} was successful!"
    SLACK_COLOR: good

- name: Notify Deployment Failure
  if: failure()
  uses: rtCamp/action-slack-notify@v2
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
    SLACK_TITLE: Deployment Failed
    SLACK_MESSAGE: "❌ Deployment to ${{ github.workflow }} failed!"
    SLACK_COLOR: danger

This sends notifications to a Slack channel when deployments succeed or fail.

Implementing Additional Security for AWS CloudFront 🔒

To enhance the security of our CloudFront distribution, I've added a custom response headers policy:

resource "aws_cloudfront_response_headers_policy" "security_headers" {
  name = "security-headers-policy"

  security_headers_config {
    content_security_policy {
      content_security_policy = "default-src 'self'; img-src 'self'; script-src 'self'; style-src 'self'; object-src 'none';"
      override = true
    }

    content_type_options {
      override = true
    }

    frame_options {
      frame_option = "DENY"
      override = true
    }

    referrer_policy {
      referrer_policy = "same-origin"
      override = true
    }

    strict_transport_security {
      access_control_max_age_sec = 31536000
      include_subdomains = true
      preload = true
      override = true
    }

    xss_protection {
      mode_block = true
      protection = true
      override = true
    }
  }
}

Then reference this policy in the CloudFront distribution:

resource "aws_cloudfront_distribution" "website" {
  # ... other configuration ...

  default_cache_behavior {
    # ... other configuration ...
    response_headers_policy_id = aws_cloudfront_response_headers_policy.security_headers.id
  }
}

Lessons Learned 💡

Implementing CI/CD for this project taught me several valuable lessons:

Start Simple, Then Iterate: My first workflow was basic - just syncing files to S3. As I gained confidence, I added testing, multiple environments, and security features.
Security Is Non-Negotiable: Using OIDC for authentication instead of long-lived credentials was a game-changer for security. This approach follows AWS best practices and eliminates credential management headaches.
Test Everything: Automated tests at every level (unit, integration, end-to-end) catch issues early. The time invested in writing tests paid off with more reliable deployments.
Environment Separation: Keeping development and production environments separate allowed me to test changes safely before affecting the live site.
Infrastructure as Code Works: Using Terraform to define all infrastructure components made the CI/CD process much more reliable. Everything is tracked, versioned, and repeatable.

My Integration Challenges and Solutions 🧩

During implementation, I encountered several challenges:

CORS Issues: The API and website needed proper CORS configuration to work together. Adding the correct headers in both Lambda and API Gateway fixed this.
Environment Variables: Managing different configurations for dev and prod was tricky. I solved this by using GitHub environment variables and separate Terraform workspaces.
Cache Invalidation Delays: Changes to the website sometimes weren't visible immediately due to CloudFront caching. Adding proper cache invalidation to the workflow fixed this.
State Locking: When multiple workflow runs executed simultaneously, they occasionally conflicted on Terraform state. Using DynamoDB for state locking resolved this issue.

DevOps Mod: Multi-Stage Pipeline with Pull Request Environments 🚀

To extend this challenge further, I implemented a feature that creates temporary preview environments for pull requests:

  create_preview:
    name: 'Create Preview Environment'
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-role
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.2.0

      - name: Generate Unique Environment Name
        run: |
          PR_NUMBER=${{ github.event.pull_request.number }}
          BRANCH_NAME=$(echo ${{ github.head_ref }} | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
          ENV_NAME="pr-${PR_NUMBER}-${BRANCH_NAME}"
          echo "ENV_NAME=${ENV_NAME}" >> $GITHUB_ENV

      - name: Terraform Init
        working-directory: ./terraform
        run: terraform init -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" -backend-config="key=preview/${{ env.ENV_NAME }}/terraform.tfstate" -backend-config="region=us-east-1"

      - name: Terraform Apply
        working-directory: ./terraform
        run: |
          terraform apply -auto-approve \
            -var="environment=${{ env.ENV_NAME }}" \
            -var="domain_name=pr-${{ github.event.pull_request.number }}.${{ secrets.DOMAIN_NAME }}"

      - name: Comment Preview URL
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const output = `## 🚀 Preview Environment Deployed

            Preview URL: https://pr-${{ github.event.pull_request.number }}.${{ secrets.DOMAIN_NAME }}

            API Endpoint: https://api-pr-${{ github.event.pull_request.number }}.${{ secrets.DOMAIN_NAME }}/count

            This environment will be automatically deleted when the PR is closed.`;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })

And add a cleanup job to delete the preview environment when the PR is closed:

  cleanup_preview:
    name: 'Cleanup Preview Environment'
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request' && github.event.action == 'closed'

    steps:
      # Similar to create_preview but with terraform destroy

Security Mod: Implementing AWS Secrets Manager for API Keys 🔐

To enhance the security of our API, I added API key authentication using AWS Secrets Manager:

# Create a secret to store the API key
resource "aws_secretsmanager_secret" "api_key" {
  name        = "resume-api-key-${var.environment}"
  description = "API key for the Resume API"
}

# Generate a random API key
resource "random_password" "api_key" {
  length  = 32
  special = false
}

# Store the API key in Secrets Manager
resource "aws_secretsmanager_secret_version" "api_key" {
  secret_id     = aws_secretsmanager_secret.api_key.id
  secret_string = random_password.api_key.result
}

# Add API key to API Gateway
resource "aws_api_gateway_api_key" "visitor_counter" {
  name = "visitor-counter-key-${var.environment}"
}

resource "aws_api_gateway_usage_plan" "visitor_counter" {
  name = "visitor-counter-usage-plan-${var.environment}"

  api_stages {
    api_id = aws_api_gateway_rest_api.visitor_counter.id
    stage  = aws_api_gateway_deployment.visitor_counter.stage_name
  }

  quota_settings {
    limit  = 1000
    period = "DAY"
  }

  throttle_settings {
    burst_limit = 10
    rate_limit  = 5
  }
}

resource "aws_api_gateway_usage_plan_key" "visitor_counter" {
  key_id        = aws_api_gateway_api_key.visitor_counter.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.visitor_counter.id
}

# Update the Lambda function to verify the API key
resource "aws_lambda_function" "visitor_counter" {
  # ... existing configuration ...

  environment {
    variables = {
      DYNAMODB_TABLE = aws_dynamodb_table.visitor_counter.name
      ALLOWED_ORIGIN = var.website_domain
      API_KEY_SECRET = aws_secretsmanager_secret.api_key.name
    }
  }
}

Then, modify the Lambda function to retrieve and validate the API key:

import boto3
import json
import os

# Initialize Secrets Manager client
secretsmanager = boto3.client('secretsmanager')

def get_api_key():
    """Retrieve the API key from Secrets Manager"""
    secret_name = os.environ['API_KEY_SECRET']
    response = secretsmanager.get_secret_value(SecretId=secret_name)
    return response['SecretString']

def lambda_handler(event, context):
    # Verify API key
    api_key = event.get('headers', {}).get('x-api-key')
    expected_api_key = get_api_key()

    if api_key != expected_api_key:
        return {
            'statusCode': 403,
            'headers': {
                'Content-Type': 'application/json'
            },
            'body': json.dumps({
                'error': 'Forbidden',
                'message': 'Invalid API key'
            })
        }

    # Rest of the function...

Next Steps ⏭️

With our CI/CD pipelines in place, our Cloud Resume Challenge implementation is complete! In the final post, we'll reflect on the project as a whole, discuss lessons learned, and explore potential future enhancements.

Up Next: [Cloud Resume Challenge with Terraform: Final Thoughts & Lessons Learned] 🔗

Share on Share on

March 17, 2025
in AWS, Terraform, DevOps, Cloud Resume Challenge
5 min read

Cloud Resume Challenge with Terraform: Introduction & Setup 🚀

Introduction 🌍

The Cloud Resume Challenge is a hands-on project designed to build a real-world cloud application while showcasing your skills in AWS, serverless architecture, and automation. Many implementations of this challenge use AWS SAM or manual setup via the AWS console, but in this series, I will demonstrate how to build the entire infrastructure using Terraform. 💡

My Journey to Terraform 🧰

When I first discovered the Cloud Resume Challenge, I was immediately intrigued by the hands-on approach to learning cloud technologies. Having some experience with traditional IT but wanting to transition to a more cloud-focused role, I saw this challenge as the perfect opportunity to showcase my skills.

I chose Terraform over AWS SAM or CloudFormation because:

Multi-cloud flexibility - While this challenge focuses on AWS, Terraform skills transfer to Azure, GCP, and other providers
Declarative approach - I find the HCL syntax more intuitive than YAML for defining infrastructure
Industry adoption - In my research, I found that Terraform was highly sought after in job postings
Strong community - The extensive module registry and community support made learning easier

This series reflects my personal journey through the challenge, including the obstacles I overcame and the lessons I learned along the way.

Why Terraform? 🛠️

Terraform allows for Infrastructure as Code (IaC), which:

Automates resource provisioning 🤖
Ensures consistency across environments ✅
Improves security by managing configurations centrally 🔒
Enables version control for infrastructure changes 📝

This series assumes basic knowledge of Terraform and will focus on highlighting key Terraform code snippets rather than full configuration files.

Project Overview 🏗️

Let's visualize the architecture we'll be building throughout this series:

Basic Project Diagram

AWS Services Used ☁️

The project consists of the following AWS components:

Frontend: Static website hosted on S3 and delivered via CloudFront.
Backend API: API Gateway, Lambda, and DynamoDB to track visitor counts.
Security: IAM roles, API Gateway security, and AWS Certificate Manager (ACM) for HTTPS 🔐.
Automation: CI/CD with GitHub Actions to deploy infrastructure and update website content ⚡.

Terraform Module Breakdown 🧩

To keep the infrastructure modular and maintainable, we will define Terraform modules for each major component:

S3 Module 📂: Manages the static website hosting.
CloudFront Module 🌍: Ensures fast delivery and HTTPS encryption.
Route 53 Module 📡: Handles DNS configuration.
DynamoDB Module 📊: Stores visitor count data.
Lambda Module 🏗️: Defines the backend API logic.
API Gateway Module 🔗: Exposes the Lambda function via a REST API.
ACM Module 🔒: Provides SSL/TLS certificates for secure communication.

Setting Up Terraform ⚙️

Before deploying any resources, we need to set up Terraform and backend state management to store infrastructure changes securely.

1. Install Terraform & AWS CLI 🖥️

Ensure you have the necessary tools installed:

# Install Terraform
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

# Install AWS CLI
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

2. Configure AWS Credentials Securely 🔑

Terraform interacts with AWS via credentials. Setting these up securely is crucial to avoid exposing sensitive information.

Setting up AWS Account Structure

Following cloud security best practices, I recommend creating a proper AWS account structure:

Create a management AWS account for your organization
Enable Multi-Factor Authentication (MFA) on the root account
Create separate AWS accounts for development and production environments
Set up AWS IAM Identity Center (formerly SSO) for secure access

If you're just getting started, you can begin with a simpler setup:

# Configure AWS CLI with a dedicated IAM user (not root account)
aws configure

# Test your configuration
aws sts get-caller-identity

Set up IAM permissions for Terraform by ensuring your IAM user has the necessary policies for provisioning resources. Start with a least privilege approach and add permissions as needed.

3. Set Up Remote Backend for Terraform State 🏢

Using a remote backend (such as an S3 bucket) prevents local state loss and enables collaboration.

Project Directory Structure

Here's how I've organized my Terraform project:

cloud-resume-challenge/
├── modules/
│   ├── frontend/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── backend/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── networking/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── environments/
│   ├── dev/
│   │   └── main.tf
│   └── prod/
│       └── main.tf
├── terraform.tf (backend config)
├── variables.tf
├── outputs.tf
└── main.tf

Define the backend in `terraform.tf`

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "cloud-resume/state.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock"
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.16"
    }
  }

  required_version = ">= 1.2.0"
}

Create S3 Bucket and DynamoDB Table for Backend

Before you can use an S3 backend, you need to create the bucket and DynamoDB table. I prefer to do this via Terraform as well, using a separate configuration:

# backend-setup/main.tf
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state-bucket"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Run these commands to set up your backend:

cd backend-setup
terraform init
terraform apply
cd ..
terraform init  # Initialize with the S3 backend

A Note on Security 🔒

Throughout this series, I'll be emphasizing security best practices. Some key principles to keep in mind:

Never commit AWS credentials to your repository
Use IAM roles with least privilege for all resources
Enable encryption for sensitive data
Implement proper security groups and network ACLs
Regularly rotate credentials and keys

These principles will be applied to our infrastructure as we build it in the upcoming posts.

Lessons Learned 💡

In my initial attempts at setting up the Terraform environment, I encountered several challenges:

State file management: I initially stored state locally, which caused problems when working from different computers. Switching to S3 backend solved this issue.
Module organization: I tried several directory structures before settling on the current one. Organizing by component type rather than AWS service made the most sense for this project.
Version constraints: Not specifying version constraints for providers led to unexpected behavior when Terraform updated. Always specify your provider versions!

Next Steps ⏭️

In the next post, we'll build the static website infrastructure with S3, CloudFront, Route 53, and ACM. We'll create Terraform modules for each component and deploy them together to host our resume.

Developer Mod: Advanced Terraform Techniques 🚀

If you're familiar with Terraform and want to take this challenge further, consider implementing these enhancements:

Terraform Cloud Integration: Connect your repository to Terraform Cloud for enhanced collaboration and run history.
Terratest: Add infrastructure tests using the Terratest framework to validate your configurations.
Custom Terraform Modules: Create reusable modules and publish them to the Terraform Registry.
Terraform Workspaces: Use workspaces to manage multiple environments (dev, staging, prod) within the same Terraform configuration.

Up Next: [Cloud Resume Challenge with Terraform: Deploying the Static Website] 🔗

Share on Share on

March 9, 2025
in DevOps, Microsoft Certifications, AI & Study Techniques
16 min read

How I Used ChatGPT to Create AZ-400 Exam Prep Notes from MSLearn

🚀 TL;DR - Results First

Using the method detailed in this post, I successfully passed the AZ-400 exam while creating a reusable study system. This approach helped me transform 34+ hours of MSLearn content into structured, searchable revision notes that I could quickly reference during my exam preparation.

Let me walk you through how I developed this system and how you can apply it to your own certification journey.

The Challenge

Studying for Microsoft certification exams like AZ-400 can be overwhelming due to the vast amount of content available. Microsoft Learn alone provides over 34 hours of recommended reading, making it difficult to retain everything effectively.

To tackle this challenge, I developed a structured method using MSLearn, third-party exam questions, and ChatGPT to create a comprehensive revision guide. This method helped me organize knowledge into concise notes, cheat sheets, glossaries, and knowledge checks, ultimately leading to a successful exam pass!

This guide documents my step-by-step process so that you can replicate or adapt it for your own Microsoft exam preparation.

🏆 Goals of This Study Approach

My main objectives were:

📌 Summarize each MSLearn module into easily digestible revision notes.
📌 Create a structured, searchable reference for later review.
📌 Store my notes in GitHub using Markdown for easy access.
📌 Use AI (ChatGPT) to extract and summarize key information.
📌 Supplement with third-party practice exams to test my knowledge.

This method is not a quick win but provides an efficient, structured, and reusable way to prepare for any MSLearn-based exam.

🔹 MSLearn Collections Used

To ensure comprehensive coverage of the exam syllabus, I structured my studies around the official Microsoft Learn learning paths. Each path covers a key topic required for AZ-400 certification, including DevOps principles, CI/CD, infrastructure as code, and security best practices. I systematically worked through these collections, summarizing important concepts, capturing key insights, and using ChatGPT to refine the content into structured notes.

Below are the learning paths I followed, each linking directly to its respective Microsoft Learn module:

AZ-400 Development for Enterprise DevOps – Covers Git workflows, branch strategies, and repository management.
AZ-400 Implement CI with Azure Pipelines and GitHub Actions – Focuses on automating build and release processes with Azure Pipelines and GitHub Actions.
AZ-400 Design and Implement a Release Strategy – Covers feature flagging, blue-green deployments, and release management.
AZ-400 Implement Secure Continuous Deployment – Discusses security measures for CI/CD pipelines.
AZ-400 Manage Infrastructure as Code Using Azure – Focuses on managing cloud resources with Terraform, Bicep, and ARM templates.
AZ-400 Design and Implement a Dependency Management Strategy – Covers artifact management and dependency security.
AZ-400 Implement Continuous Feedback – Discusses monitoring and feedback loops for DevOps processes.
AZ-400 Implement Security and Validate Code Bases for Compliance – Focuses on DevSecOps principles and compliance validation.

These resources formed the foundation of my study plan, ensuring alignment with the official exam objectives. I used these collections as the basis for my revision notes, AI-generated summaries, and knowledge checks.

📊 Process Overview

Before diving into the detailed steps, here's an overview of the complete workflow:

MSLearn Content → Link Collection → ChatGPT Summarization → GitHub Storage → Practice Testing → Final Review

Estimated time investment per module:

Manual link collection: ~15 minutes
AI summarization and refinement: ~30-60 minutes
Review and validation: ~30 minutes
Total per module: ~1-1.75 hours (compared to 3-4 hours of traditional study)

These estimates are based on my experience after processing several modules. As you'll see in the learning curve section below, your first few modules might take longer as you refine your workflow.

Let's dive into each step of the process in detail.

📌 Step 1: Collecting and Organizing MSLearn Content

MSLearn provides structured learning paths, but I needed a way to track important links and content. Here's how I manually compiled everything:

🔹 Manual Link Collection Process

Initial Setup: I created a dedicated folder structure on my computer with sub-folders for each learning path, mirroring the eventual GitHub repository structure.
After each lesson: Captured all relevant hyperlinks and stored them in a .txt file within the appropriate folder. This was as simple as copy-pasting links while reading.
At the end of each module: Consolidated all links into the text file and organized them by topic.
Mapped content to official exam objectives: Fed the exam study guide into ChatGPT to check alignment, ensuring I wasn't missing critical areas.

📸

Before & After Example

Raw MSLearn Content:

Unit Title: Implement branch policies in Azure Repos
- Configure branch policies
- Implement pull request approval processes
- Manage branch policy bypasses
- Configure auto-complete pull requests
- Configure branch policy permissions
- Configure build validation

Transformed into Structured Notes:

## Branch Policies in Azure Repos

Branch policies help teams protect important branches by:
- Requiring code reviews before merging
- Setting minimum number of reviewers (typically 2+)
- Enforcing build validation to prevent broken code
- Restricting direct pushes to protected branches

### Key Configuration Options:
| Policy | Purpose | Real-world Usage |
|--------|---------|------------------|
| Minimum reviewers | Ensures code quality | Set to 2+ for production code |
| Build validation | Prevents broken builds | Configure with main CI pipeline |
| Comment resolution | Tracks issue fixes | Require resolution before merge |

Lesson Learned: Consistent link collection during the learning process is much more efficient than trying to gather everything after completing a module. I developed a habit of copying links as I encountered them, which saved significant time later.

💡 Future Improvement: Automating this process via a script could save time. A PowerShell or Python script could potentially scrape MSLearn modules for relevant links.

📌 Step 2: Using AI (ChatGPT) to Summarize Content

To turn raw MSLearn material into usable study notes, I fed collected links into ChatGPT and asked it to scrape and summarize key points.

I used ChatGPT 4 for this process, as it provided better context handling and more accurate summaries than earlier versions.

The summarization workflow consisted of the following steps:

1️⃣ Collected MSLearn Links – Compiled all module-related links into a text file. 2️⃣ Fed the Links into ChatGPT – Asked ChatGPT to analyze and summarize key information. 3️⃣ Refined the Output Iteratively – Adjusted prompts to enhance clarity and completeness.

🔹 Crafting Effective ChatGPT Prompts

Well-structured prompts were essential for generating clear and accurate summaries. Below is an example of my initial prompt:

prompt - ChatGPT first iteration

Please create a .md file in the same format as the previous ones and include the following:

Summarize key information within each unit, including diagrams, tables, and exercises and labs.
List steps performed and order of steps/workflow, where applicable.
Use tables primarily for comparing differences between items.
Include:
Key exam points.
Points to remember.
Prerequisite information.
Include any service limits - maximum minutes per month for a particular tier, difference services available in varying tiers/services/SKUs for example
Permissions required for activities.
Provide real-world applications, troubleshooting scenarios, and advanced tips.
Highlight common pitfalls or mistakes to avoid.
Review the canvas and add/remove any relevant information.
Use the web to search for supplementary material where necessary, and summarize this information within the notes.
Avoid external links—include all relevant information directly in the notes.
Ensure all "Learning objectives" in Unit 1 are met by the material included in the output .md file(s)
Ensure no content is included that doesn't have a real-world example or exam related reference included
Review the output you have created at the end, and make any further improvements automatically be manually revising the file or implementing your comments.

Here is a list of the links contained in this module.
Using the parameters outlined above create a comprehensive exam cram resource cheat sheet, that can be used for my AZ-400 exam prep.
The resulting output needs contain material relevant to the AZ-400 study guide:

https://learn.microsoft.com/en-us/credentials/certifications/resources/study-guides/az-400

Let me know when you are ready for the module links?

While this prompt worked initially, I found it had several limitations:

It was too lengthy and complex for ChatGPT to consistently follow all instructions
It lacked specific formatting guidance
It didn't emphasize accuracy and relevance enough

🔹 Prompt Evolution

I refined the prompt through several iterations to address these issues. The key improvements included:

Adding a clear word count limit (750 words per file) to prevent overly lengthy outputs
Explicitly prohibiting generic statements like "understand X" and "know how to..."
Requiring concrete examples for each concept rather than theoretical explanations
Creating a more structured format with clear section headings and organization
Adding a final validation step to ensure content aligned with learning objectives

Initial Results vs. Refined Results

First Iteration Output (Excerpt):

## Key Exam Points
- Understand the different branch policies in Azure Repos
- Know how to configure pull request approvals
- Understand branch policy permissions

Problems:

Too generic with "understand" and "know how" statements
Lacks specific examples and actionable information
No clear formatting structure

Refined Output (After Improved Prompt):

## Branch Policies in Azure DevOps

### Key Exam Points
- Branch policies in Azure Repos protect branches by enforcing code review and build validation
- Required reviewers policy must be configured with minimum count (2+ recommended for production)
- Build validation policy links CI pipeline to PR process, ensuring code builds successfully
- Policy bypasses can be granted to specific users or groups (Project Administrators have bypass by default)
- Branch policies are set at repository level under Branches → [...] → Branch Policies

### Common Scenarios
When setting up branch policies for a large team:
1. Configure minimum 2 reviewers for main branch
2. Enable "Comment resolution" to require addressing feedback
3. Link build validation to prevent broken builds
4. Set reset votes when new changes are pushed

🔹 Challenges & Solutions

Using AI to generate structured content wasn't always seamless. Here are some key challenges and how I addressed them:

Challenge	Solution	Example
ChatGPT lost context in long sessions	Processed module-by-module instead of bulk inputs	Split "Azure Pipelines" module into 3 separate prompts
Overwrote useful content in iterations	Manually saved outputs before requesting refinements	Created checkpoint files labeled v1, v2, etc.
Large data inputs led to incomplete summaries	Used multiple iterations, focusing on key areas of each module	First pass: core concepts; Second pass: examples and scenarios
Hallucinations on technical details	Cross-validated against official documentation	Corrected service limits and permission details
Generic "understand X" statements	Explicitly requested specific actionable information	Replaced "Understand CI/CD" with actual pipeline YAML examples

Breaking down content into smaller chunks and applying manual validation helped ensure better results.

Learning Curve: My first module took nearly 2 hours to process completely, as I was still figuring out the optimal prompt structure and workflow. By my fifth module, I had reduced this to about 45 minutes through improved prompting and a more streamlined approach.

🔹 Refining the Output

To improve content accuracy, I introduced an additional review prompt:

ChatGPT prompt - second iteration

Objective:
Create a .md file that acts as a comprehensive AZ-400 exam cram resource cheat sheet.

Instructions:
Act as my Azure DevOps training expert with a focus on preparing me for the AZ-400 exam.

The output must adhere to the structure and content requirements outlined below:

Content Requirements:
Each file should contain no more than 750 words (excluding text that make up hyperlinks)

Summarize Key Information:
Include summaries for each unit, diagram, table, exercise, and lab where applicable.
Use clear and concise explanations.

List Steps/Workflows:
Summarize steps performed in labs/exercises and the order of steps/workflows where applicable.

Use Tables:
Create tables primarily for comparing differences between items (examples, but not limited to - features, tiers, SKUs etc ).

Key Exam Points: Highlight crucial information likely to appear in the exam and provide actual examples. 
Do not use generic phrases like "Understand...." and  "Know how to....". 
I need you to provide the information I need to know for each exam tip.

Points to Remember: Provide concise, high-priority notes for studying.

Prerequisite Information: Mention anything needed to understand or implement concepts.

Service Limits: Include tier limitations (e.g., maximum minutes per month), service availability by SKU, etc.

Permissions Required: Specify roles/permissions necessary for activities.

Practical Applications:
Provide real-world applications, troubleshooting scenarios, and advanced tips.
Highlight common pitfalls or mistakes to avoid.

Relevance:
Ensure the output aligns with the Microsoft AZ-400 study guide 
(https://learn.microsoft.com/en-us/credentials/certifications/resources/study-guides/az-400)
Exclude any content that lacks real-world examples or exam-related references.

Final Review:
Evaluate the output to ensure all "Learning Objectives" in Unit 1 are met.
Automatically revise the file manually if needed to enhance clarity and completeness.


Prompt me for a list of URL's or an existing .md file when you have understood the instructions.

Depending on the results, I would often break the prompt up further, and just use a specific part. For example, once I was happy with the results of a certain output I would re-enter the "Final Review: Evaluate the output to ensure all "Learning Objectives" in Unit 1 are met." Automatically revise the file manually if needed to enhance clarity and completeness." prompt, once or maybe several times until was happy with the finished output.

A typical module would go through 2-3 iterations:

Initial generation - Creates the basic structure and content
Content enhancement - Adds real-world examples and specifics
Final validation - Checks against learning objectives and improves clarity

For complex topics like Azure Pipelines, I might need 4-5 iterations to fully refine the content.

Workflow Integration

I integrated this process into my daily study routine by:

Reading a module in the morning
Collecting links as I went
Processing with ChatGPT during lunch break or after work
Reviewing and committing to GitHub in the evening

This approach allowed me to maintain consistent progress without feeling overwhelmed.

Time-Saving Tip

A full module processing cycle typically took about 30-45 minutes, compared to 2-3 hours of traditional study and note-taking. The time investment was front-loaded, but paid dividends during revision.

📌 Step 3: Integrating Third-Party Exam Resources

While MSLearn is great for structured content, real-world practice questions are crucial for exam success. I incorporated:

✅ MSLearn Official Practice Questions
✅ Third-Party Providers: Tutorials Dojo, MeasureUp
✅ Exam Explanation Links (to validate answers)

🔹 How I Validated Third-Party Questions

Checked explanations for links to MSLearn documentation.
Manually cross-referenced answers with official study material.
Used my GitHub repo's search function to quickly verify concepts.

🔹 Identifying Valuable Practice Questions

Not all practice questions are created equal. I prioritized questions that:

Included detailed explanations with documentation links
Covered scenarios rather than simple definition recall
Tested practical knowledge rather than obscure facts
Referenced multiple concepts in a single question

📸

🔹 Supplementary Resources Worth The Investment

Based on my experience, these additional resources provided the best value:

Tutorials Dojo Practice Exams - Excellent explanations and documentation links
MeasureUp Official Practice Tests - Most similar to actual exam format
WhizLabs Labs - Hands-on practice for key scenarios

The combination of AI-summarized MSLearn content and targeted practice questions created a comprehensive exam preparation strategy.

Real-World Application Example: During a practice exam, I encountered a question about configuring branch policies with required reviewers. Using my GitHub repository's search function, I quickly found the related notes I had created, which included the exact setting location and recommended configuration values. This allowed me to answer correctly and understand the underlying concept, rather than just memorizing an answer.

📌 Step 4: Storing Notes in GitHub for Easy Reference

One key advantage of this method was storing all notes in a GitHub repository, allowing easy searchability.

🔹 Initial GitHub Repository Setup

Created a new GitHub repository specifically for my AZ-400 exam notes
Established a folder structure that mirrored the MSLearn learning paths
Set up a README with quick navigation links to major sections
Created a consistent file naming convention (numbered by sequence in the learning path)

🔹 How I Structured My Notes

(VS Code folder structure represented in Markdown)

📂 AZ-400 - MS LEARN - EXAM NOTES/
 ├── 📁 .github/
 │
 ├── 📁 1. AZ-400 Development for Enterprise DevOps/
 │    ├── 1. Introduction to DevOps.md
 │    ├── 2. Plan Agile with GitHub Projects and Azure Boards.md
 │    ├── 3. Design and implement branch strategies and workflows.md
 │    ├── 4. Collaborate with pull requests in Azure Repos.md
 │    ├── 5. Explore Git hooks.md
 │    ├── 6. Plan foster inner source.md
 │    ├── 7. Manage and configure repositories.md
 │    ├── 8. Identify technical debt.md
 │
 ├── 📁 2. AZ-400 Implement CI with Azure Pipelines and GitHub Actions/
 │    ├── 1. Explore Azure Pipelines.md
 │    ├── 2. Manage Azure Pipeline agents and pools.md
 │    ├── 3. Describe pipelines and concurrency.md
 │    ├── 4. Design and implement a pipeline strategy.md
 │    ├── 5. Integrate with Azure Pipelines.md
 │    ├── 6. Introduction to GitHub Actions.md
 │    ├── 7. Learn continuous integration with GitHub Actions.md
 │    ├── 8. Design Container Build Strategy.md
 ├── 📁 3. AZ-400 Design and Implement a Release Strategy/
 ├── ...
 ├── 📁 4. AZ-400 Implement a Secure Continuous Deployment using Azure Pipelines/
 ├── ...
 ├── 📁 5. AZ-400 Manage Infrastructure as Code using Azure and DSC/
 ├── ...
 ├── 📁 6. AZ-400 Design and Implement a Dependency Management Strategy/
 ├── ...
 ├── 📁 7. AZ-400 Implement Continuous Feedback
 ├── ...
 ├── 📁 8. AZ-400 Implement Security and Validate Code Bases for Compliance/
 ├── ...
 ├── 📁 AZ-400 Tutorials Dojo - Exam Notes/
 │    ├── Azure DevOps - Terms and Definitions - Pt 1.md
 │    ├── Azure DevOps - Terms and Definitions - Pt 2.md
 │    ├── Azure DevOps - Terms and Definitions - Pt 3.md

📸

🔹 Workflow for Adding New Notes

Create new markdown file in the appropriate folder
Paste ChatGPT-generated content and review for accuracy
Commit changes with descriptive commit messages (e.g., "Add Azure Pipelines agents notes")
Push to GitHub to make available for searching during study sessions

🔍 Searching Notes Efficiently

Used GitHub's search function to quickly find terms (e.g., MendBolt).
This allowed me to cross-check answers during study sessions.

Example Search Workflow

Practice question mentions "Which Azure DevOps deployment strategy minimizes downtime during releases?"
One of the answers mnetions "Blue/Green" deployment
Search repository for "Blue/Green"
Results show multiple matching files.
Quickly identify that "Blue/Green deployment" is the correct answer based on my notes.
Verify with documentation reference that Blue/Green deployments maintain two identical production environments, allowing for instant switching between versions.

📸

🔹 Real-World Performance Impact

During practice exams, I could typically locate key information in under 30 seconds using this method, compared to several minutes when using traditional notes or searching documentation directly.

📌 Results & Key Takeaways

🎯 Outcome:

✅ Passed the AZ-400 exam! 🎉
✅ Created a structured, reusable study guide.
✅ Used AI efficiently to save time and condense information.
✅ Reduced overall study time by approximately 40% compared to traditional methods.

🔹 Time Investment Analysis

Activity	Traditional Approach	AI-Enhanced Approach	Time Savings
Reading MSLearn	34+ hours	8-10 hours	~70%
Note-taking	10-15 hours	5-7 hours	~50%
Organization	3-5 hours	2-3 hours	~40%
Practice & Review	15-20 hours	15-20 hours	0%
Total	62-74 hours	30-40 hours	~45%

These figures are based on my own experience and tracking of study time. Your results may vary depending on your familiarity with the subject matter and the tools involved. The key insight is that the most significant time savings came from condensing the initial reading phase while maintaining or even improving knowledge retention through structured notes.

🔹 What I'd Do Differently Next Time

🔹 Further break down the ChatGPT input process into smaller steps
🔹 Explore alternative AI tools like Claude or Bard to compare summary quality
🔹 Consider automating link collection from MSLearn using a simple web scraper
🔹 Create a standardized template for each module type from the beginning
🔹 Add more visual elements like diagrams to represent complex relationships

✅ Yes—but only if you're prepared for an iterative, hands-on study process.

The greatest benefits were:

Structured organization of complex information
Quick searchability during practice tests
Forced engagement with the material (rather than passive reading)
Creation of a reusable resource for future reference

🚀 How to Get Started Yourself

Initial Setup Steps

Before diving into the content summarization, take these important setup steps:

Create your repository structure:
Set up a GitHub repository with appropriate folder structure
Create a README.md with navigation links
Establish a consistent file naming convention
Set up a local workflow:
Create a folder structure on your computer mirroring your GitHub repo
Establish a system for collecting and storing links as you study
Set up a template for your ChatGPT prompts
Gather your resources:
Bookmark all relevant MSLearn collections
Organize them according to the exam objectives
Create a schedule for working through them systematically

Quick Start Template

Here's a simplified prompt template to get you started:

I'm studying for the [EXAM CODE] certification. Please help me create concise, exam-focused notes for the following module: [MODULE NAME]

For each key concept, please:
1. Explain it in 1-2 sentences
2. Provide a real-world example or scenario
3. Note any configuration options or limitations
4. Mention if it's likely to appear on the exam

Please format your response in Markdown with clear headings and avoid generic "understand X" statements.

Here are the links to the module content:
[PASTE LINKS HERE]

Complete Process

1️⃣ Set up a GitHub repo for your notes.
2️⃣ Manually collect MSLearn hyperlinks as you study.
3️⃣ Use ChatGPT to summarize module-by-module.
4️⃣ Validate third-party questions with official docs.
5️⃣ Store and search your notes in GitHub for quick reference.

❓ Frequently Asked Questions

Q: Is this approach considered cheating? A: No. This method enhances learning by actively engaging with the material rather than replacing study. You're creating custom notes by directing AI to extract and organize information you need to know.

Q: How much technical knowledge do I need to implement this approach? A: Basic GitHub knowledge and familiarity with markdown formatting are helpful but not essential. The core process can be adapted to use any note-taking system.

Q: Does this work for all Microsoft certification exams? A: Yes, this approach works well for any exam with structured MSLearn paths.

Q: How do you handle inaccurate information from AI? A: Always verify key technical details against official documentation. When in doubt, trust Microsoft's documentation over AI-generated content.

Q: How long did it take you to become proficient with this workflow? A: After completing about 3-4 modules, I had established an efficient workflow. The learning curve is relatively quick if you're already familiar with GitHub and ChatGPT.

💬 Final Thoughts

This method made my exam prep structured and efficient, though it required significant manual effort. If you're preparing for a Microsoft certification, consider trying this approach!

The combination of AI-powered summarization, structured GitHub storage, and focused practice testing created a powerful study system that both saved time and improved retention.

The most valuable aspect wasn't just passing the exam, but creating a reusable knowledge base that continues to serve as a reference in my professional work. While traditional study methods might help you pass an exam, this approach helps build a lasting resource.

💡 Have you used AI tools for exam prep? Share your thoughts in the comments!

🔗 Additional Resources

Share on Share on