AI, Claude Code, Terraform, AWS, DevOps

I Let AI Write My Terraform

What Happened When Claude Code Met Production Infrastructure

Luke Halley

Cloud Developer

January 8, 2026
7 min read

"AI can't write production infrastructure code."

That's what I told myself for months. I'd seen the demos—impressive but cherry-picked. Real infrastructure is messy. Edge cases everywhere. Security implications in every line.

Then I actually tried Claude Code on a Terraform module I needed to build. Not a toy example. A production ECS Fargate service with ALB, autoscaling, secrets management, and proper IAM.

Here's what happened.

The Task

I needed a reusable Terraform module for deploying containerized services to ECS Fargate. Requirements:

  • Application Load Balancer with HTTPS
  • ECS service with task definitions
  • Autoscaling based on CPU/memory
  • Secrets from AWS Secrets Manager
  • Proper IAM roles (task role vs execution role)
  • CloudWatch logs and alarms
  • VPC integration with security groups

Normally, this takes me 2-3 hours to write properly. Longer if I'm being careful about security.

    The Experiment

    I opened Claude Code in my terminal and described what I needed:

    code
    Create a Terraform module for ECS Fargate services. It should create an ALB, ECS service, task definition, autoscaling, and IAM roles. I need to pass secrets from Secrets Manager to containers. Use security groups that only allow necessary traffic.

    What followed was a 45-minute session of iteration. Here's what I learned.

    What Claude Code Got Right

    1. Module Structure

    The generated module structure was textbook:

    code
    modules/ecs-service/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    ├── iam.tf
    ├── alb.tf
    ├── ecs.tf
    ├── autoscaling.tf
    └── security-groups.tf

    Clean separation. Logical file names. This matched how I'd structure it myself.
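To give a sense of what a file like autoscaling.tf might hold, here is a minimal sketch of CPU-based target tracking for an ECS service. It is not the generated code: `var.cluster_name`, `aws_ecs_service.this`, `var.min_capacity`, `var.max_capacity`, and `var.cpu_target_percent` are hypothetical names assumed for illustration.

```hcl
# Registers the ECS service's desired count as a scalable target.
# Assumed names: var.cluster_name, aws_ecs_service.this, var.min_capacity,
# var.max_capacity (none of these appear in the original module excerpts).
resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${var.cluster_name}/${aws_ecs_service.this.name}"
  min_capacity       = var.min_capacity
  max_capacity       = var.max_capacity
}

# Target tracking: add or remove tasks to hold average CPU near the target.
resource "aws_appautoscaling_policy" "cpu" {
  name               = "${var.name}-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  resource_id        = aws_appautoscaling_target.ecs.resource_id

  target_tracking_scaling_policy_configuration {
    target_value = var.cpu_target_percent # assumed variable, e.g. 70

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```

A memory-based policy would look the same with `ECSServiceAverageMemoryUtilization` as the predefined metric.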

    2. IAM Role Separation

    This is where many tutorials get it wrong. Claude correctly separated:

    hcl
    # Execution role - for ECS agent to pull images, write logs
    resource "aws_iam_role" "execution" {
      name = "${var.name}-execution"

      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Action    = "sts:AssumeRole"
          Effect    = "Allow"
          Principal = { Service = "ecs-tasks.amazonaws.com" }
        }]
      })
    }

    # Task role - for the application to access AWS services
    resource "aws_iam_role" "task" {
      name = "${var.name}-task"

      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Action    = "sts:AssumeRole"
          Effect    = "Allow"
          Principal = { Service = "ecs-tasks.amazonaws.com" }
        }]
      })
    }

    It even added the correct managed policy for the execution role (AmazonECSTaskExecutionRolePolicy) and scoped Secrets Manager access to specific secret ARNs.
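The post doesn't show that part of the output, so the following is only a sketch of what such an attachment and scoped policy could look like. The resource labels are hypothetical, and it assumes `var.secrets` is a list of objects with an `arn` attribute, as in the secrets example later in the post.

```hcl
# Attach the AWS-managed policy that lets ECS pull images and write logs.
resource "aws_iam_role_policy_attachment" "execution_managed" {
  role       = aws_iam_role.execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Allow reading only the secrets passed to the module, not every secret
# in the account. Skipped entirely when no secrets are configured, since
# an IAM statement with an empty Resource list is invalid.
resource "aws_iam_role_policy" "execution_secrets" {
  count = length(var.secrets) > 0 ? 1 : 0

  name = "${var.name}-read-secrets"
  role = aws_iam_role.execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [for secret in var.secrets : secret.arn]
    }]
  })
}
```

The policy goes on the execution role rather than the task role because ECS resolves `secrets` entries before the container starts.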

    3. Security Group Logic

    The security groups followed least-privilege:

    hcl
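    # The groups are declared without inline rules. Rules that reference the
    # other group live in standalone resources, because two groups that
    # reference each other inline form a dependency cycle in Terraform.
    resource "aws_security_group" "alb" {
      name_prefix = "${var.name}-alb-"
      vpc_id      = var.vpc_id
    }

    resource "aws_security_group" "ecs" {
      name_prefix = "${var.name}-ecs-"
      vpc_id      = var.vpc_id
    }

    # ALB security group - allows inbound HTTPS
    resource "aws_security_group_rule" "alb_ingress_https" {
      type              = "ingress"
      from_port         = 443
      to_port           = 443
      protocol          = "tcp"
      cidr_blocks       = var.allowed_cidr_blocks
      security_group_id = aws_security_group.alb.id
    }

    # ALB security group - outbound only to the ECS tasks
    resource "aws_security_group_rule" "alb_egress_to_ecs" {
      type                     = "egress"
      from_port                = var.container_port
      to_port                  = var.container_port
      protocol                 = "tcp"
      source_security_group_id = aws_security_group.ecs.id
      security_group_id        = aws_security_group.alb.id
    }

    # ECS security group - only allows traffic from ALB
    resource "aws_security_group_rule" "ecs_ingress_from_alb" {
      type                     = "ingress"
      from_port                = var.container_port
      to_port                  = var.container_port
      protocol                 = "tcp"
      source_security_group_id = aws_security_group.alb.id
      security_group_id        = aws_security_group.ecs.id
    }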

    ALB can reach ECS. ECS can't be reached directly. Exactly right.
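One thing worth checking in a setup like this is outbound access for the tasks themselves. Terraform removes the default allow-all egress rule from security groups it creates, so the ECS group needs an explicit egress rule before tasks can pull images or fetch secrets. A minimal sketch, assuming outbound HTTPS to anywhere is acceptable (the rule name is hypothetical):

```hcl
# Assumption: tasks reach ECR, Secrets Manager, and CloudWatch Logs over
# HTTPS through a NAT gateway or public IP. With VPC endpoints, this rule
# could be narrowed to the endpoints' security group instead.
resource "aws_security_group_rule" "ecs_egress_https" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.ecs.id
}
```

This only opens outbound traffic; inbound access to the tasks is still limited to the ALB.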

    What Needed Fixing

    1. Hardcoded Values

    First draft had hardcoded values that should be variables:

    hcl
    # Bad - hardcoded
    cpu    = 256
    memory = 512

    # Fixed - variable with sensible defaults
    cpu    = var.cpu
    memory = var.memory

    Easy fix once spotted. I just asked "make cpu and memory configurable" and it updated correctly.
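The matching declarations in variables.tf might look something like the sketch below. The defaults mirror the values that were hardcoded; the descriptions and the validation rule are assumptions added for illustration, not part of the generated module.

```hcl
variable "cpu" {
  description = "Fargate task CPU units (1024 = 1 vCPU)."
  type        = number
  default     = 256

  # Fargate only accepts a fixed set of CPU sizes, so catch typos at plan time.
  validation {
    condition     = contains([256, 512, 1024, 2048, 4096, 8192, 16384], var.cpu)
    error_message = "cpu must be a CPU size supported by Fargate."
  }
}

variable "memory" {
  description = "Fargate task memory in MiB. Must be a size valid for the chosen cpu."
  type        = number
  default     = 512
}
```

The valid memory range depends on the CPU size, which is awkward to express in a single-variable validation, so this sketch leaves that check to the AWS API.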

    2. Missing Health Check Configuration

    The ALB target group health check used defaults. For containers, you often need custom paths and intervals:

    hcl
    health_check {
      enabled             = true
      healthy_threshold   = 2
      unhealthy_threshold = 3
      timeout             = 5
      interval            = 30
      path                = var.health_check_path    # Added
      matcher             = var.health_check_matcher # Added
    }
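The two added variables need declarations as well. A possible sketch, where the defaults are assumptions chosen to match the target group's usual behaviour when nothing is set:

```hcl
variable "health_check_path" {
  description = "HTTP path the ALB target group requests to check container health."
  type        = string
  default     = "/"
}

variable "health_check_matcher" {
  description = "HTTP status codes counted as healthy, for example \"200\" or \"200-299\"."
  type        = string
  default     = "200"
}
```

A service with a dedicated health endpoint would override `health_check_path` when calling the module.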

    3. Log Retention

    The CloudWatch log group was created without a retention policy, so logs would never expire:

    hcl
    # Original - logs forever (expensive)
    resource "aws_cloudwatch_log_group" "this" {
      name = "/ecs/${var.name}"
    }

    # Fixed - retention set by variable (30 days by default)
    resource "aws_cloudwatch_log_group" "this" {
      name              = "/ecs/${var.name}"
      retention_in_days = var.log_retention_days
    }

    This would've cost money in production. Caught it in review.
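A declaration for `log_retention_days` could look like the sketch below. CloudWatch Logs accepts only specific retention periods; the list here is deliberately a subset (up to one year) and is an assumption, so check the provider documentation if you need longer retention.

```hcl
variable "log_retention_days" {
  description = "Days to keep container logs in CloudWatch."
  type        = number
  default     = 30

  # Restricted to a subset of the retention periods CloudWatch Logs accepts.
  validation {
    condition = contains(
      [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365],
      var.log_retention_days
    )
    error_message = "log_retention_days must be one of: 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365."
  }
}
```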

    4. The Secrets Gotcha

    Here's where it got interesting. The initial secrets implementation used:

    hcl
    secrets = [
      for secret in var.secrets : {
        name      = secret.name
        valueFrom = secret.arn
      }
    ]

    This works, but it assumes you want the entire secret value, referenced by its full ARN. In practice, you often want to reference a specific JSON key within a secret:

    hcl
    secrets = [
      for secret in var.secrets : {
        name      = secret.name
        valueFrom = "${secret.arn}:${secret.json_key}::"
      }
    ]

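    The :: suffix is required for JSON key extraction. The full format is secret-arn:json-key:version-stage:version-id, and leaving the last two fields empty selects the current version. Claude didn't know this initially. After I explained the pattern, it updated correctly and even added a comment explaining the syntax.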
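If some secrets are plain strings and others are JSON, both forms can be supported with an optional key. This is a sketch rather than the module's actual code: the `optional()` attribute needs Terraform 1.3 or later, and the `container_secrets` local is a hypothetical name.

```hcl
variable "secrets" {
  description = "Secrets to inject as environment variables."
  type = list(object({
    name     = string           # environment variable name in the container
    arn      = string           # full Secrets Manager secret ARN
    json_key = optional(string) # null means "inject the whole secret"
  }))
  default = []
}

locals {
  # Append ":<json_key>::" only when a key is given. The two trailing
  # empty fields are version-stage and version-id.
  container_secrets = [
    for secret in var.secrets : {
      name      = secret.name
      valueFrom = secret.json_key == null ? secret.arn : "${secret.arn}:${secret.json_key}::"
    }
  ]
}
```

The container definition would then use `secrets = local.container_secrets`.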

    The Bigger Picture

    What AI Does Well

  • Boilerplate generation: Resource blocks, variable definitions, outputs
  • Best practice patterns: Module structure, naming conventions
  • Documentation: Comments explaining why, not just what
  • Iteration speed: Changes in seconds, not minutes

    Where Humans Still Win

  • Edge cases: Production has infinite edge cases
  • Security nuances: That :: suffix matters
  • Organizational context: Your naming conventions, your CIDR ranges, your tagging strategy
  • Integration testing: AI can write code but can't terraform apply

    My New Workflow

    I don't write Terraform from scratch anymore. My workflow:

  1. Describe the module to Claude Code in plain English
  2. Review generated code for security and correctness
  3. Iterate on specific fixes ("add log retention", "make this a variable")
  4. Validate with terraform validate and tflint
  5. Test with terraform plan against real AWS
  6. Apply and monitor

    Steps 1-3 take 30 minutes instead of 2 hours. Steps 4-6 are unchanged.
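Some of the review findings can also be pinned down as checks alongside the validate and plan steps. The sketch below uses Terraform's native test framework, which is not something the workflow above mentions. It assumes Terraform 1.7 or later (for `mock_provider`), and the file name and variable values are hypothetical. The module's other required variables would need values too.

```hcl
# tests/plan.tftest.hcl (hypothetical file name)

# Replace the real AWS provider with a mock so the plan needs no credentials.
mock_provider "aws" {}

variables {
  name               = "example"
  log_retention_days = 30
  # ...plus whatever other inputs the module requires
}

run "log_group_has_retention" {
  command = plan

  # Guards against the log group silently losing its retention setting.
  assert {
    condition     = aws_cloudwatch_log_group.this.retention_in_days == 30
    error_message = "The log group must set a retention period."
  }
}
```

Running `terraform test` from the module directory executes it. A mocked plan complements the plan against real AWS in step 5 but does not replace it.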

    The Trust Question

    Should you trust AI-generated infrastructure code?

    No. And yes.

    No: Don't blindly apply anything AI generates. Review every line. Run security scanners. Test in non-production first.

    Yes: Trust it as a starting point. Trust it to handle boilerplate. Trust it to remember syntax you've forgotten.

    The right mental model: AI is a junior engineer who's read every Terraform tutorial but never managed production. Fast, knowledgeable, needs supervision.

    Try It Yourself

    If you're skeptical (I was), try this experiment:

  • Pick a module you've already written
  • Describe it to Claude Code without showing your code
  • Compare the output to your implementation

    You'll find:

  • 80% is nearly identical
  • 15% is different but valid
  • 5% needs fixing

    That 80% is the time savings. That 5% is why review still matters.


    I went from "AI can't write infrastructure" to "AI saves me hours per week." The key was treating it as a collaborator, not a replacement.