
I Let AI Write My Terraform

What Happened When Claude Code Met Production Infrastructure

Luke Halley

Cloud Developer

January 8, 2026


"AI can't write production infrastructure code."

That's what I told myself for months. I'd seen the demos—impressive but cherry-picked. Real infrastructure is messy. Edge cases everywhere. Security implications in every line.

Then I actually tried Claude Code on a Terraform module I needed to build. Not a toy example. A production ECS Fargate service with ALB, autoscaling, secrets management, and proper IAM.

Here's what happened.

The Task

I needed a reusable Terraform module for deploying containerized services to ECS Fargate. Requirements:

Application Load Balancer with HTTPS

ECS service with task definitions

Autoscaling based on CPU/memory

Secrets from AWS Secrets Manager

Proper IAM roles (task role vs execution role)

CloudWatch logs and alarms

VPC integration with security groups

Normally, this takes me 2-3 hours to write properly. Longer if I'm being careful about security.

The Experiment

I opened Claude Code in my terminal and described what I needed:

code
Create a Terraform module for ECS Fargate services.
It should create an ALB, ECS service, task definition,
autoscaling, and IAM roles. I need to pass secrets
from Secrets Manager to containers. Use security
groups that only allow necessary traffic.

What followed was a 45-minute session of iteration. Here's what I learned.

What Claude Code Got Right

1. Module Structure

The generated module structure was textbook:

code
modules/ecs-service/
├── main.tf
├── variables.tf
├── outputs.tf
├── iam.tf
├── alb.tf
├── ecs.tf
├── autoscaling.tf
└── security-groups.tf

Clean separation. Logical file names. This matched how I'd structure it myself.
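
For orientation, here is a rough sketch of how a module laid out like this might be called from a root configuration. The input names are the variables that show up in the snippets later in this post; the values, the `module "api"` label, and the secret ARN are placeholders I made up for illustration, and a real module would need more inputs (image, subnets, and so on) that aren't shown here.

```hcl
# Hypothetical caller; every value below is a placeholder.
module "api" {
  source = "./modules/ecs-service"

  name                = "api"
  vpc_id              = "vpc-0123456789abcdef0"
  container_port      = 8080
  allowed_cidr_blocks = ["10.0.0.0/8"]

  cpu    = 256
  memory = 512

  health_check_path    = "/health"
  health_check_matcher = "200"
  log_retention_days   = 30

  secrets = [
    {
      name     = "DATABASE_URL"
      arn      = "arn:aws:secretsmanager:us-east-1:123456789012:secret:api/db-AbCdEf"
      json_key = "url"
    }
  ]
}
```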

2. IAM Role Separation

This is where many tutorials get it wrong. Claude correctly separated:

hcl
# Execution role - for ECS agent to pull images, write logs
resource "aws_iam_role" "execution" {
  name = "${var.name}-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}

# Task role - for the application to access AWS services
resource "aws_iam_role" "task" {
  name = "${var.name}-task"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}

It even attached the correct managed policy to the execution role (AmazonECSTaskExecutionRolePolicy) and scoped Secrets Manager access to specific secret ARNs.
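
The post doesn't show that part of the output, so the following is only a sketch of what the attachment and the scoped secrets policy could look like, not the code Claude produced. It assumes the `aws_iam_role.execution` resource above and a `var.secrets` list whose items carry an `arn` attribute; the resource names `execution_managed` and `execution_secrets` are my own.

```hcl
# AWS-managed policy: lets ECS pull images and write logs on the task's behalf.
resource "aws_iam_role_policy_attachment" "execution_managed" {
  role       = aws_iam_role.execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Secrets are injected at container start, so the execution role
# (not the task role) needs read access, limited to the listed ARNs.
resource "aws_iam_role_policy" "execution_secrets" {
  # Skip the policy entirely when no secrets are passed in;
  # IAM rejects a statement with an empty Resource list.
  count = length(var.secrets) > 0 ? 1 : 0

  name = "${var.name}-read-secrets"
  role = aws_iam_role.execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [for secret in var.secrets : secret.arn]
    }]
  })
}
```

If the secrets are encrypted with a customer-managed KMS key, the execution role would also need `kms:Decrypt` on that key.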

3. Security Group Logic

The security groups followed least-privilege:

hcl
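# ALB security group - rules are attached below as standalone resources
resource "aws_security_group" "alb" {
  name_prefix = "${var.name}-alb-"
  vpc_id      = var.vpc_id
}

# ALB ingress - allows inbound HTTPS
resource "aws_security_group_rule" "alb_ingress_https" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = var.allowed_cidr_blocks
  security_group_id = aws_security_group.alb.id
}

# ALB egress - only to the ECS tasks on the container port.
# Kept outside the group so alb and ecs don't reference each other
# inline, which Terraform rejects as a dependency cycle.
resource "aws_security_group_rule" "alb_egress_to_ecs" {
  type                     = "egress"
  from_port                = var.container_port
  to_port                  = var.container_port
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.ecs.id
  security_group_id        = aws_security_group.alb.id
}

# ECS security group - only allows inbound traffic from ALB
resource "aws_security_group" "ecs" {
  name_prefix = "${var.name}-ecs-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Assumed: outbound HTTPS so tasks can pull images and fetch secrets.
  # Terraform removes the default allow-all egress rule, so it must be explicit.
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}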

ALB can reach ECS. ECS can't be reached directly. Exactly right.

What Needed Fixing

1. Hardcoded Values

The first draft had hardcoded values that should have been variables:

hcl
# Bad - hardcoded
cpu    = 256
memory = 512

# Fixed - variable with sensible defaults
cpu    = var.cpu
memory = var.memory

Easy fix once spotted. I just asked "make cpu and memory configurable" and it updated correctly.
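
For reference, the matching declarations in variables.tf might look roughly like this. The defaults mirror the hardcoded values above; the validation block is my own addition rather than something from the session, and it checks only the CPU value.

```hcl
variable "cpu" {
  description = "Fargate task CPU units (1024 = 1 vCPU)"
  type        = number
  default     = 256

  validation {
    # CPU sizes Fargate accepts; anything else fails at plan time
    # instead of at task registration.
    condition     = contains([256, 512, 1024, 2048, 4096, 8192, 16384], var.cpu)
    error_message = "cpu must be one of 256, 512, 1024, 2048, 4096, 8192 or 16384."
  }
}

variable "memory" {
  description = "Fargate task memory in MiB; must be a size Fargate allows for the chosen cpu"
  type        = number
  default     = 512
}
```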

2. Missing Health Check Configuration

The ALB target group health check used defaults. For containers, you often need custom paths and intervals:

hcl
health_check {
  enabled             = true
  healthy_threshold   = 2
  unhealthy_threshold = 3
  timeout             = 5
  interval            = 30
  path                = var.health_check_path  # Added
  matcher             = var.health_check_matcher  # Added
}
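
To show where that block lives, here is a minimal sketch of the surrounding target group plus the two new variables. The resource name `this`, the `-tg` suffix, and the defaults are assumptions on my part; `var.name`, `var.vpc_id` and `var.container_port` are the inputs used elsewhere in the module.

```hcl
variable "health_check_path" {
  description = "HTTP path the ALB probes on each task"
  type        = string
  default     = "/" # assumed default
}

variable "health_check_matcher" {
  description = "HTTP status codes that count as healthy, e.g. \"200\" or \"200-299\""
  type        = string
  default     = "200" # assumed default
}

resource "aws_lb_target_group" "this" {
  name        = "${var.name}-tg" # target group names are capped at 32 characters
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Fargate tasks use awsvpc networking, so targets register by IP

  health_check {
    path    = var.health_check_path
    matcher = var.health_check_matcher
  }
}
```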

3. Log Retention

The CloudWatch log group was created without a retention policy:

hcl
# Original - logs forever (expensive)
resource "aws_cloudwatch_log_group" "this" {
  name = "/ecs/${var.name}"
}

# Fixed - retention set through a variable (30 days in my case)
resource "aws_cloudwatch_log_group" "this" {
  name              = "/ecs/${var.name}"
  retention_in_days = var.log_retention_days
}

This would've cost money in production. Caught it in review.
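
A possible declaration for that variable is sketched below. CloudWatch Logs only accepts a fixed set of retention periods, so a validation block can catch a bad value at plan time. The list here is deliberately a subset of the accepted values, and the 30-day default matches the fix above.

```hcl
variable "log_retention_days" {
  description = "How long CloudWatch keeps the service's logs, in days"
  type        = number
  default     = 30

  validation {
    # A subset of the retention periods CloudWatch Logs accepts;
    # extend the list if you need longer retention.
    condition     = contains([1, 3, 5, 7, 14, 30, 60, 90, 180, 365], var.log_retention_days)
    error_message = "log_retention_days must be one of 1, 3, 5, 7, 14, 30, 60, 90, 180 or 365."
  }
}
```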

4. The Secrets Gotcha

Here's where it got interesting. The initial secrets implementation used:

hcl
secrets = [
  for secret in var.secrets : {
    name      = secret.name
    valueFrom = secret.arn
  }
]

This works, but it only handles a plain secret ARN, which injects the secret's entire value. In practice, you often want to reference a specific JSON key within a secret:

hcl
secrets = [
  for secret in var.secrets : {
    name      = secret.name
    valueFrom = "${secret.arn}:${secret.json_key}::"
  }
]

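The :: suffix is required for JSON key extraction: the full format is arn:json-key:version-stage:version-id, and the two trailing fields are left empty to mean the current version. Claude didn't know this initially. After I explained the pattern, it updated correctly and even added a comment explaining the syntax.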
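
One way to support both cases in the same module is to make the key optional and build the reference conditionally. This is a sketch rather than the code from my session: the `container_secrets` local is a name I picked, and optional object attributes need Terraform 1.3 or newer.

```hcl
variable "secrets" {
  description = "Secrets to expose to the container as environment variables"
  type = list(object({
    name     = string           # environment variable name inside the container
    arn      = string           # full Secrets Manager secret ARN
    json_key = optional(string) # key inside a JSON secret; omit to inject the whole value
  }))
  default = []
}

locals {
  container_secrets = [
    for secret in var.secrets : {
      name = secret.name
      # arn:json-key:version-stage:version-id, with the last two fields left empty
      valueFrom = secret.json_key == null ? secret.arn : "${secret.arn}:${secret.json_key}::"
    }
  ]
}
```

The container definition can then use `secrets = local.container_secrets` in place of the inline `for` expression.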

The Bigger Picture

What AI Does Well

Boilerplate generation: Resource blocks, variable definitions, outputs

Best practice patterns: Module structure, naming conventions

Documentation: Comments explaining why, not just what

Iteration speed: Changes in seconds, not minutes

Where Humans Still Win

Edge cases: Production has infinite edge cases

Security nuances: That :: suffix matters

Organizational context: Your naming conventions, your CIDR ranges, your tagging strategy

Integration testing: AI can write code but can't terraform apply

My New Workflow

I don't write Terraform from scratch anymore. My workflow:

1. Describe the module to Claude Code in plain English

2. Review generated code for security and correctness

3. Iterate on specific fixes ("add log retention", "make this a variable")

4. Validate with terraform validate and tflint

5. Test with terraform plan against real AWS

6. Apply and monitor

Steps 1-3 take 30 minutes instead of 2 hours. Steps 4-6 are unchanged.

The Trust Question

Should you trust AI-generated infrastructure code?

No. And yes.

No: Don't blindly apply anything AI generates. Review every line. Run security scanners. Test in non-production first.

Yes: Trust it as a starting point. Trust it to handle boilerplate. Trust it to remember syntax you've forgotten.

The right mental model: AI is a junior engineer who's read every Terraform tutorial but never managed production. Fast, knowledgeable, needs supervision.

Try It Yourself

If you're skeptical (I was), try this experiment:

Pick a module you've already written

Describe it to Claude Code without showing your code

Compare the output to your implementation

If your experience matches mine, you'll find that roughly:

80% is nearly identical

15% is different but valid

5% needs fixing

That 80% is the time savings. That 5% is why review still matters.

I went from "AI can't write infrastructure" to "AI saves me hours per week." The key was treating it as a collaborator, not a replacement.