Terraform is the standard for infrastructure as code, but most teams only scratch the surface of what it can do. After managing dozens of production environments, I've settled on the patterns below — they keep infrastructure maintainable, safe, and team-friendly at scale.
Project Structure That Scales
The biggest mistake teams make is treating Terraform like a single script. As your infrastructure grows, a flat structure becomes unmanageable.
The Module Pattern
Split your infrastructure into reusable modules. A module is a self-contained unit that encapsulates a logical piece of infrastructure — a VPC, an ECS cluster, a database, a CDN distribution.
```
infrastructure/
  modules/
    vpc/
    ecs-cluster/
    rds/
    cdn/
  environments/
    staging/
    production/
  shared/
    state-backend/
    iam/
```
Each environment directory calls the shared modules with environment-specific variables. Staging and production use identical module code — only the inputs differ.
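As a sketch of what an environment root looks like (the module inputs and outputs here are illustrative — your modules will expose their own variables):

```hcl
# environments/staging/main.tf
module "vpc" {
  source = "../../modules/vpc"

  environment = "staging"
  cidr_block  = "10.1.0.0/16"
}

module "rds" {
  source = "../../modules/rds"

  environment    = "staging"
  instance_class = "db.t3.medium" # production passes a larger class
  vpc_id         = module.vpc.vpc_id
}
```

The production root calls the exact same modules; only the variable values change.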
Separate State Per Environment
Never share Terraform state between staging and production. A corrupted or accidentally applied state file in production is a serious incident. Use separate S3 buckets (or equivalent) with separate DynamoDB lock tables for each environment.
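In practice that means each environment root carries its own backend configuration, pointing at its own bucket and lock table (names here are illustrative):

```hcl
# environments/staging/backend.tf
terraform {
  backend "s3" {
    bucket         = "your-org-terraform-state-staging"
    key            = "services/api/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock-staging"
    encrypt        = true
  }
}
```

With this setup, no single misconfigured command can touch both environments' state.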
State Management
Remote state is non-negotiable for team environments. Local state files get lost, corrupted, and cause conflicts.
S3 Backend with Locking
```hcl
terraform {
  backend "s3" {
    bucket         = "your-org-terraform-state-prod"
    key            = "services/api/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
```
The DynamoDB table prevents two engineers from running `terraform apply` simultaneously — a race condition that can corrupt state.
State File Security
Your state file contains sensitive values — database passwords, API keys, private IPs. Ensure:
- S3 bucket has versioning enabled (so you can roll back)
- Server-side encryption is on
- Bucket is not publicly accessible
- Access is restricted to the CI/CD role and specific engineers
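One way to enforce the first three properties is to manage the state bucket itself with Terraform, using the AWS provider's per-aspect bucket resources (resource names are illustrative):

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "your-org-terraform-state-prod"
}

# Versioning lets you roll back to a previous state file
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Server-side encryption at rest
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

The fourth property — restricting access to the CI/CD role and specific engineers — is a bucket policy or IAM matter and depends on your account layout.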
CI/CD Integration
Manual `terraform apply` from a developer's laptop is a liability. Every infrastructure change should go through a pull request and an automated pipeline.
The GitOps Workflow
1. Engineer opens a PR with infrastructure changes
2. CI runs `terraform plan` and posts the output as a PR comment
3. A second engineer reviews the plan — not just the code, but the actual diff
4. PR is merged to main
5. CD pipeline runs `terraform apply` automatically
This gives you a full audit trail of every infrastructure change, who approved it, and what the plan showed before it was applied.
Plan Output in PRs
Use a tool like Atlantis or a custom GitHub Actions workflow to post the plan output directly in the PR. Reviewers should be looking at what Terraform will actually do, not just the HCL diff.
Preventing Drift
Infrastructure drift — when the real state of your infrastructure diverges from what Terraform thinks it is — is one of the most common sources of production incidents.
Automated Drift Detection
Run `terraform plan` on a schedule (daily is usually sufficient) and alert if the plan is non-empty. A non-empty plan means something changed outside of Terraform.
```yaml
# GitHub Actions scheduled drift detection
on:
  schedule:
    - cron: '0 8 * * *'

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -detailed-exitcode
```
With `-detailed-exitcode`, exit code 2 means changes are pending. Alert your team.
Enforce Terraform-Only Changes
Use IAM policies to restrict who can make changes to production infrastructure. The CI/CD role should have the permissions needed to apply Terraform. Human engineers should have read-only access to production by default.
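A minimal sketch of that split, assuming an existing engineer role (the role and resource names are illustrative; `ReadOnlyAccess` is an AWS managed policy):

```hcl
# Humans get read-only by default; only the CI/CD role holds write permissions
resource "aws_iam_role_policy_attachment" "engineers_read_only" {
  role       = aws_iam_role.engineer.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```

Write access for the CI/CD role should be scoped to the services it actually manages, and break-glass elevation for humans should be time-limited and logged.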
Variable Management
Hardcoding values in Terraform is a fast path to security incidents and configuration drift.
Separate Variables by Sensitivity
```hcl
variable "db_password" {
  description = "Database master password"
  type        = string
  sensitive   = true
}
```
The `sensitive = true` flag redacts the value from plan output and logs. Note that sensitive values are still stored in plaintext in the state file — one more reason state file security matters.
- Non-sensitive variables: store in `terraform.tfvars` files, committed to the repo
- Sensitive variables: store in AWS Secrets Manager, HashiCorp Vault, or your CI/CD secret store — never in the repo
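For example, with AWS Secrets Manager you can fetch the value at plan time instead of ever committing it (the secret name is illustrative):

```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/api/db-password"
}

# Reference it where needed, e.g. on the database resource:
#   password = data.aws_secretsmanager_secret_version.db_password.secret_string
```

The secret value never appears in the repo, though it will be written to state — which the backend encryption and access controls above are there to protect.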
Tagging Strategy
Every resource should be tagged consistently. Tags are how you track costs, identify owners, and automate operations.
```hcl
locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "terraform"
    Owner       = var.team_name
    CostCenter  = var.cost_center
  }
}
```
Apply `local.common_tags` to every resource. This makes cost allocation, security audits, and cleanup operations dramatically easier.
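On AWS, the provider's `default_tags` block applies the tags to every taggable resource automatically, so individual resources can't forget them:

```hcl
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = local.common_tags
  }
}
```

Individual resources can still add their own `tags`, which are merged on top of the defaults.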
Testing Infrastructure Changes
Terraform changes can have cascading effects that are not obvious from the plan output. A few practices that catch problems before they reach production.
Always Test in Staging First
This sounds obvious, but it is frequently skipped under time pressure. Staging should be as close to production as possible — same module versions, same configuration, smaller instance sizes.
Use `-target` Carefully
Targeted applies are useful for emergencies but dangerous as a habit. They can leave your state in an inconsistent condition. Prefer full applies whenever possible.
Validate Before Apply
```bash
terraform fmt -check
terraform validate
terraform plan -out=tfplan
```
Run these in CI before any apply. `terraform validate` catches syntax errors and obvious configuration mistakes. `terraform fmt -check` enforces consistent formatting.
The Patterns That Matter Most
Across dozens of production systems, the practices that prevent the most incidents are:
1. Remote state with locking — always
2. Every change through a PR with plan review
3. Separate state per environment
4. Automated drift detection
5. Sensitive values never in the repo
Get these right and Terraform becomes a reliable, auditable foundation for your infrastructure. Skip them and you will eventually have a bad day.