Why I Started Looking for Terraform Alternatives (And You Probably Should Too)
The BSL change hit differently than most license drama. In August 2023, HashiCorp moved Terraform from the Mozilla Public License 2.0 to the Business Source License 1.1 — and the key clause is this one: you can’t use Terraform to build a product that competes with HashiCorp. That sounds narrow until you read it closely. The restriction applies to any “competitive offering,” which HashiCorp gets to define. And because Terraform Cloud is their managed platform, any SaaS tool, internal developer platform, or managed IaC service you build on top of Terraform is potentially in scope. Your legal team will not like that sentence.
The moment it got real for me: I was building an internal platform — basically a self-service infra provisioning tool for our dev teams. It wrapped Terraform calls, stored state, exposed a UI. Legal flagged it immediately. Was it “competing” with Terraform Cloud? Probably not in spirit. But the language was ambiguous enough that we couldn’t get a clean sign-off. We were using Terraform as a library, calling it programmatically, and building workflows on top of it. That’s exactly the gray area BSL creates. We couldn’t ship until we had a clear answer, and HashiCorp wasn’t giving one. That’s when I started actually evaluating alternatives instead of just bookmarking them.
The licensing issue is what kicked off the search, but I want to be honest: some of what I found is just better for certain workloads regardless of the legal situation. OpenTofu is the obvious fork and drop-in replacement — it’s maintained by the Linux Foundation and stayed on MPL 2.0 — but Pulumi, Crossplane, and Winglang each solve problems that Terraform handles awkwardly. Pulumi especially: if you’ve ever written a 300-line HCL file to express something that would be 40 lines of TypeScript with a loop and a type-checked object, you’ll understand the appeal immediately. The license change is the forcing function, not the whole argument.
One thing worth layering in: if you’re rethinking your IaC stack, you’re probably also rethinking your broader toolchain. I’ve been pairing these workflows with AI-assisted coding tools to speed up writing provider configs, generating resource definitions, and catching drift in module outputs. If that’s on your radar too, the Best AI Coding Tools in 2026 (thorough Guide) has a solid breakdown of what actually integrates well with infrastructure-as-code workflows — not just generic autocomplete hype.
Crossplane + CI/CD: How I Stopped Fighting Kubernetes Config Drift and Actually Shipped Faster
The Candidates I Actually Tested
I spent about three months running these tools against real workloads — a multi-region AWS setup, a GKE cluster with about a dozen namespaces, and a hybrid environment where we had some on-prem VMware mixed in. Not toy examples. Here’s what I found.
OpenTofu is the one I recommend first when a team asks “we need to get off Terraform without rewriting everything.” It’s a direct fork, MPL-2.0 licensed, and the CLI is a drop-in replacement for Terraform 1.5.x and earlier. The Linux Foundation stewards it, which matters for enterprise procurement conversations. My migration was literally:
# on macOS with Homebrew
brew install opentofu
# verify it picks up your existing state
tofu init
tofu plan
# output is identical to what you saw with terraform plan
The thing that caught me off guard was how fast the OpenTofu team has been shipping features that HashiCorp held back — provider-defined functions landed in 1.7, and stacks (their answer to workspaces being a mess) are actively in development. The community is genuinely active, not in a “there’s a Discord with 200 people” way — I mean PRs merged weekly, GitHub issues getting responses from maintainers within hours. If your team is already deep in HCL, this is your lowest-friction exit.
Pulumi is where I’d point anyone who’s already running a TypeScript monorepo or a Python-heavy org. The pitch isn’t just “use real languages” — it’s that you get real abstractions. You can write a function that creates a VPC, loops over a list of CIDR blocks, and throws an error if the region isn’t in your approved list, all without learning 14 obscure HCL meta-arguments. The free tier on pulumi.com covers individual use (one user free, then roughly $50/month per user for the team plan — check their current pricing). Self-hosting the state backend is fully supported though, and I run mine against the same S3 bucket setup I used for Terraform state:
# set backend to S3-compatible storage
pulumi login s3://my-pulumi-state-bucket
# create a new stack for prod
pulumi stack init prod
# preview changes — equivalent of terraform plan
pulumi preview
The honest trade-off: the Pulumi provider ecosystem is thinner than Terraform’s. If you’re using a niche provider — say, Aiven, Render, or some internal corporate provider — you may find yourself wrapping a Terraform provider using Pulumi’s terraform-bridge, which adds complexity. Also, if your team has one TypeScript person and four who live in YAML, Pulumi’s learning curve feels steeper than it should.
Crossplane is the one that changes your mental model the most, which is both its strength and why it’s not for everyone. Everything is a Kubernetes Custom Resource. Your RDS instance is a DBInstance CRD. Your S3 bucket is a Bucket CRD. You reconcile infrastructure the same way the kube-controller-manager reconciles pods. I wrote a whole separate post about it (linked above), but the short version: if your team lives in kubectl and GitOps is already your workflow, Crossplane clicks fast. If you don’t already have a Kubernetes cluster you trust, you’re paying the cost of running Kubernetes just to provision non-Kubernetes things, which is a hard sell. Also, debugging failed reconciliation loops in Crossplane is… an experience. You will spend time reading controller logs.
Ansible gets dismissed more than it deserves. Yes, it’s older. Yes, the playbook syntax gets hairy fast. But if your problem is “I need to provision some cloud resources AND configure what’s running on them,” Ansible does both in one tool without glue code. I use it specifically for the overlap — bootstrapping EC2 instances, configuring packages, then handing off to a proper IaC tool for the stateful infrastructure. The amazon.aws collection is genuinely solid now:
# install the AWS collection
ansible-galaxy collection install amazon.aws
# create an EC2 instance inline in a playbook
- name: Launch app server
  amazon.aws.ec2_instance:
    name: "prod-app-01"
    instance_type: t3.medium
    image_id: ami-0abcdef1234567890
    region: us-east-1
    tags:
      Environment: production
Terragrunt is technically a Terraform wrapper, not a Terraform alternative — I want to be honest about that. But I’m including it because half the teams I talk to who say “we need to replace Terraform” actually mean “we need to escape Terraform’s DX pain without a full rewrite.” Terragrunt solves the real problems: DRY configs across environments, proper module composition, remote state bootstrapping without the chicken-and-egg problem. It works equally well wrapping OpenTofu, so you can migrate to OpenTofu first, adopt Terragrunt second, and you’ve gained a lot without a paradigm shift.
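To make the DRY point concrete, here is a minimal sketch of the usual two-file Terragrunt layout: a root terragrunt.hcl that defines remote state exactly once, and a per-environment child that inherits it. The bucket names, paths, and module source are placeholders, not a recommendation.
# root terragrunt.hcl: remote state and locking defined exactly once
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "example-tf-state"                                 # placeholder
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-tf-locks"                                 # placeholder
  }
}

# envs/prod/vpc/terragrunt.hcl: child config inherits the root block above
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/vpc"   # placeholder module path
}

inputs = {
  env = "prod"
}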
A few tools I deliberately ruled out: CDK for Terraform (CDKTF) has a reasonable idea but requires you to install Terraform anyway, so you’re not actually escaping the BSL license concern — you just moved HCL into generated JSON. Farmer (the F# DSL for Azure Resource Manager) is genuinely clever if you’re a .NET shop, but “F# IaC” is a hard team hire. Winglang — I watched the demos, read the docs, and it’s fascinating research, but the provider support is too thin and the community too small for anything I’d stake production on in 2025. Check back in two years.
OpenTofu: The Closest Drop-In If You’re Already on Terraform
The thing that surprised me most about OpenTofu wasn’t the migration — it was how fast I forgot I’d switched. I had a 300-line AWS networking stack, ran a find-and-replace on my CI YAML, and the plan output looked identical. That’s either impressive engineering or a sign that the fork happened at exactly the right moment before the codebases diverged. Either way, if you’re running Terraform 1.5.x or earlier, the migration tax is nearly zero.
Installation takes under a minute. On macOS:
# Homebrew — done in ~30 seconds
brew install opentofu
# Or the official installer script if you're on Linux CI
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install-opentofu.sh | sh
# Confirm you're on 1.7.x
tofu version
# OpenTofu v1.7.3
# on linux_amd64
The CI migration is usually just a string swap. Here’s what I changed in my GitHub Actions pipeline:
# Before
- name: Terraform Plan
  run: terraform plan -var-file=env/prod.tfvars
# After — that's genuinely it
- name: OpenTofu Plan
  run: tofu plan -var-file=env/prod.tfvars
Your .terraform.lock.hcl still works. Your state files still work. Your modules still work. The only thing you need to swap out is the binary and the GitHub Action from hashicorp/setup-terraform to opentofu/setup-opentofu. Here’s a real AWS VPC config that runs identically on OpenTofu 1.7.x with zero modifications:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
  required_version = ">= 1.6"

  # OpenTofu 1.7+ supports native state encryption here — Terraform can't do this
  # encryption {
  #   key_provider "pbkdf2" "main" {
  #     passphrase = var.state_passphrase
  #   }
  #   method "aes_gcm" "default" {
  #     keys = key_provider.pbkdf2.main
  #   }
  #   state { method = method.aes_gcm.default }
  # }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name      = "${var.env}-vpc"
    ManagedBy = "opentofu"
  }
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = { Name = "${var.env}-public-${count.index}" }
}

data "aws_availability_zones" "available" {
  state = "available"
}
Where OpenTofu actually pulls ahead of Terraform 1.5+: native state encryption (shown commented above — no third-party KMS wrapper needed), provider-defined functions as of 1.7, and removed blocks that let you explicitly drop resources from state without destroying the underlying infrastructure. The state encryption feature alone was the thing that finally got my security team to stop asking why we were shipping plaintext state to S3. The passphrase-based approach is good enough for most teams; the KMS-backed provider config is better for anything regulated.
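For reference, the removed block shape looks like this. A minimal sketch; the resource address is a placeholder:
# Stop managing a resource without destroying it (OpenTofu 1.7+)
removed {
  from = aws_instance.legacy_worker   # placeholder address

  lifecycle {
    destroy = false   # keep the real infrastructure, just drop it from state
  }
}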
The honest rough edge I hit: newer AWS resource types that only exist in hashicorp/aws provider versions 5.45+ sometimes have a lag before they’re fully validated in the OpenTofu registry. I ran into this with aws_verifiedaccess_* resources — the provider binary installs fine, but you’ll occasionally see schema validation errors on modern resources that haven’t been exercised much in the OpenTofu community yet. The fix is usually pinning the provider version and waiting a release cycle, but it’s annoying when you’re trying to ship. If your stack is mostly compute, networking, IAM, and RDS — things that have been stable for years — you will never hit this. If you’re using services AWS launched in the last six months, test before you commit to the migration.
OpenTofu is the right call for teams with existing Terraform codebases over a few thousand lines who cannot afford a rewrite and need to move off HashiCorp’s BSL license for compliance or open-source policy reasons. It’s not the right call if you’re starting from scratch with a Kubernetes-heavy stack — in that case, Crossplane or Pulumi give you better primitives for that environment. But if your team already knows HCL, already has modules, and already has CI built around Terraform’s CLI shape, the migration is a Friday afternoon task, not a sprint.
Pulumi: When You’re Tired of Fighting HCL
The thing that converted me wasn’t a blog post — it was the moment I had to write a Terraform loop that conditionally created resources based on a variable-length list of environments with different configs per environment. The HCL looked like a ransom note. I rewrote it in Pulumi TypeScript in about 20 minutes and it was just… a for loop. With types. With IDE autocomplete. That was the moment.
Getting started is fast. One curl, then scaffold a project:
# Install
curl -fsSL https://get.pulumi.com | sh
# New AWS TypeScript project
pulumi new aws-typescript
# Prompts you for project name, stack name, AWS region
# Generates index.ts, Pulumi.yaml, package.json — runnable immediately
Here’s what creating an S3 bucket with a policy looks like in TypeScript versus HCL. The Terraform version isn’t bad for the simple case — but watch what happens the moment you need logic:
// Pulumi TypeScript — bucket + conditional policy based on environment
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

const config = new pulumi.Config();
const env = config.require("environment");
const isProd = env === "production";

const bucket = new aws.s3.Bucket("app-assets", {
  acl: "private",
  versioning: { enabled: isProd }, // real conditional — no count hack
});

// Policy only attached in prod — zero ceremony
if (isProd) {
  const policy = new aws.s3.BucketPolicy("app-assets-policy", {
    bucket: bucket.id,
    policy: pulumi.all([bucket.arn]).apply(([arn]) =>
      JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
          Effect: "Deny",
          Principal: "*",
          Action: "s3:GetObject",
          Resource: `${arn}/*`,
          Condition: { Bool: { "aws:SecureTransport": "false" } },
        }],
      })
    ),
  });
}

export const bucketName = bucket.id;
The equivalent Terraform HCL for that conditional policy requires count = var.environment == "production" ? 1 : 0, a data.aws_iam_policy_document block, and then you still can’t reference the count resource cleanly in other places without [0] indexing everywhere. It’s not unmanageable, but it doesn’t compose. Pulumi code composes because it’s just code — you can extract that bucket + policy combo into a function and call it 10 times with different configs, which is exactly what platform teams need.
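A minimal sketch of that composition, reusing the bucket-plus-policy pair from above; the function name and options shape are mine, not anything from the Pulumi SDK:
import * as aws from "@pulumi/aws";

// Hypothetical factory: the bucket + TLS-only policy as one reusable unit
function secureBucket(name: string, opts: { versioned: boolean }): aws.s3.Bucket {
  const bucket = new aws.s3.Bucket(name, {
    acl: "private",
    versioning: { enabled: opts.versioned },
  });
  new aws.s3.BucketPolicy(`${name}-policy`, {
    bucket: bucket.id,
    policy: bucket.arn.apply((arn) =>
      JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
          Effect: "Deny",
          Principal: "*",
          Action: "s3:GetObject",
          Resource: `${arn}/*`,
          Condition: { Bool: { "aws:SecureTransport": "false" } },
        }],
      })
    ),
  });
  return bucket;
}

// Ten buckets, one definition
["assets", "logs", "backups"].forEach((n) => secureBucket(`app-${n}`, { versioned: true }));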
The free tier gotcha is real and will bite you if you don’t plan for it. Pulumi Cloud’s free tier is one user with limited state history. That’s fine for solo projects, genuinely not fine for a team. The fix is to self-host state on S3 — it’s two lines and you own the backend completely:
# Point Pulumi at your own S3 bucket for state
pulumi login s3://your-state-bucket
# Pulumi.yaml — specify the backend so it's committed and reproducible
name: my-infra-project
runtime: nodejs
backend:
  url: s3://your-state-bucket # supports s3://, gs://, azblob:// too
# Stack-specific config lives in Pulumi.dev.yaml, Pulumi.prod.yaml automatically
# Make sure the bucket has versioning enabled — Pulumi state files are small
# but you absolutely want recovery if something corrupts
Make sure the IAM role running your CI has s3:GetObject, s3:PutObject, and s3:DeleteObject on that bucket. I forgot s3:ListBucket the first time and spent 20 minutes debugging a confusing 403.
Where Pulumi genuinely wins hard over every HCL-based tool: cross-stack references, dynamic loops, and abstraction layers. Referencing an output from another stack is new pulumi.StackReference("org/networking/prod") and then calling .getOutput("vpcId"). It’s typed. Your IDE tells you the shape. Looping over a dynamic list of microservices to create ECR repos, ECS services, and CloudWatch alarms for each is a services.forEach() — no for_each map gymnastics, no fighting with toset().
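A short sketch of both patterns together; the stack name, output name, and service list are placeholders for whatever your org actually exports:
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// Cross-stack reference: pull the VPC ID the networking stack exported
const network = new pulumi.StackReference("org/networking/prod");   // placeholder stack
const vpcId = network.getOutput("vpcId");                           // placeholder output name

// Dynamic loop: one ECR repo per microservice, no toset() gymnastics
const services = ["api", "worker", "billing"];                      // placeholder list
const repos = services.map(
  (svc) => new aws.ecr.Repository(`${svc}-repo`, { name: `platform/${svc}` })
);

export const repoUrls = repos.map((r) => r.repositoryUrl);
export const sharedVpcId = vpcId;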
The rough edges I actually hit, in order of annoyance: First, the feedback loop is slower than HCL for trivial changes. pulumi preview has to run your TypeScript, compile it, evaluate the Pulumi program, and then diff — it’s noticeably slower than terraform plan for a small stack. Second, the generated SDK types from AWS providers can throw cryptic TypeScript errors, especially when you’re passing an Output<string> where the SDK expects a string | Input<string> — the error messages don’t always point you at the actual problem. Third, and this one cost me a production incident: pulumi up’s preview is sometimes optimistic. It will say “no changes” and then during the actual apply realize it needs to replace a resource. Always read the detailed diff, not just the summary line.
This tool is genuinely for teams with a software engineering culture. If your infra team already writes Python or TypeScript, thinks in abstractions, and is building internal developer platforms, Pulumi pays off fast. If you have a two-person ops team who mostly copy-pastes HCL modules from the registry and needs something the whole company can read and contribute to without a coding background, OpenTofu is the honest answer. Pulumi’s power is real — but it comes with the full weight of a software project: package.json, dependency management, TypeScript compilation, and all the ways that can go sideways.
Crossplane: IaC That Lives Inside Your Kubernetes Cluster
The thing that caught me off guard with Crossplane isn’t the installation — it’s the mental model flip. You stop thinking about “running Terraform to provision cloud resources” and start thinking about Kubernetes controllers that constantly reconcile your cloud state. Your RDS instance is a Kubernetes object. Your S3 bucket is a Kubernetes object. kubectl get managed shows you the actual state of your AWS account. Once that clicks, it’s hard to go back to a separate IaC tool that lives outside your cluster.
Getting it running is genuinely simple compared to what it does under the hood:
# Add the stable Helm repo
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
# Install into its own namespace — crossplane-system is the convention
helm install crossplane crossplane-stable/crossplane \
-n crossplane-system \
--create-namespace
# Install the AWS RDS provider (open source, free to use)
kubectl apply -f - <<EOF
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-rds
spec:
  package: xpkg.upbound.io/upbound/provider-aws-rds:v0.46.0
EOF
# Verify the provider is healthy before touching anything else
kubectl get providers
The real power comes from Compositions and XRDs. Here’s a stripped-down but functional XRD that gives developers a clean API for requesting an RDS Postgres instance — without them ever needing to know about subnet groups, parameter groups, or backup windows:
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresdatabases.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XPostgresDatabase
    plural: xpostgresdatabases
  claimNames:
    kind: PostgresDatabase
    plural: postgresdatabases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  required: [size, dbName]
                  properties:
                    size:
                      type: string
                      enum: [small, medium, large] # hide instance class complexity
                    dbName:
                      type: string
---
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xpostgresdatabases.aws.platform.example.com
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XPostgresDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-east-1
            engine: postgres
            engineVersion: "15.4"
            instanceClass: db.t3.micro # overridden by patch below
            allocatedStorage: 20
            skipFinalSnapshot: true
            backupRetentionPeriod: 7 # sane default baked in — devs don't choose this
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.medium
                large: db.t3.large
A developer on your platform just applies this:
apiVersion: platform.example.com/v1alpha1
kind: PostgresDatabase
metadata:
  name: my-app-db
  namespace: production
spec:
  parameters:
    size: small
    dbName: myapp
That’s the actual value proposition for platform teams. Developers get a three-field YAML. Operators control the Composition — they decide which AWS account, which VPC, which backup policy. The cognitive surface area for app developers shrinks to nearly nothing, and the platform team retains full control without becoming a bottleneck. If your company has 15 teams all asking for databases, this pattern scales in a way that “teach everyone Terraform” absolutely does not.
Now for the honest frustration. Debugging a broken Composition is painful in ways that Terraform never is. When a terraform plan fails, you get a clean error. When a Crossplane Composition fails, you get a cascade of Kubernetes events across multiple objects — the composite resource, the managed resource, and the provider pod logs. You’ll end up doing something like this regularly:
# The cascade of places you need to check when something breaks
kubectl describe composite xpostgresdatabase/my-app-db
kubectl get managed # find the underlying rds.Instance name
kubectl describe instance.rds.aws.upbound.io/my-app-db-abc123
kubectl -n crossplane-system logs -l pkg.crossplane.io/revision=provider-aws-rds-abc123 --tail=100
# Composition revision errors show up here — not always obvious
kubectl get compositionrevisions
Composition revision errors in particular are not always obvious. If you update a Composition with a bad patch path, the existing resources keep running fine but new claims silently fail to reconcile — you’ll see a condition like Synced: False with a cryptic message about a field path. You need to be comfortable reading controller-runtime log output, which is dense.
The learning curve is real. If you’re not already comfortable with CRDs, admission webhooks, and how Kubernetes operators work in general, Crossplane will feel like fighting three abstractions at once. I’d estimate it takes a platform engineer who knows Kubernetes well about two weeks to go from installation to a production-ready Composition. If your team is still learning Kubernetes itself, this is the wrong time. But if you’re already running operators and you want your cloud infra to follow the same GitOps reconciliation loop as your app deployments — ArgoCD watching a repo, automatic drift correction, everything in one control plane — Crossplane is genuinely the strongest free option available right now.
Ansible: Don’t Dismiss It for Cloud Provisioning
Most IaC conversations skip Ansible entirely because people mentally file it under “config management, not provisioning.” That’s a mistake. I’ve used Ansible to provision AWS infrastructure and configure the instances in the same playbook run — no Terraform, no separate Chef/Puppet layer, no pipeline hand-off between tools. For hybrid shops or ops-heavy teams, that unified workflow is genuinely hard to beat.
Setup is fast. Install the core engine and the AWS SDK, then pull the official Amazon collection:
pip install ansible boto3 botocore
# Install the AWS collection — this gives you amazon.aws.ec2_instance, amazon.aws.ec2_vpc_net, etc.
ansible-galaxy collection install amazon.aws
# Verify the collection landed correctly
ansible-galaxy collection list | grep amazon
Your AWS credentials can live in ~/.aws/credentials or environment variables — Ansible picks them up via boto3 the same way the AWS CLI does. No custom provider configuration needed.
Here’s a real playbook that provisions an EC2 instance and then drops an nginx config onto it in a single run — no separate “apply then configure” phase:
---
- name: Provision and configure web server
  hosts: localhost
  gather_facts: false
  vars:
    region: us-east-1
    instance_type: t3.micro
    ami_id: ami-0c101f26f147fa7fd # Amazon Linux 2023 in us-east-1
  tasks:
    - name: Launch EC2 instance
      amazon.aws.ec2_instance:
        name: "web-01"
        region: "{{ region }}"
        instance_type: "{{ instance_type }}"
        image_id: "{{ ami_id }}"
        key_name: my-keypair
        security_groups: ["sg-0abc123def456"]
        network:
          assign_public_ip: true
        tags:
          Environment: staging
        wait: true # block until instance is running — critical
        state: running
      register: ec2_result

    - name: Add new instance to in-memory inventory
      ansible.builtin.add_host:
        name: "{{ ec2_result.instances[0].public_ip_address }}"
        groups: newly_provisioned
        ansible_user: ec2-user
        ansible_ssh_private_key_file: ~/.ssh/my-keypair.pem

    - name: Wait for SSH to come up
      ansible.builtin.wait_for:
        host: "{{ ec2_result.instances[0].public_ip_address }}"
        port: 22
        delay: 10
        timeout: 120

- name: Configure the instance
  hosts: newly_provisioned
  become: true
  gather_facts: true
  tasks:
    - name: Install nginx
      ansible.builtin.dnf:
        name: nginx
        state: present

    - name: Drop custom nginx config
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
      notify: restart nginx

    - name: Ensure nginx is running and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
The add_host trick is the linchpin here — it dynamically injects the new instance into an in-memory group so the second play can SSH into it immediately. You run ansible-playbook provision.yml once and come back to a running, configured server. Terraform + Ansible pipelines typically need a glue script or a CI step to pass outputs between tools. This skips that entirely.
The limitation you need to understand before committing: Ansible has no real state file. Terraform tracks what it created and can destroy it cleanly. Ansible just re-runs your tasks against whatever is there. The modules are supposed to be idempotent — ec2_instance with state: running won’t create a duplicate if the instance already exists. But if someone manually terminates that instance and renames another one identically, Ansible won’t notice the drift the way Terraform would on a plan. You can paper over this with check_mode runs and careful tagging strategies, but you’re doing it manually. Drift detection is your problem, not the tool’s.
I reach for Ansible specifically in two scenarios: bootstrapping bare metal or on-prem VMs where there’s no cloud API — just SSH and a RHEL install — and hybrid workflows where I’m genuinely touching both an AWS API and SSH targets in the same run. If your team already has a library of Ansible roles for config management, adding cloud provisioning tasks to existing playbooks is a 30-minute job. Rewriting everything in Pulumi or OpenTofu just to provision an EC2 instance that you’d configure with Ansible anyway is organizational overhead that rarely pays off. The people this actually fits: ops teams who live in Ansible, anyone managing hybrid cloud where half the fleet doesn’t have a cloud API, and shops that need a single on-call person to understand the entire provisioning-to-configuration chain without switching mental models mid-incident.
Side-by-Side Comparison: What Actually Matters
The comparison table everyone posts online usually stops at “language” and “license.” That’s not enough to make an actual decision. The thing that catches teams off guard is almost always one specific column — state management or drift detection — not the syntax. I’ve seen teams migrate from Terraform to Pulumi only to discover the free cloud tier has backend state API limits that bite you at scale, or move to Crossplane and spend two months just understanding how Composition works before writing a single productive line.
Tool | License | State Backend | Language(s) | K8s-Native | Free Self-Hosted | Learning Curve | Best Fit
------------|--------------|----------------------------|------------------------------|------------|------------------|-----------------|----------------------------------
OpenTofu | MPL-2.0 | S3, GCS, Azure Blob, etc. | HCL | No | Yes | Low (for TF users) | Migration path from Terraform
Pulumi | Apache 2.0 | S3/GCS self-hostable | TypeScript, Python, Go, C# | No | Yes | Medium | Platform/app teams who hate HCL
Crossplane | Apache 2.0 | Kubernetes etcd | YAML + CRDs | Yes | Yes | High | Platform engineering on K8s
Ansible | GPL-3.0 | None natively | YAML | No | Yes | Low-Medium | Hybrid config + provision overlap
Each tool’s dealbreaker is the thing to read first. For Pulumi, the OSS engine is genuinely Apache 2.0, but if you let developers use pulumi.com as the state backend, the free tier limits concurrent updates and history retention. The workaround is straightforward — self-host state on S3 instead of the Pulumi Service — but you have to configure it explicitly from day one:
# Set self-managed S3 backend instead of pulumi.com
export PULUMI_BACKEND_URL=s3://your-state-bucket/pulumi
# Then init your stack — this writes state locally, not to pulumi.com
pulumi stack init prod --secrets-provider=awskms://alias/your-key
Crossplane’s dealbreaker isn’t licensing or cost — it’s the operator knowledge requirement. Before you can provision a single RDS instance, you need a working Kubernetes cluster, you need to install the AWS provider (which itself runs as a pod pulling ~200MB images), understand Composite Resource Definitions, write a Composition that maps your abstraction to actual AWS resources, and then debug why your XR is stuck in a reconcile loop. That’s not exaggeration — I spent a full sprint getting the Crossplane AWS provider v0.43+ stable before I could even demo it to stakeholders. If your team doesn’t already live in Kubernetes, this is the wrong tool.
# Crossplane: just installing the provider before you can do anything
kubectl apply -f https://raw.githubusercontent.com/crossplane-contrib/provider-aws/master/package/crds/
# Then you still need a ProviderConfig, a Composition, and a CompositeResourceDefinition
# before a single resource gets created. Expect 300+ lines of YAML for a basic VPC setup.
Ansible’s missing drift detection is a real operational liability that the YAML simplicity masks. With Terraform or OpenTofu, running plan tells you what changed out-of-band in your infrastructure. Ansible’s --check mode does something similar only for specific modules, and “idempotent” doesn’t mean “declarative” — if someone manually resizes an EC2 instance, Ansible has no built-in mechanism to flag that as drift. You can bolt on AWX or use ansible-lint in CI, but it’s duct tape. OpenTofu gives you actual state reconciliation out of the box, which is why it’s the lowest-friction Terraform replacement for teams who want parity without the BSL.
- OpenTofu — pick this if your team already has Terraform state files and you want zero retraining. Drop-in compatible with HCL configs through Terraform 1.5.x.
- Pulumi — pick this if your engineers are TypeScript/Python developers who find HCL limiting and you’re willing to configure a self-hosted state backend from day one.
- Crossplane — pick this only if you have a dedicated platform engineering team that already manages Kubernetes clusters and wants infrastructure definitions to live as K8s custom resources alongside app workloads.
- Ansible — pick this when you’re managing config alongside provisioning (think: spinning up VMs and configuring them), not as a primary IaC tool for cloud resource lifecycle management.
When to Pick What: Matching the Tool to Your Actual Situation
The single most common mistake I see teams make is picking a tool based on blog post hype rather than their actual constraints. So let me be blunt about each scenario, because the right answer genuinely changes depending on where you’re starting from.
You have 50,000+ lines of existing Terraform HCL. OpenTofu, no question. Don’t let anyone sell you on a full rewrite — that’s months of risk for zero user-facing value. Run tofu init today and your existing state files, modules, and providers work as-is. OpenTofu 1.7+ added native state encryption (which Terraform still doesn’t have) plus provider-defined functions. Migration is genuinely a one-command operation for most stacks:
# your existing .tfstate and modules are immediately compatible
tofu init
tofu plan # verify nothing changed
tofu apply
Building an internal developer platform on top of EKS or GKE. This one splits depending on your team’s background. If your engineers already understand Kubernetes operators and reconciliation loops, Crossplane is the natural fit — your platform team writes Compositions, your dev teams self-serve cloud resources through kubectl apply without touching AWS consoles. If your team writes Go services and has never touched a CRD in anger, Pulumi in Go or TypeScript will get you to a working IDP faster, because Crossplane’s XRD + Composition authoring has a genuinely steep first week. I’ve seen teams burn two sprints on a single Composition that Pulumi’s ComponentResource would’ve done in a day.
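For a sense of what that ComponentResource looks like, here is a hypothetical sketch; the type token, args shape, and defaults are all placeholders, not a production component:
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// Hypothetical platform abstraction: one class a platform team hands to app teams
export class TeamDatabase extends pulumi.ComponentResource {
  public readonly endpoint: pulumi.Output<string>;

  constructor(name: string, args: { size: "small" | "large" }, opts?: pulumi.ComponentResourceOptions) {
    super("platform:index:TeamDatabase", name, {}, opts);

    const db = new aws.rds.Instance(`${name}-pg`, {
      engine: "postgres",
      instanceClass: args.size === "small" ? "db.t3.micro" : "db.t3.medium",
      allocatedStorage: 20,
      username: "app",
      password: pulumi.secret("change-me"),   // placeholder; use stack config secrets in practice
      skipFinalSnapshot: true,
    }, { parent: this });

    this.endpoint = db.endpoint;
    this.registerOutputs({ endpoint: this.endpoint });
  }
}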
Small team, mixed AWS plus on-prem, need velocity now. This is the case where a hybrid approach actually makes sense instead of being a cop-out. Use Ansible for on-prem provisioning — it connects over SSH, needs no agent, and your ops person already knows it. Use OpenTofu or Pulumi for the cloud-native resources where you want state management and drift detection. Don’t try to shoehorn Ansible into managing S3 buckets and IAM roles at scale; the Ansible AWS modules work but the state model is procedural, not declarative, and you’ll eventually spend a Friday untangling it.
You want AI code completion and real unit tests for your infra code. Pulumi in TypeScript is the only serious answer here. Your existing Jest setup works immediately — no special test runner, no mocking framework to learn:
// pulumi-test.ts — actual unit test, not a policy check
import * as pulumi from "@pulumi/pulumi";

// Mocks must be registered before the stack code is imported
pulumi.runtime.setMocks({
  newResource: (args) => ({ id: `${args.name}-id`, state: args.inputs }),
  call: (args) => args.inputs,
});

// Outputs aren't Promises — unwrap them explicitly for assertions
const promiseOf = <T>(output: pulumi.Output<T>): Promise<T> =>
  new Promise((resolve) => output.apply(resolve));

test("S3 bucket has versioning enabled", async () => {
  const { bucket } = await import("./infra"); // assumes infra.ts exports `bucket`
  const versioning = await promiseOf(bucket.versioning);
  expect(versioning?.enabled).toBe(true);
});
Copilot and Cursor also complete Pulumi TypeScript significantly better than HCL — the type signatures are in your node_modules, so autocomplete actually knows what properties an aws.ec2.Instance takes.
Greenfield startup, three engineers, no legacy baggage. Pulumi with an S3 backend in TypeScript. The S3 backend avoids paying for Pulumi Cloud while keeping state remote and versioned:
pulumi login s3://your-state-bucket/pulumi-state
pulumi stack init dev
TypeScript is also the pragmatic hiring call — a mid-level frontend or backend engineer can contribute to infra code without a Terraform or YAML crash course. You can refactor a Pulumi program like you refactor application code, because it is application code.
Locked into GitOps with ArgoCD or Flux. Crossplane is the only tool that actually fits this model without duct tape. Everything is a manifest, ArgoCD syncs it, Crossplane’s controller reconciles it against the real cloud state — it’s the same loop you already trust for your application deployments. Trying to run Pulumi or OpenTofu inside ArgoCD means wrapping them in Jobs or exec hooks, which breaks the GitOps reconciliation model the moment a job fails silently. Crossplane’s ManagedResource status conditions show up in ArgoCD health checks natively; you get drift detection for free because the controller is always running.
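As a sketch of the wiring, the ArgoCD Application just points at a directory of Crossplane claims in Git; the repo URL, path, and namespace below are placeholders:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloud-infra
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-infra   # placeholder repo
    targetRevision: main
    path: claims/production                                  # directory of Crossplane claims
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true   # drift gets corrected automatically by the controller loop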
The Setup I Actually Run in Production Right Now
The migration from Terraform to OpenTofu took me about three hours on a Friday afternoon — and that includes the time I spent being paranoid and taking manual state backups first. No module rewrites, no provider changes, no terraform init weirdness. You literally swap the binary and run tofu init. Everything we had in AWS — VPCs, EKS clusters, RDS instances, IAM roles — came up clean on the first plan with zero diff. That’s because OpenTofu 1.6+ reads Terraform 1.5 state files natively. We have state files from Terraform 1.5 that we deliberately haven’t touched, and that’s not a bug, it’s a decision.
For new services — anything greenfield — I reach for Pulumi with TypeScript. The moment you need conditional logic beyond HCL’s awkward count/for_each gymnastics, Pulumi pays for itself. I’m talking real abstractions: a SecureService component class that wires up an ECS service, its IAM role, its target group, and its CloudWatch log group as one reusable unit. That’s 200 lines you write once and instantiate everywhere. We store state in S3 instead of Pulumi Cloud because we already trust that bucket and don’t want another SaaS dependency:
# one-time login, stored in ~/.pulumi/credentials.json
pulumi login s3://our-infra-state-bucket?region=us-east-1
# stack init after that is normal
pulumi stack init prod --secrets-provider=awskms://alias/pulumi-secrets
The secrets-provider flag is the gotcha nobody mentions in tutorials. If you skip it, Pulumi defaults to its own passphrase encryption, which breaks in CI unless you set PULUMI_CONFIG_PASSPHRASE everywhere. Using KMS aliases means your CI role just needs kms:Decrypt on that key — cleaner and auditable.
Ansible handles everything that isn’t a cloud resource: initial host bootstrapping on our bare-metal nodes, configuring the containerd runtime, joining nodes to EKS, and any stateful software that doesn’t fit neatly into a container. The hybrid setup means we have EC2 instances that Pulumi creates and bare-metal nodes that Ansible owns — the two tools don’t step on each other because they operate at different layers. Cloud resource lifecycle is IaC, OS and process configuration is Ansible. That boundary has saved us from at least two “who owns this?” arguments.
The GitHub Actions pipeline is where this all ties together. PRs get a tofu plan whose output is posted as a comment — reviewers see exactly what’s changing before merge. Merges to main trigger tofu apply with no manual gate:
name: OpenTofu
on:
  pull_request:
    paths: ['infra/tofu/**']
  push:
    branches: [main]
    paths: ['infra/tofu/**']

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    permissions:
      id-token: write # OIDC auth to AWS — no long-lived keys
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: "1.7.1"
      - name: AWS auth via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-tofu-plan
          aws-region: us-east-1
      - run: tofu init
        working-directory: infra/tofu
      - name: Plan and comment
        uses: dflook/tofu-plan@v1 # posts plan output as PR comment
        with:
          path: infra/tofu

  apply:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: "1.7.1"
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-tofu-apply
          aws-region: us-east-1
      - run: tofu init && tofu apply -auto-approve
        working-directory: infra/tofu
Two separate IAM roles — one for plan (read-only), one for apply (write) — is worth the extra five minutes of setup. A compromised PR can’t nuke your infra. The OIDC trust policy on each role scopes it to the right GitHub event type using the sub claim condition, so the plan role literally cannot be used from a push to main and vice versa. That’s the kind of thing you add after your first near-miss, so I’m saving you that experience.
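Here is roughly what that looks like on the plan role's trust policy, as a sketch with placeholder account, org, and repo names:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
      "StringLike":   { "token.actions.githubusercontent.com:sub": "repo:example-org/infra:pull_request" }
    }
  }]
}
The apply role is identical except the sub pattern is repo:example-org/infra:ref:refs/heads/main, so only a push to main can assume it.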
Gotchas I Hit That the Docs Don’t Warn You About
The migration from Terraform to OpenTofu looks clean on paper — same HCL, same state format, same provider ecosystem. The thing that caught me off guard was tofu providers lock. It regenerates the .terraform.lock.hcl file, but the hash algorithm for provider packages changed slightly between versions. If you just copy your existing lock file over and call it a day, some CI pipelines will fail on provider verification with a cryptic checksum mismatch error. Re-run tofu providers lock -platform=linux_amd64 -platform=darwin_arm64 explicitly after migration — don’t assume your old lock file is compatible.
# Don't just copy the old .terraform.lock.hcl — regenerate it
tofu providers lock \
-platform=linux_amd64 \
-platform=darwin_arm64 \
-platform=windows_amd64
# Then verify it committed correctly before your first CI run
tofu init -backend=false
Pulumi’s secret handling in stack outputs is smart until it bites you. When you mark an output as a secret — or when Pulumi auto-detects one — it encrypts that value in the state file using your stack passphrase. With the default local backend, if you lose the passphrase, that state is effectively bricked for those values. You can’t recover them. I learned this when a contractor rotated a machine and didn’t document the passphrase. The fix going forward is to either use the Pulumi Cloud backend (which handles key management for you) or store your passphrase in a secrets manager like AWS Secrets Manager and pull it into your CI env explicitly. Don’t treat PULUMI_CONFIG_PASSPHRASE as a throwaway env var.
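If you go the Secrets Manager route, pulling the passphrase into a CI job is one command; the secret name here is a placeholder:
# Fetch the Pulumi passphrase at the start of the CI job instead of hardcoding it
export PULUMI_CONFIG_PASSPHRASE="$(aws secretsmanager get-secret-value \
  --secret-id pulumi/state-passphrase \
  --query SecretString --output text)"

pulumi preview --stack prod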
Crossplane burned me with Composition revisions. When you update a Composition resource — adding a new field, changing a patch — existing CompositeResource objects don’t automatically reconcile against the new revision. They sit there on the old revision until something triggers reconciliation. In practice that means you think you’ve deployed a change and you haven’t. The blunt fix is to annotate the affected composite resources to force a reconcile:
# Force reconciliation on a specific composite resource
kubectl annotate compositepostgresqlinstance my-db \
crossplane.io/paused=true --overwrite
kubectl annotate compositepostgresqlinstance my-db \
crossplane.io/paused- --overwrite
# Or if you want to touch all of them at once (dangerous in prod)
kubectl get compositepostgresqlinstance -o name | \
xargs -I{} kubectl annotate {} crossplane.io/paused=true --overwrite
Ansible’s register + when combination on cloud modules is the kind of thing that looks fine in a dry-run and fails you in production. The problem: most cloud modules return different structures depending on whether a resource was created, already existed, or was skipped. If you register the result and then when on a key inside it, you can get a KeyError or a silent skip when the resource was already present and the module returned a shorter response. Always test idempotency by running your playbook twice on a live environment — not just once on a fresh one. The second run is where these bugs surface:
- name: Create S3 bucket
  amazon.aws.s3_bucket:
    name: my-infra-bucket
    state: present
  register: bucket_result

# This WILL fail on the second run if bucket already existed
# because bucket_result.changed is False and some keys may be absent
- name: Tag the bucket
  amazon.aws.s3_bucket:
    name: my-infra-bucket
    tags:
      env: prod
  when: bucket_result.changed # Don't gate on this — gate on existence instead
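A more defensive version of the same pair: since the module is declarative and idempotent, the simplest fix is usually to let one task own both the bucket and its tags instead of gating a second task on changed. A sketch, with the same placeholder bucket name:
- name: Ensure S3 bucket exists with tags
  amazon.aws.s3_bucket:
    name: my-infra-bucket
    state: present
    tags:
      env: prod
  register: bucket_result

- name: Report whether anything actually changed this run
  ansible.builtin.debug:
    msg: "Bucket changed: {{ bucket_result.changed }}"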
Every tool here suffers the same provider version drift problem that Terraform users know well, but the culture around lock files is looser in the alternatives community. OpenTofu has .terraform.lock.hcl. Pulumi has requirements.txt or package-lock.json depending on your language, and provider SDK versions are pinned there — skip that and a pulumi up six months from now will pull a breaking provider version silently. Crossplane uses provider package versions in its Provider CRD, and it’s common to see tutorials that omit the version tag entirely, meaning you’re pulling :latest into production. Ansible collections have requirements.yml. Pin everything. The failure mode is always the same: works today, breaks on the next run three weeks from now when someone does a fresh install.
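For the Ansible side specifically, pinning looks like a requirements.yml checked into the repo; the version below is an example, not a recommendation. The Crossplane Provider manifest earlier already shows the other half of this: keep the explicit :v0.46.0 tag on the package instead of letting it float.
# requirements.yml; install with: ansible-galaxy collection install -r requirements.yml
collections:
  - name: amazon.aws
    version: "7.4.0"   # example pin; bump deliberately, not on a fresh install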