Claude Code with Fly.io: fly.toml, Machines, Multi-Region
The Fly.io CLAUDE.md template
Fly.io is a different shape from the typical PaaS. Instead of one build container that produces one running app, Fly runs Firecracker microVMs ("Machines") in regions you pick, behind an Anycast network that routes traffic to the closest healthy instance. fly.toml declares services, ports, health checks, mounts, and concurrency. The Dockerfile builds the image. fly secrets and fly volumes are separate primitives. A misconfigured line in any of these surfaces as a deploy that boots, fails health checks, gets rolled back, and leaves you reading logs.
Claude Code knows the rough shape of Fly: that fly.toml exists, that there is a [http_service] block, that secrets are env vars at runtime. What it does not know is your project. Without a CLAUDE.md telling it which regions you run in, where the primary write database lives, and which fly commands are safe to run unattended, Claude generates fly.toml files that look right and ship machines that boot into a crash loop. If you are new to Claude Code, the Claude Code setup guide covers installation first.
The CLAUDE.md at your project root is read at the start of every session. For a Fly project, it declares the platform (Machines, not legacy Apps), the regions, the build target, the secrets workflow, and the hard rules.
# Fly.io project rules
## Stack
- Platform: Fly Machines (NOT legacy Nomad-based Apps)
- App name: app-api (matches `app` field in fly.toml)
- Primary region: lhr (London Heathrow)
- Replica regions: iad, fra (read-only replicas, no write traffic)
- Build: Dockerfile in repo root, multi-stage, distroless final image
- Runtime: Node 22.x on linux/amd64
## Region rules
- Primary write region: lhr (matches PRIMARY_REGION env var)
- All write requests routed to lhr machines via fly-replay header
- Read replicas in iad and fra accept GET only
- Volumes pinned per region, NEVER assume cross-region volume access
## Secrets workflow
- ALL secrets set via `fly secrets set KEY=value`
- NEVER bake secrets into Dockerfile via ARG or ENV
- NEVER commit .env, .env.production, or any file with real secret values
- Document every required secret in .env.example with a comment
## Deploy commands
- Local dev: `pnpm dev`
- Build image locally: `fly deploy --build-only --image-label test`
- Stage deploy: `fly deploy --app app-api-staging`
- Production deploy: ONLY via git push to main + GitHub Actions, never `fly deploy --app app-api` from Claude
## Hard rules
- NEVER run `fly deploy` against the production app from Claude Code without explicit user request
- NEVER run `fly volumes destroy` or `fly machines destroy` without explicit user confirmation
- NEVER edit fly.toml `app =` field or change primary region without confirming
- NEVER add a secret without updating .env.example
- NEVER write a Dockerfile that runs as root in the final stage
- NEVER use `fly ssh console` to mutate state (run migrations via release_command instead)
The platform declaration on line one matters more than it looks. Fly has two runtime models in the wild: legacy Apps (Nomad-based, deprecated for new projects) and Machines (Firecracker-based, current default). Claude trained on docs covering both. Without an explicit "Machines, not legacy Apps" line, Claude occasionally mixes syntax from both, which fails validation in non-obvious ways.
The region rules block is the second high-leverage section. Most stateful workloads have a single primary write region. If Claude does not know which region holds your primary Postgres node, it writes code that issues writes from any replica, which either errors or, worse, succeeds locally and desynchronises across regions. The PRIMARY_REGION env var is Fly's standard, and codifying it in CLAUDE.md keeps generated output aligned with the platform.
fly.toml patterns
fly.toml is the single config file Fly reads at deploy time. It is TOML with strict ordering rules for table arrays like [[mounts]] and [[services]]. Claude can edit it safely with the right patterns and a hard rule against rewriting from scratch.
A typical pattern for a stateful Node service on Machines, with one primary region and a mounted volume:
app = "app-api"
primary_region = "lhr"
kill_signal = "SIGINT"
kill_timeout = "30s"
[build]
dockerfile = "Dockerfile"
[deploy]
release_command = "/app/scripts/release.sh"
strategy = "rolling"
[env]
NODE_ENV = "production"
PORT = "8080"
PRIMARY_REGION = "lhr"
LOG_LEVEL = "info"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "stop"
auto_start_machines = true
min_machines_running = 1
processes = ["app"]
[[http_service.checks]]
grace_period = "10s"
interval = "30s"
method = "GET"
timeout = "5s"
path = "/healthz"
[[mounts]]
source = "app_data"
destination = "/data"
initial_size = "10gb"
auto_extend_size_threshold = 80
auto_extend_size_increment = "5gb"
auto_extend_size_limit = "50gb"
[[vm]]
size = "shared-cpu-2x"
memory = "1gb"
cpus = 2
cpu_kind = "shared"
A few things matter. The app and primary_region keys are effectively immutable after first deploy. Renaming the app means destroying and recreating, which kills volumes. Changing the primary region without a coordinated cutover orphans volumes. The CLAUDE.md hard rule against editing these exists because Claude has no way to know they are not safe to change.
The [http_service] block replaces the older [[services]] array for HTTP apps. It handles TLS, force-https, and the auto-start/auto-stop behaviour that makes Fly cheap to run idle. min_machines_running = 1 keeps one machine warm in the primary region. Setting it to zero gives full scale-to-zero, fine for low traffic but adds cold-start latency on the first request after idle.
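The [[http_service.checks]] entry in the config above polls GET /healthz, so the app has to answer on that path. A minimal sketch of such a handler follows, written against a plain Node-style req/res pair. The function name handleHealthz and the isReady callback are hypothetical names chosen for illustration, not part of Fly or of any framework.

```javascript
// Minimal health endpoint sketch for the /healthz path used in the example
// fly.toml. `isReady` is a hypothetical callback supplied by the application
// (for example, "has the database pool connected yet?").
function handleHealthz(req, res, isReady = () => true) {
  // Only claim the request if it is the health check itself.
  if (req.method !== 'GET' || req.url !== '/healthz') return false;

  const ok = isReady();
  res.statusCode = ok ? 200 : 503; // a non-2xx status marks the check as failing
  res.setHeader('content-type', 'text/plain');
  res.end(ok ? 'ok' : 'not ready');
  return true; // tells the caller the request was handled
}
```

Keeping the handler free of database queries or other slow work is a deliberate choice here: the example check has a 5s timeout, and a slow health endpoint fails the check even when the app itself is fine.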
The [[mounts]] block is a table array, and the double brackets matter. A single-bracket [mounts] is still valid TOML, but it declares one plain table rather than an array of tables, so it does not match the shape used here and should not be relied on to deploy cleanly. Claude has been known to generate single-bracket variants when transcribing config from memory. Fly does not publish a JSON schema for fly.toml, so the rule is: edit specific keys, never rewrite wholesale, and run fly config validate before every deploy.
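As a cheap extra guard alongside fly config validate, the bracket check can be sketched as a few lines of JavaScript that scan the file text. This is a line-based heuristic, not a TOML parser, and the list of section names is an assumption taken from the example config in this article. The function name findSingleBracketTables is hypothetical.

```javascript
// Sections that the example fly.toml writes as arrays of tables.
// This list is an assumption for illustration; adjust it to the project.
const ARRAY_TABLES = ['mounts', 'vm', 'services'];

// Returns one entry per line where a listed section uses "[name]"
// instead of "[[name]]". Heuristic only: it does not parse TOML.
function findSingleBracketTables(tomlText, names = ARRAY_TABLES) {
  const problems = [];
  tomlText.split('\n').forEach((line, index) => {
    // Matches "[name]" (optionally followed by a comment) but not "[[name]]".
    const match = line.trim().match(/^\[([A-Za-z0-9_.-]+)\]\s*(#.*)?$/);
    if (match && names.includes(match[1])) {
      problems.push({ line: index + 1, table: match[1] });
    }
  });
  return problems;
}
```

A script like this could run in CI or a pre-commit step and fail when the returned array is non-empty. It complements the CLI validation rather than replacing it.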
The [[vm]] block declares the machine size. Fly bills per machine per second based on it. A performance-2x machine costs roughly 4x a shared-cpu-2x of the same memory. CLAUDE.md should set a default size and a rule against upgrading without explicit user request.
Dockerfile, secrets, and volumes
Fly builds your image from a Dockerfile in the repo root by default. The Nixpacks fallback is opaque, slow on cold caches, and hard to debug. For production, write your own Dockerfile and put it under Claude's purview with explicit rules.
The pattern that produces small, fast, secure images on Fly:
# syntax=docker/dockerfile:1.7
# ---- Stage 1: install dependencies ----
FROM node:22-bookworm-slim AS deps
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && \
pnpm install --frozen-lockfile --prod
# ---- Stage 2: build application ----
FROM node:22-bookworm-slim AS build
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
COPY . .
RUN pnpm build
# ---- Stage 3: distroless runtime, non-root ----
FROM gcr.io/distroless/nodejs22-debian12:nonroot AS runtime
WORKDIR /app
ENV NODE_ENV=production \
PORT=8080
COPY --from=deps --chown=nonroot:nonroot /app/node_modules ./node_modules
COPY --from=build --chown=nonroot:nonroot /app/dist ./dist
COPY --from=build --chown=nonroot:nonroot /app/package.json ./package.json
USER nonroot
EXPOSE 8080
CMD ["dist/server.js"]
Three stages. deps installs only production dependencies. build installs everything and runs the build. runtime starts from gcr.io/distroless/nodejs22-debian12:nonroot, which contains a Node binary, the libraries it needs, and nothing else. No shell, no package manager, no apt. The image weighs around 150 MB compressed instead of 1.2 GB for node:22: faster pulls, faster rolling restarts, smaller attack surface.
The nonroot tag suffix matters. The default distroless image runs as UID 0. The nonroot variant runs as UID 65532. For a Fly Machine that mounts a volume at /data, the volume must be owned by UID 65532 or the application cannot write to it. Forget this and the first write fails silently.
The "no ARG for secrets" rule is the most-violated one in Claude-generated Dockerfiles. ARG values are stored in the image manifest and visible to anyone who can pull. If Claude writes ARG STRIPE_SECRET_KEY and the build runs with --build-arg STRIPE_SECRET_KEY=sk_live_..., the live key sits in the image layers indefinitely. Inject secrets at runtime via fly secrets set, where they live encrypted in Fly's vault and are mounted as env vars at boot.
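The rule lends itself to a mechanical check. Below is a rough sketch that scans Dockerfile text for ARG or ENV instructions whose variable name looks like a secret. The name patterns in SECRET_HINTS and the function name findBakedSecrets are illustrative assumptions, and the scan only inspects the first variable on each instruction line, so it will miss names on continuation lines.

```javascript
// Name fragments that suggest a secret. Illustrative assumption only;
// extend the pattern to match the project's own naming.
const SECRET_HINTS = /(SECRET|TOKEN|PASSWORD|API_KEY|PRIVATE_KEY|DATABASE_URL)/i;

// Returns ARG/ENV instructions whose first variable name looks secret-like.
function findBakedSecrets(dockerfileText) {
  const findings = [];
  dockerfileText.split('\n').forEach((line, index) => {
    // Only ARG and ENV can bake a value into the image or its history.
    const match = line.trim().match(/^(ARG|ENV)\s+([A-Za-z_][A-Za-z0-9_]*)/i);
    if (match && SECRET_HINTS.test(match[2])) {
      findings.push({
        line: index + 1,
        instruction: match[1].toUpperCase(),
        name: match[2],
      });
    }
  });
  return findings;
}
```

For a Dockerfile containing ARG STRIPE_SECRET_KEY, the function reports that line, while ENV NODE_ENV=production and ENV PORT=8080 pass untouched.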
The secrets workflow:
# Set a secret on the production app; this restarts the app's Machines so the new value takes effect
fly secrets set STRIPE_SECRET_KEY=sk_live_xxx --app app-api
# Set multiple secrets in one command (atomic, single restart)
fly secrets set DATABASE_URL=postgres://... REDIS_URL=redis://... --app app-api
# List all secret names (values are not shown, ever)
fly secrets list --app app-api
# Remove a secret; this also restarts the app's Machines
fly secrets unset OLD_API_KEY --app app-api
# Stage a secret without restarting (applies on next manual deploy)
fly secrets set KEY=value --stage --app app-api
There is no read-back. If a secret is set then forgotten, the only recovery is to set it again. This trips up Claude, which assumes config can be read. The CLAUDE.md rule: every secret name lives in .env.example with a comment. The Claude Code environment variables guide covers the pattern across hosts.
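Because only names can be listed, .env.example is the natural thing to compare against. Here is a small sketch of that comparison. It assumes the deployed secret names are passed in as a plain array (for example, copied from the output of fly secrets list); it does not parse the CLI output itself. The function names are hypothetical.

```javascript
// Extracts variable names from .env.example text, skipping blank lines
// and comments. Values are ignored.
function parseEnvExampleNames(envExampleText) {
  return envExampleText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith('#'))
    .map((line) => line.split('=')[0].trim())
    .filter((name) => /^[A-Za-z_][A-Za-z0-9_]*$/.test(name));
}

// Compares documented names with the names actually set on the app.
// `deployedNames` is supplied by the caller (assumption: taken by hand
// or by a separate script from `fly secrets list`).
function diffSecrets(envExampleText, deployedNames) {
  const documented = new Set(parseEnvExampleNames(envExampleText));
  const deployed = new Set(deployedNames);
  return {
    missingFromFly: [...documented].filter((name) => !deployed.has(name)),
    undocumented: [...deployed].filter((name) => !documented.has(name)),
  };
}
```

Note that .env.example usually also lists non-secret settings that live in the [env] block of fly.toml, so entries in missingFromFly are prompts to check, not automatic errors.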
The volumes workflow is more dangerous because volumes are physical state that survives app deletion until destroyed explicitly:
# Create a 10GB volume in London for the production app
fly volumes create app_data --size 10 --region lhr --app app-api
# List volumes for the app, shows region, size, attached machine
fly volumes list --app app-api
# Extend a volume in place (no machine restart for the extend itself)
fly volumes extend vol_xxxxx --size 20 --app app-api
# Destroy a volume, IRREVERSIBLE, only run with explicit user confirmation
fly volumes destroy vol_xxxxx --app app-api
The "one volume per machine" constraint is platform, not preference. Volumes are local SSDs attached to a single Firecracker VM. They cannot be mounted by two machines at once, and they cannot move between regions. When scaling a stateful app from 1 to 3 machines in the same region, the correct sequence is: create 2 more volumes, then fly scale count 3. Skip the volume step and two of the three machines boot without storage and fail health checks. For wider Docker patterns across hosts, see the Claude Code with Docker guide.
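The arithmetic behind that sequence is simple enough to sketch. The helper below takes the number of existing volumes per region and the target machine count per region, and returns how many volumes still need to be created in each region before scaling. The function name volumesToCreate and the input shapes are assumptions made for illustration.

```javascript
// Given existing volume counts and target machine counts per region,
// returns the number of volumes to create in each region first.
// Regions that already have enough volumes are omitted from the result.
function volumesToCreate(existingVolumesByRegion, targetMachinesByRegion) {
  const plan = {};
  for (const [region, target] of Object.entries(targetMachinesByRegion)) {
    const existing = existingVolumesByRegion[region] ?? 0;
    const shortfall = target - existing; // one volume per machine
    if (shortfall > 0) plan[region] = shortfall;
  }
  return plan;
}
```

With one existing volume in lhr and a target of three machines there, the plan is two new volumes in lhr, which matches the sequence described above. Adding a first machine in iad yields a plan of one volume in iad.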
Machines vs Apps and multi-region databases
Fly's runtime is Machines: per-VM scheduling, per-machine billing, fast scale-up via the API. The legacy Apps platform (Nomad-scheduled, application-wide instances) still works for older projects but is not the path for new ones. The CLAUDE.md rule "Machines, not legacy Apps" keeps generated config in the right dialect.
The practical differences:
| Aspect | Machines | Legacy Apps |
|---|---|---|
| Scheduling | Per-machine, you control region and count | Region-wide, Fly schedules instances |
| Scale-to-zero | First-class, default for new apps | Possible but awkward |
| API | fly machines run, fly machines update | fly scale count, fly regions add |
| Ephemeral workers | Trivial, run a one-shot machine | Requires a separate process group |
| New projects | Yes | No |
The decision rule for CLAUDE.md: new project means Fly Machines, full stop. One-shot jobs use fly machines run --rm. Long-running workers go in a separate process group or a separate app.
The multi-region database question is where most Fly projects either get it right early or pay for it later. flyctl postgres provisions a primary node in your chosen region and read replicas elsewhere. The primary accepts writes, the replicas accept reads only, and your app handles routing.
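The standard pattern: every machine sets PRIMARY_REGION, and Fly sets FLY_REGION on each Machine to the region that Machine is running in. The app compares FLY_REGION against PRIMARY_REGION. If they match, handle normally. If they do not match and the request is a write, return a fly-replay header asking the proxy to replay in the primary region. Fly's edge proxy honours fly-replay and the request lands on a primary-region machine. The Fly-Region request header is not a substitute for FLY_REGION here: it reports the region where the connection entered Fly's network, not the region of the Machine handling it.
// middleware/region-replay.js
export function regionReplay(req, res, next) {
  // FLY_REGION is set by Fly on every Machine; it is undefined in local dev
  const machineRegion = process.env.FLY_REGION
  const primaryRegion = process.env.PRIMARY_REGION
  const isWrite = ['POST', 'PUT', 'PATCH', 'DELETE'].includes(req.method)
  if (isWrite && machineRegion && primaryRegion && machineRegion !== primaryRegion) {
    // Ask Fly's proxy to replay this request on a primary-region Machine
    res.setHeader('fly-replay', `region=${primaryRegion}`)
    res.status(409).end()
    return
  }
  next()
}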
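The replay decision itself can be pulled out into a pure function, which makes it easy to unit test without a running server. The sketch below is one way to do that. It assumes the caller passes in the region of the Machine handling the request (on Fly, typically read from the FLY_REGION environment variable) and the primary region. The name replayTarget is hypothetical.

```javascript
const WRITE_METHODS = ['POST', 'PUT', 'PATCH', 'DELETE'];

// Returns the value for the fly-replay header, or null when the request
// should be handled locally. Both regions are supplied by the caller.
function replayTarget(method, machineRegion, primaryRegion) {
  const isWrite = WRITE_METHODS.includes(method.toUpperCase());
  if (!isWrite) return null; // reads are served by whichever machine got them
  if (!machineRegion || !primaryRegion) return null; // e.g. local development
  return machineRegion === primaryRegion ? null : `region=${primaryRegion}`;
}
```

A middleware can then call replayTarget(req.method, process.env.FLY_REGION, process.env.PRIMARY_REGION) and set the header only when the result is not null. A POST on an iad machine with lhr as primary yields region=lhr, while a GET anywhere, or a POST already in lhr, yields null.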
CLAUDE.md addition: reads hit local replicas, writes replay to PRIMARY_REGION, background jobs and cron pin to primary via [processes], and migrations target the primary DB endpoint. For cross-region read latency, the Claude Code with Cloudflare Workers guide covers an alternative where the edge handles reads and Fly handles the primary database.
Hard rules and deploy workflow
Claude Code can run fly commands via Bash. The pattern that works: Claude runs fly config validate, fly secrets list, fly logs --no-tail, and fly deploy --build-only. It does not run fly deploy against production, fly machines destroy, fly volumes destroy, or any command that mutates the running app without explicit instruction. The local-to-production flow:
# 1. Make changes locally
pnpm dev
# 2. Validate fly.toml syntax (catches the [mounts] vs [[mounts]] class of error)
fly config validate
# 3. Build the image without deploying, catches Dockerfile + lockfile issues
fly deploy --build-only --image-label preview
# 4. Push to feature branch
git push origin feat/region-replay
# 5. CI runs tests, posts results on PR
# 6. Merge to main, GitHub Actions runs `fly deploy --app app-api`
# Production deploy happens in CI, never from Claude's terminal directly
The fly config validate step is the cheapest insurance you can buy. It runs the same validation Fly's API runs at deploy and exits non-zero on any issue. Run it before any push that touches fly.toml.
The permission rules for a Fly project, in .claude/settings.local.json:
{
"permissions": {
"allow": [
"Bash(pnpm dev*)",
"Bash(pnpm build*)",
"Bash(pnpm test*)",
"Bash(fly config validate*)",
"Bash(fly secrets list*)",
"Bash(fly logs --no-tail*)",
"Bash(fly deploy --build-only*)",
"Bash(fly status*)",
"Bash(fly machines list*)",
"Bash(fly volumes list*)",
"Bash(git push origin feat/*)",
"Bash(git push origin fix/*)"
],
"deny": [
"Bash(fly deploy --app app-api*)",
"Bash(fly secrets set*--app app-api*)",
"Bash(fly secrets unset*--app app-api*)",
"Bash(fly machines destroy*)",
"Bash(fly volumes destroy*)",
"Bash(fly apps destroy*)",
"Bash(fly scale count*--app app-api*)",
"Bash(fly ssh console*--app app-api*)",
"Bash(git push origin main*)",
"Bash(git push --force*)"
]
}
}
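The deny list blocks production-app mutations, secret writes, machine and volume destruction, scale changes, SSH into production, and direct pushes to main. One caveat: a trailing wildcard such as --app app-api* also matches --app app-api-staging, so as written these patterns catch staging commands too. If Claude should be able to deploy to staging, end each pattern at the exact app name (for example, one rule ending in --app app-api and a second with a space before the *), then test both a production and a staging command before relying on it. The Claude Code permissions guide covers the full pattern.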
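It is worth checking how a trailing wildcard interacts with app names that share a prefix, such as app-api and app-api-staging. The sketch below uses a deliberately simplified matcher in which * matches any run of characters and everything else is literal. That matching model is an assumption for illustration, not Claude Code's actual implementation, so treat the result as a prompt to test real rules rather than as proof of their behaviour.

```javascript
// Simplified wildcard matcher: "*" matches any run of characters,
// everything else is literal. Illustrative assumption only; this is
// NOT Claude Code's real matching code.
function globMatches(pattern, command) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&')) // escape regex specials
    .join('.*');
  return new RegExp(`^${escaped}$`).test(command);
}

// Prefix collision: the production pattern also covers the staging app.
const loose = 'fly deploy --app app-api*';
const tight = 'fly deploy --app app-api *'; // space before the wildcard

const staging = 'fly deploy --app app-api-staging';
const prodWithFlags = 'fly deploy --app app-api --strategy rolling';

const looseHitsStaging = globMatches(loose, staging);
const tightHitsStaging = globMatches(tight, staging);
const tightHitsProd = globMatches(tight, prodWithFlags);
```

Under this model the loose pattern matches the staging command, while the tight pattern matches only production commands that carry further arguments. A bare fly deploy --app app-api with no trailing arguments would need its own exact rule.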
Three areas warrant manual review. First, fresh fly.toml files written from scratch: Claude occasionally produces [mounts] instead of [[mounts]], or mixes legacy [[services]] syntax with current [http_service] syntax. Always validate after generation. Second, release_command scripts: they run in a one-shot Machine before the new version takes traffic, and a failure aborts the deploy. Verify the script runs from a clean state with only the secrets and DB connection it needs, and that the final image can actually execute it: a distroless runtime has no shell, so a .sh script will not run there unless the command is something the image can start directly. Third, volumes during scale-out: adding a machine in a new region without creating a volume there first is the most common Fly footgun.
Seven hard rules are the difference between deploys that ship and deploys that break:
- The platform is Machines, not legacy Apps. State this on line one of CLAUDE.md.
- fly config validate runs before every deploy that touches fly.toml.
- Every secret goes in .env.example first, then via fly secrets set. Never in Dockerfile ARGs.
- Final Dockerfile stage is distroless or alpine, runs as non-root, copies with --chown.
- Volumes are created per-region before scaling into that region. Never the other way around.
- Writes replay to PRIMARY_REGION via the fly-replay header. Reads can land anywhere.
- Production deploys go through CI on git push to main, never fly deploy from Claude's terminal.
Claude Code performs at the level of the context you give it. Without a Fly-specific CLAUDE.md, it generates fly.toml with TOML syntax errors, bakes secrets into Dockerfile build args, scales stateful apps without provisioning volumes, and treats every region as primary. With the configuration above, it respects the platform's primitives, declares the primary region, manages secrets through fly secrets set, and ships through CI. Claudify ships a Fly-specific CLAUDE.md, Dockerfile, and .claude/settings.local.json deny list pre-configured for Node, Python, and Go on Fly Machines. For broader patterns, the Claude Code best practices guide and the CLAUDE.md explained guide cover the principles, and the Claude Code deploy guide compares Fly to other hosts head-to-head.