Claude Code with Kubernetes: Manifests, Kustomize, RBAC
Using Claude Code on Kubernetes projects
Kubernetes is the platform where Claude Code can produce the most output per hour and the most carnage per careless command. The surface area covers Deployments, Services, Ingresses, ConfigMaps, Secrets, HPAs, NetworkPolicies, RBAC, kustomize, Helm, and a kubectl that will happily delete a namespace if you give it the chance. Claude knows the whole catalogue. What Claude does not know is which conventions your platform team enforces and which clusters are production.
Without a project-specific CLAUDE.md, Claude generates Deployments without resource requests, Services with the wrong selector, RBAC that grants cluster-admin to a single pod, and kubectl commands that mutate production. The fix is the same as every other infrastructure surface: write the rules down, let Claude follow them, gate the destructive primitives.
This guide covers the CLAUDE.md patterns that prevent those failures. If you are new to Claude Code, the Claude Code setup guide covers installation. For the foundational concepts, the CLAUDE.md explained guide walks through the file format.
The Kubernetes CLAUDE.md template
The CLAUDE.md at your project root is read before every Claude Code session. For a Kubernetes project, it needs to declare which packaging tool you use, which API versions are current, which namespaces and clusters are in scope, and the manifest conventions that govern every YAML file Claude writes.
# Kubernetes project rules
## Stack
- Cluster: EKS 1.32 (prod), kind 0.24 (local), k3d 5.7 (CI)
- Packaging: kustomize v5 (built into kubectl), Helm 3.16 for third-party charts only
- Ingress: AWS Load Balancer Controller (ALB), NGINX Ingress on local
- Service mesh: none (do not introduce without approval)
- Secrets: External Secrets Operator -> AWS Secrets Manager
- Container registry: ECR, image tags = git SHA, never `latest`
- Namespace per service: orders, payments, billing, shared-infra
- Cluster contexts: dev-eks, staging-eks, prod-eks (never aliased to short names)
## Repository structure
- base/: kustomize bases (one folder per workload: deployment, service, hpa, configmap, serviceaccount, role, rolebinding)
- overlays/dev/, overlays/staging/, overlays/prod/: environment-specific patches
- charts/: vendored third-party Helm charts (read-only, do not modify upstream)
- scripts/: operational scripts, never run in CI without review
- policy/: OPA/Gatekeeper constraints, admission webhook configs
## API versions (current, do not regress)
- Deployment, StatefulSet, DaemonSet: apps/v1
- Service, ConfigMap, Secret, ServiceAccount, Namespace: v1
- Ingress, NetworkPolicy: networking.k8s.io/v1
- HorizontalPodAutoscaler: autoscaling/v2
- Role, RoleBinding, ClusterRole, ClusterRoleBinding: rbac.authorization.k8s.io/v1
- CronJob, Job: batch/v1
- PodDisruptionBudget: policy/v1
- NEVER use removed or legacy APIs (extensions/v1beta1, autoscaling/v1, autoscaling/v2beta1, autoscaling/v2beta2, policy/v1beta1)
## Manifest conventions (HARD)
- Every workload sets: resources.requests, resources.limits, livenessProbe, readinessProbe, securityContext
- Every workload has its own ServiceAccount, never `default`
- Every pod runs as non-root: runAsNonRoot: true, runAsUser >= 10000
- Every container drops ALL capabilities, adds back only what is needed
- Every container: readOnlyRootFilesystem: true unless justified
- Mandatory labels: app.kubernetes.io/name, /instance, /version, /component, /part-of
- Image tags = git SHA, never `latest` or floating tags
- imagePullPolicy: IfNotPresent (Always only on `latest`, which is forbidden)
## kubectl rules
- NEVER run kubectl against prod-eks
- All mutating commands require `--context dev-eks` or `--context staging-eks` explicitly
- Prefer `kubectl apply` over `kubectl create`, except for Job
- NEVER use `kubectl edit`, edits are not reproducible
- `kubectl delete` denied except on ephemeral pods (`kubectl delete pod`)
- Destructive verbs (delete on Deployment, StatefulSet, Namespace, PVC) all denied
## Build and validate
- Deploy: `kustomize build overlays/dev | kubectl apply --context dev-eks -f -`
- Pre-commit: `kustomize build overlays/dev | kubeconform -strict -summary`
- Pre-commit: `kustomize build overlays/dev | conftest test --policy policy/`
- PRs run kustomize build, kubeconform, conftest, dry-run apply for every overlay
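The last line of the template asks for four checks on every overlay. A minimal sketch of that loop in POSIX shell follows, assuming `kustomize`, `kubeconform`, `conftest`, and `kubectl` are on the PATH. The function name `validate_overlays` and the choice to dry-run every overlay against a non-prod context (default `dev-eks`) are assumptions for illustration; the template does not specify either.

```shell
# Hypothetical PR check: render each overlay once, then run every validator
# on the rendered file. Returns non-zero at the first failing overlay.
validate_overlays() (
  context="${1:-dev-eks}"   # assumed: dry-runs never target prod-eks
  for overlay in overlays/*/; do
    [ -d "$overlay" ] || continue
    name=$(basename "$overlay")
    rendered=$(mktemp) || exit 1
    kustomize build "$overlay" > "$rendered" &&
      kubeconform -strict -summary "$rendered" &&
      conftest test --policy policy/ "$rendered" &&
      kubectl apply --context "$context" --dry-run=server -f "$rendered" > /dev/null
    status=$?
    rm -f "$rendered"
    if [ "$status" -ne 0 ]; then
      echo "failed: $name" >&2
      exit "$status"
    fi
    echo "ok: $name"
  done
)
```

Run it from the repository root, for example `validate_overlays staging-eks`. Rendering to a temporary file means all three validators see exactly the same bytes.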
Three rules in this CLAUDE.md prevent the most common Claude Code failures with Kubernetes.
The manifest convention block is the highest-leverage instruction. Kubernetes training data is dominated by tutorial-grade YAML: no resource limits, no probes, no security context, the default service account, container as root. Without explicit prohibitions, Claude reproduces that style, and manifests fail admission control or pass review and detonate at scale.
The context-pinning rule stands between Claude and a production incident. If Claude can run kubectl without specifying a context, the active context wins, and the active context can be anything. Forcing --context dev-eks on every mutating call makes the cluster target visible in the command itself.
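One way to make the context-pinning rule mechanical is a small wrapper that refuses any call without an allowed context. This is only a sketch: `kubectl_guard` and the `KUBECTL_BIN` override are hypothetical names, not part of kubectl or Claude Code, and the allowlist is taken from the template above.

```shell
# Hypothetical guard: runs kubectl only when the arguments name an allowed
# context. KUBECTL_BIN is an assumed override, useful for dry testing.
kubectl_guard() (
  ctx=""
  prev=""
  for arg in "$@"; do
    case "$arg" in
      --context=*) ctx="${arg#--context=}" ;;
    esac
    if [ "$prev" = "--context" ]; then
      ctx="$arg"
    fi
    prev="$arg"
  done
  case "$ctx" in
    dev-eks|staging-eks)
      "${KUBECTL_BIN:-kubectl}" "$@"
      ;;
    "")
      echo "refused: pass --context dev-eks or --context staging-eks" >&2
      exit 2
      ;;
    *)
      echo "refused: context '$ctx' is not on the allowlist" >&2
      exit 2
      ;;
  esac
)
```

With this in place, `kubectl_guard get pods` fails fast instead of inheriting whatever context happens to be active. A wrapper like this is a convenience, not a security boundary; the permission settings discussed later are what actually block a raw `kubectl` call.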
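The API version block prevents regression to removed or legacy APIs. Claude has read a lot of stale documentation. Without the block you get extensions/v1beta1 Deployments in 2026 (not served since 1.16; the extensions/v1beta1 Ingress followed in 1.22) or autoscaling/v1 HPAs, which are still served but can only target CPU utilization, so they cannot express memory or custom metrics.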
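A rendered overlay can be scanned for the banned versions before it reaches review. The sketch below assumes the manifests arrive on stdin (for example from `kustomize build`); the function name is hypothetical, and the `autoscaling/v2beta1` and `autoscaling/v2beta2` entries are an addition beyond the three versions the template names.

```shell
# Reads rendered manifests on stdin. Prints each offending line with its line
# number and returns 1 if a banned top-level apiVersion appears.
check_api_versions() {
  banned='extensions/v1beta1|autoscaling/v1|autoscaling/v2beta1|autoscaling/v2beta2|policy/v1beta1'
  if grep -nE "^apiVersion:[[:space:]]*($banned)[[:space:]]*\$"; then
    return 1
  fi
  return 0
}
```

Typical use would be `kustomize build overlays/dev | check_api_versions`. The pattern is anchored to the start of the line, so nested fields such as `scaleTargetRef.apiVersion` are not inspected.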
Manifest patterns that hold up under review
A correct Kubernetes manifest is least-privilege, observable, and carries the requests and spread constraints the scheduler needs to place it well. Claude produces manifests at this bar when the patterns are explicit.
# base/orders-api/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels: &labels
    app.kubernetes.io/name: orders-api
    app.kubernetes.io/version: "1.42.0"
    app.kubernetes.io/part-of: orders
spec:
  replicas: 3
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }
  selector:
    matchLabels: { app.kubernetes.io/name: orders-api }
  template:
    metadata:
      labels: *labels
    spec:
      serviceAccountName: orders-api
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: api
          image: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/orders-api:7a3f9c2
          imagePullPolicy: IfNotPresent
          ports: [{ name: http, containerPort: 8080 }]
          envFrom:
            - configMapRef: { name: orders-api-config }
            - secretRef: { name: orders-api-secrets }
          resources:
            requests: { cpu: 100m, memory: 256Mi }
            limits: { cpu: 500m, memory: 512Mi }
          livenessProbe:
            httpGet: { path: /healthz, port: http }
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /ready, port: http }
            initialDelaySeconds: 5
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities: { drop: ["ALL"] }
          volumeMounts: [{ name: tmp, mountPath: /tmp }]
      volumes: [{ name: tmp, emptyDir: {} }]
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels: { app.kubernetes.io/name: orders-api }
The runAsNonRoot: true plus a high UID prevents the container running as root even if the image's USER directive is missing. The readOnlyRootFilesystem: true with an emptyDir mounted at /tmp catches the common case where an app writes scratch data. Dropping ALL capabilities is the right default for a typical HTTP service.
The Service, ConfigMap, and Ingress manifests follow the same conventions:
# base/orders-api/service.yaml
apiVersion: v1
kind: Service
metadata: { name: orders-api }
spec:
  type: ClusterIP
  selector: { app.kubernetes.io/name: orders-api }
  ports: [{ name: http, port: 80, targetPort: http }]
---
# base/orders-api/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata: { name: orders-api-config }
data:
  LOG_LEVEL: info
  ORDERS_TABLE_NAME: orders
  AWS_REGION: eu-west-2
---
# base/orders-api/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders-api
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # ssl-redirect only has an effect when an HTTP listener exists to redirect from
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  # replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: alb
  rules:
    - host: orders.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: orders-api, port: { name: http } }
Secrets are not committed in plaintext. They are produced by an ExternalSecret resource that pulls from AWS Secrets Manager:
# base/orders-api/externalsecret.yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata: { name: orders-api-secrets }
spec:
  refreshInterval: 1h
  secretStoreRef: { name: aws-secrets-manager, kind: ClusterSecretStore }
  target: { name: orders-api-secrets, creationPolicy: Owner }
  data:
    - secretKey: DATABASE_URL
      remoteRef: { key: prod/orders-api, property: database_url }
    - secretKey: STRIPE_SECRET_KEY
      remoteRef: { key: prod/orders-api, property: stripe_secret_key }
Hard rule in CLAUDE.md: no plaintext kind: Secret with a data: block lives in the repo. Claude reaches for inline Secrets if you let it, because that is the shape of every tutorial. ExternalSecret with a Secrets Manager backend is the only form that survives a security review. For environment variable conventions inside the container, the Claude Code environment variables guide covers the patterns.
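The no-plaintext-Secret rule is easy to check mechanically. The following is a rough sketch of such a check, assuming one manifest per file and a hypothetical function name; multi-document files that mix a ConfigMap and a data-less Secret would produce a false positive.

```shell
# Prints every YAML file under the given directory that declares a top-level
# "kind: Secret" together with a top-level data: or stringData: block.
# Returns 1 if any such file exists. File names containing newlines are not handled.
find_plaintext_secrets() (
  dir="${1:-.}"
  find "$dir" -type f \( -name '*.yaml' -o -name '*.yml' \) | sort | {
    status=0
    while IFS= read -r file; do
      if grep -qE '^kind:[[:space:]]*Secret[[:space:]]*$' "$file" &&
         grep -qE '^(data|stringData):' "$file"; then
        echo "$file"
        status=1
      fi
    done
    exit "$status"
  }
)
```

Because the match is anchored on `kind: Secret`, an ExternalSecret (whose `data:` list sits under `spec:`) passes. `find_plaintext_secrets base` would fit naturally alongside the other pre-commit checks.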
kustomize, Helm, and the multi-environment overlay
The kustomize-versus-Helm decision shapes every other choice. Both work. The right answer for application teams: kustomize for your own workloads, Helm for vendor charts.
kustomize is built into kubectl, has no templating language, and overlays are real YAML you can diff. Claude generates kustomize patches accurately because there is no templating to misalign. The overlay model maps directly onto environments: base/ holds the canonical manifest, overlays/dev/ and overlays/prod/ patch what differs.
Helm is the right choice for third-party software you do not own (cert-manager, external-secrets, ingress-nginx, Prometheus). Vendor Helm releases via helm template into a charts/ directory so the rendered output is in version control.
# base/orders-api/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels: { app.kubernetes.io/part-of: orders }
resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml
  - externalsecret.yaml
  - serviceaccount.yaml
  - role.yaml
  - rolebinding.yaml
  - hpa.yaml
  - pdb.yaml
  - ingress.yaml
images:
  - name: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/orders-api
    newTag: 7a3f9c2
---
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: orders
resources: [../../base/orders-api]
patches:
  - { path: deployment-patch.yaml, target: { kind: Deployment, name: orders-api } }
  - { path: hpa-patch.yaml, target: { kind: HorizontalPodAutoscaler, name: orders-api } }
configMapGenerator:
  - name: orders-api-config
    behavior: merge
    literals: [LOG_LEVEL=warn, ORDERS_TABLE_NAME=orders-prod]
The overlay patches only the values that differ between environments: replica counts, resource limits, HPA thresholds, log level, table names. Everything else flows through from base/. A change to base propagates to all environments, which is what you want for security defaults and what you do not want for replica counts.
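Because overlays render to plain YAML, the claim that only the intended values differ can be verified with a diff. A small sketch, assuming `kustomize` is installed and the overlay directories follow the layout above; `overlay_diff` is a hypothetical helper name.

```shell
# Hypothetical helper: render two overlays and show a unified diff.
# Exit status follows diff (0 identical, 1 differences) or 2 on a render error.
overlay_diff() (
  left=$(mktemp) || exit 2
  right=$(mktemp) || exit 2
  trap 'rm -f "$left" "$right"' EXIT
  kustomize build "overlays/$1" > "$left" || exit 2
  kustomize build "overlays/$2" > "$right" || exit 2
  diff -u "$left" "$right"
)
```

Running `overlay_diff staging prod` should show replica counts, resource limits, HPA thresholds, and config values, and nothing else. A security context or probe appearing in the diff is a sign that an overlay has drifted from base.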
CLAUDE.md keeps Claude inside the overlay pattern instead of forking divergent manifests per environment:
## kustomize rules
- New workloads start in base/, then get an overlay per environment
- Overlays only patch values that legitimately vary per environment
- NEVER copy a base manifest into an overlay and edit it inline, use a patch
- `configMapGenerator` with `behavior: merge` for per-environment config
- `images:` block in base sets the registry, overlays override `newTag` only
- All overlay patches are JSON Patch or strategic merge, never literal copy
For Helm-installed components, treat the chart as a black box. helm template the chart with your values file, commit the rendered output, review changes in PR. Claude reasons about rendered output the same way it reads kustomize output.
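The vendoring step might look like the sketch below. The directory layout (`charts/<release>/values.yaml` in, `charts/<release>/rendered.yaml` out) and the function name are assumptions for illustration; the `helm template` flags used are standard ones.

```shell
# Hypothetical vendoring step. Renders a chart with your values file and
# replaces the committed output only if rendering succeeds.
vendor_chart() (
  release="$1"
  chart="$2"
  version="$3"
  namespace="$4"
  dir="charts/$release"
  mkdir -p "$dir" || exit 1
  helm template "$release" "$chart" \
    --version "$version" \
    --namespace "$namespace" \
    --values "$dir/values.yaml" \
    --include-crds > "$dir/rendered.yaml.tmp" || {
      rm -f "$dir/rendered.yaml.tmp"
      exit 1
    }
  mv "$dir/rendered.yaml.tmp" "$dir/rendered.yaml"
)
```

Pinning `--version` keeps the rendered output reproducible, and writing to a temporary file first means a failed render never leaves a half-written manifest in the working tree for the next PR diff.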
RBAC, ServiceAccounts, and namespace strategy
RBAC is where careless generations turn into root-on-cluster. The discipline: one ServiceAccount per workload, one Role per ServiceAccount, no ClusterRoles unless required, and a default-deny NetworkPolicy in every namespace.
# base/orders-api/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/orders-api-irsa
---
# base/orders-api/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { name: orders-api }
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["orders-api-config", "orders-api-feature-flags"]
    verbs: ["get", "watch", "list"]
---
# base/orders-api/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: { name: orders-api }
subjects: [{ kind: ServiceAccount, name: orders-api }]
roleRef: { kind: Role, name: orders-api, apiGroup: rbac.authorization.k8s.io }
Three patterns matter here. The ServiceAccount carries the IRSA annotation that maps it to an IAM role, so AWS calls from the pod use scoped credentials rather than node-level access. The Role names specific ConfigMaps instead of using a wildcard; note that resourceNames restricts list and watch only when the client filters on metadata.name, so an unfiltered list of ConfigMaps is denied. The RoleBinding stays inside the workload's namespace.
CLAUDE.md rules for RBAC need to be uncompromising:
## RBAC rules (HARD)
- One ServiceAccount per workload, never reuse the `default` ServiceAccount
- One Role per ServiceAccount, namespaced where possible
- ClusterRole only when the workload genuinely needs cluster-scope access (operators, controllers)
- NEVER use ClusterRoleBinding to a default ClusterRole (cluster-admin, edit, admin, view) for application workloads
- All verbs explicit, NEVER `verbs: ["*"]`
- All resources explicit, NEVER `resources: ["*"]`
- resourceNames when the workload accesses a known fixed set of objects
- AWS access via IRSA only, never via node IAM role or static credentials
- If a needed RBAC verb is unknown, ASK before guessing
Namespace strategy: one per service, plus shared infrastructure namespaces (ingress-system, cert-manager, external-secrets, monitoring). Each namespace gets a default-deny NetworkPolicy at creation, and workloads add explicit allow rules.
# base/orders/networkpolicy-default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny }
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# base/orders-api/networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: orders-api }
spec:
  podSelector: { matchLabels: { app.kubernetes.io/name: orders-api } }
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: ingress-system } } }]
      ports: [{ protocol: TCP, port: 8080 }]
  egress:
    - to: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } } }]
      ports: [{ protocol: UDP, port: 53 }]
    - to: []
      ports: [{ protocol: TCP, port: 443 }]
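A default-deny in the namespace plus an explicit allow per workload produces zero-trust without manual auditing. One caveat for this stack: with ALB target-type ip the load balancer sends traffic to pod IPs from the VPC subnets, so the prod overlay needs an ipBlock ingress rule for those subnets; the namespaceSelector above only admits an in-cluster controller such as NGINX. Claude needs the pattern in CLAUDE.md or it skips NetworkPolicies entirely. For locking Claude Code's runtime kubectl permissions, the Claude Code permissions guide covers the allowlist.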
Resource limits, HPA, and debugging pods
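Requests are what the scheduler reads to place a pod, and what the HPA uses as the denominator for utilization targets. Limits are enforced at runtime on the node: CPU above the limit is throttled, and a container that exceeds its memory limit is OOM-killed. Getting both right is the difference between an autoscaler that works and one that thrashes.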
# base/orders-api/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: orders-api }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: orders-api }
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 80 } }
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies: [{ type: Percent, value: 100, periodSeconds: 30 }]
    scaleDown:
      stabilizationWindowSeconds: 300
      policies: [{ type: Percent, value: 25, periodSeconds: 60 }]
The autoscaling/v2 API supports multiple metrics and explicit scale behaviour. Slow scale-down (5-minute window, 25% per minute) prevents replica flapping. Fast scale-up (30-second window, 100% per 30 seconds) handles real spikes. Pin these in CLAUDE.md so Claude does not omit the behavior block and fall back to the defaults, which already wait 5 minutes before scaling down but then allow every surplus replica to be removed in a single 15-second period.
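It helps to see how a utilization target turns into a replica count. The HPA's core calculation is desired = ceil(current replicas × observed utilization / target utilization), where utilization is a percentage of the container's request. The sketch below reproduces only that step with integer arithmetic; it ignores the controller's tolerance band and the behavior policies, and the function name is hypothetical.

```shell
# desired = ceil(current_replicas * observed_utilization / target_utilization)
# Arguments are whole numbers; utilization values are percentages of the request.
hpa_desired_replicas() {
  current="$1"
  observed="$2"
  target="$3"
  echo $(( (current * observed + target - 1) / target ))
}
```

With the manifest above, three replicas averaging 90% of their 100m CPU request against a 70% target give `hpa_desired_replicas 3 90 70`, which prints 4. Ten replicas idling at 35% give 5, and the scaleDown policy then limits how quickly the controller is allowed to get there.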
Requests-to-limits ratio is the second tuning knob. Set requests at the steady-state median, limits at the peak you tolerate. For a typical HTTP API: 100m CPU request, 500m CPU limit, 256Mi memory request, 512Mi memory limit. Requests equal to limits (Guaranteed QoS) suits latency-critical or stateful workloads. Limits without requests is a trap: Kubernetes silently copies the limit into the request, which reserves peak capacity for every replica. Deny it in CLAUDE.md.
## Resource and scaling rules
- All workloads set requests AND limits, never just limits
- CPU limit usually 2-5x request, memory limit usually 2x request (tune per workload)
- HPA: minReplicas >= 3 for production, >= 1 for dev
- HPA: separate scaleUp and scaleDown policies, never accept defaults
- HPA: CPU target around 60-70%, memory target around 75-80%
- PodDisruptionBudget: minAvailable: N-1 for stateless, minAvailable: 50% for larger fleets
- Never set a `request` higher than its `limit` (the API server rejects the pod)
PodDisruptionBudgets prevent voluntary evictions from draining the deployment during node upgrades:
# base/orders-api/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: orders-api }
spec:
  minAvailable: 2
  selector: { matchLabels: { app.kubernetes.io/name: orders-api } }
Debugging uses the same kubectl primitives, scoped to dev or staging:
# Always lead with events, they explain most failures
kubectl --context dev-eks -n orders get events --sort-by=.lastTimestamp
# State + rollout
kubectl --context dev-eks -n orders get deploy,pod,svc,ingress
kubectl --context dev-eks -n orders rollout status deployment/orders-api
# Pod-level
kubectl --context dev-eks -n orders logs deployment/orders-api -c api --since=10m
kubectl --context dev-eks -n orders logs --previous -l app.kubernetes.io/name=orders-api
kubectl --context dev-eks -n orders exec -it deployment/orders-api -c api -- /bin/sh
# Resource pressure + network debug pod
kubectl --context dev-eks -n orders top pod
kubectl --context dev-eks -n orders run debug --rm -it --image=nicolaka/netshoot --restart=Never -- bash
Include the debugging recipes in CLAUDE.md verbatim so Claude reaches for them instead of inventing variants:
## Debugging conventions
- ALWAYS pass `--context dev-eks` or `--context staging-eks` explicitly
- First step on any failure: `kubectl get events --sort-by=.lastTimestamp -n <namespace>`
- For crashlooping pods: `kubectl logs --previous`
- For unscheduled pods: `kubectl describe pod` (look at Events + Conditions)
- For OOMKilled: check `lastState.terminated.reason`, increase memory limit
- For exec sessions: use `nicolaka/netshoot` for network debugging, busybox is too sparse
- NEVER use `kubectl edit`, change the manifest in base/ or overlay and re-apply
"Events first" is the single biggest debugging speed-up. Most Kubernetes failures are explained in events. Without the rule Claude goes straight to logs, which is the wrong order. For CI patterns that catch failures before they reach the cluster, the Claude Code deploy guide covers the pipeline side.
Hard rules and final guardrails
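A short, durable rule list at the bottom of every Kubernetes CLAUDE.md is the last line of defence inside the prompt. Claude is told to hold to these regardless of what the user asks for, and the permission settings and admission policies described elsewhere in this guide enforce them when an instruction alone is not enough.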
## Hard rules
1. NEVER use `verbs: ["*"]` or `resources: ["*"]` in a Role or ClusterRole.
2. NEVER bind an application ServiceAccount to a default ClusterRole (cluster-admin, admin, edit).
3. NEVER commit a plaintext `kind: Secret` with a `data:` block, use ExternalSecret.
4. NEVER use the `default` ServiceAccount for a workload, every workload gets its own.
5. NEVER set image tag to `latest`, use a git SHA or immutable tag.
6. NEVER skip resources.requests AND resources.limits, both are required on every container.
7. NEVER skip livenessProbe AND readinessProbe on a long-running workload.
8. NEVER run kubectl against the prod-eks context from a Claude Code session.
9. NEVER `kubectl edit`, all changes go through manifests in version control.
10. NEVER use removed or legacy APIs (extensions/v1beta1, autoscaling/v1, autoscaling/v2beta1, autoscaling/v2beta2, policy/v1beta1).
11. If an API resource, verb, or admission constraint is uncertain, ASK before generating manifests that depend on it.
Claude can hold eleven constraints in mind and apply them consistently. These cover the failure modes that produce real Kubernetes incidents.
The "ask if uncertain" rule deserves emphasis. The API surface is wide enough that Claude's training has gaps, especially around custom resources, admission webhooks, and version-specific schema changes. The honest behaviour, when Claude is unsure whether a field is valid on the API version in use, is to ask. CLAUDE.md makes that explicit.
The .claude/settings.local.json for a Kubernetes project should allow read-only kubectl freely, allow mutating kubectl only on non-prod contexts, and deny destructive verbs outright. Pair the kubectl allowlist with the kustomize and kubeconform commands Claude needs to validate work before applying. For projects combining Kubernetes manifests with Terraform-managed cluster infrastructure, the Claude Code Terraform guide covers the IaC side. For container builds, the Claude Code Docker guide covers the image side. For cluster providers, the Claude Code AWS guide, Claude Code GCP guide, and Claude Code Azure guide cover platform-specific patterns.
Running real Kubernetes systems with Claude Code
The configuration in this guide produces a development environment where manifests pass admission control on the first try, kustomize overlays survive promotion across environments, RBAC stays least-privilege, NetworkPolicies enforce default-deny, resource specs let the autoscaler work, and kubectl mutations stay off production. The result is Claude generating Kubernetes code at the level of a careful platform engineer, not a tutorial copy-paste.
Claude Code performs at the level of context you give it. Without CLAUDE.md it generates manifests without probes, RBAC with wildcard verbs, Secrets in plaintext, and kubectl without context flags. With the configuration above it follows your conventions, asks when uncertain, and lets you focus on the parts of Kubernetes engineering that need a human. The Claude Code best practices guide covers principles across project types. Claudify includes a Kubernetes-specific CLAUDE.md template, pre-configured for kustomize overlays, IRSA-backed ServiceAccounts, default-deny NetworkPolicies, and the manifest conventions in this guide.