Build a Production Cloud-Native Platform with Kairos, k0rdent, and bindy

The Problem with Commercial PaaS

Platform teams face a recurring dilemma: build a production-grade internal developer platform from scratch, or pay for a heavyweight commercial PaaS that locks them in, gets expensive at scale, and hides the machinery they actually need to understand.

The open-source ecosystem has quietly matured to the point where a small platform team can assemble a credible alternative using composable, CNCF-aligned tools. The combination of Kairos, k0rdent, and bindy covers the three layers every cluster fleet needs: an immutable node OS, cluster lifecycle management, and workload delivery.

Layer 1: Kairos — Immutable Nodes, No SSH Required

Kairos is an immutable Linux distribution purpose-built for Kubernetes nodes. The core idea is straightforward: your node OS is a container image. Upgrades are OCI pulls, not apt upgrade runs. The filesystem is read-only at runtime. If a node goes sideways, you do not SSH in and patch it—you roll a new image.

This approach eliminates an entire class of configuration drift. Every node running the same image tag is, by definition, identical. There are no snowflake nodes that accumulated months of manual changes.
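The "identical by definition" claim is easy to spot-check. The sketch below is a small helper, not part of Kairos: `find_drift` is a hypothetical name, and it assumes the OS image string each node reports reflects the Kairos release it booted from.

```shell
#!/bin/sh
# find_drift EXPECTED_IMAGE
# Reads "node-name os-image" lines on stdin and prints the names of
# nodes whose reported image differs from EXPECTED_IMAGE.
find_drift() {
  expected="$1"
  while read -r node image; do
    [ -n "$node" ] || continue                      # skip blank lines
    [ "$image" = "$expected" ] || printf '%s\n' "$node"
  done
}

# Example (requires cluster access, so it is left commented out):
# kubectl get nodes --no-headers \
#   -o custom-columns=NAME:.metadata.name,OS:.status.nodeInfo.osImage \
#   | find_drift "<expected os image string>"
```

An empty result means every node reports the expected image. Any name printed is a node to re-image rather than patch.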

Bootstrapping with Peer-to-Peer

Kairos uses a peer-to-peer network layer built on libp2p to bootstrap cluster nodes without a central coordinator. You burn a Kairos image to a node, supply a shared network token, and nodes discover each other and form a cluster—no DHCP reservation, no external bootstrap server required.

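# Minimal cloud-config for a first control-plane node.
# The network token must be identical on every node, so generate it once
# with the Kairos tooling for your release and export it before running this.
# An arbitrary random string is not a valid token.
# Some Kairos releases read the token from a top-level "p2p:" block instead
# of "kairos:"; check the configuration reference for your version.
: "${NETWORK_TOKEN:?export the shared Kairos network token first}"
cat > cloud-config.yaml <<EOF
#cloud-config
kairos:
  network_token: "${NETWORK_TOKEN}"
k3s:
  enabled: true
  args:
    - --cluster-init
EOF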

For edge and bare-metal deployments, this matters enormously. You can ship pre-flashed nodes to a remote site, power them on, and have a working cluster without a technician touching a keyboard.

OCI-Based OS Upgrades

Upgrades flow through the System Upgrade Controller. You update an image tag in a manifest; the controller then drains nodes one at a time, applies the new OS image, and reboots each one. A rollback is a tag revert.

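apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: kairos-upgrade
  namespace: system-upgrade          # assumed: the namespace the controller runs in
spec:
  concurrency: 1                     # upgrade one node at a time
  version: "v2.5.0"
  serviceAccountName: system-upgrade
  nodeSelector:                      # selects every node; narrow this in practice
    matchExpressions:
      - {key: kubernetes.io/hostname, operator: Exists}
  drain:
    force: false
  upgrade:
    image: quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v2.5.0
    command: ["/usr/sbin/suc-upgrade"]   # entrypoint shown in the Kairos docs; verify for your release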

This is GitOps-native: your OS version lives in a repo alongside your application manifests. The same pull-request workflow that ships features ships OS upgrades.
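In practice the "OS upgrade as a pull request" step is a one-line change to the Plan manifest. Here is a minimal sketch of a helper that makes that change, assuming the manifest has a single `version:` field and that the image tag embeds the same version string. The name `bump_os_version` and the version `v2.6.0` used in the usage comment are hypothetical.

```shell
#!/bin/sh
# bump_os_version FILE NEW_VERSION
# Replaces the current version value everywhere it appears in FILE:
# the version field and, if the tag embeds it, the image reference.
bump_os_version() {
  file="$1"
  new="$2"
  # Current version: the first "version:" line, with optional quotes stripped.
  old="$(sed -n 's/^ *version: *"\{0,1\}\([^"]*\)"\{0,1\} *$/\1/p' "$file" | head -n 1)"
  [ -n "$old" ] || { echo "no version field in $file" >&2; return 1; }
  # Escape dots so the old version is matched literally, not as a regex.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
  tmp="$(mktemp)" || return 1
  sed "s|$old_re|$new|g" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$file"               # overwrite in place, keeping file permissions
  rm -f "$tmp"
}

# Example: bump_os_version kairos-upgrade.yaml v2.6.0
# then commit the change and open a pull request.
```

A rollback is the same call with the previous version, or simply a `git revert` of the commit.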

Layer 2: k0rdent — Cluster Lifecycle at Scale

Once you have a reliable node substrate, the next problem is managing many clusters. A single platform team might own dozens of clusters across environments, regions, and cloud providers. k0rdent is a Kubernetes-native cluster lifecycle manager built on top of Cluster API (CAPI), adding a higher-level abstraction that makes multi-cluster operations tractable.

Why Not Raw Cluster API?

Cluster API is powerful but verbose. Provisioning one cluster requires coordinating several interdependent CRDs across infrastructure, bootstrap, and control-plane providers. k0rdent wraps this with ClusterTemplates—opinionated, versioned blueprints that encode your organization’s cluster standards.

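apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ClusterDeployment
metadata:
  name: prod-eu-west-1
spec:
  # Template names and config keys come from the ClusterTemplate you install;
  # list what is available with: kubectl get clustertemplate -A
  template: aws-standalone-cp-0-0-5
  credential: aws-creds
  config:
    region: eu-west-1
    controlPlaneNumber: 3
    controlPlane:
      instanceType: m5.xlarge
    workersNumber: 5
    worker:
      instanceType: m5.2xlarge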

One manifest, one cluster. k0rdent handles the CAPI orchestration underneath.

Multi-Cloud Without the YAML Avalanche

k0rdent ships provider integrations for AWS, Azure, GCP, vSphere, and bare metal. The ClusterTemplate abstraction means you describe what you want, not how each provider achieves it. Switching from AWS to Azure means swapping the template name and credential reference—your higher-level config stays largely the same.
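To make the "swap the template and credential" point concrete, here is a rough sketch of a manifest generator. `render_cluster` is a hypothetical helper, not a k0rdent tool. It takes the provider-specific config on stdin because the keys under `config` are defined by each template and may differ between providers.

```shell
#!/bin/sh
# render_cluster NAME TEMPLATE CREDENTIAL
# Prints a ClusterDeployment manifest. Provider-specific config is read
# from stdin and indented under spec.config.
render_cluster() {
  cat <<EOF
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ClusterDeployment
metadata:
  name: $1
spec:
  template: $2
  credential: $3
  config:
EOF
  sed 's/^/    /'                    # indent the config read from stdin
}

# Example (template and credential names are placeholders):
# printf 'region: eu-west-1\n' \
#   | render_cluster prod-eu-west-1 "$AWS_TEMPLATE" aws-creds > prod-eu-west-1.yaml
```

Moving a cluster definition to another provider then means changing the second and third arguments and adjusting whatever config keys that provider's template expects.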

The management cluster runs k0rdent’s controllers. Target clusters register themselves as they come online, making them immediately available for workload delivery.

Layer 3: bindy — Placing Workloads Across the Fleet

With a fleet of clusters under management, the final challenge is placing workloads correctly. bindy handles cluster-aware application delivery: it evaluates cluster capabilities, labels, and constraints to route applications to the right clusters without manual targeting.

Think of bindy as a policy engine sitting between your application manifests and your cluster fleet. You declare where an application should run—by region, environment tier, or capability tag—and bindy resolves that to concrete clusters and drives delivery through a GitOps engine.

apiVersion: bindy.io/v1alpha1
kind: ApplicationBinding
metadata:
  name: payments-service
spec:
  application:
    source:
      repoURL: https://github.com/org/payments
      path: helm/payments
  selector:
    matchLabels:
      tier: production
      region: eu-west

This pays dividends once cluster counts grow past what one person can track mentally. Rather than maintaining per-cluster Argo CD ApplicationSets by hand, bindy turns cluster metadata into a routing layer.
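The selector semantics are the familiar Kubernetes `matchLabels` rule: a cluster qualifies only if it carries every requested label. The sketch below illustrates that rule on a plain-text inventory. It is not bindy's implementation, and both `match_clusters` and the inventory format are made up for the example.

```shell
#!/bin/sh
# match_clusters SELECTOR
# SELECTOR: comma-separated key=value pairs, all of which must match.
# stdin: one cluster per line, "name key1=v1,key2=v2,...".
match_clusters() {
  selector="$1"
  while read -r name labels; do
    [ -n "$name" ] || continue       # skip blank lines
    ok=1
    old_ifs="$IFS"
    IFS=','
    for want in $selector; do
      case ",$labels," in
        *",$want,"*) ;;              # this pair is present on the cluster
        *) ok=0 ;;                   # one missing pair disqualifies it
      esac
    done
    IFS="$old_ifs"
    if [ "$ok" -eq 1 ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example:
# printf 'prod-eu-1 tier=production,region=eu-west\nstage-eu-1 tier=staging,region=eu-west\n' \
#   | match_clusters 'tier=production,region=eu-west'
```

Adding a label to a new cluster is enough to bring it into scope. No per-application target list has to change.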

Wiring It Together: The Reference Architecture

┌──────────────────────────────────┐
│  bindy  (workload distribution)  │
├──────────────────────────────────┤
│  k0rdent (cluster lifecycle)     │
├──────────────────────────────────┤
│  Kairos  (immutable node OS)     │
└──────────────────────────────────┘
          Management Cluster

A management cluster—itself running on Kairos—hosts k0rdent’s controllers and bindy’s control plane. Target clusters are provisioned by k0rdent, either on cloud infrastructure via CAPI or on bare metal via Kairos P2P bootstrapping. bindy watches the cluster inventory and routes applications based on selector rules.

Day-0: Standing Up the Management Cluster

  1. Flash a Kairos image to your first node with a cloud-config that enables k3s or k0s.
  2. Bootstrap additional control-plane nodes using the shared P2P network token.
  3. Install k0rdent into the management cluster via Helm.
  4. Configure provider credentials (AWS IAM role, Azure service principal, and so on).

The Helm install from step 3:

# Confirm the chart source and pin a --version for the k0rdent release you target.
helm repo add k0rdent https://k0rdent.github.io/charts
helm repo update
helm install k0rdent k0rdent/k0rdent \
  --namespace k0rdent-system \
  --create-namespace
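Before running the Day-0 steps from a workstation or CI job, it helps to fail fast if the required CLIs are missing. This is a minimal preflight sketch. `require_cmds` is a hypothetical helper, and the list of tools you pass to it depends on your setup.

```shell
#!/bin/sh
# require_cmds CMD...
# Returns non-zero and names each missing command on stderr.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: require_cmds kubectl helm || exit 1
```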

Day-1: Provisioning Target Clusters

Apply ClusterDeployment manifests for each environment. k0rdent reconciles them, calling CAPI to provision infrastructure and install a Kubernetes distribution. Once a cluster reports Ready, k0rdent registers it in the inventory automatically.
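Provisioning takes minutes, not seconds, so automation around Day-1 usually needs to block until the cluster is ready. One possible shape is the polling helper below. `wait_until` and `cluster_ready` are hypothetical names, and `cluster_ready` assumes the ClusterDeployment exposes a `Ready` condition in its status.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND...
# Re-runs COMMAND until it succeeds; returns 1 once TIMEOUT_SECONDS have elapsed.
wait_until() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  until "$@"; do
    [ "$elapsed" -lt "$timeout" ] || return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# cluster_ready NAME
# Succeeds when the ClusterDeployment's Ready condition is "True" (assumed condition name).
cluster_ready() {
  status="$(kubectl get clusterdeployment "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  [ "$status" = "True" ]
}

# Example: wait_until 1800 30 cluster_ready prod-eu-west-1
```

If the resource does expose that condition, `kubectl wait --for=condition=Ready` with a `--timeout` does the same job without a custom loop.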

Day-2: Shipping Applications

Add ApplicationBinding resources pointing at your Helm charts or Kustomize overlays. bindy resolves the selector to matching clusters and handles delivery. Upgrades are manifest updates—the same GitOps loop that manages cluster infrastructure manages application rollouts.

What You Do Not Need to Build

This stack deliberately leaves room for existing tools:

  • Observability: Add Prometheus and Grafana as cluster add-ons inside your ClusterTemplate.
  • Secrets: External Secrets Operator syncs from Vault or cloud secret managers into target clusters.
  • Networking: Cilium or Calico installs as a CAPI addon; bindy can enforce a network-policy capability label.
  • GitOps engine: Flux or Argo CD can sit underneath bindy for the actual apply step.

You are assembling, not re-inventing. The platform is the integration, not the components.

Trade-offs to Understand Before Committing

Operational complexity is real. Three unfamiliar projects mean three sets of CRDs, three upgrade tracks, and three communities to follow. Budget meaningful ramp-up time before production traffic lands.

Kairos on cloud VMs is a weaker fit. The immutable OS story shines on bare metal and edge. On cloud instances with managed node groups (EKS, GKE, AKS), the value is thinner and the overhead is higher. Use Kairos where it makes the most sense: self-managed nodes, edge, and hybrid deployments.

k0rdent reduces CAPI complexity; it does not remove it. It cuts verbosity substantially, but Cluster API itself has a real learning curve. If you are managing fewer than roughly ten clusters, simpler tooling may be enough.

The Payoff

A team that invests in this stack gets a repeatable, auditable, GitOps-native platform it fully owns. Cluster definitions, OS images, and application placements all live in version control. There is no vendor console to log into, no proprietary API to reverse-engineer, and no per-seat license growing linearly with your headcount.

For platform teams willing to put in the initial setup work, the open-source stack is no longer a compromise—it is the pragmatic choice.
