SSH Access at Scale: Drop Static Keys for Short-Lived Certificates
The Hidden Time Bomb in Your ~/.ssh Directory
If your team has more than ten engineers and more than a handful of servers, you almost certainly have an SSH key problem—you just may not know it yet.
Static SSH keys are convenient. A developer generates a key pair once, drops the public key into ~/.ssh/authorized_keys on every server they need, and gets to work. It works fine for small teams. But as headcount grows, servers multiply, and engineers come and go, this model quietly accumulates risk:
- No expiry. A key created three years ago is just as valid today as the day it was added—even if the person who created it left the company.
- No central inventory. Keys live in authorized_keys files scattered across hundreds of machines. There is no single place to audit who has access to what.
- Painful revocation. When you need to remove access (for offboarding, a suspected breach, or a role change), you must hunt down every server that holds the key and remove it manually.
- No context. A public key is just a public key. It tells you nothing about when it was issued, what role it was issued for, or what restrictions should apply.
This is key sprawl. The operational cost compounds quietly until a security audit, an incident, or a wake-up call forces a reckoning.
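To get a feel for the size of the problem on a single host, the entries in each authorized_keys file can be listed with a few lines of shell. This is a minimal sketch: the function name list_static_keys is hypothetical, and it assumes each entry keeps the usual "options, key type, key, comment" layout with a one-word comment.

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <key-type> <comment>" for every key entry
# in the given authorized_keys files.
list_static_keys() {
    for file in "$@"; do
        [ -r "$file" ] || continue
        awk -v f="$file" '
            /^[[:space:]]*(#|$)/ { next }          # skip comments and blank lines
            {
                # The key type is the first field that looks like one; any
                # fields before it are per-key options such as from="...".
                for (i = 1; i <= NF; i++) {
                    if ($i ~ /^(ssh-|ecdsa-|sk-)/) {
                        comment = (i + 2 <= NF) ? $(i + 2) : "(no comment)"
                        print f, $i, comment
                        break
                    }
                }
            }
        ' "$file"
    done
}
```

Running something like `list_static_keys /root/.ssh/authorized_keys /home/*/.ssh/authorized_keys` on each server, and comparing the result against the current staff list, is usually enough to show how many entries nobody can account for.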
SSH Certificates: A Better Mental Model
SSH certificates are not a new idea. OpenSSH has supported them since version 5.4, released in 2010, but they remain underused because the “copy the key” habit is hard to break.
Instead of trusting individual public keys, your servers trust a Certificate Authority (CA). Developers do not place their public keys directly on servers. Instead, they ask the CA to sign their public key, producing a certificate. That certificate carries:
- A TTL (time-to-live), after which it becomes automatically invalid
- A list of principals (the Linux usernames the holder may log in as)
- Optional critical options such as source-address restrictions or forced commands
The server checks that the certificate was signed by a trusted CA, has not expired, and lists a principal that matches the account being logged in to. That is it. No per-server authorized_keys files to maintain.
The operational flip is dramatic: instead of managing N keys across M servers (N×M combinations), you manage one CA trust relationship per server and issue certificates on demand.
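The certificate fields described above can be seen without any CA service at all, because ssh-keygen can act as a throwaway CA. The sketch below is for illustration only: it works in a temporary directory, and the key ID "alice", the principal app-readonly, and the 10.0.0.0/8 source range are made-up values.

```shell
#!/bin/sh
workdir=$(mktemp -d)   # throwaway directory; nothing here touches ~/.ssh

# Demo CA and user key pairs (empty passphrases are acceptable only in a demo).
ssh-keygen -q -t ed25519 -N '' -C 'demo-ca' -f "$workdir/ca"
ssh-keygen -q -t ed25519 -N '' -C 'demo-user' -f "$workdir/user"

# Sign the user's public key: key ID "alice", one principal, valid for one
# hour, and usable only from the given source range.
ssh-keygen -q -s "$workdir/ca" -I alice -n app-readonly -V +1h \
    -O source-address=10.0.0.0/8 "$workdir/user.pub"

# ssh-keygen writes the certificate next to the public key as user-cert.pub.
# -L prints its fields: key ID, validity window, principals, critical options.
cert_details=$(ssh-keygen -L -f "$workdir/user-cert.pub")
printf '%s\n' "$cert_details"
```

The output lists the validity window, the principals, and the critical options, which is everything sshd evaluates at login time.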
Setting Up HashiCorp Vault as Your SSH CA
Vault’s SSH secrets engine is a widely adopted implementation of this pattern. A basic setup takes minutes.
Enable the SSH secrets engine
vault secrets enable -path=ssh ssh
vault write ssh/config/ca generate_signing_key=true
This generates a CA key pair inside Vault. The private key never leaves Vault’s encrypted storage. Retrieve the public key to place on your servers:
vault read -field=public_key ssh/config/ca > /etc/ssh/vault_ca.pub
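Servers do not need a Vault token to obtain the CA public key, since the SSH secrets engine also serves it from an unauthenticated HTTP endpoint under the mount path. A minimal sketch, assuming the engine is mounted at ssh/ as above and VAULT_ADDR points at your Vault server; the function name fetch_vault_ca is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: download the CA public key and install it only if the
# response looks like an SSH public key, so a failed request never leaves a
# broken trust file behind.
fetch_vault_ca() {
    dest=$1
    tmp=$(mktemp) || return 1
    if curl -fsS -o "$tmp" "${VAULT_ADDR}/v1/ssh/public_key" &&
        grep -Eq '^(ssh|ecdsa)-' "$tmp"; then
        chmod 0644 "$tmp" && mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A provisioning script could call `fetch_vault_ca /etc/ssh/vault_ca.pub` as root, which keeps Vault credentials off the servers entirely.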
Tell sshd to trust the Vault CA
Add a single line to /etc/ssh/sshd_config on every server:
TrustedUserCAKeys /etc/ssh/vault_ca.pub
Reload sshd. Any unexpired certificate signed by Vault is now accepted on this host, provided one of its principals matches the account the user logs in as.
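When this change is rolled out by script, it helps to make the edit idempotent so that repeated runs do not stack duplicate directives. A minimal sketch; the function name ensure_ca_trust is hypothetical, and it assumes the config file does not end in a Match block (a directive appended after one would apply only inside that block).

```shell
#!/bin/sh
# Hypothetical helper: add the TrustedUserCAKeys directive to an sshd config
# file unless the exact line is already present.
ensure_ca_trust() {
    config_file=$1
    ca_file=$2
    line="TrustedUserCAKeys $ca_file"
    if grep -qxF "$line" "$config_file"; then
        return 0   # already configured; leave the file untouched
    fi
    printf '%s\n' "$line" >> "$config_file"
}

# Typical use on a server, run as root. Validate the config before reloading;
# the service is named "ssh" rather than "sshd" on some distributions.
#   ensure_ca_trust /etc/ssh/sshd_config /etc/ssh/vault_ca.pub &&
#       sshd -t && systemctl reload sshd
```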
Define roles
Roles encode the access policy—which principals are allowed, the TTL, and any constraints:
vault write ssh/roles/prod-readonly \
    key_type=ca \
    allow_user_certificates=true \
    allowed_users=app-readonly \
    default_user=app-readonly \
    ttl=1h \
    max_ttl=4h
Different roles for different environments (prod versus staging, read-only versus admin) give you fine-grained control without maintaining separate key pairs per environment.
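As an illustration of a second role, the sketch below defines a hypothetical staging-admin role from a JSON document, which is the easiest way to pass map-valued parameters such as default_extensions. The role name, usernames, and TTLs are assumptions; the permit-pty extension is included because certificates without it cannot open an interactive terminal.

```shell
#!/bin/sh
# Hypothetical role for a staging environment: longer TTLs and two permitted
# login accounts. Wrapped in a function so that loading this file changes
# nothing until the function is called.
define_staging_admin_role() {
    vault write ssh/roles/staging-admin - <<'EOF'
{
  "key_type": "ca",
  "allow_user_certificates": true,
  "allowed_users": "deploy,ubuntu",
  "default_user": "deploy",
  "default_extensions": { "permit-pty": "" },
  "ttl": "8h",
  "max_ttl": "24h"
}
EOF
}
```

Calling `define_staging_admin_role` with a suitably privileged Vault token creates or updates the role in place.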
Issue a certificate
When a developer needs access, they sign their public key against the appropriate role:
vault write -field=signed_key ssh/sign/prod-readonly \
    public_key=@$HOME/.ssh/id_ed25519.pub > ~/.ssh/id_ed25519-cert.pub
# ssh loads id_ed25519-cert.pub automatically because it sits next to the private key
ssh -i ~/.ssh/id_ed25519 app-readonly@prod-server
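If the role TTL is one hour, the certificate stops working for new logins one hour after it was issued. Sessions that are already open are not cut off, because sshd checks the certificate only at login. No cleanup, no revocation chase.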
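Since certificates now expire routinely, a login wrapper may want to check whether the current one is still usable before asking Vault for another. The sketch below relies on the "Valid: from ... to ..." line that ssh-keygen -L prints, with timestamps in local time; both function names are hypothetical.

```shell
#!/bin/sh
# Reads `ssh-keygen -L` output on stdin and prints the expiry timestamp,
# for example 2024-05-01T11:00:00. Prints nothing if no window is found.
cert_valid_until() {
    sed -n 's/^[[:space:]]*Valid: from .* to \(.*\)$/\1/p'
}

# Succeeds only if the certificate file exists and expires later than now.
cert_is_current() {
    expiry=$(ssh-keygen -L -f "$1" 2>/dev/null | cert_valid_until)
    now=$(date +%Y-%m-%dT%H:%M:%S)
    # Timestamps in this format, in the same time zone, sort as plain strings.
    [ -n "$expiry" ] && expr "$expiry" \> "$now" >/dev/null
}

# Possible use: re-sign only when needed.
#   cert_is_current ~/.ssh/id_ed25519-cert.pub || echo "certificate needs renewal"
```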
The vault ssh Helper
Vault ships a convenience command that wraps the entire flow:
vault ssh -role=prod-readonly -mode=ca app-readonly@prod-server
It signs the key, writes the certificate to a temporary file, opens the SSH connection, and cleans up when the session ends. Developers run one command; the certificate lifecycle is invisible to them.
Tie Certificates to Your Identity Provider
Vault’s auth backends let you connect certificate issuance to your existing SSO. With OIDC or LDAP auth, a developer authenticates with their company credentials, receives a scoped Vault token, and can only request certificates for roles their group membership permits.
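Offboarding becomes a one-step action: disable the account in your IdP. The developer can no longer authenticate to Vault, so no new certificates can be issued. Vault tokens that were already issued last until their own TTL runs out unless you revoke them, so keep token TTLs short as well. Any live certificates expire within their TTL window, with no server changes required.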
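The link between group membership and certificate roles is an ordinary Vault policy. As a minimal sketch, assuming the LDAP auth method is enabled at auth/ldap and using a hypothetical policy name and group name, the mapping might look like this:

```shell
#!/bin/sh
# Hypothetical helper: allow members of one LDAP group to sign keys against
# the prod-readonly role and nothing else.
grant_prod_readonly_to_group() {
    ldap_group=$1   # for example "sre" (illustrative)

    # Policy that permits only the signing endpoint of this one role.
    vault policy write ssh-prod-readonly - <<'EOF'
path "ssh/sign/prod-readonly" {
  capabilities = ["create", "update"]
}
EOF

    # Attach the policy to the LDAP group.
    vault write "auth/ldap/groups/$ldap_group" policies=ssh-prod-readonly
}
```

With OIDC the policy is the same; only the group-to-policy mapping differs, and it depends on how your identity groups are set up.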
Migrating Without Disrupting Developers
The transition does not have to be a big bang. A staged rollout works well:
- Add CA trust first. Add TrustedUserCAKeys to your servers without removing existing authorized_keys entries. Both methods work in parallel during rollout.
- Onboard new joiners with certificates only. Stop distributing static keys for anyone new.
- Set a retirement window for old keys. Give current engineers 30 days to switch over. Use auth logs to confirm which static keys are still active before removing them.
- Handle service accounts separately. Automated jobs should authenticate to Vault with AppRole or a cloud auth backend and request their own short-lived credentials, not carry long-lived SSH keys.
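For the retirement step, sshd's own log lines show which static keys are still being used. The sketch below counts successful logins per user and key fingerprint, skipping certificate logins. It assumes the common OpenSSH log format ("Accepted publickey for USER from ADDR port N ssh2: TYPE FINGERPRINT"), and the function name static_key_logins is hypothetical.

```shell
#!/bin/sh
# Reads sshd log lines on stdin and prints "<count> <user> <fingerprint>" for
# logins made with a plain public key. Certificate logins are skipped because
# their key type ends in -CERT.
static_key_logins() {
    awk '
        /Accepted publickey for/ {
            user = ""; keytype = ""; fp = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "for" && user == "") user = $(i + 1)
                if ($i == "ssh2:") { keytype = $(i + 1); fp = $(i + 2) }
            }
            if (keytype != "" && keytype !~ /-CERT$/) count[user " " fp]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

On a systemd host something like `journalctl -u ssh --since "30 days ago" | static_key_logins` gives the list of keys that would break if static entries were removed today; fingerprints can be matched to authorized_keys entries with `ssh-keygen -lf`.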
Alternatives Worth Knowing
Vault is not the only path:
- AWS EC2 Instance Connect pushes a one-time SSH public key to the instance, valid for 60 seconds and authorized through IAM. Minimal new infrastructure if your fleet is AWS-only.
- Teleport adds session recording and a web UI on top of certificate-based SSH. Higher operational complexity, good for compliance-heavy environments.
- Step CA (Smallstep) is a lightweight open-source CA focused on SSH and TLS certificates—simpler than Vault if you only need certificate authority functionality.
- Google Cloud OS Login ties SSH access to Google identities and IAM roles, ideal for GCP-native teams.
The underlying pattern is the same across all of them: short-lived credentials issued on demand and tied to an identity, instead of manually distributed static keys.
What You Actually Gain
Once your fleet trusts a CA, the operational wins are concrete:
- Revocation happens at the identity layer, not per server. Disabling an account stops new certificates, and anything already issued ages out within its TTL.
- Audit trails in Vault, once an audit device is enabled, show which identity requested a certificate for which role and when.
- Certificate TTLs make access time-bound by default. A developer who gets a one-hour certificate cannot quietly maintain standing access.
- No per-server key inventory. The CA public key file on each server is the only artifact you manage centrally.
- Smaller blast radius on compromise. A stolen certificate with a one-hour TTL is a far smaller problem than a stolen private key with no expiry.
The initial setup is measured in days, not months, and the staged retirement of old keys can follow over a few weeks. The payoff is an access model that scales with your team without accumulating hidden technical debt in scattered authorized_keys files across your fleet.