GitOps JIT privilege elevation
Purpose
Default developers and agents to read-only access for high-blast-radius systems (application databases, infrastructure-as-code state, secret stores). Write operations require a pull request requesting time-boxed elevation, reviewed and approved by a human, auto-granted on merge and auto-revoked on TTL expiry. The PR itself becomes the audit artefact.
Solves the “admin-by-default for convenience” failure mode without introducing friction that developers route around: the JIT elevation path is the sanctioned path, and it is agent-invokable.
Architecture
- Default credentials are read-only for app DB, Terraform/Pulumi state, secret stores, production IAM.
- When an agent or developer needs to write, they open a pull request against a dedicated GitOps repository (e.g. ops-elevations). The PR describes the requested elevation using a structured template: target system, scope of write, duration (TTL), justification, rollback plan, risk tier.
- A reviewer approves the PR. Reviewer selection is policy-enforced: specific humans per risk tier, with agent approval explicitly forbidden.
- On merge, a CI pipeline grants the elevated credential with the TTL baked in. Common implementations: HashiCorp Vault with dynamic credentials, AWS STS session tokens, GCP short-lived service account credentials, cloud-native IAM session grants.
- On TTL expiry (or explicit revoke PR), the credential is revoked automatically. Unused elevations auto-expire without manual cleanup.
- The PR history, merge log, and grant/revoke events together form the audit trail.
The load-bearing design choice is structural enforcement: the default credential genuinely cannot write, regardless of the agent’s request, until the elevation pipeline has fired. A procedural “please don’t write” policy is not this recipe.
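The grant-on-merge, revoke-on-expiry lifecycle can be sketched as a small in-memory model. Everything here is hypothetical. ElevationBroker, on_merge, revoke_pr, sweep and can_write are illustrative names, not the API of Vault, AWS STS or any other issuer, and secrets.token_urlsafe only stands in for a real dynamic credential. The model shows the structural property: can_write is false unless a live, unexpired grant exists, and no grant can be created without a positive TTL.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


@dataclass
class Grant:
    pr_number: int          # merged elevation PR that authorised this grant
    target: str             # e.g. "app-db" (hypothetical target name)
    scope: str              # e.g. "UPDATE orders"
    credential: str         # opaque secret; must never be echoed to CI logs
    expires_at: datetime
    revoked: bool = False


@dataclass
class ElevationBroker:
    """Hypothetical in-memory stand-in for a real issuer (Vault, STS, ...)."""
    grants: List[Grant] = field(default_factory=list)
    # Audit events: (kind, pr_number, timestamp).
    events: List[Tuple[str, int, datetime]] = field(default_factory=list)

    def on_merge(self, pr_number: int, target: str, scope: str,
                 ttl: timedelta, now: datetime) -> Grant:
        """Called by the pipeline when an elevation PR merges."""
        # The TTL is fixed at grant time; an open-ended grant cannot be issued.
        if ttl is None or ttl <= timedelta(0):
            raise ValueError("elevation requires a positive TTL")
        grant = Grant(pr_number, target, scope,
                      credential=secrets.token_urlsafe(32),
                      expires_at=now + ttl)
        self.grants.append(grant)
        self.events.append(("grant", pr_number, now))
        return grant

    def _revoke(self, grant: Grant, now: datetime) -> None:
        grant.revoked = True
        self.events.append(("revoke", grant.pr_number, now))

    def revoke_pr(self, pr_number: int, now: datetime) -> None:
        """Explicit revoke, e.g. triggered by a revoke PR."""
        for grant in self.grants:
            if grant.pr_number == pr_number and not grant.revoked:
                self._revoke(grant, now)

    def sweep(self, now: datetime) -> List[int]:
        """Scheduled job: revoke every live grant whose TTL has elapsed."""
        expired = [g for g in self.grants if not g.revoked and g.expires_at <= now]
        for grant in expired:
            self._revoke(grant, now)
        return [g.pr_number for g in expired]

    def can_write(self, target: str, now: datetime) -> bool:
        """Default is read-only: write exists only through a live, unexpired grant."""
        return any(g.target == target and not g.revoked and g.expires_at > now
                   for g in self.grants)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    broker = ElevationBroker()
    broker.on_merge(101, "app-db", "UPDATE orders", timedelta(hours=1), t0)
    print(broker.can_write("app-db", t0 + timedelta(minutes=30)))  # True
    print(broker.sweep(t0 + timedelta(hours=2)))                    # [101]
    print(broker.can_write("app-db", t0 + timedelta(hours=2)))      # False
```

In a real deployment the merge webhook would trigger on_merge. The sweep step could be replaced by the issuer's own expiry, such as Vault lease TTLs or STS session durations. The events list is the grant/revoke half of the audit trail.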
Criteria advanced
- PL4-least-privilege (IAM scoped read-only by default): direct level-2 contributor. “Strict least-privilege, write requires explicit elevation” is the recipe’s definition. Level-3 is reachable when elevation patterns are logged, recurring legitimate elevations get scoped permanent grants, and unused permissions are auto-revoked. All three fall naturally out of a GitOps-based implementation with good PR analytics.
- PL4-branch-protection (Branch protection and source-control write scoping): the recipe’s integrity depends on the GitOps repo’s own main branch being protected with a human-approval requirement. This is a prerequisite, not an advancement. However, deploying GitOps JIT forces the team to take PL4-branch-protection seriously on the GitOps repo specifically, which is often where PL4-branch-protection gaps appear in practice.
- PL2-external-pr-review (External PR review): direct level-2 contributor on the subset of PRs that are elevation requests. Layered review (an agent pre-check that the diff is within policy, plus human approval) fits elevation PRs naturally.
- PL2-agent-audit-trail (Agent action audit trail): direct level-2 contributor. The PR is the audit artefact. Decision reasoning (PR description), diff (the requested scope), reviewer identity (approver), timestamp, merge outcome, and downstream grant/revoke events are all captured natively by the git platform and queryable via the source-control integration.
- PL3-source-control (Source control interaction): level-2 prerequisite. The recipe requires agents to open elevation PRs and query PR history natively, which is exactly what PL3-source-control level-2 demands.
- PL5-pipeline-reliability (Pipeline reliability): the trigger from PR merge to actual privilege grant is pipeline plumbing. Deploying this recipe forces investment in reliable webhook handling, which advances PL5-pipeline-reliability regardless of other pipeline work.
- PL4-release-strategy (Canary / blue-green / partial release): the same “agent bounded by platform constraints” shape, applied to writes rather than deploys. The agent cannot modify the elevation policy template and cannot skip the approval gate; these bounds are structural.
- PL1-decision-records (Decision records, ADRs): each elevation PR description is a micro-ADR covering the decision, the context, the alternatives considered, and the rationale. Well-structured PR templates make this automatic.
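The level-3 PL4-least-privilege signal, recurring legitimate elevations, can be mined from the elevation history. This is a hypothetical sketch that assumes elevation PRs have already been parsed into (target, scope) records. Elevation, permanent_grant_candidates and the default threshold of five are illustrative placeholders. Grouping on the exact (target, scope) pair keeps any resulting permanent grant as narrow as the elevations it replaces.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Elevation:
    target: str   # e.g. "app-db" (hypothetical)
    scope: str    # e.g. "UPDATE orders.status"


def permanent_grant_candidates(
    history: List[Elevation], min_count: int = 5
) -> List[Tuple[str, str, int]]:
    """Return (target, scope, count) for shapes requested at least min_count times.

    The output is a review queue, not an automatic grant: a human still decides
    whether each recurring shape becomes a scoped permanent grant.
    """
    counts = Counter((e.target, e.scope) for e in history)
    candidates = [(t, s, n) for (t, s), n in counts.items() if n >= min_count]
    # Most frequent first; ties keep first-seen order because sorted() is stable.
    return sorted(candidates, key=lambda row: row[2], reverse=True)
```

For example, six identical "UPDATE orders.status" elevations against "app-db" would be flagged at the default threshold, while a one-off "DROP TABLE" request would not.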
Prerequisites
Four structural prerequisites must be at level-2, not merely exist, for this recipe to score:
- PL4-branch-protection ≥ 2. Protected branches, human approval required, bypass audited. Without this, the GitOps repo can be self-merged and the gate is procedural.
- PL5-pipeline-reliability ≥ 2. Reliable pipeline with agent-driven transitions. Without this, elevation PRs get stuck pending, and developers learn to bypass the mechanism via backdoor credentials.
- PL3-source-control ≥ 2. The agent can open PRs and query PR history natively. Without this, elevation is human-only: the recipe still works but loses agent-invocability, which is most of the value.
- PL2-external-pr-review ≥ 2. PR review discipline, at least “layered agent + human glance”. Rubber-stamp reviews on elevation PRs defeat the mechanism silently.
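The scoring rule is a conjunction. A single prerequisite below level-2 stops the recipe scoring, regardless of how strong the others are. This minimal sketch assumes criterion levels are recorded as a dict keyed by criterion ID. recipe_scores is a hypothetical name, and criteria with no recorded level are treated as level 0.

```python
from typing import Dict, List, Tuple

# Prerequisite criteria and what breaks when each is below level-2.
PREREQUISITES = {
    "PL4-branch-protection": "GitOps repo can be self-merged; the gate is procedural",
    "PL5-pipeline-reliability": "elevation PRs stall; developers bypass via backdoor credentials",
    "PL3-source-control": "elevation is human-only; agent-invocability is lost",
    "PL2-external-pr-review": "rubber-stamp reviews defeat the mechanism silently",
}


def recipe_scores(levels: Dict[str, int]) -> Tuple[bool, List[str]]:
    """Score only if every prerequisite is at level-2 or above.

    Criteria missing from `levels` count as level 0. Returns (scores, gaps).
    """
    gaps = [
        f"{criterion} < 2: {consequence}"
        for criterion, consequence in PREREQUISITES.items()
        if levels.get(criterion, 0) < 2
    ]
    return (not gaps, gaps)
```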
These prerequisites are not just gating; they are the reason this one recipe advances so many criteria. The mechanism is load-bearing because the substrate is load-bearing.
Failure modes
- Agent self-approval. If an agent can approve another agent’s elevation PR (or its own via a proxy identity), the gate collapses. Mitigation: policy rules on approver identity — humans-only for elevation PRs, explicit rejection of agent approvers at the platform level.
- Missing TTL. Elevation granted, task completed, privilege never revoked → read-write drift back toward admin-by-default. Mitigation: TTL is required at grant time, not optional; pipelines refuse elevation PRs without a TTL field.
- Single-tier approval. Treating DROP TABLE the same as +1 replica is wrong. Mitigation: risk tiers in the elevation template with differentiated approver rules. High-risk requires named senior approvers; low-risk may accept any human reviewer.
- Pipeline flakiness. Elevations stuck pending for minutes or hours teach developers to bypass the mechanism. Mitigation: monitor a pipeline SLO specifically on the elevation path; page on failures.
- Rubber-stamp reviews. The human approves without understanding, especially under deadline pressure. Mitigation: structured PR template forces the author to articulate justification and rollback plan; reviewer checklist makes skipped sections visible; periodic audit of approval-to-incident correlation.
- Credential leak via CI logs. The elevated credential flows through the pipeline; poorly configured CI logging echoes it. Mitigation: masked variables, log scanners, and the standard secret-hygiene disciplines.
- Privilege accretion via “just make it permanent”. Recurring elevations get converted into broad permanent grants, defeating the recipe. Mitigation: PL4-least-privilege level-3 explicitly endorses “recurring legitimate elevations get scoped permanent grants”, which is good because the permanent grant is scoped, not broad admin. The failure mode is scope inflation, not permanence itself.
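The approval-related mitigations above (agent self-approval, missing TTL, single-tier approval, and the template half of the rubber-stamp guard) reduce to a single merge-gate decision. In practice that decision lives in the git platform's branch-protection rules or a policy engine; the Python below only illustrates the logic. Identity, ElevationPR, merge_blockers, the two-tier "low"/"high" split and the SENIOR_APPROVERS names are all hypothetical. The sketch also assumes proxy identities have already been resolved to the underlying principal before the author comparison.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional

# Hypothetical roster of named senior approvers for the high-risk tier.
SENIOR_APPROVERS = {"alice", "bob"}


@dataclass(frozen=True)
class Identity:
    name: str
    is_human: bool   # agents and bots are never valid approvers


@dataclass
class ElevationPR:
    author: Identity
    risk_tier: str              # "low" or "high" in this simplified sketch
    ttl: Optional[timedelta]    # None models a template with the TTL field left empty
    justification: str
    rollback_plan: str


def merge_blockers(pr: ElevationPR, approvers: List[Identity]) -> List[str]:
    """Return the reasons this elevation PR must not merge (empty list = mergeable)."""
    blockers = []
    # Missing TTL: refuse rather than default to an open-ended grant.
    if pr.ttl is None or pr.ttl <= timedelta(0):
        blockers.append("missing or non-positive TTL")
    # Rubber-stamp guard: required template sections must be filled in.
    if not pr.justification.strip() or not pr.rollback_plan.strip():
        blockers.append("justification and rollback plan are required")
    # Agent self-approval: agent approvals and the author's own approval never count.
    valid = [a for a in approvers if a.is_human and a.name != pr.author.name]
    if not valid:
        blockers.append("no human approver other than the author")
    elif pr.risk_tier == "high" and not any(a.name in SENIOR_APPROVERS for a in valid):
        # Single-tier approval guard: high risk needs a named senior approver.
        blockers.append("high-risk tier requires a named senior approver")
    return blockers
```

Returning every blocker, not just the first, makes skipped sections visible to the reviewer in one pass.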
Open design questions
- Risk-tier taxonomy. What’s the canonical set of tiers? A probable minimum: cosmetic (cache clear, read replica promote), reversible-write (INSERT, +1 replica, config flag flip), semi-destructive (UPDATE, migration, IAM change), destructive (DROP, DELETE, infra teardown). Each tier needs its own approver policy and TTL ceiling. Worth a dedicated design pass before rolling out broadly.
- Which credential-issuing substrate? Vault, cloud-native STS, short-lived JWT, something else? It depends on the target platform; a multi-platform project may need more than one. The first deployment picks the easiest substrate, and the recipe generalises over time.
- How does the agent discover what elevations it’s allowed to request? A hard-coded template forces human intervention on every new pattern; a permissive template allows silent scope expansion via JIT requests. Middle ground: agent-readable policy document listing allowed elevation shapes per risk tier; requests outside the document require a separate policy-change PR.
- Integration with incident response. During an incident, the normal elevation flow may be too slow. Emergency-elevation path (e.g. break-glass with post-hoc audit) needs design, not ad-hoc invention under pressure.
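Here is one possible shape for the agent-readable policy document, combining the proposed risk tiers with per-tier TTL ceilings and allowed elevation shapes. It is a sketch under stated assumptions. The tier names come from the taxonomy above, but the target names, operations, TTL ceilings and approver labels are placeholders, and classify is a hypothetical helper. An unknown shape raises an error rather than falling through to a permissive default, which is what forces a separate policy-change PR.

```python
from datetime import timedelta

# Hypothetical agent-readable policy document. Tier names follow the proposed
# taxonomy; targets, operations, TTL ceilings and approver rules are placeholders.
POLICY = {
    "cosmetic": {
        "ttl_ceiling": timedelta(hours=4),
        "approvers": "any-human",
        "shapes": {("cache", "clear"), ("db-replica", "promote")},
    },
    "reversible-write": {
        "ttl_ceiling": timedelta(hours=2),
        "approvers": "any-human",
        "shapes": {("app-db", "INSERT"), ("replicas", "scale+1"), ("config", "flag-flip")},
    },
    "semi-destructive": {
        "ttl_ceiling": timedelta(hours=1),
        "approvers": "named-senior",
        "shapes": {("app-db", "UPDATE"), ("app-db", "migration"), ("iam", "change")},
    },
    "destructive": {
        "ttl_ceiling": timedelta(minutes=30),
        "approvers": "named-senior",
        "shapes": {("app-db", "DROP"), ("app-db", "DELETE"), ("infra", "teardown")},
    },
}


def classify(target: str, operation: str, ttl: timedelta) -> str:
    """Return the risk tier for a requested elevation.

    Raises LookupError for shapes the policy does not list (these need a
    policy-change PR) and ValueError when the TTL exceeds the tier's ceiling.
    """
    for tier, rules in POLICY.items():
        if (target, operation) in rules["shapes"]:
            if ttl > rules["ttl_ceiling"]:
                raise ValueError(f"TTL {ttl} exceeds the {tier} ceiling {rules['ttl_ceiling']}")
            return tier
    raise LookupError(f"({target}, {operation}) is not in the policy; open a policy-change PR")
```

An agent can read POLICY before opening a PR and fill in the risk-tier field itself. The approver rule for the chosen tier is then enforced at merge time, not by the agent.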
Cost estimate
Medium to high. First deployment in a project: 2–4 engineer-weeks depending on platform maturity, most of which is plumbing (credential issuer integration, pipeline triggers, policy-rule enforcement) rather than the recipe pattern itself. Once the substrate exists, extending coverage to new target systems is 1–3 days per system.
Ongoing maintenance burden is moderate: policy evolves (new risk tiers, new target systems), approver rotations happen, pipeline observability needs care. Pays back in every incident where the “admin by default” failure mode is structurally impossible.
Related recipes
- Composes with: bot-token credential tenancy — the default read-only credential is typically a bot/service account whose identity the elevation temporarily expands, not a user identity being impersonated.
- Composes with: indexed per-entry registry — elevation-request history is itself a corpus worth indexing. Querying “which elevations were requested against prod DB in Q1” is the analytic value of keeping this history in structured form.
- Depends on (recipe-wise): none directly, but presumes a functioning source-control-first engineering culture. GitOps JIT in a team that merges to main without PRs is not this recipe.
- Alternatives to: permanent admin with post-hoc audit (weaker: damage can’t be prevented, only documented); on-call break-glass credentials (weaker: all-or-nothing, no scope control); sudo-style command-level elevation (comparable in outcome but much harder to audit coherently).
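The “which elevations were requested against prod DB in Q1” query from the indexed-registry composition is a simple filter once elevation history is stored as structured records. This minimal sketch assumes calendar quarters and uses a hypothetical ElevationRecord shape; a fiscal calendar would only change quarter_bounds.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ElevationRecord:
    pr_number: int
    target: str          # e.g. "prod-db" (hypothetical)
    requested_on: date
    risk_tier: str


def quarter_bounds(year: int, quarter: int) -> Tuple[date, date]:
    """Return (inclusive start, exclusive end) of a calendar quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    start = date(year, 3 * (quarter - 1) + 1, 1)
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, 3 * quarter + 1, 1)
    return start, end


def requested_against(
    records: Iterable[ElevationRecord], target: str, year: int, quarter: int
) -> List[ElevationRecord]:
    """Elevations requested against `target` during the given quarter."""
    start, end = quarter_bounds(year, quarter)
    return [r for r in records if r.target == target and start <= r.requested_on < end]
```

Using an exclusive end bound avoids off-by-one errors on the last day of a quarter and at year boundaries.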