GitOps JIT privilege elevation

Family: gate
Status: proposed
Complexity: high
Advances: PL4-least-privilege, PL4-branch-protection, PL2-external-pr-review, PL2-agent-audit-trail, PL3-source-control, PL5-pipeline-reliability, PL4-release-strategy, PL1-decision-records
Prerequisites: PL4-branch-protection ≥ 2, PL5-pipeline-reliability ≥ 2, PL3-source-control ≥ 2, PL2-external-pr-review ≥ 2

Purpose

Default developers and agents to read-only access for high-blast-radius systems (application databases, infrastructure-as-code state, secret stores). Write operations require a pull request requesting time-boxed elevation, reviewed and approved by a human, auto-granted on merge and auto-revoked on TTL expiry. The PR itself becomes the audit artefact.

Solves the “admin-by-default for convenience” failure mode without introducing friction that developers route around: the JIT elevation path is the sanctioned path, and it is agent-invokable.

Architecture

  • Default credentials are read-only for app DB, Terraform/Pulumi state, secret stores, production IAM.
  • When an agent or developer needs to write, they open a pull request against a dedicated GitOps repository (e.g. ops-elevations). The PR describes the requested elevation using a structured template: target system, scope of write, duration (TTL), justification, rollback plan, risk tier.
  • A human reviewer approves the PR. Approver rules are policy-enforced: specific humans per risk tier, with agent approval explicitly forbidden.
  • On merge, a CI pipeline grants the elevated credential with the TTL baked in. Common implementations: HashiCorp Vault dynamic credentials, AWS STS temporary session credentials, GCP short-lived service account credentials (access tokens rather than keys), cloud-native IAM session grants.
  • On TTL expiry (or explicit revoke PR), the credential is revoked automatically. Unused elevations auto-expire without manual cleanup.
  • The PR history, merge log, and grant/revoke events together form the audit trail.
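A minimal sketch of the merge, grant and expiry lifecycle, assuming the elevation PR template has already been parsed into a dict. The field names (`target`, `scope`, `ttl_minutes`, `justification`, `rollback_plan`, `risk_tier`) and the functions `missing_fields`, `grant_on_merge` and `revoke_expired` are hypothetical; a real pipeline would call the credential issuer (Vault, STS, etc.) instead of building an in-memory record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical field names mirroring the structured elevation PR template.
REQUIRED_FIELDS = ("target", "scope", "ttl_minutes", "justification",
                   "rollback_plan", "risk_tier")


@dataclass(frozen=True)
class Grant:
    """An elevated grant whose expiry is fixed at issue time."""
    pr_number: int
    target: str
    scope: str
    expires_at: datetime


def missing_fields(fields: dict) -> list[str]:
    """Return template fields that are absent or empty (a missing TTL included)."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


def grant_on_merge(pr_number: int, fields: dict, merged_at: datetime) -> Grant:
    """Run by the merge pipeline: refuse incomplete requests, else issue a TTL-bound grant."""
    missing = missing_fields(fields)
    if missing:
        raise ValueError(f"elevation request missing fields: {missing}")
    ttl = int(fields["ttl_minutes"])
    if ttl <= 0:
        raise ValueError("ttl_minutes must be positive")
    return Grant(pr_number, fields["target"], fields["scope"],
                 merged_at + timedelta(minutes=ttl))


def revoke_expired(grants: list[Grant], now: datetime) -> tuple[list[Grant], list[Grant]]:
    """Split grants into (still active, due for revocation); no manual cleanup step."""
    active = [g for g in grants if g.expires_at > now]
    expired = [g for g in grants if g.expires_at <= now]
    return active, expired


# Example: a 30-minute INSERT elevation, checked 31 minutes after merge.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
request = {"target": "app-db", "scope": "INSERT on orders", "ttl_minutes": 30,
           "justification": "backfill", "rollback_plan": "delete rows by batch id",
           "risk_tier": "reversible-write"}
grant = grant_on_merge(101, request, t0)
active, expired = revoke_expired([grant], t0 + timedelta(minutes=31))
```

Storing `expires_at` on the grant itself, rather than relying on a later revoke job to remember it, is what makes expiry automatic. Real issuers enforce the same idea server-side, for example through Vault lease TTLs or an STS session duration.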

The load-bearing design choice is structural enforcement: the default credential genuinely cannot write, regardless of the agent’s request, until the elevation pipeline has fired. A procedural “please don’t write” policy is not this recipe.
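The enforcement rule can be modelled as a single decision function. This is only a model: in the recipe the guarantee comes from the default credential lacking write permission in the target system's own IAM, not from application code choosing to ask. `ActiveGrant` and `can_perform` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ActiveGrant:
    identity: str        # bot/service identity whose permissions were expanded
    target: str          # e.g. "app-db", "tf-state"
    expires_at: datetime


def can_perform(identity: str, target: str, action: str,
                grants: list[ActiveGrant], now: datetime) -> bool:
    """Reads are always allowed; writes need a matching, unexpired grant."""
    if action == "read":
        return True
    if action != "write":
        return False  # unknown actions are denied by default
    return any(g.identity == identity and g.target == target and g.expires_at > now
               for g in grants)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grants = [ActiveGrant("agent-bot", "app-db", now + timedelta(minutes=10))]
```

Note that no input other than an active grant can turn a write into `True`; the agent's request text never enters the decision.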

Criteria advanced

  • PL4-least-privilege IAM scoped read-only by default — direct level-2 contributor. “Strict least-privilege, write requires explicit elevation” is the recipe’s definition. Level-3 reachable when elevation patterns are logged, recurring legitimate elevations get scoped permanent grants, and unused permissions are auto-revoked (all three naturally fall out of a GitOps-based implementation with good PR analytics).
  • PL4-branch-protection Branch protection and source-control write scoping — the recipe’s integrity depends on the GitOps repo itself being a protected branch with human-approval requirement. This is a prerequisite, not an advancement — but deploying GitOps JIT forces the team to take PL4-branch-protection seriously on the GitOps repo specifically, which is often a PL4-branch-protection gap in practice.
  • PL2-external-pr-review External PR review — direct level-2 contributor on the subset of PRs that are elevation requests. Layered review (an agent pre-check that the diff is within policy, plus human approval) fits elevation PRs naturally.
  • PL2-agent-audit-trail Agent action audit trail — direct level-2 contributor. The PR is the audit artefact: decision reasoning (PR description), diff (the requested scope), reviewer identity (approver), timestamp, merge outcome, and downstream grant/revoke events are all captured by the git platform natively and queryable via the source control integration.
  • PL3-source-control Source control interaction — level-2 prerequisite; the recipe requires agents to open elevation PRs and query PR history natively, which is exactly what PL3-source-control level-2 demands.
  • PL5-pipeline-reliability Pipeline reliability — the trigger from PR-merge to actual privilege grant is pipeline plumbing. Deploying this recipe forces investment in reliable webhook handling, which advances PL5-pipeline-reliability regardless of other pipeline work.
  • PL4-release-strategy Canary / blue-green / partial release — same “agent bounded by platform constraints” shape applied to writes rather than deploys. The agent cannot modify the elevation policy template and cannot skip the approval gate; these bounds are structural.
  • PL1-decision-records Decision records (ADRs) — each elevation PR description is a micro-ADR: the decision, the context and the rationale, plus the alternatives considered if the template asks for them. Well-structured PR templates make this close to automatic.
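The level-3 signals named for PL4-least-privilege (recurring legitimate elevations, unused grants) can be derived from elevation history. A minimal sketch, assuming each merged elevation PR is recorded with its target, scope, and whether the credential was actually used before expiry; `ElevationRecord`, `recurring_shapes` and `never_used` are hypothetical, and the threshold is an illustrative parameter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ElevationRecord:
    pr_number: int
    target: str
    scope: str
    used: bool  # was the granted credential exercised before it expired?


def recurring_shapes(history: list[ElevationRecord], threshold: int) -> list[tuple[str, str]]:
    """(target, scope) pairs requested at least `threshold` times: candidates for a
    scoped permanent grant covering exactly that pair, never anything broader."""
    counts = Counter((r.target, r.scope) for r in history)
    return sorted(shape for shape, n in counts.items() if n >= threshold)


def never_used(history: list[ElevationRecord]) -> list[int]:
    """PR numbers whose elevation expired unused: a sign of over-requesting."""
    return [r.pr_number for r in history if not r.used]


history = [
    ElevationRecord(1, "app-db", "INSERT on orders", True),
    ElevationRecord(2, "app-db", "INSERT on orders", True),
    ElevationRecord(3, "app-db", "INSERT on orders", False),
    ElevationRecord(4, "tf-state", "apply network module", True),
]
```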

Prerequisites

Four structural prerequisites must be at level 2 or higher (merely existing is not enough) for this recipe to score:

  • PL4-branch-protection ≥ 2. Protected branches, human approval required, bypass audited. Without this, the GitOps repo can be self-merged and the gate is procedural.
  • PL5-pipeline-reliability ≥ 2. Reliable pipeline with agent-driven transitions. Without this, elevation PRs get stuck pending; developers learn to bypass the mechanism via backdoor credentials.
  • PL3-source-control ≥ 2. Agent can open PRs and query PR history natively. Without this, elevation is human-only — the recipe still works but loses agent-invocability, which is most of the value.
  • PL2-external-pr-review ≥ 2. PR review discipline, at least “layered agent + human glance”. Rubber-stamp reviews on elevation PRs defeat the mechanism silently.
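Read as a gate, the rule is that every prerequisite must meet its minimum level before the recipe scores. A small sketch, assuming criterion levels are available as a mapping from criterion ID to integer level; treating an unassessed criterion as level 0 is an assumption of this example, and `unmet_prerequisites` is a hypothetical name.

```python
# Minimum levels taken from the prerequisite list.
PREREQUISITES = {
    "PL4-branch-protection": 2,
    "PL5-pipeline-reliability": 2,
    "PL3-source-control": 2,
    "PL2-external-pr-review": 2,
}


def unmet_prerequisites(levels: dict[str, int]) -> list[str]:
    """Criteria below the required level; an unassessed criterion counts as level 0."""
    return sorted(c for c, minimum in PREREQUISITES.items() if levels.get(c, 0) < minimum)


assessed = {"PL4-branch-protection": 2, "PL5-pipeline-reliability": 1, "PL3-source-control": 3}
gaps = unmet_prerequisites(assessed)
```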

These prerequisites are not just gating; they are the reason this one recipe advances so many criteria. The mechanism is load-bearing because the substrate is load-bearing.

Failure modes

  • Agent self-approval. If an agent can approve another agent’s elevation PR (or its own via a proxy identity), the gate collapses. Mitigation: policy rules on approver identity — humans-only for elevation PRs, explicit rejection of agent approvers at the platform level.
  • Missing TTL. Elevation granted, task completed, privilege never revoked → permissions drift back toward admin-by-default. Mitigation: TTL is required at grant time, not optional; pipelines refuse elevation PRs without a TTL field.
  • Single-tier approval. Treating DROP TABLE the same as +1 replica is wrong. Mitigation: risk tiers in the elevation template with differentiated approver rules — high-risk requires named senior approvers; low-risk may accept any human reviewer.
  • Pipeline flakiness. Elevations stuck pending for minutes or hours teach developers to bypass the mechanism. Mitigation: define and monitor an SLO for the elevation path specifically; page on failures.
  • Rubber-stamp reviews. The human approves without understanding, especially under deadline pressure. Mitigation: structured PR template forces the author to articulate justification and rollback plan; reviewer checklist makes skipped sections visible; periodic audit of approval-to-incident correlation.
  • Credential leak via CI logs. The elevated credential flows through the pipeline; poorly configured CI logging echoes it. Mitigation: masked variables, log scanners, and the standard secret-hygiene disciplines.
  • Privilege accretion via “just make it permanent”. Recurring elevations become permanent grants, defeating the recipe. Mitigation: PL4-least-privilege level-3 explicitly distinguishes “recurring legitimate elevations get scoped permanent grants” (good — the permanent grant is scoped, not broad admin). The failure mode is scope inflation, not permanence itself.
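The approver-identity, risk-tier and TTL mitigations can be combined into one pre-grant check run by the elevation pipeline. A minimal sketch, assuming the pipeline can distinguish agent identities from human ones (the hypothetical `is_agent` flag) and receives the approver list from the git platform; the senior-approver set and tier names are placeholders. It complements platform-level branch protection rather than replacing it, and detecting an agent acting through a proxy human identity is out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Approver:
    login: str
    is_agent: bool


# Hypothetical policy values: who counts as a named senior approver, and which tiers need one.
SENIOR_APPROVERS = {"alice", "bob"}
HIGH_RISK_TIERS = {"semi-destructive", "destructive"}


def approval_violations(author: str, risk_tier: str, ttl_minutes: Optional[int],
                        approvers: list[Approver]) -> list[str]:
    """Return the reasons an elevation PR must not be granted (empty list = OK)."""
    problems = []
    if ttl_minutes is None or ttl_minutes <= 0:
        problems.append("missing or non-positive TTL")
    if any(a.is_agent for a in approvers):
        problems.append("agent approvals are rejected on elevation PRs")
    # Independent humans only: the PR author never counts as their own approver.
    humans = [a for a in approvers if not a.is_agent and a.login != author]
    if not humans:
        problems.append("no independent human approval")
    if risk_tier in HIGH_RISK_TIERS and not any(a.login in SENIOR_APPROVERS for a in humans):
        problems.append(f"tier '{risk_tier}' requires a named senior approver")
    return problems
```

Returning every violation instead of failing on the first gives the PR author one complete list to fix, which keeps the sanctioned path fast enough that nobody is tempted to route around it.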

Open design questions

  • Risk-tier taxonomy. What’s the canonical set of tiers? Minimum probably: cosmetic (cache clear, read replica promote), reversible-write (INSERT, +1 replica, config flag flip), semi-destructive (UPDATE, migration, IAM change), destructive (DROP, DELETE, infra teardown). Each tier needs its own approver policy and TTL ceiling. Worth a dedicated design pass before rolling out broadly.
  • Which credential-issuing substrate? Vault, cloud-native STS, short-lived JWT, something else? Depends on target platform; a multi-platform project may need more than one. First deployment picks the easiest substrate and the recipe generalises over time.
  • How does the agent discover what elevations it’s allowed to request? A hard-coded template forces human intervention on every new pattern; a permissive template allows silent scope expansion via JIT requests. Middle ground: agent-readable policy document listing allowed elevation shapes per risk tier; requests outside the document require a separate policy-change PR.
  • Integration with incident response. During an incident, the normal elevation flow may be too slow. Emergency-elevation path (e.g. break-glass with post-hoc audit) needs design, not ad-hoc invention under pressure.
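The middle-ground answer to the discovery question, an agent-readable policy document per risk tier, could look like the following. A sketch only: the tier names follow the minimum set suggested in the taxonomy question, but every allowed shape and every TTL ceiling is an illustrative placeholder, and `TierPolicy` and `route_request` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    max_ttl_minutes: int                        # TTL ceiling for this tier
    allowed_shapes: frozenset[tuple[str, str]]  # (target, operation) pairs agents may request


# Illustrative policy document; values are placeholders, not recommendations.
POLICY = {
    "cosmetic": TierPolicy(240, frozenset({("cache", "clear")})),
    "reversible-write": TierPolicy(120, frozenset({("app-db", "INSERT"), ("config", "flag-flip")})),
    "semi-destructive": TierPolicy(60, frozenset({("app-db", "UPDATE"), ("app-db", "migration")})),
    "destructive": TierPolicy(30, frozenset({("app-db", "DELETE")})),
}


def route_request(tier: str, target: str, operation: str, ttl_minutes: int) -> str:
    """Classify an elevation request against the policy document."""
    policy = POLICY.get(tier)
    if policy is None or (target, operation) not in policy.allowed_shapes:
        # Outside the document: needs a separate policy-change PR, not a JIT grant.
        return "needs-policy-change-pr"
    if ttl_minutes > policy.max_ttl_minutes:
        return "rejected-ttl-over-ceiling"
    return "in-policy"
```

One design choice worth noting: if this document lives in the protected GitOps repository, widening it is itself a reviewed PR, which is what prevents silent scope expansion through JIT requests.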

Cost estimate

Medium to high. First deployment in a project: 2–4 engineer-weeks depending on platform maturity, most of which is plumbing (credential issuer integration, pipeline triggers, policy-rule enforcement) rather than the recipe pattern itself. Once the substrate exists, extending coverage to new target systems is 1–3 days per system.
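As a rough worked estimate, assuming the first deployment covers a single target system and a five-day engineer-week (neither is stated in the estimate itself), the effort E to cover n target systems is approximately:

```latex
\begin{aligned}
E(n) &= E_0 + (n - 1)\,e, \qquad E_0 \in [10, 20],\ e \in [1, 3] \ \text{(engineer-days)}\\
E(4) &\in [10 + 3 \cdot 1,\ 20 + 3 \cdot 3] = [13, 29] \ \text{engineer-days}
\end{aligned}
```

Under these assumptions, covering four systems costs roughly 13 to 29 engineer-days, with the initial plumbing dominating.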

Ongoing maintenance burden is moderate: policy evolves (new risk tiers, new target systems), approver rotations happen, and pipeline observability needs care. It pays back in every incident that would otherwise have involved admin-by-default access, because that failure mode is now structurally impossible.

  • Composes with: bot-token credential tenancy — the default read-only credential is typically a bot/service account whose identity the elevation temporarily expands, not a user identity being impersonated.
  • Composes with: indexed per-entry registry — elevation-request history is itself a corpus worth indexing. Querying “which elevations were requested against prod DB in Q1” is the analytic value of keeping this history in structured form.
  • Depends on (recipe-wise): none directly, but presumes a functioning source-control-first engineering culture. GitOps JIT in a team that merges to main without PRs is not this recipe.
  • Alternatives to: permanent admin with post-hoc audit (weaker: damage can’t be prevented, only documented); on-call break-glass credentials (weaker: all-or-nothing, no scope control); sudo-style command-level elevation (comparable in outcome but much harder to audit coherently).
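The Q1 query from the indexed-registry composition is straightforward once each elevation is stored as a structured entry. A minimal in-memory sketch, assuming hypothetical fields (`pr_number`, `target`, `requested_on`) and calendar quarters; an actual indexed registry would run the equivalent query against its own store.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ElevationEntry:
    pr_number: int
    target: str          # e.g. "prod-db"
    requested_on: date


def quarter_of(d: date) -> int:
    """Calendar quarter 1-4 for a date."""
    return (d.month - 1) // 3 + 1


def elevations_in_quarter(entries: list[ElevationEntry], target: str,
                          year: int, quarter: int) -> list[int]:
    """PR numbers of elevation requests against `target` in the given quarter."""
    return [e.pr_number for e in entries
            if e.target == target
            and e.requested_on.year == year
            and quarter_of(e.requested_on) == quarter]


entries = [
    ElevationEntry(101, "prod-db", date(2024, 2, 10)),
    ElevationEntry(102, "prod-db", date(2024, 4, 1)),
    ElevationEntry(103, "tf-state", date(2024, 1, 5)),
    ElevationEntry(104, "prod-db", date(2024, 3, 31)),
]
q1_prod = elevations_in_quarter(entries, "prod-db", 2024, 1)
```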