Safe Agentic · A working canon · v0.27
Recipe · proposed

Branch protection as code

Purpose

Branch-protection settings configured in a Git host’s admin UI are paper rules. Any admin, whether a human or an agent with admin scope, can disable a check on a Friday afternoon, ship the bypass, and forget to put the rule back. The recipe: declare the protection bundle in version control, reconcile it continuously from CI, and treat the admin UI as a display surface, not a source of truth. A drifted setting then survives only until the next workflow run.

Solves the “settings clicked off and never clicked back on” failure mode by making the declared state structurally re-asserted, not procedurally remembered.

Architecture

Three load-bearing parts: the rule bundle, the declaration substrate, and the reconciliation loop.

The bundle

The protected branch (main, develop, or whatever the repo’s main development branch is named) requires all of:

  1. Pull-request review required. No direct push or direct merge to the protected branch by any actor.
  2. At least one CODEOWNERS approval. Self-approval by the PR author — including agent service accounts authoring as themselves — does not count. CODEOWNERS-derived approvers are computed from the touched paths, so the gate is content-aware. A repo without a CODEOWNERS file cannot satisfy this rule, so the file is a hard precondition.
  3. All required status checks pass. The list of required check names is part of the declaration; a check that disappears from CI also disappears from the declared list, in the same PR that removes the workflow.
  4. All PR conversations resolved. Unresolved review threads block merge. Forces explicit closure of reviewer questions instead of merge-and-forget.
  5. Signed commits on the protected branch. Every commit landing on the branch carries a verified GPG or SSH signature. Combined with squash-only merge, this means the squash commit itself must be signed by the merging actor.
  6. Squash-merge only. Other merge methods (merge commit, rebase-and-merge) are disabled at the repo level. Linear history; one commit per PR; bisect and revert stay simple. The trade-off — loss of intermediate commit granularity — is paid back in audit clarity.

The bundle is one unit. Declaring five of six rules and skipping one defeats the purpose; the six together are what level-2 PL4-branch-protection looks like in practice.
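
As an illustration of treating the bundle as one unit, the sketch below assembles all six rules into a single ruleset body of the shape GitHub's rulesets API accepts. The function name, the ruleset name, and the two extra pull_request parameters are assumptions made for the example, not part of the recipe. The function refuses to build a body without required checks, so the declaration cannot quietly become five of six.

```python
def build_ruleset(branch_ref: str, required_checks: list[str]) -> dict:
    """Return one ruleset body carrying all six rules of the bundle."""
    if not required_checks:
        # Rule 3 cannot be declared without check names; fail instead of skipping it.
        raise ValueError("the bundle needs at least one required status check")
    return {
        "name": "protected-branch-bundle",  # hypothetical ruleset name
        "target": "branch",
        "enforcement": "active",
        "bypass_actors": [],
        "conditions": {"ref_name": {"include": [branch_ref], "exclude": []}},
        "rules": [
            {
                "type": "pull_request",  # rule 1: changes arrive through PRs only
                "parameters": {
                    "required_approving_review_count": 1,       # rule 2
                    "require_code_owner_review": True,          # rule 2
                    "required_review_thread_resolution": True,  # rule 4
                    "allowed_merge_methods": ["squash"],        # rule 6
                    # Not part of the bundle; set explicitly here as an assumption.
                    "dismiss_stale_reviews_on_push": False,
                    "require_last_push_approval": False,
                },
            },
            {
                "type": "required_status_checks",  # rule 3
                "parameters": {
                    "strict_required_status_checks_policy": True,
                    "required_status_checks": [{"context": c} for c in required_checks],
                },
            },
            {"type": "required_signatures"},  # rule 5
        ],
    }
```

Squash-only is expressed here through allowed_merge_methods on the ruleset; the repo-level merge-method switches would still need to be set separately.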

The declaration substrate

Every rule lives in version-controlled config. Common shapes:

  • IaC tool — Terraform / OpenTofu with the integrations/github provider, Pulumi with the GitHub package, or equivalent. The github_repository_ruleset resource (or classic github_branch_protection) carries the rule set. State backend is wherever the project keeps its other IaC state.
  • API-applied JSON / YAML (for projects that don’t run IaC): a checked-in branch-protection.yml (or equivalent) plus a thin script that reads it and calls the host’s REST API (PUT /repos/{owner}/{repo}/rulesets/{id} on GitHub).

The substrate choice doesn’t matter for the recipe; what matters is that the declared state lives in the repo, is reviewed when changed, and is the input the reconciliation loop reads.
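
A minimal sketch of the "thin script" variant, assuming the declaration is checked in as JSON (so only the standard library is needed) and that the ruleset already exists. GitHub's REST reference documents the ruleset update as a PUT on the rulesets path, which is what the sketch uses. The function names and the file name are hypothetical. The sketch builds the request without sending it, so the body can be inspected or logged first.

```python
import json
import urllib.request
from pathlib import Path

API_ROOT = "https://api.github.com"


def load_declaration(path: str) -> dict:
    """Read the checked-in declaration (assumed here to be JSON)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_update_request(owner: str, repo: str, ruleset_id: int,
                         declaration: dict, token: str) -> urllib.request.Request:
    """Build, but do not send, the request that re-asserts the declared ruleset."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/rulesets/{ruleset_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(declaration).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Usage (not executed here; file name and ids are placeholders):
#   declaration = load_declaration("branch-protection.json")
#   request = build_update_request("my-org", "my-repo", 42, declaration, token)
#   urllib.request.urlopen(request)
```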

Scope is one repository. GitHub org-level rulesets cover many repos with one declaration. They have different blast-radius properties — a bad declaration affects every covered repo at once. That variant is a separate sibling recipe, not folded in here.

The reconciliation loop

A scheduled CI workflow re-applies the declaration. Triggers:

  • On push to the protected branch when infra/, the declaration file, or the workflow itself changes.
  • On a schedule — typically hourly or daily. The cadence sets the maximum drift window. Hourly is a common default for repos where blast radius matters; daily is enough for most projects.
  • On workflow_dispatch — manual reconcile, used after a known-good emergency override to put the rules back.

Each run plans the diff, applies it, and posts a summary (drift detected vs. drift corrected vs. no-op). Unexpected drift fires an alert — drift is normal during a known apply, but unexplained drift means someone clicked something.
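
The run summary and alert rule described above can be sketched as a small classifier. It assumes the workflow plans once before applying and once after, and that each plan yields a list of drifted field names; the names RunSummary and summarise_run are hypothetical. Treating drift that survives the apply as always alert-worthy is an assumption of this sketch, not something the recipe states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    outcome: str  # "no-op", "drift corrected", or "drift detected"
    alert: bool


def summarise_run(drift_before: list[str], drift_after: list[str],
                  known_apply: bool) -> RunSummary:
    """Classify one reconciliation run.

    drift_before: fields that differed when the run planned its diff.
    drift_after:  fields that still differ after the apply step.
    known_apply:  True when the run was triggered by a reviewed declaration change.
    """
    if not drift_before:
        return RunSummary("no-op", alert=False)
    outcome = "drift corrected" if not drift_after else "drift detected"
    # Drift during a known apply is expected; unexplained drift means someone
    # clicked something. Drift that outlives the apply is flagged either way.
    alert = (not known_apply) or bool(drift_after)
    return RunSummary(outcome, alert)
```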

The reconciler runs with admin scope on the target repo only. A scoped GitHub App installation token is the typical credential — short-lived, scoped to declared permissions, scoped to declared repos. A long-lived PAT defeats the recipe by becoming a side-channel admin credential.

Agent inspection and remediation

The recipe also defines a tool surface that an agent uses to audit a repository’s protection posture, independent of whether the recipe is deployed there yet, and then to close any gaps it finds. The read path:

  • GET /repos/{owner}/{repo}/rulesets and GET /repos/{owner}/{repo}/rulesets/{id} — what rules are live?
  • GET /repos/{owner}/{repo}/branches/{branch}/protection — classic branch-protection on the same branch (rulesets and classic protection coexist on GitHub).
  • GET /repos/{owner}/{repo}/contents/CODEOWNERS (also .github/CODEOWNERS and docs/CODEOWNERS, the other locations GitHub reads): does the file exist? GET /repos/{owner}/{repo}/codeowners/errors: do its entries resolve?
  • GET /repos/{owner}/{repo} — repo-level merge-method settings (allow_squash_merge, allow_merge_commit, allow_rebase_merge).
  • Workflow listing — is a reconciliation workflow present? Has it run recently and succeeded?

The agent compares findings against the bundle and produces a per-rule gap report: which of the six rules is present, missing, or partial; whether the IaC declaration is in-repo; whether reconciliation actually runs.
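
A sketch of the per-rule part of that gap report, assuming the agent has already fetched the live ruleset rules and the repo settings as parsed JSON. The function names and the report keys are hypothetical. The other two questions (is the declaration in-repo, does reconciliation run) are left out of this example.

```python
def _grade(*conditions: bool) -> str:
    """'present' if every condition holds, 'partial' if some do, else 'missing'."""
    if all(conditions):
        return "present"
    return "partial" if any(conditions) else "missing"


def gap_report(rules: list[dict], repo: dict, declared_checks: list[str]) -> dict[str, str]:
    """Grade the six bundle rules against live ruleset rules and repo settings."""
    params = {r["type"]: r.get("parameters") or {} for r in rules}
    pr = params.get("pull_request", {})
    live_checks = {
        c["context"]
        for c in params.get("required_status_checks", {}).get("required_status_checks", [])
    }
    return {
        "1 pull-request review": _grade("pull_request" in params),
        "2 CODEOWNERS approval": _grade(
            pr.get("required_approving_review_count", 0) >= 1,
            pr.get("require_code_owner_review", False),
        ),
        "3 required status checks": _grade(
            "required_status_checks" in params,
            bool(declared_checks) and set(declared_checks) <= live_checks,
        ),
        "4 conversations resolved": _grade(pr.get("required_review_thread_resolution", False)),
        "5 signed commits": _grade("required_signatures" in params),
        "6 squash-merge only": _grade(
            pr.get("allowed_merge_methods") == ["squash"],
            repo.get("allow_squash_merge") is True
            and repo.get("allow_merge_commit") is False
            and repo.get("allow_rebase_merge") is False,
        ),
    }
```

A rule graded "partial" is one where only some of its conditions hold, for example squash-only declared on the ruleset while merge commits are still enabled at the repo level.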

For closure, the agent may then author the IaC PR itself — editing the declaration file to add the missing rules — and open it through the same review gate the recipe is declaring. The gate’s properties (CODEOWNERS-1 approval not from the PR author, conversation resolution, signed commits, status checks pass) are what preserve safety, not the author’s identity. Agent-authored + human-approved is the shape PL2-external-pr-review level-2 describes; the recipe doesn’t weaken when the agent moves from advisor to remediator.

One precondition makes this safe: the IaC declaration file path must itself be CODEOWNERS-protected, listing humans only. Otherwise an agent service account could land a declaration weakening the gate with only its own approval, and the recipe’s structural property collapses. The inspection mode flags any agent / bot identity it finds in CODEOWNERS as a finding of its own — a self-check the recipe applies to itself.
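
The self-check above could look roughly like the following sketch. It assumes the agent has the CODEOWNERS text in hand and a list of known agent or bot identities, and it deliberately simplifies pattern matching to exact path entries, whereas real CODEOWNERS patterns are glob-like and the last matching entry wins. All names are hypothetical, and the "[bot]" suffix test is a heuristic.

```python
def parse_codeowners(text: str) -> list[tuple[str, list[str]]]:
    """Split CODEOWNERS text into (pattern, owners) entries, dropping comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: '#' always starts a comment
        if not line:
            continue
        pattern, *owners = line.split()
        entries.append((pattern, owners))
    return entries


def _is_non_human(owner: str, agent_identities: set[str]) -> bool:
    # Heuristic: a known agent identity, or a GitHub App style "[bot]" suffix.
    return owner in agent_identities or owner.endswith("[bot]")


def non_human_owners(entries, agent_identities: set[str]) -> set[str]:
    """Every agent / bot identity listed anywhere in CODEOWNERS (a finding in itself)."""
    return {
        owner
        for _pattern, owners in entries
        for owner in owners
        if _is_non_human(owner, agent_identities)
    }


def declaration_is_protected(entries, declaration_path: str,
                             agent_identities: set[str]) -> bool:
    """True when an exact-path entry covers the declaration and lists humans only."""
    owners = None
    for pattern, entry_owners in entries:
        if pattern.lstrip("/") == declaration_path.lstrip("/"):
            owners = entry_owners  # later entries override earlier ones
    return bool(owners) and not any(_is_non_human(o, agent_identities) for o in owners)
```

A production check would need full pattern semantics, since a broader pattern later in the file can override an exact-path entry.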

The load-bearing design choice is continuous reconciliation, not one-shot apply. A one-shot Terraform apply leaves admin-UI overrides intact until someone remembers to apply again. The reconciliation loop closes that window structurally; the admin UI becomes a read-only display of the declaration’s effect.

Criteria advanced

  • PL4-branch-protection level 1 → level 2 (max 2) — the bundle described in Architecture is what the level-2 anchor describes: protected branches locked, direct push blocked, merge requires approval, current-with-target enforced (status-checks rule subsumes this when a “branch is up to date” check is part of the required list), bypass auditable. The recipe is a direct level-2 contributor and saturates the criterion’s ceiling.
  • PL2-external-pr-review partial contributor — the CODEOWNERS-1-required rule is the structural form of “human review enforced”. An agent that authors a PR cannot also approve it, because a CODEOWNERS approver is computed from the touched paths and the agent’s own approval doesn’t count. Pairs naturally with the layered “agent pre-review + human glance” shape level-2 describes.
  • PL2-agent-audit-trail partial contributor — every change to the protected branch is a PR; every PR carries reviewer identity, decision reasoning, conversation threads, the diff, and the merge event. The reconciliation workflow’s run log is itself an audit artefact for any drift event. Together they make the protected branch’s history queryable and reversible at decision granularity.

Prerequisites

  • PL3-source-control ≥ 2. The agent’s inspection mode reads via the source-control API; the IaC declaration is reviewed via PRs the agent opens; the reconciliation workflow is itself a source-control artefact. Without level-2 source control, the agent can’t audit, advise, or maintain the recipe.
  • PL5-pipeline-reliability ≥ 1. The reconciliation workflow must actually run on schedule. A flaky scheduler means drift outlives windows and the gate becomes intermittent. Level-1 is enough for the recipe to function; level-2 makes the drift-detection alert path actionable.
  • CODEOWNERS file exists and resolves. Not a rubric criterion, but a hard precondition for rule 2. An entry like * @some-team with @some-team empty or non-existent makes the gate unsatisfiable and locks the repo (see Failure modes). Verify that gh api /repos/{owner}/{repo}/codeowners/errors returns an empty errors array before activating the rule.
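
The CODEOWNERS pre-flight can be sketched as a function over the parsed response of the codeowners/errors endpoint. It assumes the response is either an object with an errors list or a bare list, and it reads the path, line, and kind fields defensively since the exact fields are taken from memory of the API rather than from this recipe. The function name and the sample values are hypothetical.

```python
def codeowners_blockers(response) -> list[str]:
    """Return one readable line per CODEOWNERS error; an empty list means safe to activate."""
    errors = response.get("errors", []) if isinstance(response, dict) else response
    return [
        f"{e.get('path', 'CODEOWNERS')}:{e.get('line', '?')} {e.get('kind', 'error')}"
        for e in errors
    ]


# Illustrative input only: an entry pointing at an owner that does not resolve.
sample = {"errors": [{"path": ".github/CODEOWNERS", "line": 3, "kind": "Unknown owner"}]}
blockers = codeowners_blockers(sample)
if blockers:
    print("do not activate rule 2 yet:", blockers)
```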

Failure modes

  • Provider silently drops a write. Observed on the canon repo: integrations/github v6.11.1 accepted a bypass_actors declaration on a Team-plan repo-level ruleset, reported success, but the live ruleset showed bypass_actors: null. The IaC tool’s success signal is not enough. Mitigation: post-apply verification that calls the host’s API directly and asserts the live state matches the declaration field-by-field, not just at the resource level.
  • Unresolvable CODEOWNERS entry locks the repo. A reference to a deleted user or empty team makes require_code_owner_review impossible to satisfy; the fix PR can’t merge because the gate it would fix is the gate blocking it. Mitigation: pre-flight check that gh api /repos/.../codeowners/errors returns an empty errors array before activating; if the gate is already live and stuck, the escape hatch is admin-bypass via the host UI, then a follow-up PR to land the CODEOWNERS fix and reconcile. The observe-before-enforce gate promotion recipe documents this exact failure.
  • Required status checks reference a workflow job name that doesn’t exist. No PR ever passes the gate. Same lockout shape as the CODEOWNERS case. Mitigation: declare a required check only after the workflow’s job-name contract is stable; verify a recent run of that workflow exists and succeeded before flipping the rule on.
  • Reconciler credential is broader than the gate it declares. The workflow needs admin scope on the repo. If that token is also usable to push directly to main, the reconciler is a bypass channel. Mitigation: GitHub App installation token scoped to Administration permission only; the App is not a CODEOWNERS approver and is not in any bypass list.
  • bypass_actors drift opens a side channel. A bypass entry added via the UI (or smuggled in via a provider quirk) lets named actors skip the gate. Mitigation: the declaration says bypass_actors = []; the reconciler resets it on every run; any legitimate bypass requirement is a meta-governance change to the declaration, reviewed through the same gate.
  • Agent self-approval if CODEOWNERS includes any non-human identity. If the agent’s service account or bot identity appears in CODEOWNERS, it can approve its own PRs. The same hole opens via agent-pair approval: two CI service accounts (e.g. bot-author and bot-reviewer) where one authors and the other approves — the gate technically routes through “two identities” but no human ever sees the change. Mitigation: CODEOWNERS lists humans only; every agent / service account / bot identity is explicitly excluded; the inspection mode flags any non-human identity it finds in CODEOWNERS as a finding of its own.
  • Squash-only loses semantic-commit granularity. A PR with five well-named commits collapses into one squash commit; bisect points to the squash, not the offending sub-commit. Trade-off, not a bug. Mitigation: PR template encourages single-concern PRs; multi-concern work is split. The audit clarity gained is generally worth the bisect granularity lost, but call the trade-off out so it’s a chosen position, not an accident.
  • Reconciliation cadence too slow vs. too fast. Hourly: small drift window, but a misconfigured policy change reaches every covered repo within the hour. Daily: bigger drift window, more breathing room on policy mistakes. Mitigation: stage policy changes through a single canary repo before the org-wide declaration picks them up; alert on first reconciler-detected drift after a policy change so a bad change is caught in the canary.
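
For the first failure mode (a provider reporting success while dropping a field), the post-apply verification can be sketched as a recursive comparison of the declaration against the live state fetched directly from the host's API. The function name and the sample values are hypothetical. Only declared keys are checked, so server-populated fields in the live state are ignored; lists are compared whole and order-sensitively, which a real check may need to relax.

```python
def field_mismatches(declared, live, path: str = "") -> list[str]:
    """List every declared field whose live value differs, by dotted path."""
    if isinstance(declared, dict):
        if not isinstance(live, dict):
            return [f"{path or '<root>'}: declared an object, live is {live!r}"]
        mismatches = []
        for key, value in declared.items():
            child = f"{path}.{key}" if path else key
            if key not in live:
                mismatches.append(f"{child}: missing in live state")
            else:
                mismatches.extend(field_mismatches(value, live[key], child))
        return mismatches
    if declared != live:
        return [f"{path}: declared {declared!r}, live {live!r}"]
    return []


# Illustrative values only: the apply "succeeded" but bypass_actors came back null.
declared = {"enforcement": "active", "bypass_actors": [{"actor_id": 1, "actor_type": "Team"}]}
live = {"id": 7, "enforcement": "active", "bypass_actors": None}
problems = field_mismatches(declared, live)
if problems:
    print("post-apply verification failed:", problems)
```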

Cost estimate

Medium. First repo: 0.5–1 engineer-day to author the declaration, set up the App / token, and land the reconciliation workflow. Subsequent repos: 1–2 hours each, mostly parameterising the declaration with the new repo’s name. The agent inspection mode is a one-time build of 1–2 days that then audits any repo on demand.

Ongoing maintenance burden is low: occasional provider-version bumps, periodic review of the rule list against rubric drift (the bundle stays roughly fixed; rubric anchors evolve), and triage of drift alerts from the reconciler.

Open design questions

(Proposed status — this blocks promotion to proven.)

  • Reconciliation cadence default. Hourly minimises drift windows but amplifies blast radius on a bad declaration. Daily reverses both. What’s the right default for the recipe — and should it depend on the repo’s blast radius (production-adjacent vs. canon-adjacent)?

Case studies

(None yet — proposed status, pending the continuous-reconciliation deployment in at least one repo. Partial reference below for shape, not status promotion.)

  • Canon repo (internal/integrations/canon-repo-substrate.md): partial reference, not a proven case study. The canon’s github_repository_ruleset.main declares five of the six rules in this bundle (signed commits, CODEOWNERS-1 with code-owner review, conversation-thread resolution, squash-only via allowed_merge_methods = ["squash"], PR review required); required_status_checks is deliberately omitted for v1 pending stable workflow job names. The reconciliation loop is plan-only on PR and scheduled drift-detection on the observability environment: drift is surfaced, not auto-corrected. Auto-apply on merge is listed as Not available in the integration’s Open items pending the JIT-escalation GitHub App PEM adapter. The canon thus instantiates the declaration half of the recipe; the continuous reconciliation half is a follow-up. Failure modes 1 (provider silent-drop), 5 (can_admins_bypass drift), and the chicken-and-egg (CODEOWNERS unresolvable on first activation) were all observed and informed this recipe’s failure-modes section.
  • Composes with: observe-before-enforce gate promotion — every flip from evaluate to active on a ruleset rule should run through that recipe’s pre-flight checklist. The canon’s first-flip incident with the unresolvable CODEOWNERS entry is the case study both recipes reference.
  • Composes with: gitops-jit-privilege-elevation — the IaC change to update the protection declaration is itself a privileged write. Routing it through a JIT elevation PR means the reconciler’s admin credential isn’t standing.
  • Composes with: indexed per-entry registry — the agent inspection mode benefits from a structured policy registry (allowed bypass shapes, required-check name lists, CODEOWNERS schema) so that “is this declaration well-formed?” is a queryable check, not a hand-rolled script.
  • Alternatives to: dashboard-only configuration — the failure mode this recipe exists to fix. Settings clicked in the admin UI have no audit trail beyond the host’s own log, no review path, no reconciliation. Strictly weaker.
  • Alternatives to: one-shot IaC apply — declares the rules in code but applies only on demand. Better than dashboard-only (declaration is reviewed) but loses to admin-UI overrides between applies. The reconciliation loop is the difference.