Safe Agentic · A working canon · v0.27
Pillar 05 · Workflow

The meta-layer that ties it together.

The meta-layer that ties Pillars 1–4 together, including periodic and proactive loops.

Pillar at a glance
Criteria 10
Realistic target 2.0
Current maturity
Recipes available 7
§ criteria
5.1
Pipeline reliability
plan → implement → PR flows end-to-end with reliable triggers, webhooks, and transitions between stages
current · target
5.2
CI/CD pipeline health
the CI/CD pipeline *itself* (as distinct from 2.1–2.4, which score *what* CI runs) is fast, reliable, observable, versioned, and environment-matched to production. A slow or flaky CI pipeline makes downstream scores meaningless
current · target
5.3
Change sets / release management
aggregated changelogs in a monorepo
current · target
5.4
Multi-agent delegation
different agent roles (investigator, implementer, reviewer, planner) operate as differentiated full-stack roles, each with its own context scope, tools, permissions, skills, and prompts
current · target
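One way to make the differentiation concrete is a per-role profile that a dispatcher consults before handing out work. The following is a minimal sketch, not a prescribed schema: the four role names come from the criterion, while `RoleProfile`, `can_use`, and every tool name, permission string, path, and prompt below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleProfile:
    """The five dimensions the criterion lists for a differentiated role."""
    context_scope: tuple    # paths the role may load into context (hypothetical)
    tools: frozenset        # tool names the role may invoke (hypothetical)
    permissions: frozenset  # repository permissions (hypothetical)
    skills: tuple           # reusable skill identifiers (hypothetical)
    prompt: str             # role-specific system prompt


# Hypothetical role table: each role gets its own scope, tools, and prompt.
ROLES = {
    "investigator": RoleProfile(("src/", "logs/"), frozenset({"read_file", "search"}),
                                frozenset({"read"}), ("root-cause",), "Find the cause."),
    "implementer": RoleProfile(("src/", "tests/"), frozenset({"read_file", "write_file", "run_tests"}),
                               frozenset({"read", "write"}), ("tdd",), "Make the test pass."),
    "reviewer": RoleProfile(("src/", "tests/"), frozenset({"read_file", "comment"}),
                            frozenset({"read"}), ("code-review",), "Review the diff."),
    "planner": RoleProfile(("docs/",), frozenset({"read_file", "create_task"}),
                           frozenset({"read"}), ("decomposition",), "Split the work."),
}


def can_use(role: str, tool: str) -> bool:
    """Return True only if the named role's profile grants the tool."""
    return tool in ROLES[role].tools
```

Under these assumptions, `can_use("reviewer", "write_file")` is false while `can_use("implementer", "write_file")` is true, which is the kind of separation the criterion asks for.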
5.5
Spec-first agent loop
implementation tasks with specifiable behaviour enter the agent's loop with an executable acceptance criterion (failing test, type signature, or conformance check). The agent iterates against that gate before opening a PR. Exploratory work and UI spikes are explicitly exempt: the criterion applies only to tasks where behaviour can be specified up front
current · target
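The loop described in 5.5 can be sketched as a gate that each candidate must pass before a PR is opened. This is an illustrative toy under stated assumptions: `acceptance_gate`, `iterate_until_green`, the `slugify` behaviour, and the fixed list of candidate implementations (standing in for an agent's successive attempts) are all hypothetical.

```python
def acceptance_gate(slugify) -> bool:
    """Executable acceptance criterion: a test that fails until the behaviour exists."""
    try:
        return slugify("Hello World") == "hello-world"
    except Exception:
        return False  # a crashing candidate counts as a failed gate


# Stand-in for an agent's successive attempts (hypothetical).
CANDIDATES = [
    lambda s: s,                            # attempt 1: unchanged input, gate fails
    lambda s: s.lower(),                    # attempt 2: still has a space, gate fails
    lambda s: s.lower().replace(" ", "-"),  # attempt 3: gate passes
]


def iterate_until_green(gate, candidates, max_attempts=5):
    """Try candidates in order; return (attempts used, whether the gate passed)."""
    tried = 0
    for candidate in candidates[:max_attempts]:
        tried += 1
        if gate(candidate):
            return tried, True  # only now may a PR be opened
    return tried, False         # budget exhausted: escalate instead of opening a PR


attempts_used, ready_for_pr = iterate_until_green(acceptance_gate, CANDIDATES)
```

The attempt budget is a design choice of this sketch rather than part of the criterion; it keeps a stuck agent from looping indefinitely against a gate it cannot satisfy.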
5.6
PR reviewability
agent-generated PRs include test evidence, screenshots, decision rationale, and rejected alternatives so a human can review the PR in under 5 minutes. The branch is current with its target when review is requested
current · target
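A pre-review check along these lines could enforce the evidence list mechanically. The sketch below assumes the PR body names each item literally as a section heading and that the caller already knows how many commits the branch is behind its target; `missing_sections`, `ready_for_review`, and the matching-by-substring approach are hypothetical simplifications.

```python
# The four evidence items named in the criterion.
REQUIRED_SECTIONS = (
    "test evidence",
    "screenshots",
    "decision rationale",
    "rejected alternatives",
)


def missing_sections(pr_body: str) -> list:
    """Return the required items that the PR body never mentions."""
    lowered = pr_body.lower()
    return [name for name in REQUIRED_SECTIONS if name not in lowered]


def ready_for_review(pr_body: str, commits_behind_target: int) -> bool:
    """Reviewable only if all evidence is present and the branch is current."""
    return not missing_sections(pr_body) and commits_behind_target == 0


EXAMPLE_BODY = """
## Test evidence
## Screenshots
## Decision rationale
"""
```

With `EXAMPLE_BODY`, `missing_sections` reports `["rejected alternatives"]`, so the PR would be held back until that section is added.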
5.7
Signal-driven task generation
signals from both proactive sources (scheduled security scans, UI regression runs, mutation testing, health checks) and reactive sources (user reviews, support tickets, app store ratings, meeting notes, production metrics) flow through automated triage into typed task creation. Both sources contribute; neither is manually gated. The proactive-source path depends on an agent-invokable scheduling primitive — scored here rather than as its own criterion, but load-bearing for `PL2-test-quality`, `PL2-ui-test-coverage`, `PL2-load-stress-testing`, `PL4-release-strategy`, `PL5-pipeline-reliability`, `PL5-outcome-input-loop` as well
current · target
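The triage step in 5.7 can be pictured as a routing function from signals to typed tasks. This is a minimal sketch under assumptions: the proactive/reactive split comes from the criterion, but `Signal`, `Task`, `TaskType`, the `ROUTES` table, and the source keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    PROACTIVE = "proactive"  # scheduled scans, regression runs, health checks
    REACTIVE = "reactive"    # user reviews, support tickets, production metrics


class TaskType(Enum):
    SECURITY_FIX = "security_fix"
    UI_REGRESSION = "ui_regression"
    BUG = "bug"
    INVESTIGATION = "investigation"  # fallback when no route matches


@dataclass(frozen=True)
class Signal:
    source: str
    origin: Origin
    summary: str


@dataclass(frozen=True)
class Task:
    type: TaskType
    title: str
    origin: Origin


# Hypothetical routing table from signal source to task type.
ROUTES = {
    "security_scan": TaskType.SECURITY_FIX,
    "ui_regression_run": TaskType.UI_REGRESSION,
    "support_ticket": TaskType.BUG,
}


def triage(signal: Signal) -> Task:
    """Turn any signal into a typed task; no branch waits for manual approval."""
    task_type = ROUTES.get(signal.source, TaskType.INVESTIGATION)
    return Task(task_type, signal.summary, signal.origin)
```

Both origins pass through the same function, which mirrors the requirement that neither source is manually gated; unknown sources fall back to an investigation task rather than being dropped.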
5.8
Outcome → input loop
production / canary metrics from a deployed change *automatically generate* the next decision: deprecate, expand, or continue the A/B test. Closes the FSD loop
current · target
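As an illustration of metrics generating the next decision, the sketch below maps a canary comparison onto the three outcomes the criterion names. The function `next_decision`, the use of a single conversion-style rate, and the `min_samples` and `margin` thresholds are all assumptions made for the example, not values from the canon.

```python
from enum import Enum


class Decision(Enum):
    DEPRECATE = "deprecate"
    EXPAND = "expand"
    CONTINUE_AB = "continue_ab"


def next_decision(baseline_rate: float, canary_rate: float, sample_size: int,
                  min_samples: int = 1000, margin: float = 0.01) -> Decision:
    """Pick the next step from canary metrics (thresholds are hypothetical)."""
    if sample_size < min_samples:
        return Decision.CONTINUE_AB  # not enough data to decide yet
    delta = canary_rate - baseline_rate
    if delta >= margin:
        return Decision.EXPAND       # canary is clearly better
    if delta <= -margin:
        return Decision.DEPRECATE    # canary is clearly worse
    return Decision.CONTINUE_AB      # difference is within the margin
```

A production version would likely replace the fixed margin with a proper statistical test; the point here is only that the decision is computed from the metrics rather than made by hand.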
5.9
Experiment tracking
canary results feed into a learnings doc, including *negative* results, queryable by future agents
current · target
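A learnings doc that future agents can query might be as simple as an append-only log of structured entries. The sketch below keeps the log in memory as JSON lines; `Learning`, `append_learning`, `query`, the field names, and the sample entries are hypothetical, and a real setup would persist the lines to a file or store of its choosing.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Learning:
    experiment: str
    outcome: str   # "positive", "negative", or "inconclusive"
    summary: str
    tags: list


def append_learning(log: list, learning: Learning) -> None:
    """Append one entry as a JSON line; negative results are stored like any other."""
    log.append(json.dumps(asdict(learning)))


def query(log: list, tag: str) -> list:
    """Return every recorded learning carrying the given tag."""
    entries = (Learning(**json.loads(line)) for line in log)
    return [entry for entry in entries if tag in entry.tags]


# Hypothetical sample data.
LOG = []
append_learning(LOG, Learning("checkout-v2", "negative",
                              "One-page checkout did not improve conversion.", ["checkout"]))
append_learning(LOG, Learning("search-ranking", "positive",
                              "New ranking raised click-through.", ["search"]))
```

Querying `LOG` for the tag `"checkout"` returns the negative result, so an agent proposing a similar change later can see that it was already tried.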
5.10
Reusable skills extracted across projects
the compounding effect at the portfolio level
current · target