Indexed per-entry registry with YAML frontmatter
Purpose
A corpus filing pattern for collections of like entities (stakeholders, integrations, ADRs, recipes, vendors, projects), where each entity has its own file and a thin index file aggregates them. It solves the “scattered markdown in arbitrary folders” problem that makes corpora invisible to agents and humans alike.
Architecture
Two layers:
- Index file at the corpus root (e.g. `stakeholders.md`, `integrations.md`, `recipes.md`) containing three things: a short explanation of what the corpus is, a one-line-per-entry table, and a “how to use this registry” section covering add / query / sweep conventions.
- Per-entry files in a sibling directory (`stakeholders/`, `integrations/`, `recipes/`). Each has YAML frontmatter carrying structured attributes (status, type, dates, tags, IDs) and a markdown body organized into conventional sections.
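Each row of the index table mirrors frontmatter that already exists in the entry files, so the table can be regenerated rather than maintained by hand. The following is a minimal Python sketch. It assumes flat frontmatter (scalars and inline `[a, b]` lists) and hypothetical `name` and `status` keys. Its `parse_frontmatter` is a deliberately minimal stand-in for a real YAML parser, and all helper names are illustrative rather than part of any tool.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Minimal stand-in for a YAML parser: flat `key: value` lines and inline
    `[a, b]` lists inside a leading `---` ... `---` block. No nesting."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the markdown body is ignored
        key, sep, value = line.partition(":")
        if not sep or line.lstrip().startswith("#"):
            continue  # skip blank lines, comments, and non key/value lines
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            meta[key.strip()] = [v.strip().strip("'\"") for v in items if v.strip()]
        else:
            meta[key.strip()] = value.strip("'\"")
    return meta


def load_entries(entry_dir: Path) -> dict:
    """Map file name -> text for every entry, skipping `_template.md`."""
    return {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(entry_dir.glob("*.md"))
        if not p.name.startswith("_")
    }


def index_table(entries: dict, columns=("name", "status")) -> str:
    """Render one markdown table row per entry from selected frontmatter keys."""
    header = "| file | " + " | ".join(columns) + " |"
    rule = "|---" * (len(columns) + 1) + "|"
    rows = [header, rule]
    for filename in sorted(entries):
        meta = parse_frontmatter(entries[filename])
        cells = []
        for col in columns:
            value = meta.get(col, "")
            # Lists such as tags render as a comma-separated cell.
            cells.append(", ".join(value) if isinstance(value, list) else value)
        rows.append(f"| {filename} | " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

For example, `print(index_table(load_entries(Path("stakeholders"))))` prints a table ready to paste into `stakeholders.md`. Regenerating the rows on each change keeps the table from claiming an entry the directory lacks.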
A _template.md file lives in the per-entry directory showing expected frontmatter keys and body sections. New entries are created by copying the template, not by free-form authoring — this keeps frontmatter queryable across the set.
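Copying the template can be a small script rather than a manual step. This sketch assumes `_template.md` declares every allowed key as a flat `key:` line between `---` delimiters. It fills in the values it is given and refuses any key the template does not declare. `fill_template` and `create_entry` are hypothetical names. Values are written verbatim, without YAML quoting, and the index row still has to be added separately.

```python
from pathlib import Path


def fill_template(template: str, values: dict) -> str:
    """Return a new entry: `template` with the given frontmatter values filled in.

    Keys that the template does not declare are rejected, so a new entry cannot
    introduce a drifted key name (say `state` where the template has `status`).
    """
    lines = template.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("template must start with a '---' frontmatter block")
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        raise ValueError("template frontmatter is not closed with '---'") from None
    declared = {
        line.partition(":")[0].strip()
        for line in lines[1:end]
        if ":" in line and not line.lstrip().startswith("#")
    }
    unknown = sorted(set(values) - declared)
    if unknown:
        raise ValueError(f"keys not in _template.md: {unknown}")
    out = [lines[0]]
    for line in lines[1:end]:
        key = line.partition(":")[0].strip()
        if ":" in line and key in values:
            out.append(f"{key}: {values[key]}")
        else:
            out.append(line)  # keep unfilled keys, comments, and blank lines as-is
    out.extend(lines[end:])  # closing '---' plus the template's body sections
    return "\n".join(out) + "\n"


def create_entry(entry_dir: Path, slug: str, values: dict) -> Path:
    """Write <entry_dir>/<slug>.md from <entry_dir>/_template.md; never overwrite."""
    template = (entry_dir / "_template.md").read_text(encoding="utf-8")
    content = fill_template(template, values)  # validate before touching the disk
    target = entry_dir / f"{slug}.md"
    with target.open("x", encoding="utf-8") as fh:  # mode "x" fails if the file exists
        fh.write(content)
    return target
```

Rejecting undeclared keys at creation time is cheaper than finding drifted keys later. A genuinely new attribute then goes into `_template.md` first, where everyone can see it.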
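Query surface: `rg 'status: active' <corpus>/` for filtering; `yq --front-matter=extract '.field' <corpus>/*.md` for structured extraction (with mikefarah's yq v4, the `--front-matter` flag makes yq parse only the frontmatter block rather than the whole markdown file); `grep` for cross-corpus references.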
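The same filters can run in-process when a script or agent needs them. Here is a minimal sketch, assuming flat frontmatter. `parse_frontmatter`, `where`, and `pluck` are illustrative helpers that roughly correspond to the `rg` filter and the `yq` extraction respectively.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Minimal stand-in for a YAML parser: flat `key: value` lines and inline
    `[a, b]` lists inside a leading `---` ... `---` block. No nesting."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # the body below the frontmatter is never searched
        key, sep, value = line.partition(":")
        if not sep or line.lstrip().startswith("#"):
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            meta[key.strip()] = [v.strip().strip("'\"") for v in items if v.strip()]
        else:
            meta[key.strip()] = value.strip("'\"")
    return meta


def load_metadata(entry_dir: Path) -> dict:
    """Map file name -> parsed frontmatter, skipping `_template.md`."""
    return {
        p.name: parse_frontmatter(p.read_text(encoding="utf-8"))
        for p in sorted(entry_dir.glob("*.md"))
        if not p.name.startswith("_")
    }


def where(metas: dict, **equals) -> list:
    """File names whose frontmatter has key == value for every pair in `equals`."""
    return sorted(
        name for name, meta in metas.items()
        if all(meta.get(key) == value for key, value in equals.items())
    )


def pluck(metas: dict, field: str) -> dict:
    """Map file name -> value of `field`, skipping entries that lack it."""
    return {name: meta[field] for name, meta in sorted(metas.items()) if field in meta}
```

One difference is deliberate. `rg 'status: active'` matches that text anywhere in a file, including the body and values such as `status: active-legacy`. `where(load_metadata(Path("stakeholders")), status="active")`, by contrast, compares the parsed frontmatter value exactly.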
Criteria advanced
`PL1-corpus-taxonomy` (Corpus taxonomy, filing, indexing): this is the mechanism for level 2. It provides four things:
- an explicit type system, enforced via template frontmatter;
- a consistent filing structure, with one directory per corpus;
- an index, the index `.md` file;
- agent-queryability by type, status, and recency.

Level 3 requires the additional discipline of tracked staleness sweeps and filing-gap detection. This pattern accommodates both but does not automate either on its own.
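A staleness sweep is one of the level-3 disciplines the pattern leaves unautomated. The sketch below assumes frontmatter has already been parsed into dicts by whatever YAML parser the corpus uses, and that entries carry a hypothetical `last_reviewed` date. The 90-day default and the exemption of `archived` entries are illustrative choices, not part of the pattern.

```python
from datetime import date, datetime, timedelta


def stale_entries(metas: dict, today: date, max_age_days: int = 90,
                  field: str = "last_reviewed") -> list:
    """Return (file name, reason) pairs for entries due for a staleness review.

    `metas` maps file name -> parsed frontmatter dict. `last_reviewed` is a
    hypothetical ISO-8601 date key. Missing or unparseable dates are reported,
    since an entry with no review date cannot be shown to be fresh.
    """
    cutoff = today - timedelta(days=max_age_days)
    due = []
    for name in sorted(metas):
        meta = metas[name]
        if meta.get("status") == "archived":
            continue  # assumption: archived entries are exempt from the sweep
        raw = meta.get(field)
        if raw is None or raw == "":
            due.append((name, f"no {field}"))
            continue
        if isinstance(raw, datetime):
            raw = raw.date()  # YAML timestamps may load as datetime objects
        try:
            reviewed = raw if isinstance(raw, date) else date.fromisoformat(str(raw))
        except ValueError:
            due.append((name, f"unparseable {field}: {raw!r}"))
            continue
        if reviewed < cutoff:
            due.append((name, f"last reviewed {reviewed.isoformat()}"))
    return due
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the sweep reproducible. The same run can be replayed in a PR review or a test.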
Indirectly supports any criterion that depends on corpus retrieval (PL1-primary-source-access, PL1-decision-records, PL1-documentation-loop, PL1-stakeholder-context, PL2-agent-audit-trail) — if those criteria need a queryable knowledge store, this is the substrate.
Prerequisites
None beyond markdown, git, and a willingness to apply the pattern consistently. The pattern’s value compounds with scale (5 entries → marginal; 50 entries → load-bearing).
Failure modes
- Frontmatter key drift. Different entries use different key names for the same attribute (`email` vs. `contact_email`; `status` vs. `state`), which breaks `yq` queries silently. Mitigation: treat `_template.md` as the canonical schema, and consult it whenever adding entries.
- Index-body drift. The index table lists an entry that the directory doesn't contain, or vice versa. Mitigation: an addition touches two files (the entry file and its index row), and both land in the same PR; a periodic staleness sweep catches the rest.
- Over-structuring. Template demands too many required fields; contributors skip the pattern for new entries because filling it in is painful; you end up with a mix of structured and unstructured. Mitigation: keep required frontmatter to 3–5 fields; let the rest be optional.
- Under-structuring. Frontmatter is too thin to actually answer real queries; the registry devolves back into free-text markdown. Mitigation: add a frontmatter field the first time you notice you want to query for it, not before and not after.
- Narrative content leaking into frontmatter. Frontmatter should hold queryable attributes, not prose: `status: active` is queryable; `status: "active but with reservations because..."` is not. Narrative belongs in the body.
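Most of these failure modes can be caught mechanically. The sketch below checks entries against `_template.md` for key drift, missing required keys, and prose-like values. It also compares the index's links with the entry files for index-body drift. It assumes flat frontmatter and index rows that link to entries as `[label](dir/file.md)`. The word-count threshold for "looks like prose" is a crude, hypothetical heuristic, and `lint_registry` is an illustrative name.

```python
import re
from pathlib import PurePosixPath

# Assumption: index rows link entries as markdown links, e.g. [Alice](stakeholders/alice.md).
INDEX_LINK = re.compile(r"\]\(([^)\s]+\.md)\)")


def parse_frontmatter(text: str) -> dict:
    """Minimal stand-in for a YAML parser: flat `key: value` lines and inline
    `[a, b]` lists inside a leading `---` ... `---` block. No nesting."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if not sep or line.lstrip().startswith("#"):
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            meta[key.strip()] = [v.strip().strip("'\"") for v in items if v.strip()]
        else:
            meta[key.strip()] = value.strip("'\"")
    return meta


def lint_registry(template: str, entries: dict, index_text: str,
                  required=("status",), max_value_words=4) -> list:
    """Return human-readable problems; `entries` maps file name -> file text."""
    problems = []
    allowed = set(parse_frontmatter(template))
    for name in sorted(entries):
        meta = parse_frontmatter(entries[name])
        for key in sorted(set(meta) - allowed):  # key drift
            problems.append(f"{name}: key '{key}' not in _template.md")
        for key in required:  # required keys; keep this tuple to a handful
            if not meta.get(key):
                problems.append(f"{name}: missing required key '{key}'")
        for key, value in meta.items():  # narrative leaking into frontmatter (heuristic)
            if isinstance(value, str) and len(value.split()) > max_value_words:
                problems.append(f"{name}: '{key}' looks like prose; move it to the body")
    indexed = {PurePosixPath(link).name for link in INDEX_LINK.findall(index_text)}
    for name in sorted(set(entries) - indexed):  # index-body drift, both directions
        problems.append(f"{name}: file has no index row")
    for name in sorted(indexed - set(entries)):
        problems.append(f"{name}: index row points at a missing file")
    return problems
```

For a real corpus, the inputs would be the text of `stakeholders/_template.md`, the entry files in `stakeholders/` (excluding the template), and the text of `stakeholders.md`. Running the check in CI or a pre-commit hook turns the one-PR convention into an enforced one.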
Cost estimate
Low. Establishing the pattern for a new corpus takes 1–2 hours (template + index + 2 seed entries). Ongoing cost is proportional to contribution volume: a well-designed template keeps a new entry to 10–20 minutes, most of which goes into thinking about content rather than filing.
Case studies
- `memory/stakeholders/`: people involved in the Agentic Engineering canon.
  - Frontmatter carries contact details (email, Slack, preferred channel, timezone), making outbound reach a one-step lookup. Tags capture project scope, and status tracks active / archived / prospective.
  - This enables queries like “who do I ping about the Gentari deployment?” via `yq --front-matter=extract 'select(.tags | contains(["gentari"])) | .contact.slack' memory/stakeholders/*.md`.
- `internal/integrations/`: external systems the agent can act on.
  - Restructured into this shape on 2026-04-18. Previously it was a single monolithic file; the refactor itself was the test of whether the pattern scales.
  - Frontmatter carries system, status (active / proposed / archived), auth mode, and pillars advanced, making scope drift auditable across the corpus.
- `recipes/`: this collection.
  - Frontmatter carries criteria advanced, prerequisites, complexity, and seen_in.
  - This makes portfolio-reuse queries natural: “which recipes advance `PL4-least-privilege`?”; “which recipes are blocked on `PL4-branch-protection` being at level 2?”; “which recipes have been proven in at least one project?”.
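Once frontmatter is parsed, the queries in these case studies take a few lines each. The sketch below assumes dicts produced by a YAML parser. `tags`, `seen_in`, and the stakeholder `contact.slack` nesting come from the examples above. `criteria_advanced` and the `{criterion, level}` shape of `prerequisites` are assumptions, as are all function names.

```python
def advancing(recipes: dict, criterion: str) -> list:
    """Recipes whose (hypothetical) `criteria_advanced` list includes `criterion`."""
    return sorted(n for n, m in recipes.items() if criterion in (m.get("criteria_advanced") or []))


def requires(recipes: dict, criterion: str, level: int) -> list:
    """Recipes with a prerequisite on `criterion` at `level` or higher, i.e.
    recipes that stay blocked until that criterion reaches at least `level`."""
    return sorted(
        n for n, m in recipes.items()
        if any(p.get("criterion") == criterion and p.get("level", 0) >= level
               for p in (m.get("prerequisites") or []))
    )


def proven(recipes: dict) -> list:
    """Recipes with at least one project listed in `seen_in`."""
    return sorted(n for n, m in recipes.items() if m.get("seen_in"))


def slack_for_tag(stakeholders: dict, tag: str) -> dict:
    """Stakeholder file -> Slack handle for entries tagged `tag` (mirrors the yq query)."""
    return {
        n: m["contact"]["slack"]
        for n, m in sorted(stakeholders.items())
        if tag in (m.get("tags") or []) and "slack" in (m.get("contact") or {})
    }
```

The `or []` fallbacks matter because an empty YAML key (`prerequisites:`) typically loads as `None` rather than an empty list.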
Related recipes
- Composes with: every domain-specific corpus pattern. Stakeholder registries, integration registries, ADR corpora, recipe collections, project inventories, vendor catalogues — same shape, different frontmatter schema per corpus.
- Alternatives to: free-text markdown in arbitrary folders (the default failure mode); spreadsheet-of-truth (loses prose context, breaks at entry complexity); external CMS (introduces a system boundary agents can’t query natively).