4.5 Prompt injection defence at ingestion boundary
4.5 Prompt injection defence at ingestion boundary
all external content entering *persistent* agent context passes through an ingestion sanitization layer before indexing. Scope is durable ingestion paths (memory writes, indexed knowledge, unsupervised scheduled ingestion); interactive turn context in user-supervised sessions is out of scope — blast radius there is contained by Pillar 4 substrate (`PL4-least-privilege`, `PL4-branch-protection`). The layer strips, escapes, or sandboxes instruction-shaped text. The same policy is applied consistently across every ingestion surface — `PL1-real-world-feedback` (real-world feedback loop), `PL5-signal-driven-tasks` (signal-driven task generation), `PL4-memory-safety` (memory write-path)
Levels
Level 0
No sanitization; untrusted text flows directly into context
Level 1
Ad-hoc sanitization on some surfaces (e.g. PII redaction only); inconsistent between ingestion paths
Level 2
Unified sanitization layer applied at every ingestion surface; instruction-shaped patterns (role-prompts, system-message mimicry, fake tool calls, jailbreak patterns) stripped, escaped, or sandboxed; policy version-controlled
Level 3
Layer adversarially tested; evasion rate measured; new attack patterns auto-update policy; near-misses from `PL1-real-world-feedback` / `PL5-signal-driven-tasks` / `PL4-memory-safety` feed back into layer refinement