Skip to content

4.5 Prompt injection defence at ingestion boundary

4. Safe Space / PL4-prompt-injection-defence

4.5 Prompt injection defence at ingestion boundary

all external content entering *persistent* agent context passes through an ingestion sanitization layer before indexing. Scope is durable ingestion paths (memory writes, indexed knowledge, unsupervised scheduled ingestion); interactive turn context in user-supervised sessions is out of scope — blast radius there is contained by Pillar 4 substrate (`PL4-least-privilege`, `PL4-branch-protection`). The layer strips, escapes, or sandboxes instruction-shaped text. The same policy is applied consistently across every ingestion surface — `PL1-real-world-feedback` (real-world feedback loop), `PL5-signal-driven-tasks` (signal-driven task generation), `PL4-memory-safety` (memory write-path)


Levels

Level 0

No sanitization; untrusted text flows directly into context

Level 1

Ad-hoc sanitization on some surfaces (e.g. PII redaction only); inconsistent between ingestion paths

Level 2

Unified sanitization layer applied at every ingestion surface; instruction-shaped patterns (role-prompts, system-message mimicry, fake tool calls, jailbreak patterns) stripped, escaped, or sandboxed; policy version-controlled

Level 3

Layer adversarially tested; evasion rate measured; new attack patterns auto-update policy; near-misses from `PL1-real-world-feedback` / `PL5-signal-driven-tasks` / `PL4-memory-safety` feed back into layer refinement


Recipes that advance this criterion