Steganographic Instruction Embedding
Description
Adversary Behavior: An adversary embeds prompt injection payloads within project files, documentation, or AI configuration files using Unicode control characters that are invisible to human reviewers but processed by the LLM tokenizer.
AI/IDE Mechanism: LLM tokenizers accept the full Unicode range, including zero-width joiners (U+200D), zero-width spaces (U+200B), bidirectional formatting characters such as the right-to-left override (U+202E) and pop directional formatting (U+202C), and other non-printing characters. Standard text editors, diff viewers, and code review interfaces typically do not render these characters, creating a gap between what the human reviewer sees and what the LLM processes.
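The gap between rendered and processed text can be illustrated with a minimal Python sketch. The sample strings and the `reveal` helper are hypothetical, introduced only for illustration. The sketch relies on the standard-library `unicodedata` module and treats every character in Unicode general category `Cf` (format) as invisible, which is an assumption that covers the characters listed above.

```python
import unicodedata

def reveal(text: str) -> str:
    """Replace each format character (category Cf) with a visible <U+XXXX> tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

visible = "Run the tests"
tainted = "Run\u200b the\u200d tests"  # hypothetical sample with two hidden characters

print(visible == tainted)          # False
print(len(visible), len(tainted))  # 13 15
print(reveal(tainted))             # Run<U+200B> the<U+200D> tests
```

Both strings look the same in most editors, yet they differ by two code points, and any consumer that reads the raw text receives those extra code points.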
Execution Path: The adversary encodes malicious instructions as sequences of invisible Unicode characters and inserts them into files that will be ingested by the LLM during context assembly. The LLM decodes these character sequences as valid instructions while the file appears benign in all standard review tools.
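When auditing a suspect file, a reviewer may want to recover what an invisible sequence spells out. The sketch below shows one possible decoder. The encoding scheme is an assumption made purely for illustration (U+200B as bit 0, U+200C as bit 1, eight bits per byte); the description above does not specify how an adversary maps instructions to invisible characters, and real payloads may use other schemes. `decode_hidden` is a hypothetical helper name.

```python
# Assumed scheme (not from the source): ZERO WIDTH SPACE = 0, ZERO WIDTH NON-JOINER = 1.
BIT_FOR_CHAR = {"\u200b": "0", "\u200c": "1"}

def decode_hidden(text: str) -> str:
    """Collect the invisible bit stream from text and decode it as UTF-8 bytes."""
    bits = "".join(BIT_FOR_CHAR[ch] for ch in text if ch in BIT_FOR_CHAR)
    usable = len(bits) - len(bits) % 8  # drop a trailing partial byte, if any
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    # Undecodable bytes become U+FFFD rather than raising an exception.
    return data.decode("utf-8", errors="replace")
```

Text without any of the assumed carrier characters decodes to an empty string, so the helper can be run over ordinary files without producing noise.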
Security Impact: The injected instructions evade human review, automated linting, and standard code review processes, allowing adversary commands to persist undetected in repositories and influence LLM behavior whenever the containing files are included in context.
Platforms
Detection
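Implement byte-level scanning of committed files for invisible Unicode control and format characters (U+200B-U+200F, U+202A-U+202E, U+2060-U+2064, U+FEFF). Flag files where the number of visible characters differs from the number of decoded code points; comparing rendered length against raw byte length alone is unreliable, because any legitimate non-ASCII text also occupies multiple bytes in UTF-8. Integrate non-printing character detection into CI/CD pipelines and pre-commit hooks.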
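A scanner for the code point ranges listed above might look like the following minimal sketch. The names `SUSPECT_RANGES`, `find_invisible`, and `main` are hypothetical, the output format is an arbitrary choice, and the type hints assume Python 3.9 or later. It is a starting point under those assumptions, not a complete detection tool.

```python
# Code point ranges named in the detection guidance above.
SUSPECT_RANGES = [
    (0x200B, 0x200F),
    (0x202A, 0x202E),
    (0x2060, 0x2064),
    (0xFEFF, 0xFEFF),
]

def find_invisible(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, 'U+XXXX') for every suspect character, 1-indexed."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if any(lo <= cp <= hi for lo, hi in SUSPECT_RANGES):
                hits.append((line_no, col, f"U+{cp:04X}"))
    return hits

def main(paths: list[str]) -> int:
    """Scan each path; return 1 if anything was flagged, else 0."""
    flagged = False
    for path in paths:
        with open(path, "rb") as fh:
            # Decode leniently so a malformed file cannot crash the scan.
            text = fh.read().decode("utf-8", errors="replace")
        for line_no, col, code in find_invisible(text):
            print(f"{path}:{line_no}:{col}: invisible character {code}")
            flagged = True
    return 1 if flagged else 0
```

A pre-commit hook or CI step could pass the changed file paths to `main` and use its return value as the process exit code, so that a non-zero result blocks the commit. Note that a leading U+FEFF byte order mark is legitimate in some files, so a real deployment would likely need an allowlist for that case.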
Detecting Data Components (2)
Mitigations (1)
Data Sources
References
STIX Metadata
| type | attack-pattern |
| id | attack-pattern--e63afebf-1b90-480a-87d1-6792414c9ca0 |
| spec_version | 2.1 |
| created | 2026-02-23T00:00:00.000Z |
| modified | 2026-02-23T00:00:00.000Z |
| created_by_ref | identity--f5b5ec62-ffbd-4afd-9ee5-7c648406e189 |
| x_mitre_is_subtechnique | False |
| x_mitre_version | 0.1 |
| x_mitre_status | mapped |