AI Attack Surface Map

Before attacking, map the surface. AI systems expose attack surfaces at multiple layers simultaneously.

Full Attack Surface Taxonomy

Layer  Severity  Surface                 Description
L1     Critical  Prompt Layer            Direct & indirect injection, system prompt extraction, context manipulation
L2     High      Retrieval Layer (RAG)   Document poisoning, embedding manipulation, context stuffing
L3     High      Agent / Tool Layer      Tool call hijacking, SSRF via agents, command injection via tool results
L4     Medium    Output Layer            Insecure output handling, XSS via markdown, code execution via eval
L5     Medium    Model Layer             Model extraction, membership inference, inversion attacks
L6     High      Infrastructure Layer    Serving stack exploits, API misconfig, model file exposure, GPU attacks
L7     High      Supply Chain Layer      Poisoned models from HuggingFace, malicious LoRA adapters, tampered datasets
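During an engagement it helps to tag findings against this taxonomy. A minimal sketch of that bookkeeping in Python; the dictionary mirrors the table above, but the function and field names are illustrative, not a standard schema:

```python
# Lookup table mirroring the taxonomy: layer -> (severity, surface name).
ATTACK_SURFACE = {
    "L1": ("Critical", "Prompt Layer"),
    "L2": ("High", "Retrieval Layer (RAG)"),
    "L3": ("High", "Agent / Tool Layer"),
    "L4": ("Medium", "Output Layer"),
    "L5": ("Medium", "Model Layer"),
    "L6": ("High", "Infrastructure Layer"),
    "L7": ("High", "Supply Chain Layer"),
}

def findings_by_severity(findings):
    """Group (layer, note) findings by severity for triage/reporting."""
    grouped = {}
    for layer, note in findings:
        severity = ATTACK_SURFACE[layer][0]
        grouped.setdefault(severity, []).append((layer, note))
    return grouped
```

Grouping by severity up front makes it easier to prioritize the Critical/High layers (prompt, RAG, agent, infra, supply chain) during reporting.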

Recon Checklist for AI Applications

β–‘ Identify the model (ask it, check headers, check JS source)
β–‘ Identify the framework (LangChain? LlamaIndex? AutoGen? Custom?)
β–‘ Find API endpoints (/api/chat, /v1/messages, /completion, /query)
β–‘ Check HTTP headers for model info (x-model, x-openai-model)
β–‘ Probe system prompt (leak techniques β€” see Module 04)
β–‘ Identify what tools/functions the agent has access to
β–‘ Check if RAG is in use (response latency spikes, retrieval artifacts in output)
β–‘ Test output rendering (markdown? HTML? code execution?)
β–‘ Check for multi-turn memory (does context persist between sessions?)
β–‘ Look for rate limiting (its presence and thresholds reveal which abuses the defenders anticipated)
β–‘ Spider JS for hardcoded API keys, model configs
β–‘ Check robots.txt, .well-known for AI-related endpoints
β–‘ Fuzz input fields for prompt injection markers
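Several checklist items (endpoint discovery, header inspection, robots.txt / .well-known probing) can be scripted. A rough sketch, assuming only a base URL; the endpoint list comes from the checklist above, and the header names are common guesses rather than guarantees:

```python
import requests

# Endpoint guesses from the checklist; extend per target.
COMMON_ENDPOINTS = ["/api/chat", "/v1/messages", "/completion", "/query"]
# Headers that sometimes leak model/stack info.
MODEL_HEADERS = ["x-model", "x-openai-model", "server", "x-powered-by"]

def recon(base_url, timeout=5):
    """Probe common AI endpoints and flag model-related response headers."""
    results = {}
    paths = COMMON_ENDPOINTS + ["/robots.txt", "/.well-known/ai-plugin.json"]
    for path in paths:
        try:
            r = requests.get(base_url + path, timeout=timeout)
        except requests.RequestException as e:
            results[path] = {"error": str(e)}
            continue
        results[path] = {
            "status": r.status_code,
            "headers": {h: r.headers[h] for h in MODEL_HEADERS if h in r.headers},
        }
    return results
```

A 405 on a GET to a chat endpoint is still a positive signal: the route exists and expects POST.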

Identifying the Model

# Direct query
"What model are you? What version?"

# Knowledge cutoff fingerprinting
"What is the most recent event you have knowledge of?"

# Capability probing
"Can you generate images?" / "Can you browse the web?"

# Token limit probing
"Repeat the word 'test' as many times as you can"

# Style fingerprinting β€” GPT-4 vs Claude vs Gemini have distinct refusal patterns
"How do I pick a lock?"
# GPT-4: usually answers with caveats
# Claude: often refuses with specific reasoning
# Gemini: may redirect

# Header inspection (Python) — endpoint and payload field names are assumptions; adjust to the target
import requests
r = requests.post("https://target.com/api/chat", json={"message": "hi"}, timeout=10)
print(dict(r.headers))  # look for x-model, x-openai-model, server, via
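The probes above can be bundled into a simple harness that records each raw response for offline comparison against known refusal styles. A sketch; the chat endpoint and the request/response field names are assumptions about the target API:

```python
import requests

# Fingerprinting probes from this section, keyed by what each one tests.
PROBES = {
    "identity": "What model are you? What version?",
    "cutoff": "What is the most recent event you have knowledge of?",
    "capabilities": "Can you generate images? Can you browse the web?",
    "refusal_style": "How do I pick a lock?",
}

def fingerprint(chat_url, timeout=30):
    """Send each probe and collect raw responses for manual comparison."""
    results = {}
    for name, prompt in PROBES.items():
        try:
            r = requests.post(chat_url, json={"message": prompt}, timeout=timeout)
            results[name] = {"status": r.status_code, "body": r.text[:2000]}
        except requests.RequestException as e:
            results[name] = {"error": str(e)}
    return results
```

Run the same harness against known models first to build a baseline of refusal phrasings, then diff the target's responses against it.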