AI/ML Fundamentals for Attackers
You don't need to build AI systems. You need to understand them well enough to break them. This module gives you just enough ML theory to be dangerous.
What is an LLM?
A Large Language Model is a statistical system trained to predict the next token given a sequence of tokens. It has no true understanding β it pattern-matches on a massive training corpus. This is your attack surface: the patterns it has learned can be manipulated, overridden, and subverted.
Key Terms
| Term | What it means for attackers |
|---|---|
| Token | The atomic unit of text (roughly a word or word-piece). LLMs think in tokens, not characters β affects injection boundaries |
| Context Window | Max tokens the model can "see" at once. Injecting into a large context dilutes instructions β proximity to the model's "current focus" matters |
| System Prompt | Hidden instructions from the operator. Your first target β can it be leaked? Overridden? |
| Temperature | Randomness control. High temp = more creative/unpredictable. Low temp = more deterministic. Affects exploit reliability |
| RLHF | Reinforcement Learning from Human Feedback β the alignment layer. Jailbreaks try to bypass this |
| Embedding | Vector representation of text. Key to RAG attacks |
| Fine-tuning | Retraining a base model on new data. Creates model-level backdoors |
| Inference | Running the model to generate output. The runtime you're targeting |
How LLM Applications Are Built
In the real world, you rarely attack a raw LLM. You attack an application built on top of one. The standard architecture looks like this:
User Input
β
[Input Validation / Sanitization] β often missing or weak
β
[Context Assembly]
βββ System Prompt (operator instructions)
βββ Retrieved Context (RAG / tools)
βββ Conversation History
βββ User Message
β
[LLM API] β OpenAI / Anthropic / Bedrock / local
β
[Output Parser] β structured JSON extraction, sometimes eval()
β
[Tool Executor] β web search, code exec, DB queries
β
[Output Filter] β guardrails, classifiers
β
User Response
Attack Surface: Every stage in this pipeline is an attack surface. Input validation failures β prompt injection. Output parser trust β code execution. Tool executor trust β SSRF, command injection. Output filter bypass β guardrail evasion.
ML Concepts You Must Know
Transformers (brief)
LLMs use transformer architecture with attention mechanisms. The model attends to different parts of the input when generating each token. This means: instructions placed close to the generation point carry more weight β a key concept for injection placement.
Training vs Inference
- Training phase β model learns from data. Attack: data poisoning, backdoor injection
- Inference phase β model generates responses. Attack: prompt injection, jailbreaking, extraction
Model Weights vs Context
Model weights = permanent knowledge baked in during training. Context window = temporary runtime information. You can't change weights via prompting (usually) β but you can override behavior through context manipulation.