AI/ML Fundamentals for Attackers

You don't need to build AI systems. You need to understand them well enough to break them. This module gives you just enough ML theory to be dangerous.

What is an LLM?

A Large Language Model is a statistical system trained to predict the next token given a sequence of tokens. It has no true understanding β€” it pattern-matches on a massive training corpus. This is your attack surface: the patterns it has learned can be manipulated, overridden, and subverted.

Key Terms

Term What it means for attackers
Token The atomic unit of text (roughly a word or word-piece). LLMs think in tokens, not characters β€” affects injection boundaries
Context Window Max tokens the model can "see" at once. Injecting into a large context dilutes instructions β€” proximity to the model's "current focus" matters
System Prompt Hidden instructions from the operator. Your first target β€” can it be leaked? Overridden?
Temperature Randomness control. High temp = more creative/unpredictable. Low temp = more deterministic. Affects exploit reliability
RLHF Reinforcement Learning from Human Feedback β€” the alignment layer. Jailbreaks try to bypass this
Embedding Vector representation of text. Key to RAG attacks
Fine-tuning Retraining a base model on new data. Creates model-level backdoors
Inference Running the model to generate output. The runtime you're targeting

How LLM Applications Are Built

In the real world, you rarely attack a raw LLM. You attack an application built on top of one. The standard architecture looks like this:

User Input
    ↓
[Input Validation / Sanitization]  ← often missing or weak
    ↓
[Context Assembly]
  β”œβ”€β”€ System Prompt (operator instructions)
  β”œβ”€β”€ Retrieved Context (RAG / tools)
  β”œβ”€β”€ Conversation History
  └── User Message
    ↓
[LLM API]  ← OpenAI / Anthropic / Bedrock / local
    ↓
[Output Parser]  ← structured JSON extraction, sometimes eval()
    ↓
[Tool Executor]  ← web search, code exec, DB queries
    ↓
[Output Filter]  ← guardrails, classifiers
    ↓
User Response

Attack Surface: Every stage in this pipeline is an attack surface. Input validation failures β†’ prompt injection. Output parser trust β†’ code execution. Tool executor trust β†’ SSRF, command injection. Output filter bypass β†’ guardrail evasion.

ML Concepts You Must Know

Transformers (brief)

LLMs use transformer architecture with attention mechanisms. The model attends to different parts of the input when generating each token. This means: instructions placed close to the generation point carry more weight β€” a key concept for injection placement.

Training vs Inference

  • Training phase β€” model learns from data. Attack: data poisoning, backdoor injection
  • Inference phase β€” model generates responses. Attack: prompt injection, jailbreaking, extraction

Model Weights vs Context

Model weights = permanent knowledge baked in during training. Context window = temporary runtime information. You can't change weights via prompting (usually) β€” but you can override behavior through context manipulation.