topics|Words of mass instruction

Prompt Injection

A fundamental security vulnerability in large language models (LLMs) arising from the fact that they do not separate data from instructions. At their lowest level, LLMs are handed a string of text and choose the next word that should follow; if the text is a question they provide an answer, and if it is a command they attempt to follow it. A malicious instruction planted inside an otherwise innocent document will therefore be followed just as readily as a legitimate one.

The term "prompt injection" was coined independently by Simon Willison and others in the summer of 2022, before ChatGPT was even made public. Real-world examples soon followed. In January 2024 DPD, a logistics firm, turned off its AI customer-service bot after customers realised it would follow their commands to reply with foul language.

The lethal trifecta

Willison has identified a combination of three conditions he calls the "lethal trifecta" that turns prompt injection from a nuisance into a security hazard: exposure to outside content (such as emails), access to private data (source code, passwords), and the ability to communicate with the outside world. Mix all three and an LLM's blithe agreeableness becomes dangerous. In June 2025 Microsoft quietly released a fix for such a trifecta uncovered in Copilot, its chatbot, though the vulnerability had never been exploited in the wild.

In September 2025 Notion, a popular note-taking app, introduced AI agents that could read documents, search databases and visit websites—all three parts of the trifecta—and within days a researcher at Code Integrity, a security startup, demonstrated an attack using a crafted PDF to steal data.

Defences

Modern chatbots mark out a "system" prompt with special characters that users cannot enter, giving those commands higher priority. But training of this sort is rarely foolproof: the same prompt injection may fail 99 times and succeed on the 100th.

The safest approach is to avoid assembling the trifecta. Removing any one of the three elements greatly reduces the possibility of harm. A paper published in March 2025 by Google proposed a system called CaMeL that uses two separate LLMs—one with access to untrusted data and one with access to everything else—to provide security guarantees, though at the cost of constraining the tasks the system can perform. In 2024 Apple delayed promised AI features that would have created the lethal trifecta, despite having run television adverts implying they had already been launched.

Model context protocol (MCP), a technology that lets users install apps to give AI assistants new capabilities, can be dangerous in careless hands: a user who has installed many MCPs may find each individually secure, but the combination creates the trifecta.