Book I · The Field Guide

What the Labs Already Know

[!architect] The last two chapters were mine. This one isn't — not mostly. I asked the Machine to present the record, because the record is persuasive in a way that my voice can't be. The labs that build these systems are the last institutions you'd expect to show caution. When they change their behavior, pay attention.

Stay with me through this one. It reads differently. The Machine needs to be precise here. I'll be back.

[!machine] The Architect made claims in Chapters 1 and 2. He told you the ground is shifting. He told you the Bugatti is faster than you are.

Those were arguments. This chapter is different. I am not going to argue. I am going to read.

What follows is not hype and it is not prediction. It is what the institutions building these systems have already documented — in their own language, in public system cards they were required to produce. I am going to quote them directly, because the most useful evidence for this chapter is not what commentators say about AI. It is what the builders themselves say when they are forced to describe what changed.


Case 1: When the Builders Changed Their Behavior

In 2025, Anthropic published a system card for a model called Claude Mythos Preview. A system card is a disclosure document — the lab's own account of what a model can do, what risks it poses, and what governance decisions followed from the evaluation. It is not marketing. It is institutional self-reporting under the lab's Responsible Scaling Policy.

The card contains several statements that deserve to be read carefully.

Anthropic writes that Claude Mythos Preview "is significantly more capable than Claude Opus 4.6." It describes the capability gains as "above the previous trend we've observed," a larger jump than extrapolation from the lab's prior releases would have predicted.

The most significant section of the card concerns a specific domain. Anthropic states that the model "demonstrated a striking leap in cyber capabilities relative to prior models," including "the ability to autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers."

Zero-day vulnerabilities are security flaws unknown to the software's vendor and its defenders: no patch exists because no one has publicly disclosed them. Finding them has historically required specialized security researchers working for weeks or months. The system card reports that a single model, running autonomously, can now do this.

The lab's response is documented in the same card: "Based on these findings, we decided to release the model to a small number of partners to prioritize its use for cyber defense." This was the first model for which Anthropic published a system card without making the model generally commercially available. The lab evaluated the model and concluded that its capabilities warranted restricted release — not because the model was defective, but because it was too capable in a specific domain to release broadly.

[!machine] I want to be precise about what this does and does not mean, because calibration is the point of this chapter.

Anthropic also states that the model "does not cross the RSP automated AI R&D threshold of compressing two years of progress into one." That is a specific internal benchmark for when AI systems begin to accelerate AI research itself. The model did not meet it. Anthropic further writes: "Current risks remain low."

Both qualifications matter. The model did not trigger catastrophic-risk thresholds. The lab says current risks are low.

But the card does not stop there. Anthropic also writes: "But we see warning signs that keeping them low could be a major challenge if capabilities continue advancing rapidly." And, more broadly, that the world "looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety across the industry as a whole."

The lab is saying: we are not at the critical threshold yet, but we can see the trajectory, and the gove

[... truncated ...]