
May 22, 2026

Nothing to Freeze

Why replaying the model weights does not reconstruct an agent that learned inside the run.

By Jonathan Luethke

Static-model audit rests on one move. Freeze the weights, replay the input, reproduce the output.

An agent that learns inside a single run breaks that move. The weights never changed. The behavior did.

There is nothing to freeze and replay against.

What the static-model method assumes.

The validation paradigm that governed regulated models for fifteen years rests on a stable artifact. A model is trained. The weights are frozen. The model is validated against that frozen state. It is deployed, and its performance is monitored for drift over time.

Audit follows the same shape. Take the recorded input. Run it against the frozen weights. Confirm the output matches what the model produced in production. The artifact under examination does not move while you examine it.
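The replay move can be sketched in a few lines. This is a minimal illustration, not any regulator's or vendor's procedure. The linear scorer, the feature names, and the log format are all hypothetical stand-ins for a validated model and its production log.

```python
import hashlib
import json

# Hypothetical frozen artifact: a tiny linear scorer standing in for a validated model.
FROZEN_WEIGHTS = {"income": 0.5, "debt": -0.8, "bias": 0.1}


def weights_fingerprint(weights):
    """Hash the weights so the audit can show which artifact was replayed."""
    canonical = json.dumps(weights, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def predict(weights, features):
    """Deterministic scoring: same weights and same input give the same output."""
    score = weights["bias"] + sum(weights[name] * value for name, value in features.items())
    return "approve" if score >= 0 else "deny"


def replay_audit(weights, logged):
    """Re-run the recorded input against the frozen weights and compare outputs."""
    if weights_fingerprint(weights) != logged["weights_sha256"]:
        return False  # not the artifact that was live in production
    return predict(weights, logged["input"]) == logged["output"]


# A production log entry, recorded at decision time (illustrative values).
features = {"income": 1.0, "debt": 1.0}
log_entry = {
    "weights_sha256": weights_fingerprint(FROZEN_WEIGHTS),
    "input": features,
    "output": predict(FROZEN_WEIGHTS, features),
}

audit_ok = replay_audit(FROZEN_WEIGHTS, log_entry)
```

The check works only because `predict` depends on nothing but the weights and the recorded input. That is the assumption the rest of this piece questions.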

On April 17, 2026, SR 11-7 was rescinded and replaced with SR 26-2. The new guidance places generative and agentic AI explicitly out of scope, calls these systems novel and rapidly evolving, and signals a forthcoming interagency request for information. One reading of that carve-out is procedural caution. Another reading is more specific: the static-model method does not describe a system whose behavior is shaped at runtime.

What in-context learning does.

The model weights never change during a run. The behavior changes anyway. Across a single execution the agent accumulates context. Earlier tool outputs become later inputs. The system prompt, the retrieved documents, and the prior steps all condition what the model does next.

Two runs that open with identical inputs can reach different decisions. A retrieved document arrived in a different order. A tool returned a value that shifted the next branch. The frozen weights produced both outcomes.

The weights alone did not make the decision. The weights plus the context that accumulated inside the run did. The audit object is the run, not the model.
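A toy sketch makes the divergence concrete. Everything here is an assumption for illustration: `frozen_model` is a hypothetical pure function standing in for fixed weights, and the rule that the most recent tool output drives the decision is invented. A real model conditions on its context in far less legible ways.

```python
def frozen_model(context):
    """Stand-in for fixed weights: a pure function of the context window.

    Assumed toy rule: the most recent tool output in the context drives the decision.
    """
    tool_lines = [line for line in context if line.startswith("tool:")]
    if not tool_lines:
        return "call_tool"
    return "deny" if "risk=high" in tool_lines[-1] else "approve"


def run_agent(opening_input, tool_outputs):
    """One run: context accumulates, and earlier tool outputs become later inputs."""
    context = [f"user:{opening_input}"]
    for output in tool_outputs:
        context.append(f"tool:{output}")
    return frozen_model(context), context


opening = "evaluate application 17"

# Same function, same opening input, tool results arriving in a different order.
decision_a, context_a = run_agent(opening, ["risk=low", "risk=high"])
decision_b, context_b = run_agent(opening, ["risk=high", "risk=low"])

# Replaying only the opening input against the same function.
replayed = frozen_model([f"user:{opening}"])
```

Under these assumptions `decision_a` is a denial, `decision_b` is an approval, and `replayed` is neither. The function never changed. Only the accumulated context did.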

Why replay fails.

Freeze the weights and replay the opening input, and you do not reconstruct the run. You produce a fresh run that happens to start the same way. The intermediate state that bent the first run toward its decision is gone unless it was captured as it formed.

What has to be captured is the context as it accumulated. Every token that entered the window. The order in which it entered. The state the agent carried from one step to the next. The branch the agent took, and the branches it rejected at each turn.
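One possible shape for that capture, as a sketch only. `StepRecord`, `RunRecorder`, and every field name are hypothetical, chosen to mirror the list above. A production schema would need more, such as token-level detail and model version.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class StepRecord:
    seq: int                      # position of this step within the run
    entered: List[str]            # what entered the window at this step, in order
    state: dict                   # state the agent carried into the next step
    branch_taken: str             # the branch the agent took
    branches_rejected: List[str]  # the branches it considered and rejected


@dataclass
class RunRecorder:
    run_id: str
    steps: List[StepRecord] = field(default_factory=list)

    def capture(self, entered, state, branch_taken, branches_rejected):
        """Append one step as it forms, copying inputs so later mutation cannot alter it."""
        record = StepRecord(
            seq=len(self.steps),
            entered=list(entered),
            state=dict(state),  # shallow snapshot of the carried state
            branch_taken=branch_taken,
            branches_rejected=list(branches_rejected),
        )
        self.steps.append(record)
        return record

    def window(self):
        """Rebuild the context window in the order its contents entered."""
        return [item for step in self.steps for item in step.entered]


recorder = RunRecorder(run_id="demo-run")
recorder.capture(
    entered=["system: assess the application", "user: application 17"],
    state={"risk": None},
    branch_taken="call_risk_tool",
    branches_rejected=["decide_now"],
)
recorder.capture(
    entered=["tool: risk=high"],
    state={"risk": "high"},
    branch_taken="deny",
    branches_rejected=["approve", "escalate"],
)

# Serialize while the run is still open; after the session closes there is nothing to read.
serialized = json.dumps(asdict(recorder), sort_keys=True)
```

The recorder is append-only by convention here, not by enforcement. Making the record tamper-evident is a separate step.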

This is a one-shot capture problem. A static model can be re-interrogated at leisure, because it sits still. An in-context run cannot. The session closes, the context window is discarded, and a later retrain overwrites the weights that were live at the moment. If the run was not recorded as it happened, the run is gone.

What this means for the record.

The reconstruction window is the run itself. There is no second window. The record has to be produced at the moment of execution, signed and sealed before the context that produced it is discarded.

A model card has no field for this. The card describes the artifact that was validated. It says nothing about what the agent absorbed during run 4,812 that moved it toward a denial. The thing the examiner will eventually contest is the run, and the model card describes the wrong object.

This reframes what governance has to capture. Not the trained model and its drift over months. The decision and the learning event, each one, at the resolution of a single execution.

Where the learning governor sits.

Two governors do two jobs. One records the decision: the trajectory of tool calls, state transitions, and branches that produced the output. The other records the learning event: what entered the agent's context and changed its posture inside the run.

The learning governor captures the context deltas at the moment they form. What was retrieved. What the prior steps fed forward. What shifted the agent away from the path identical opening inputs would otherwise have taken. Signed, chained, and timestamped, so the record of how the agent came to behave the way it did survives the session that produced it.
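What "signed, chained, and timestamped" could look like in miniature is sketched below. This is a generic hash chain, not a description of any shipped system. The entry layout, the function names, and the demo key are assumptions, and HMAC-SHA256 from the Python standard library stands in for whatever signature scheme a real deployment would use.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key-not-for-production"  # assumption: a real key lives in an HSM or KMS
GENESIS_HASH = "0" * 64                       # placeholder previous-hash for the first entry


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def _sign(entry_hash, key):
    return hmac.new(key, entry_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_delta(chain, delta, key=SIGNING_KEY, now=None):
    """Record one context delta, linked to the previous entry by its hash."""
    body = {
        "seq": len(chain),
        "timestamp": time.time() if now is None else now,
        "delta": delta,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_HASH,
    }
    entry_hash = _digest(body)
    entry = dict(body, hash=entry_hash, signature=_sign(entry_hash, key))
    chain.append(entry)
    return entry


def verify_chain(chain, key=SIGNING_KEY):
    """Recompute every hash, link, and signature; any edit to an entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for index, entry in enumerate(chain):
        body = {name: entry[name] for name in ("seq", "timestamp", "delta", "prev_hash")}
        if entry["seq"] != index or entry["prev_hash"] != prev_hash:
            return False
        if _digest(body) != entry["hash"]:
            return False
        if not hmac.compare_digest(_sign(entry["hash"], key), entry["signature"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_delta(chain, {"source": "retrieval", "content": "policy excerpt"})
append_delta(chain, {"source": "tool", "content": "risk=high"})
chain_ok = verify_chain(chain)
```

Each entry commits to the one before it, so changing or dropping an earlier delta invalidates the entries that follow. A shared-secret HMAC only proves integrity to someone holding the key. An examiner-facing record would more plausibly use asymmetric signatures and an external timestamp source.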

This is the slot the static-model paradigm leaves empty. A method built to validate a frozen artifact has no place to put a behavior that formed at runtime and was never written to weights.

What we are building.

Wayfinder Systems Group is building the governor for the part that learns. Every decision signed onto a tamper-evident chain, and every learning event signed alongside it, captured at the moment the agent absorbs the context that changes what it does next. The static model has a validation file. The run that learned inside itself now has a record too. The reviewer reads exceptions. The examiner reads the run. Patents held in The Wayfinder Trust. We call her Velma.

Next step

Thirty minutes. Architecture, not sales.

A conversation about what an in-context run has to record at the moment it happens, because the static-model replay your validation method assumes is not available after the session closes.

JonathanLuethke@WayfinderSystemsGroup.com