← All articles

May 25, 2026

Where Model Risk Ends

What model risk management keeps owning, and where agent assurance starts.

By Jonathan Luethke

Model risk management was built for an artifact that does not move. An agent run is an artifact that does.

The methodology that worked for the first does not extend to the second. The second is what the firm is increasingly deploying.

The question is where the line sits.

What the function was built to do.

Model risk management was assembled around a stable object. A model is trained. The weights are frozen. The artifact is validated against that frozen state. Performance is monitored for drift in production. Documentation lives in a dossier. The dossier is read by the second line, by internal audit, and eventually by the examiner.

For fifteen years this worked. The artifact under examination was a static scoring engine that took an input and returned an output. The validation methodology, the bias testing, the data lineage review, the conceptual soundness write-up. All of them assumed the same thing. The artifact does not move while you examine it.

On April 17, 2026, SR 11-7 was rescinded. The replacement, SR 26-2, carries the same expectations for static models. It also places generative and agentic AI explicitly out of scope, calls those systems novel and rapidly evolving, and signals a forthcoming interagency RFI.

One reading of that carve-out is administrative caution. Another reading is more practical. The methodology the function spent fifteen years refining does not describe a system whose behavior is shaped at runtime.

What an agent run is.

An agent run is a sequence of tool calls, branches, and state transitions, in which the model invokes external systems, modifies its own context, and chooses among intermediate paths.

A model card is the wrong document for this object. The card describes a frozen artifact and the data it was trained on. It says nothing about what the agent did in the seventeen steps between the question and the answer. The card is a true description of a different thing.

The validation methods are the wrong methods too. Replay against frozen weights produces a fresh run that started the same way. Not the run that was recorded. Statistical drift monitoring measures distributional shift over many runs. Not the trajectory of a single decision. Conceptual soundness assesses the model. The decision graph is downstream of the model.

The function that owns the model card does not, by extension, own the trajectory. The artifact is what changed.

Where the line sits.

The boundary holds at the artifact. Model risk owns the model. Agent assurance owns the run.

Model risk keeps the model card. It keeps the data lineage. It keeps the fairness and bias testing. It keeps the conceptual soundness review. It keeps the drift monitoring of the underlying weights. The methodology already exists. The function already exists. Both stay.

Agent assurance picks up what model risk was never built to hold. The trajectory of tool calls. The tool-call permissions enforced at runtime. The signed record of what the agent did at execution. The context deltas that shaped the run. The replay format the examiner will eventually ask for.

Two artifacts. Two functions. One line between them.

The line is sharper than it sounds. A model is validated before deployment. The run happens after. A model is examined statistically across many cases. The run is examined as a single graph. The model card is written by the modeler. The trajectory is produced by the substrate. The boundary is the moment the agent starts executing.

Three lines of defense, still.

Agent assurance is a sibling function inside the second line. Not a fourth line.

The first line still owns the operational decision. The business unit deploys the agent, accepts the outcome, and answers for the customer impact. The second line now has two specialists. One for the model, and one for the run. They report to the same risk leader or to peers. They share a regulator-facing dossier. They produce two artifacts that read together.

Internal audit still audits both. The third line tests the work of the second. With agent assurance in place, the third line has something to test against the run, which it did not have before. The trajectory is owned by a sibling, not an extension.

The function is new because the artifact is new. Not because the firm wanted a new function.

What waiting looks like.

The instinct in many model risk functions is to read the SR 26-2 carve-out as relief. The agencies say agentic AI is out of scope pending the RFI. The reading: nothing to do until further guidance arrives.

The cost of that reading is not the rule. The rule will arrive. The cost is the eighteen months between the rule arriving and the firm being asked to produce evidence under it. The trajectories from the runs the agent makes between now and then will not be reconstructable. The infrastructure to produce them, the function to own them, and the format to deliver them are all on the same build clock.

The firms that begin organizing now will have a second-line owner, a working substrate, and a year of trajectories on file when the RFI closes. The firms that wait will start the build when the deadline starts ticking.

What we are building.

Wayfinder Systems Group is building the substrate that produces the artifact the sibling function consumes. Every decision signed onto a tamper-evident chain at the moment it happens. Every learning event signed alongside it. Every tool call written into a record that survives the run. Model risk keeps the model card. Agent assurance reads the trajectory. The substrate produces both, and the examiner reads what each function delivers. Patents pending. We call her Velma.

Next step

Thirty minutes. Architecture, not sales.

A conversation about where the model risk function stops in your organization today, and what the sibling function that picks up the agent run will need to own.

JonathanLuethke@WayfinderSystemsGroup.com