← All articles

June 8, 2026

If the Corpus Is Private

A claim that cannot be replayed on a public benchmark is not a claim a counterparty can use.

By Jonathan Luethke

Vendor demos cite numbers. Hit rate, latency, detection lift. The numbers are precise. The corpus they were measured on is internal, the pipeline is opaque, and the trace is gone. A number measured on private telemetry cannot fail in public.

The discipline that makes a claim usable to a counterparty is unglamorous and old. Stand the pipeline on a corpus the counterparty can download. The number either survives or it doesn't.

The discipline begins with the data.

The benchmark literature exists because the field decided that performance claims need a fixed substrate to land on. NSL-KDD. CIC-IDS2018. The DARPA Transparent Computing engagements. The MITRE ATT&CK datasets. They are public, they are downloadable, and the labels are agreed.

A vendor states a number. The counterparty runs the pipeline on the same bits. The number either replays or it doesn't. That is what the public benchmark exists to enforce.

Internal corpora produce internal numbers.

A claim measured on private telemetry cannot fail in public. The vendor reports what came out of the run. The buyer cannot re-execute. The methodology is whatever the vendor says it was, and the only audit available is the vendor's own narrative.

This is the shape of a claim that lives only inside one organization. It is allowed to drift, allowed to be re-derived under a different definition, and allowed to be quietly corrected after it shipped. The buyer learns the corrected version on a later call, or not at all.

What the public-corpus pass produces.

A claim measured on a public corpus comes with two artifacts the private claim cannot offer. The first is replayability. The buyer downloads the corpus, builds the runtime from source, and reproduces the number. The second is a sealed chain of decision records, each carrying a hash a third party can recompute. The corpus and the hash together are the integrity primitive. Without both, the claim is a position the vendor takes.

What the procurement question becomes.

The procurement clause we have been working through this quarter asks where the run record lives, who owns it after a vendor switch, and what an examiner can read three years later. Prepended to those questions is one that lives before the contract.

Which public corpus has the substrate been exercised against. Where is the sealed chain available for verification. A demo that cannot answer the first question cited a vendor number, not a benchmark.

What the carrier and the examiner ask in the same sentence.

An AI errors-and-omissions underwriter pricing a policy on agent runs asks the same question a federal examiner will ask. Show the loss case. Replay the decision. Reconstruct what the control did and what it did not do.

A public-corpus pass is what trains the substrate vocabulary to answer them. The format is fixed before the buyer's data ever touches the system. The buyer's private corpus inherits the format the public pass already proved.

What we are building.

Wayfinder Systems Group runs the substrate against public corpora before it runs against a buyer's. The sealed record format is the same in both passes, and the hash primitive is the same. A buyer reads the public-corpus dossier and verifies the format on a benchmark anyone can download. The buyer then receives the private-corpus dossier and verifies the format again, on their own bits. The public benchmark is the precondition for the private one. Patents held in The Wayfinder Trust. We call her Velma.

Next step

Thirty minutes. Architecture, not sales.

A conversation about which public benchmarks your substrate has been exercised against, and whether a counterparty can replay the run.

JonathanLuethke@WayfinderSystemsGroup.com