Research
Proof of value for the environments: training runs and model evaluations conducted directly on them. Each study tests specific models or harness configurations under controlled conditions, with binary ground truth and full per-step telemetry. The environments are the product. These studies show they hold up.