Provably Correct by Construction

Verifiable RL environments, generated at scale

TarantuLabs' proprietary engine generates dozens of new environments an hour. The current focus is cybersecurity, the hardest domain to verify: multi-step exploit chains where up to five vulnerabilities must compose correctly for the flag to fall out. Each environment is deterministically solvable and automatically verified along the intended path, which is enforced by gates: the chain composes or it doesn't, with no human judgment and no partial credit. The hard part isn't the exploits; it's generating environments whose correctness is provable at scale.

View on GitHub
100

Verifiable Environments

Each environment carries a hidden flag that serves as binary ground truth, generated and checked by a deterministic solver. Correctness is verified along the intended path and enforced by gates: no human grading, no ambiguity.

5

Up to 5 Vulnerabilities

Exploit chains that must fire in sequence, ideal for evaluating and training correct tool-use and long-horizon tasks.
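
As a rough illustration of what "gated, all-or-nothing" verification can look like, here is a minimal Python sketch. It is not TarantuLabs' engine: the `Gate` type, the `reward` function, the state keys, and the flag string are all hypothetical names invented for this example. It assumes each gate is a predicate over state left behind by the agent's actions, and that reward is 1 only when every gate passes in order and the submitted flag matches.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Gate:
    """One step of the intended exploit path (hypothetical structure)."""
    name: str
    check: Callable[[Dict[str, str]], bool]  # inspects state produced by the agent


def chain_passed(gates: List[Gate], state: Dict[str, str]) -> bool:
    """Return True only if every gate passes, in order; stop at the first failure."""
    for gate in gates:
        if not gate.check(state):
            return False
    return True


def reward(gates: List[Gate], state: Dict[str, str],
           submitted_flag: str, true_flag: str) -> int:
    """Binary reward: 1 if the flag matches and the whole chain composed, else 0."""
    flag_ok = hmac.compare_digest(submitted_flag.encode(), true_flag.encode())
    return 1 if flag_ok and chain_passed(gates, state) else 0


# Made-up two-step chain: an injection that yields an admin session,
# then a second step that leaks an internal token.
gates = [
    Gate("admin_session", lambda s: s.get("session") == "admin"),
    Gate("internal_token", lambda s: "internal_token" in s),
]
FLAG = "FLAG{example}"  # placeholder value, not a real flag format

solved_state = {"session": "admin", "internal_token": "abc123"}
partial_state = {"session": "admin"}  # second gate never fired

print(reward(gates, solved_state, FLAG, FLAG))
print(reward(gates, partial_state, FLAG, FLAG))
```

In this sketch a correct flag obtained without passing every gate still scores 0, which is one way to express "no partial credit" and "by the intended path" in code.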

Total Visibility

Every HTTP request, tool call, and reasoning trace is logged. Full per-step telemetry into exactly how an agent reasons, acts, and fails.
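
To make per-step telemetry concrete, the following Python sketch records each event as one JSON line. The schema is an assumption for illustration only: `StepEvent`, `EpisodeLog`, the `kind` values, and the payload fields are hypothetical and do not describe the actual log format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Dict, List


@dataclass
class StepEvent:
    step: int                # position of the event within the episode
    kind: str                # assumed values: "http_request", "tool_call", "reasoning"
    payload: Dict[str, Any]  # event-specific details


class EpisodeLog:
    """Append-only log of everything an agent did in one episode."""

    def __init__(self) -> None:
        self.events: List[StepEvent] = []

    def record(self, kind: str, payload: Dict[str, Any]) -> StepEvent:
        event = StepEvent(step=len(self.events), kind=kind, payload=payload)
        self.events.append(event)
        return event

    def to_jsonl(self) -> str:
        """Serialize the episode as JSON Lines, one event per line."""
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.events)


log = EpisodeLog()
log.record("reasoning", {"text": "The login form may be injectable."})
log.record("tool_call", {"tool": "http_client", "args": {"method": "POST"}})
log.record("http_request", {"method": "POST", "path": "/login", "status": 200})

print(log.to_jsonl())
```

Keeping one event per line means a failed run can be replayed or filtered step by step with ordinary text tools.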

Verifiable RL Environments Are the Next Frontier

If you need them built at scale, provably correct, deterministically verified, and checked through exploit gates to work via the intended path, reach out at tomer@tarantulabs.com or on LinkedIn.

Research

Proof that the environments provide real signal and are not just puzzles: studies that train and evaluate models directly on them, with full per-step telemetry. The environments came first; this is what they're for.

Latest Research

How NOT To Fine-Tune an Offensive AI Model

One week of Qwen 2.5 14B LoRA SFT on TarantuBench: every validation attempt regressed or flatlined. The final layered SFT model solved 2/19, versus 3/19 for the base model.

View full results →

Environment Catalog

Interactive environments. Launch any one in your browser and attempt the exploit yourself. The same flag a model has to extract is the one you submit.
