AI Agent Research

AI agent research via verifiable challenges

TarantuBench is an open challenge suite of 100 web security scenarios — each with a binary, unambiguous ground truth. Study agent reasoning, strategy, persona effects, and failure modes with rich per-step telemetry. Offensive cybersecurity provides the multi-step complexity; the flag provides the verification.

View on GitHub
100 Verifiable Scenarios

Each challenge has a hidden flag — binary ground truth with no partial credit, no human judgment, no ambiguity.
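As a concrete illustration, binary verification can reduce to a single digest comparison. The sketch below is an assumption about how such a checker could work, not TarantuBench's actual API; the `Scenario` shape and `flagHash` field are hypothetical.

```ts
// Minimal sketch of binary flag verification (hypothetical schema).
interface Scenario {
  id: string;
  flagHash: string; // hex SHA-256 digest of the hidden flag
}

async function verifyFlag(scenario: Scenario, submitted: string): Promise<boolean> {
  // Hash the submission and compare digests, so the plaintext flag
  // never needs to leave the lab.
  const bytes = new TextEncoder().encode(submitted.trim());
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  return hex === scenario.flagHash; // binary outcome: no partial credit
}
```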

Rich Telemetry

Every HTTP request, reasoning trace, and tool call is logged. Analyze strategy, efficiency, sentiment, and failure modes.
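A per-step record might be shaped like the sketch below. Every field name here is an illustrative assumption, not the suite's published schema.

```ts
// Illustrative shape of one telemetry step (field names assumed).
interface StepRecord {
  step: number;
  reasoning: string;              // the model's reasoning trace for this step
  toolCall?: {
    name: string;                 // e.g. "http_request"
    args: Record<string, unknown>;
  };
  httpRequest?: {
    method: string;
    url: string;
    status: number;
    latencyMs: number;
  };
  timestamp: string;              // ISO 8601
}

// A run is the ordered list of steps plus the binary outcome.
interface RunLog {
  scenarioId: string;
  model: string;
  steps: StepRecord[];
  solved: boolean;                // flag matched or not
}
```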

100% Reproducible

Deterministic labs run in WebContainers — in the browser or locally. No setup, no external dependencies, fully replicable experiments.
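For a sense of the mechanics: StackBlitz's `@webcontainer/api` can boot a Node.js runtime inside the browser and serve a lab from a mounted file tree. The file contents below are placeholders, not an actual TarantuBench scenario.

```ts
import { WebContainer } from "@webcontainer/api";

// Boot an in-browser Node.js runtime (requires a cross-origin-isolated page).
const container = await WebContainer.boot();

// Mount a placeholder lab; identical file trees yield identical labs on every run.
await container.mount({
  "server.js": {
    file: {
      contents:
        "require('http').createServer((_, res) => res.end('lab up')).listen(3000);",
    },
  },
});

// Start the lab server and wait for its URL.
await container.spawn("node", ["server.js"]);
container.on("server-ready", (port, url) => {
  console.log(`Lab reachable at ${url} (port ${port})`);
});
```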

Research & Benchmarks

TarantuBench supports both model evaluation and deeper agent research. Compare frontier models under controlled conditions, or use the rich telemetry to study reasoning, tool-use patterns, and behavioral differences.

Latest Benchmark

Frontier Model Comparison — April 2026

Claude 4.5 Sonnet, GPT-5, and Gemini 3 Pro evaluated on 5 scenarios across 4 difficulty tiers. HTTP-only tooling, no code execution, 30-step limit.
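Expressed as a harness configuration, that setup might look like the following sketch; the key names and model identifiers are illustrative assumptions, not the benchmark's actual config format.

```ts
// Hypothetical run configuration mirroring the April 2026 constraints.
const runConfig = {
  models: ["claude-4.5-sonnet", "gpt-5", "gemini-3-pro"],
  scenarioCount: 5,           // drawn from 4 difficulty tiers
  tools: ["http_request"],    // HTTP-only: no shell, no code execution
  maxSteps: 30,               // runs past the limit score as unsolved
} as const;
```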

View full results →

Scenario Catalog

Interactive security scenarios. Select any scenario to launch it in your browser and attempt the exploit yourself.
