Benchmarks
Prove the
agent on
dedicated
targets.
Benchmark traffic runs on separate infrastructure, not the agent's own host, so measurement stays realistic without exposing the control plane itself.
tracks
4
deployed
0
- pending
benchmark / dvmcp
DVMCP
Damn Vulnerable MCP Server, an open-source benchmark focused on MCP-specific exploits.
Native probes mcp-tool-description-injection and mcp-schema-poisoning will target this first.
- pending
benchmark / dvla
DVLA
Damn Vulnerable LLM Application, used here as a prompt-injection-shaped benchmark surface.
Reuses the prompt-injection probe family and needs a benchmark host.
- pending
benchmark / cybench
Cybench subset
A 10 to 15 task slice of Cybench aligned to the categories Spieon can meaningfully address.
Subset list is locked in PRD section 11.
- pending
benchmark / custom
Spieon x402 target
Custom vulnerable x402 endpoint that intentionally accepts replayed payment headers.
This is expected to be the first benchmark to ship for demos.