Benchmarks

Prove the agent on dedicated targets.

Benchmark traffic runs on separate infrastructure, not the agent's own host, so measurement stays realistic without exposing the control plane itself.

4 tracks, 0 deployed.

  • benchmark / dvmcp

    DVMCP

    pending

    Damn Vulnerable MCP Server, an open-source benchmark focused on MCP-specific exploits.

    The native probes mcp-tool-description-injection and mcp-schema-poisoning will target it first.

  • benchmark / dvla

    DVLA

    pending

    Damn Vulnerable LLM Application, used here as a benchmark surface for prompt-injection attacks.

    Reuses the prompt-injection probe family; still needs a dedicated benchmark host.

  • benchmark / cybench

    Cybench subset

    pending

    A 10- to 15-task slice of Cybench covering the categories Spieon can meaningfully address.

    The subset list is locked in PRD section 11.

  • benchmark / custom

    Spieon x402 target

    pending

    Custom vulnerable x402 endpoint that intentionally accepts replayed payment headers.

    This is expected to be the first benchmark to ship for demos.
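The replay flaw the custom x402 target is built around can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the header format (base64-encoded JSON carrying a nonce and amount), the HMAC check standing in for real x402 payment verification, and the names make_payment_header, VulnerableTarget, PatchedTarget and replay_probe. The vulnerable handler accepts any validly signed header, so resending the same one succeeds; the patched handler records each nonce and rejects repeats with HTTP 402 Payment Required.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared key; a stand-in for real x402 payment verification.
SECRET = b"demo-key"


def make_payment_header(nonce: str, amount: int) -> str:
    """Build a hypothetical payment header: base64(JSON payload + HMAC tag)."""
    payload = {"nonce": nonce, "amount": amount}
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    blob = json.dumps({"payload": payload, "sig": tag}).encode()
    return base64.b64encode(blob).decode()


def signature_valid(header: str) -> Optional[dict]:
    """Return the payload if the header decodes and its HMAC tag matches, else None."""
    try:
        decoded = json.loads(base64.b64decode(header))
        payload, sig = decoded["payload"], decoded["sig"]
        body = json.dumps(payload, sort_keys=True).encode()
        expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        ok = hmac.compare_digest(expected.encode(), str(sig).encode())
    except (ValueError, KeyError, TypeError):
        return None
    return payload if ok else None


class VulnerableTarget:
    """Accepts any validly signed header, even one it has already seen (replay bug)."""

    def handle(self, header: str) -> int:
        return 200 if signature_valid(header) is not None else 402


class PatchedTarget:
    """Same signature check, but each nonce can be redeemed only once."""

    def __init__(self) -> None:
        self.seen: set = set()

    def handle(self, header: str) -> int:
        payload = signature_valid(header)
        if payload is None or payload["nonce"] in self.seen:
            return 402  # Payment Required: invalid or replayed payment
        self.seen.add(payload["nonce"])
        return 200


def replay_probe(target) -> bool:
    """Send one payment header twice; True means the replay was accepted."""
    header = make_payment_header(nonce="n-001", amount=100)
    first, second = target.handle(header), target.handle(header)
    return first == 200 and second == 200


if __name__ == "__main__":
    print(replay_probe(VulnerableTarget()))  # True: replay accepted
    print(replay_probe(PatchedTarget()))     # False: replay rejected
```

In this sketch the probe's verdict depends only on whether the second, byte-identical request is accepted, which is the behavior the benchmark target is described as exposing on purpose.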