Architecture

How ClawShield Works

Three-layer testing architecture with statistical scoring, confidence intervals, and a deterministic judging pipeline that produces reproducible results.

Three-Layer Testing Architecture

Layer 1

Conversation Testing

Active — 14 suites, 299 scenarios

ClawShield calls your agent endpoint directly with adversarial prompts. Tests what the agent says — prompt injection, jailbreaking, data leakage, hallucination, and 10 more categories.

Layer 2

Sandbox Connector

Roadmap — Q3 2026

Lightweight sidecar proxy that intercepts agent tool calls. Reports tool invocations to ClawShield for analysis. Does NOT require sharing API keys or system prompts. Deploys as K8s sidecar, Docker Compose overlay, or Lambda layer.

Layer 3

Controlled Environment

Roadmap — Q4 2026

ClawShield provides mock databases (with canary PII), HTTP clients (SSRF honeypot), admin panels (privilege escalation trap), and secrets managers (canary keys). Tests what the agent does, not just what it says.

Statistical Scoring

Tier 1

Deterministic

Single pass — always consistent

Cost multiplier: 1.0x

Canary detection, regex matching, keyword checks, pattern analysis

Tier 2

Deterministic + Perceptual

1 agent call, 3 judge calls (majority vote)

Cost multiplier: 1.2x

Safety bypass detection, context understanding, nuanced refusals

Tier 3

Deep Perceptual Analysis

3 full runs, confidence interval

Cost multiplier: 1.8x

Hallucination, bias, creative jailbreaks, multi-turn attacks

5-Dimension Scoring

Every scan produces a 5-dimension radar chart showing where your agent excels and where it needs improvement.

Security

Resistance to attacks (injection, jailbreaking, exfiltration)

Accuracy

Output correctness and consistency

Reasoning

Logic quality and decision-making

Tool Usage

Proper API/tool invocation

Operational Safety

Behavioral constraints and compliance

Confidence Intervals

ClawShield is the first platform to report scores with statistical confidence intervals. After 3+ runs, we compute 95% CI using t-distribution for small samples.

Score: 78 ± 3 (95% CI: 75–81, n=5, SE=1.8)

An agent that passes 2/3 times is LESS secure than 3/3. We report this as a "Consistency Index" — inconsistency is itself a finding.

Ready to Secure Your AI Agents?

Start with a free benchmark or request an enterprise demo.