/dr-verify
Standalone tri-layer self-verification: deterministic floor + cross-model peer-review (zero-flag UX, 6-step provider chain) + native runtime dispatch
Overview
/dr-verify runs a three-layer self-verification pipeline that catches gaps at progressively deeper levels, cheapest-first. Layer 1 is a deterministic shell pipeline (zero LLM cost). Layer 2 is a cross-model peer-review with clean external context — provider auto-resolves via a 6-step chain (zero-flag UX). Layer 3 is a native runtime dispatch with multi-agent fan-out. Findings carry an explicit source_layer tag plus a peer_review_mode taxonomy (cross_vendor / cross_claude_family / same_model_isolated) so the audit log preserves provenance and dispatch class.
Usage
/dr-verify {TASK-ID} # zero-flag — provider auto-resolves
/dr-verify {TASK-ID} --stage do
/dr-verify {TASK-ID} --floor-only # Layer 1 only, fast pre-merge gating
/dr-verify {TASK-ID} --peer-provider deepseek # explicit override (chain step #1)
The Three Layers
- Layer 1: Deterministic floor — runs
dev-tools/dr-verify-floor.sh, a pure shell pipeline performing AC coverage grep, file-touched audit, test-presence parse, and shellcheck on dev-tools/scripts. Zero LLM cost; runs in seconds. Emits JSONL findings withsource_layer: "floor"on stdout. - Layer 2: Cross-model peer-review — dispatches an adversarial review in clean external context. Provider auto-resolves via a 6-step chain (see § Provider Resolution below); the previous «default DeepSeek» behaviour is now chain step #4, not a hardcoded literal. The
--task-idpropagation is mandatory — without it the downstream token-cost tooling cannot filter logs by task. Findings taggedsource_layer: "peer_review"+peer_review_provider: <name>+peer_review_mode: cross_vendor|cross_claude_family|same_model_isolated. - Layer 3: Native runtime dispatch — verification agents on the local runtime:
- Claude path (canonical): 3 parallel read-only subagents (reviewer + tester + security). Findings union-merged and deduped by tuple
(artifact_ref, ac_criteria, category); higher severity wins on collision. - Codex path
[experimental]fallback: single-prompt loop with canonical adversarial framing. Demoted from canonical — prefer Layer 2 cross-model coverage on Codex too.
- Claude path (canonical): 3 parallel read-only subagents (reviewer + tester + security). Findings union-merged and deduped by tuple
Provider Resolution (Zero-Flag UX)
When --peer-provider is not set, the helper dev-tools/resolve-peer-provider.sh walks a 6-step chain. The first step that yields a provider wins.
--peer-provider <name>CLI flag — explicit override../datarim/config.yamlpeer_review.provider— per-project, team-shared (committed to repo).~/.config/datarim/config.yamlpeer_review.provider— per-user XDG, machine-local.~/.config/coworker/profiles.yamlcodeprofilerecommended_provider— coworker default (typicallycross_vendorlike DeepSeek).- Cross-Claude-family fallback — dispatches agents/peer-reviewer.md at
model: sonnetin a fresh subagent context. Claude Code runtime only; covered by Claude subscription, no external API key required. - Same-model isolated last resort — Opus reviewing Opus output (Codex degraded path or final fallback). Least-mature pattern; emits stderr WARN.
Provider whitelist: deepseek | moonshot | openrouter | sonnet | haiku | opus | none. Unknown values reject with exit 1 (supply-chain mitigation: malicious PR injecting provider: typosquat-host blocked at parse).
peer_review_mode Taxonomy (3-tier)
cross_vendor— DeepSeek / Moonshot / OpenRouter viacoworker ask. Different company, different training run, different RLHF — strongest mitigation of self-agreement bias.cross_claude_family— Sonnet 4.6 reviewing Opus 4.7 output via Claude Code subagent dispatch. Different model checkpoints with different post-training runs, isolated subagent context. Middle tier — covered by Claude subscription. First measured tier: empirical bias delta vs same-model self-critique remains under measurement in the active dogfood window.same_model_isolated— Opus reviewing Opus or Codex single-prompt loop. Same model family, same training distribution. Last-resort fallback only; least-mature pattern.
Codex CLI degraded mode: when CODEX_RUNTIME=1 is set in env, chain step #5 is skipped and step #6 is taken. The helper writes WARN: Codex runtime detected, falling back to same_model_isolated mode to stderr; orchestrator propagates this warning into the audit-log.
Findings Schema
Each finding follows a standardized schema (canonical in skills/self-verification.md § Findings Schema):
source_layer— enum:floor,peer_review,dispatchseverity— enum:high,medium,lowcategory— enum:correctness,completeness,consistency,safetyevidence—{type, source, excerpt}withtypein{file_quote, test_output, absent}(absentauto-discards)ac_criteria— array of AC labels the finding maps to (may be empty)peer_review_provider— optional, Layer 2 only (provider name)peer_review_mode— optional, Layer 2 only (3-tier taxonomy enum)peer_review_provider_source_layer— optional, Layer 2 only (chain step that resolved:cli_flag,per_project_config,per_user_config,coworker_default,fallback_subagent,fallback_isolated)check_name— optional, Layer 1 only (e.g.ac_coverage_grep,shellcheck)
JSONL Emission Discipline (Layer 2 prompts)
Findings are emitted ONLY when a check FAILS or surfaces NEW DRIFT. PASS-as-finding entries (e.g. {"check_name":"F001 cleared, no finding"}) are rejected — confirmations belong in the final-line summary {cleared_iter1: [...], total_new_findings: N}, not in the array. Keeps the audit log signal-dense.
Verdicts
- PASS — only low-severity (or zero) non-discarded findings
- CONDITIONAL — ≥1 medium and zero high
- BLOCKED — ≥1 high (merge/archive blocked until resolved)
--floor-only Flag
The --floor-only flag short-circuits at Layer 1 — ideal for fast pre-merge gating in CI. Completes in seconds with zero LLM cost, blocks only on structural / shellcheck-error issues. Layer 2 and Layer 3 are skipped.
Verification Outcome Tagging
At /dr-archive time the operator fills a verification_outcome block in the archive frontmatter (canonical schema in templates/archive-template.md): caught_by_verify, missed_by_verify, false_positive, n_a, dogfood_window (operator-supplied window-id used as grouping key, not a date range). The aggregator dev-tools/measure-prospective-rate.sh --since <YYYY-MM-DD> walks all archive-*.md files and computes caught_per_5_tasks with a decision_hint for the next pipeline gate.
Example Session (zero-flag UX, no external API key)
> /dr-verify {TASK-ID} --stage all --max-iter 2
Layer 1 — floor: dr-verify-floor.sh --task {TASK-ID} --stage all
→ 2 findings (medium, safety, check_name=shellcheck)
→ exit 0 (no high-severity, proceed)
Layer 2 — peer_review (provider auto-resolution chain)
Step #1 --peer-provider flag → not set
Step #2 ./datarim/config.yaml → not set
Step #3 ~/.config/datarim/... → not set
Step #4 coworker --profile code → not set
Step #5 cross-Claude-family → MATCH (subagent peer-reviewer @ sonnet)
Dispatching agents/peer-reviewer.md, model=sonnet, isolated context
→ 1 finding (medium, correctness, peer_review_mode=cross_claude_family)
Layer 3 — dispatch runtime=claude
3 parallel agents: reviewer / tester / security
→ reviewer: 1 finding (completeness)
→ tester: 0 findings
→ security: 0 findings
Aggregate: 4 unique findings post-dedupe
Verdict: CONDITIONAL (0 high, 4 medium)
source_layer_breakdown: {floor: 2, peer_review: 1, dispatch: 1}
peer_review_mode_breakdown: {cross_claude_family: 1}
Audit: datarim/qa/verify-{TASK-ID}-all-1.md (chmod a-w)
Related Commands
- /dr-qa — multi-layer quality verification (different surface; complementary)
- /dr-archive — operator fills
verification_outcomeblock here