Command Plugin

/dr-orchestrate

Tmux-based self-driving Datarim pipeline runner — Phase 2 (Subagent Inference, autonomy L2)

Overview

/dr-orchestrate is the framework's reference non-core plugin. Phase 1 (v2.3.0) shipped a lean rule-based tmux runner. Phase 2 (v2.4.0) adds a subagent inference layer that activates when the rule-based parser cannot classify a pane line, plus a flock-race-safe cooldown and audit schema v2. Plugin autonomy is now L2 (assisted): a human still owns escalation, but unknown prompts no longer dead-end at the parser.

The plugin lives at plugins/dr-orchestrate/ in the framework repo and is enabled via the standard plugin CLI (/dr-plugin).

Usage

# Enable the plugin once
/dr-plugin enable dr-orchestrate

# Optional: opt in to send-keys (default fail-closed)
cp ~/.claude/plugins/dr-orchestrate/user-config.template.yaml \
   ~/.claude/plugins/dr-orchestrate/user-config.yaml
chmod 600 ~/.claude/plugins/dr-orchestrate/user-config.yaml
$EDITOR ~/.claude/plugins/dr-orchestrate/user-config.yaml   # set key_injection: true

# Run a single Phase 2 cycle (parse → resolver → autonomous-or-escalate)
/dr-orchestrate run

# Dry-run for validation
/dr-orchestrate run --dry-run

# Manually resolve a pasted prompt without consuming a tmux pane
/dr-orchestrate run --unknown-prompt "operator paste: > /dr-prd ready for strategy gate"

Subagent Inference (Phase 2)

On parser miss (confidence: 0), cmd_run.sh dispatches to subagent_resolver.sh, which classifies the pane text via a configurable fallback chain of AI CLI backends:

  • coworker-deepseek (default primary): coworker ask --provider deepseek --profile code; vendor-neutral OSS CLI.
  • claude: claude --print --output-format=json; the wrapper carries {type, result}, the resolver re-parses .result.
  • codex: codex exec --output-last-message -; best-effort, chain continues on parse fail.

Each backend has a 15 s wall-clock budget (DR_ORCH_RESOLVER_TIMEOUT_S), runs with FD 3 closed for bats-harness compatibility, and is skipped when its binary is absent from $PATH. The first miss logs a single WARN, deduplicated via $STATE_DIR/.warned.<backend>; later misses are silent. A lenient JSON extractor handles raw bodies, fenced ```json blocks, and prose-wrapped objects.
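The lenient extraction step can be approximated in pure bash by keeping the span from the first `{` to the last `}`. This is a minimal sketch, not the plugin's actual extractor: the function name extract_json_object is hypothetical, and the sketch does not check that the span is well-formed JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the span from the first '{' to the last '}'.
# One pass covers raw bodies, fenced blocks, and prose-wrapped objects.
# It does NOT validate the result; a caller would still have to parse it.
extract_json_object() {
  local text=$1 open='{' close='}'
  local rest body
  # Fail when there is no '{' followed later by a '}'.
  [[ $text == *"$open"*"$close"* ]] || return 1
  rest=${text#*"$open"}     # drop everything through the first '{'
  body=${rest%"$close"*}    # drop everything from the last '}' onward
  printf '%s\n' "${open}${body}${close}"
}
```

A caller would pass the raw backend reply, for example `extract_json_object "$reply"`, and treat a non-zero exit status as a parse failure so the chain can continue with the next backend.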

The autonomous-vs-escalate decision lives in cmd_run.sh, gated on subagent.confidence_threshold (default 0.80). Resolver outputs below the threshold, and any chain_exhausted result, route to the escalation sink; outputs at or above the threshold pass through the decision-cooldown gate before any autonomous action.
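The routing rule can be sketched as a small function. This is an illustration under assumptions, not the code in cmd_run.sh: the function name route_resolver_output, the status argument, and the printed stage labels are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the next stage for a resolver output.
# $1 = confidence (0..1), $2 = resolver status, $3 = threshold (default 0.80).
route_resolver_output() {
  local confidence=$1 status=${2:-ok} threshold=${3:-0.80}
  # An exhausted fallback chain always escalates, whatever the confidence.
  if [[ $status == chain_exhausted ]]; then
    echo escalate
    return 0
  fi
  # bash has no floating-point arithmetic, so awk does the comparison.
  if awk -v c="$confidence" -v t="$threshold" 'BEGIN { exit !(c + 0 >= t + 0) }'; then
    echo decision-cooldown
  else
    echo escalate
  fi
}
```

Note that a confidence exactly equal to the threshold takes the autonomous path, matching the "at or above" wording.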

Escalation

Two backends are wired in Phase 2 (operator-overridable via DR_ORCH_ESCALATION_BACKEND):

  • mock (default) — appends a JSONL event to ~/.local/share/dr-orchestrate/escalation.jsonl with a frozen schema (schema_version, cycle_id, pane_id, prompt_hash, action_suggested, confidence, reason, subagent_model, backend_used, escalation_backend, mock).
  • dev-bot — stub returning exit 99 with WARN until a real consumer service lands.

Security Floor

Every tmux send-keys and every autonomous decision goes through a fixed, fail-closed pipeline:

  • Whitelist — only [a-zA-Z0-9 _-./:=@] permitted. Anything else → block + audit.
  • Escape block — any byte 0x1b rejected.
  • Micro-cooldown — 500 ms gate per send to a given pane.
  • Decision-cooldown — 60 s gate per autonomous decision per pane (now reached via the resolver path).
  • Flock-safe lock (Phase 2 addition) — flock -n per (pane, kind) on Linux; macOS hosts emit a one-time WARN and operate at Phase-1 non-atomic semantics.
  • Violation tracker — 5 violations of any kind in 1 hour → pane blocked for 1 hour.
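The whitelist, escape-block, and cooldown gates can be sketched as below. The function names are hypothetical, and the sketch leaves out auditing, locking, and the violation tracker. The hyphen is moved to the end of the bracket expression so that it is read as a literal character rather than as a range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the key string passes the whitelist
# and contains no ESC byte. Empty input is treated as blocked (fail-closed).
keys_allowed() {
  local keys=$1
  local LC_ALL=C                            # byte-wise character ranges
  local allowed='^[a-zA-Z0-9 _./:=@-]+$'    # '-' last, so it is a literal
  [[ $keys == *$'\x1b'* ]] && return 1      # escape block
  [[ $keys =~ $allowed ]]                   # whitelist
}

# Hypothetical helper: succeed when at least window_ms has passed since the
# last recorded action. All three arguments are integers in milliseconds.
cooldown_elapsed() {
  local last_ms=$1 now_ms=$2 window_ms=$3
  (( now_ms - last_ms >= window_ms ))
}
```

With these helpers, the micro-cooldown corresponds to `cooldown_elapsed "$last" "$now" 500` and the decision-cooldown to `cooldown_elapsed "$last" "$now" 60000`. Where the last timestamp is stored, and how it is protected by the lock, is left open here.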

Audit (schema v2)

Phase 2 introduces make_event_v2 alongside the Phase 1 emitter. Schema v2 lines extend the v1 6-field event with resolver/escalation metadata:

{
  "schema_version": 2,
  "timestamp": "2026-05-11T08:00:00Z",
  "matched_text_hash": "af86d…",
  "command": "/dr-do",
  "exit_code": 0,
  "duration_ms": 1421,
  "pane_id": "datarim:0.0",
  "confidence": 0.87,
  "subagent_model": "deepseek-chat",
  "backend_used": "coworker-deepseek",
  "escalation_backend": "",
  "stage": "resolve",
  "outcome": "resolved",
  "reason": "explicit slash-command"
}

The hash-only-credentials invariant is preserved — raw pane text never enters the log. The reason field is truncated to 500 characters and grep-redacted for password=, token=, secret=, credential=, and api_key= patterns before emission.
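The reason-field handling can be sketched as a redact-then-truncate filter. The function name sanitize_reason, the [REDACTED] marker, and the choice to redact before truncating are assumptions made for illustration; the source only states that both steps happen before emission.

```shell
#!/usr/bin/env bash
# Hypothetical helper: redact credential-like key=value pairs, then cap the
# result at 500 characters. Step order and marker text are assumptions.
sanitize_reason() {
  local redacted
  redacted=$(printf '%s\n' "$1" |
    sed -E 's/(password|token|secret|credential|api_key)=[^[:space:]]*/\1=[REDACTED]/g')
  # Substring expansion keeps at most the first 500 characters.
  printf '%s\n' "${redacted:0:500}"
}
```

Redacting first means a secret that straddles the 500-character boundary is replaced as a whole before the cut, instead of being cut in the middle.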

Configuration

# user-config.yaml
subagent:
  fallback_chain: ["coworker-deepseek", "claude", "codex"]
  timeout_s: 15
  confidence_threshold: 0.80

escalation:
  backend: "mock"
  mock_log: ~/.local/share/dr-orchestrate/escalation.jsonl

Autonomy Levels

  • Phase 1 — L1 (manual; rule-based confidence; no learning).
  • Phase 2 — L2 (assisted; multi-backend subagent inference + race-safe cooldown + audit v2).
  • Phase 3 — L4 (planned: auto-learning rules with 24-hour re-validation).

Soak Verdict

The plugin ships with dev-tools/measure-orchestrator-soak.sh, a verdict gate that computes false_escalate_rate = escalated / (resolved + escalated) from schema-v2 audit events. By default the gate requires a rate below 0.15 over the last 48 h.
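The rate calculation can be illustrated with grep and awk over a JSONL audit file. This sketch is not the shipped script: the function name soak_verdict, the "escalated" outcome value, and the output format are assumptions, and the 48 h window filter is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute false_escalate_rate from a JSONL audit file
# and print "<rate> <verdict>". The 48 h window filter is omitted here.
soak_verdict() {
  local file=$1 threshold=${2:-0.15} resolved escalated
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
  resolved=$(grep -c '"outcome": *"resolved"' "$file" || true)
  escalated=$(grep -c '"outcome": *"escalated"' "$file" || true)
  awk -v r="$resolved" -v e="$escalated" -v t="$threshold" 'BEGIN {
    total = r + e
    if (total == 0) { print "n/a no-data"; exit }   # avoid division by zero
    rate = e / total
    printf "%.4f %s\n", rate, (rate < t ? "pass" : "fail")
  }'
}
```

For example, a log with 9 resolved events and 1 escalated event gives 1 / (9 + 1) = 0.1, which is below the default 0.15.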

Bot-Interaction Interface (v2.5.0+)

From Datarim v2.5.0 / plugin v0.3.0 the orchestrator gains a programmatic IO surface in addition to the tmux pane — a framework-owned wire contract that lets a bot (or any HTTP client) submit prompts and receive escalation/progress events.

  • OpenAPI 3.1 spec: plugins/dr-orchestrate/openapi/orchestrator-interface.yaml. Single inbound endpoint POST /orchestrator/input, Bearer auth, JSON body (session_id, command, ts, optional meta). Default response 202 Accepted; sync shortcut 200 + inline body only for whitelist commands (dr-status, dr-help) when the client sends X-Sync-Timeout (hard-capped ≤ 2000 ms).
  • Reference impl: adnanh/webhook v2.8.3 (Go single binary, MIT, no runtime deps) + config/hooks.yaml + scripts/orchestrator-input-handler.sh. The handler validates the Bearer header against the Vault-sourced secret, schema-checks the body, and atomically writes a ULID-named JSON file into ~/.local/share/datarim-orchestrate/inbox/. The main loop (cmd_run.sh) drains the inbox oldest-first per cycle and injects .command as UNKNOWN_TEXT, falling through to the existing semantic-parser → resolver pipeline.
  • Outbound emitter: _emit_devbot in escalation_backend.sh replaces the v0.2.x stub. Two backends, switched via DR_ORCH_OUTBOUND_BACKEND: callback (default; HMAC-SHA256 signature plus X-Timestamp with a 300 s replay window, curl POST to DR_ORCH_ESCALATION_DEVBOT_URL) and redis (opt-in; redis-cli PUBLISH orchestrator-out:{session_id} via DR_ORCH_OUTBOUND_REDIS_URL).
  • Activation gate: when DR_ORCH_ESCALATION_DEVBOT_URL is unset, _emit_devbot silently returns 0 (no-op). To roll back, unset the env var; no code rollback is needed.
  • Network exposure: Tier 1 only. Reference impl binds 127.0.0.1:8090 (loopback, single-tenant). Redis backend consumes an existing Tailscale-only listener (no new Tier 3 surface).
  • Contract tests: tests/contract/run-schemathesis.sh + CI workflow .github/workflows/dr-orchestrate-contract.yml start the reference impl on an ephemeral port and run schemathesis property-based fuzz against the OpenAPI spec. dev-tools/check-agent0017-live.sh is the manual pre-activation gate (curl /healthz + smoke POST /prompts) operators run before setting the production env.
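The inbox hand-off between the input handler and the main loop can be sketched as an atomic write plus a sorted drain. Both function names are hypothetical, and the sketch skips Bearer validation, schema checks, ULID generation, and .command extraction. It relies on two facts: a rename within one directory is atomic, and ULIDs sort lexicographically by creation time, so a sorted glob yields oldest-first order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one message into the inbox atomically.
# $1 = inbox dir, $2 = ULID (supplied by the caller), $3 = JSON body.
inbox_put() {
  local inbox=$1 name=$2 body=$3 tmp
  mkdir -p "$inbox" || return 1
  # Write to a hidden temp file in the same directory, then rename it, so
  # the drain loop never sees a half-written *.json file.
  tmp=$(mktemp "$inbox/.tmp.XXXXXX") || return 1
  printf '%s\n' "$body" > "$tmp" && mv -f -- "$tmp" "$inbox/$name.json"
}

# Hypothetical helper: print and remove every queued message, oldest first.
inbox_drain() {
  local inbox=$1 file
  local LC_ALL=C                    # plain byte order for the glob sort
  for file in "$inbox"/*.json; do
    [[ -e $file ]] || continue      # the glob matched nothing
    cat -- "$file"
    rm -f -- "$file"
  done
}
```

In a real cycle, each drained body would be parsed for .command and handed to the resolver pipeline instead of being printed.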

Out of Scope (Phase 2)

Telegram bridge UI, auto-learned rules write path (Phase 3), real dev-bot HTTP endpoint, Vault secrets_backend.sh rewrite, embedding/vector classification, multi-host SSH aggregation, Docker/Kubernetes orchestration, native Windows.