/dr-orchestrate
Tmux-based self-driving Datarim pipeline runner — Phase 2 (Subagent Inference, autonomy L2)
Overview
/dr-orchestrate is the framework's reference non-core plugin. Phase 1 (v2.3.0) shipped a lean rule-based tmux runner. Phase 2 (v2.4.0) adds a subagent inference layer that activates when the rule-based parser cannot classify a pane line, plus a flock-race-safe cooldown and audit schema v2. Plugin autonomy is now L2 (assisted): a human still owns escalation, but unknown prompts no longer dead-end at the parser.
The plugin lives at plugins/dr-orchestrate/ in the framework repo and is enabled via the standard plugin CLI (/dr-plugin).
Usage
# Enable the plugin once
/dr-plugin enable dr-orchestrate
# Optional: opt in to send-keys (default fail-closed)
cp ~/.claude/plugins/dr-orchestrate/user-config.template.yaml \
~/.claude/plugins/dr-orchestrate/user-config.yaml
chmod 600 ~/.claude/plugins/dr-orchestrate/user-config.yaml
$EDITOR ~/.claude/plugins/dr-orchestrate/user-config.yaml # set key_injection: true
# Run a single Phase 2 cycle (parse → resolver → autonomous-or-escalate)
/dr-orchestrate run
# Dry-run for validation
/dr-orchestrate run --dry-run
# Manually resolve a pasted prompt without consuming a tmux pane
/dr-orchestrate run --unknown-prompt "operator paste: > /dr-prd ready for strategy gate"
Subagent Inference (Phase 2)
On parser miss (confidence: 0), cmd_run.sh dispatches to subagent_resolver.sh, which classifies the pane text via a configurable fallback chain of AI CLI backends:
- coworker-deepseek (default primary): coworker ask --provider deepseek --profile code; vendor-neutral OSS CLI.
- claude: claude --print --output-format=json; the wrapper carries {type, result}, the resolver re-parses .result.
- codex: codex exec --output-last-message -; best-effort, chain continues on parse fail.
Each backend has a 15 s wall-clock budget (DR_ORCH_RESOLVER_TIMEOUT_S) and runs with FD 3 closed for bats-harness compatibility. A backend whose binary is absent from $PATH is skipped; the first miss logs a single WARN (deduped via $STATE_DIR/.warned.<backend>) and later misses are silent. A lenient JSON extractor handles raw bodies, fenced ```json blocks, and prose-wrapped objects.
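The chain behaviour described above can be sketched in a few lines of shell. This is a minimal illustration, not the code in subagent_resolver.sh: the function names (backend_binary, available_backends, run_with_budget, extract_json) are hypothetical, the binary names are inferred from the commands listed above, and falling back to an unbounded run when timeout(1) is missing is an assumption of the sketch.

```shell
# Hypothetical sketch; function names are illustrative, not the plugin's API.

# Map a chain entry to the binary it needs on $PATH.
backend_binary() {
  case "$1" in
    coworker-deepseek) echo coworker ;;
    claude)            echo claude ;;
    codex)             echo codex ;;
    *)                 return 1 ;;
  esac
}

# Print the chain entries whose binary is installed, preserving chain order.
available_backends() {
  local backend bin
  for backend in "$@"; do
    bin="$(backend_binary "$backend")" || continue
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$backend"
    fi
  done
}

# Run a command under a wall-clock budget (seconds) with FD 3 closed.
run_with_budget() {
  local budget="$1"
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$budget" "$@" 3>&-
  else
    "$@" 3>&-   # assumption: no timeout(1) available, so the run is unbounded
  fi
}

# Lenient extraction: join lines, then keep the span from the first "{"
# to the last "}". Covers raw bodies, fenced blocks, and prose wrappers.
extract_json() {
  tr '\n' ' ' | sed -n 's/^[^{]*\([{].*[}]\)[^}]*$/\1/p'
}
```

A caller would loop over the output of available_backends, pass each backend's command line to run_with_budget, and pipe the reply through extract_json, moving to the next backend when the result is empty.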
The autonomous-vs-escalate decision lives in cmd_run.sh, gated on subagent.confidence_threshold (default 0.80). Resolver outputs below the threshold (or chain_exhausted) route to the escalation sink; outputs at/above the threshold pass through the decision-cooldown gate before any autonomous action.
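As a rough illustration of that gate, the sketch below routes a resolver result to either the escalation sink or the decision-cooldown gate. The function name route_decision, its argument order, and the output labels are hypothetical; only the 0.80 default and the chain_exhausted case come from the text above.

```shell
# Hypothetical sketch; route_decision is an illustrative name.
# Usage: route_decision <confidence> [state] [threshold]
route_decision() {
  local confidence="$1" state="${2:-ok}" threshold="${3:-0.80}"
  if [ "$state" = "chain_exhausted" ]; then
    echo "escalate"
    return 0
  fi
  # awk performs the floating-point comparison that [ ] cannot.
  awk -v c="$confidence" -v t="$threshold" \
    'BEGIN { if (c + 0 >= t + 0) print "cooldown_gate"; else print "escalate" }'
}
```

Note that a result exactly at the threshold is treated as passing, matching the "at/above" wording.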
Escalation
Two backends are wired in Phase 2 (operator-overridable via DR_ORCH_ESCALATION_BACKEND):
- mock (default): appends a JSONL event to ~/.local/share/dr-orchestrate/escalation.jsonl with a frozen schema (schema_version, cycle_id, pane_id, prompt_hash, action_suggested, confidence, reason, subagent_model, backend_used, escalation_backend, mock).
- dev-bot: stub returning exit 99 with WARN until a real consumer service lands.
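The backend switch could look roughly like the sketch below. It is an illustration under stated assumptions rather than the plugin's implementation: the function name escalate and its arguments are hypothetical, only a subset of the schema fields is written, the boolean value for mock is a guess at the field's type, and exit code 2 for an unknown backend is invented for the example.

```shell
# Hypothetical sketch; the field subset and values are illustrative only.
# Usage: escalate <log_path> <pane_id> <prompt_hash> <confidence>
escalate() {
  local log="$1" pane="$2" hash="$3" confidence="$4"
  case "${DR_ORCH_ESCALATION_BACKEND:-mock}" in
    mock)
      mkdir -p "$(dirname "$log")"
      # One JSON object per line; inputs are assumed to need no JSON escaping.
      printf '{"pane_id":"%s","prompt_hash":"%s","confidence":%s,"escalation_backend":"mock","mock":true}\n' \
        "$pane" "$hash" "$confidence" >> "$log"
      ;;
    dev-bot)
      echo "WARN: dev-bot escalation backend is a stub" >&2
      return 99
      ;;
    *)
      echo "ERROR: unknown escalation backend" >&2
      return 2   # assumption: exit code chosen for this sketch
      ;;
  esac
}
```

Only the prompt hash is written, never the prompt text, in keeping with the hash-only logging described in the audit section.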
Security Floor
Every tmux send-keys and every autonomous decision goes through a fixed, fail-closed pipeline:
- Whitelist: only [a-zA-Z0-9 _-./:=@] permitted. Anything else → block + audit.
- Escape block: any byte 0x1b rejected.
- Micro-cooldown: 500 ms gate per send to a given pane.
- Decision-cooldown: 60 s gate per autonomous decision per pane (now reached via the resolver path).
- Flock-safe lock (Phase 2 addition): flock -n per (pane, kind) on Linux; macOS hosts emit a one-time WARN and operate at Phase-1 non-atomic semantics.
- Violation tracker: 5 violations of any kind in 1 hour → pane blocked for 1 hour.
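Two of these gates are easy to sketch in shell. The example below is a simplified illustration, not the plugin's code: keys_allowed and cooldown_ok are hypothetical names, the .stamp/.lock file layout is invented for the example, and the sketch skips the one-time WARN when flock is unavailable. The hyphen is moved to the end of the character set so that tr reads it as a literal rather than a range.

```shell
# Hypothetical sketch; names and the state-file layout are illustrative.

# Succeed only if every byte of $1 is in the whitelist. ESC (0x1b) is
# outside the set, so escape sequences fail this check as well.
keys_allowed() {
  local leftover
  leftover="$(printf '%s' "$1" | LC_ALL=C tr -d 'a-zA-Z0-9 _./:=@-' | wc -c)"
  [ "$leftover" -eq 0 ]
}

# Succeed (and record the time) only if <window_s> seconds have elapsed
# since the last success for this (pane, kind) pair.
# Usage: cooldown_ok <state_dir> <pane> <kind> <window_s>
cooldown_ok() {
  local dir="$1" window="$4" key
  key="$(printf '%s.%s' "$2" "$3" | LC_ALL=C tr -c 'a-zA-Z0-9._-' '_')"
  (
    if command -v flock >/dev/null 2>&1; then
      flock -n 9 || exit 1   # lock held elsewhere: treat as still cooling down
    fi
    now="$(date +%s)"
    last="$(cat "$dir/$key.stamp" 2>/dev/null || echo 0)"
    [ $((now - ${last:-0})) -ge "$window" ] || exit 1
    echo "$now" > "$dir/$key.stamp"
  ) 9>"$dir/$key.lock"
}
```

Counting leftover bytes with wc -c, rather than capturing them in a variable, avoids a subtle hole: command substitution strips trailing newlines, so a payload ending in a newline would otherwise look clean.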
Audit (schema v2)
Phase 2 introduces make_event_v2 alongside the Phase 1 emitter. Schema v2 lines extend the v1 6-field event with resolver/escalation metadata:
{
  "schema_version": 2,
  "timestamp": "2026-05-11T08:00:00Z",
  "matched_text_hash": "af86d…",
  "command": "/dr-do",
  "exit_code": 0,
  "duration_ms": 1421,
  "pane_id": "datarim:0.0",
  "confidence": 0.87,
  "subagent_model": "deepseek-chat",
  "backend_used": "coworker-deepseek",
  "escalation_backend": "",
  "stage": "resolve",
  "outcome": "resolved",
  "reason": "explicit slash-command"
}
The hash-only-credentials invariant is preserved — raw pane text never enters the log. The reason field is truncated to 500 characters and grep-redacted for password=, token=, secret=, credential=, and api_key= patterns before emission.
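A redaction pass of this kind might be sketched as follows. The function name redact_reason and the [REDACTED] marker are hypothetical, and redacting before truncating is an assumption about ordering made so that the output never exceeds 500 characters. The sketch also assumes a single-line reason, and cut -c counts bytes rather than characters on some platforms.

```shell
# Hypothetical sketch; redact_reason and the marker text are illustrative.
redact_reason() {
  printf '%s\n' "$1" \
    | sed -E 's/(password|token|secret|credential|api_key)=[^[:space:]]*/\1=[REDACTED]/g' \
    | cut -c1-500
}
```

For example, redact_reason 'login with token=abc123 ok' prints login with token=[REDACTED] ok.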
Configuration
# user-config.yaml
subagent:
  fallback_chain: ["coworker-deepseek", "claude", "codex"]
  timeout_s: 15
  confidence_threshold: 0.80
escalation:
  backend: "mock"
  mock_log: ~/.local/share/dr-orchestrate/escalation.jsonl
Autonomy Levels
- Phase 1 — L1 (manual; rule-based confidence; no learning).
- Phase 2 — L2 (assisted; multi-backend subagent inference + race-safe cooldown + audit v2).
- Phase 3 — L4 (planned: auto-learning rules with 24-hour re-validation).
Soak Verdict
The plugin ships with dev-tools/measure-orchestrator-soak.sh, a verdict gate that computes false_escalate_rate = escalated / (resolved + escalated) from schema-v2 audit events. By default the gate passes when the rate is below 0.15 over the last 48 h.
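The formula itself is simple enough to sketch with awk. The example below is not the shipped script: false_escalate_rate and soak_passes are hypothetical names, the input is assumed to be a JSONL file already filtered to the 48 h window, the outcome value "escalated" is inferred from the formula (the sample event above only shows "resolved"), and treating an empty window as a failed verdict is an assumption.

```shell
# Hypothetical sketch; assumes one audit event per line (JSONL).

# Print escalated / (resolved + escalated), or "nan" when there are no events.
false_escalate_rate() {
  awk '
    /"outcome"[[:space:]]*:[[:space:]]*"resolved"/  { r++ }
    /"outcome"[[:space:]]*:[[:space:]]*"escalated"/ { e++ }
    END {
      if (r + e == 0) { print "nan"; exit }
      printf "%.4f\n", e / (r + e)
    }' "$1"
}

# Succeed when the rate is strictly below the threshold (default 0.15).
# Usage: soak_passes <audit_file> [threshold]
soak_passes() {
  local rate
  rate="$(false_escalate_rate "$1")"
  [ "$rate" != "nan" ] || return 1   # assumption: no data means no pass
  awk -v r="$rate" -v t="${2:-0.15}" 'BEGIN { exit !(r + 0 < t + 0) }'
}
```

With nine resolved events and one escalated event the rate is 0.1000, which passes; one more escalation brings it to 2/11 and fails.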
Bot-Interaction Interface (v2.5.0+)
From Datarim v2.5.0 / plugin v0.3.0 the orchestrator gains a programmatic IO surface in addition to the tmux pane — a framework-owned wire contract that lets a bot (or any HTTP client) submit prompts and receive escalation/progress events.
- OpenAPI 3.1 spec: plugins/dr-orchestrate/openapi/orchestrator-interface.yaml. Single inbound endpoint POST /orchestrator/input, Bearer auth, JSON body (session_id, command, ts, optional meta). Default response 202 Accepted; sync shortcut 200 + inline body only for whitelist commands (dr-status, dr-help) when the client sends X-Sync-Timeout (hard-capped ≤ 2000 ms).
- Reference impl: adnanh/webhook v2.8.3 (Go single binary, MIT, no runtime deps) + config/hooks.yaml + scripts/orchestrator-input-handler.sh. The handler validates the Bearer header against the Vault-sourced secret, schema-checks the body, and atomically writes a ULID-named JSON file into ~/.local/share/datarim-orchestrate/inbox/. The main loop (cmd_run.sh) drains the inbox oldest-first per cycle and injects .command as UNKNOWN_TEXT, falling through to the existing semantic-parser → resolver pipeline.
- Outbound emitter: _emit_devbot in escalation_backend.sh replaces the v0.2.x stub. Two backends, switched via DR_ORCH_OUTBOUND_BACKEND: callback (default; HMAC-SHA256 sign + X-Timestamp 300 s replay window, curl POST to DR_ORCH_ESCALATION_DEVBOT_URL) and redis (opt-in; redis-cli PUBLISH orchestrator-out:{session_id} via DR_ORCH_OUTBOUND_REDIS_URL).
- Activation gate: when DR_ORCH_ESCALATION_DEVBOT_URL is unset, _emit_devbot silently returns 0 (no-op). Rollback = unset the env var. No code rollback needed.
- Network exposure: Tier 1 only. Reference impl binds 127.0.0.1:8090 (loopback, single-tenant). Redis backend consumes an existing Tailscale-only listener (no new Tier 3 surface).
- Contract tests: tests/contract/run-schemathesis.sh + CI workflow .github/workflows/dr-orchestrate-contract.yml start the reference impl on an ephemeral port and run schemathesis property-based fuzz against the OpenAPI spec. dev-tools/check-agent0017-live.sh is the manual pre-activation gate (curl /healthz + smoke POST /prompts) operators run before setting the production env.
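To make the callback backend's signing and replay check concrete, here is a minimal sketch using openssl. It is written under loud assumptions: the function names are hypothetical, the signed string layout "<timestamp>.<body>" and epoch-second timestamps are guesses rather than the documented wire format, and a real emitter would avoid passing the secret on the command line where other local processes can see it.

```shell
# Hypothetical sketch; the signed-string layout "<timestamp>.<body>" is an
# assumption, not the plugin's documented wire format.

# Hex HMAC-SHA256 of a message under a shared secret.
# Usage: hmac_sha256_hex <secret> <message>
hmac_sha256_hex() {
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{ print $NF }'
}

# Sign a body together with its timestamp.
# Usage: sign_payload <secret> <timestamp> <body>
sign_payload() {
  hmac_sha256_hex "$1" "$2.$3"
}

# Accept a timestamp only if it lies within the replay window of "now".
# Usage: within_replay_window <ts> <now> [window_s]
within_replay_window() {
  local ts="$1" now="$2" window="${3:-300}" delta
  delta=$((now - ts))
  if [ "$delta" -lt 0 ]; then
    delta=$((0 - delta))
  fi
  [ "$delta" -le "$window" ]
}
```

A receiver would recompute sign_payload over the received timestamp and body, compare it with the signature sent by the emitter, and reject the request if within_replay_window fails for the X-Timestamp value.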
Out of Scope (Phase 2)
Telegram bridge UI, auto-learned rules write path (Phase 3), real dev-bot HTTP endpoint, Vault secrets_backend.sh rewrite, embedding/vector classification, multi-host SSH aggregation, Docker/Kubernetes orchestration, native Windows.