Datarim Doctor
Schema spec for thin-index operational files: canonical regex, YAML frontmatter contract, 6-pass migration semantics (incl. archive-section enforcement), data-loss safety contract. Loaded on demand by /dr-doctor and /dr-init self-heal.
Overview
Datarim Doctor is the runtime knowledge module that defines the thin-index contract. Operational files (tasks.md, backlog.md, activeContext.md) carry one-line pointers, one per task. Task descriptions live in per-task files at datarim/tasks/{TASK-ID}-task-description.md, each with a closed (fixed-key) YAML frontmatter. progress.md and the rolling log under activeContext.md § Последние завершённые ("Recently completed") are abolished. Completion history lives only in documentation/archive/ and git log.
Why Thin Indexes
Operational files are indexes, not content. Each line answers four questions: which task, what state, what priority and complexity, and where the description lives. No prose, requirements, or plan content lives in tasks.md / backlog.md.
- Bounded context: agents read a 1 KB index instead of a 100 KB monolith.
- Single source of truth per task: description, ACs, and constraints live in one file.
- Greppable state: the line format is machine-parseable, so a status change is a 1-line diff.
- Idempotent migrations: `/dr-doctor` can run any number of times without drift.
Canonical Line Regex
```
^- ([A-Z]{2,10}-[0-9]{4}) · (STATUS) · P[0-3] · L[1-4] · (.{1,80}) → tasks/\1-task-description\.md$
```
Status sets (substituted for `(STATUS)`):
- `tasks.md`: `in_progress` | `blocked` | `not_started`
- `backlog.md`: `pending` | `blocked-pending` | `cancelled`
Separator: · (U+00B7 MIDDLE DOT). Arrow: → (U+2192). Title: 1–80 chars, single-line, no →.
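The following is a minimal Python sketch of a per-file line check. It assumes the regex is applied with `(STATUS)` replaced by that file's status set. `STATUS_SETS`, `line_regex` and `is_canonical` are hypothetical names, not the doctor's own (shell) code. The regex's `.{1,80}` would also accept a title containing →, so the sketch checks the "no →" rule separately.

```python
import re

# Per-file status sets substituted for the (STATUS) placeholder.
STATUS_SETS = {
    "tasks.md": ("in_progress", "blocked", "not_started"),
    "backlog.md": ("pending", "blocked-pending", "cancelled"),
}


def line_regex(filename: str) -> "re.Pattern[str]":
    """Build the canonical one-liner regex with (STATUS) expanded for one file."""
    statuses = "|".join(re.escape(s) for s in STATUS_SETS[filename])
    return re.compile(
        r"^- ([A-Z]{2,10}-[0-9]{4}) · (" + statuses + r") · P[0-3] · L[1-4] · "
        r"(.{1,80}) → tasks/\1-task-description\.md$"
    )


def is_canonical(line: str, filename: str) -> bool:
    """True if `line` is a canonical index entry for `filename`."""
    m = line_regex(filename).match(line)
    # `.` also matches "→", so the "no → in title" rule needs an explicit check.
    return m is not None and "→" not in m.group(3)
```

The `\1` backreference ties the pointer to the line's own ID. A line whose pointer names a different task is therefore rejected even when every field is well formed.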
Description File Contract
Every task has a description file at datarim/tasks/{TASK-ID}-task-description.md with closed 12-key YAML frontmatter:
```yaml
---
id: <TASK-ID> # ^[A-Z]{2,10}-[0-9]{4}$
title: <string> # ≤ 80 chars
status: <enum>
priority: <enum> # P0|P1|P2|P3
complexity: <enum> # L1|L2|L3|L4
type: <string> # framework|infra|content|...
project: <string> # Datarim|Arcanada|...
started: <date> # YYYY-MM-DD
parent: <TASK-ID|null>
related: <list[TASK-ID]>
prd: <relpath|null>
plan: <relpath|null>
---
```
Body: 5 canonical sections (Overview / Acceptance Criteria / Constraints / Out of Scope / Related), capped at 250 lines. `## Implementation Notes` and `## Decisions` are optional extra sections.
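The closed key set and the body limits can be checked with a shallow scan. This sketch does not use a YAML parser. It assumes top-level `key: value` lines and canonical sections written as `## ` headings, so nested values and multi-line scalars are not validated. `check_description` is a hypothetical helper, not part of the doctor.

```python
import re

# The closed 12-key set from the frontmatter template above.
REQUIRED_KEYS = {"id", "title", "status", "priority", "complexity", "type",
                 "project", "started", "parent", "related", "prd", "plan"}
CANONICAL_SECTIONS = ("Overview", "Acceptance Criteria", "Constraints",
                      "Out of Scope", "Related")
MAX_BODY_LINES = 250


def check_description(text: str) -> list:
    """Return a list of contract violations; an empty list means compliant."""
    lines = text.splitlines()
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return ["missing --- frontmatter delimiters"]
    end = lines.index("---", 1)
    front, body = lines[1:end], lines[end + 1:]

    # Shallow key scan: top-level "key: value" lines only (block-list items are skipped).
    fields = {}
    for raw in front:
        m = re.match(r"^([a-z]+):\s*(.*)$", raw)
        if m:
            fields[m.group(1)] = m.group(2).strip().strip("\"'")

    problems = []
    if set(fields) != REQUIRED_KEYS:  # closed set: nothing missing, nothing extra
        problems.append("key set differs: " + ", ".join(sorted(set(fields) ^ REQUIRED_KEYS)))
    if not re.fullmatch(r"[A-Z]{2,10}-[0-9]{4}", fields.get("id", "")):
        problems.append("id does not match ^[A-Z]{2,10}-[0-9]{4}$")
    if len(fields.get("title", "")) > 80:
        problems.append("title longer than 80 chars")
    if fields.get("priority") not in {"P0", "P1", "P2", "P3"}:
        problems.append("priority not in P0..P3")
    if fields.get("complexity") not in {"L1", "L2", "L3", "L4"}:
        problems.append("complexity not in L1..L4")
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", fields.get("started", "")):
        problems.append("started is not YYYY-MM-DD")

    headings = {line[3:].strip() for line in body if line.startswith("## ")}
    missing = [s for s in CANONICAL_SECTIONS if s not in headings]
    if missing:
        problems.append("missing sections: " + ", ".join(missing))
    if len(body) > MAX_BODY_LINES:
        problems.append(f"body has {len(body)} lines (cap {MAX_BODY_LINES})")
    return problems
```

The checker returns problems as a list rather than failing on the first one. A caller can then report every violation in a single run.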
6-Pass Migration Algorithm
- Pass 1: description files (build cache). Walk legacy `### TASK-ID:` headings; extract status/priority/complexity/type/started/parent/related/prd/plan; write a per-task file with frontmatter.
- Pass 2: operational files. Rewrite `tasks.md` and `backlog.md` as one-liner indexes grouped by section.
- Pass 3: activeContext.md. Convert the legacy `**Current Task:**` shape into a `## Active Tasks` list (an Active-Tasks-only mirror, ≤30 lines).
- Pass 4: backlog-archive migration. An AWK section-state machine with per-ID dispatch splits the legacy `backlog-archive.md` into two destinations. Cancelled entries go to `documentation/archive/cancelled/` (synthesised stubs). Completed entries go to area-specific `archive-{TASK-ID}.md` files (verify-or-synthesise into `general/`). The `--no-prompt` flag is for CI.
- Pass 5: post-fix re-scan. Reuses the existing scan dispatch in dry-run mode after `--fix` and asserts three things: zero findings post-fix, the `.pre-v2.bak` sidecar preserved, and an idempotent rerun (a second `--fix` is a no-op).
- Pass 6: operational-files archive section migration (TUNE-0085 v1.21.5, hardened in TUNE-0088 v1.21.6). Strips `## Archived` from `tasks.md`/`backlog.md`. From `activeContext.md` it strips `### Archived`, `### Recently Archived` and `## Последние завершённые` ("Recently completed"). These sections violate the canonical thin-index contract («one section only», v1.19.1).
  - Bullet shapes: four are auto-detected (S1 arrow-link, S2 status-paren, S4 mid-bold-context, S3 plain-bold). Compound task IDs are supported (e.g. `DEV-1212-S8`, `DEV-1196-FOLLOWUP-lock-ownership-doc`).
  - Path resolution: an explicit `→ documentation/archive/{path}.md` pointer in the bullet body wins over the hardcoded `prefix_to_area` mapping.
  - Per bullet: if the canonical archive exists at the resolved path, strip the bullet. If it is missing, run a defensive `find` across area subdirs (depth ≤ 3). If a match carrying the ID literal turns up, strip with a warning; otherwise synthesise a stub. On collision, respect `--conflict-policy`.
  - Headerless fallback: operational files without an archive header are processed line by line. Bullets with an explicit non-terminal status (`in_progress`, `not_started`, `blocked`, …) pass through as active content.
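Pass 6's per-bullet decision can be sketched as a pure function returning an action and a path. Several parts are assumptions and not taken from the doctor: the `PREFIX_TO_AREA` contents, the `general` fallback, reuse of Pass 4's `archive-{TASK-ID}.md` naming, and matching the ID literal against file contents. `--conflict-policy` handling and the headerless non-terminal-status passthrough are left out.

```python
import os
import re
from typing import Tuple

# HYPOTHETICAL area table; the doctor's real prefix_to_area mapping is not reproduced here.
PREFIX_TO_AREA = {"TUNE": "framework", "DEV": "development"}
DEFAULT_AREA = "general"  # assumption: unmapped prefixes fall back to general/

# Plain and compound task IDs (e.g. DEV-1212-S8, DEV-1196-FOLLOWUP-lock-ownership-doc).
TASK_ID = re.compile(r"\b([A-Z]{2,10}-[0-9]{4}(?:-[A-Za-z0-9]+)*)")
# An explicit "→ documentation/archive/{path}.md" pointer inside the bullet body.
EXPLICIT_POINTER = re.compile(r"→ (documentation/archive/\S+\.md)")


def resolve_archive_path(bullet: str, task_id: str) -> str:
    """The explicit pointer wins over the prefix mapping."""
    m = EXPLICIT_POINTER.search(bullet)
    if m:
        return m.group(1)
    area = PREFIX_TO_AREA.get(task_id.split("-", 1)[0], DEFAULT_AREA)
    return f"documentation/archive/{area}/archive-{task_id}.md"


def mentions_id(path: str, task_id: str) -> bool:
    # The negative lookahead keeps DEV-1212 from matching inside DEV-1212-S8.
    pattern = re.escape(task_id) + r"(?![A-Za-z0-9-])"
    with open(path, encoding="utf-8", errors="replace") as fh:
        return re.search(pattern, fh.read()) is not None


def plan_bullet(bullet: str, root: str, max_depth: int = 3) -> Tuple[str, str]:
    """Decide what happens to one archived bullet: (action, path relative to root)."""
    m = TASK_ID.search(bullet)
    if not m:
        return "keep", ""
    task_id = m.group(1)
    path = resolve_archive_path(bullet, task_id)
    if os.path.isfile(os.path.join(root, path)):
        return "strip", path

    # Defensive search across area subdirectories, files at depth <= max_depth.
    archive_root = os.path.join(root, "documentation", "archive")
    for dirpath, dirnames, filenames in os.walk(archive_root):
        rel = os.path.relpath(dirpath, archive_root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth - 1:
            dirnames[:] = []  # files any deeper would exceed max_depth
        for name in sorted(filenames):
            candidate = os.path.join(dirpath, name)
            if name.endswith(".md") and mentions_id(candidate, task_id):
                return "strip-with-warning", os.path.relpath(candidate, root)

    # Nothing found: a stub would be synthesised at the resolved path.
    return "synthesise", path
```

Keeping the decision free of writes makes it easy to dry-run. The caller performs the strip or stub synthesis only after the whole plan is known.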
Idempotency guard: the doctor exits 0 immediately when three conditions hold: no `### TASK-ID:` headings exist, there is no legacy `backlog-archive.md`, and every bullet line matches the canonical regex. This makes it a cheap probe for `/dr-init` self-heal.
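The guard amounts to a read-only probe. In this minimal sketch, `backlog-archive.md` is assumed to sit in the same directory as `tasks.md` and `backlog.md`. A generic status token replaces the per-file status sets. `needs_migration` and `probe_exit_code` are hypothetical names. Exit code 1 is the value the `/dr-init` self-heal step reacts to.

```python
import os
import re

# Canonical one-liner with a generic status token; per-file status sets are checked elsewhere.
CANONICAL = re.compile(
    r"^- ([A-Z]{2,10}-[0-9]{4}) · ([a-z_-]+) · P[0-3] · L[1-4] · "
    r"(.{1,80}) → tasks/\1-task-description\.md$"
)
LEGACY_HEADING = re.compile(r"^### [A-Z]{2,10}-[0-9]{4}:")


def needs_migration(datarim_dir: str) -> bool:
    """Any legacy trace means the migration passes must run."""
    # Assumption: the legacy backlog-archive.md sits next to tasks.md/backlog.md.
    if os.path.exists(os.path.join(datarim_dir, "backlog-archive.md")):
        return True
    for name in ("tasks.md", "backlog.md"):
        path = os.path.join(datarim_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as fh:
            for line in fh.read().splitlines():
                if LEGACY_HEADING.match(line):
                    return True
                if line.startswith("- ") and not CANONICAL.match(line):
                    return True
    return False


def probe_exit_code(datarim_dir: str) -> int:
    # 0 = compliant (guard short-circuits), 1 = non-compliant (/dr-init offers --fix).
    return 1 if needs_migration(datarim_dir) else 0
```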
Data-Loss Safety Contract
Defence-in-depth around --fix mode (TUNE-0077):
- Pre-write tarball backup: before any mutation, a tarball is written under `umask 077` to `${DATARIM_DOCTOR_BACKUP_DIR:-/tmp}/datarim-backup-{TS}.tgz`. The path surfaces in the success summary.
- Sidecar copy: every legacy file mutated by Pass 4 also gets an in-tree `.pre-v2.bak` sidecar (operator-visible).
- Invariant: `EMITTED_COUNT >= PARSED_COUNT`. The doctor counts task entries before and after the rewrite. A violation triggers `restore_backup_and_die()`, which removes the mutated state, extracts the tarball back into place with `tar -xzf`, and exits 2.
- Symlink-default uniformity: under the `install.sh` default mode, `~/.claude/scripts/datarim-doctor.sh` resolves through a directory symlink into the canonical Datarim repo. Divergence is impossible by construction; rogue v2 binaries cannot be silently dropped on top.
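The following Python code mirrors the tarball-plus-invariant safety net, for illustration only. `make_backup` and `enforce_invariant` are hypothetical names, and the `{TS}` format is assumed. `restore_backup_and_die` here only imitates the shell function of the same name.

```python
import os
import sys
import tarfile
import time
from typing import List, Optional


def make_backup(root: str, files: List[str], backup_dir: Optional[str] = None) -> str:
    """Write a private .tgz of `files` (paths relative to `root`) before any mutation."""
    # ${DATARIM_DOCTOR_BACKUP_DIR:-/tmp}: unset or empty falls back to /tmp.
    backup_dir = backup_dir or os.environ.get("DATARIM_DOCTOR_BACKUP_DIR") or "/tmp"
    ts = time.strftime("%Y%m%dT%H%M%S")  # assumed {TS} format
    path = os.path.join(backup_dir, f"datarim-backup-{ts}.tgz")
    old_umask = os.umask(0o077)  # the new tarball gets mode 0600
    try:
        with tarfile.open(path, "w:gz") as tar:
            for rel in files:
                tar.add(os.path.join(root, rel), arcname=rel)
    finally:
        os.umask(old_umask)
    return path


def restore_backup_and_die(root: str, files: List[str], backup_path: str) -> None:
    """Remove mutated state, unpack the tarball over `root`, exit 2."""
    for rel in files:
        target = os.path.join(root, rel)
        if os.path.isfile(target):
            os.remove(target)
    with tarfile.open(backup_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # Python 3.12+ and security backports
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    sys.exit(2)


def enforce_invariant(parsed_count: int, emitted_count: int,
                      root: str, files: List[str], backup_path: str) -> None:
    """EMITTED_COUNT >= PARSED_COUNT, or roll everything back."""
    if emitted_count < parsed_count:
        restore_backup_and_die(root, files, backup_path)
```

The umask is set around tarball creation and restored afterwards. Only the backup is private to the owner, not every file the process writes later.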
Self-Heal Entry Points
- `/dr-doctor` (always)
- `/dr-init` self-heal: probes `datarim-doctor.sh --quiet` and offers `/dr-doctor --fix` on non-compliance.
- `/dr-archive` pre-archive gate: `pre-archive-check.sh` validates line format; bypass with `--no-schema-check` only during an in-flight migration.
Edge Cases
- Bash 3.2 (macOS default): the tool uses a two-pass grep+awk parser, not NUL-delimited reads.
- Title containing →: escaped or rejected (the title rule disallows →); the operator must rename the task.
- Concurrent invocation: `flock` on `$DATARIM_ROOT/.dr-doctor.lock`; a second instance exits 3.
- Path traversal: paths are canonicalised lexically via `scripts/lib/canonicalise.sh` (no I/O); on a traversal attempt the tool exits 4.
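The lock and traversal guards map onto standard POSIX primitives. This sketch assumes a POSIX system, since `fcntl` is unavailable on Windows. `acquire_lock` and `canonicalise_under` are hypothetical and only approximate the doctor and `scripts/lib/canonicalise.sh`. `posixpath.normpath` resolves `..` purely lexically and does not follow symlinks. That matches the "no I/O" property, but a symlink inside the root pointing outside it is not caught.

```python
import fcntl
import os
import posixpath
import sys
from typing import IO

EXIT_LOCKED = 3     # a second concurrent instance
EXIT_TRAVERSAL = 4  # the path escapes the Datarim root


def acquire_lock(datarim_root: str) -> IO[str]:
    """Take an exclusive, non-blocking flock on $DATARIM_ROOT/.dr-doctor.lock."""
    fh = open(os.path.join(datarim_root, ".dr-doctor.lock"), "a")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        sys.exit(EXIT_LOCKED)
    return fh  # keep the handle open; closing it releases the lock


def canonicalise_under(root: str, user_path: str) -> str:
    """Lexically resolve `user_path` against `root` without touching the filesystem."""
    base = posixpath.normpath(root)
    candidate = posixpath.normpath(posixpath.join(base, user_path))
    prefix = base if base.endswith("/") else base + "/"
    # The trailing "/" stops "/srv/datarim-evil" from passing as inside "/srv/datarim".
    if candidate != base and not candidate.startswith(prefix):
        sys.exit(EXIT_TRAVERSAL)
    return candidate
```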
Loaded By
- `/dr-doctor` (always)
- `/dr-init` self-heal step (when the probe returns exit 1)
- `/dr-archive` line-format gate (on failure, to explain non-compliance)