Skill Maintenance

Datarim Doctor

Schema spec for thin-index operational files: canonical regex, YAML frontmatter contract, 6-pass migration semantics (incl. archive-section enforcement), data-loss safety contract. Loaded on demand by /dr-doctor and /dr-init self-heal.

Overview

Datarim Doctor is the runtime knowledge module that defines the thin-index contract. Operational files (tasks.md, backlog.md, activeContext.md) carry one-line pointers, one per task. Task descriptions live in per-task files at datarim/tasks/{TASK-ID}-task-description.md with closed YAML frontmatter. progress.md and the rolling log in activeContext.md § Последние завершённые («Recently completed») are abolished. Completion history lives only in documentation/archive/ and git log.

Why Thin Indexes

Operational files are indexes, not content. Each line answers four questions: which task, what state, what priority and complexity, and where the description lives. No prose, requirements, or plan content belongs in tasks.md / backlog.md.

  • Bounded context — agents read 1 KB index instead of 100 KB monolith.
  • Single source of truth per task — description, ACs, constraints in one file.
  • Greppable state — line format is machine-parseable; status changes are 1-line diffs.
  • Idempotent migrations — /dr-doctor can run any number of times without drift.

Canonical Line Regex

^- ([A-Z]{2,10}-[0-9]{4}) · (STATUS) · P[0-3] · L[1-4] · (.{1,80}) → tasks/\1-task-description\.md$

Status sets:

  • tasks.md: in_progress | blocked | not_started
  • backlog.md: pending | blocked-pending | cancelled

Separator: · (U+00B7 MIDDLE DOT). Arrow: → (U+2192 RIGHTWARDS ARROW). Title: 1–80 chars, single line, no →.
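A minimal Python sketch of the line check follows. It assumes the STATUS placeholder is filled from the per-file status sets above. The title class is narrowed from .{1,80} to [^→\n]{1,80}, so the pattern itself enforces the "no →" rule. That narrowing and the helper names line_regex and is_canonical are illustrative assumptions. The tool's own parser is grep+awk, not this code.

```python
import re

# Status vocabularies per operational file, from the status sets above.
STATUS_SETS = {
    "tasks.md": ("in_progress", "blocked", "not_started"),
    "backlog.md": ("pending", "blocked-pending", "cancelled"),
}


def line_regex(filename: str) -> re.Pattern:
    """Canonical one-liner regex with STATUS filled in for one operational file."""
    statuses = "|".join(re.escape(s) for s in STATUS_SETS[filename])
    return re.compile(
        r"^- ([A-Z]{2,10}-[0-9]{4})"          # group 1: task ID
        r" · (" + statuses + r")"             # group 2: status from this file's set
        r" · P[0-3] · L[1-4]"                 # priority, complexity
        r" · ([^→\n]{1,80})"                  # group 3: title, no arrow allowed
        r" → tasks/\1-task-description\.md$"  # pointer must repeat the same ID
    )


def is_canonical(line: str, filename: str) -> bool:
    """True if the line is a valid one-liner for the given file."""
    return line_regex(filename).match(line) is not None
```

The backreference \1 keeps the pointer honest. A line whose path names a different ID than its first field fails, even if every other field is well formed.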

Description File Contract

Every task has a description file at datarim/tasks/{TASK-ID}-task-description.md with closed 12-key YAML frontmatter:

---
id: <TASK-ID>                 # ^[A-Z]{2,10}-[0-9]{4}$
title: <string>               # ≤ 80 chars
status: <enum>
priority: <enum>              # P0|P1|P2|P3
complexity: <enum>            # L1|L2|L3|L4
type: <string>                # framework|infra|content|...
project: <string>             # Datarim|Arcanada|...
started: <date>               # YYYY-MM-DD
parent: <TASK-ID|null>
related: <list[TASK-ID]>
prd: <relpath|null>
plan: <relpath|null>
---

Body: five canonical sections (Overview / Acceptance Criteria / Constraints / Out of Scope / Related), capped at 250 lines. Optional sections: ## Implementation Notes and ## Decisions.
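One way to check a description file against this contract is sketched below in Python. The sketch is hypothetical. It reads only unindented top-level key: lines, with no YAML library. It assumes the five canonical sections are ## headings, like the optional ones. It also checks the id pattern, the title length, and the 250-line body cap. check_description and its messages are illustrative names.

```python
import re

FRONTMATTER_KEYS = (
    "id", "title", "status", "priority", "complexity", "type",
    "project", "started", "parent", "related", "prd", "plan",
)
# Assumed to be level-2 headings, like the optional sections.
REQUIRED_SECTIONS = ("Overview", "Acceptance Criteria", "Constraints",
                     "Out of Scope", "Related")
MAX_BODY_LINES = 250


def check_description(text: str) -> list:
    """Return contract violations for one description file; [] means compliant."""
    lines = text.splitlines()
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return ["missing frontmatter delimiters"]
    end = lines.index("---", 1)
    keys, fields = [], {}
    for ln in lines[1:end]:
        m = re.match(r"([a-z_]+):\s*(.*)$", ln)  # unindented top-level key only
        if m:
            keys.append(m.group(1))
            fields[m.group(1)] = m.group(2)
    problems = []
    if sorted(keys) != sorted(FRONTMATTER_KEYS):  # also catches duplicates
        problems.append("frontmatter is not the closed 12-key set")
    if not re.fullmatch(r"[A-Z]{2,10}-[0-9]{4}", fields.get("id", "")):
        problems.append("id does not match ^[A-Z]{2,10}-[0-9]{4}$")
    if len(fields.get("title", "")) > 80:
        problems.append("title longer than 80 chars")
    body = lines[end + 1:]
    if len(body) > MAX_BODY_LINES:
        problems.append(f"body has {len(body)} lines (cap {MAX_BODY_LINES})")
    headings = {ln[3:].strip() for ln in body if ln.startswith("## ")}
    problems += [f"missing section: {s}" for s in REQUIRED_SECTIONS
                 if s not in headings]
    return problems
```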

6-Pass Migration Algorithm

  1. Pass 1 — Description files (build cache): walk legacy ### TASK-ID: headings; extract status/priority/complexity/type/started/parent/related/prd/plan; write per-task file with frontmatter.
  2. Pass 2 — Operational files: rewrite tasks.md and backlog.md as one-liner indexes grouped by section.
  3. Pass 3 — activeContext.md: convert legacy **Current Task:** shape into ## Active Tasks list (Active-Tasks-only mirror, ≤30 lines).
  4. Pass 4 — backlog-archive migration: AWK section-state machine + per-ID dispatch splits legacy backlog-archive.md into documentation/archive/cancelled/ (synthesised stubs) and area-specific archive-{TASK-ID}.md for completed entries (verify-or-synthesise into general/); --no-prompt flag for CI.
  5. Pass 5 — post-fix re-scan: after --fix, this pass re-runs the existing scan dispatch in dry-run mode. It asserts three things: zero findings remain, the .pre-v2.bak sidecars are preserved, and a rerun is idempotent (a second --fix is a no-op).
  6. Pass 6 — operational-files archive section migration (TUNE-0085 v1.21.5, hardened in TUNE-0088 v1.21.6): strips ## Archived from tasks.md/backlog.md. It also strips ### Archived, ### Recently Archived and ## Последние завершённые from activeContext.md. These sections violate the canonical thin-index contract («one section only», v1.19.1). Four bullet shapes are auto-detected (S1 arrow-link, S2 status-paren, S4 mid-bold-context, S3 plain-bold). Compound task IDs are supported (e.g. DEV-1212-S8, DEV-1196-FOLLOWUP-lock-ownership-doc). An explicit → documentation/archive/{path}.md pointer in the bullet body wins over the hardcoded prefix_to_area mapping. Each bullet is handled as follows. If the canonical archive exists at the resolved path, the bullet is stripped. If it is missing, a defensive find runs across area subdirectories (depth ≤ 3). A match containing the ID literal means strip-with-warning; otherwise a stub is synthesised. On a collision, --conflict-policy is respected. Headerless fallback: operational files without an archive header are processed line by line. Bullets with an explicit non-terminal status (in_progress, not_started, blocked, …) pass through as active content.
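The pointer precedence in Pass 6 can be sketched as follows. Several details are assumptions for illustration only: the PREFIX_TO_AREA entries, the general/ fallback for unknown prefixes, and the documentation/archive/{area}/archive-{TASK-ID}.md layout. The sketch does not reproduce the real prefix_to_area mapping, the bullet-shape detection, the defensive find, or stub synthesis.

```python
import re

# Hypothetical stand-in for the tool's hardcoded prefix_to_area mapping.
PREFIX_TO_AREA = {"TUNE": "framework", "DEV": "development"}

# Base ID plus optional compound suffixes such as -S8 or -FOLLOWUP-lock-ownership-doc.
TASK_ID = re.compile(r"\b([A-Z]{2,10}-[0-9]{4}(?:-[A-Za-z0-9]+)*)")
# An explicit pointer written in the bullet body.
EXPLICIT_POINTER = re.compile(r"→\s*(documentation/archive/[^\s)]+\.md)")


def resolve_archive_path(bullet: str) -> tuple:
    """Return (task_id, archive_path) for one bullet from an archive section."""
    id_match = TASK_ID.search(bullet)
    if id_match is None:
        raise ValueError(f"no task ID in bullet: {bullet!r}")
    task_id = id_match.group(1)
    pointer = EXPLICIT_POINTER.search(bullet)
    if pointer:
        return task_id, pointer.group(1)  # explicit pointer wins over the table
    # Assumed fallback: unknown prefixes go to general/.
    area = PREFIX_TO_AREA.get(task_id.split("-", 1)[0], "general")
    return task_id, f"documentation/archive/{area}/archive-{task_id}.md"
```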

Idempotency guard: the tool exits 0 immediately when three conditions hold. No ### TASK-ID: headings exist, no legacy backlog-archive.md is present, and every bullet line matches the canonical regex. This makes the guard a cheap probe for /dr-init self-heal.
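A Python sketch of the idempotency probe follows, under three assumptions. First, root is the datarim/ directory holding tasks.md, backlog.md and any legacy backlog-archive.md. Second, only tasks.md and backlog.md are scanned. Third, any lowercase status word stands in for the per-file status sets. already_migrated is an illustrative name.

```python
import re
from pathlib import Path

# Legacy monolith heading, e.g. "### TUNE-0085: Some title".
LEGACY_HEADING = re.compile(r"^### [A-Z]{2,10}-[0-9]{4}:", re.MULTILINE)
# Canonical one-liner; [a-z_-]+ loosely stands in for the per-file STATUS set.
CANONICAL = re.compile(
    r"^- ([A-Z]{2,10}-[0-9]{4}) · [a-z_-]+ · P[0-3] · L[1-4] · "
    r"(.{1,80}) → tasks/\1-task-description\.md$"
)
OPERATIONAL_FILES = ("tasks.md", "backlog.md")  # assumed scope of the scan


def already_migrated(root: Path) -> bool:
    """True when the fast exit-0 path applies and no migration pass is needed."""
    if (root / "backlog-archive.md").exists():
        return False  # legacy archive still waiting for Pass 4
    for name in OPERATIONAL_FILES:
        path = root / name
        if not path.exists():
            continue
        text = path.read_text(encoding="utf-8")
        if LEGACY_HEADING.search(text):
            return False  # Pass 1 still has headings to extract
        bullets = [ln for ln in text.splitlines() if ln.startswith("- ")]
        if not all(CANONICAL.match(ln) for ln in bullets):
            return False  # at least one non-canonical bullet line
    return True
```

A caller can map True straight to exit 0 without running any pass. Every check is a read, so the probe never mutates the tree.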

Data-Loss Safety Contract

Defence-in-depth around --fix mode (TUNE-0077):

  • Pre-write tarball backup — before any mutation, a tarball is written under umask 077 to ${DATARIM_DOCTOR_BACKUP_DIR:-/tmp}/datarim-backup-{TS}.tgz. Its path is shown in the success summary.
  • Sidecar copy — every legacy file mutated by Pass 4 also gets a .pre-v2.bak sidecar in-tree (operator-visible).
  • Invariant — EMITTED_COUNT >= PARSED_COUNT. Doctor counts task entries before and after the rewrite. A violation triggers restore_backup_and_die(). That routine removes the mutated state, extracts the tarball back into place with tar -xzf, and exits 2.
  • Symlink-default uniformity — in the default install.sh mode, ~/.claude/scripts/datarim-doctor.sh resolves through a directory symlink into the canonical Datarim repo. The installed and canonical copies therefore cannot diverge, and a rogue v2 binary cannot be silently dropped on top.
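The backup-then-verify sequence can be sketched in Python with the standard tarfile module. Several names are hypothetical: write_backup, guarded_fix, the timestamp format, and the rewrite callback. The callback is assumed to return the number of task entries it emitted. The real tool is a shell script. It also writes the .pre-v2.bak sidecars, which this sketch omits.

```python
import os
import shutil
import sys
import tarfile
import time
from pathlib import Path


def write_backup(root: Path, backup_dir=None) -> Path:
    """Tar the whole tree before any mutation, readable by the owner only."""
    backup_dir = Path(backup_dir or os.environ.get("DATARIM_DOCTOR_BACKUP_DIR", "/tmp"))
    dest = backup_dir / f"datarim-backup-{time.strftime('%Y%m%d-%H%M%S')}.tgz"
    old_mask = os.umask(0o077)  # same effect as `umask 077` in the shell
    try:
        with tarfile.open(dest, "w:gz") as tar:
            tar.add(root, arcname=root.name)
    finally:
        os.umask(old_mask)
    return dest


def restore_backup_and_die(root: Path, backup: Path) -> None:
    """Discard mutated state, unpack the pre-write tarball, exit 2."""
    shutil.rmtree(root)
    with tarfile.open(backup, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # safer extraction where supported
            tar.extractall(root.parent, filter="data")
        else:
            tar.extractall(root.parent)
    sys.exit(2)


def guarded_fix(root: Path, parsed_count: int, rewrite, backup_dir=None) -> Path:
    """Back up, rewrite, then enforce EMITTED_COUNT >= PARSED_COUNT."""
    backup = write_backup(root, backup_dir)
    emitted_count = rewrite(root)  # hypothetical callback: entries written
    if emitted_count < parsed_count:
        restore_backup_and_die(root, backup)
    return backup
```

The backup is taken before the callback runs. Any loss the invariant detects can therefore be undone from a snapshot that never saw the faulty rewrite.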

Self-Heal Entry Points

  • /dr-doctor (always)
  • /dr-init self-heal — probes datarim-doctor.sh --quiet; offers /dr-doctor --fix on non-compliance.
  • /dr-archive pre-archive gate — pre-archive-check.sh validates line format; bypass with --no-schema-check only during in-flight migration.
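The sketch below shows how a caller such as /dr-init self-heal might interpret the probe's exit status. It uses only the codes named in this document: 0 compliant, 1 non-compliant, 2 invariant restore, 3 lock held, and 4 path traversal. The probe function and the action strings are hypothetical. The command is passed in rather than hard-coded.

```python
import subprocess

# Exit codes as described in this document; the action text is illustrative.
EXIT_ACTIONS = {
    0: "compliant: nothing to do",
    1: "non-compliant: offer /dr-doctor --fix",
    2: "invariant violated during --fix: backup was restored",
    3: "another instance holds the lock: retry later",
    4: "path traversal rejected: fix the offending path",
}


def self_heal_action(exit_code: int) -> str:
    """Map a doctor exit status to the follow-up a caller should take."""
    return EXIT_ACTIONS.get(exit_code, f"unexpected exit code {exit_code}")


def probe(cmd: list) -> str:
    """Run the doctor in probe mode and translate its exit status."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return self_heal_action(result.returncode)
```

For example, a caller would pass the installed script path followed by --quiet as cmd.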

Edge Cases

  • Bash 3.2 (macOS default) — the tool uses a two-pass grep+awk parser, NOT NUL-delimited reads.
  • Title with a → character — escaped or rejected (the regex disallows it); the operator must rename the task.
  • Concurrent invocation — flock on $DATARIM_ROOT/.dr-doctor.lock; a second instance exits 3.
  • Path traversal — paths are canonicalised lexically via scripts/lib/canonicalise.sh (no I/O). A path that escapes the root makes the tool exit 4.
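Lexical canonicalisation means resolving . and .. as strings, without touching the filesystem. The Python sketch below uses posixpath.normpath, which is purely lexical. The function name, the exception class, and the containment test (root plus trailing slash) are illustrative assumptions. They are not the contents of scripts/lib/canonicalise.sh.

```python
import posixpath


class PathTraversalError(ValueError):
    """Raised when a path escapes the root; the tool maps this to exit 4."""
    exit_code = 4


def canonicalise(root: str, candidate: str) -> str:
    """Resolve candidate against root using string operations only (no I/O)."""
    root_norm = posixpath.normpath(root)
    # join() discards root_norm when candidate is absolute; normpath() folds . and ..
    resolved = posixpath.normpath(posixpath.join(root_norm, candidate))
    # Compare against root + "/" so /repo/datarim-evil does not pass for /repo/datarim.
    prefix = root_norm.rstrip("/") + "/"
    if resolved != root_norm and not resolved.startswith(prefix):
        raise PathTraversalError(f"{candidate!r} escapes {root_norm!r}")
    return resolved
```

Because no symlinks are resolved, this check alone does not catch a link inside the root that points outside it.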

Loaded By

  • /dr-doctor (always)
  • /dr-init self-heal step (when probe returns exit 1)
  • /dr-archive line-format gate (on failure, to explain non-compliance)