
File-Sync Configuration

Pre-flight checklist + ignore patterns for file-sync (Syncthing/rclone/rsync/Dropbox/iCloud) — protects git working trees, virtualenvs, and build artifacts.

Overview

File-Sync Configuration captures the pre-flight rules every two-way file synchroniser needs before it ever touches the network. Load it before configuring Syncthing, rclone bisync, Dropbox/iCloud/Google Drive shared folders, periodic rsync jobs, Disk Arcana sync, or any custom sync layer. Do not load for one-way backups or CI artifact transfer — those have a different risk model.

Why It Matters (Founding Incident)

INFRA-0026 (2026-04-25): the first .stignore for Syncthing had 28 patterns and missed .venv, __pycache__, target/, and *.db; it also failed to exclude nested git repos wholesale. Outcome: one materialised production sync conflict, 60+ .sync-conflict files in a week, 14 git repos with diverging working trees across hosts, and a real cross-platform breakage risk (macOS Mach-O vs Linux ELF binaries). Expanding the pattern set from 28 to 66 cut the file count from 40,361 to 2,206 (−95%).

Pre-Flight Inventory (Mandatory)

Before turning sync on, run find against the source root for every problem class — whatever the inventory surfaces must be in the ignore list before the first sync:

  • Vendored / build artifacts: node_modules, .venv/venv, __pycache__, target, .next/.turbo/.nuxt, .cache/.parcel-cache, coverage/.nyc_output, dist/build/.build, DerivedData, .pytest_cache/.mypy_cache/.ruff_cache
  • Nested .git directories (critical): every .git/ under the sync root
  • Local DB / state files: *.db, *.sqlite/*.sqlite3, *.duckdb, *.db-journal
  • Compiled binaries (cross-platform unsafe): *.so, *.dylib, *.dll, *.exe
  • IDE / OS junk: .idea, .vscode, .DS_Store, Thumbs.db
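
The inventory can also be scripted rather than run as separate find invocations. The following is a minimal sketch, assuming a single pass with os.walk is acceptable; the class names, the dictionary layout, and the inventory function are hypothetical, and only a subset of the names from the list above is included.

```python
import fnmatch
import os
from collections import defaultdict

# Directory names per problem class (subset of the checklist; layout is
# an assumption made for this sketch).
DIR_CLASSES = {
    "build": {"node_modules", ".venv", "venv", "__pycache__", "target",
              ".next", "dist", "build", "DerivedData", ".pytest_cache"},
    "nested_git": {".git"},
    "ide_os": {".idea", ".vscode"},
}
# File patterns per problem class. "ide_os" comes first so that Thumbs.db
# is not classified as a database by the *.db pattern.
FILE_CLASSES = {
    "ide_os": [".DS_Store", "Thumbs.db"],
    "db_state": ["*.db", "*.sqlite", "*.sqlite3", "*.duckdb", "*.db-journal"],
    "binaries": ["*.so", "*.dylib", "*.dll", "*.exe"],
}


def inventory(root):
    """Return {class_name: sorted relative paths} for every finding."""
    found = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for name in list(dirnames):
            for cls, names in DIR_CLASSES.items():
                if name in names:
                    rel = os.path.normpath(os.path.join(rel_dir, name))
                    found[cls].append(rel)
                    # Report the directory once and do not descend into it.
                    dirnames.remove(name)
                    break
        for name in filenames:
            for cls, patterns in FILE_CLASSES.items():
                if any(fnmatch.fnmatch(name, p) for p in patterns):
                    rel = os.path.normpath(os.path.join(rel_dir, name))
                    found[cls].append(rel)
                    break
    return {cls: sorted(paths) for cls, paths in found.items()}
```

Calling inventory on the sync root returns one list per class; any non-empty list points at patterns that belong in the ignore file before the first sync.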

Decision Tree: Sync Working Trees vs Git Pull

For every .git/ found inside the sync root, ask: does the second node host live edits, agents, or production runtime in this repo?

  • Yes → do not sync the working tree. Exclude /path/to/repo wholesale. Update via a git pull cron on the second node (see arcanada-pull.sh pattern).
  • No → the working tree may be synced as a read-only mirror, but still exclude .git/ — each node keeps its own commit history.

Default to "yes" — almost every second node eventually becomes "active" (a new agent, a deploy script, a manual edit). Overprotection beats recovery.
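
The decision tree, including its default, can be written down as a small function. This is an illustrative sketch only: RepoFacts, its field names, and the confirmed_passive flag (an explicit operator sign-off that the second node is passive) are assumptions introduced here, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class RepoFacts:
    path: str                        # repo path relative to the sync root
    live_edits: bool = False
    agents: bool = False
    production_runtime: bool = False
    confirmed_passive: bool = False  # hypothetical explicit sign-off


def sync_decision(repo):
    """Apply the decision tree; anything not confirmed passive counts as 'yes'."""
    active = repo.live_edits or repo.agents or repo.production_runtime
    if active or not repo.confirmed_passive:
        # Exclude the whole working tree and update it with git instead.
        return {"exclude": [repo.path], "update_via": "git pull"}
    # Read-only mirror: sync the files, but every node keeps its own .git/.
    return {"exclude": [repo.path + "/.git"], "update_via": "file-sync"}
```

With no facts supplied, sync_decision(RepoFacts("/Projects/demo/code")) excludes the whole repo, which matches the "default to yes" rule.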

Reusable .stignore Template (Syncthing, INFRA-0026 v2)

The hardened template covers eight buckets:

  1. Project source code (separate git repos): /Projects/*/code, /Projects/Datarim/sources, /Projects/Rules of Robotics/Code
  2. AI agents with their own git/venv: /AI_agents/Email Agent, Screen reader, Remove-Watermark, Agent Dreamer
  3. Workflow / runtime state: .git, .dreamer, .meta, .claude, .githooks
  4. Build / deps: node_modules, dist, build, .next, .turbo, .nuxt, .cache, .parcel-cache, coverage, .nyc_output, target
  5. Python environments / caches: .venv, venv, __pycache__, *.pyc, .pytest_cache, .mypy_cache, .ruff_cache
  6. Swift build artifacts: .build, DerivedData, *.xcuserstate
  7. Compiled binaries: *.so, *.dylib, *.dll, *.exe, *.o, *.a
  8. DB / state files: *.db, *.sqlite, *.sqlite3, *.duckdb, *.db-journal, *.db-shm, *.db-wal

Plus secrets/temp/OS junk: *.tmp, *.log, .env*, .DS_Store, Thumbs.db, .Spotlight-V100, .Trashes, .fseventsd.
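
One way to keep such a template maintainable is to store the buckets as data and render the .stignore from them. The sketch below assumes that approach; BUCKETS and render_stignore are hypothetical names, only four of the eight buckets are shown, and the full INFRA-0026 v2 pattern set is not reproduced here. Lines starting with // are comments in .stignore.

```python
# A subset of the buckets listed above, as (title, patterns) pairs.
BUCKETS = [
    ("Project source code (separate git repos)",
     ["/Projects/*/code", "/Projects/Datarim/sources",
      "/Projects/Rules of Robotics/Code"]),
    ("Workflow / runtime state",
     [".git", ".dreamer", ".meta", ".claude", ".githooks"]),
    ("Python environments / caches",
     [".venv", "venv", "__pycache__", "*.pyc",
      ".pytest_cache", ".mypy_cache", ".ruff_cache"]),
    ("DB / state files",
     ["*.db", "*.sqlite", "*.sqlite3", "*.duckdb",
      "*.db-journal", "*.db-shm", "*.db-wal"]),
]


def render_stignore(buckets):
    """Render buckets as .stignore text: a // comment, then one pattern per line."""
    lines = []
    for title, patterns in buckets:
        lines.append(f"// {title}")
        lines.extend(patterns)
        lines.append("")  # blank line between buckets
    return "\n".join(lines).rstrip("\n") + "\n"
```

Writing the result of render_stignore(BUCKETS) to the folder's .stignore keeps the bucket titles visible as comments, which makes later reviews of the pattern set easier.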

Pattern Syntax Cheat-Sheet

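Syncthing (.stignore): node_modules matches at any depth, /Projects/*/code is anchored to the folder root, (?d)pattern marks an ignored file as safe to delete when it would otherwise block removal of its parent directory, (?i) makes the pattern case-insensitive, !important.log negates.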

rclone: trailing / targets folders only, ** recurses across folders, /path/to/exclude/** is path-anchored.

rsync: a trailing / restricts a pattern to directories, a leading / anchors it at the root of the transfer, and **/*.tmp is a recursive glob (** also matches across /).
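
To make the differences concrete, the same intent can be rendered for all three tools. This is a sketch under the assumption that the rclone rules go into a filter file (--filter-from) and the rsync rules into a filter or exclude file; both helper functions are hypothetical.

```python
def exclude_dir_everywhere(name):
    """Exclude every directory called `name`, at any depth."""
    return {
        "syncthing": name,         # unanchored .stignore line
        "rclone": f"- {name}/**",  # everything below any such directory
        "rsync": f"- {name}/",     # trailing slash: directories only
    }


def exclude_anchored_dir(path):
    """Exclude one directory at a fixed location below the sync root."""
    path = "/" + path.strip("/")   # a leading slash anchors in all three tools
    return {
        "syncthing": path,
        "rclone": f"- {path}/**",
        "rsync": f"- {path}/",
    }
```

For example, exclude_dir_everywhere("node_modules") yields the line node_modules for .stignore and the rule - node_modules/** for an rclone filter file.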

Workflow for Git-Managed Repos (when file-sync is excluded)

  1. Cron git pull script: the recommended pattern is arcanada-pull.sh. It fetches upstream, skips if local == remote, skips if the branch is not main/master, stashes local edits, attempts an ff-only pull with a merge fallback, falls back to a Claude CLI conflict resolver, alerts via Ops Bot if the conflict is still unresolved, and finally pops the stash.
  2. CI/CD self-hosted runner: a GitHub Actions runner on the second node pulls on push to main (event-driven, no polling).
  3. Manual: the operator runs git pull on demand. Fine for rarely-updated repos.
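
The skip and stash logic of the cron pattern can be sketched as a pure planning function. This is not the arcanada-pull.sh script itself, whose contents are not shown here; plan_pull and run_plan are hypothetical names, and the merge fallback, conflict resolver, and alerting steps are deliberately left out.

```python
import subprocess


def plan_pull(local_sha, remote_sha, branch, dirty):
    """Return the git commands to run after `git fetch`, or [] to skip."""
    if branch not in ("main", "master"):
        return []  # only the main line is updated automatically
    if local_sha == remote_sha:
        return []  # already up to date
    steps = []
    if dirty:
        steps.append(["git", "stash", "push"])
    steps.append(["git", "pull", "--ff-only"])
    if dirty:
        steps.append(["git", "stash", "pop"])
    return steps


def run_plan(repo_dir, steps):
    """Run each planned command in the repo; stop at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Keeping the plan separate from execution makes the skip conditions easy to test without touching a real repository; a failing git pull --ff-only is the point where a merge fallback or an alert would hook in.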

Compliance Check

  • Pre-flight inventory completed for every problem class
  • Every discovered class is present in the ignore patterns
  • Every nested .git/ is either fully excluded or documented as a read-only mirror
  • Cross-platform binary classes (.venv, target, *.so/*.dylib/*.dll) excluded if syncing across operating systems
  • DB files (*.db, *.sqlite) excluded as host-local state
  • Lockdown applied (globalAnnounceEnabled=false in Syncthing, no public discovery, transport restricted to a private network such as Tailscale)
  • Backup of pre-change config preserved (config.xml.pre-{TASK-ID})
  • Runbook documented (topology, ops, rollback)
  • Bidirectional smoke test executed
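
The second item of the checklist (every discovered class is present in the ignore patterns) lends itself to an automated check. The sketch below assumes a Syncthing .stignore and a literal comparison of pattern strings; missing_patterns is a hypothetical helper, and it does not evaluate globs or #include directives.

```python
def missing_patterns(stignore_text, required):
    """Return the required patterns that do not appear literally in the file."""
    present = set()
    for raw in stignore_text.splitlines():
        line = raw.strip()
        # Skip blanks, // comments, and #include directives.
        if not line or line.startswith("//") or line.startswith("#include"):
            continue
        # Drop prefixes such as (?d) or (?i) before comparing.
        while line.startswith("(?"):
            end = line.find(")")
            if end == -1:
                break
            line = line[end + 1:]
        present.add(line)
    return sorted(set(required) - present)
```

An empty result means every required pattern is listed; it says nothing about the remaining checklist items, which still need manual confirmation.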