# Satoru Backup Service Plan

## Scope

Build a Linux-over-SSH backup system where Satoru pulls edge data locally, snapshots it into a local restic repo, and syncs that repo to B2.

## Locked Decisions

1. Pull model only: edge hosts never push to B2 directly.
2. Directory targets use `rsync`.
3. SQLite targets run remote `.backup`, compress, pull, and cleanup.
4. Staging path: `./backups///` (single persistent path per target).
5. Site runs are background jobs; each site job is serialized, but multiple sites can run concurrently.
6. Partial target failure does not stop the whole site job; site health becomes `warning`.
7. Retention is restic-only (`forget --prune`), no tar archive layer.

## Pipeline

1. Preflight job:
   - SSH connectivity/auth.
   - Remote tool/path checks (rsync/sqlite3 as needed).
   - Local tool checks (`ssh`, `rsync`, `restic`, `gzip`).
   - SQLite preflight validates access/temp write capability only.
2. Backup job:
   - Pull sqlite artifacts.
   - Pull directory targets with rsync.
   - `restic backup` against local staging.
   - Update health and job status (`success|warning|failed`).
3. Retention job:
   - `restic forget --prune` per policy.
4. Sync job:
   - restic-native sync/copy to B2 repo on schedule.

## Minimal Data Model

1. `sites`: `site_uuid`, health fields, last preflight/scan.
2. `site_targets`: mode (`directory|sqlite_dump`), path/hash, last scan metadata.
3. `jobs`: type (`preflight|backup|restic_sync`), status, timing, attempts.
4. `job_events`: structured logs per step.
5. `sync_state`: last sync status/timestamp/error.

## Runtime Paths

1. Staging: `./backups///`
2. Local restic repo: `./repos/restic`

## Security Defaults

Recommended: `0700` directories, `0600` files, dedicated `satoru` system user.

## Required Config

1. `staging_root`
2. `restic_repo_path`
3. `restic_password_file` or secret source
4. `restic_retention_policy`
5. `restic_sync_interval_hours`
6. `restic_b2_repository`
7. `restic_b2_account_id` / `restic_b2_account_key` secret source
8. `job_worker_concurrency`
9. `site_scan_interval_hours` (default 24)

## Build Order

1. Phase 1: queue tables + workers + Run->background + preflight-only.
2. Phase 2: sqlite pull + rsync pull + local restic backup.
3. Phase 3: restic retention + scheduled B2 sync + sync health UI.
4. Phase 4: restore UX + retries/backoff + alerts/observability.

## Operational Risks

1. Disk pressure from staging + restic repo -> enforce headroom checks.
2. SSH/command variability -> clear per-target errors and preflight gating.
3. Long-running jobs -> heartbeat, timeout, retry state.
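
The headroom check named under disk pressure could look roughly like the sketch below. It is only an illustration: the function names, the `required_bytes` estimate, and the 10% free-space floor are assumptions, not values fixed by this plan.

```python
import shutil


def headroom_ok(free: int, total: int, required_bytes: int,
                min_free_ratio: float = 0.10) -> bool:
    """Return True if `required_bytes` fits while keeping a free-space floor.

    `min_free_ratio` (10% here) is an assumed default, not a plan decision.
    """
    free_after = free - required_bytes
    return free_after >= total * min_free_ratio


def path_has_headroom(path: str, required_bytes: int,
                      min_free_ratio: float = 0.10) -> bool:
    """Check the filesystem that holds `path` (e.g. staging root or restic repo)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return headroom_ok(usage.free, usage.total, required_bytes, min_free_ratio)
```

A backup job might call `path_has_headroom` on both the staging root and the restic repo path before pulling, and record a per-target error instead of starting the pull when the check fails.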
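
Locked Decision 6 and the `success|warning|failed` job status can be tied together with a small aggregation step. The sketch below is one possible reading: `TargetResult` and `site_status` are hypothetical names, and treating "every target failed" or "no targets ran" as `failed` is an assumption, since the plan only states that a partial failure yields `warning`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class JobStatus(str, Enum):
    SUCCESS = "success"
    WARNING = "warning"
    FAILED = "failed"


@dataclass
class TargetResult:
    target: str       # hypothetical label for a site target
    ok: bool          # True if the pull for this target succeeded
    error: str = ""   # per-target error text, empty on success


def site_status(results: List[TargetResult]) -> JobStatus:
    """Fold per-target outcomes into one site job status."""
    failures = [r for r in results if not r.ok]
    # Assumption: nothing ran, or nothing succeeded, counts as a failed job.
    if not results or len(failures) == len(results):
        return JobStatus.FAILED
    # Partial failure does not stop the site job; it degrades it to warning.
    if failures:
        return JobStatus.WARNING
    return JobStatus.SUCCESS
```

Because each target carries its own `error`, the worker can keep processing the remaining targets and still write a clear per-target entry to `job_events`.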