Satoru Backup Service Plan
Scope
Build a Linux-over-SSH backup system where Satoru pulls edge data locally, snapshots it into a local restic repo, and syncs that repo to B2.
Locked Decisions
- Pull model only: edge hosts never push to B2 directly.
- Directory targets use rsync.
- SQLite targets run remote .backup, compress, pull, and cleanup.
- Staging path: ./backups/<site_uuid>/<target_hash>/ (single persistent path per target).
- Site runs are background jobs; each site job is serialized, but multiple sites can run concurrently.
- Partial target failure does not stop the whole site job; site health becomes warning.
- Retention is restic-only (forget --prune), no tar archive layer.
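The partial-failure rule above can be sketched as a small roll-up from per-target outcomes to a site-level health value. This is a minimal illustration, not the planned implementation: the names `Health`, `run_site`, and `site_health` are hypothetical, and the treatment of "every target failed" as `failed` is an assumption the plan does not state.

```python
from enum import Enum
from typing import Callable, Iterable


class Health(str, Enum):
    SUCCESS = "success"
    WARNING = "warning"
    FAILED = "failed"


def run_site(targets: Iterable[str], run_target: Callable[[str], None]) -> dict[str, bool]:
    """Run every target in order; one target failing does not stop the rest."""
    results: dict[str, bool] = {}
    for target in targets:
        try:
            run_target(target)
            results[target] = True
        except Exception:
            # Record the failure and keep going with the remaining targets.
            results[target] = False
    return results


def site_health(results: dict[str, bool]) -> Health:
    """Roll per-target outcomes up into one site-level health value."""
    failures = [name for name, ok in results.items() if not ok]
    if not failures:
        return Health.SUCCESS
    if len(failures) == len(results):
        # Assumption: all targets failing counts as a failed site job.
        return Health.FAILED
    return Health.WARNING
```

Keeping the roll-up separate from the run loop means the same function could be reused when health is recomputed from stored job events.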
Pipeline
- Preflight job:
  - SSH connectivity/auth.
  - Remote tool/path checks (rsync/sqlite3 as needed).
  - Local tool checks (ssh, rsync, restic, gzip).
  - SQLite preflight validates access/temp write capability only.
- Backup job:
  - Pull sqlite artifacts.
  - Pull directory targets with rsync.
  - restic backup against local staging.
  - Update health and job status (success|warning|failed).
- Retention job:
  - restic forget --prune per policy.
- Sync job:
  - restic-native sync/copy to B2 repo on schedule.
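As a rough sketch of how the backup and retention steps might be turned into commands, the helpers below build argument lists for `subprocess.run` without executing anything. All function names are hypothetical. The `--delete` flag on rsync and the `--keep-daily` retention flag are assumptions about policy, not decisions taken from this plan, and the sync step is left out.

```python
import shlex


def sqlite_backup_cmd(host: str, db_path: str, remote_tmp: str) -> list[str]:
    """SSH command that snapshots a remote SQLite DB via .backup, then gzips it."""
    dot_command = f".backup '{remote_tmp}'"  # assumes remote_tmp contains no quotes
    remote = (
        f"sqlite3 {shlex.quote(db_path)} {shlex.quote(dot_command)} "
        f"&& gzip -f {shlex.quote(remote_tmp)}"
    )
    return ["ssh", host, remote]


def rsync_pull_cmd(host: str, remote_dir: str, staging_dir: str) -> list[str]:
    """Pull a remote directory's contents into the persistent staging path."""
    src = f"{host}:{remote_dir.rstrip('/')}/"
    # Assumption: --delete mirrors remote deletions into staging.
    return ["rsync", "-a", "--delete", src, staging_dir.rstrip("/") + "/"]


def restic_backup_cmd(repo: str, password_file: str, staging_dir: str) -> list[str]:
    """Snapshot a staging directory into the local restic repo."""
    return ["restic", "-r", repo, "--password-file", password_file,
            "backup", staging_dir]


def restic_forget_cmd(repo: str, password_file: str, keep_daily: int) -> list[str]:
    """Apply retention; keep_daily stands in for the real policy."""
    return ["restic", "-r", repo, "--password-file", password_file,
            "forget", "--prune", "--keep-daily", str(keep_daily)]
```

Returning argument lists rather than shell strings keeps local execution free of shell quoting; only the remote half of the SSH command needs `shlex.quote`.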
Minimal Data Model
- sites: site_uuid, health fields, last preflight/scan.
- site_targets: mode (directory|sqlite_dump), path/hash, last scan metadata.
- jobs: type (preflight|backup|restic_sync), status, timing, attempts.
- job_events: structured logs per step.
- sync_state: last sync status/timestamp/error.
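One possible shape for these tables, sketched as SQLite DDL driven from Python's standard `sqlite3` module. Only the table names and the enumerated `mode` and `type` values come from the list above. Every column name, the JSON-as-text choice for metadata and event payloads, and the single-row `sync_state` layout are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sites (
    site_uuid         TEXT PRIMARY KEY,
    health            TEXT,
    last_preflight_at TEXT,
    last_scan_at      TEXT
);
CREATE TABLE site_targets (
    id             INTEGER PRIMARY KEY,
    site_uuid      TEXT NOT NULL REFERENCES sites(site_uuid),
    mode           TEXT NOT NULL CHECK (mode IN ('directory', 'sqlite_dump')),
    path           TEXT NOT NULL,
    target_hash    TEXT NOT NULL,
    last_scan_meta TEXT,               -- assumed: JSON stored as text
    UNIQUE (site_uuid, target_hash)
);
CREATE TABLE jobs (
    id          INTEGER PRIMARY KEY,
    site_uuid   TEXT REFERENCES sites(site_uuid),
    type        TEXT NOT NULL CHECK (type IN ('preflight', 'backup', 'restic_sync')),
    status      TEXT NOT NULL,
    started_at  TEXT,
    finished_at TEXT,
    attempts    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE job_events (
    id         INTEGER PRIMARY KEY,
    job_id     INTEGER NOT NULL REFERENCES jobs(id),
    step       TEXT NOT NULL,
    payload    TEXT NOT NULL,          -- assumed: JSON stored as text
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE sync_state (
    id           INTEGER PRIMARY KEY CHECK (id = 1),  -- assumed: single row
    last_status  TEXT,
    last_sync_at TEXT,
    last_error   TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database, enable foreign keys, and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```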
Runtime Paths
- Staging: ./backups/<site_uuid>/<target_hash>/
- Local restic repo: ./repos/restic
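A small helper can keep every staging path under the staging root. The sketch below is illustrative only: the plan does not say how `target_hash` is derived, so the SHA-256 of mode and remote path used here is purely an assumption, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def make_target_hash(mode: str, remote_path: str) -> str:
    """Assumed derivation: a short, stable digest of the target's mode and path."""
    digest = hashlib.sha256(f"{mode}:{remote_path}".encode("utf-8")).hexdigest()
    return digest[:16]


def staging_dir(staging_root: str, site_uuid: str, target_hash: str) -> Path:
    """Return <staging_root>/<site_uuid>/<target_hash>, refusing path escapes."""
    root = Path(staging_root).resolve()
    path = (root / site_uuid / target_hash).resolve()
    if root not in path.parents:
        raise ValueError(f"staging path escapes root: {path}")
    return path
```

Because the hash is deterministic, repeated runs for the same target resolve to the same directory, which matches the single-persistent-path decision and lets rsync transfer only changes.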
Security Defaults
Recommended: 0700 directories, 0600 files, dedicated satoru system user.
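The recommended modes could be applied with a pair of helpers along these lines. The function names are hypothetical, and creating and switching to the dedicated satoru user is outside this sketch.

```python
import os
from pathlib import Path


def ensure_private_dir(path: str) -> Path:
    """Create a directory (if needed) and force mode 0700 on it."""
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so chmod explicitly.
    # Note: only the leaf directory is tightened, not newly created parents.
    os.chmod(p, 0o700)
    return p


def write_private_file(path: str, data: bytes) -> None:
    """Write a file that is created with, and forced to, mode 0600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        # fchmod covers the case where the file already existed with looser bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(data)
```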
Required Config
- staging_root
- restic_repo_path
- restic_password_file or secret source
- restic_retention_policy
- restic_sync_interval_hours
- restic_b2_repository
- restic_b2_account_id / restic_b2_account_key secret source
- job_worker_concurrency
- site_scan_interval_hours (default 24)
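The keys above might be loaded into a typed object that rejects missing values early. This is a sketch under assumptions: the `Config` and `load_config` names are hypothetical, the field types are guesses (the retention policy is kept as an opaque string), and the "secret source" alternatives are not modelled, only plain values.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)
class Config:
    staging_root: str
    restic_repo_path: str
    restic_password_file: str
    restic_retention_policy: str   # assumed: opaque policy string
    restic_sync_interval_hours: int
    restic_b2_repository: str
    restic_b2_account_id: str
    restic_b2_account_key: str
    job_worker_concurrency: int
    site_scan_interval_hours: int = 24


def load_config(raw: dict) -> Config:
    """Build a Config from a plain dict, failing loudly on missing keys."""
    required = [f.name for f in fields(Config) if f.default is MISSING]
    missing = [name for name in required if raw.get(name) in (None, "")]
    if missing:
        raise ValueError("missing required config: " + ", ".join(missing))
    known = {f.name for f in fields(Config)}
    cfg = Config(**{k: v for k, v in raw.items() if k in known})
    for name in ("restic_sync_interval_hours", "job_worker_concurrency",
                 "site_scan_interval_hours"):
        value = getattr(cfg, name)
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
    return cfg
```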
Build Order
- Phase 1: queue tables + workers + Run->background + preflight-only.
- Phase 2: sqlite pull + rsync pull + local restic backup.
- Phase 3: restic retention + scheduled B2 sync + sync health UI.
- Phase 4: restore UX + retries/backoff + alerts/observability.
Operational Risks
- Disk pressure from staging + restic repo -> enforce headroom checks.
- SSH/command variability -> clear per-target errors and preflight gating.
- Long-running jobs -> heartbeat, timeout, retry state.
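The headroom check for the disk-pressure risk could look roughly like this. The 10% free-space floor and both function names are assumptions for illustration; the plan does not specify a threshold or how the size of an upcoming pull is estimated.

```python
import shutil


def headroom_ok(total: int, free: int, needed: int,
                min_free_fraction: float = 0.10) -> bool:
    """True if, after writing `needed` bytes, free space stays above the floor."""
    return (free - needed) >= total * min_free_fraction


def check_headroom(path: str, needed_bytes: int,
                   min_free_fraction: float = 0.10) -> bool:
    """Apply headroom_ok to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return headroom_ok(usage.total, usage.free, needed_bytes, min_free_fraction)
```

Running the check before both the pull and the restic backup step would cover staging and the repo, since each can sit on a different filesystem.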