# Cairn

Crash artifact aggregator and regression detection system. Collects, fingerprints, and analyzes crash artifacts across repositories. Provides a CLI client, REST API, and web dashboard.

## Architecture

- **Go server** (Gin) with embedded web UI
- **PostgreSQL** for persistent storage with auto-migrations
- **S3-compatible blob storage** (MinIO) for artifact files
- **Optional Forgejo integration** for issue tracking and commit statuses
- **Multi-format crash fingerprinting** — ASan, GDB/LLDB, Zig, and generic stack traces

## Quick Start

Start PostgreSQL and MinIO:

```sh
docker-compose up -d
```

Run the server:

```sh
go run ./cmd/cairn-server
```

Visit `http://localhost:8080`.

## Configuration

All configuration is via environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `CAIRN_LISTEN_ADDR` | `:8080` | HTTP server listen address |
| `CAIRN_DATABASE_URL` | `postgres://cairn:cairn@localhost:5432/cairn?sslmode=disable` | PostgreSQL connection URL |
| `CAIRN_S3_ENDPOINT` | `localhost:9000` | S3/MinIO endpoint |
| `CAIRN_S3_BUCKET` | `cairn-artifacts` | S3 bucket name |
| `CAIRN_S3_ACCESS_KEY` | `minioadmin` | S3 access key |
| `CAIRN_S3_SECRET_KEY` | `minioadmin` | S3 secret key |
| `CAIRN_S3_USE_SSL` | `false` | Enable SSL for S3 |
| `CAIRN_FORGEJO_URL` | *(empty)* | Forgejo base URL |
| `CAIRN_FORGEJO_TOKEN` | *(empty)* | Forgejo API token |
| `CAIRN_FORGEJO_WEBHOOK_SECRET` | *(empty)* | Secret for webhook HMAC verification |

## CLI

The `cairn` CLI communicates with the server over HTTP. All commands accept `-server URL` (env: `CAIRN_SERVER_URL`, default: `http://localhost:8080`).
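The following minimal Go sketch uses the standard `flag` package to show how the `-server` flag and `CAIRN_SERVER_URL` could interact. It assumes this precedence: an explicit `-server` flag overrides the environment variable, and the environment variable overrides the default. The helper names `envOr` and `parseServerFlag` are hypothetical and are not taken from Cairn's CLI.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// defaultServerURL is the documented default for -server.
const defaultServerURL = "http://localhost:8080"

// envOr returns the environment variable key, or def when it is unset or empty.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// parseServerFlag (hypothetical) resolves the server URL with the assumed
// precedence: explicit -server flag, then CAIRN_SERVER_URL, then the default.
func parseServerFlag(args []string) (string, error) {
	fs := flag.NewFlagSet("cairn", flag.ContinueOnError)
	// Seeding the flag's default from the environment lets an explicit flag override it.
	server := fs.String("server", envOr("CAIRN_SERVER_URL", defaultServerURL), "Cairn server URL")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *server, nil
}

func main() {
	url, err := parseServerFlag(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("using server:", url)
}
```

Using the environment value as the flag's default keeps the precedence logic in one place. It also means `-h` reports the effective URL as the flag's default.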
### Upload an artifact

```sh
cairn upload \
  -repo myproject -owner myorg -commit abc123 \
  -type sanitizer -file crash.log \
  -crash-message "heap-buffer-overflow" \
  -stack-trace "$(cat trace.txt)"
```

Flags:

| Flag | Required | Description |
|------|----------|-------------|
| `-repo` | yes | Repository name |
| `-owner` | yes | Repository owner |
| `-commit` | yes | Commit SHA |
| `-type` | yes | `coredump`, `fuzz`, `sanitizer`, or `simulation` |
| `-file` | yes | Path to artifact file |
| `-crash-message` | no | Crash message text |
| `-stack-trace` | no | Stack trace text |

### Check for regressions

```sh
cairn check -repo myproject -base abc123 -head def456
```

Exits with code 1 if regressions are found. Output:

```json
{
  "is_regression": true,
  "new": [""],
  "fixed": [""],
  "recurring": [""]
}
```

### Campaigns

```sh
# Start a campaign
cairn campaign start -repo myproject -owner myorg -name "nightly-fuzz" -type fuzz

# Finish a campaign
cairn campaign finish -id
```

### Download an artifact

```sh
cairn download -id -o output.bin
```

Omit `-o` to write to stdout.

## API Reference

All endpoints are under `/api/v1`.
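Scripts that call the API directly, rather than going through the CLI, could run the same check as `cairn check` along these lines. This is a sketch, not Cairn's client code. It assumes `POST /regression/check` (listed under Regression below) accepts the `{repository, base_sha, head_sha}` body and returns the same JSON shape that `cairn check` prints. The helper names and the treatment of any 2xx status as success are also assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// checkRequest mirrors the documented body of POST /api/v1/regression/check.
type checkRequest struct {
	Repository string `json:"repository"`
	BaseSHA    string `json:"base_sha"`
	HeadSHA    string `json:"head_sha"`
}

// checkResult assumes the API returns the shape that `cairn check` prints.
// Entries stay raw JSON because their element type is not documented here.
type checkResult struct {
	IsRegression bool              `json:"is_regression"`
	New          []json.RawMessage `json:"new"`
	Fixed        []json.RawMessage `json:"fixed"`
	Recurring    []json.RawMessage `json:"recurring"`
}

// parseCheckResult decodes a regression-check response body.
func parseCheckResult(body []byte) (checkResult, error) {
	var r checkResult
	err := json.Unmarshal(body, &r)
	return r, err
}

// exitCode follows the CLI contract: 1 when regressions are found, else 0.
func exitCode(r checkResult) int {
	if r.IsRegression {
		return 1
	}
	return 0
}

// checkRegression (hypothetical helper) posts a check to a Cairn server.
func checkRegression(client *http.Client, serverURL string, req checkRequest) (checkResult, error) {
	payload, err := json.Marshal(req)
	if err != nil {
		return checkResult{}, err
	}
	resp, err := client.Post(serverURL+"/api/v1/regression/check", "application/json", bytes.NewReader(payload))
	if err != nil {
		return checkResult{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return checkResult{}, err
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return checkResult{}, fmt.Errorf("regression check failed: %s: %s", resp.Status, body)
	}
	return parseCheckResult(body)
}

func main() {
	// Offline demo on the response shape shown for `cairn check`; a real run would
	// call checkRegression(http.DefaultClient, "http://localhost:8080", checkRequest{...}).
	r, err := parseCheckResult([]byte(`{"is_regression": true, "new": [""], "fixed": [], "recurring": []}`))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("new:", len(r.New), "exit code:", exitCode(r))
}
```

Keeping the list entries as `json.RawMessage` avoids guessing their element type. Only `is_regression` drives the exit code.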
### Artifacts

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/artifacts` | Ingest artifact (multipart: `meta` JSON + `file`) |
| `GET` | `/artifacts` | List artifacts (`?repository_id=&commit_sha=&type=&limit=&offset=`) |
| `GET` | `/artifacts/:id` | Get artifact details |
| `GET` | `/artifacts/:id/download` | Download artifact file |

### Crash Groups

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/crashgroups` | List crash groups (`?repository_id=&status=&limit=&offset=`) |
| `GET` | `/crashgroups/:id` | Get crash group details |

### Regression

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/regression/check` | Check regressions (`{repository, base_sha, head_sha}`) |

### Campaigns

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/campaigns` | Create campaign (`{repository, owner, name, type}`) |
| `GET` | `/campaigns` | List campaigns (`?repository_id=&limit=&offset=`) |
| `GET` | `/campaigns/:id` | Get campaign details |
| `POST` | `/campaigns/:id/finish` | Finish a campaign |

### Other

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/dashboard` | Dashboard statistics, trends, top crashers |
| `GET` | `/search` | Full-text search (`?q=&limit=&offset=`) |
| `POST` | `/webhooks/forgejo` | Forgejo webhook receiver |

## Crash Fingerprinting

The fingerprinting pipeline turns raw crash output into a stable identifier for grouping duplicate crashes.

### Pipeline

1. **Parse** — Extract stack frames from raw text. Parsers are tried in order: ASan → GDB/LLDB → Zig → Generic.
2. **Normalize** — Strip addresses, template parameters, and ABI tags. Filter out runtime/library frames. Keep the top 8 frames.
3. **Hash** — Combine the normalized function names and file names, then hash the result with SHA-256 to produce a 64-character hex fingerprint.
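The normalize and hash steps can be sketched in Go roughly as follows. This is a minimal illustration, not Cairn's implementation. The `frame` type, the subset of runtime prefixes, and the `function|file` line encoding fed to SHA-256 are all hypothetical. The sketch also assumes that parsing (step 1) has already produced the frames.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// frame is a hypothetical parsed stack frame.
type frame struct {
	Function string
	File     string
}

const maxFrames = 8 // the documented "keep the top 8 frames" rule

var (
	hexAddr = regexp.MustCompile(`\+?0x[0-9a-fA-F]+`) // addresses and +0x offsets
	abiTag  = regexp.MustCompile(`\[abi:[^\]]*\]`)    // e.g. [abi:cxx11]
	// Illustrative subset only; the README's list ends with "etc.".
	runtimePrefixes = []string{"__libc_", "__asan_", "__sanitizer_", "std.debug."}
)

// stripTemplates collapses each outermost template argument list to "<>".
func stripTemplates(s string) string {
	var b strings.Builder
	depth := 0
	for _, r := range s {
		switch {
		case r == '<':
			if depth == 0 {
				b.WriteRune(r)
			}
			depth++
		case r == '>' && depth > 0:
			depth--
			if depth == 0 {
				b.WriteRune(r)
			}
		case depth == 0:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// isRuntime reports whether fn starts with a known runtime prefix.
func isRuntime(fn string) bool {
	for _, p := range runtimePrefixes {
		if strings.HasPrefix(fn, p) {
			return true
		}
	}
	return false
}

// normalize cleans function names, drops runtime frames, keeps file basenames,
// and stops after maxFrames surviving frames.
func normalize(frames []frame) []frame {
	var out []frame
	for _, f := range frames {
		fn := abiTag.ReplaceAllString(hexAddr.ReplaceAllString(f.Function, ""), "")
		fn = strings.TrimSpace(stripTemplates(fn))
		if fn == "" || isRuntime(fn) {
			continue
		}
		file := f.File
		if i := strings.LastIndexAny(file, `/\`); i >= 0 {
			file = file[i+1:]
		}
		out = append(out, frame{Function: fn, File: file})
		if len(out) == maxFrames {
			break
		}
	}
	return out
}

// fingerprint hashes the normalized frames into a 64-character hex string.
func fingerprint(frames []frame) string {
	var b strings.Builder
	for _, f := range normalize(frames) {
		b.WriteString(f.Function + "|" + f.File + "\n") // assumed encoding
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	fp := fingerprint([]frame{
		{Function: "std::vector<int>::at[abi:cxx11]+0x2c", File: "/usr/include/c++/vector"},
		{Function: "parse_header", File: "/src/parser.c"},
		{Function: "__libc_start_main", File: "libc.so.6"},
	})
	fmt.Println(len(fp), fp)
}
```

Truncating after filtering means that dropped runtime frames do not use up any of the 8 slots. The simple depth counter in `stripTemplates` would mishandle names like `operator<<`. A production normalizer would need to special-case those.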
### Supported Formats

| Parser | Detects | Example Pattern |
|--------|---------|-----------------|
| ASan/MSan/TSan/UBSan | `==PID==ERROR: AddressSanitizer` | `#0 0x55a3b4 in func /file.c:10:5` |
| GDB/LLDB | Presence of `#0` frame marker | `#0 0x55a3b4 in func () at /file.c:10` |
| Zig | `panic:` keyword | `/file.zig:10:5: 0x55a3b4 in func (module)` |
| Generic | Heuristic fallback | `at func (file.c:10)` or `func+0x1a` |

### Normalization Rules

- Strip hex addresses
- Collapse C++ template parameter lists (`vector<T>` → `vector<>`)
- Strip ABI tags (`[abi:cxx11]`)
- Keep only file basenames
- Filter runtime prefixes: `__libc_`, `__asan_`, `__sanitizer_`, `std.debug.`, etc.

## Forgejo Integration

When `CAIRN_FORGEJO_URL` and `CAIRN_FORGEJO_TOKEN` are set, Cairn integrates with Forgejo:

- **Issue creation** — New crash signatures automatically open issues prefixed with `[Cairn]`
- **Issue sync** — Closing/reopening a `[Cairn]` issue in Forgejo updates the crash group status in Cairn
- **Commit statuses** — Regression checks post `cairn/regression` status (success/failure) to commits
- **Webhooks** — Configure a Forgejo webhook pointing to `/api/v1/webhooks/forgejo` with the shared secret. Handles `issues` and `push` events. Verifies HMAC-SHA256 signatures via `X-Forgejo-Signature` (or `X-Gitea-Signature`).

## Deployment

Infrastructure is managed as a Nomad service job in `../infra/cairn/cairn.hcl`.

The Dockerfile uses a multi-stage build:

```dockerfile
# Build stage — Go 1.25 Alpine, static binary (CGO_ENABLED=0)
FROM golang:1.25-alpine AS builder
# ...
RUN CGO_ENABLED=0 go build -o /cairn-server ./cmd/cairn-server

# Runtime stage — minimal Alpine with ca-certificates and tzdata
FROM alpine:3.21
RUN apk add --no-cache ca-certificates tzdata
COPY --from=builder /cairn-server /usr/local/bin/cairn-server
ENTRYPOINT ["cairn-server"]
```

The server reads all configuration from environment variables at startup.
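A server that reads the documented variables and applies their defaults might look like the following sketch. The `Config` struct, `loadConfig`, and the rule that an empty variable counts as unset are assumptions for illustration. The lookup function is passed in rather than calling `os.Getenv` directly, which makes the defaults easy to test. `ForgejoEnabled` mirrors the documented condition that both `CAIRN_FORGEJO_URL` and `CAIRN_FORGEJO_TOKEN` must be set.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds the settings documented in the Configuration table (hypothetical type).
type Config struct {
	ListenAddr           string
	DatabaseURL          string
	S3Endpoint           string
	S3Bucket             string
	S3AccessKey          string
	S3SecretKey          string
	S3UseSSL             bool
	ForgejoURL           string
	ForgejoToken         string
	ForgejoWebhookSecret string
}

// loadConfig reads settings via lookup (os.Getenv in production), falling back
// to the documented defaults when a variable is unset or empty.
func loadConfig(lookup func(string) string) (Config, error) {
	get := func(key, def string) string {
		if v := lookup(key); v != "" {
			return v
		}
		return def
	}
	useSSL, err := strconv.ParseBool(get("CAIRN_S3_USE_SSL", "false"))
	if err != nil {
		return Config{}, fmt.Errorf("CAIRN_S3_USE_SSL: %w", err)
	}
	return Config{
		ListenAddr:           get("CAIRN_LISTEN_ADDR", ":8080"),
		DatabaseURL:          get("CAIRN_DATABASE_URL", "postgres://cairn:cairn@localhost:5432/cairn?sslmode=disable"),
		S3Endpoint:           get("CAIRN_S3_ENDPOINT", "localhost:9000"),
		S3Bucket:             get("CAIRN_S3_BUCKET", "cairn-artifacts"),
		S3AccessKey:          get("CAIRN_S3_ACCESS_KEY", "minioadmin"),
		S3SecretKey:          get("CAIRN_S3_SECRET_KEY", "minioadmin"),
		S3UseSSL:             useSSL,
		ForgejoURL:           get("CAIRN_FORGEJO_URL", ""),
		ForgejoToken:         get("CAIRN_FORGEJO_TOKEN", ""),
		ForgejoWebhookSecret: get("CAIRN_FORGEJO_WEBHOOK_SECRET", ""),
	}, nil
}

// ForgejoEnabled reports whether both the Forgejo URL and token are set.
func (c Config) ForgejoEnabled() bool {
	return c.ForgejoURL != "" && c.ForgejoToken != ""
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config:", err)
		os.Exit(1)
	}
	fmt.Println("listening on", cfg.ListenAddr, "forgejo enabled:", cfg.ForgejoEnabled())
}
```

Rejecting a malformed boolean at startup, rather than silently treating it as `false`, surfaces misconfigured SSL settings early.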
## Database

The schema auto-migrates on startup. Core tables:

| Table | Purpose |
|-------|---------|
| `repositories` | Tracked repositories (name, owner, optional Forgejo URL) |
| `commits` | Commit SHAs linked to repositories |
| `builds` | Optional build metadata (builder, flags, tags) |
| `artifacts` | Crash artifacts with blob references, metadata, and full-text search |
| `crash_signatures` | Unique fingerprints per repository with occurrence counts |
| `crash_groups` | Human-facing crash groups linked to signatures and Forgejo issues |
| `campaigns` | Testing campaigns (running/finished) grouping artifacts |

Artifacts have a GIN-indexed `tsvector` column for full-text search across crash messages, stack traces, and types.
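A GIN-indexed `tsvector` column is typically queried with PostgreSQL's `@@` match operator. The sketch below shows how a handler for `GET /search` might build a parameterized query. Several details are hypothetical, because the README does not document the actual schema or paging defaults:

- the column names `search_vector` and `crash_message`
- the `english` text-search configuration
- the paging limits

```go
package main

import "fmt"

// searchSQL is a hypothetical full-text query. The search_vector and
// crash_message column names and the 'english' configuration are assumptions.
const searchSQL = `SELECT id, crash_message
FROM artifacts
WHERE search_vector @@ plainto_tsquery('english', $1)
ORDER BY ts_rank(search_vector, plainto_tsquery('english', $1)) DESC
LIMIT $2 OFFSET $3`

// Hypothetical paging bounds; Cairn's real defaults are not documented here.
const (
	defaultLimit = 20
	maxLimit     = 100
)

// buildSearch returns the query and its bound arguments for ?q=&limit=&offset=,
// clamping paging values so a request cannot ask for unbounded results.
func buildSearch(q string, limit, offset int) (string, []any) {
	if limit <= 0 {
		limit = defaultLimit
	}
	if limit > maxLimit {
		limit = maxLimit
	}
	if offset < 0 {
		offset = 0
	}
	return searchSQL, []any{q, limit, offset}
}

func main() {
	query, args := buildSearch("heap-buffer-overflow", 0, 0)
	fmt.Println(query)
	fmt.Println(args...)
}
```

Passing the user's text through `plainto_tsquery` as a bound parameter avoids SQL injection. It also avoids `tsquery` syntax errors that raw search input could otherwise trigger.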