Add readme

Matthew Knight 2026-03-02 16:13:26 -08:00
parent 8d2c7bb8b6
commit 8dab439a54
1 changed files with 220 additions and 0 deletions

README.md Normal file

@ -0,0 +1,220 @@
# Cairn
Crash artifact aggregator and regression detection system. Collects, fingerprints, and analyzes crash artifacts across repositories. Provides a CLI client, REST API, and web dashboard.
## Architecture
- **Go server** (Gin) with embedded web UI
- **PostgreSQL** for persistent storage with auto-migrations
- **S3-compatible blob storage** (MinIO) for artifact files
- **Optional Forgejo integration** for issue tracking and commit statuses
- **Multi-format crash fingerprinting** — ASan, GDB/LLDB, Zig, and generic stack traces
## Quick Start
Start PostgreSQL and MinIO:
```sh
docker-compose up -d
```
Run the server:
```sh
go run ./cmd/cairn-server
```
Visit `http://localhost:8080`.
## Configuration
All configuration is via environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `CAIRN_LISTEN_ADDR` | `:8080` | HTTP server listen address |
| `CAIRN_DATABASE_URL` | `postgres://cairn:cairn@localhost:5432/cairn?sslmode=disable` | PostgreSQL connection URL |
| `CAIRN_S3_ENDPOINT` | `localhost:9000` | S3/MinIO endpoint |
| `CAIRN_S3_BUCKET` | `cairn-artifacts` | S3 bucket name |
| `CAIRN_S3_ACCESS_KEY` | `minioadmin` | S3 access key |
| `CAIRN_S3_SECRET_KEY` | `minioadmin` | S3 secret key |
| `CAIRN_S3_USE_SSL` | `false` | Use HTTPS (TLS) when connecting to the S3 endpoint |
| `CAIRN_FORGEJO_URL` | *(empty)* | Forgejo base URL |
| `CAIRN_FORGEJO_TOKEN` | *(empty)* | Forgejo API token |
| `CAIRN_FORGEJO_WEBHOOK_SECRET` | *(empty)* | Secret for webhook HMAC verification |
## CLI
The `cairn` CLI communicates with the server over HTTP. All commands accept `-server URL` (env: `CAIRN_SERVER_URL`, default: `http://localhost:8080`).
### Upload an artifact
```sh
cairn upload \
-repo myproject -owner myorg -commit abc123 \
-type sanitizer -file crash.log \
-crash-message "heap-buffer-overflow" \
-stack-trace "$(cat trace.txt)"
```
Flags:
| Flag | Required | Description |
|------|----------|-------------|
| `-repo` | yes | Repository name |
| `-owner` | yes | Repository owner |
| `-commit` | yes | Commit SHA |
| `-type` | yes | `coredump`, `fuzz`, `sanitizer`, or `simulation` |
| `-file` | yes | Path to artifact file |
| `-crash-message` | no | Crash message text |
| `-stack-trace` | no | Stack trace text |
### Check for regressions
```sh
cairn check -repo myproject -base abc123 -head def456
```
Exits with code 1 if regressions are found. The command prints a JSON report:
```json
{
"is_regression": true,
"new": ["<fingerprints appearing in head but not base>"],
"fixed": ["<fingerprints in base but not head>"],
"recurring": ["<fingerprints in both>"]
}
```
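The three lists in this report are set differences between the fingerprints seen at the two commits. A minimal Go sketch of that classification follows. `CheckResult` and `Classify` are hypothetical names, and treating any new fingerprint as a regression is an assumption of this sketch rather than documented server behavior.
```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// CheckResult matches the JSON shape shown in the example report.
type CheckResult struct {
	IsRegression bool     `json:"is_regression"`
	New          []string `json:"new"`
	Fixed        []string `json:"fixed"`
	Recurring    []string `json:"recurring"`
}

// toSet deduplicates a list of fingerprints.
func toSet(fps []string) map[string]bool {
	set := make(map[string]bool, len(fps))
	for _, fp := range fps {
		set[fp] = true
	}
	return set
}

// Classify compares the fingerprints seen at the base and head commits.
func Classify(base, head []string) CheckResult {
	inBase, inHead := toSet(base), toSet(head)
	res := CheckResult{New: []string{}, Fixed: []string{}, Recurring: []string{}}
	for fp := range inHead {
		if inBase[fp] {
			res.Recurring = append(res.Recurring, fp)
		} else {
			res.New = append(res.New, fp)
		}
	}
	for fp := range inBase {
		if !inHead[fp] {
			res.Fixed = append(res.Fixed, fp)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(res.New)
	sort.Strings(res.Fixed)
	sort.Strings(res.Recurring)
	// Assumption: a regression means at least one fingerprint is new in head.
	res.IsRegression = len(res.New) > 0
	return res
}

func main() {
	res := Classify([]string{"aaa", "bbb"}, []string{"bbb", "ccc"})
	out, err := json.MarshalIndent(res, "", "  ")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(out))
}
```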
### Campaigns
```sh
# Start a campaign
cairn campaign start -repo myproject -owner myorg -name "nightly-fuzz" -type fuzz
# Finish a campaign
cairn campaign finish -id <campaign-id>
```
### Download an artifact
```sh
cairn download -id <artifact-id> -o output.bin
```
Omit `-o` to write to stdout.
## API Reference
All endpoints are under `/api/v1`.
### Artifacts
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/artifacts` | Ingest artifact (multipart: `meta` JSON + `file`) |
| `GET` | `/artifacts` | List artifacts (`?repository_id=&commit_sha=&type=&limit=&offset=`) |
| `GET` | `/artifacts/:id` | Get artifact details |
| `GET` | `/artifacts/:id/download` | Download artifact file |
### Crash Groups
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/crashgroups` | List crash groups (`?repository_id=&status=&limit=&offset=`) |
| `GET` | `/crashgroups/:id` | Get crash group details |
### Regression
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/regression/check` | Check regressions (`{repository, base_sha, head_sha}`) |
### Campaigns
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/campaigns` | Create campaign (`{repository, owner, name, type}`) |
| `GET` | `/campaigns` | List campaigns (`?repository_id=&limit=&offset=`) |
| `GET` | `/campaigns/:id` | Get campaign details |
| `POST` | `/campaigns/:id/finish` | Finish a campaign |
### Other
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/dashboard` | Dashboard statistics, trends, top crashers |
| `GET` | `/search` | Full-text search (`?q=&limit=&offset=`) |
| `POST` | `/webhooks/forgejo` | Forgejo webhook receiver |
## Crash Fingerprinting
The fingerprinting pipeline turns raw crash output into a stable identifier for grouping duplicate crashes.
### Pipeline
1. **Parse** — Extract stack frames from raw text. Parsers are tried in order: ASan → GDB/LLDB → Zig → Generic.
2. **Normalize** — Strip addresses, template parameters, ABI tags. Filter runtime/library frames. Keep the top 8 frames.
3. **Hash** — Combine the normalized function names and file names, then hash the result with SHA-256 to produce a 64-character hex fingerprint.
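The hash step can be sketched as follows, assuming frames have already been parsed and normalized. The `Frame` type, the `function@file` line format, and the newline separator are assumptions of this sketch. The README specifies only the top-8 limit and the SHA-256 hex output.
```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Frame is a hypothetical normalized stack frame.
type Frame struct {
	Function string
	File     string
}

// maxFrames is the number of top frames kept, per the pipeline description.
const maxFrames = 8

// Fingerprint hashes the top frames into a 64-character hex string.
func Fingerprint(frames []Frame) string {
	if len(frames) > maxFrames {
		frames = frames[:maxFrames]
	}
	var b strings.Builder
	for _, f := range frames {
		// Assumed encoding: one "function@file" line per frame.
		b.WriteString(f.Function)
		b.WriteByte('@')
		b.WriteString(f.File)
		b.WriteByte('\n')
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	frames := []Frame{
		{Function: "parse_header", File: "parser.c"},
		{Function: "read_packet", File: "net.c"},
		{Function: "main", File: "main.c"},
	}
	fmt.Println(Fingerprint(frames))
}
```
Because line numbers and addresses are not part of the hashed input in this sketch, two crashes with the same call chain map to the same fingerprint even when the binary layout shifts between builds.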
### Supported Formats
| Parser | Detects | Example Pattern |
|--------|---------|-----------------|
| ASan/MSan/TSan/UBSan | `==PID==ERROR: AddressSanitizer` | `#0 0x55a3b4 in func /file.c:10:5` |
| GDB/LLDB | Presence of `#0` frame marker | `#0 0x55a3b4 in func () at /file.c:10` |
| Zig | `panic:` keyword | `/file.zig:10:5: 0x55a3b4 in func (module)` |
| Generic | Heuristic fallback | `at func (file.c:10)` or `func+0x1a` |
### Normalization Rules
- Strip hex addresses
- Collapse C++ template parameters (`vector<int>` → `vector<>`)
- Strip ABI tags (`[abi:cxx11]`)
- Reduce file paths to their basenames
- Filter runtime prefixes: `__libc_`, `__asan_`, `__sanitizer_`, `std.debug.`, etc.
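The rules above can be approximated in a few lines of Go. This is a sketch under stated assumptions, not Cairn's actual normalizer: all function names are hypothetical, and the prefix list covers only the prefixes named above, not whatever the "etc." includes.
```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

var (
	// Matches hex addresses, including a leading "+" as in "func+0x1a".
	hexAddr = regexp.MustCompile(`\+?0x[0-9a-fA-F]+`)
	// Matches ABI tags such as "[abi:cxx11]".
	abiTag = regexp.MustCompile(`\[abi:[^\]]*\]`)
)

// runtimePrefixes lists only the prefixes named in the rules above.
var runtimePrefixes = []string{"__libc_", "__asan_", "__sanitizer_", "std.debug."}

// collapseTemplates empties the outermost <...> groups, so nested
// parameters vanish too. Limitation: names containing a bare "<",
// such as operator<, are not handled.
func collapseTemplates(s string) string {
	var b strings.Builder
	depth := 0
	for _, r := range s {
		switch {
		case r == '<':
			if depth == 0 {
				b.WriteString("<>")
			}
			depth++
		case r == '>' && depth > 0:
			depth--
		case depth == 0:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// NormalizeFunction applies the address, ABI tag, and template rules.
func NormalizeFunction(name string) string {
	name = hexAddr.ReplaceAllString(name, "")
	name = abiTag.ReplaceAllString(name, "")
	return strings.TrimSpace(collapseTemplates(name))
}

// Basename keeps only the final path element of a source file.
func Basename(file string) string {
	if file == "" {
		return ""
	}
	return path.Base(file)
}

// IsRuntimeFrame reports whether a function belongs to a filtered runtime.
func IsRuntimeFrame(name string) bool {
	for _, p := range runtimePrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(NormalizeFunction("std::map<std::string, std::vector<int>>::find[abi:cxx11]"))
	fmt.Println(Basename("/src/lib/file.c"))
	fmt.Println(IsRuntimeFrame("__asan_report_load4"))
}
```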
## Forgejo Integration
When `CAIRN_FORGEJO_URL` and `CAIRN_FORGEJO_TOKEN` are set, Cairn integrates with Forgejo:
- **Issue creation** — New crash signatures automatically open issues prefixed with `[Cairn]`
- **Issue sync** — Closing/reopening a `[Cairn]` issue in Forgejo updates the crash group status in Cairn
- **Commit statuses** — Regression checks post `cairn/regression` status (success/failure) to commits
- **Webhooks** — Configure a Forgejo webhook pointing to `/api/v1/webhooks/forgejo` with the shared secret. Handles `issues` and `push` events. Verifies HMAC-SHA256 signatures via `X-Forgejo-Signature` (or `X-Gitea-Signature`).
## Deployment
Infrastructure is managed as a Nomad service job in `../infra/cairn/cairn.hcl`.
The Dockerfile uses a multi-stage build:
```dockerfile
# Build stage — Go 1.25 Alpine, static binary (CGO_ENABLED=0)
FROM golang:1.25-alpine AS builder
# ...
RUN CGO_ENABLED=0 go build -o /cairn-server ./cmd/cairn-server
# Runtime stage — minimal Alpine with ca-certificates and tzdata
FROM alpine:3.21
COPY --from=builder /cairn-server /usr/local/bin/cairn-server
ENTRYPOINT ["cairn-server"]
```
The server reads all configuration from environment variables at startup.
## Database
The schema auto-migrates on startup. Core tables:
| Table | Purpose |
|-------|---------|
| `repositories` | Tracked repositories (name, owner, optional Forgejo URL) |
| `commits` | Commit SHAs linked to repositories |
| `builds` | Optional build metadata (builder, flags, tags) |
| `artifacts` | Crash artifacts with blob references, metadata, and full-text search |
| `crash_signatures` | Unique fingerprints per repository with occurrence counts |
| `crash_groups` | Human-facing crash groups linked to signatures and Forgejo issues |
| `campaigns` | Testing campaigns (running/finished) grouping artifacts |
Artifacts have a GIN-indexed `tsvector` column for full-text search across crash messages, stack traces, and types.