Skip to content

Deployment Guide

Production deployment guide for HandoffRail — Docker, PostgreSQL, Redis, monitoring, and tier configuration.

Quick Deploy (Docker Compose)

Development

git clone https://github.com/MelaBuilt-AI/handoffrail.git
cd handoffrail
docker compose up --build

This starts the API on http://localhost:8080 with SQLite. Good for local development and testing.

Production

# Copy environment template
cp .env.example .env

# Edit with your PostgreSQL credentials
# Required: HR_DATABASE_URL, HR_REDIS_URL

# Start production stack
docker compose -f docker-compose.prod.yml up -d

Production compose includes: - API server (HandoffRail FastAPI) - PostgreSQL 16 (persistent storage with volume) - Redis 7 (caching, future Celery task queue)


Environment Variables

Variable Default Description
HR_ENVIRONMENT dev dev, staging, or prod
HR_DATABASE_URL sqlite+aiosqlite:///./handoffrail.db Database connection URL
HR_REDIS_URL redis://localhost:6379/0 Redis connection URL
HR_TIER_DEFAULT free Default tier for new API keys
HR_LOG_LEVEL info Log level: debug, info, warning, error
HR_PORT 8080 Server port
HR_CORS_ORIGINS ["*"] CORS allowed origins (JSON list)

Database URL Formats

Database URL Format
SQLite (dev) sqlite+aiosqlite:///./handoffrail.db
PostgreSQL postgresql://user:pass@host:5432/handoffrail
PostgreSQL (alt) postgres://user:pass@host:5432/handoffrail

The server auto-detects PostgreSQL and uses the asyncpg driver. If DATABASE_URL is omitted, it falls back to SQLite for development.


PostgreSQL Setup

1. Create Database

CREATE DATABASE handoffrail;
CREATE USER handoffrail WITH PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE handoffrail TO handoffrail;

2. Set Connection String

export HR_DATABASE_URL="postgresql://handoffrail:your_secure_password@localhost:5432/handoffrail"

3. Run Migrations

cd server
alembic upgrade head

4. Start the Server

uvicorn app.main:app --host 0.0.0.0 --port 8080

Production Docker Compose

The docker-compose.prod.yml includes:

services:
  api:
    build: .
    ports:
      - "8080:8080"
    environment:
      HR_ENVIRONMENT: prod
      HR_DATABASE_URL: postgresql://handoffrail:${DB_PASSWORD}@db:5432/handoffrail
      HR_REDIS_URL: redis://redis:6379/0
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: handoffrail
      POSTGRES_USER: handoffrail
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U handoffrail"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  pgdata:
  redisdata:

Tier Configuration

Tier quotas are configured via environment or config.py:

Feature Free Pro Business
Handoffs/day 5 Unlimited Unlimited
Max agents 2 10 50
Max API keys 1 5 25
Max packet size 64 KB 256 KB 1 MB
Rate limit (req/hr) 100 1,000 10,000
Webhooks ✅ (5) ✅ (Unlimited)
Audit trail ✅ (30 days) ✅ (Full + export)

Custom Tier Configuration

Override in environment (JSON):

export HR_TIER_QUOTAS='{
  "free": {"handoffs_per_day": 10, "max_agents": 3, "max_api_keys": 2, "max_packet_size": 65536, "unlimited_handoffs": false},
  "pro": {"handoffs_per_day": 0, "max_agents": 20, "max_api_keys": 10, "max_packet_size": 524288, "unlimited_handoffs": true}
}'

Health & Monitoring

Endpoints

Endpoint Purpose Auth Required
GET /health Liveness probe — returns 200 if process is running No
GET /ready Readiness probe — returns 200 if DB connected, 503 if not No
GET /metrics Prometheus metrics No

Prometheus Metrics

Standard Prometheus format at /metrics:

# HELP handoffrail_requests_total Total HTTP requests
# TYPE handoffrail_requests_total counter
handoffrail_requests_total{method="POST",endpoint="/api/v1/packets",status_code="201"} 142

# HELP handoffrail_request_latency_seconds Request latency
# TYPE handoffrail_request_latency_seconds histogram
handoffrail_request_latency_seconds_bucket{method="POST",endpoint="/api/v1/packets",le="0.1"} 128

# HELP handoffrail_active_packets Currently active (non-terminal) packets
# TYPE handoffrail_active_packets gauge
handoffrail_active_packets 23

# HELP handoffrail_handoffs_total Completed handoffs per tenant
# TYPE handoffrail_handoffs_total counter
handoffrail_handoffs_total{tenant_id="abc123"} 89

Prometheus Scrape Config

scrape_configs:
  - job_name: 'handoffrail'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics

Kubernetes Probes

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Structured Logging

HandoffRail uses structlog for structured JSON logging:

{
  "event": "packet_created",
  "packet_id": "a1b2c3d4-...",
  "source_agent": "sales-01",
  "target_agent": "billing-01",
  "priority": "high",
  "tenant_id": "abc123",
  "timestamp": "2026-05-30T19:35:00Z",
  "level": "info"
}

Configure log level:

export HR_LOG_LEVEL=debug  # dev
export HR_LOG_LEVEL=warning # prod

CORS

Default: allows all origins (["*"]). For production, restrict:

export HR_CORS_ORIGINS='["https://your-app.com", "https://admin.your-app.com"]'

A warning is logged if CORS_ORIGINS=["*"] is set in production mode.


Reverse Proxy (nginx)

server {
    listen 443 ssl;
    server_name api.handoffrail.dev;

    ssl_certificate /etc/letsencrypt/live/api.handoffrail.dev/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.handoffrail.dev/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Backups

SQLite (Dev)

# Simple file copy (stop the server first)
cp handoffrail.db handoffrail.db.backup

PostgreSQL (Prod)

# pg_dump
pg_dump -U handoffrail -d handoffrail -F c -f backup_$(date +%Y%m%d).dump

# Restore
pg_restore -U handoffrail -d handoffrail backup_20260530.dump

Automated Backups (Cron)

# Add to crontab (daily at 2am)
0 2 * * * pg_dump -U handoffrail -d handoffrail -F c -f /backups/handoffrail_$(date +\%Y\%m\%d).dump

Scaling Considerations

Concern Recommendation
Single-server bottlenecks Run multiple API instances behind a load balancer. PostgreSQL handles concurrent connections.
Connection pooling SQLAlchemy async pool is built-in. Tune pool_size and max_overflow for your workload.
Redis caching Enable for session caching and rate limit counters. Required for multi-instance deployments.
WebSocket scaling (v0.2) Redis Pub/Sub for event broadcasting across instances.
Large packet storage Keep artifacts small (<1MB). Reference external storage (S3, GCS) for large files.

Security Checklist

  • [ ] HTTPS enabled (TLS termination at reverse proxy or load balancer)
  • [ ] API keys are hashed at rest
  • [ ] CORS origins restricted in production
  • [ ] Rate limiting enabled per tier
  • [ ] Packet size limits enforced
  • [ ] PostgreSQL credentials stored securely (env vars, not code)
  • [ ] Redis not exposed to public internet
  • [ ] Webhook secrets minimum 16 characters
  • [ ] HMAC-SHA256 webhook signature verification implemented on receiver
  • [ ] Health endpoints don't expose sensitive data
  • [ ] Structured logging doesn't leak API keys or packet contents