Everruns supports running multiple server instances behind a load balancer for high availability and horizontal scaling.

Overview

Multiple control-plane instances can run concurrently, all connected to the same PostgreSQL database. Workers can connect to any instance and will claim tasks from a shared queue.

Architecture

┌─────────────────────────────────────────────────┐
│            Load Balancer (HTTP/2)               │
│          Health check: GET /health              │
└─────────────────────────────────────────────────┘
          │                │                │
          ▼                ▼                ▼
    ┌──────────┐     ┌──────────┐     ┌──────────┐
    │ Server 1 │     │ Server 2 │     │ Server 3 │
    │ :9000    │     │ :9000    │     │ :9000    │
    │ :9001    │     │ :9001    │     │ :9001    │
    └──────────┘     └──────────┘     └──────────┘
          │                │                │
          └────────────────┼────────────────┘
                           │
                           ▼
                  ┌───────────────┐
                  │  PostgreSQL   │
                  │  (Shared DB)  │
                  └───────────────┘
                           ▲
                           │
          ┌────────────────┴────────────────┐
          │                                 │
    ┌──────────┐                      ┌──────────┐
    │ Worker 1 │  ... (N workers)     │ Worker N │
    └──────────┘                      └──────────┘

What’s Multi-Instance Safe

| Component | Reason |
| --- | --- |
| PostgreSQL database | Shared; connection pool per instance |
| Database migrations | Advisory-lock protected |
| Task claiming | SELECT ... FOR UPDATE SKIP LOCKED partitions the work |
| Worker registration | Database-backed; any server can serve any worker |
| PgListener (task_available) | Each instance runs its own listener; all receive NOTIFY |
| PgListener (event_available) | Same; SSE clients on any instance see all events |

Configuration

EXPECTED_INSTANCES

Set EXPECTED_INSTANCES to inform each instance about the total count:
EXPECTED_INSTANCES=3
This setting affects:
  1. SSE Connection Limits
    • Global and per-org limits divided by N
    • Per-session limits unchanged (full limit on each instance)
    • Prevents total connection count from exceeding desired limits
  2. Database Pool Sizing
    • Set DATABASE_POOL_MAX = (PG_MAX_CONNECTIONS - MARGIN) / N
    • A startup warning fires if pool × instances exceeds 80% of PG_MAX_CONNECTIONS (default 100)
    • Prevents connection exhaustion
  3. Metrics Aggregation
    • Each instance maintains its own ring buffer
    • /v1/durable/metrics/timeseries response includes instance_count field when > 1
    • Helps interpret per-instance metrics
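The limit division described in point 1 can be sketched as follows. This is an illustrative model, not Everruns source code; the names `global_sse_limit` and `per_org_limit` are assumptions standing in for whatever configuration keys the server actually uses.

```python
# Sketch: derive per-instance SSE limits from EXPECTED_INSTANCES.
# Illustrative only; parameter names are assumed, not Everruns' config keys.

def per_instance_limits(expected_instances: int,
                        global_sse_limit: int,
                        per_org_limit: int) -> dict:
    """Divide cluster-wide limits by the instance count.

    Per-session limits are intentionally NOT divided: a single session's
    connections may all land on one instance.
    """
    n = max(1, expected_instances)
    return {
        "global": global_sse_limit // n,
        "per_org": per_org_limit // n,
    }

# With 3 instances, a cluster-wide limit of 3000 becomes 1000 per instance.
print(per_instance_limits(3, 3000, 300))  # {'global': 1000, 'per_org': 100}
```

Because the load balancer spreads connections roughly evenly, dividing the global cap by N keeps the cluster-wide total at or below the intended limit.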

Example: 3-Instance Deployment

# Each server instance
EXPECTED_INSTANCES=3
DATABASE_POOL_MAX=25

# PostgreSQL configuration
# max_connections = 100 (default)
# Used: 3 instances × 25 connections = 75
# Margin: 25 connections (25%)

Database Pool Calculation

Formula:
DATABASE_POOL_MAX = (PG_MAX_CONNECTIONS - MARGIN) / EXPECTED_INSTANCES
Example with PostgreSQL max_connections=100:
| Instances | Pool per Instance | Total Used | Margin |
| --- | --- | --- | --- |
| 1 | 80 | 80 | 20 |
| 2 | 40 | 80 | 20 |
| 3 | 25 | 75 | 25 |
| 4 | 20 | 80 | 20 |
Startup validation: Server warns if DATABASE_POOL_MAX × EXPECTED_INSTANCES exceeds 80% of PG_MAX_CONNECTIONS.
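The formula and the startup check can be expressed as a short sketch. The function names are illustrative; only the arithmetic mirrors what the document describes.

```python
# Sketch of the pool-sizing formula and the 80% startup warning above.
# Function names are illustrative, not Everruns internals.

def pool_max(pg_max_connections: int, margin: int,
             expected_instances: int) -> int:
    """DATABASE_POOL_MAX = (PG_MAX_CONNECTIONS - MARGIN) / EXPECTED_INSTANCES"""
    return (pg_max_connections - margin) // expected_instances

def should_warn(pool_max_per_instance: int, expected_instances: int,
                pg_max_connections: int) -> bool:
    """Warn when pool x instances exceeds 80% of max_connections."""
    return pool_max_per_instance * expected_instances > 0.8 * pg_max_connections

print(pool_max(100, 25, 3))     # 25, matching the 3-instance row above
print(should_warn(25, 3, 100))  # False: 75 <= 80
print(should_warn(30, 3, 100))  # True: 90 > 80
```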

Load Balancer Requirements

Protocol Support

  • HTTP/1.1 or HTTP/2 required for SSE (long-lived connections)
  • No HTTP/1.0 (doesn’t support chunked transfer encoding for SSE)

Health Check

GET /health
Response:
{
  "status": "ok"
}
  • Returns 200 OK if server is healthy
  • Returns 503 Service Unavailable if unhealthy
  • Check interval: 10-30 seconds recommended

Session Affinity

Not required. Everruns is designed to be stateless:
  • API requests are stateless
  • SSE connections reconnect automatically on disconnection
  • Database provides shared state across instances

Timeouts

Important for SSE: Configure appropriate timeouts for long-lived connections:
# Nginx example
proxy_read_timeout 300s;  # 5 minutes
proxy_connect_timeout 10s;
proxy_send_timeout 10s;
Everruns server cycles SSE connections every 5 minutes by sending a disconnecting event. Clients automatically reconnect.
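A client that tolerates the 5-minute cycling needs a reconnect loop with backoff. The sketch below is a generic pattern under stated assumptions, not the Everruns client: `connect` is any callable that opens the stream, blocks until it ends, and returns `False` when the client should stop for good.

```python
# Minimal SSE reconnect loop: retry transient failures with exponential
# backoff, reconnect immediately after a server-initiated cycle.
import time

def run_with_reconnect(connect, max_attempts=5, base_delay=0.01):
    """Retry `connect` with exponential backoff; reset backoff on success."""
    attempt = 0
    while attempt < max_attempts:
        try:
            if not connect():    # blocks for the life of one SSE connection
                return "closed"  # deliberate shutdown, stop reconnecting
            attempt = 0          # server-initiated cycle: reconnect fresh
        except ConnectionError:
            attempt += 1
            time.sleep(base_delay * (2 ** (attempt - 1)))
    raise RuntimeError("gave up after repeated connection failures")

# Demo: two transient failures, one cycled connection, then shutdown.
script = [ConnectionError(), ConnectionError(), True, False]
def fake_connect():
    step = script.pop(0)
    if isinstance(step, Exception):
        raise step
    return step

result = run_with_reconnect(fake_connect)
print(result)  # closed
```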

HTTP/2 Flow Control

Critical for high-concurrency SSE deployments.

The Problem

HTTP/2 uses flow control to prevent fast senders from overwhelming slow receivers. The default per-stream window (65 KB) is too small for many concurrent SSE streams, leading to:
  • Stream blocking when window exhausted
  • Cascade timeouts
  • Connection stalls

The Solution

Everruns exposes HTTP/2 configuration knobs:
# Per-stream flow control window (default: 2 MB)
HTTP2_STREAM_WINDOW_SIZE=2097152

# Per-connection flow control window (default: 16 MB)
HTTP2_CONNECTION_WINDOW_SIZE=16777216

# Max concurrent streams per connection (default: 256)
HTTP2_MAX_CONCURRENT_STREAMS=256

Tuning Guidelines

High event throughput:
HTTP2_STREAM_WINDOW_SIZE=4194304      # 4 MB
HTTP2_CONNECTION_WINDOW_SIZE=33554432  # 32 MB
Many slow clients:
HTTP2_STREAM_WINDOW_SIZE=8388608       # 8 MB
HTTP2_CONNECTION_WINDOW_SIZE=67108864  # 64 MB
Memory-constrained:
HTTP2_STREAM_WINDOW_SIZE=1048576       # 1 MB
HTTP2_CONNECTION_WINDOW_SIZE=8388608   # 8 MB

Adaptive Flow Control

Everruns enables HTTP/2 adaptive flow control (hyper auto-adjusts windows based on throughput). HTTP/2 PING keepalive runs every 20s to detect dead connections.
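When tuning the window sizes above, two constraints are worth checking: HTTP/2 caps any flow-control window at 2^31 - 1 bytes (RFC 9113), and a connection window smaller than the stream window lets a single stream starve the whole connection. The validator below is an illustrative helper, not part of Everruns.

```python
# Sanity-check HTTP/2 window-size settings before deploying them.
HTTP2_MAX_WINDOW = 2**31 - 1  # RFC 9113 flow-control window ceiling

def validate_windows(stream_window: int, connection_window: int) -> bool:
    """True if both windows are in range and the connection window can
    hold at least one full stream window."""
    return (0 < stream_window <= HTTP2_MAX_WINDOW
            and stream_window <= connection_window <= HTTP2_MAX_WINDOW)

print(validate_windows(2 * 1024 * 1024, 16 * 1024 * 1024))  # True (defaults)
print(validate_windows(8 * 1024 * 1024, 4 * 1024 * 1024))   # False
```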

Docker Compose Example

3-Server + Load Balancer

services:
  # PostgreSQL (shared)
  postgres:
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: everruns
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: everruns
      # Increase max_connections for multi-instance
      POSTGRES_INITDB_ARGS: "-c max_connections=200"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    command:
      - "postgres"
      - "-c"
      - "max_connections=200"

  # Server instances
  server-1:
    image: ghcr.io/everruns/everruns-server:latest
    environment:
      DATABASE_URL: postgres://everruns:${POSTGRES_PASSWORD}@postgres:5432/everruns
      SECRETS_ENCRYPTION_KEY: ${SECRETS_ENCRYPTION_KEY}
      WORKER_GRPC_AUTH_TOKEN: ${WORKER_GRPC_AUTH_TOKEN}
      EXPECTED_INSTANCES: 3
      DATABASE_POOL_MAX: 50
      PG_MAX_CONNECTIONS: 200
      HOST: 0.0.0.0
      PORT: "9000"
    depends_on:
      - postgres

  server-2:
    image: ghcr.io/everruns/everruns-server:latest
    environment:
      DATABASE_URL: postgres://everruns:${POSTGRES_PASSWORD}@postgres:5432/everruns
      SECRETS_ENCRYPTION_KEY: ${SECRETS_ENCRYPTION_KEY}
      WORKER_GRPC_AUTH_TOKEN: ${WORKER_GRPC_AUTH_TOKEN}
      EXPECTED_INSTANCES: 3
      DATABASE_POOL_MAX: 50
      PG_MAX_CONNECTIONS: 200
      HOST: 0.0.0.0
      PORT: "9000"
    depends_on:
      - postgres

  server-3:
    image: ghcr.io/everruns/everruns-server:latest
    environment:
      DATABASE_URL: postgres://everruns:${POSTGRES_PASSWORD}@postgres:5432/everruns
      SECRETS_ENCRYPTION_KEY: ${SECRETS_ENCRYPTION_KEY}
      WORKER_GRPC_AUTH_TOKEN: ${WORKER_GRPC_AUTH_TOKEN}
      EXPECTED_INSTANCES: 3
      DATABASE_POOL_MAX: 50
      PG_MAX_CONNECTIONS: 200
      HOST: 0.0.0.0
      PORT: "9000"
    depends_on:
      - postgres

  # Caddy load balancer
  caddy:
    image: caddy:2-alpine
    ports:
      - "9300:9300"
    configs:
      - source: caddyfile
        target: /etc/caddy/Caddyfile
    depends_on:
      - server-1
      - server-2
      - server-3

  # Workers (can connect to any server)
  worker-1:
    image: ghcr.io/everruns/everruns-worker:latest
    environment:
      WORKER_GRPC_ADDRESS: server-1:9001  # Or load balance gRPC too
      WORKER_GRPC_AUTH_TOKEN: ${WORKER_GRPC_AUTH_TOKEN}
    depends_on:
      - server-1

configs:
  caddyfile:
    content: |
      :9300 {
        # Load balance across servers
        reverse_proxy server-1:9000 server-2:9000 server-3:9000 {
          # Health check
          health_uri /health
          health_interval 10s
          health_timeout 5s
          
          # SSE requires no buffering
          flush_interval -1
        }
      }

volumes:
  postgres_data:

Migration Safety

Database migrations are safe in multi-instance deployments:
  1. Advisory Lock: First instance to start acquires PostgreSQL advisory lock
  2. Serial Execution: Other instances wait for migrations to complete
  3. Lock Release: Lock released after migrations finish
  4. No Race Conditions: Only one instance runs migrations
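The four steps above can be illustrated with threads standing in for server instances: all race to migrate, a lock serializes them, and a flag ensures the work happens once. This is a simulation of the guarantee only; the real implementation uses a PostgreSQL advisory lock, not an in-process lock.

```python
# Simulate N instances starting concurrently and racing to run migrations.
import threading

migration_lock = threading.Lock()          # stands in for the advisory lock
state = {"migrated": False, "runs": 0}

def start_instance():
    with migration_lock:                   # others block here (step 2)
        if not state["migrated"]:          # later instances see the flag...
            state["runs"] += 1             # ...and skip the work (step 4)
            state["migrated"] = True

threads = [threading.Thread(target=start_instance) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state["runs"])  # 1: migrations ran exactly once
```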

Disable Auto-Migrations

For controlled migration execution:
server-1:
  # No flag - runs migrations
  image: ghcr.io/everruns/everruns-server:latest

server-2:
  # Skip migrations
  image: ghcr.io/everruns/everruns-server:latest
  command: ["--no-migrations"]

server-3:
  # Skip migrations
  image: ghcr.io/everruns/everruns-server:latest
  command: ["--no-migrations"]
Or run migrations manually before starting instances:
# Run migrations once
docker run --rm \
  -e DATABASE_URL=postgres://... \
  ghcr.io/everruns/everruns-server:latest \
  migrate

# Start all instances with --no-migrations
docker compose up -d

Worker Distribution

Workers can connect to any server instance. Task claiming is handled by the database:

Task Claiming Flow

  1. Worker polls server via gRPC: ClaimDurableTasks
  2. Server queries database: SELECT ... FOR UPDATE SKIP LOCKED
  3. Database atomically assigns task to one worker
  4. Worker executes task
  5. Worker reports completion to any server instance
Key insight: SKIP LOCKED ensures no two workers claim the same task, even if connected to different server instances.
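The SKIP LOCKED guarantee can be modeled in memory: each claim atomically takes only unclaimed tasks, so two workers never receive the same task regardless of which server they contacted. The real mechanism is the SQL query above; this sketch just demonstrates the no-double-claim property.

```python
# In-memory analogue of SELECT ... FOR UPDATE SKIP LOCKED task claiming.
import threading

tasks = {f"task-{i}": None for i in range(10)}  # task id -> claimed_by
claim_guard = threading.Lock()

def claim_tasks(worker: str, limit: int) -> list:
    """Atomically claim up to `limit` unclaimed tasks for `worker`."""
    with claim_guard:                      # the database's atomicity
        claimed = [t for t, owner in tasks.items() if owner is None][:limit]
        for t in claimed:
            tasks[t] = worker              # locked rows are skipped by others
        return claimed

a = claim_tasks("worker-a", 6)             # via server 1
b = claim_tasks("worker-b", 6)             # via server 2
print(len(a), len(b), set(a) & set(b))     # 6 4 set(): no overlap
```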

Worker Load Balancing

You have several options.
Option 1: Point all workers to one server
WORKER_GRPC_ADDRESS=server-1:9001
Option 2: Distribute workers across servers
worker-1:
  environment:
    WORKER_GRPC_ADDRESS: server-1:9001

worker-2:
  environment:
    WORKER_GRPC_ADDRESS: server-2:9001

worker-3:
  environment:
    WORKER_GRPC_ADDRESS: server-3:9001
Option 3: Use DNS round-robin or a gRPC load balancer
All options are safe: task claiming is always database-coordinated.
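Option 2's static assignment can also be generated rather than hand-written, by mapping worker i to server i mod N. A purely illustrative sketch; the addresses mirror the compose file above.

```python
# Round-robin workers across server gRPC addresses.
def assign_workers(servers: list, worker_count: int) -> dict:
    """Assign worker i (1-based) to server (i - 1) mod N."""
    return {f"worker-{i + 1}": servers[i % len(servers)]
            for i in range(worker_count)}

servers = ["server-1:9001", "server-2:9001", "server-3:9001"]
assignment = assign_workers(servers, 4)
print(assignment)  # worker-4 wraps back around to server-1:9001
```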

SSE Event Distribution

Server-Sent Events (SSE) work correctly in multi-instance deployments:

Event Flow

  1. Event written to database by any instance
  2. PostgreSQL NOTIFY sent to event_available channel
  3. All instances receive notification (each has PgListener)
  4. Each instance checks for SSE clients subscribed to that session
  5. Matching clients receive event
Result: Clients connected to any instance see all events, regardless of which instance wrote the event.
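The fan-out above can be simulated: every instance's listener receives the NOTIFY, but each instance forwards the event only to the SSE clients it holds for that session. This models the routing only; the real mechanism is PostgreSQL NOTIFY plus each instance's PgListener.

```python
# Simulate NOTIFY fan-out across three instances with local SSE clients.
# instance -> {session_id: [client names]}
instances = {
    "server-1": {"sess-A": ["client-1"]},
    "server-2": {"sess-A": ["client-2"], "sess-B": ["client-3"]},
    "server-3": {},
}

def notify(session_id: str, event: str) -> list:
    """Deliver an event to every subscribed client, on any instance."""
    delivered = []
    for name, sessions in instances.items():         # all listeners fire
        for client in sessions.get(session_id, []):  # local subscribers only
            delivered.append((client, event))
    return delivered

result = notify("sess-A", "task_completed")
print(result)  # [('client-1', 'task_completed'), ('client-2', 'task_completed')]
```

Clients on server-1 and server-2 both receive the event; server-3 receives the notification but has no subscribers for that session, so it forwards nothing.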

Connection Distribution

SSE clients may connect to different instances:
  • Load balancer distributes connections
  • No session affinity required
  • Reconnections may land on different instance
  • Event stream remains consistent

Monitoring

Per-Instance Metrics

Each instance exposes metrics at /v1/durable/metrics/timeseries:
{
  "instance_count": 3,
  "metrics": {
    "task_claimed": [...],
    "task_completed": [...]
  }
}
Note: Metrics are per-instance. Aggregate across instances for cluster-wide view.

Database Metrics

Monitor PostgreSQL:
-- Active connections per instance (requires tracking)
SELECT application_name, count(*) 
FROM pg_stat_activity 
WHERE datname = 'everruns' 
GROUP BY application_name;

-- Total connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'everruns';

-- Max connections setting
SHOW max_connections;

Health Monitoring

Monitor each instance:
# Health check all instances
curl http://server-1:9000/health
curl http://server-2:9000/health
curl http://server-3:9000/health
Healthy response:
{"status": "ok"}
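The per-instance checks above can feed a small cluster summary: all healthy, partially healthy, or fully down. A hypothetical helper for monitoring scripts, not an Everruns API.

```python
# Aggregate /health status codes from each instance into one verdict.
def cluster_health(status_codes: dict) -> str:
    """'ok' if all healthy, 'degraded' if some, 'down' if none."""
    healthy = sum(1 for code in status_codes.values() if code == 200)
    if healthy == len(status_codes):
        return "ok"
    return "degraded" if healthy else "down"

print(cluster_health({"server-1": 200, "server-2": 200, "server-3": 503}))
# degraded: route traffic away from server-3 and alert
```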

Scaling Guidelines

When to Add Instances

Add server instances when:
  • CPU usage consistently > 70%
  • API response times increasing
  • SSE connection limits reached
  • High request rate during peak traffic

Horizontal Scaling

Servers: Scale horizontally
  • Add more instances behind load balancer
  • Update EXPECTED_INSTANCES
  • Adjust DATABASE_POOL_MAX
Workers: Scale horizontally
  • Add more worker containers/processes
  • Workers are stateless and scale linearly
  • Monitor task queue depth
Database: Scale vertically (managed PostgreSQL)
  • Increase CPU/memory for higher throughput
  • Increase max_connections for more instances
  • Consider read replicas for read-heavy workloads (future)

Resource Planning

| Component | Scaling Strategy | Bottleneck |
| --- | --- | --- |
| Server | Horizontal (instances) | CPU, SSE connections |
| Worker | Horizontal (workers) | CPU (LLM calls are I/O-bound) |
| Database | Vertical (bigger instance) | CPU, connections, IOPS |

Troubleshooting

Connection Pool Exhaustion

Symptom: Errors like connection pool timeout
Diagnosis:
SELECT count(*) FROM pg_stat_activity WHERE datname = 'everruns';
SHOW max_connections;
Solutions:
  1. Reduce DATABASE_POOL_MAX per instance
  2. Increase PostgreSQL max_connections
  3. Reduce EXPECTED_INSTANCES (if overestimated)

Split-Brain (Not Possible)

Everruns cannot experience split-brain because:
  • All state stored in PostgreSQL
  • No in-memory state shared across instances
  • No consensus protocol needed
  • Database provides serialization

Uneven Load Distribution

Symptom: One instance handling most traffic
Solutions:
  1. Check load balancer algorithm (use round-robin or least-connections)
  2. Verify all instances are healthy
  3. Check for long-lived connections (SSE) pinning clients to one instance

Production Checklist

  • Set EXPECTED_INSTANCES to actual instance count
  • Calculate and set appropriate DATABASE_POOL_MAX
  • Verify PostgreSQL max_connections is sufficient
  • Configure load balancer health checks
  • Set appropriate timeouts for SSE (5+ minutes)
  • Tune HTTP/2 flow control for workload
  • Monitor connection pool usage
  • Monitor per-instance health
  • Set up alerting for unhealthy instances
  • Test failover (kill one instance, verify others handle traffic)
  • Test migration safety (restart all instances, verify migrations run once)
  • Document instance count for operations team

Next Steps