# Deployment Guide

Guide for deploying HybridInference in production.

## Production Deployment

### Using systemd

The recommended way to deploy HybridInference is with systemd.
Install dependencies:

```bash
cd hybridInference
uv venv -p 3.10
source .venv/bin/activate
uv sync
```
Install the systemd unit file:

```bash
sudo cp infrastructure/systemd/hybrid_inference.service /etc/systemd/system/
```
Configure the environment:

Edit `/etc/systemd/system/hybrid_inference.service` and update:

- `WorkingDirectory`
- `User`
- Environment variables
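For orientation, a unit file for a service like this typically looks as follows. This is a sketch, not the file shipped in `infrastructure/systemd/`: the install path `/opt/hybridInference`, the `hybridinference` user, and the `python -m hybrid_inference` entry point are all assumptions to be replaced with your actual values.

```ini
[Unit]
Description=HybridInference API server
; Assumed dependency on a local PostgreSQL; drop if the DB is remote.
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
User=hybridinference
WorkingDirectory=/opt/hybridInference
; Load the variables listed under "Environment Variables" below.
EnvironmentFile=/opt/hybridInference/.env
; Hypothetical entry point; use your project's real start command.
ExecStart=/opt/hybridInference/.venv/bin/python -m hybrid_inference
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Using `EnvironmentFile=` keeps secrets out of the unit file itself, so the unit can be world-readable while `.env` stays restricted.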
Start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable hybrid_inference.service
sudo systemctl start hybrid_inference.service
```

Check status:

```bash
sudo systemctl status hybrid_inference.service
journalctl -u hybrid_inference.service -f
```
## Environment Variables

Required environment variables for production:

```bash
# API Keys
DEEPSEEK_API_KEY=your-key
GEMINI_API_KEY=your-key
LLAMA_API_KEY=your-key

# Database
DB_NAME=hybridinference
DB_USER=postgres
DB_PASSWORD=your-secure-password
DB_HOST=localhost
DB_PORT=5432

# Local vLLM (optional)
LOCAL_BASE_URL=http://localhost:8000/v1

# Rate limiting (optional)
RATE_LIMIT_PER_MINUTE=100
```
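A missing variable is one of the most common causes of a failed start (see Troubleshooting below), so it can be worth failing fast at startup. A minimal sketch of such a check, assuming the required set is exactly the non-optional variables listed above:

```python
import os

# Variables the service cannot run without; this exact set is an
# assumption based on the list in this guide.
REQUIRED_VARS = [
    "DEEPSEEK_API_KEY",
    "GEMINI_API_KEY",
    "LLAMA_API_KEY",
    "DB_NAME",
    "DB_USER",
    "DB_PASSWORD",
    "DB_HOST",
    "DB_PORT",
]


def missing_vars(env=os.environ):
    """Return the required variables that are absent or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Run once at process startup; the `SystemExit` message then shows up directly in `journalctl` output instead of a later, harder-to-trace connection error.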
## Health Checks

Monitor service health:

```bash
curl http://localhost:80/health
```
## Logs

View logs:

```bash
# Follow logs
journalctl -u hybrid_inference.service -f

# View recent logs
journalctl -u hybrid_inference.service -n 100
```
## Monitoring

### Prometheus Metrics

Metrics are exposed at `/metrics`:

```bash
curl http://localhost:80/metrics
```

Key metrics:

- `http_requests_total` - Total HTTP requests
- `http_request_duration_seconds` - Request latency
- `model_requests_total` - Requests per model
- `model_errors_total` - Errors per model
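The `/metrics` endpoint serves the standard Prometheus text exposition format, so the counters above can also be read in an ad-hoc script. A minimal parser sketch (it ignores `# HELP`/`# TYPE` comment lines and assumes sample lines carry no trailing timestamps):

```python
def parse_metrics(text):
    """Parse Prometheus text-format output into {series: value}.

    Keys are the full series name including any {label="..."} part.
    Minimal by design: comments are skipped, timestamps unsupported.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is everything after the last space.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples
```

This is handy for one-off checks such as computing an error ratio from `model_errors_total` / `model_requests_total`; for anything ongoing, query Prometheus itself.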
### Grafana Dashboards

Import the dashboard from `infrastructure/grafana/`.
## Database Setup

### PostgreSQL

Create the database:

```sql
CREATE DATABASE hybridinference;
CREATE USER hybridinference WITH PASSWORD 'your-password';
GRANT ALL PRIVILEGES ON DATABASE hybridinference TO hybridinference;
```
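The `DB_*` variables from the Environment Variables section map onto a standard libpq-style connection string. A sketch of that mapping, assuming the service consumes the variables in key/value DSN form (how the actual service assembles its connection is not shown in this guide):

```python
import os


def database_dsn(env=os.environ):
    """Build a libpq key/value DSN from the DB_* variables.

    DB_HOST and DB_PORT fall back to the defaults used in this guide.
    """
    return (
        f"dbname={env['DB_NAME']} "
        f"user={env['DB_USER']} "
        f"password={env['DB_PASSWORD']} "
        f"host={env.get('DB_HOST', 'localhost')} "
        f"port={env.get('DB_PORT', '5432')}"
    )
```

A string in this form can be passed directly to drivers such as psycopg, which is useful for verifying credentials from a shell before starting the service.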
Configure the connection:

Update `.env` with the database credentials.
See the Database guide in this section: Database.
## Troubleshooting

### Service won't start

Check the logs:

```bash
journalctl -u hybrid_inference.service -n 50
```

Common issues:

- Missing API keys
- Database connection failure
- Port already in use
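For the port-conflict case, a quick standard-library check of whether anything is already listening on the service port (a diagnostic sketch, not part of the service):

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 on a successful TCP connect.
        return s.connect_ex((host, port)) == 0
```

If the port is taken, `ss -ltnp` (or `lsof -i :<port>`) will show which process holds it.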
### High latency

Check:

- Database performance
- Provider API latency
- Resource usage (CPU/memory)
### Rate limiting

Adjust rate limits in the configuration:

```yaml
rate_limits:
  requests_per_minute: 100
  tokens_per_minute: 100000
```
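To make the `requests_per_minute` semantics concrete, here is a sliding-window limiter sketch. It is illustrative only; the guide does not show how the service actually enforces its limits, and an injectable clock is used here so the behavior is easy to test:

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter mirroring a requests_per_minute setting."""

    def __init__(self, requests_per_minute, now=time.monotonic):
        self.limit = requests_per_minute
        self.window = 60.0  # seconds
        self.now = now
        self.events = deque()  # timestamps of accepted requests

    def allow(self):
        """Accept the request if fewer than `limit` were accepted
        in the last 60 seconds."""
        t = self.now()
        # Evict timestamps that have aged out of the window.
        while self.events and t - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(t)
            return True
        return False
```

A sliding window avoids the burst-at-the-boundary artifact of fixed one-minute buckets, at the cost of keeping one timestamp per accepted request in memory.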