# Deployment Guide

Guide for deploying HybridInference in production.

## Production Deployment

### Using systemd

The recommended way to deploy HybridInference is with systemd.
Install dependencies:

```bash
cd hybridInference
uv venv -p 3.10
source .venv/bin/activate
uv sync
```
Install the systemd unit file:

```bash
sudo cp infrastructure/systemd/hybrid_inference.service /etc/systemd/system/
```
Configure the environment:

Edit `/etc/systemd/system/hybrid_inference.service` and update:

- `WorkingDirectory`
- `User`
- Environment variables
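For orientation, a unit file for a service like this typically looks as follows. This is a sketch, not the file shipped in `infrastructure/systemd/`: the install path `/opt/hybridInference`, the `hybridinference` user, and the `python -m hybrid_inference` entry point are all assumptions to be replaced with your actual values.

```ini
[Unit]
Description=HybridInference API server
; Assumed dependency on a local PostgreSQL; drop if the DB is remote.
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
User=hybridinference
WorkingDirectory=/opt/hybridInference
; Load the variables listed under "Environment Variables" below.
EnvironmentFile=/opt/hybridInference/.env
; Hypothetical entry point; use your project's real start command.
ExecStart=/opt/hybridInference/.venv/bin/python -m hybrid_inference
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Using `EnvironmentFile=` keeps secrets out of the unit file itself, so the unit can be world-readable while `.env` stays restricted.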
Start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable hybrid_inference.service
sudo systemctl start hybrid_inference.service
```

Check status:

```bash
sudo systemctl status hybrid_inference.service
journalctl -u hybrid_inference.service -f
```
## Environment Variables

Required environment variables for production:

```bash
# API Keys
DEEPSEEK_API_KEY=your-key
GEMINI_API_KEY=your-key
LLAMA_API_KEY=your-key

# Database
DB_NAME=hybridinference
DB_USER=postgres
DB_PASSWORD=your-secure-password
DB_HOST=localhost
DB_PORT=5432

# Local vLLM (optional)
LOCAL_BASE_URL=http://localhost:8000/v1

# Rate limiting (optional)
RATE_LIMIT_PER_MINUTE=100
```
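A missing variable is one of the most common causes of a failed start (see Troubleshooting below), so it can be worth failing fast at startup. A minimal sketch of such a check, assuming the required set is exactly the non-optional variables listed above:

```python
import os

# Variables the service cannot run without; this exact set is an
# assumption based on the list in this guide.
REQUIRED_VARS = [
    "DEEPSEEK_API_KEY",
    "GEMINI_API_KEY",
    "LLAMA_API_KEY",
    "DB_NAME",
    "DB_USER",
    "DB_PASSWORD",
    "DB_HOST",
    "DB_PORT",
]


def missing_vars(env=os.environ):
    """Return the required variables that are absent or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Run once at process startup; the `SystemExit` message then shows up directly in `journalctl` output instead of a later, harder-to-trace connection error.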
## Health Checks

Monitor service health:

```bash
curl http://localhost:80/health
```
## Logs

View logs:

```bash
# Follow logs
journalctl -u hybrid_inference.service -f

# View recent logs
journalctl -u hybrid_inference.service -n 100
```
## Monitoring

### Prometheus Metrics

Metrics are exposed at `/metrics`:

```bash
curl http://localhost:80/metrics
```

Key metrics:

- `http_requests_total` - Total HTTP requests
- `http_request_duration_seconds` - Request latency
- `model_requests_total` - Requests per model
- `model_errors_total` - Errors per model
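The `/metrics` endpoint serves the standard Prometheus text exposition format, so the counters above can also be read in an ad-hoc script. A minimal parser sketch (it ignores `# HELP`/`# TYPE` comment lines and assumes sample lines carry no trailing timestamps):

```python
def parse_metrics(text):
    """Parse Prometheus text-format output into {series: value}.

    Keys are the full series name including any {label="..."} part.
    Minimal by design: comments are skipped, timestamps unsupported.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is everything after the last space.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples
```

This is handy for one-off checks such as computing an error ratio from `model_errors_total` / `model_requests_total`; for anything ongoing, query Prometheus itself.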
### Grafana Dashboards

Import the dashboard from `infrastructure/grafana/`.
## Database Setup

### PostgreSQL

Create the database:

```sql
CREATE DATABASE hybridinference;
CREATE USER hybridinference WITH PASSWORD 'your-password';
GRANT ALL PRIVILEGES ON DATABASE hybridinference TO hybridinference;
```
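The `DB_*` variables from the Environment Variables section map onto a standard libpq-style connection string. A sketch of that mapping, assuming the service consumes the variables in key/value DSN form (how the actual service assembles its connection is not shown in this guide):

```python
import os


def database_dsn(env=os.environ):
    """Build a libpq key/value DSN from the DB_* variables.

    DB_HOST and DB_PORT fall back to the defaults used in this guide.
    """
    return (
        f"dbname={env['DB_NAME']} "
        f"user={env['DB_USER']} "
        f"password={env['DB_PASSWORD']} "
        f"host={env.get('DB_HOST', 'localhost')} "
        f"port={env.get('DB_PORT', '5432')}"
    )
```

A string in this form can be passed directly to drivers such as psycopg, which is useful for verifying credentials from a shell before starting the service.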
Configure the connection:

Update `.env` with the database credentials.
See the Database guide in this section: Database.
## Troubleshooting

### Service won't start

Check the logs:

```bash
journalctl -u hybrid_inference.service -n 50
```

Common issues:

- Missing API keys
- Database connection failure
- Port already in use
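For the port-conflict case, a quick standard-library check of whether anything is already listening on the service port (a diagnostic sketch, not part of the service):

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 on a successful TCP connect.
        return s.connect_ex((host, port)) == 0
```

If the port is taken, `ss -ltnp` (or `lsof -i :<port>`) will show which process holds it.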
### High latency

Check:

- Database performance
- Provider API latency
- Resource usage (CPU/memory)
### Rate limiting

Adjust rate limits in the configuration:

```yaml
rate_limits:
  requests_per_minute: 100
  tokens_per_minute: 100000
```
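To make the `requests_per_minute` semantics concrete, here is a sliding-window limiter sketch. It is illustrative only; the guide does not show how the service actually enforces its limits, and an injectable clock is used here so the behavior is easy to test:

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter mirroring a requests_per_minute setting."""

    def __init__(self, requests_per_minute, now=time.monotonic):
        self.limit = requests_per_minute
        self.window = 60.0  # seconds
        self.now = now
        self.events = deque()  # timestamps of accepted requests

    def allow(self):
        """Accept the request if fewer than `limit` were accepted
        in the last 60 seconds."""
        t = self.now()
        # Evict timestamps that have aged out of the window.
        while self.events and t - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(t)
            return True
        return False
```

A sliding window avoids the burst-at-the-boundary artifact of fixed one-minute buckets, at the cost of keeping one timestamp per accepted request in memory.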