# Deployment Guide Guide for deploying HybridInference in production. ## Production Deployment ### Using systemd The recommended way to deploy HybridInference is using systemd. 1. **Install dependencies:** ```bash cd hybridInference uv venv -p 3.10 source .venv/bin/activate uv sync ``` 2. **Create systemd unit file:** ```bash sudo cp infrastructure/systemd/hybrid_inference.service /etc/systemd/system/ ``` 3. **Configure environment:** Edit `/etc/systemd/system/hybrid_inference.service` and update: - `WorkingDirectory` - `User` - Environment variables 4. **Start the service:** ```bash sudo systemctl daemon-reload sudo systemctl enable hybrid_inference.service sudo systemctl start hybrid_inference.service ``` 5. **Check status:** ```bash sudo systemctl status hybrid_inference.service journalctl -u hybrid_inference.service -f ``` ### Environment Variables Required environment variables for production: ```bash # API Keys DEEPSEEK_API_KEY=your-key GEMINI_API_KEY=your-key LLAMA_API_KEY=your-key # Database DB_NAME=hybridinference DB_USER=postgres DB_PASSWORD=your-secure-password DB_HOST=localhost DB_PORT=5432 # Local vLLM (optional) LOCAL_BASE_URL=http://localhost:8000/v1 # Rate limiting (optional) RATE_LIMIT_PER_MINUTE=100 ``` ### Health Checks Monitor service health: ```bash curl http://localhost:80/health ``` ### Logs View logs: ```bash # Follow logs journalctl -u hybrid_inference.service -f # View recent logs journalctl -u hybrid_inference.service -n 100 ``` ## Monitoring ### Prometheus Metrics Metrics are exposed at `/metrics`: ```bash curl http://localhost:80/metrics ``` Key metrics: - `http_requests_total` - Total HTTP requests - `http_request_duration_seconds` - Request latency - `model_requests_total` - Requests per model - `model_errors_total` - Errors per model ### Grafana Dashboards Import the dashboard from `infrastructure/grafana/`. ## Database Setup ### PostgreSQL 1. **Create database:** ```sql CREATE DATABASE hybridinference; CREATE USER hybridinference WITH PASSWORD 'your-password'; GRANT ALL PRIVILEGES ON DATABASE hybridinference TO hybridinference; ``` 2. **Configure connection:** Update `.env` with database credentials. See the Database guide in this section: [Database](database.md). ## Troubleshooting ### Service won't start Check logs: ```bash journalctl -u hybrid_inference.service -n 50 ``` Common issues: - Missing API keys - Database connection failed - Port already in use ### High latency Check: - Database performance - Provider API latency - Resource usage (CPU/memory) ### Rate limiting Adjust rate limits in configuration: ```yaml rate_limits: requests_per_minute: 100 tokens_per_minute: 100000 ```