FreeInference Deployment
FastAPI + systemd (current)
We serve OpenRouter-compatible traffic directly through a FastAPI application listening on port 80. Removing Nginx reduces operational overhead, keeps debugging straightforward, and lets systemd own the lifecycle of the gateway process.
Overview
┌─────────────┐      ┌─────────────────┐      ┌────────────────────┐
│ OpenRouter  │─────▶│ FastAPI Gateway │─────▶│ Model Executors... │
└─────────────┘      └─────────────────┘      └────────────────────┘
FastAPI binds to 0.0.0.0:80 and exposes /v1 endpoints consumed by OpenRouter clients.
The gateway handles request authentication, routing, and backpressure before invoking the selected model adapter.
systemd supervises the process, ensuring automatic restarts after crashes or host reboots.
Deployment Steps
Install runtime dependencies
Ensure the Python environment and model weights are ready. Confirm the FastAPI entry point (serving.servers.bootstrap:app) is reachable via uvicorn or the configured launcher script.
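A quick way to confirm the entry point works before handing it to systemd is to launch it on an unprivileged port and probe the health route used in Runtime Operations below. The test port 8000 and the checkout path are assumptions; adjust them to the target host.

# Optional smoke test before wiring systemd
cd /home/ubuntu/hybridInference            # adjust to the actual checkout
uvicorn serving.servers.bootstrap:app --host 127.0.0.1 --port 8000
# in a second shell:
curl -s http://127.0.0.1:8000/health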
Create the unit file
sudo tee /etc/systemd/system/freeinference.service <<'UNIT'
[Unit]
Description=FreeInference FastAPI service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/hybridInference
ExecStart=/usr/bin/env uvicorn serving.servers.bootstrap:app --host 0.0.0.0 --port 80
Restart=always
RestartSec=5
Environment=PYTHONUNBUFFERED=1

[Install]
WantedBy=multi-user.target
UNIT
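Note that on a stock Linux host an unprivileged User cannot bind port 80. If the service should keep running as ubuntu rather than root, one option is a systemd drop-in granting the bind capability. This is a sketch; the drop-in file name port80.conf is arbitrary, and it is unnecessary if the host already allows unprivileged low ports.

sudo mkdir -p /etc/systemd/system/freeinference.service.d
sudo tee /etc/systemd/system/freeinference.service.d/port80.conf <<'UNIT'
[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE
UNIT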
Replace the User, WorkingDirectory, and Environment entries as needed for the target host. The repository carries a maintained version of this unit at infrastructure/systemd/hybrid_inference.service; copy or symlink it into /etc/systemd/system/freeinference.service during deploys.
Reload and enable the service
sudo systemctl daemon-reload
sudo systemctl enable freeinference.service
sudo systemctl start freeinference.service
sudo systemctl status freeinference.service
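After starting the unit, it is worth confirming the gateway is actually listening on port 80 and answering the health route used in Runtime Operations below. The localhost probe assumes /health is served without authentication.

sudo ss -ltnp | grep ':80 '
curl -s http://localhost/health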
Runtime Operations
Restart on demand: sudo systemctl restart freeinference.service
Follow logs: journalctl -u freeinference.service -f
Health check: curl https://freeinference.org/health
List registered models: curl https://freeinference.org/v1/models | jq
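To exercise the full request path end to end, send a chat completion in the same shape the legacy examples below use. The model identifier here is copied from those examples and should be replaced with one actually returned by /v1/models on the target host.

curl -X POST https://freeinference.org/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/models/Qwen_Qwen3-Coder-480B-A35B-Instruct-FP8", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}'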
Why We Dropped Nginx
FastAPI already terminates HTTP and exposes the required OpenRouter-compatible endpoints.
Nginx added another moving part, increasing failover complexity and making error handling more opaque.
Debugging latency or request routing is simpler when traffic is handled in a single process.
Legacy Architectures
Nginx (v2, abandoned)
We briefly fronted FastAPI (running on port 8080) with vanilla Nginx that exposed http://freeinference.org on port 80 and terminated TLS for the public endpoint. Once Cloudflare took over edge SSL duties, the extra hop mostly added deployment and observability complexity without material benefit, so the setup was removed.
Nginx + Lua via OpenResty (v1, abandoned)
We previously relied on OpenResty (Nginx + Lua) to provide a production routing tier across multiple LLM backends. The stack handled model mapping, load balancing, health checks, and error handling. We keep the installation notes for posterity.
Overview
┌─────────────┐      ┌──────────────────┐      ┌─────────────────┐
│ Client      │─────▶│ OpenResty        │─────▶│ Backend 1       │
│ (API Call)  │      │ (Router)         │      │ (Qwen@8000)     │
└─────────────┘      │                  │      └─────────────────┘
                     │ - Model Mapping  │
                     │ - Load Balancing │      ┌─────────────────┐
                     │ - Health Checks  │─────▶│ Backend 2       │
                     │ - Error Handling │      │ (Llama@8001)    │
                     └──────────────────┘      └─────────────────┘
Installation Notes
# Add repository
wget -O - https://openresty.org/package/pubkey.gpg | sudo apt-key add -
echo "deb http://openresty.org/package/ubuntu $(lsb_release -sc) main" | \
sudo tee /etc/apt/sources.list.d/openresty.list
# Install
sudo apt-get update
sudo apt-get install openresty
# Create directory
sudo mkdir -p /usr/local/openresty/nginx/conf/sites-available
sudo mkdir -p /usr/local/openresty/nginx/conf/sites-enabled
# Copy Config file
sudo cp <your config file> /usr/local/openresty/nginx/conf/sites-available/vllm
# Enable the site
sudo ln -s /usr/local/openresty/nginx/conf/sites-available/vllm \
/usr/local/openresty/nginx/conf/sites-enabled/vllm
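The Lua settings and include directive below go in the http block of the main OpenResty configuration (default path: /usr/local/openresty/nginx/conf/nginx.conf):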
http {
# ... Others ...
# Lua settings
lua_package_path "/usr/local/openresty/lualib/?.lua;;";
lua_shared_dict model_cache 10m;
# Include Site Configuration
include /usr/local/openresty/nginx/conf/sites-enabled/*;
}
# test openresty config
sudo openresty -t
# Start
sudo systemctl start openresty
# Enable auto-start
sudo systemctl enable openresty
# reload openresty
sudo openresty -s reload
# check service status
curl https://freeinference.org/health
# list all models
curl https://freeinference.org/v1/models | jq
# Chat with Qwen3-Coder
curl -X POST http://freeinference.org/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "/models/Qwen_Qwen3-Coder-480B-A35B-Instruct-FP8", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}'
# Chat with llama4-scout
curl -X POST http://freeinference.org/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "/models/meta-llama_Llama-4-Scout-17B-16E", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 50}'
Nginx (v0, abandoned)
sudo vim /etc/nginx/sites-available/vllm
sudo nginx -t
sudo systemctl reload nginx
# to test the endpoint
curl https://freeinference.org/v1/models