HybridInference API Documentation
OpenRouter-compatible API for accessing multiple LLM models
Get started with HybridInference in minutes. Our API provides seamless access to state-of-the-art language models including Llama 3.3, Llama 4, Gemini, GPT-5, and Claude.
Quick Links
Quick Start - Get up and running in 5 minutes
Available Models - Supported models, context lengths, and pricing
Code Examples - Sample code in Python, JavaScript, and more
API Reference - Complete endpoint and parameter reference
Key Features
- Fast & Reliable
Low-latency inference with automatic failover
- OpenRouter Compatible
Drop-in replacement for OpenRouter API
- Multiple Models
Access Llama, Gemini, GPT, and Claude models
- Free Tier Available
Get started with our free tier
- Production Ready
Built for scale with monitoring and observability
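Because the API is an OpenRouter/OpenAI-compatible drop-in, any HTTP client can talk to it directly; you are not limited to the OpenAI SDK. The sketch below builds the standard chat-completion request body by hand. The helper name `build_chat_request` is purely illustrative (not part of any SDK); the `/chat/completions` path follows the OpenAI convention.

```python
import json

# Base URL from the quick-start example.
API_BASE = "https://freeinference.org/v1"

def build_chat_request(model, user_message, api_key):
    """Build an OpenAI/OpenRouter-style chat completion request.

    Returns (headers, payload). POST the payload as JSON to
    API_BASE + "/chat/completions" with any HTTP client.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, payload

headers, payload = build_chat_request(
    "llama-3.3-70b-instruct", "Hello!", "your-api-key-here"
)
print(json.dumps(payload, indent=2))
```

For example, with the `requests` library you would send this as `requests.post(f"{API_BASE}/chat/completions", headers=headers, json=payload)`.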
Getting Started
1. Get your API key (contact the team)
2. Install the OpenAI SDK:

```bash
pip install openai
```

3. Make your first request:

```python
import openai

client = openai.OpenAI(
    base_url="https://freeinference.org/v1",
    api_key="your-api-key-here",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
See the Quick Start guide for more details.
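Failover is handled server-side, but a small client-side retry loop is still good practice for transient network errors and rate limits. The sketch below is a generic pattern, not part of the platform or any SDK; wrap the `client.chat.completions.create(...)` call from the example above in it.

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Call `call()` and retry with exponential backoff on failure.

    Waits base_delay, then 2x, then 4x, ... between attempts, and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo with a stand-in function that fails once, then succeeds:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # → ok
```

In real use, pass `lambda: client.chat.completions.create(...)` as `call`; you may want to catch only transient error types rather than bare `Exception`.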
Available Models
| Model | Context Length | Pricing |
|---|---|---|
| Llama 3.3 70B Instruct | 131K tokens | Free |
| Llama 4 Maverick | 128K tokens | Free |
| Gemini 2.5 Flash | 1M tokens | Free |
| GPT-5 | 128K tokens | Free |
See Available Models for the complete list.
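The context lengths in the table can be checked client-side before sending a large prompt. In this sketch, the model IDs and exact token limits are assumptions read off the table (only `llama-3.3-70b-instruct` appears verbatim in the quick start); confirm the slugs against the Available Models list.

```python
# Assumed model slugs and token limits, based on the table above;
# verify against the Available Models list before relying on them.
FREE_MODELS = {
    "llama-3.3-70b-instruct": 131_072,   # "131K tokens"
    "llama-4-maverick": 128_000,
    "gemini-2.5-flash": 1_000_000,
    "gpt-5": 128_000,
}

def fits_context(model_id, prompt_tokens, max_output_tokens=1024):
    """Return True if the prompt plus reserved output tokens fit
    within the model's context window."""
    return prompt_tokens + max_output_tokens <= FREE_MODELS[model_id]

# Gemini 2.5 Flash's 1M-token window comfortably fits a 500K-token prompt:
print(fits_context("gemini-2.5-flash", 500_000))  # → True
```

Token counts must come from a tokenizer matching the model; this helper only compares counts you already have.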
Support
Need help? Check out:
Code Examples - Sample requests in Python, JavaScript, and more
API Reference - Endpoint and parameter documentation
GitHub Issues - Report bugs or request features