# API Reference

Complete reference for the HybridInference API.

## Base URL

```
https://freeinference.org/v1
```

## Authentication

All API requests require authentication using an API key in the `Authorization` header:

```
Authorization: Bearer YOUR_API_KEY
```

## Endpoints

### Chat Completions

Create a chat completion using a specified model.

**Endpoint:** `POST /v1/chat/completions`

**Headers:**

```
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
```

**Request Body:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model ID to use (e.g., `llama-3.3-70b-instruct`) |
| `messages` | array | Yes | Array of message objects |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `top_p` | number | No | Nucleus sampling parameter (0-1) |
| `stream` | boolean | No | Whether to stream responses. Default: false |
| `stop` | string or array | No | Stop sequences |
| `response_format` | object | No | Format of response (e.g., `{"type": "json_object"}`) |
| `tools` | array | No | Function calling tools |
| `tool_choice` | string or object | No | Tool choice strategy |

**Message Object:**

| Field | Type | Description |
|-------|------|-------------|
| `role` | string | Role: `system`, `user`, or `assistant` |
| `content` | string or array | Message content (text or multimodal) |

**Example Request:**

```json
{
  "model": "llama-3.3-70b-instruct",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
```

**Response:**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-3.3-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}
```

**Streaming Response:**

When `stream: true`, responses are sent as Server-Sent Events (SSE). With curl, use `-N` (no-buffer) to see tokens as they arrive:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"llama-3.3-70b-instruct","choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"llama-3.3-70b-instruct","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]
```

---

### List Models

Get a list of available models.

**Endpoint:** `GET /v1/models`

**Headers:**

```
Authorization: Bearer YOUR_API_KEY
```

**Response:**

```json
{
  "object": "list",
  "data": [
    {
      "id": "llama-3.3-70b-instruct",
      "object": "model",
      "created": 1234567890,
      "owned_by": "system",
      "context_length": 131072,
      "architecture": {
        "modality": "text",
        "tokenizer": "llama3",
        "instruct_type": "llama3"
      },
      "pricing": {
        "prompt": "0",
        "completion": "0",
        "request": "0"
      }
    }
  ]
}
```

---

### Health Check

Check API health status.

**Endpoint:** `GET /health`

**Response:**

```json
{
  "status": "ok",
  "timestamp": "2025-10-26T08:00:00Z"
}
```
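Putting the Chat Completions endpoint above into practice: a minimal end-to-end sketch in Python, assuming the third-party `requests` package (any HTTP client works) and `YOUR_API_KEY` as a placeholder for a real key:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key
BASE_URL = "https://freeinference.org/v1"

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "model": "llama-3.3-70b-instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "temperature": 0.7,
        "max_tokens": 1000,
    },
    timeout=60,
)
response.raise_for_status()  # surfaces the 4xx/5xx statuses listed under Error Codes
print(response.json()["choices"][0]["message"]["content"])
```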
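The SSE format shown under Streaming Response can be consumed line by line with the same library. A minimal sketch, assuming `requests`; each event is a `data: <JSON chunk>` line, and the `data: [DONE]` sentinel marks the end of the stream:

```python
import json
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key

with requests.post(
    "https://freeinference.org/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": "Count to five."}],
        "stream": True,
    },
    stream=True,  # tell requests not to buffer the response body
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip blank SSE separator lines
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        # Each chunk carries an incremental delta, not a full message.
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
print()
```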
---

## Parameters Reference

### Temperature

Controls randomness in responses.

- **Range:** 0.0 - 2.0
- **Default:** 1.0
- **Lower values:** More focused and deterministic
- **Higher values:** More creative and diverse

**Examples:**

- `0.0` - Deterministic (good for factual tasks)
- `0.7` - Balanced (general use)
- `1.5` - Creative (storytelling, brainstorming)

### Max Tokens

Maximum number of tokens to generate.

- **Default:** Model-specific
- **Note:** Input + output tokens cannot exceed the model's context length

### Top P (Nucleus Sampling)

Alternative to temperature for controlling diversity.

- **Range:** 0.0 - 1.0
- **Default:** 1.0
- **Lower values:** More focused
- **Higher values:** More diverse

**Note:** It is recommended to adjust either `temperature` or `top_p`, not both.

### Stop Sequences

Sequences where the model will stop generating.

**Examples:**

A single stop sequence:

```json
{
  "stop": "\n"
}
```

Multiple stop sequences:

```json
{
  "stop": ["\n", "###", "END"]
}
```

---

## Response Formats

### Standard Text Response

Default response format.

```json
{
  "response_format": {"type": "text"}
}
```

### JSON Mode

Forces the model to respond with valid JSON.

```json
{
  "response_format": {"type": "json_object"}
}
```

**Example:**

```json
{
  "model": "llama-3.3-70b-instruct",
  "messages": [
    {
      "role": "user",
      "content": "Extract person info: John is a 30-year-old engineer. Return as JSON."
    }
  ],
  "response_format": {"type": "json_object"}
}
```

**Response:**

```json
{
  "name": "John",
  "age": 30,
  "occupation": "engineer"
}
```

---

## Function Calling

Enable the model to call functions you define.

### Tool Definition

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```

### Tool Choice Options

- `"auto"` - Model decides whether to call a function
- `"none"` - Model will not call any function
- `{"type": "function", "function": {"name": "function_name"}}` - Force a specific function

### Response with Function Call

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Paris\", \"unit\": \"celsius\"}"
            }
          }
        ]
      }
    }
  ]
}
```

---

## Vision (Multimodal)

Send images along with text (requires a vision-capable model such as `llama-4-maverick`).

### Image URL

```json
{
  "model": "llama-4-maverick",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
```

### Base64 Image

```json
{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
  }
}
```
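To build that `data:` URL from a local file, base64-encode the raw bytes and prepend the MIME prefix. A minimal sketch, assuming `requests` and a hypothetical local file `photo.jpg`:

```python
import base64
import requests

API_KEY = "YOUR_API_KEY"   # placeholder: substitute your real key
IMAGE_PATH = "photo.jpg"   # hypothetical local JPEG

with open(IMAGE_PATH, "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "https://freeinference.org/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-4-maverick",  # vision-capable model, per the note above
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```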
---

## Error Codes

| Code | Description |
|------|-------------|
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 404 | Not Found - Model or endpoint not found |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server error |
| 503 | Service Unavailable - Server overloaded |

### Error Response Format

```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
```

---

## Rate Limits

Current rate limits (subject to change):

- **Requests per minute:** Based on your API key tier
- **Tokens per minute:** Based on your API key tier

Rate limit headers are included in responses:

```
X-RateLimit-Limit-Requests: 100
X-RateLimit-Remaining-Requests: 99
X-RateLimit-Reset-Requests: 2025-10-26T08:01:00Z
```

---

## OpenRouter Compatibility

This API is fully compatible with [OpenRouter](https://openrouter.ai/) clients and libraries. Simply point any OpenAI-compatible client at this base URL:

```python
import openai

client = openai.OpenAI(
    base_url="https://freeinference.org/v1",
    api_key="your-api-key-here"
)
```

All OpenRouter-compatible parameters and features are supported.
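For example, a chat completion through the client defined above; the call mirrors the OpenAI SDK's `chat.completions.create`, and any model ID returned by `GET /v1/models` should work:

```python
# Uses the `client` constructed in the snippet above.
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```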
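When a request hits the 429 limit described under Rate Limits, a client can back off and retry. A minimal sketch using `requests`; the exponential backoff policy here is an assumption on my part, not part of the API contract:

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key

def post_with_retry(payload: dict, max_attempts: int = 5) -> dict:
    """POST to /chat/completions, backing off on 429 responses."""
    for attempt in range(max_attempts):
        response = requests.post(
            "https://freeinference.org/v1/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
            timeout=60,
        )
        if response.status_code != 429:
            response.raise_for_status()  # other errors use the format shown above
            return response.json()
        time.sleep(2 ** attempt)  # assumed backoff: 1s, 2s, 4s, ...
    raise RuntimeError("still rate limited after retries")
```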