FreeInference Documentation

Free LLM inference for coding agents and IDEs

FreeInference provides free access to state-of-the-art language models, tailored for coding agents and AI-powered IDEs such as Cursor, Codex, and Roo Code.

Key Features

Free Access

Free inference for coding agents and development tools

Multiple Models

Access GLM, Qwen, MiniMax, Llama, and other powerful models

Codebase Indexing

Free embedding endpoint (BGE-M3) for semantic code search in Roo Code and Kilo Code

IDE Integration

Easy setup with Cursor, Codex, Roo Code, Kilo Code, and more
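Most of these tools accept a custom OpenAI-compatible provider, so setup usually comes down to a base URL and an API key. A minimal sketch, assuming the endpoint is `https://freeinference.org/v1` (an illustrative guess — use the URL shown in your dashboard):

```shell
# Illustrative only: many coding agents and the OpenAI client libraries
# read these standard environment variables when a custom
# OpenAI-compatible provider is configured.
export OPENAI_BASE_URL="https://freeinference.org/v1"  # assumed endpoint
export OPENAI_API_KEY="YOUR_API_KEY"                   # key from freeinference.org
```

IDEs that use their own settings UI instead of environment variables typically expose the same two fields under a "custom provider" or "OpenAI-compatible" option.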

Getting Started

  1. Get your API key - Register at https://freeinference.org and create your API key

  2. Choose your IDE - Cursor, Codex, Roo Code, Kilo Code, or another supported tool

  3. Configure and start coding!

See the Quick Start guide for detailed setup instructions.
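Outside of an IDE, the API can also be called directly. The sketch below builds (but does not send) a chat-completion request, assuming FreeInference exposes an OpenAI-compatible endpoint; the base URL and the `glm-5` model ID are illustrative assumptions, not confirmed by this page:

```python
# Sketch of an OpenAI-style chat request against an assumed
# FreeInference endpoint. Nothing here is sent over the network;
# the request object is only assembled for inspection.
import json
import urllib.request

BASE_URL = "https://freeinference.org/v1"  # assumed endpoint
API_KEY = "YOUR_API_KEY"                   # key from freeinference.org

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble a POST request for the /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("glm-5", "Write a Python hello world.")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or any OpenAI-compatible client pointed at the same base URL) would return the completion as JSON.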

Available Models

| Model | Context Length | Best For |
| --- | --- | --- |
| GLM-5 (recommended) | 200K tokens | Most capable, bilingual |
| GLM-4.7 | 200K tokens | Long context, bilingual |
| GLM-4.7-Flash | 200K tokens | Fast and cost-effective |
| MiniMax M2.5 (new) | 1M tokens | Ultra-long context, multimodal |
| MiniMax M2 | 196K tokens | Large codebases |
| Qwen3 Coder 30B | 32K tokens | Code generation |
| Llama 3.3 70B (limited) | 131K tokens | General coding tasks |
| Llama 4 Maverick (limited) | 128K tokens | Multimodal support |
| BGE-M3 (Embedding) | 8K tokens | Codebase indexing |
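Codebase indexing works by embedding each code snippet once and then ranking snippets by similarity to an embedded query. The sketch below shows that ranking step with cosine similarity; the 3-dimensional vectors are tiny illustrative stand-ins (real BGE-M3 embeddings are much larger):

```python
# Rank code snippets by cosine similarity to a query embedding, the way
# an IDE uses vectors returned by an embedding endpoint such as BGE-M3.
# All vectors here are made-up toy values for illustration.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.0]  # embedding of the search query
snippet_vecs = {             # precomputed embeddings of indexed snippets
    "parse_config": [0.8, 0.2, 0.1],
    "render_html": [0.1, 0.9, 0.3],
}
best = max(snippet_vecs, key=lambda name: cosine(query_vec, snippet_vecs[name]))
print(best)  # parse_config ranks closest to the query vector
```

In practice the IDE embeds every file chunk through the embedding endpoint at index time, stores the vectors, and runs this comparison against the stored vectors on each search.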

See the Available Models page for the complete list.

Support

Need help? Check out: