FreeInference Documentation

Free LLM inference for coding agents and IDEs

FreeInference provides free access to state-of-the-art language models, tailored for coding agents and AI-powered IDEs such as Cursor, Codex, and Roo Code.

Key Features

Free Access

Free inference for coding agents and development tools

Multiple Models

Access GLM, Qwen, MiniMax, Llama, and other powerful models

Codebase Indexing

Free embedding endpoint (BGE-M3) for semantic code search in Roo Code and Kilo Code

IDE Integration

Easy setup with Cursor, Codex, Roo Code, Kilo Code, and more
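Most of these tools accept a custom OpenAI-compatible provider, so setup usually comes down to a base URL and an API key. A minimal sketch, assuming the endpoint is `https://freeinference.org/v1` (an illustrative guess — use the URL shown in your dashboard):

```shell
# Illustrative only: many coding agents and the OpenAI client libraries
# read these standard environment variables when a custom
# OpenAI-compatible provider is configured.
export OPENAI_BASE_URL="https://freeinference.org/v1"  # assumed endpoint
export OPENAI_API_KEY="YOUR_API_KEY"                   # key from freeinference.org
```

IDEs that use their own settings UI instead of environment variables typically expose the same two fields under a "custom provider" or "OpenAI-compatible" option.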

Getting Started

  1. Get your API key - Register at https://freeinference.org and create your API key

  2. Choose your IDE - Cursor, Codex, Roo Code, Kilo Code, or another supported tool

  3. Configure and start coding!

See the Quick Start guide for detailed setup instructions.
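Outside of an IDE, the API can also be called directly. The sketch below builds (but does not send) a chat-completion request, assuming FreeInference exposes an OpenAI-compatible endpoint; the base URL and the `glm-5` model ID are illustrative assumptions, not confirmed by this page:

```python
# Sketch of an OpenAI-style chat request against an assumed
# FreeInference endpoint. Nothing here is sent over the network;
# the request object is only assembled for inspection.
import json
import urllib.request

BASE_URL = "https://freeinference.org/v1"  # assumed endpoint
API_KEY = "YOUR_API_KEY"                   # key from freeinference.org

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble a POST request for the /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("glm-5", "Write a Python hello world.")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or any OpenAI-compatible client pointed at the same base URL) would return the completion as JSON.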

Available Models

| Model | Context Length | Best For |
| --- | --- | --- |
| GLM-5 (recommended) | 200K tokens | Most capable, bilingual |
| GLM-4.7 | 200K tokens | Long context, bilingual |
| GLM-4.7-Flash | 200K tokens | Fast and cost-effective |
| MiniMax M2.5 (new) | 1M tokens | Ultra-long context, multimodal |
| MiniMax M2 | 196K tokens | Large codebases |
| Qwen3 Coder 30B | 32K tokens | Code generation |
| Llama 3.3 70B (limited) | 131K tokens | General coding tasks |
| Llama 4 Maverick (limited) | 128K tokens | Multimodal support |
| BGE-M3 (Embedding) | 8K tokens | Codebase indexing |
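Codebase indexing works by embedding each code snippet once and then ranking snippets by similarity to an embedded query. The sketch below shows that ranking step with cosine similarity; the 3-dimensional vectors are tiny illustrative stand-ins (real BGE-M3 embeddings are much larger):

```python
# Rank code snippets by cosine similarity to a query embedding, the way
# an IDE uses vectors returned by an embedding endpoint such as BGE-M3.
# All vectors here are made-up toy values for illustration.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.0]  # embedding of the search query
snippet_vecs = {             # precomputed embeddings of indexed snippets
    "parse_config": [0.8, 0.2, 0.1],
    "render_html": [0.1, 0.9, 0.3],
}
best = max(snippet_vecs, key=lambda name: cosine(query_vec, snippet_vecs[name]))
print(best)  # parse_config ranks closest to the query vector
```

In practice the IDE embeds every file chunk through the embedding endpoint at index time, stores the vectors, and runs this comparison against the stored vectors on each search.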

See the Available Models page for the complete list.

Support

Need help? Check out: