Available Models
FreeInference provides access to several state-of-the-art large language models (LLMs) for use with coding agents and IDEs.
Model Overview
| Model ID | Name | Context Length | Max Output | Features |
|---|---|---|---|---|
| glm-5.1 | GLM-5.1 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| glm-4.7 | GLM-4.7 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| glm-5-turbo | GLM-5 Turbo | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| qwen3.6-35b | Qwen3.6 35B | 262K tokens | 8K tokens | Function calling, Structured output |
| minimax-m2.7 | MiniMax M2.7 | 205K tokens | 131K tokens | Function calling, Structured output |
| minimax-m2.5 | MiniMax M2.5 | 205K tokens | 131K tokens | Function calling, Structured output, Thinking mode, Multimodal (text+image) |
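A script that generates IDE configuration can encode the table as data and filter it. The following Python sketch is hypothetical. `ModelSpec`, `CATALOG`, and `candidates` are illustrative names and are not part of any FreeInference API. If this page does not list a feature for a model (for example, thinking mode for Qwen3.6 35B), the sketch treats that feature as absent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    context_tokens: int
    max_output_tokens: int
    thinking: bool      # "Thinking mode: Yes" on this page
    image_input: bool   # "Input modalities" includes image


# Figures copied from the Model Details section; unlisted features are treated as absent.
CATALOG = [
    ModelSpec("glm-5.1", 200_000, 128_000, thinking=True, image_input=False),
    ModelSpec("glm-4.7", 200_000, 128_000, thinking=True, image_input=False),
    ModelSpec("glm-5-turbo", 200_000, 128_000, thinking=True, image_input=False),
    ModelSpec("qwen3.6-35b", 262_144, 8_192, thinking=False, image_input=False),
    ModelSpec("minimax-m2.7", 204_800, 131_072, thinking=False, image_input=False),
    ModelSpec("minimax-m2.5", 204_800, 131_072, thinking=True, image_input=True),
]


def candidates(min_context=0, min_output=0, need_thinking=False, need_images=False):
    """Return IDs of models meeting every requirement, largest context first.

    Python's sort is stable, so models with equal context keep catalog order.
    """
    matches = [
        m for m in CATALOG
        if m.context_tokens >= min_context
        and m.max_output_tokens >= min_output
        and (m.thinking or not need_thinking)
        and (m.image_input or not need_images)
    ]
    matches.sort(key=lambda m: m.context_tokens, reverse=True)
    return [m.model_id for m in matches]


print(candidates(need_images=True))  # ['minimax-m2.5']
```

As an example, `candidates(min_output=100_000, need_thinking=True)` returns the three GLM models and MiniMax M2.5. MiniMax M2.5 comes first because its 204,800-token context is the largest of the four.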
Model Details
GLM-5.1
Model ID: glm-5.1
Aliases: freeinference-glm-5.1
Context length: 200,000 tokens
Max output: 128,000 tokens
Quantization: fp8
Input modalities: text
Output modalities: text
Language support: Chinese, English
Function calling: Yes
Structured output: Yes
Thinking mode: Yes
Tool streaming: Yes
GLM-4.7
Model ID: glm-4.7
Aliases: freeinference-glm-4.7
Context length: 200,000 tokens
Max output: 128,000 tokens
Quantization: fp8
Input modalities: text
Output modalities: text
Language support: Chinese, English
Function calling: Yes
Structured output: Yes
Thinking mode: Yes
Tool streaming: Yes
GLM-5 Turbo
Model ID: glm-5-turbo
Aliases: freeinference-glm-5-turbo
Context length: 200,000 tokens
Max output: 128,000 tokens
Quantization: fp8
Input modalities: text
Output modalities: text
Language support: Chinese, English
Function calling: Yes
Structured output: Yes
Thinking mode: Yes
Tool streaming: Yes
Qwen3.6 35B
Model ID: qwen3.6-35b
Context length: 262,144 tokens
Max output: 8,192 tokens
Quantization: fp8
Input modalities: text
Output modalities: text
Function calling: Yes
Structured output: Yes
MiniMax M2.7
Model ID: minimax-m2.7
Context length: 204,800 tokens
Max output: 131,072 tokens
Quantization: bf16
Input modalities: text
Output modalities: text
Function calling: Yes
Structured output: Yes
MiniMax M2.5
Model ID: minimax-m2.5
Context length: 204,800 tokens
Max output: 131,072 tokens
Quantization: bf16
Input modalities: text, image
Output modalities: text
Function calling: Yes
Structured output: Yes
Thinking mode: Yes
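Context length and max output interact when you size a request. This page does not say how the two limits combine. A common rule on LLM servers is that prompt tokens plus generated tokens must fit within the context length. Under that assumption, the usable output budget is the smaller of the max output and whatever context the prompt leaves free. A minimal Python sketch follows; `LIMITS` and `output_budget` are hypothetical names.

```python
# Limits copied from the Model Details section: (context length, max output), in tokens.
LIMITS = {
    "glm-5.1": (200_000, 128_000),
    "glm-4.7": (200_000, 128_000),
    "glm-5-turbo": (200_000, 128_000),
    "qwen3.6-35b": (262_144, 8_192),
    "minimax-m2.7": (204_800, 131_072),
    "minimax-m2.5": (204_800, 131_072),
}


def output_budget(model_id: str, prompt_tokens: int) -> int:
    """Return the largest output length that fits for this prompt size.

    Assumes (not confirmed by this page) that prompt + output must not exceed
    the context length, and that output is further capped at max output.
    """
    context, max_output = LIMITS[model_id]
    remaining = context - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"a {prompt_tokens}-token prompt leaves no room in {model_id}'s "
            f"{context}-token context"
        )
    return min(max_output, remaining)


print(output_budget("glm-5.1", 150_000))      # 50000
print(output_budget("qwen3.6-35b", 150_000))  # 8192
```

Under this assumption, a 150,000-token prompt leaves glm-5.1 room for at most 50,000 output tokens. qwen3.6-35b stays capped at 8,192 tokens regardless of how much context remains.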
Switching Models
To use a different model, change the model name in your IDE configuration:
Cursor: Select the model from the dropdown in settings.
Kilo Code: Select the model from the dropdown in the extension settings. A good default is glm-5.1; switch to glm-5-turbo for faster iteration, or to minimax-m2.5 for long-context and image-aware workflows.
Roo Code: Select the model from the dropdown in the extension settings.
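The GLM models are also listed under `freeinference-` aliases. If you keep model names in a hand-edited configuration file, a small normalization step can flag an unknown name early. This Python sketch assumes each alias refers to the same model as its canonical ID, as the Aliases field suggests. It does not verify whether a given IDE or the service accepts aliases. `canonical_model_id` is a hypothetical helper and is not part of FreeInference or any IDE.

```python
# Aliases and Model IDs as listed in the Model Details section.
ALIASES = {
    "freeinference-glm-5.1": "glm-5.1",
    "freeinference-glm-4.7": "glm-4.7",
    "freeinference-glm-5-turbo": "glm-5-turbo",
}
MODEL_IDS = {"glm-5.1", "glm-4.7", "glm-5-turbo", "qwen3.6-35b", "minimax-m2.7", "minimax-m2.5"}


def canonical_model_id(name: str) -> str:
    """Map a configured model name (ID or alias) to its canonical Model ID."""
    key = name.strip()              # tolerate stray whitespace from config files
    key = ALIASES.get(key, key)     # resolve an alias; IDs pass through unchanged
    if key not in MODEL_IDS:
        raise ValueError(f"unknown model: {name!r}")
    return key


print(canonical_model_id("freeinference-glm-5.1"))  # glm-5.1
```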