# Available Models

FreeInference provides access to multiple state-of-the-art LLMs for coding agents and IDEs.

## Model Overview
| Model ID | Name | Context Length | Max Output | Features |
|---|---|---|---|---|
| `glm-5` | GLM-5 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `glm-4.7` | GLM-4.7 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `glm-4.7-flash` | GLM-4.7-Flash | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `minimax-m2.5` | MiniMax M2.5 | 1M tokens | 128K tokens | Function calling, Structured output, Thinking mode, Multimodal (text+image) |
| `minimax-m2` | MiniMax M2 | 196K tokens | 8K tokens | Function calling, Structured output |
| `qwen3-coder-30b` | Qwen3 Coder 30B | 32K tokens | 8K tokens | Function calling, Structured output |
| `llama-3.3-70b-instruct` | Llama 3.3 70B Instruct | 131K tokens | 8K tokens | Function calling, Structured output |
| `llama-4-scout` | Llama 4 Scout | 128K tokens | 16K tokens | Function calling, Structured output |
| `llama-4-maverick` | Llama 4 Maverick | 128K tokens | 16K tokens | Function calling, Structured output, Multimodal (text+image) |
**Note:** Llama models are available with limited capacity. Availability may vary during peak usage.
## Embedding Models

| Model ID | Name | Dimensions | Context Length | Use Case |
|---|---|---|---|---|
| `bge-m3` | BGE-M3 | 1024 | 8K tokens | Codebase indexing, semantic search |
## Model Details

### GLM-5

- Model ID: `glm-5`
- Context length: 200,000 tokens
- Max output: 128,000 tokens
- Architecture: 745B MoE (44B active parameters)
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Language support: Chinese, English
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
- Tool streaming: Yes
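As a concrete illustration of GLM-5's function-calling support, the sketch below builds a request body in the OpenAI-compatible style. This is an assumption — the exact FreeInference endpoint and field names are not documented on this page, and `get_weather` is a hypothetical tool invented for the example.

```python
import json

# Hedged sketch: an OpenAI-style chat request with a tool definition.
# "get_weather" is a made-up example tool, not part of any real API here.
payload = {
    "model": "glm-5",
    "messages": [
        {"role": "user", "content": "What's the weather in Beijing?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize for sending as the POST body of a chat-completions request.
body = json.dumps(payload)
```

If the model decides to call the tool, the response would carry a `tool_calls` entry naming `get_weather` with JSON arguments matching the schema above.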
### GLM-4.7

- Model ID: `glm-4.7`
- Context length: 200,000 tokens
- Max output: 128,000 tokens
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Language support: Chinese, English
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
- Tool streaming: Yes
### GLM-4.7-Flash

- Model ID: `glm-4.7-flash`
- Context length: 200,000 tokens
- Max output: 128,000 tokens
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Language support: Chinese, English
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
- Tool streaming: Yes
### MiniMax M2.5

- Model ID: `minimax-m2.5`
- Context length: 1,000,000 tokens
- Max output: 131,072 tokens
- Architecture: 230B MoE (10B active parameters)
- Quantization: bf16
- Input modalities: text, image
- Output modalities: text
- Function calling: Yes
- Structured output: Yes
- Thinking mode: Yes
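Since MiniMax M2.5 accepts text and image input, a message can mix both. The sketch below assumes the OpenAI-style multi-part `content` array — an assumption, since this page does not show the FreeInference request schema — and uses placeholder bytes in place of a real image.

```python
import base64
import json

# Placeholder bytes standing in for a real PNG; in practice you would
# read the file and base64-encode its contents.
fake_png = base64.b64encode(b"\x89PNG placeholder").decode()

# Hedged sketch: a mixed text+image user message for minimax-m2.5.
payload = {
    "model": "minimax-m2.5",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this diagram."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{fake_png}"},
                },
            ],
        }
    ],
}

body = json.dumps(payload)
```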
### MiniMax M2

- Model ID: `minimax-m2`
- Context length: 196,608 tokens
- Max output: 8,192 tokens
- Quantization: bf16
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes
### Qwen3 Coder 30B

- Model ID: `qwen3-coder-30b`
- Context length: 32,768 tokens
- Max output: 8,192 tokens
- Quantization: bf16
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes
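Structured output is typically requested by attaching a JSON schema to the request. The sketch below uses the OpenAI-style `response_format` field with `qwen3-coder-30b`; treat the field names as assumptions and check the provider's API reference for the exact shape.

```python
import json

# Hedged sketch: constraining qwen3-coder-30b to emit JSON matching a
# schema via an OpenAI-style "response_format" (field names assumed).
payload = {
    "model": "qwen3-coder-30b",
    "messages": [
        {"role": "user",
         "content": "Extract the function name from: def add(a, b): return a + b"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "extraction",
            "schema": {
                "type": "object",
                "properties": {"function_name": {"type": "string"}},
                "required": ["function_name"],
            },
        },
    },
}

body = json.dumps(payload)
```

With a constraint like this, the model's reply can be parsed with `json.loads` and is guaranteed (by the provider, when the feature is supported) to contain a `function_name` string.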
### Llama 3.3 70B Instruct (Limited Capacity)

- Model ID: `llama-3.3-70b-instruct`
- Context length: 131,072 tokens
- Max output: 8,192 tokens
- Quantization: bf16
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes
### Llama 4 Scout (Limited Capacity)

- Model ID: `llama-4-scout`
- Context length: 128,000 tokens
- Max output: 16,384 tokens
- Quantization: fp8
- Input modalities: text
- Output modalities: text
- Function calling: Yes
- Structured output: Yes
### Llama 4 Maverick (Limited Capacity)

- Model ID: `llama-4-maverick`
- Context length: 128,000 tokens
- Max output: 16,384 tokens
- Quantization: fp8
- Input modalities: text, image
- Output modalities: text
- Function calling: Yes
- Structured output: Yes
### BGE-M3 (Embedding)

- Model ID: `bge-m3`
- Type: Embedding
- Dimensions: 1024
- Context length: 8,192 tokens
- Quantization: fp16
- Input modalities: text
- Output modalities: embedding
- Multilingual: Yes (100+ languages)
Use this model for codebase indexing in Roo Code, Kilo Code, and other tools that support semantic code search. See the integration guide for setup instructions.
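Under the hood, semantic code search compares the 1024-dimensional BGE-M3 vector of a query against the indexed vectors of code chunks, usually by cosine similarity. The sketch below shows that comparison with tiny made-up 3-element vectors in place of real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embedding vectors; real BGE-M3 vectors have 1024 dims.
query = [0.1, 0.3, 0.5]
chunks = {
    "parser.py": [0.1, 0.3, 0.5],   # identical direction -> similarity 1.0
    "readme.md": [0.9, -0.2, 0.1],  # mostly unrelated direction
}

# Rank indexed chunks by similarity to the query and take the best match.
best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
# -> "parser.py"
```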
## Switching Models

To use a different model, change the model ID in your IDE configuration:

- **Cursor:** select the model from the dropdown in settings.
- **Codex:** edit `~/.codex/config.toml`:

  ```toml
  model = "glm-5"  # change to any model ID
  ```

- **Roo Code / Kilo Code:** select the model from the dropdown in the extension settings.
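A small guard against typos when writing a model ID into a config can be sketched as below. The catalog values are taken from the tables in this document; the helper itself (`pick_model`) is an illustrative name, not part of any FreeInference SDK.

```python
# Context lengths per model ID, copied from the model details above.
CONTEXT_LENGTHS = {
    "glm-5": 200_000,
    "glm-4.7": 200_000,
    "glm-4.7-flash": 200_000,
    "minimax-m2.5": 1_000_000,
    "minimax-m2": 196_608,
    "qwen3-coder-30b": 32_768,
    "llama-3.3-70b-instruct": 131_072,
    "llama-4-scout": 128_000,
    "llama-4-maverick": 128_000,
}

def pick_model(model_id: str) -> str:
    """Return a Codex config.toml line for a known model ID, or raise."""
    if model_id not in CONTEXT_LENGTHS:
        raise ValueError(f"unknown model ID: {model_id}")
    return f'model = "{model_id}"'

line = pick_model("glm-5")
# -> model = "glm-5"
```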