Available Models

FreeInference provides access to multiple state-of-the-art language models for coding agents and IDEs.

Model Overview

| Model ID | Name | Context Length | Max Output | Features |
|---|---|---|---|---|
| glm-5 | GLM-5 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| glm-4.7 | GLM-4.7 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| glm-4.7-flash | GLM-4.7-Flash | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| minimax-m2.5 | MiniMax M2.5 | 1M tokens | 128K tokens | Function calling, Structured output, Thinking mode, Multimodal (text+image) |
| minimax-m2 | MiniMax M2 | 196K tokens | 8K tokens | Function calling, Structured output |
| qwen3-coder-30b | Qwen3 Coder 30B | 32K tokens | 8K tokens | Function calling, Structured output |
| llama-3.3-70b-instruct | Llama 3.3 70B Instruct | 131K tokens | 8K tokens | Function calling, Structured output |
| llama-4-scout | Llama 4 Scout | 128K tokens | 16K tokens | Function calling, Structured output |
| llama-4-maverick | Llama 4 Maverick | 128K tokens | 16K tokens | Function calling, Structured output, Multimodal (text+image) |

Note: Llama models are available with limited capacity. Availability may vary during peak usage.
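The model IDs above are what you pass in the `model` field of an API request. As a minimal sketch, assuming FreeInference exposes an OpenAI-compatible chat completions endpoint (the base URL below is a placeholder, not the real endpoint):

```python
import json

# Placeholder base URL -- substitute your actual FreeInference endpoint.
BASE_URL = "https://api.freeinference.example/v1"

def build_chat_request(model: str, prompt: str, api_key: str):
    """Assemble an OpenAI-style chat completion request for a given model ID."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,  # any model ID from the table above
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request("glm-5", "Write a haiku about Rust.", "sk-...")
```

Switching models is then just a matter of passing a different ID in the `model` field.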

Embedding Models

| Model ID | Name | Dimensions | Context Length | Use Case |
|---|---|---|---|---|
| bge-m3 | BGE-M3 | 1024 | 8K tokens | Codebase indexing, semantic search |


Model Details

GLM-5

Model ID: glm-5

  • Context length: 200,000 tokens
  • Max output: 128,000 tokens
  • Architecture: 745B MoE (44B active parameters)
  • Quantization: fp8
  • Input modalities: text
  • Output modalities: text
  • Language support: Chinese, English
  • Function calling: Yes
  • Structured output: Yes
  • Thinking mode: Yes
  • Tool streaming: Yes
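Function calling means the model can emit structured tool invocations rather than plain text. A hedged sketch of what a tool declaration looks like in an OpenAI-style request body (the `get_weather` tool and its schema are purely illustrative, not part of the FreeInference API):

```python
# Illustrative tool schema -- the function name and parameters are made up.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# The request body pairs the tool list with a normal conversation.
request_body = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": tools,
}
```

When the model decides to call the tool, the response carries a structured `tool_calls` entry with JSON arguments matching the declared schema.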


GLM-4.7

Model ID: glm-4.7

  • Context length: 200,000 tokens
  • Max output: 128,000 tokens
  • Quantization: fp8
  • Input modalities: text
  • Output modalities: text
  • Language support: Chinese, English
  • Function calling: Yes
  • Structured output: Yes
  • Thinking mode: Yes
  • Tool streaming: Yes


GLM-4.7-Flash

Model ID: glm-4.7-flash

  • Context length: 200,000 tokens
  • Max output: 128,000 tokens
  • Quantization: fp8
  • Input modalities: text
  • Output modalities: text
  • Language support: Chinese, English
  • Function calling: Yes
  • Structured output: Yes
  • Thinking mode: Yes
  • Tool streaming: Yes


MiniMax M2.5

Model ID: minimax-m2.5

  • Context length: 1,000,000 tokens
  • Max output: 131,072 tokens
  • Architecture: 230B MoE (10B active parameters)
  • Quantization: bf16
  • Input modalities: text, image
  • Output modalities: text
  • Function calling: Yes
  • Structured output: Yes
  • Thinking mode: Yes
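Because M2.5 accepts image input, a user message can mix text and image parts. A sketch of such a message, assuming FreeInference follows the OpenAI-style multimodal content format (the image URL is a placeholder):

```python
# A multimodal user message: one text part plus one image part.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this diagram show?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/diagram.png"}},  # placeholder URL
    ],
}

request_body = {"model": "minimax-m2.5", "messages": [message]}
```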


MiniMax M2

Model ID: minimax-m2

  • Context length: 196,608 tokens
  • Max output: 8,192 tokens
  • Quantization: bf16
  • Input modalities: text
  • Output modalities: text
  • Function calling: Yes
  • Structured output: Yes


Qwen3 Coder 30B

Model ID: qwen3-coder-30b

  • Context length: 32,768 tokens
  • Max output: 8,192 tokens
  • Quantization: bf16
  • Input modalities: text
  • Output modalities: text
  • Function calling: Yes
  • Structured output: Yes


Llama 3.3 70B Instruct (Limited Capacity)

Model ID: llama-3.3-70b-instruct

  • Context length: 131,072 tokens
  • Max output: 8,192 tokens
  • Quantization: bf16
  • Input modalities: text
  • Output modalities: text
  • Function calling: Yes
  • Structured output: Yes


Llama 4 Scout (Limited Capacity)

Model ID: llama-4-scout

  • Context length: 128,000 tokens
  • Max output: 16,384 tokens
  • Quantization: fp8
  • Input modalities: text
  • Output modalities: text
  • Function calling: Yes
  • Structured output: Yes


Llama 4 Maverick (Limited Capacity)

Model ID: llama-4-maverick

  • Context length: 128,000 tokens
  • Max output: 16,384 tokens
  • Quantization: fp8
  • Input modalities: text, image
  • Output modalities: text
  • Function calling: Yes
  • Structured output: Yes


BGE-M3 (Embedding)

Model ID: bge-m3

  • Type: Embedding
  • Dimensions: 1024
  • Context length: 8,192 tokens
  • Quantization: fp16
  • Input modalities: text
  • Output modalities: embedding
  • Multilingual: Yes (100+ languages)

Use this model for codebase indexing in Roo Code, Kilo Code, and other tools that support semantic code search. See the integration guide for setup instructions.
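An embedding request asks for one 1024-dimensional vector per input string, and semantic search then compares those vectors, typically by cosine similarity. A minimal sketch, assuming an OpenAI-style embeddings request format (the example inputs are arbitrary code snippets):

```python
import math

# An embeddings request body: each input string yields one 1024-dim vector.
request_body = {
    "model": "bge-m3",
    "input": [
        "def parse_config(path): ...",
        "function that reads configuration from a file",
    ],
}

def cosine_similarity(a, b):
    """Compare two embedding vectors, e.g. a query against an indexed code chunk."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Codebase indexing boils down to embedding every chunk once, then embedding each query and ranking chunks by this similarity score.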


Switching Models

To switch models, change the model ID in your IDE configuration:

Cursor: Select from the dropdown in settings

Codex: Edit ~/.codex/config.toml:

model = "glm-5"  # Change to any model ID

Roo Code / Kilo Code: Select from the dropdown in extension settings
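Outside an IDE, switching models is simply a per-request choice of the `model` field. A sketch of illustrative routing logic (the task categories and model choices here are examples based on the tables above, not a recommendation from FreeInference):

```python
# Illustrative routing: pick a model ID from the tables above based on the task.
def pick_model(task: str, needs_vision: bool = False) -> str:
    if needs_vision:
        return "minimax-m2.5"   # the only listed chat model with image input
    if task == "embed":
        return "bge-m3"         # embedding model for codebase indexing
    if task == "quick-edit":
        return "glm-4.7-flash"  # presumably the faster GLM variant
    return "glm-5"              # default general-purpose choice
```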