Publish time: 2026/2/12
Model Series: GLM
Input type: —
Output type: —
Context Window: 128,000
Max Output Length: 2,048
Input Price: ¥4 / 1M tokens
Output Price: ¥18 / 1M tokens

GLM-5 is a new-generation large model built for Coding and Agent scenarios. It reaches open-source SOTA on complex systems engineering and long-horizon tasks, with a real-world coding experience approaching Claude Opus level. Built on a new 744B-parameter base model, asynchronous reinforcement learning, and sparse attention, it delivers a full upgrade from "writing code" to "writing engineering."
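At the listed rates, per-request cost is simple arithmetic. A minimal sketch, using the prices and limits from the table above (the function name and token counts are illustrative, not part of the API):

```javascript
// Per-request cost at the listed GLM-5 rates: ¥4 per 1M input tokens
// and ¥18 per 1M output tokens (from the pricing table above).
const INPUT_PRICE_PER_M = 4;
const OUTPUT_PRICE_PER_M = 18;

function estimateCostCNY(inputTokens, outputTokens) {
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M
  );
}

// A maximal request: full 128,000-token context in, 2,048-token max output.
console.log(estimateCostCNY(128_000, 2_048).toFixed(4)); // → "0.5489"
```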
Zhinao API routes requests to the best-fit provider and automatically fails over to the one with highest availability.
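The failover behavior above happens server-side; from the client's perspective it amounts to "try providers in order, first success wins." A minimal client-side sketch of that pattern — the provider names and request function are illustrative stand-ins, not real Zhinao API surface:

```javascript
// Try each provider in order and return the first successful response.
// Zhinao API performs this routing server-side; this mock just shows
// the failover pattern itself.
async function callWithFailover(providers, makeRequest) {
  let lastError;
  for (const provider of providers) {
    try {
      return await makeRequest(provider); // first success wins
    } catch (err) {
      lastError = err; // provider failed — fall through to the next one
    }
  }
  throw lastError; // every provider failed
}

// Mock usage: the first provider is down, the second answers.
callWithFailover(["provider-a", "provider-b"], async (provider) => {
  if (provider === "provider-a") throw new Error("unavailable");
  return `answered by ${provider}`;
}).then((result) => console.log(result)); // → "answered by provider-b"
```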
TTFT: 5.14 s
Throughput: 18.90 tps
Uptime: 97.00%
Provider Model: huaweicloud/z-ai/glm-5
Supported Parameters: —
Recent Uptime: —
Reasoning: Toggleable
Supported Response Formats: —
Request Log Collection: —
ZDR Supported: —
Distillable: Yes
Total Context: 128,000
Max Output: 2,048
Input Price: ¥4 / 1M tokens
Output Price: ¥18 / 1M tokens
[Chart: compare different providers across Zhinao API — throughput 27.99 tok/s, latency 3.13 s]
[Chart: uptime for z-ai/glm-5 across all providers]
Zhinao API normalizes requests and responses across providers for you
// Point the OpenAI SDK at the Zhinao API endpoint.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.360.cn/v1",
  apiKey: process.env.ZHINAO_API_KEY,
});

// Standard chat completion request against GLM-5.
const response = await client.chat.completions.create({
  model: "z-ai/glm-5",
  messages: [
    { role: "user", content: "Hello, how are you?" },
  ],
  temperature: 0.7,
  max_tokens: 1000,
});

console.log(response.choices[0].message.content);