Feature: Add Z.AI (Zhipu AI) provider support #68

Summary

Add first-class support for Z.AI (Zhipu AI) as an LLM provider. Z.AI offers OpenAI-compatible APIs with two plan types and several capable models including the GLM-5 family.

Motivation

Z.AI provides competitive models (GLM-5-turbo, GLM-5.1, GLM-4.7) at accessible pricing through both subscription and pay-as-you-go plans. Their API is OpenAI-compatible, making integration straightforward. However, the GLM-5 family models are reasoning models with thinking enabled by default, which causes a significant problem for repowise: reasoning tokens consume the entire output budget, leaving no tokens for the actual response content. This means Z.AI support requires handling the thinking toggle as a first-class concern, not just wiring up another OpenAI-compatible endpoint.

API Details

Endpoints

Z.AI has two API plans with different base URLs:

| Plan | Base URL | Billing |
|------|----------|---------|
| Coding | `https://api.z.ai/api/coding/paas/v4` | Subscription (resource package) |
| General | `https://api.z.ai/api/paas/v4` | Pay-as-you-go |

Both endpoints use identical request/response formats. The differences are purely commercial -- which models are available and how you're charged.

Note: Based on how other tools handle this (e.g., separate provider entries for different OpenRouter plans or Anthropic AWS vs direct), the maintainers may prefer to split these into two separate provider entries (e.g., zai-coding and zai) rather than a single configurable provider. I'll leave that decision to you.

Available Models

Tested and confirmed working (coding plan):

  • glm-5-turbo
  • glm-5.1
  • glm-5
  • glm-4.7

Authentication

Authentication is a bearer token in the `Authorization` header. The API is OpenAI-compatible and works with the openai Python SDK's `AsyncOpenAI` client.
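As a minimal illustration of the auth scheme, here is a stdlib-only sketch that builds the raw HTTP request. `build_request` is a hypothetical helper (not part of repowise), and the `/chat/completions` path is assumed from the OpenAI-compatible format shown in the curl example below:

```python
import json
import urllib.request

# Coding-plan base URL from the endpoints table above.
BASE_URL = "https://api.z.ai/api/coding/paas/v4"

def build_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build a chat-completions POST with bearer-token auth (illustrative)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # bearer token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In practice the provider would use `AsyncOpenAI(api_key=..., base_url=...)` instead, which sets the same header internally.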

Reasoning Model Handling (Critical)

The GLM-5 family has reasoning/thinking enabled by default. This means every token in the output budget goes to chain-of-thought reasoning first, and content tokens only appear after reasoning completes. For repowise's structured generation prompts, this wastes 85-95% of the token budget on invisible thinking.

The fix: Z.AI supports disabling thinking via the thinking parameter in the request body:

```json
{
  "model": "glm-5.1",
  "messages": [...],
  "thinking": { "type": "disabled" }
}
```

This produces clean, direct output with zero reasoning token overhead:

```shell
$ curl https://api.z.ai/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -d '{"model":"glm-5.1","messages":[{"role":"user","content":"Explain DHT in one sentence."}],"max_tokens":200,"thinking":{"type":"disabled"}}'

# Response: 35 completion tokens, 0 reasoning tokens
# "A Distributed Hash Table (DHT) is a decentralized storage system..."
```

Without thinking.type=disabled, the same request produces 469 reasoning tokens and only 37 content tokens -- a 92% overhead.

Note: Other Z.AI model families (GLM-4, self-hosted via vLLM/SGLang) use a different parameter: chat_template_kwargs: { enable_thinking: false }. The provider should default to the thinking parameter style since that's what the Z.AI hosted API expects, but could potentially support both.
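If the provider does end up supporting both styles, the toggle could be centralized in one small helper. This is a hypothetical sketch (names are illustrative, not from the repowise codebase) mapping a single boolean onto the two request-body shapes described above:

```python
def thinking_params(enabled: bool, style: str = "zai") -> dict:
    """Return extra request-body fields for the given API style.

    style="zai"  -> hosted Z.AI API:        {"thinking": {"type": ...}}
    style="vllm" -> self-hosted vLLM/SGLang: chat_template_kwargs
    """
    if style == "zai":
        return {"thinking": {"type": "enabled" if enabled else "disabled"}}
    if style == "vllm":
        return {"chat_template_kwargs": {"enable_thinking": enabled}}
    raise ValueError(f"unknown thinking style: {style!r}")
```

Defaulting `style` to the hosted `"zai"` shape matches the recommendation above.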

Proposed Implementation

Architecture

```python
class ZAIProvider(BaseProvider):
    """Z.AI (Zhipu AI) chat provider.

    Uses the OpenAI-compatible API with thinking disabled by default
    for efficient structured output generation.
    """

    _CODING_BASE_URL = "https://api.z.ai/api/coding/paas/v4"
    _GENERAL_BASE_URL = "https://api.z.ai/api/paas/v4"
    _DEFAULT_MODEL = "glm-5.1"
```
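To make the request shape concrete, here is a hypothetical sketch of how the provider might assemble its `chat.completions.create` kwargs. repowise's actual `BaseProvider` interface may differ; the point is that the thinking toggle rides in `extra_body`, which the openai SDK merges into the top level of the request JSON:

```python
class ZAIProviderSketch:
    """Illustrative stand-in for the proposed ZAIProvider (interface assumed)."""

    _CODING_BASE_URL = "https://api.z.ai/api/coding/paas/v4"
    _DEFAULT_MODEL = "glm-5.1"

    def __init__(self, api_key: str, thinking: str = "disabled"):
        self.api_key = api_key
        self.thinking = thinking  # "enabled" or "disabled"

    def request_kwargs(self, messages: list, max_tokens: int = 1024) -> dict:
        return {
            "model": self._DEFAULT_MODEL,
            "messages": messages,
            "max_tokens": max_tokens,
            # openai SDK injects extra_body fields at the top level of the
            # JSON body, which is where Z.AI expects "thinking" to appear.
            "extra_body": {"thinking": {"type": self.thinking}},
        }
```

The kwargs dict would then be passed to `AsyncOpenAI(...).chat.completions.create(**kwargs)`.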

Environment Variables

| Variable | Purpose | Default |
|----------|---------|---------|
| `ZAI_API_KEY` | API key for authentication | (required) |
| `ZAI_BASE_URL` | Override API base URL | `https://api.z.ai/api/coding/paas/v4` |
| `ZAI_THINKING` | Enable/disable thinking | `disabled` |
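The resolution logic implied by the table could look like the following sketch (the function name is illustrative, and the defaults are the proposed ones, not existing repowise behavior):

```python
import os

def load_zai_config(env=None) -> dict:
    """Resolve Z.AI settings from environment variables (illustrative)."""
    env = os.environ if env is None else env
    api_key = env.get("ZAI_API_KEY")
    if not api_key:
        # Proposed behavior: the key is the only required setting.
        raise RuntimeError("ZAI_API_KEY is required")
    return {
        "api_key": api_key,
        # Coding-plan endpoint as the default, per the table above.
        "base_url": env.get("ZAI_BASE_URL", "https://api.z.ai/api/coding/paas/v4"),
        "thinking": env.get("ZAI_THINKING", "disabled"),
    }
```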

Files to Create/Modify

  • repowise/core/providers/llm/zai.py -- New provider class
  • repowise/core/providers/llm/registry.py -- Register "zai" provider
  • repowise/core/providers/llm/__init__.py -- Update docstring
  • repowise/core/rate_limiter.py -- Add PROVIDER_DEFAULTS["zai"]
  • repowise/cli/helpers.py -- Add ZAI_API_KEY auto-detection
  • repowise/server/provider_config.py -- Add to server provider catalog (if applicable)

Key Design Decisions

  1. Thinking disabled by default. The primary use case for repowise is structured generation, where reasoning tokens are pure waste. The provider should default to thinking: { type: "disabled" } on every request.

  2. Configurable thinking mode. Allow users to enable thinking via ZAI_THINKING=enabled if they want to use a reasoning model's full capability (e.g., for complex architectural decision extraction).

  3. Uses openai package dependency. Same as the ollama provider -- no new dependencies needed since Z.AI's API is OpenAI-compatible.

  4. extra_body for non-standard params. The thinking parameter is passed via the openai SDK's extra_body kwarg, which injects it as a top-level field in the JSON request body.

Test Results

Verified working with a local patch using the openai provider + extra_body:

Provider: openai / glm-5.1 (via Z.AI Coding API)
Phase 1: Ingestion -- 278 files ingested (works, no LLM needed)
Phase 2: Decision extraction -- tested inline marker structuring
Phase 3: Generation -- symbol_spotlight pages generated successfully

Compared to a reasoning model (qwopus3.5-27b-v3 via vLLM) on the same codebase:

  • Reasoning model: 0/24 file_pages succeeded (all timed out, 100% failure)
  • GLM-5.1 with thinking disabled: direct output, zero reasoning overhead

API Documentation


Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
