# Feature: Add Z.AI (Zhipu AI) provider support

## Summary
Add first-class support for Z.AI (Zhipu AI) as an LLM provider. Z.AI offers OpenAI-compatible APIs with two plan types and several capable models including the GLM-5 family.
## Motivation
Z.AI provides competitive models (GLM-5-turbo, GLM-5.1, GLM-4.7) at accessible pricing through both subscription and pay-as-you-go plans. Their API is OpenAI-compatible, making integration straightforward. However, the GLM-5 family models are reasoning models with thinking enabled by default, which causes a significant problem for repowise: reasoning tokens consume the entire output budget, leaving no tokens for the actual response content. This means Z.AI support requires handling the thinking toggle as a first-class concern, not just wiring up another OpenAI-compatible endpoint.
## API Details

### Endpoints

Z.AI has two API plans with different base URLs:
| Plan | Base URL | Billing |
|------|----------|---------|
| Coding | `https://api.z.ai/api/coding/paas/v4` | Subscription (resource package) |
| General | `https://api.z.ai/api/paas/v4` | Pay-as-you-go |
Both endpoints use identical request/response formats. The difference is purely billing -- which models are available and how you're charged.
Note: Based on how other tools handle this (e.g., separate provider entries for different OpenRouter plans or Anthropic AWS vs direct), the maintainers may prefer to split these into two separate provider entries (e.g., zai-coding and zai) rather than a single configurable provider. I'll leave that decision to you.
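If a single configurable provider is chosen, endpoint selection could look like the following sketch. The function name is illustrative, not an existing repowise API; the `ZAI_BASE_URL` override follows the environment variables proposed later in this issue.

```python
# Sketch: choose the Z.AI endpoint for a given plan, honoring the proposed
# ZAI_BASE_URL override. Both endpoints speak the same OpenAI-compatible
# protocol; only billing (and model availability) differs.
import os

ZAI_CODING_BASE_URL = "https://api.z.ai/api/coding/paas/v4"
ZAI_GENERAL_BASE_URL = "https://api.z.ai/api/paas/v4"

def resolve_base_url(plan: str = "coding") -> str:
    override = os.environ.get("ZAI_BASE_URL")
    if override:
        return override
    return ZAI_CODING_BASE_URL if plan == "coding" else ZAI_GENERAL_BASE_URL
```

Splitting into two provider entries instead would simply hard-code one constant per entry.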
### Available Models

Tested and confirmed working (coding plan):

- `glm-5-turbo`
- `glm-5.1`
- `glm-5`
- `glm-4.7`
### Authentication

Bearer token via the `Authorization` header. The API is OpenAI-compatible and works with the `openai` Python SDK's `AsyncOpenAI` client.
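Since authentication is a plain Bearer token, a raw HTTP request only needs standard headers. A minimal sketch (in practice, `AsyncOpenAI(api_key=..., base_url=...)` handles this automatically):

```python
# Sketch: headers for a raw HTTP call to the Z.AI API. The openai SDK
# builds these itself; this just shows there is nothing provider-specific.
def zai_headers(api_key: str) -> dict[str, str]:
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```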
## Reasoning Model Handling (Critical)
The GLM-5 family has reasoning/thinking enabled by default. This means every token in the output budget goes to chain-of-thought reasoning first, and content tokens only appear after reasoning completes. For repowise's structured generation prompts, this wastes 85-95% of the token budget on invisible thinking.
The fix: Z.AI supports disabling thinking via the `thinking` parameter in the request body:

```json
{
  "model": "glm-5.1",
  "messages": [...],
  "thinking": { "type": "disabled" }
}
```
This produces clean, direct output with zero reasoning-token overhead:

```shell
curl https://api.z.ai/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.1","messages":[{"role":"user","content":"Explain DHT in one sentence."}],"max_tokens":200,"thinking":{"type":"disabled"}}'
# Response: 35 completion tokens, 0 reasoning tokens
# "A Distributed Hash Table (DHT) is a decentralized storage system..."
```
Without `thinking: {"type": "disabled"}`, the same request produces 469 reasoning tokens and only 37 content tokens -- a 92% overhead.

Note: Other Z.AI model families (GLM-4, self-hosted via vLLM/SGLang) use a different parameter: `chat_template_kwargs: { "enable_thinking": false }`. The provider should default to the `thinking` parameter style, since that is what the Z.AI hosted API expects, but could potentially support both.
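The two parameter styles mentioned above could be abstracted behind a small helper. This is a sketch with illustrative names, not an existing repowise function:

```python
# Sketch: build the thinking-toggle fragment for either parameter style.
# "zai" is the hosted-API style; anything else falls back to the
# vLLM/SGLang chat_template_kwargs style used by self-hosted GLM-4.
def thinking_params(enabled: bool, style: str = "zai") -> dict:
    if style == "zai":
        return {"thinking": {"type": "enabled" if enabled else "disabled"}}
    return {"chat_template_kwargs": {"enable_thinking": enabled}}
```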
## Proposed Implementation

### Architecture

```python
class ZAIProvider(BaseProvider):
    """Z.AI (Zhipu AI) chat provider.

    Uses the OpenAI-compatible API with thinking disabled by default
    for efficient structured output generation.
    """

    _CODING_BASE_URL = "https://api.z.ai/api/coding/paas/v4"
    _GENERAL_BASE_URL = "https://api.z.ai/api/paas/v4"
    _DEFAULT_MODEL = "glm-5.1"
```
### Environment Variables

| Variable | Purpose | Default |
|----------|---------|---------|
| `ZAI_API_KEY` | API key for authentication | (required) |
| `ZAI_BASE_URL` | Override API base URL | `https://api.z.ai/api/coding/paas/v4` |
| `ZAI_THINKING` | Enable/disable thinking | `disabled` |
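Reading these variables might look like the following sketch (the function name is illustrative; the semantics follow the table above):

```python
# Sketch: load the proposed Z.AI provider config from the environment.
# ZAI_API_KEY is required; the other two fall back to the table defaults.
import os

def load_zai_config() -> dict:
    api_key = os.environ.get("ZAI_API_KEY")
    if not api_key:
        raise RuntimeError("ZAI_API_KEY environment variable is required")
    return {
        "api_key": api_key,
        "base_url": os.environ.get(
            "ZAI_BASE_URL", "https://api.z.ai/api/coding/paas/v4"),
        "thinking": os.environ.get("ZAI_THINKING", "disabled"),
    }
```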
### Files to Create/Modify

- `repowise/core/providers/llm/zai.py` -- New provider class
- `repowise/core/providers/llm/registry.py` -- Register the `"zai"` provider
- `repowise/core/providers/llm/__init__.py` -- Update docstring
- `repowise/core/rate_limiter.py` -- Add `PROVIDER_DEFAULTS["zai"]`
- `repowise/cli/helpers.py` -- Add `ZAI_API_KEY` auto-detection
- `repowise/server/provider_config.py` -- Add to the server provider catalog (if applicable)
### Key Design Decisions

1. **Thinking disabled by default.** The primary use case for repowise is structured generation, where reasoning tokens are pure waste. The provider should default to `thinking: { "type": "disabled" }` on every request.
2. **Configurable thinking mode.** Allow users to enable thinking via `ZAI_THINKING=enabled` if they want a reasoning model's full capability (e.g., for complex architectural decision extraction).
3. **Uses the `openai` package dependency.** Same as the ollama provider -- no new dependencies are needed since Z.AI's API is OpenAI-compatible.
4. **`extra_body` for non-standard params.** The `thinking` parameter is passed via the openai SDK's `extra_body` kwarg, which injects it as a top-level field in the JSON request body.
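The `extra_body` decision can be sketched as follows. `extra_body` is a real per-request kwarg in the openai Python SDK that merges extra fields into the top level of the request JSON; the helper name here is illustrative:

```python
# Sketch: compose kwargs for client.chat.completions.create(...) with the
# thinking toggle injected via extra_body. The SDK merges extra_body
# fields into the top-level JSON body, alongside model and messages.
def build_chat_kwargs(model: str, messages: list,
                      thinking: str = "disabled") -> dict:
    return {
        "model": model,
        "messages": messages,
        "extra_body": {"thinking": {"type": thinking}},
    }
```

The provider would then call `await client.chat.completions.create(**build_chat_kwargs(...))` against the Z.AI base URL.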
## Test Results

Verified working with a local patch using the openai provider + `extra_body`:

- Provider: `openai` / `glm-5.1` (via the Z.AI Coding API)
- Phase 1: Ingestion -- 278 files ingested (works; no LLM needed)
- Phase 2: Decision extraction -- tested inline marker structuring
- Phase 3: Generation -- symbol_spotlight pages generated successfully

Compared to a reasoning model (qwopus3.5-27b-v3 via vLLM) on the same codebase:

- Reasoning model: 0/24 file_pages succeeded (all timed out, 100% failure)
- GLM-5.1 with thinking disabled: direct output, zero reasoning overhead
## API Documentation