General Chat API

POST https://ai.kaiho.cc/v1/chat/completions
curl --request POST \
  --url https://ai.kaiho.cc/v1/chat/completions \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "messages": [
    {
      "role": "<string>",
      "content": "<string>"
    }
  ],
  "temperature": 1,
  "max_tokens": 1024,
  "stream": false
}
'

Overview

The General Chat API provides a unified interface to access multiple large language models, fully compatible with the OpenAI Chat Completions API format.
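
Because the endpoint mirrors the OpenAI format, any OpenAI-compatible client can be pointed at it by overriding the base URL. A minimal sketch using the official openai Python SDK (YOUR_API_KEY is a placeholder; pick any model from the list below):

from openai import OpenAI

# Point the standard OpenAI client at this endpoint
client = OpenAI(
    base_url="https://ai.kaiho.cc/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)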

Supported Models

GPT Series

  • gpt-4o
  • gpt-4o-mini
  • gpt-4-turbo
  • gpt-3.5-turbo

Claude Series

  • claude-3-5-sonnet
  • claude-3-opus
  • claude-3-sonnet
  • claude-3-haiku

Gemini Series

  • gemini-2.0-flash
  • gemini-1.5-pro
  • gemini-1.5-flash

Other Models

  • deepseek-chat
  • qwen-max
  • glm-4

Request Parameters

model (string, required)
Model ID to use. Any model from the list above is supported.

messages (array, required)
Array of messages forming the conversation history.

temperature (number, default: 1)
Sampling temperature in the range 0-2. Higher values (e.g. 0.8) make output more random; lower values (e.g. 0.2) make it more deterministic.

max_tokens (integer)
Maximum number of tokens to generate.

stream (boolean, default: false)
Whether to enable streaming output.

Request Example

curl https://ai.kaiho.cc/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user",
        "content": "What is machine learning?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'

Response Format

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a branch of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 150,
    "total_tokens": 170
  }
}
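
If you prefer plain HTTP over an SDK, the fields above can be read directly from the JSON body. A minimal sketch using Python's requests library (YOUR_API_KEY is a placeholder):

import requests

resp = requests.post(
    "https://ai.kaiho.cc/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "What is machine learning?"}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()

# The generated text lives in choices[0].message.content;
# token accounting is under usage.
print(data["choices"][0]["message"]["content"])
print("tokens used:", data["usage"]["total_tokens"])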

Streaming Output

Enable stream: true for real-time responses:
from openai import OpenAI

client = OpenAI(
    base_url="https://ai.kaiho.cc/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

# Each chunk carries an incremental delta; print tokens as they arrive
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
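
The exact wire format depends on the gateway, but OpenAI-compatible endpoints typically deliver the stream as server-sent events: each data: line carries a chat.completion.chunk object whose delta holds the newly generated text, and the stream ends with data: [DONE]. Assuming this endpoint follows that convention, the raw stream looks roughly like:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

data: [DONE]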
Best Practice: Use temperature to control how creative the output is: lower values (0.2-0.5) work well for factual content, higher values (0.7-1.0) for creative writing.