OpenAI Multimodal Responses API

POST https://ai.kaiho.cc/v1/chat/completions
curl --request POST \
  --url https://ai.kaiho.cc/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --data '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Draw a flowchart showing the user registration process"
    }
  ],
  "response_format": {
    "type": "multimodal"
  }
}'

Overview

The OpenAI Multimodal Responses API lets models generate responses that combine text and images in a single reply, which is useful for scenarios that call for visual explanations.

Supported Models

GPT-4o: most powerful multimodal model
GPT-4o-mini: efficient multimodal model

Enable Multimodal Output

Enable multimodal responses by setting the response_format parameter:

response_format (object): Response format configuration. Set its "type" field to "multimodal" to request mixed text-and-image output.
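
Concretely, the parameter is a small JSON object, and with it set the assistant message's content comes back as a list of typed parts ("text" and "image_url") rather than a single string, as the request example below shows:

# response_format fragment of the request body; the "multimodal" type
# value is taken from this page's request example.
response_format = {"type": "multimodal"}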

Request Example

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Draw a flowchart showing the user registration process"
        }
    ],
    response_format={"type": "multimodal"}
)

# Handle multimodal response
for item in response.choices[0].message.content:
    if item.type == "text":
        print(f"Text: {item.text}")
    elif item.type == "image_url":
        print(f"Image: {item.image_url.url}")
Billing: Multimodal responses are billed separately: generated text is charged by output token count, and images are charged by the number of images generated.
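
To see what a request will be billed for, you can inspect the usage object on the response together with the number of image parts; this sketch assumes the endpoint returns the standard chat-completions usage fields:

# Count image parts in the multimodal content list
image_count = sum(
    1 for item in response.choices[0].message.content
    if item.type == "image_url"
)
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Images generated: {image_count}")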