POST https://ai.kaiho.cc/v1/audio/transcriptions
Whisper-1 Audio Transcription
curl --request POST \
  --url https://ai.kaiho.cc/v1/audio/transcriptions \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --form 'file=@audio.mp3' \
  --form 'model=whisper-1' \
  --form 'language=en' \
  --form 'response_format=json'

Overview

Whisper-1 is OpenAI’s speech recognition model that accurately transcribes audio to text, supporting multiple languages and audio formats.

Supported Features

Multi-language Recognition

Supports automatic recognition of 99+ languages

Timestamps

Optional word-level or segment-level timestamps

Speaker Separation

Distinguishes between different speakers (beta feature)

Format Conversion

Supports multiple output formats

Request Parameters

file
file
required
Audio file to transcribe.
Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
Maximum file size: 25 MB
model
string
required
Model ID. Use whisper-1.
language
string
Language code (ISO-639-1) of the audio, e.g. en or zh.
Specifying the language improves accuracy and speed.
response_format
string
default:"json"
Output format:
  • json - JSON format
  • text - Plain text
  • srt - SubRip subtitle format
  • verbose_json - Detailed JSON with timestamps
  • vtt - WebVTT subtitle format
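
For segment-level timestamps, request verbose_json. The sketch below is a minimal example and assumes the endpoint mirrors the upstream OpenAI verbose_json schema (a segments list with start, end, and text fields); it uses the same base URL and a hypothetical audio.mp3 file, as in the request example below.

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

# Request segment-level timestamps via verbose_json
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

# Each segment carries start/end times in seconds
for segment in transcription.segments:
    print(f"[{segment.start:.2f} - {segment.end:.2f}] {segment.text}")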

Request Example

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

# Basic transcription
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en"
    )

print(transcription.text)
Accuracy Improvement: Specify the correct language code and include proper nouns or technical terms in the prompt to significantly improve transcription accuracy.
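
As a sketch of that tip, the example below passes proper nouns through a prompt. It assumes the upstream OpenAI prompt parameter is passed through by this endpoint; the terms shown are placeholders for vocabulary actually expected in your audio.

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
        # Proper nouns / technical terms expected in the audio
        prompt="Kubernetes, Terraform, Whisper-1"
    )

print(transcription.text)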