Overview
Whisper-1 is OpenAI’s speech recognition model that accurately transcribes audio to text, supporting multiple languages and audio formats.
Supported Features
Multi-language Recognition - Automatic recognition of 99+ languages
Timestamps - Optional word-level or segment-level timestamps
Speaker Separation - Distinguishes different speakers (beta feature)
Format Conversion - Supports multiple output formats
Request Parameters
file (required) - Audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. Maximum file size: 25 MB.
language (optional) - Language code (ISO-639-1) of the audio, e.g. en or zh. Specifying the language improves accuracy and speed.
prompt (optional) - Text to guide the model's style or supply proper nouns and technical terms that appear in the audio.
response_format (optional) - Output format:
json - JSON format (default)
text - Plain text
srt - SubRip subtitle format
verbose_json - Detailed JSON with timestamps
vtt - WebVTT subtitle format
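For example, requesting srt output returns the subtitle text directly, so it can be written straight to a file. A minimal sketch (the client setup mirrors the request example below; the audio.srt filename is illustrative):

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

# Request subtitles directly; for srt/vtt/text the SDK returns the raw text
# rather than a JSON object.
with open("audio.mp3", "rb") as audio_file:
    srt_output = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

# Write the subtitles to disk (str() is a defensive cast in case the SDK
# wraps the response).
with open("audio.srt", "w", encoding="utf-8") as f:
    f.write(str(srt_output))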
Request Example
import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

# Basic transcription
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en"
    )

print(transcription.text)
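To get word-level timestamps, request verbose_json. A minimal sketch, reusing the client above and assuming the endpoint passes through the standard timestamp_granularities parameter:

# Word-level timestamps (assumes timestamp_granularities is supported
# by this endpoint)
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )

# Each entry carries the word plus its start/end times in seconds;
# the "or []" guards against the parameter being ignored upstream.
for word in transcription.words or []:
    print(f"{word.start:.2f}s-{word.end:.2f}s  {word.word}")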
Accuracy Improvement: Specifying the correct language code and including proper nouns or technical terms in the prompt parameter can significantly improve transcription accuracy.
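A minimal sketch of that tip, reusing the client from the example above (the filename and prompt text are illustrative):

# Guide spelling of proper nouns and jargon with the prompt parameter
with open("meeting.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
        prompt="Kubernetes, PostgreSQL, OAuth 2.0, Whisper"
    )

print(transcription.text)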