Overview
Whisper-1 is OpenAI’s speech recognition model that accurately transcribes audio to text, supporting multiple languages and audio formats.
Supported Features
Multi-language Recognition - Automatic recognition of 99+ languages
Timestamps - Optional word-level or segment-level timestamps
Speaker Separation - Distinguishes different speakers (beta feature)
Format Conversion - Supports multiple output formats
Request Parameters
file (required) - Audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. Maximum file size: 25 MB.
language (optional) - Language code (ISO-639-1) of the audio, e.g. en or zh. Specifying the language improves accuracy and speed.
prompt (optional) - Text to guide the model's style or supply proper nouns and technical terms that appear in the audio.
response_format (optional) - Output format:
json - JSON format (default)
text - Plain text
srt - SubRip subtitle format
verbose_json - Detailed JSON with timestamps
vtt - WebVTT subtitle format
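For example, requesting srt output returns the subtitle text directly, so it can be written straight to a file. A minimal sketch (the client setup mirrors the request example below; the audio.srt filename is illustrative):

import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

# Request subtitles directly; for srt/vtt/text the SDK returns the raw text
# rather than a JSON object.
with open("audio.mp3", "rb") as audio_file:
    srt_output = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

# Write the subtitles to disk (str() is a defensive cast in case the SDK
# wraps the response).
with open("audio.srt", "w", encoding="utf-8") as f:
    f.write(str(srt_output))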
Request Example
import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

# Basic transcription
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en"
    )

print(transcription.text)
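To get word-level timestamps, request verbose_json. A minimal sketch, reusing the client above and assuming the endpoint passes through the standard timestamp_granularities parameter:

# Word-level timestamps (assumes timestamp_granularities is supported
# by this endpoint)
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )

# Each entry carries the word plus its start/end times in seconds;
# the "or []" guards against the parameter being ignored upstream.
for word in transcription.words or []:
    print(f"{word.start:.2f}s-{word.end:.2f}s  {word.word}")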
Accuracy Improvement: Specifying the correct language code and including proper nouns or technical terms in the prompt parameter can significantly improve transcription accuracy.
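A minimal sketch of that tip, reusing the client from the example above (the filename and prompt text are illustrative):

# Guide spelling of proper nouns and jargon with the prompt parameter
with open("meeting.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
        prompt="Kubernetes, PostgreSQL, OAuth 2.0, Whisper"
    )

print(transcription.text)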