POST https://ai.kaiho.cc/v1/audio/speech
TTS Text-to-Speech
curl --request POST \
  --url https://ai.kaiho.cc/v1/audio/speech \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "input": "<string>",
  "voice": "<string>",
  "speed": 1
}
'

Overview

The TTS (Text-to-Speech) API uses advanced speech synthesis technology to convert text into natural, fluent human voices, supporting multiple languages and voice options.

Supported Voices

  • Alloy - Neutral, balanced voice
  • Echo - Male, steady voice
  • Fable - Male, expressive voice
  • Onyx - Male, deep and powerful
  • Nova - Female, friendly and warm
  • Shimmer - Female, soft and sweet

Request Parameters

model (string, required)
TTS model:
  • tts-1 - Standard quality, fast speed
  • tts-1-hd - High definition quality, more natural

input (string, required)
Text to convert to speech, up to 4096 characters.

voice (string, required)
Voice to use: alloy, echo, fable, onyx, nova, shimmer

speed (number, default: 1)
Speech speed, range 0.25 - 4.0.
  • 1.0 = Normal speed
  • 0.5 = Half speed
  • 2.0 = Double speed
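
The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the helper name build_speech_request and the constant names are hypothetical, and the checks only mirror the constraints documented in this section.

```python
# Constraints copied from the parameter list above.
ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_request(model, text, voice, speed=1.0):
    """Return a JSON-ready request body, or raise ValueError on a documented limit.

    Hypothetical helper for illustration; the server remains the final authority.
    """
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input has {len(text)} characters, limit is {MAX_INPUT_CHARS}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed {speed} is outside {MIN_SPEED} - {MAX_SPEED}")
    return {"model": model, "input": text, "voice": voice, "speed": speed}
```

The returned dictionary has the same shape as the JSON body in the curl example, so it can be serialized and sent as is.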

Request Example

import openai
from pathlib import Path

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ai.kaiho.cc/v1"
)

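# Generate speech and stream the audio straight to a file.
# with_streaming_response replaces the deprecated response.stream_to_file
# call on a non-streaming response.
speech_file_path = Path("speech.mp3")
with client.audio.speech.with_streaming_response.create(
    model="tts-1-hd",
    voice="nova",
    input="Welcome to Kaihoxz text-to-speech service. We provide the most natural AI voice synthesis technology.",
    speed=1.0,
) as response:
    response.stream_to_file(speech_file_path)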

print(f"Speech saved to: {speech_file_path}")
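
Because input is limited to 4096 characters, longer texts have to be split across several requests. The sketch below shows one possible approach, assuming that breaking at sentence boundaries is acceptable for the content; the function name split_text and its fallback of hard-wrapping an overlong sentence are illustrative choices, not something the API prescribes.

```python
import re

MAX_INPUT_CHARS = 4096  # documented limit for the input parameter


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the input of its own request, with the resulting audio files played or joined in order.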

Voice Characteristics

Voice   | Gender  | Characteristics       | Use Cases
Alloy   | Neutral | Clear, professional   | News, education, customer service
Echo    | Male    | Steady, authoritative | Business, broadcasting
Fable   | Male    | Lively, expressive    | Storytelling, advertising
Onyx    | Male    | Deep, powerful        | Documentaries, serious content
Nova    | Female  | Friendly, warm        | Assistants, guidance
Shimmer | Female  | Soft, sweet           | Children’s content, casual scenarios
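
If an application picks a voice programmatically, the table above can be kept as a small lookup. This is only a sketch: the VOICES dictionary restates the table, and the helper voices_for is a hypothetical name introduced for illustration.

```python
# Voice metadata restated from the Voice Characteristics table.
VOICES = {
    "alloy": {"gender": "neutral", "use_cases": ["news", "education", "customer service"]},
    "echo": {"gender": "male", "use_cases": ["business", "broadcasting"]},
    "fable": {"gender": "male", "use_cases": ["storytelling", "advertising"]},
    "onyx": {"gender": "male", "use_cases": ["documentaries", "serious content"]},
    "nova": {"gender": "female", "use_cases": ["assistants", "guidance"]},
    "shimmer": {"gender": "female", "use_cases": ["children's content", "casual scenarios"]},
}


def voices_for(use_case):
    """Return the voice names whose listed use cases include `use_case` (case-insensitive)."""
    wanted = use_case.lower()
    return [name for name, info in VOICES.items() if wanted in info["use_cases"]]
```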
Usage Guidelines: Please ensure that generated speech content complies with local laws and regulations. It must not be used for fraud, impersonation, or other illegal purposes.
Performance Optimization: For real-time applications, use the tts-1 model with opus format to achieve the lowest latency.
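
The low-latency tip can be sketched as follows. This page does not document how the output format is selected, so the response_format parameter here is an assumption borrowed from the OpenAI Python SDK, which accepts response_format="opus" on client.audio.speech.create; whether this endpoint honors it should be verified. The function name stream_speech and the default output path are hypothetical.

```python
# Assumption: the endpoint accepts the OpenAI-style response_format parameter.
LOW_LATENCY_PARAMS = {"model": "tts-1", "response_format": "opus"}


def stream_speech(client, text, voice="alloy", out_path="speech.opus", chunk_size=4096):
    """Write audio to out_path chunk by chunk as it arrives; return the bytes written.

    `client` is expected to be an openai.OpenAI instance configured with
    base_url="https://ai.kaiho.cc/v1", as in the request example.
    """
    written = 0
    with client.audio.speech.with_streaming_response.create(
        input=text, voice=voice, **LOW_LATENCY_PARAMS
    ) as response, open(out_path, "wb") as audio_file:
        # iter_bytes yields data as soon as the server sends it, so a player
        # could consume these chunks instead of waiting for the full file.
        for chunk in response.iter_bytes(chunk_size=chunk_size):
            audio_file.write(chunk)
            written += len(chunk)
    return written
```

In a real-time application the chunks would typically be handed to an audio player or a network socket rather than a file; the file here only keeps the sketch self-contained.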