shipfeed: AI news, curated daily


Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI

OpenAI launches three audio models via its API, including gpt-4o-transcribe and gpt-4o-mini-tts with promptable prosody, alongside a semantic VAD update and audio support in its Agents SDK.

Mar 20 · primary fetch · 1 source · updated Mar 20

OpenAI has launched three new state-of-the-art audio models in its API, including gpt-4o-transcribe, a speech-to-text model that outperforms Whisper, and gpt-4o-mini-tts, a text-to-speech model with promptable prosody: a text prompt controls the timing and emotion of the generated speech. The Agents SDK now supports audio, which makes it possible to build voice agents. OpenAI also updated turn detection for real-time use with semantic voice activity detection (VAD), which works from the content of speech.
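To make "promptable prosody" concrete, here is a minimal sketch of what a request body for such a text-to-speech call might look like. Only the model name gpt-4o-mini-tts comes from the article. The helper name `build_tts_request`, the field names (`input`, `voice`, `instructions`), and the default voice `alloy` are assumptions for illustration and should be checked against OpenAI's API reference. The sketch only builds the payload and sends nothing over the network.

```python
import json


def build_tts_request(text: str, style: str, voice: str = "alloy") -> dict:
    """Return a JSON-serializable body for a text-to-speech request.

    Nothing is sent over the network; the caller decides how to POST it.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",  # model name taken from the article
        "input": text,               # assumed field: the words to speak
        "voice": voice,              # assumed field and default voice name
        "instructions": style,       # assumed field: the prosody prompt
    }


if __name__ == "__main__":
    body = build_tts_request(
        "Your order has shipped.",
        "Speak slowly, in a warm and reassuring tone.",
    )
    print(json.dumps(body, indent=2))
```

The point of the sketch is that the spoken text and the delivery style travel as separate fields, so the same sentence can be rendered with different timing or emotion by changing only the style prompt.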

Separately, OpenAI's o1-pro model is available to select developers with advanced features such as vision and function calling, though at higher compute cost. The community has responded enthusiastically to the audio releases, and a radio contest for TTS creations is underway. Meanwhile, Kokoro-82M v1.0 has emerged as a leading open-weights TTS model, with competitive pricing on Replicate.

read full article on news.smol.ai
§ sources · 1 publication
  1. news.smol.ai: Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI (primary)