
StreamSpeech

StreamSpeech is an “All in One” seamless model for ASR, translation, and TTS.

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Highlights:

1. StreamSpeech achieves SOTA performance on both offline and simultaneous speech-to-speech translation.
2. StreamSpeech performs streaming ASR, simultaneous speech-to-text translation and simultaneous speech-to-speech translation via an "All in One" seamless model.
3. StreamSpeech can present intermediate results (i.e., ASR or translation results) during simultaneous translation, offering a more comprehensive low-latency communication experience.

Supports 8 tasks:

- Offline: Speech Recognition (ASR)✅, Speech-to-Text Translation (S2TT)✅, Speech-to-Speech Translation (S2ST)✅, Speech Synthesis (TTS)✅
- Simultaneous: Streaming ASR✅, Simultaneous S2TT✅, Simultaneous S2ST✅, Real-time TTS✅ under any latency (with one model)
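
To make the idea of intermediate results during simultaneous translation more concrete, here is a minimal toy sketch of a chunk-by-chunk loop that yields a partial ASR hypothesis and a partial translation after every chunk of input. It does not use the StreamSpeech code or API: `ToyStreamingModel`, `PartialResult`, `chunk_audio`, and `simultaneous_translate` are hypothetical names invented for illustration, the "model" simply replays a fixed script of placeholder tokens, and the chunk size merely stands in for the latency setting.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class PartialResult:
    """Intermediate outputs available after one chunk of source speech."""
    chunk_index: int
    asr_text: str          # source-language hypothesis so far
    translation_text: str  # target-language hypothesis so far


def chunk_audio(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Split a waveform into fixed-size chunks (the last one may be shorter)."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


class ToyStreamingModel:
    """Hypothetical stand-in, NOT the StreamSpeech API.

    It ignores the audio and reveals one scripted token per chunk, so that
    the control flow of a streaming loop can be run end to end.
    """

    def __init__(self, source_tokens: List[str], target_tokens: List[str]):
        self.source_tokens = source_tokens
        self.target_tokens = target_tokens
        self.chunks_seen = 0

    def step(self, chunk: List[float]) -> Tuple[str, str]:
        """Consume one chunk and return the hypotheses accumulated so far."""
        self.chunks_seen += 1
        asr = " ".join(self.source_tokens[:self.chunks_seen])
        translation = " ".join(self.target_tokens[:self.chunks_seen])
        return asr, translation


def simultaneous_translate(
    model: ToyStreamingModel, samples: List[float], chunk_size: int
) -> Iterator[PartialResult]:
    """Yield intermediate ASR and translation results after every chunk."""
    for index, chunk in enumerate(chunk_audio(samples, chunk_size)):
        asr, translation = model.step(chunk)
        yield PartialResult(index, asr, translation)


if __name__ == "__main__":
    toy = ToyStreamingModel(["s1", "s2", "s3"], ["t1", "t2", "t3"])
    # Six dummy samples with chunk_size=2 give three chunks.
    for result in simultaneous_translate(toy, [0.0] * 6, chunk_size=2):
        print(result.chunk_index, "| ASR:", result.asr_text,
              "| translation:", result.translation_text)
```

In this sketch a smaller `chunk_size` means results are emitted more often, which loosely mirrors choosing a lower latency; a real system would also decide when enough source context has arrived before committing to target output.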