
Text-to-Speech Gallery

A collection of Text-to-Speech models ready to use with FastRTC. Click on the tags below to find the TTS model you're looking for!

Note

The model you want to use does not have to be in the gallery. The gallery is just a collection of models with a common interface that are easy to "plug and play" into your FastRTC app. You can use any model without any special setup: simply call it from your stream handler, as in the sketch below!
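
For example, a handler can call any synthesis function directly. In the sketch below, my_tts is a hypothetical function (not part of FastRTC) that returns a (sample_rate, audio_array) tuple:

    from fastrtc import Stream, ReplyOnPause
    from my_package import my_tts  # hypothetical: returns (sample_rate, np.ndarray)

    def speak(audio):
        # Produce the text to speak however you like (transcription, LLM, fixed string, ...)
        text = "Hello from FastRTC!"
        sample_rate, audio_array = my_tts(text)
        yield sample_rate, audio_array

    stream = Stream(ReplyOnPause(speak), mode="send-receive", modality="audio")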

  • 🗣👀 Orpheus.cpp


    Description: A llama.cpp port of Orpheus for fast lifelike speech synthesis on CPU!

    Install Instructions

    pip install orpheus-cpp
    

    Repository
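
    A quick usage sketch (the OrpheusCpp class name is an assumption; check the repository for the exact API). Since the model follows the TTSModel protocol described below, stream_tts_sync yields (sample_rate, chunk) tuples:

    from orpheus_cpp import OrpheusCpp  # class name assumed; see the repository

    tts_model = OrpheusCpp()
    for sample_rate, chunk in tts_model.stream_tts_sync("Hello from Orpheus!"):
        ...  # each chunk is a numpy audio array you can yield from a FastRTC handler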

  • 🗣👀 Your TTS Model


    Description

    Install Instructions

    Usage

    Demo

    Repository

How to add your own Text-to-Speech model

  1. Your model can be implemented in any framework you want, but it must implement the TTSModel protocol.

    from typing import AsyncGenerator, Generator, Protocol

    import numpy as np
    from numpy.typing import NDArray

    # TTSOptions refers to your model's own options class.
    class TTSModel(Protocol):
        def tts(
            self, text: str, options: TTSOptions | None = None
        ) -> tuple[int, NDArray[np.float32 | np.int16]]: ...

        async def stream_tts(
            self, text: str, options: TTSOptions | None = None
        ) -> AsyncGenerator[tuple[int, NDArray[np.float32 | np.int16]], None]: ...

        def stream_tts_sync(
            self, text: str, options: TTSOptions | None = None
        ) -> Generator[tuple[int, NDArray[np.float32 | np.int16]], None, None]: ...
    
    • The tts methods should take in the text to be spoken as a string and an optional TTSOptions object.

    • The audio tuple should be of the form (sample_rate, audio_array), where sample_rate is the sample rate of the audio and audio_array is a numpy array of the audio data with dtype np.int16 or np.float32. A minimal dummy implementation is sketched below.
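
    • For example, a toy implementation that satisfies the protocol might look like this (SineTTSModel and SineTTSOptions are made-up names for illustration; they are not part of FastRTC):

    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class SineTTSOptions:
        frequency: float = 440.0  # pitch of the tone in Hz
        duration: float = 1.0     # seconds of audio to generate

    class SineTTSModel:
        """Toy TTS model that emits a sine tone instead of real speech."""

        sample_rate = 24_000

        def tts(self, text, options=None):
            opts = options or SineTTSOptions()
            n_samples = int(self.sample_rate * opts.duration)
            t = np.arange(n_samples) / self.sample_rate
            audio = (0.2 * np.sin(2 * np.pi * opts.frequency * t)).astype(np.float32)
            return self.sample_rate, audio

        async def stream_tts(self, text, options=None):
            # Yield ~200 ms chunks so playback can start before synthesis finishes.
            sample_rate, audio = self.tts(text, options)
            step = int(sample_rate * 0.2)
            for start in range(0, len(audio), step):
                yield sample_rate, audio[start:start + step]

        def stream_tts_sync(self, text, options=None):
            sample_rate, audio = self.tts(text, options)
            step = int(sample_rate * 0.2)
            for start in range(0, len(audio), step):
                yield sample_rate, audio[start:start + step]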

  2. Once you have your model implemented, you can use it in your handler!

    import gradio as gr

    from fastrtc import Stream, AdditionalOutputs, ReplyOnPause, get_stt_model
    from your_model import YourModel, YourTTSOptions

    model = YourModel()  # implements the TTSModel protocol
    options = YourTTSOptions()  # implements the TTSOptions protocol
    stt_model = get_stt_model()  # speech-to-text model for transcribing the user's audio

    def echo(audio):
        text = stt_model.stt(audio)
        yield AdditionalOutputs(text)  # send the transcription to the textbox
        # echo is a sync generator, so use stream_tts_sync (use stream_tts in async handlers)
        for chunk in model.stream_tts_sync(text, options):
            yield chunk

    stream = Stream(ReplyOnPause(echo), mode="send-receive", modality="audio",
                    additional_outputs=[gr.Textbox(label="Transcription")],
                    additional_outputs_handler=lambda old, new: old + new)
    stream.ui.launch()
    
  3. Open a PR to add your model to the gallery! Ideally, your model package should be pip installable so others can try it out easily.