Speech-to-Text Gallery

A collection of Speech-to-Text models ready to use with FastRTC. Click on the tags below to find the STT model you're looking for!

Note

The model you want to use does not have to be in the gallery. This is just a collection of models with a common interface that are easy to "plug and play" into your FastRTC app. You can use any model you want without any special setup; simply use it from your stream handler!

  • 🗣👀 distil-whisper-FastRTC


    Description: Distil-Whisper from Hugging Face wrapped in a PyPI package for plug and play!

    Install Instructions

    pip install distil-whisper-fastrtc
    
    Use it the same way you would the native FastRTC STT model (see the usage sketch at the end of this entry)!

    Demo

    Repository
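
    Usage sketch: the get_stt_model import below is an assumption about the package's entry point, so check the distil-whisper-fastrtc README for the exact name before copying it.

    import numpy as np
    from distil_whisper_fastrtc import get_stt_model  # assumed entry point; see the package README

    model = get_stt_model()  # returns an object implementing the STTModel protocol

    # One second of silence as a stand-in for real microphone audio
    audio = (16000, np.zeros(16000, dtype=np.float32))
    print(model.stt(audio))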

  • 🗣👀 Kroko-ASR


    Description: Kroko-ASR is a lightweight STT model.

    Install Instructions

    pip install fastrtc-kroko
    
    Check out the fastRTC-Kroko docs for examples!

    Repository

  • 🗣👀 Your STT Model


    Description

    Install Instructions

    Usage

    Demo

    Repository

How to add your own STT model

  1. Your model can be implemented in any framework you want, but it must implement the STTModel protocol.

    class STTModel(Protocol):
        def stt(self, audio: tuple[int, NDArray[np.int16 | np.float32]]) -> str: ...
    
    • The stt method should take in an audio tuple (sample_rate, audio_array) and return a string of the transcribed text.

    • The audio tuple has the form (sample_rate, audio_array), where sample_rate is the sample rate in Hz and audio_array is a NumPy array of audio samples with dtype np.int16 or np.float32.
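
    For example, here is a minimal runnable sketch of a conforming model. It only reports the clip length instead of transcribing, so swap the body of stt() for a call into your actual ASR framework:

    import numpy as np
    from numpy.typing import NDArray

    class EchoLengthSTT:
        """Stub STTModel: reports the clip length instead of real transcription."""

        def stt(self, audio: tuple[int, NDArray[np.int16 | np.float32]]) -> str:
            sample_rate, audio_array = audio
            # Normalize int16 PCM to float32 in [-1, 1], the range most ASR backends expect
            if audio_array.dtype == np.int16:
                audio_array = audio_array.astype(np.float32) / 32768.0
            seconds = len(audio_array) / sample_rate
            return f"[{seconds:.1f}s of audio]"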

  2. Once you have your model implemented, you can use it in your handler!

    import gradio as gr
    from fastrtc import Stream, AdditionalOutputs, ReplyOnPause

    from your_model import YourModel

    model = YourModel()  # implements the STTModel protocol

    def echo(audio):
        text = model.stt(audio)
        yield AdditionalOutputs(text)

    stream = Stream(ReplyOnPause(echo), mode="send-receive", modality="audio",
                    additional_outputs=[gr.Textbox(label="Transcription")],
                    additional_outputs_handler=lambda old, new: old + new)
    stream.ui.launch()
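
    The additional_outputs_handler receives the textbox's current value and the newly yielded text, so the lambda above appends each new transcription to the running transcript.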
    
  3. Open a PR to add your model to the gallery! Ideally, your model package should be pip installable so others can try it out easily.