awesome-generative-ai

Speech-to-Text (STT) Models

A curated collection of open-source and production-ready STT models, libraries, and tools for offline and real-time transcription.


Table of Contents

Whisper-Based Models
  OpenAI Whisper
  faster-whisper
  WhisperX

Traditional and Real-Time STT Engines
  Kaldi
  Vosk API
  DeepSpeech (Mozilla)

PyTorch-Based Frameworks
  SpeechBrain
  RealtimeSTT

Lightweight and Embedded STT
  Silero Models
  Moonshine
  sherpa-onnx

Utility Libraries
  speech_recognition (Python)
  annyang (JS)
  react-native-voice


Selection Guide

Use Case                 Recommended Model   Why
General transcription    OpenAI Whisper      High accuracy, multilingual
Production deployment    faster-whisper      Faster inference, lower memory use
Real-time applications   Vosk API            Low latency, offline
Research projects        SpeechBrain         Comprehensive toolkit
Mobile/Edge devices      Silero Models       Lightweight, efficient
Web applications         annyang             Browser integration
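The selection guide above is a fixed mapping from use case to tool, so it can be encoded directly as a lookup, for example to pick a default engine in a setup script. The sketch below is a minimal illustration in plain Python (3.9+, no STT libraries required); `SELECTION_GUIDE` and `recommend` are hypothetical names, and the entries are copied from the table rather than queried from any library.

```python
# Hypothetical helper: encodes the Selection Guide table as a lookup.
# Keys are the table's use cases (normalized to lowercase); values are
# the recommended tools exactly as listed in the table.
SELECTION_GUIDE: dict[str, str] = {
    "general transcription": "OpenAI Whisper",
    "production deployment": "faster-whisper",
    "real-time applications": "Vosk API",
    "research projects": "SpeechBrain",
    "mobile/edge devices": "Silero Models",
    "web applications": "annyang",
}


def recommend(use_case: str) -> str:
    """Return the recommended STT tool for a use case from the table.

    Matching is case-insensitive and ignores surrounding whitespace.
    Raises KeyError listing the known use cases if there is no match.
    """
    key = use_case.strip().lower()
    try:
        return SELECTION_GUIDE[key]
    except KeyError:
        known = ", ".join(sorted(SELECTION_GUIDE))
        raise KeyError(f"unknown use case {use_case!r}; known: {known}") from None


if __name__ == "__main__":
    print(recommend("Real-time applications"))  # Vosk API
    print(recommend("  mobile/edge devices "))  # Silero Models
```

Keeping the mapping as data rather than a chain of if/else statements makes it easy to update when the table changes, and the KeyError message lists the known use cases so a typo is easy to spot.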

Additional Resources


Tip: Choose a model based on your deployment target (server, mobile, or edge) and the languages you need to support.