decibri captures audio. These integrations process it. Use the table below to find the right one for your use case.
| Use case | Integration | Latency | Cost | Offline |
|---|---|---|---|---|
| Real-time local STT | Sherpa-ONNX | Low | Free | Yes |
| High-accuracy local STT | Whisper.cpp | Medium | Free | Yes |
| Wake word detection | Sherpa-ONNX KWS | Low | Free | Yes |
| Voice activity detection | Sherpa-ONNX VAD | Low | Free | Yes |
| Real-time cloud STT | Deepgram | Low | Pay-per-use (free tier) | No |
| Real-time cloud STT | AssemblyAI | Low | Pay-per-use | No |
| Real-time cloud STT | OpenAI Realtime | Low | Pay-per-use | No |
| Real-time cloud STT | Mistral Voxtral | Low | Pay-per-use | No |
| Real-time cloud STT | AWS Transcribe | Low | Pay-per-use (free tier) | No |
| Real-time cloud STT | Google Speech-to-Text | Low | Pay-per-use (free tier) | No |
| Real-time cloud STT | Azure Speech-to-Text | Low | Pay-per-use (free tier) | No |
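For illustration, the routing in the table can be encoded as a small lookup. The names and structure below are assumptions for the sketch, not part of the decibri API:

```python
# Hypothetical helper: map a use case to the integration suggested by the
# table above. All identifiers here are illustrative, not decibri APIs.

INTEGRATIONS = {
    "realtime-local-stt": "Sherpa-ONNX",
    "high-accuracy-local-stt": "Whisper.cpp",
    "wake-word": "Sherpa-ONNX KWS",
    "vad": "Sherpa-ONNX VAD",
    "realtime-cloud-stt": [
        "Deepgram", "AssemblyAI", "OpenAI Realtime", "Mistral Voxtral",
        "AWS Transcribe", "Google Speech-to-Text", "Azure Speech-to-Text",
    ],
}

def pick_integration(use_case: str) -> str:
    """Return the integration for a use case (first option if several)."""
    choice = INTEGRATIONS[use_case]
    return choice[0] if isinstance(choice, list) else choice
```

For example, `pick_integration("wake-word")` returns `"Sherpa-ONNX KWS"`.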
Local integrations (Sherpa-ONNX, Whisper.cpp) run entirely on-device. No API key, no network, no usage fees. Audio never leaves the machine. Trade-off: you supply the compute and manage the model files.
Cloud integrations (Deepgram, AssemblyAI, OpenAI, Mistral, AWS Transcribe, Google Speech-to-Text, Azure Speech-to-Text) stream audio to an external API. Higher accuracy on some benchmarks, no local GPU required, and managed model updates. Trade-off: you need an API key and network connectivity, and usage is billed per use.
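The trade-offs above boil down to a simple decision: if there is no network, no API key, or audio must not leave the machine, a local integration is the only fit; otherwise cloud is an option. A minimal sketch of that logic, with all names assumed for illustration:

```python
# Illustrative decision helper based on the local/cloud trade-offs
# described above. Not part of the decibri API.

def choose_backend(offline_required: bool, has_api_key: bool,
                   audio_must_stay_local: bool) -> str:
    """Pick 'local' or 'cloud' from the stated constraints."""
    # Local is mandatory when no network path or API key exists, or when
    # audio may not leave the machine.
    if offline_required or not has_api_key or audio_must_stay_local:
        return "local"  # Sherpa-ONNX / Whisper.cpp: free, on-device
    return "cloud"      # Deepgram etc.: streamed audio, pay-per-use
```

With an API key, a network, and no data-residency constraint, `choose_backend(False, True, False)` returns `"cloud"`.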
- Real-time local transcription
- Detect spoken keywords with sherpa-onnx
- Detect speech vs silence with Silero VAD