Integrations

Decibri connects microphone and speaker audio to speech integrations. Speech-to-text (STT), voice activity detection (VAD), and keyword spotting (KWS) consume decibri's capture stream. Text-to-speech (TTS) runs in the other direction: it generates speech that decibri plays through the speaker. Pick a category, then pick the provider that fits your accuracy, cost, and offline requirements.

Speech-to-text (STT)

Transcribe microphone audio. Cloud providers (Deepgram, AssemblyAI, OpenAI, AWS Transcribe, Azure AI Speech, Google Cloud Speech-to-Text, Mistral Voxtral) and local providers (Sherpa-ONNX, Whisper.cpp) are supported.

Voice activity detection (VAD)

Detect when someone is speaking. Use the Silero v5 neural model for accuracy-critical use cases, or the built-in RMS detector for simple pipelines. Both are bundled with decibri.
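
To make the RMS approach concrete, here is a minimal, generic sketch of energy-based detection over 16-bit PCM frames. It is not decibri's API or its actual detector: the function names `rmsLevel` and `isSpeech`, the 0.02 threshold, and the 20 ms frame at 16 kHz are all assumptions chosen for illustration.

```typescript
// Generic RMS (energy) voice activity sketch. All names and values here are
// hypothetical; decibri's built-in detector may work differently.

// Root-mean-square level of a frame of signed 16-bit PCM samples,
// normalised to the range 0..1.
function rmsLevel(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const x = frame[i] / 32768; // scale a sample to -1..1
    sumSquares += x * x;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Assumed threshold; tune it for the microphone and the room.
const SPEECH_THRESHOLD = 0.02;

// A frame counts as speech when its RMS level reaches the threshold.
function isSpeech(frame: Int16Array, threshold: number = SPEECH_THRESHOLD): boolean {
  return rmsLevel(frame) >= threshold;
}

// Two synthetic 20 ms frames at an assumed 16 kHz sample rate.
const silence = new Int16Array(320); // all zeros
const tone = new Int16Array(320);    // 440 Hz sine at about a quarter of full scale
for (let i = 0; i < tone.length; i++) {
  tone[i] = Math.round(8000 * Math.sin((2 * Math.PI * 440 * i) / 16000));
}

console.log(isSpeech(silence)); // false
console.log(isSpeech(tone));    // true
```

A fixed energy threshold like this is cheap but reacts to any loud sound, not only speech, which is why a neural model is the better fit when accuracy matters.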

Keyword spotting (KWS)

Detect wake words and voice command triggers. This is cheaper than running full STT when you only need to catch specific phrases. Sherpa-ONNX runs entirely on-device.

Text-to-speech (TTS)

Generate speech and play it through decibri's Speaker. Sherpa-ONNX runs the Kokoro model entirely on-device, with no API key and no network dependency.
