Welcome to the Decibri documentation

This documentation helps you get up and running with decibri and integrate it into your voice and audio applications. Decibri is a cross-platform audio capture and output library available as a Python package, a Node.js package, a Rust library, a browser-side AudioWorklet, and a standalone CLI. The Rust core is built on cpal and is distributed with pre-built binaries. One audio engine, five ways to use it.

Getting started

Install decibri, capture your first PCM chunk, and build real-time voice applications.
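
As a rough illustration of what handling a captured chunk can look like, the sketch below decodes raw PCM bytes into samples and computes a level. It assumes 16-bit signed little-endian mono audio, which this page does not confirm, and the helpers decode_pcm16 and rms_level are hypothetical names for this example, not part of decibri's API.

```python
import math
import struct


def decode_pcm16(chunk: bytes) -> list[int]:
    """Decode 16-bit signed little-endian PCM bytes into integer samples.

    Assumption: the chunk is 16-bit LE mono. Check the decibri docs for
    the actual chunk format before relying on this.
    """
    count = len(chunk) // 2  # two bytes per sample; a trailing odd byte is ignored
    return list(struct.unpack(f"<{count}h", chunk[: count * 2]))


def rms_level(samples: list[int]) -> float:
    """Return the root-mean-square level, normalized by full scale (32768)."""
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


# Hypothetical chunk: three samples (zero, max positive, max negative).
chunk = b"\x00\x00\xff\x7f\x00\x80"
samples = decode_pcm16(chunk)
print(samples, rms_level(samples))
```

In a real application the chunk would come from the microphone binding instead of a literal, and a level like this is typically used for a meter or a simple silence check.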

Audio processing

Decibri can condition microphone audio inside the engine, before it reaches your code. Every stage is off by default, so a plain capture is unchanged until you opt in.

Audio Capture Engine (ACE)

The built-in conditioning chain: DC removal, denoise, high-pass filter, automatic gain control, and limiter. Keyless, local, and available across Python, Node.js, and Rust.
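
To make the idea of an opt-in conditioning chain concrete, here is a minimal conceptual sketch in plain Python. It is not decibri's implementation or API: the stage functions are hypothetical, denoise and high-pass filtering are omitted, the gain stage is a crude peak normalizer standing in for automatic gain control, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
def remove_dc(samples):
    """Subtract the block mean so the signal is centered on zero."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def normalize_peak(samples, target_peak=0.5):
    """Scale the block so its peak reaches target_peak (crude stand-in for AGC)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def limit(samples, ceiling=1.0):
    """Hard-clip samples to the range [-ceiling, ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def condition(samples, stages=()):
    """Run samples through the enabled stages in order.

    With no stages enabled, the input passes through unchanged.
    """
    for stage in stages:
        samples = stage(samples)
    return samples


block = [0.3, 0.5, 0.7]
print(condition(block))                                 # no stages: unchanged
print(condition(block, [remove_dc, normalize_peak, limit]))
```

The empty default mirrors the behavior described above: nothing is applied until a stage is explicitly enabled, and the order of enabled stages matters because each one operates on the previous stage's output.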

Integrations

Decibri ships audio to four kinds of integrations: speech-to-text, text-to-speech, voice activity detection, and keyword spotting. Nine STT providers are supported today (seven cloud, two local), plus a local TTS provider, built-in VAD, and a local KWS engine.

Speech-to-text (STT)

AssemblyAI

Cloud speech-to-text with AssemblyAI's Universal-3 Pro streaming model.

AWS Transcribe

Cloud speech-to-text with AWS Transcribe streaming over HTTP/2.

Azure AI Speech

Cloud speech-to-text with Azure AI Speech using PushAudioInputStream.

Deepgram

Cloud speech-to-text with Deepgram's Nova-3 model over WebSocket.

Google Cloud Speech-to-Text

Cloud speech-to-text with Google Cloud Speech-to-Text. The simplest integration: one pipe() call.

Mistral Voxtral

Cloud speech-to-text with Mistral's Voxtral open-weights model (Apache 2.0).

OpenAI

Stream audio to OpenAI's Realtime API for cloud speech-to-text. 24 kHz sample rate, raw WebSocket.

Sherpa-ONNX

Real-time local transcription with a streaming Zipformer model. No API key, no network.

Whisper.cpp

Local buffered transcription with OpenAI's Whisper model via a native addon. No API key, no network.

Text-to-speech (TTS)

Sherpa-ONNX

Local text-to-speech with the Kokoro model. Generate speech offline and play it through decibri's Speaker. No API key, no network.

Voice activity detection (VAD)

Silero VAD

Neural voice activity detection. The Silero v5 ONNX model is bundled with decibri.

Keyword spotting (KWS)

Sherpa-ONNX Keyword Spotting

Detect spoken keywords and wake phrases locally. BPE-based keyword spotter, no cloud, no API key.

APIs

Decibri is exposed through four runtime bindings that share the same audio backend and chunk semantics.

Python API

Sync and async Microphone and Speaker, voice activity detection, asyncio support, and an exception hierarchy. Supports Python 3.10 and above.

Node.js API

Rust native addon via napi-rs. Provides a Readable stream and standard Node.js events.

Browser API

AudioWorklet implementation via conditional exports. Same npm package, same API shape, async permission flow.

CLI API

Standalone statically-linked binary: decibri-cli. Capture WAV, play WAV, enumerate audio devices.