Silero VAD

Decibri ships Silero VAD v5 bundled. Set the VAD mode to 'silero' to enable neural voice activity detection with no extra dependencies, no separate model download, and no sherpa-onnx required. This page covers both Python and Node.js; each example appears first in Python, then in Node.js.

What this does

Silero v5 is a neural voice activity detection model. It runs locally in Rust via ONNX Runtime. No cloud API, no server, no fine-tuning required. The ~2.3 MB ONNX model ships inside the decibri package on both PyPI and npm.

Decibri offers two VAD modes out of the box: an RMS energy detector ('energy') that uses simple amplitude thresholding, and the Silero v5 neural model ('silero') that handles noisy environments, background music, and multi-speaker scenarios with higher accuracy. These are the only two modes.

When to use which: Use the RMS energy detector for clean-audio scenarios and minimal CPU overhead. Use 'silero' for noisy environments, background music, or any case where accuracy matters more than CPU budget.
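
To make "simple amplitude thresholding" concrete, here is a minimal sketch of an RMS energy check in plain Python. It is not decibri's implementation (that runs in Rust); the helper names rms_level and is_speech_by_energy are hypothetical, and normalising the level to the 0 to 1 range by dividing by 32768 is an assumption chosen so that a cutoff such as 0.01 is meaningful.

```python
import math
import struct


def rms_level(chunk: bytes) -> float:
    """Return the RMS level of little-endian int16 PCM, normalised to 0.0-1.0."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def is_speech_by_energy(chunk: bytes, threshold: float = 0.01) -> bool:
    """Hypothetical helper: flag a chunk whose RMS level reaches the threshold."""
    return rms_level(chunk) >= threshold


# Illustrative input: 10 ms of digital silence and 10 ms of a 440 Hz tone at 16 kHz.
silence = struct.pack("<160h", *([0] * 160))
tone = struct.pack(
    "<160h",
    *(int(12000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(160)),
)

print(is_speech_by_energy(silence), is_speech_by_energy(tone))
```

A check like this costs a handful of arithmetic operations per sample, which is why it suits clean audio on a tight CPU budget. It also fires on any loud sound, speech or not, which is the gap the neural model is meant to close.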

Prerequisites

Install Decibri

$ pip install decibri

The Silero v5 ONNX model is bundled in both the PyPI and npm packages. No separate download, no sherpa-onnx needed. The command above installs the Python package; for Node.js, install the decibri package from npm instead.

Code walkthrough

1. Create a Microphone with Silero VAD

Set the VAD mode to 'silero' to enable voice activity detection with the neural model instead of the RMS energy detector. The bare string shorthand uses the mode defaults: a 0.5 probability threshold for Silero and a 300 ms holdoff.

import decibri

mic = decibri.Microphone(sample_rate=16000, channels=1, vad="silero")

const { Microphone } = require('decibri');

const mic = new Microphone({ sampleRate: 16000, channels: 1, vad: 'silero' });

To tune the detection threshold or the holdoff, pass a Vad config object instead of the bare string. In Python that is decibri.Vad(model=, threshold=, holdoff_ms=); in Node.js it is a { model, threshold, holdoffMs } object. The 'energy' mode is configured the same way.

import decibri

# Override the threshold and holdoff with a Vad config object.
mic = decibri.Microphone(
    sample_rate=16000,
    channels=1,
    vad=decibri.Vad(model="silero", threshold=0.6, holdoff_ms=200),
)

const { Microphone } = require('decibri');

// Override the threshold and holdoff with a Vad config object.
const mic = new Microphone({
  sampleRate: 16000,
  channels: 1,
  vad: { model: 'silero', threshold: 0.6, holdoffMs: 200 },
});

2. React to speech and silence

The two bindings expose voice activity differently. In Node.js, decibri emits a 'speech' event when voice activity starts and a 'silence' event when audio stays below the threshold for the holdoff window (the holdoffMs field, default 300 ms). The latest score is available as mic.vadScore. In Python there are no events: read mic.is_speaking and mic.vad_score as properties in the chunk loop and detect the transitions yourself. Neither pattern interrupts the audio stream; chunks keep arriving whether or not speech is detected.

prev_speaking = False

with mic:
    for chunk in mic:
        if mic.is_speaking and not prev_speaking:
            print("[speech started]")
        elif not mic.is_speaking and prev_speaking:
            print("[speech ended]")
        prev_speaking = mic.is_speaking
        # chunk is int16 PCM bytes; mic.vad_score is the latest Silero probability

mic.on('speech', () => {
  console.log('[speech started]');
});

mic.on('silence', () => {
  console.log('[speech ended]');
});

mic.on('data', (chunk) => {
  // Audio arrives continuously. mic.vadScore is the latest Silero probability.
});
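
Because Python has no events, a common next step is to group chunks into utterances yourself. The sketch below shows one way to do that with the same edge-detection idea as the loop above. The collect_utterances helper is hypothetical and not part of decibri; it takes plain (chunk, speaking) pairs so it can run here without a microphone.

```python
from typing import Iterable, Iterator, Tuple


def collect_utterances(frames: Iterable[Tuple[bytes, bool]]) -> Iterator[bytes]:
    """Yield one bytes object per run of consecutive speaking frames."""
    buffer = bytearray()
    prev_speaking = False
    for chunk, speaking in frames:
        if speaking:
            buffer.extend(chunk)
        elif prev_speaking:
            # Falling edge: speech just ended, so emit what was buffered.
            yield bytes(buffer)
            buffer.clear()
        prev_speaking = speaking
    if buffer:
        # The stream ended while speech was still in progress.
        yield bytes(buffer)


# Stand-in data; real input would pair each chunk with the speaking flag.
frames = [(b"aa", False), (b"bb", True), (b"cc", True), (b"dd", False), (b"ee", True)]
utterances = list(collect_utterances(frames))
print(utterances)
```

With a live microphone, the input could be built as ((chunk, mic.is_speaking) for chunk in mic), using only the properties described above.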

3. Clean shutdown

Stop the microphone when the user presses Ctrl+C. In Python the with block stops capture on exit, so catching KeyboardInterrupt is enough. In Node.js, call mic.stop() from a SIGINT handler.

try:
    with mic:
        for chunk in mic:
            ...  # detect speech as shown above
except KeyboardInterrupt:
    print("\nStopping...")

process.on('SIGINT', () => {
  console.log('\nStopping...');
  mic.stop();
  process.exit(0);
});
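
The reason catching KeyboardInterrupt outside the with block is enough is that a context manager's exit hook runs before the exception reaches the except clause. The sketch below demonstrates that ordering with a stand-in class. FakeMic is hypothetical and only mimics the shape of a microphone object; it says nothing about how decibri implements its own cleanup.

```python
class FakeMic:
    """Stand-in for a microphone; not part of decibri."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.stopped = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when an exception propagates out of the block.
        self.stopped = True
        return False  # do not swallow the exception

    def __iter__(self):
        for chunk in self._chunks:
            if chunk is None:
                # Simulate the user pressing Ctrl+C mid-stream.
                raise KeyboardInterrupt
            yield chunk


mic = FakeMic([b"\x00\x00", None, b"\x00\x00"])
seen = []
interrupted = False
try:
    with mic:
        for chunk in mic:
            seen.append(chunk)
except KeyboardInterrupt:
    # By the time control arrives here, __exit__ has already run.
    interrupted = True

print(interrupted, mic.stopped, len(seen))
```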

Full example

import decibri

mic = decibri.Microphone(sample_rate=16000, channels=1, vad="silero")

print("Listening for speech... (Ctrl+C to stop)\n")
prev_speaking = False
try:
    with mic:
        for chunk in mic:
            if mic.is_speaking and not prev_speaking:
                print("[speech start]")
            elif not mic.is_speaking and prev_speaking:
                print("[speech end]")
            prev_speaking = mic.is_speaking
            # mic.vad_score is the latest Silero probability (0.0 to 1.0)
except KeyboardInterrupt:
    print("\nStopping...")

'use strict';

const { Microphone } = require('decibri');

const mic = new Microphone({ sampleRate: 16000, channels: 1, vad: 'silero' });

mic.on('speech', () => console.log('[speech start]'));
mic.on('silence', () => console.log('[speech end]'));

mic.on('error', (err) => console.error('Mic error:', err.message));

process.on('SIGINT', () => {
  console.log('\nStopping...');
  mic.stop();
  process.exit(0);
});

console.log('Listening for speech... (Ctrl+C to stop)\n');

Configuration options

Voice activity detection is off unless you enable it. There are two ways to turn it on.

Shorthand, default settings. Pass a mode name to enable VAD with that mode's default threshold and holdoff.

mic = decibri.Microphone(vad="silero")   # or "energy"

Object, your own values. Pass a config object to set the threshold and holdoff yourself.

mic = decibri.Microphone(
    vad=decibri.Vad(model="silero", threshold=0.6, holdoff_ms=200)
)

The config object's fields:

Field	Default	Description
`model`	`'silero'`	`'silero'` (bundled Silero v5 neural model) or `'energy'` (built-in RMS detector).
`threshold`	`0.5` silero, `0.01` energy	Speech-probability cutoff, range 0 to 1. Out-of-range values raise an error.
`holdoff_ms`	`300`	Milliseconds of sub-threshold audio before speech is considered ended. Higher tolerates brief pauses.
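
To illustrate how the threshold and holdoff interact, here is a minimal sketch of the state machine the table describes, run over a list of per-frame scores. It is an illustration of the documented behaviour, not decibri's code. The function names are hypothetical, the 512-sample frame at 16 kHz (32 ms) is an assumption rather than something this page states, and treating a score equal to the threshold as speech is also an assumption.

```python
import math
from typing import Iterable, List, Tuple


def holdoff_frames(holdoff_ms: int, frame_samples: int = 512,
                   sample_rate: int = 16000) -> int:
    """Number of whole frames needed to cover the holdoff window."""
    frame_ms = 1000.0 * frame_samples / sample_rate
    return math.ceil(holdoff_ms / frame_ms)


def detect_events(scores: Iterable[float], threshold: float = 0.5,
                  holdoff_ms: int = 300, frame_samples: int = 512,
                  sample_rate: int = 16000) -> List[Tuple[int, str]]:
    """Return (frame_index, 'speech' | 'silence') transitions."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    needed = holdoff_frames(holdoff_ms, frame_samples, sample_rate)
    events: List[Tuple[int, str]] = []
    speaking = False
    quiet = 0  # consecutive sub-threshold frames seen while speaking
    for index, score in enumerate(scores):
        if score >= threshold:
            quiet = 0
            if not speaking:
                speaking = True
                events.append((index, "speech"))
        elif speaking:
            quiet += 1
            if quiet >= needed:
                speaking = False
                quiet = 0
                events.append((index, "silence"))
    return events


# Speech, a three-frame pause, one more speech frame, then ten quiet frames.
scores = [0.1, 0.9, 0.9] + [0.2] * 3 + [0.8] + [0.1] * 10
print(detect_events(scores))                 # default 300 ms holdoff
print(detect_events(scores, holdoff_ms=64))  # short holdoff
```

Under these assumptions a 300 ms holdoff spans 10 frames, so the three-frame pause is absorbed and the run is reported as a single stretch of speech. With a 64 ms holdoff (2 frames) the same pause splits it into two, which is the trade-off behind "higher tolerates brief pauses".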

To use a custom Silero ONNX build instead of the bundled v5, pass model_path alongside the 'silero' mode.

The same two forms apply in Node.js. Shorthand, default settings. Pass a mode name to enable VAD with that mode's default threshold and holdoff.

const mic = new Microphone({ vad: 'silero' });   // or 'energy'

Object, your own values. Pass a config object to set the threshold and holdoff yourself.

const mic = new Microphone({
  vad: { model: 'silero', threshold: 0.6, holdoffMs: 200 }
});

The Node.js config object's fields:

Field	Default	Description
`model`	required	`'silero'` (bundled Silero v5 neural model) or `'energy'` (built-in RMS detector).
`threshold`	`0.5` silero, `0.01` energy	Speech-probability cutoff, range 0 to 1. Out-of-range values raise an error.
`holdoffMs`	`300`	Milliseconds of sub-threshold audio before speech is considered ended. Higher tolerates brief pauses.

To use a custom Silero ONNX build instead of the bundled v5, pass modelPath alongside the 'silero' mode.

Enabling VAD: name the mode explicitly. Passing True (Python) or true (Node.js) to enable VAD is rejected with a migration message. When enabled, Node.js emits 'speech' / 'silence' events and Python updates mic.is_speaking / mic.vad_score.
