Decibri bundles Silero VAD v5. Set vad: true and vadMode: 'silero' to enable neural voice activity detection with no extra dependencies, no separate model download, and no sherpa-onnx required.
Silero v5 is a neural voice activity detection model. It runs locally in Rust via ONNX Runtime. No cloud API, no server, no fine-tuning required. The ~2.3 MB ONNX model ships with the decibri npm package.
Decibri offers two VAD implementations out of the box: an RMS energy detector (the default, vadMode: 'energy') that uses simple amplitude thresholding, and the Silero v5 neural model (vadMode: 'silero') that handles noisy environments, background music, and multi-speaker scenarios with much higher accuracy.
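To make "simple amplitude thresholding" concrete, here is a minimal sketch of an RMS energy check. It is an illustration of the technique, not decibri's internal implementation; the function names rmsLevel and isLoud are hypothetical, and the input is assumed to be 16-bit little-endian mono PCM.

```javascript
// Illustrative only: a minimal RMS energy check, not decibri's internal code.
// Assumes `chunk` is a Buffer of 16-bit little-endian mono PCM.
function rmsLevel(chunk) {
  const sampleCount = Math.floor(chunk.length / 2);
  if (sampleCount === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = chunk.readInt16LE(i * 2) / 32768; // normalize to [-1, 1)
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Hypothetical helper: true when the chunk's RMS exceeds the threshold.
function isLoud(chunk, threshold = 0.01) {
  return rmsLevel(chunk) > threshold;
}
```

A check like this only measures loudness, so background music or a slammed door can exceed the threshold just as speech does. That is the gap a neural model is meant to close.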
Use vadMode: 'silero' for noisy environments, background music, or any case where accuracy matters more than CPU budget.
The Silero v5 ONNX model is bundled. No separate download, no sherpa-onnx needed.
Set vad: true to enable voice activity detection, and vadMode: 'silero' to use the neural model instead of the default RMS detector.
const Decibri = require('decibri');

const mic = new Decibri({
  sampleRate: 16000,
  channels: 1,
  vad: true,
  vadMode: 'silero',
});
Decibri emits 'speech' when voice activity is detected and 'silence' once the audio has stayed below the detection threshold for vadHoldoff milliseconds. Both events fire alongside the regular 'data' stream, which keeps delivering audio continuously. Use the events as markers to cut the raw audio into utterances, keeping only the audio between a 'speech' and the following 'silence'.
mic.on('speech', () => {
  console.log('[speech started]');
});

mic.on('silence', () => {
  console.log('[speech ended]');
});

mic.on('data', (chunk) => {
  // Audio data arrives continuously.
  // Use 'speech' / 'silence' events to segment into utterances.
});
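One way to turn these markers into utterances is to buffer 'data' chunks between a 'speech' event and the next 'silence' event. The sketch below assumes only the three event names described above; collectUtterances is a hypothetical helper, and a plain EventEmitter stands in for the microphone so the example runs without audio hardware.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: collects 'data' chunks between 'speech' and 'silence'
// and hands each finished utterance to `onUtterance` as one Buffer.
function collectUtterances(source, onUtterance) {
  let chunks = null; // null while no speech is in progress

  source.on('speech', () => { chunks = []; });
  source.on('data', (chunk) => {
    if (chunks) chunks.push(chunk);
  });
  source.on('silence', () => {
    if (!chunks) return;
    onUtterance(Buffer.concat(chunks));
    chunks = null;
  });
}

// Stand-in emitter so the sketch runs without a microphone.
const fake = new EventEmitter();
const utterances = [];
collectUtterances(fake, (audio) => utterances.push(audio));

fake.emit('data', Buffer.from([1, 2])); // before 'speech': ignored
fake.emit('speech');
fake.emit('data', Buffer.from([3, 4]));
fake.emit('data', Buffer.from([5, 6]));
fake.emit('silence');

console.log(utterances[0].length); // 4
```

With a real microphone you would pass the Decibri instance in place of the stand-in emitter. This page does not say whether the chunk that triggered 'speech' is delivered before or after the event, so if clipped word onsets matter, consider also keeping a short rolling buffer of recent chunks and prepending it.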
Stop the microphone when the user presses Ctrl+C.
process.on('SIGINT', () => {
  console.log('\nStopping...');
  mic.stop();
  process.exit(0);
});
'use strict';

const Decibri = require('decibri');

const mic = new Decibri({
  sampleRate: 16000,
  channels: 1,
  vad: true,
  vadMode: 'silero',
});

mic.on('speech', () => console.log('[speech start]'));
mic.on('silence', () => console.log('[speech end]'));
mic.on('error', (err) => console.error('Mic error:', err.message));

process.on('SIGINT', () => {
  console.log('\nStopping...');
  mic.stop();
  process.exit(0);
});

console.log('Listening for speech... (Ctrl+C to stop)\n');
VAD behaviour is controlled through Decibri constructor options. See the Node.js API reference for the full option surface.
| Option | Default | Description |
|---|---|---|
| vad | false | Enable voice activity detection. When true, decibri emits 'speech' and 'silence' events. |
| vadMode | 'energy' | Detector to use. 'energy' is the built-in RMS threshold. 'silero' uses the bundled Silero v5 neural model. |
| vadThreshold | 0.01 | Detection threshold. For 'silero', interpret as the speech-probability cutoff (0–1). |
| vadHoldoff | 300 | Milliseconds of sub-threshold audio before 'silence' fires. Tune higher to tolerate brief pauses within speech. |
| modelPath | bundled Silero v5 | Path to a custom Silero ONNX build. Use for benchmarking alternate versions or loading a fine-tuned model. |
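The interaction between vadThreshold and vadHoldoff is easier to see as a small state machine. The sketch below is an illustration of the semantics described in the table, not decibri's internal code: detectTransitions, the per-frame scores array, and the frameMs parameter are all hypothetical, and whether the real comparison is strict or inclusive is an assumption.

```javascript
// Illustrative only: one way a threshold plus holdoff could turn per-frame
// scores into 'speech' / 'silence' transitions. Not decibri's internal code.
function detectTransitions(scores, { threshold, holdoffMs, frameMs }) {
  const events = [];
  let speaking = false;
  let quietMs = 0; // how long the signal has stayed below the threshold

  scores.forEach((score, index) => {
    if (score >= threshold) {
      quietMs = 0; // any frame above the threshold resets the holdoff timer
      if (!speaking) {
        speaking = true;
        events.push({ type: 'speech', frame: index });
      }
    } else if (speaking) {
      quietMs += frameMs;
      if (quietMs >= holdoffMs) {
        speaking = false;
        quietMs = 0;
        events.push({ type: 'silence', frame: index });
      }
    }
  });
  return events;
}

// A brief one-frame dip (frame 2) does not end the utterance with a 300 ms holdoff.
const events = detectTransitions(
  [0.1, 0.9, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1],
  { threshold: 0.5, holdoffMs: 300, frameMs: 100 },
);
console.log(events);
```

In this model, the same input with holdoffMs set to 100 would split into two utterances, which is why a higher holdoff tolerates short pauses at the cost of reporting the end of speech later.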
Decibri ships with Silero v5 by default. To use a different ONNX Silero build (for benchmarking an older version, or loading a fine-tune), pass the modelPath option.
const mic = new Decibri({
  sampleRate: 16000,
  channels: 1,
  vad: true,
  vadMode: 'silero',
  modelPath: './my-silero-v6.onnx',
});
For sample-level speech-segment timestamps, explicit isSpeechDetected() polling, or segment draining with front() / pop(), you can use sherpa-onnx as a standalone library alongside decibri. In this pattern decibri provides only the microphone audio; sherpa-onnx handles the VAD entirely on its own API surface. It is not a decibri mode or option; the two packages run side by side. This was the recommended pattern before decibri v3 bundled Silero natively. Prefer the decibri-native approach above unless you specifically need the sherpa-onnx API.
Download the Silero VAD ONNX model from the sherpa-onnx releases:
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
This is a single ~2.3 MB file. Note that this is a separate copy of the Silero model from the one decibri bundles; it is here so sherpa-onnx can find it at the path you configure.
Configure the sherpa-onnx VAD with the model path and detection parameters, then feed audio in 512-sample windows and drain completed speech segments.
const Decibri = require('decibri');
const sherpa = require('sherpa-onnx');

const config = {
  sileroVad: {
    model: './silero_vad.onnx',
    threshold: 0.5,
    minSilenceDuration: 0.25,
    minSpeechDuration: 0.25,
    windowSize: 512,
  },
  sampleRate: 16000,
  debug: false,
  bufferSizeInSeconds: 60,
};

const vad = new sherpa.Vad(config);
const mic = new Decibri({ sampleRate: 16000, channels: 1 });
const windowSize = 512;
let speechActive = false;
let pending = new Float32Array(0); // samples left over from the previous chunk

mic.on('data', (chunk) => {
  // Convert 16-bit PCM to floats in [-1, 1), prepending any leftover samples.
  const sampleCount = Math.floor(chunk.length / 2);
  const float32 = new Float32Array(pending.length + sampleCount);
  float32.set(pending);
  for (let i = 0; i < sampleCount; i++) {
    float32[pending.length + i] = chunk.readInt16LE(i * 2) / 32768;
  }

  // Feed the VAD one full 512-sample window at a time.
  let offset = 0;
  for (; offset + windowSize <= float32.length; offset += windowSize) {
    vad.acceptWaveform(float32.subarray(offset, offset + windowSize));

    const detected = vad.isSpeechDetected();
    if (detected && !speechActive) {
      speechActive = true;
      console.log('[speech start]');
    }
    if (!detected && speechActive) {
      speechActive = false;
      console.log('[speech end]');
    }

    // Drain every completed speech segment.
    while (!vad.isEmpty()) {
      const segment = vad.front();
      const duration = (segment.samples.length / 16000).toFixed(2);
      console.log(`  segment: ${duration}s of speech`);
      vad.pop();
    }
  }

  // Keep the tail that did not fill a whole window for the next chunk.
  pending = float32.slice(offset);
});
process.on('SIGINT', () => {
  mic.stop();
  vad.free();
  process.exit(0);
});
console.log('Listening for speech... (Ctrl+C to stop)');
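If you need segment timestamps rather than just durations, they can be derived from the segment itself. The sketch below assumes each drained segment has the shape { start, samples }, where start is the index of the segment's first sample in the input stream. The example above only uses samples, so treat the start field as an assumption to verify against the sherpa-onnx version you install. segmentTimes is a hypothetical helper.

```javascript
// Assumption: a segment looks like { start, samples }, where `start` is the
// index of its first sample in the input stream and `samples` is a Float32Array.
function segmentTimes(segment, sampleRate = 16000) {
  const startSec = segment.start / sampleRate;
  const endSec = (segment.start + segment.samples.length) / sampleRate;
  return { startSec, endSec, durationSec: endSec - startSec };
}

// Example with a hand-built segment: half a second of audio starting at 1 s.
const example = segmentTimes({ start: 16000, samples: new Float32Array(8000) });
console.log(example); // { startSec: 1, endSec: 1.5, durationSec: 0.5 }
```

Inside the drain loop you would call segmentTimes(vad.front()) before vad.pop(). The times are relative to the first sample fed to the VAD, not to wall-clock time.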