Audio Capture Engine (ACE)

The Audio Capture Engine (ACE) is decibri's opt-in audio front-end: it conditions microphone audio before it reaches your code. Every stage is off by default, so a plain Microphone(...) is byte-identical to capture without ACE, and you turn on only the stages you want.

What makes ACE useful is not the individual filters; it is the package they come in. The chain is keyless and local: no API key, no account, no network call. It runs on-device, and the bundled denoise model ships inside the decibri package. The same conditioning options, with the same names and ranges, are available from Python, Node.js, and Rust.

The stages are conventional, and each is described plainly below.

Quick example

Enable the stages you want by passing options to the Microphone constructor. Here, DC removal, denoise, a 100 Hz high-pass, AGC targeting -18 dBFS, and a -1.0 dBFS limiter:

import decibri


def process(chunk):
    """Placeholder for your own handling of a conditioned chunk."""

mic = decibri.Microphone(
    sample_rate=16000,
    dc_removal=True,
    denoise="fastenhancer-t",
    highpass=100,
    agc=-18,
    limiter=-1.0,
)

with mic:
    for chunk in mic:
        process(chunk)  # chunk is already conditioned

Leave an option unset to keep that stage off. With every stage off, capture is unchanged.

The pipeline

Decibri opens the microphone and produces a raw block of samples. Before that block reaches you, it passes through normalization (downmix to mono and resample to your target rate), then the ACE conditioning chain. The chunk you receive is the conditioned output.

The conditioning stages always run in this fixed order, regardless of the order you pass the options:

DC removal → denoise → high-pass → AGC → limiter

This order is deliberate. DC removal clears any constant offset first. Denoise runs early, on a near-full-band signal, so the model has the most to work with. The high-pass then trims low-frequency rumble below the voice band. AGC sets a consistent working level. The limiter sits last as a ceiling that catches any transient the AGC would let through.
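
The ordering rule can be illustrated with a minimal sketch. This is not decibri's internal code; `STAGE_ORDER` and `enabled_stages` are hypothetical names used only to show that the order in which options are passed does not affect the order in which stages run.

```python
# Hypothetical sketch of the ordering rule, not decibri's implementation.
STAGE_ORDER = ("dc_removal", "denoise", "highpass", "agc", "limiter")


def enabled_stages(**options):
    """Return the enabled stage names in the fixed chain order.

    A stage counts as enabled when its option is set to anything other
    than None or False (so a limiter ceiling of 0.0 still enables it).
    """
    unknown = set(options) - set(STAGE_ORDER)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    return [
        name
        for name in STAGE_ORDER
        if options.get(name) is not None and options.get(name) is not False
    ]


# Options passed limiter-first still resolve to the canonical order.
print(enabled_stages(limiter=-1.0, agc=-18, dc_removal=True))
# ['dc_removal', 'agc', 'limiter']
```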

VAD reads the pre-enhancement signal. Voice activity detection takes its reading after normalization but before the conditioning chain. Turning on enhancement does not change what counts as speech; it only changes the audio you receive. See Voice activity detection.
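
As a rough illustration of that split, the sketch below forks the normalized block: one copy goes to speech detection, the other through conditioning. All names here (`capture_step`, `detect_speech`, `attenuate`) are hypothetical stand-ins, and the peak-threshold detector is a toy, not Silero VAD.

```python
def capture_step(raw_block, normalize, detect_speech, condition):
    """Fork the normalized block: VAD reads it, the chain conditions it."""
    normalized = normalize(raw_block)
    speaking = detect_speech(normalized)   # VAD sees the pre-enhancement signal
    conditioned = condition(normalized)    # the caller receives this
    return conditioned, speaking


# Toy stand-ins so the sketch runs on its own.
def normalize(block):
    return list(block)


def detect_speech(block):
    return max(abs(s) for s in block) > 0.05


def attenuate(block):
    return [s * 0.1 for s in block]


def passthrough(block):
    return block


block = [0.0, 0.2, -0.3, 0.1]
_, speaking_with = capture_step(block, normalize, detect_speech, attenuate)
_, speaking_without = capture_step(block, normalize, detect_speech, passthrough)
print(speaking_with == speaking_without)  # True
```

Because the detector never sees the conditioned samples, swapping the conditioning function leaves its verdict unchanged.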

The stages

Each stage is an independent option on the Microphone constructor. The names below are the Python form; the Node.js form is listed with each stage and in the options reference.


DC removal

Removes a constant (DC) offset from the signal with a one-pole DC-blocking high-pass. A DC offset is a fixed bias away from zero that some hardware introduces; left in, it wastes headroom and can upset downstream processing. This stage runs first in the chain.

Enable it with a boolean. It is the only stage with no numeric configuration.

mic = decibri.Microphone(dc_removal=True)
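
For intuition, a textbook DC blocker of this kind can be sketched in a few lines. This is an assumption about the general technique, not decibri's implementation: the function name `dc_block` and the coefficient `r=0.995` are illustrative choices.

```python
def dc_block(samples, r=0.995):
    """Textbook DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    r close to 1.0 places the pole near DC, so only the constant
    component is removed. The value 0.995 is an illustrative assumption.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# A constant 0.25 offset decays toward zero at the output.
biased = [0.25] * 4000
cleaned = dc_block(biased)
print(abs(cleaned[-1]) < 1e-6)  # True
```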

Denoise

An optional bundled single-channel speech-enhancement model that suppresses background noise while preserving the voice. It is the one neural stage in the chain; the others are classical filters. The model runs on-device through the same ONNX Runtime decibri already uses for Silero VAD, and it ships inside the decibri package, so there is no separate download.

Enable it by naming the model. Today there is one model, "fastenhancer-t":

mic = decibri.Microphone(denoise="fastenhancer-t")

Denoise is designed for 16 kHz audio, which is the default sample rate. It is the only stage that adds latency: the conditioned output is delivered in steps of 256 samples, which is 16 ms at 16 kHz. The other stages are sample-for-sample and add none.
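
The step duration follows directly from the step size and the sample rate. The variable names below are illustrative only:

```python
FRAME_SAMPLES = 256       # denoise output step, from the text above
SAMPLE_RATE_HZ = 16_000   # default sample rate

latency_ms = FRAME_SAMPLES * 1000 / SAMPLE_RATE_HZ
print(latency_ms)  # 16.0
```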

High-pass

A second-order Butterworth high-pass filter that removes low-frequency rumble below the voice band, the kind of energy that comes from handling noise, air conditioning, or desk vibration. The cutoff is a fixed choice of 80 or 100 Hz, not a free frequency: pass one of those two integers.

mic = decibri.Microphone(highpass=100)  # or 80

Any value other than 80 or 100 is rejected at construction.
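
To see what a second-order Butterworth high-pass does at these cutoffs, the sketch below derives biquad coefficients with the widely used audio-EQ-cookbook formulas (bilinear transform, Q = 1/sqrt(2)) and evaluates the gain at a few frequencies. Whether decibri derives its coefficients this way is an assumption; `butterworth_highpass` and `gain_at` are hypothetical helper names.

```python
import cmath
import math


def butterworth_highpass(cutoff_hz, sample_rate_hz):
    """Return normalized biquad coefficients (b0, b1, b2, a1, a2)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    return b0, b1, b2, a1, a2


def gain_at(coeffs, freq_hz, sample_rate_hz):
    """Magnitude of the biquad's frequency response at freq_hz."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    z2 = z1 * z1
    return abs((b0 + b1 * z1 + b2 * z2) / (1.0 + a1 * z1 + a2 * z2))


coeffs = butterworth_highpass(100, 16000)
for freq in (20, 100, 1000):
    print(freq, round(gain_at(coeffs, freq, 16000), 3))
```

The gain at the cutoff itself is 1/sqrt(2), about -3 dB, which is the defining property of a Butterworth response; well above the cutoff the gain approaches 1.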

Automatic gain control (AGC)

Drives the running signal level toward a target with a smoothed, rate-limited gain, so quiet and loud speakers arrive at a more consistent level. The target is a level in dBFS, an integer from -40 to -3; -18 is a typical choice. AGC adds no latency.

mic = decibri.Microphone(agc=-18)

A value outside -40 to -3 is rejected at construction.
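
One common way to build a smoothed, rate-limited gain is sketched below. It is a generic illustration, not decibri's algorithm: it assumes the "level" is a running RMS, and the parameters `smoothing`, `max_step_db`, and `max_gain_db` are hypothetical values chosen for the example.

```python
import math


def agc(samples, target_dbfs=-18, smoothing=0.999,
        max_step_db=0.01, max_gain_db=30.0):
    """Generic AGC sketch: track running RMS, nudge gain toward the target."""
    target = 10.0 ** (target_dbfs / 20.0)
    mean_square = target * target   # start at the target so initial gain is 0 dB
    gain_db = 0.0
    out = []
    for x in samples:
        # Smoothed level estimate (one-pole average of the squared signal).
        mean_square = smoothing * mean_square + (1.0 - smoothing) * x * x
        level = max(math.sqrt(mean_square), 1e-9)
        # Gain that would put the running level on target, capped for silence.
        desired_db = min(20.0 * math.log10(target / level), max_gain_db)
        # Rate limit: move at most max_step_db per sample.
        step = max(-max_step_db, min(max_step_db, desired_db - gain_db))
        gain_db += step
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out


# A quiet 200 Hz tone (about -43 dBFS RMS) is raised toward -18 dBFS.
tone = [0.01 * math.sin(2.0 * math.pi * 200 * n / 16000) for n in range(32000)]
leveled = agc(tone)
```

The rate limit keeps the gain from pumping on short bursts, and the gain cap stops the stage from amplifying silence without bound.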

Limiter

A sample-peak ceiling that catches any transient the AGC would otherwise let exceed full scale. It is the safety net at the end of the chain: a fast peak limiter followed by a hard clamp, so the output never clips. The ceiling is a level in dBFS, a float from -3.0 to 0.0; -1.0 is a typical choice. The limiter adds no latency.

mic = decibri.Microphone(limiter=-1.0)

A value outside -3.0 to 0.0 is rejected at construction.
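
The "fast limiter plus hard clamp" idea can be sketched as follows. This is a generic illustration under stated assumptions, not decibri's code: the instant attack, the `release` coefficient, and the function name `limit` are all choices made for the example.

```python
def limit(samples, ceiling_dbfs=-1.0, release=0.999):
    """Generic peak limiter sketch: instant attack, smooth release, hard clamp."""
    ceiling = 10.0 ** (ceiling_dbfs / 20.0)   # -1.0 dBFS is about 0.891 linear
    gain = 1.0
    out = []
    for x in samples:
        peak = abs(x)
        # Gain needed to bring this sample down to the ceiling, if it is over.
        needed = ceiling / peak if peak > ceiling else 1.0
        # Drop immediately when needed, otherwise recover slowly toward unity.
        gain = min(needed, release * gain + (1.0 - release))
        y = x * gain
        # Hard clamp as the final backstop.
        out.append(max(-ceiling, min(ceiling, y)))
    return out


limited = limit([0.1, 0.5, 1.5, -2.0, 0.2])
print(max(abs(v) for v in limited) <= 10.0 ** (-1.0 / 20.0))  # True
```

Samples under the ceiling pass through essentially unchanged; after an overshoot the gain stays reduced for a while and then recovers, which avoids an abrupt level jump.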

Options reference

The same five options are available across the bindings. Python uses snake_case, Node.js uses camelCase, and Rust exposes them as MicrophoneConfig fields. Each is off when unset.

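| Stage      | Python     | Node.js   | Type           | Values           | Default |
|------------|------------|-----------|----------------|------------------|---------|
| DC removal | dc_removal | dcRemoval | boolean        | True / False     | off     |
| Denoise    | denoise    | denoise   | string         | "fastenhancer-t" | off     |
| High-pass  | highpass   | highpass  | integer (Hz)   | 80 or 100        | off     |
| AGC        | agc        | agc       | integer (dBFS) | -40 to -3        | off     |
| Limiter    | limiter    | limiter   | float (dBFS)   | -3.0 to 0.0      | off     |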

An out-of-range value is rejected when the Microphone is constructed. Python raises ValueError; Node.js throws TypeError for an unknown denoise model and RangeError for an out-of-range high-pass, AGC, or limiter value. dc_removal is a boolean and has no invalid value.
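
The documented ranges can be summarized as a small validation sketch. `validate_ace_options` is a hypothetical helper written for illustration, not part of decibri; it mirrors the Python behavior described above by raising ValueError, and the error messages are invented.

```python
def validate_ace_options(denoise=None, highpass=None, agc=None, limiter=None):
    """Check ACE option values against the documented ranges.

    None means "stage off" and is always accepted.
    """
    if denoise is not None and denoise != "fastenhancer-t":
        raise ValueError(f"unknown denoise model: {denoise!r}")
    if highpass is not None and highpass not in (80, 100):
        raise ValueError("highpass must be 80 or 100")
    if agc is not None and not -40 <= agc <= -3:
        raise ValueError("agc must be between -40 and -3 dBFS")
    if limiter is not None and not -3.0 <= limiter <= 0.0:
        raise ValueError("limiter must be between -3.0 and 0.0 dBFS")


validate_ace_options(denoise="fastenhancer-t", highpass=100, agc=-18, limiter=-1.0)

try:
    validate_ace_options(highpass=90)
except ValueError as err:
    print(err)  # highpass must be 80 or 100
```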

Not available in the browser. ACE is a native stage of the engine, so the conditioning options are available in the Node.js, Python, and Rust builds but not in the browser build. Passing an ACE option to a browser Microphone has no effect. See the Browser API.

Per-language detail

The conditioning options appear in each binding's constructor reference alongside the rest of the capture surface.

Complete example

This puts the chain together with voice activity detection: ACE conditions each chunk, and VAD tells you when someone is speaking. Python exposes speaking state as the is_speaking property; Node.js emits 'speech' and 'silence' events. Here, 16 kHz capture with denoise, a high-pass, AGC, and a limiter, gated by the bundled Silero VAD:

import decibri


def process(chunk):
    """Placeholder for your own handling of a conditioned chunk."""

# ACE conditions the audio; VAD gates it. VAD reads the pre-enhancement
# signal, so enabling the conditioning never changes what counts as speech.
mic = decibri.Microphone(
    sample_rate=16000,
    denoise="fastenhancer-t",
    highpass=100,
    agc=-18,
    limiter=-1.0,
    vad=decibri.Vad(model="silero", threshold=0.6, holdoff_ms=200),
)

with mic:
    for chunk in mic:
        if mic.is_speaking:
            process(chunk)  # chunk is conditioned (post-ACE)

The chunks you receive are conditioned by ACE, while VAD reports speech from the signal before conditioning. The two are independent: changing the conditioning does not move the speech and silence boundaries.