Stream live microphone audio to AssemblyAI for real-time cloud transcription using decibri with the official AssemblyAI SDK. This page covers both Python and Node.js. Tabs above each code block switch between them.
This integration captures live audio from your microphone using decibri and streams it to AssemblyAI's cloud API. Transcription results arrive in real time using a turn-based model: speech is grouped into natural segments, and each turn produces partial results followed by a final result. There is no model download, no local inference, and no format conversion required.
streaming.eu.assemblyai.com for data residency requirements. If your use case requires audio to stay entirely on-device, use the local integrations: sherpa-onnx (real-time streaming) or whisper.cpp (batch transcription).
.env file in your project root:
ASSEMBLYAI_API_KEY=your_key_here
The dotenv package (python-dotenv on PyPI, dotenv on npm) loads your API key from the .env file. If you set environment variables another way, you can skip it. The install command above switches with the language tabs in the code blocks below.
No model download is required. All processing happens in AssemblyAI's cloud.
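Because the key is read from the environment, a missing or blank value otherwise only shows up as a bare KeyError or as a failed connection. The sketch below is one way to fail early with a readable message. The helper name require_api_key is hypothetical and is not part of decibri or the AssemblyAI SDK.

```python
import os


def require_api_key(env=os.environ, name="ASSEMBLYAI_API_KEY"):
    """Return the API key from env, or raise a clear error if it is missing or blank.

    Hypothetical helper for illustration; not part of any SDK.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell"
        )
    return key
```

Called after load_dotenv(), the returned value could be passed as api_key in place of the direct os.environ lookup.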
Import decibri, the AssemblyAI SDK, and dotenv. Create a client with your API key.
import os
from dotenv import load_dotenv
import decibri
from assemblyai.streaming.v3 import (
    StreamingClient,
    StreamingClientOptions,
    StreamingEvents,
    StreamingParameters,
)

load_dotenv()

client = StreamingClient(
    StreamingClientOptions(api_key=os.environ["ASSEMBLYAI_API_KEY"])
)

require('dotenv').config();
const { Microphone } = require('decibri');
const { AssemblyAI } = require('assemblyai');

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
Configure the streaming session. The speech model is required and has no default; omitting it causes the connection to fail. The Python SDK builds these into a StreamingParameters object passed at connect time; the Node SDK passes them to client.streaming.transcriber().
params = StreamingParameters(sample_rate=16000, speech_model="u3-rt-pro")

const transcriber = client.streaming.transcriber({
  speechModel: 'u3-rt-pro',
  sampleRate: 16_000,
});
Register handlers before connecting. This ensures no events are missed during the connection handshake. Both SDKs expose at least turn and error events; the Node SDK additionally exposes open and close.
def on_turn(client, event):
    if event.end_of_turn and event.transcript:
        print(event.transcript)

def on_error(client, error):
    print(f"AssemblyAI error: {error}")

client.on(StreamingEvents.Turn, on_turn)
client.on(StreamingEvents.Error, on_error)

transcriber.on('open', ({ id }) => {
  console.log('Session:', id);
});

transcriber.on('turn', (turn) => {
  if (turn.transcript) {
    console.log(turn.transcript);
  }
});

transcriber.on('error', (err) => {
  console.error('AssemblyAI error:', err);
});

transcriber.on('close', (code, reason) => {
  console.log('Connection closed:', code, reason);
});
frames_per_buffer / framesPerBuffer. The streaming code block below batches incoming audio into 100 ms (3200 bytes of int16 mono 16 kHz) before forwarding to AssemblyAI, and drops any trailing remainder under 50 ms (1600 bytes).
Connect to AssemblyAI, then prepare the microphone. In Python the connect call is synchronous and Microphone is only constructed here; capture starts in the next block when it is opened with a with statement. In Node the connect call returns a promise that must be awaited, and new Microphone(...) begins capture as soon as it is constructed. Audio must only be sent after the connection is established.
client.connect(params)
mic = decibri.Microphone(sample_rate=16000, channels=1)
await transcriber.connect();
const mic = new Microphone({ sampleRate: 16000, channels: 1 });
Read chunks from decibri and forward them to AssemblyAI. AssemblyAI streaming v3 requires audio in 50 ms to 1000 ms windows; decibri on Windows may emit chunks shorter than 50 ms, so batch incoming audio into 100 ms (3200 bytes of int16 mono at 16 kHz) before forwarding. Drop any trailing remainder under 50 ms (1600 bytes). No format conversion is needed; decibri produces raw int16 PCM which AssemblyAI accepts as-is.
BATCH_BYTES = 3200  # 100 ms of int16 mono at 16 kHz
MIN_BYTES = 1600    # 50 ms; AssemblyAI streaming v3 minimum

def audio_iter():
    buffer = bytearray()
    for chunk in mic:
        buffer.extend(chunk)
        while len(buffer) >= BATCH_BYTES:
            yield bytes(buffer[:BATCH_BYTES])
            del buffer[:BATCH_BYTES]
    if len(buffer) >= MIN_BYTES:
        yield bytes(buffer)

const BATCH_BYTES = 3200; // 100 ms of int16 mono at 16 kHz
const MIN_BYTES = 1600;   // 50 ms; AssemblyAI streaming v3 minimum

let buffer = Buffer.alloc(0);

mic.on('data', (chunk) => {
  buffer = Buffer.concat([buffer, chunk]);
  while (buffer.length >= BATCH_BYTES) {
    transcriber.sendAudio(buffer.subarray(0, BATCH_BYTES));
    buffer = buffer.subarray(BATCH_BYTES);
  }
});

mic.on('end', () => {
  if (buffer.length >= MIN_BYTES) {
    transcriber.sendAudio(buffer);
  }
});
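The batching logic can be checked without a microphone or a network connection. The sketch below repeats the same loop as a standalone generator and feeds it synthetic silence. The function name batch_chunks and the 640-byte (20 ms) chunk size are assumptions chosen for illustration; the chunk sizes decibri actually emits may differ by platform.

```python
BATCH_BYTES = 3200  # 100 ms of int16 mono at 16 kHz
MIN_BYTES = 1600    # 50 ms minimum window


def batch_chunks(chunks, batch_bytes=BATCH_BYTES, min_bytes=MIN_BYTES):
    """Regroup arbitrarily sized byte chunks into fixed-size batches.

    A trailing remainder is yielded only if it is at least min_bytes long;
    a shorter remainder is dropped.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= batch_bytes:
            yield bytes(buffer[:batch_bytes])
            del buffer[:batch_bytes]
    if len(buffer) >= min_bytes:
        yield bytes(buffer)


# 13 chunks of 640 bytes = 8320 bytes: two full batches plus a 1920-byte tail.
kept_tail = [len(b) for b in batch_chunks([bytes(640)] * 13)]
print(kept_tail)  # [3200, 3200, 1920]

# 11 chunks of 640 bytes = 7040 bytes: the 640-byte tail is under 50 ms and is dropped.
dropped_tail = [len(b) for b in batch_chunks([bytes(640)] * 11)]
print(dropped_tail)  # [3200, 3200]
```

Every batch that would be forwarded is therefore between 1600 and 3200 bytes, which keeps each send inside the 50 ms to 1000 ms window.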
AssemblyAI groups speech into turns, which are natural segments of speech separated by pauses. Each turn emits multiple events as audio is processed:
- The end_of_turn flag is false while the result is partial and still being refined.
- The end_of_turn flag is true when the turn is complete with a final transcript.
- The turn_order index increments with each new turn (starting from 0).
- The transcript field carries the recognised text. It grows as more words are finalised during the turn and contains the complete final text when end_of_turn is true.

The event object exposes the same fields in both SDKs; only the callback parameter name differs (event in Python, turn in Node).
To show only final results, filter on end_of_turn:
def on_turn(client, event):
    if event.end_of_turn and event.transcript:
        print(event.transcript)

transcriber.on('turn', (turn) => {
  if (turn.end_of_turn && turn.transcript) {
    console.log(turn.transcript);
  }
});
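To make the turn model concrete, the sketch below replays a made-up sequence of events and keeps one final transcript per turn. FakeTurn is a hypothetical stand-in that carries only the three fields described above; it is not the SDK's event class, and the transcripts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeTurn:
    """Hypothetical stand-in for a turn event; not the SDK's class."""
    turn_order: int
    end_of_turn: bool
    transcript: str


def collect_finals(events):
    """Return final transcripts in turn order, ignoring partial results."""
    finals = {}
    for ev in events:
        if ev.end_of_turn and ev.transcript:
            finals[ev.turn_order] = ev.transcript
    return [finals[order] for order in sorted(finals)]


events = [
    FakeTurn(0, False, "hello"),
    FakeTurn(0, False, "hello wor"),
    FakeTurn(0, True, "Hello world."),
    FakeTurn(1, False, "how"),
    FakeTurn(1, True, "How are you?"),
]
print(collect_finals(events))  # ['Hello world.', 'How are you?']
```

Keying on turn_order rather than appending to a list means a repeated final event for the same turn would overwrite the earlier text instead of duplicating it.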
Stop the microphone and close the AssemblyAI connection when the user presses Ctrl+C.
print("Listening... (Ctrl+C to stop)")
try:
    with mic:
        client.stream(audio_iter())
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    client.disconnect(terminate=True)

process.on('SIGINT', async () => {
  console.log('\nStopping...');
  mic.stop();
  await transcriber.close();
  process.exit(0);
});
import os
from dotenv import load_dotenv
import decibri
from assemblyai.streaming.v3 import (
    StreamingClient,
    StreamingClientOptions,
    StreamingEvents,
    StreamingParameters,
)

BATCH_BYTES = 3200  # 100 ms of int16 mono at 16 kHz
MIN_BYTES = 1600    # 50 ms; AssemblyAI streaming v3 minimum

load_dotenv()

client = StreamingClient(
    StreamingClientOptions(api_key=os.environ["ASSEMBLYAI_API_KEY"])
)

def on_turn(client, event):
    if event.end_of_turn and event.transcript:
        print(event.transcript)

def on_error(client, error):
    print(f"AssemblyAI error: {error}")

client.on(StreamingEvents.Turn, on_turn)
client.on(StreamingEvents.Error, on_error)

client.connect(StreamingParameters(sample_rate=16000, speech_model="u3-rt-pro"))

mic = decibri.Microphone(sample_rate=16000, channels=1)

def audio_iter():
    buffer = bytearray()
    for chunk in mic:
        buffer.extend(chunk)
        while len(buffer) >= BATCH_BYTES:
            yield bytes(buffer[:BATCH_BYTES])
            del buffer[:BATCH_BYTES]
    if len(buffer) >= MIN_BYTES:
        yield bytes(buffer)

print("Listening... (Ctrl+C to stop)")
try:
    with mic:
        client.stream(audio_iter())
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    client.disconnect(terminate=True)
'use strict';

require('dotenv').config();
const { Microphone } = require('decibri');
const { AssemblyAI } = require('assemblyai');

const BATCH_BYTES = 3200; // 100 ms of int16 mono at 16 kHz
const MIN_BYTES = 1600;   // 50 ms; AssemblyAI streaming v3 minimum

const run = async () => {
  const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

  const transcriber = client.streaming.transcriber({
    speechModel: 'u3-rt-pro',
    sampleRate: 16_000,
  });

  transcriber.on('open', ({ id }) => {
    console.log('AssemblyAI connected. Session:', id);
  });

  transcriber.on('turn', (turn) => {
    if (turn.end_of_turn && turn.transcript) {
      console.log(turn.transcript);
    }
  });

  transcriber.on('error', (err) => {
    console.error('AssemblyAI error:', err);
  });

  transcriber.on('close', (code, reason) => {
    console.log('Connection closed:', code, reason);
  });

  await transcriber.connect();

  const mic = new Microphone({ sampleRate: 16000, channels: 1 });
  let buffer = Buffer.alloc(0);

  mic.on('data', (chunk) => {
    buffer = Buffer.concat([buffer, chunk]);
    while (buffer.length >= BATCH_BYTES) {
      transcriber.sendAudio(buffer.subarray(0, BATCH_BYTES));
      buffer = buffer.subarray(BATCH_BYTES);
    }
  });

  mic.on('error', (err) => {
    console.error('Mic error:', err.message);
  });

  process.on('SIGINT', async () => {
    console.log('\nStopping...');
    mic.stop();
    if (buffer.length >= MIN_BYTES) {
      transcriber.sendAudio(buffer);
    }
    await transcriber.close();
    process.exit(0);
  });

  console.log('Listening... (Ctrl+C to stop)\n');
};

run().catch(console.error);
The transcriber options control how AssemblyAI processes your audio. Names differ slightly between SDKs (Python uses snake_case, Node uses camelCase) but the meaning is identical. Here are the key ones:
| Option | Value | Description |
|---|---|---|
| speech_model / speechModel | 'u3-rt-pro' | Required. Universal-3 Pro Streaming model. |
| sample_rate / sampleRate | 16000 | Must match decibri's sample rate. |
Additional options such as keyterm prompting and speaker diarization are available. See the AssemblyAI streaming documentation for the complete list.
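Since the sample rate has to agree between decibri and AssemblyAI, and the batch sizes used on this page are derived from it, one option is to compute everything from a single constant. The sketch below shows the arithmetic only and makes no SDK calls. The helper name bytes_for_ms is hypothetical.

```python
SAMPLE_RATE = 16000   # Hz; the same value goes to the microphone and the streaming session
CHANNELS = 1          # mono
BYTES_PER_SAMPLE = 2  # int16


def bytes_for_ms(ms, sample_rate=SAMPLE_RATE, channels=CHANNELS,
                 bytes_per_sample=BYTES_PER_SAMPLE):
    """Number of raw PCM bytes that cover ms milliseconds of audio."""
    return sample_rate * channels * bytes_per_sample * ms // 1000


BATCH_BYTES = bytes_for_ms(100)  # 3200
MIN_BYTES = bytes_for_ms(50)     # 1600
print(BATCH_BYTES, MIN_BYTES)    # 3200 1600
```

If the sample rate were changed, passing the same SAMPLE_RATE to both the microphone and the streaming parameters would keep the two in step, and the byte thresholds would follow automatically.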