AssemblyAI

Stream live microphone audio to AssemblyAI for real-time cloud transcription using decibri with the official AssemblyAI SDK. This page covers both Python and Node.js: each walkthrough step shows the Python code first, followed by the Node.js equivalent.

What this does

This integration captures live audio from your microphone using decibri and streams it to AssemblyAI's cloud API. Transcription results return in real time using a turn-based model, where speech is grouped into natural segments with partial and final results for each turn. There is no model download, no local inference, and no format conversion required.

Cloud vs local

Note: AssemblyAI is a cloud service. Audio is sent to AssemblyAI's servers for processing. An EU endpoint is available at streaming.eu.assemblyai.com for data residency requirements. If your use case requires audio to stay entirely on-device, use the local integrations: sherpa-onnx (real-time streaming) or whisper.cpp (batch transcription).

Prerequisites

Get an API key

Sign up at assemblyai.com
Upgrade your account (Settings > Billing > add a payment method). Streaming is only available on upgraded accounts.
Copy your API key from the dashboard
Store it in a .env file in your project root:

ASSEMBLYAI_API_KEY=your_key_here

Install packages

$ pip install decibri assemblyai python-dotenv

$ npm install decibri assemblyai dotenv

The dotenv package (python-dotenv on PyPI, dotenv on npm) loads your API key from the .env file. If you set environment variables another way, you can skip it.

No model download is required. All processing happens in AssemblyAI's cloud.
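A missing or blank key is the most common setup mistake, and reading it with os.environ["ASSEMBLYAI_API_KEY"] only raises a bare KeyError. The sketch below is one way to fail early with a readable message; require_api_key is a hypothetical helper, not part of decibri or the AssemblyAI SDK, and it assumes the .env file has already been loaded into the environment.

```python
import os


def require_api_key(env=os.environ, name="ASSEMBLYAI_API_KEY"):
    """Return the API key, or raise a readable error if it is missing or blank.

    Hypothetical helper for illustration; `env` is any mapping of
    environment variables (defaults to the real process environment).
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "in your shell before starting the script."
        )
    return key
```

Calling require_api_key() once at startup, before creating the client, keeps the failure close to its cause.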

Code walkthrough

1. Configuration

Import decibri, the AssemblyAI SDK, and dotenv. Create a client with your API key.

import os
from dotenv import load_dotenv
import decibri
from assemblyai.streaming.v3 import (
    StreamingClient,
    StreamingClientOptions,
    StreamingEvents,
    StreamingParameters,
)

load_dotenv()
client = StreamingClient(
    StreamingClientOptions(api_key=os.environ["ASSEMBLYAI_API_KEY"])
)

require('dotenv').config();

const { Microphone } = require('decibri');
const { AssemblyAI } = require('assemblyai');

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
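If you need the EU endpoint mentioned under Cloud vs local, the host has to be chosen when the client is created. The following is a minimal sketch under stated assumptions: ASSEMBLYAI_REGION is a hypothetical environment variable invented for this example, the default hostname is assumed from AssemblyAI's public streaming examples, and passing the result as an api_host keyword to StreamingClientOptions is an assumption you should confirm against the SDK version you have installed.

```python
import os

# The EU hostname is the one quoted on this page. The default hostname is an
# assumption based on AssemblyAI's public streaming examples.
DEFAULT_HOST = "streaming.assemblyai.com"
EU_HOST = "streaming.eu.assemblyai.com"


def streaming_host(env=os.environ):
    """Pick a streaming host from the hypothetical ASSEMBLYAI_REGION variable."""
    region = env.get("ASSEMBLYAI_REGION", "").strip().lower()
    return EU_HOST if region == "eu" else DEFAULT_HOST


# Assumed usage (api_host is taken to be a StreamingClientOptions keyword):
# client = StreamingClient(
#     StreamingClientOptions(
#         api_key=os.environ["ASSEMBLYAI_API_KEY"],
#         api_host=streaming_host(),
#     )
# )
```

Keeping the region in configuration rather than in code means the same script can run against either endpoint without edits.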

2. Create streaming transcriber

Configure the streaming session. The speech model is required and has no default; omitting it causes the connection to fail. The Python SDK builds these into a StreamingParameters object passed at connect time; the Node SDK passes them to client.streaming.transcriber().

params = StreamingParameters(sample_rate=16000, speech_model="u3-rt-pro")

const transcriber = client.streaming.transcriber({
  speechModel: 'u3-rt-pro',
  sampleRate: 16_000,
});

3. Register event handlers

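Register handlers before connecting. This ensures no events are missed during the connection handshake. Both SDKs expose at least turn and error events; the Node SDK additionally exposes open and close. Note that the Python handler below prints only final transcripts, while the Node handler prints every update, partial or final; step 6 explains the difference.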

def on_turn(client, event):
    if event.end_of_turn and event.transcript:
        print(event.transcript)

def on_error(client, error):
    print(f"AssemblyAI error: {error}")

client.on(StreamingEvents.Turn, on_turn)
client.on(StreamingEvents.Error, on_error)

transcriber.on('open', ({ id }) => {
  console.log('Session:', id);
});

transcriber.on('turn', (turn) => {
  if (turn.transcript) {
    console.log(turn.transcript);
  }
});

transcriber.on('error', (err) => {
  console.error('AssemblyAI error:', err);
});

transcriber.on('close', (code, reason) => {
  console.log('Connection closed:', code, reason);
});

4. Connect and open microphone

Note: AssemblyAI streaming v3 requires audio chunks between 50 ms and 1000 ms long (error 3007 on violation). On Windows, decibri's audio backend may emit chunks shorter than 50 ms regardless of frames_per_buffer / framesPerBuffer. The code in step 5 therefore batches incoming audio into 100 ms chunks (3200 bytes of int16 mono at 16 kHz) before forwarding them to AssemblyAI, and drops any trailing remainder under 50 ms (1600 bytes).

Connect to AssemblyAI, then prepare the microphone. In Python the connect call is synchronous and Microphone is only constructed here; capture starts in step 7, when the microphone is opened with a with statement. In Node the connect call returns a promise that must be awaited, and new Microphone(...) begins capture as soon as it is constructed. Audio must only be sent after the connection is established.

client.connect(params)
mic = decibri.Microphone(sample_rate=16000, channels=1)

await transcriber.connect();

const mic = new Microphone({ sampleRate: 16000, channels: 1 });

5. Stream audio

Read chunks from decibri and forward them to AssemblyAI in 100 ms batches (3200 bytes of int16 mono at 16 kHz). This keeps every message inside the 50 ms to 1000 ms window that streaming v3 requires, even when decibri emits shorter chunks on Windows. When the stream ends, a trailing remainder is sent only if it is at least 50 ms (1600 bytes); anything shorter is dropped. No format conversion is needed: decibri produces raw int16 PCM, which AssemblyAI accepts as-is.

BATCH_BYTES = 3200   # 100 ms of int16 mono at 16 kHz
MIN_BYTES = 1600     # 50 ms; AssemblyAI streaming v3 minimum

def audio_iter():
    buffer = bytearray()
    for chunk in mic:
        buffer.extend(chunk)
        while len(buffer) >= BATCH_BYTES:
            yield bytes(buffer[:BATCH_BYTES])
            del buffer[:BATCH_BYTES]
    if len(buffer) >= MIN_BYTES:
        yield bytes(buffer)

const BATCH_BYTES = 3200;  // 100 ms of int16 mono at 16 kHz
const MIN_BYTES = 1600;    // 50 ms; AssemblyAI streaming v3 minimum

let buffer = Buffer.alloc(0);
mic.on('data', (chunk) => {
  buffer = Buffer.concat([buffer, chunk]);
  while (buffer.length >= BATCH_BYTES) {
    transcriber.sendAudio(buffer.subarray(0, BATCH_BYTES));
    buffer = buffer.subarray(BATCH_BYTES);
  }
});

mic.on('end', () => {
  if (buffer.length >= MIN_BYTES) {
    transcriber.sendAudio(buffer);
  }
});
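The byte counts used above follow from sample rate × channels × bytes per sample × duration. The sketch below derives them and runs the same batching logic against simulated chunks, so you can check the behaviour without a microphone or an API key. bytes_for and batch_chunks are hypothetical helper names for this example, and the 10 ms chunk size is an arbitrary stand-in for a backend that emits short chunks.

```python
def bytes_for(ms, sample_rate=16000, channels=1, bytes_per_sample=2):
    """Size in bytes of `ms` milliseconds of PCM audio (int16 by default)."""
    return sample_rate * channels * bytes_per_sample * ms // 1000


def batch_chunks(chunks, batch_bytes, min_bytes):
    """Regroup arbitrary-sized byte chunks into fixed-size batches.

    A trailing remainder is yielded only if it is at least `min_bytes` long;
    anything shorter is dropped.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= batch_bytes:
            yield bytes(buffer[:batch_bytes])
            del buffer[:batch_bytes]
    if len(buffer) >= min_bytes:
        yield bytes(buffer)


BATCH_BYTES = bytes_for(100)  # 3200
MIN_BYTES = bytes_for(50)     # 1600

# Simulate 275 ms of audio arriving as 10 ms chunks (320 bytes each),
# followed by one 5 ms chunk (160 bytes).
fake_chunks = [b"\x00" * 320] * 27 + [b"\x00" * 160]
sizes = [len(b) for b in batch_chunks(fake_chunks, BATCH_BYTES, MIN_BYTES)]
print(sizes)  # [3200, 3200, 2400]
```

The final 2400-byte batch is 75 ms of audio, which is above the 50 ms minimum and so is still sent. A remainder of 1000 bytes, for example, would be silently dropped.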

6. Understanding turn-based results

AssemblyAI groups speech into turns, which are natural segments of speech separated by pauses. Each turn emits multiple events as audio is processed:

The end_of_turn flag is false while the result is partial and still being refined.
The end_of_turn flag is true when the turn is complete with a final transcript.
The turn_order index increments with each new turn (starting from 0).
The transcript field carries the recognised text. It grows as more words are finalised during the turn and contains the complete final text when end_of_turn is true.

The event object exposes the same fields in both SDKs; only the callback parameter name differs (event in Python, turn in Node).

To show only final results, filter on end_of_turn:

def on_turn(client, event):
    if event.end_of_turn and event.transcript:
        print(event.transcript)

transcriber.on('turn', (turn) => {
  if (turn.end_of_turn && turn.transcript) {
    console.log(turn.transcript);
  }
});
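To see how partial and final events interact, it can help to replay a sequence of turns offline. The sketch below uses FakeTurn, a hypothetical stand-in that carries only the three fields described above (it is not the SDK's event class), and TurnTracker, an illustrative helper that keeps the latest partial text per turn and collects finals.

```python
from dataclasses import dataclass


@dataclass
class FakeTurn:
    """Hypothetical stand-in for a turn event, with the fields named above."""
    turn_order: int
    end_of_turn: bool
    transcript: str


class TurnTracker:
    """Tracks the latest partial text per turn and collects final transcripts."""

    def __init__(self):
        self.partials = {}  # turn_order -> latest partial transcript
        self.finals = []    # completed transcripts, in arrival order

    def handle(self, event):
        if not event.transcript:
            return  # ignore events that carry no text yet
        if event.end_of_turn:
            # The turn is complete: discard its partial, keep the final text.
            self.partials.pop(event.turn_order, None)
            self.finals.append(event.transcript)
        else:
            # Still in progress: overwrite with the most recent partial.
            self.partials[event.turn_order] = event.transcript


tracker = TurnTracker()
for ev in [
    FakeTurn(0, False, "hello"),
    FakeTurn(0, False, "hello wor"),
    FakeTurn(0, True, "Hello world."),
    FakeTurn(1, False, "how are"),
]:
    tracker.handle(ev)

print(tracker.finals)    # ['Hello world.']
print(tracker.partials)  # {1: 'how are'}
```

In a live session the same handle method could be called from the on_turn callback shown above, for example to redraw an in-progress line in a UI while appending finished turns to a log.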

7. Clean shutdown

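Stop the microphone and close the AssemblyAI connection when the user presses Ctrl+C. In Python, this block is also where capture and streaming start: the with statement opens the microphone, and client.stream() consumes audio_iter() until Ctrl+C interrupts it.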

print("Listening... (Ctrl+C to stop)")
try:
    with mic:
        client.stream(audio_iter())
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    client.disconnect(terminate=True)

process.on('SIGINT', async () => {
  console.log('\nStopping...');
  mic.stop();
  await transcriber.close();
  process.exit(0);
});

Full example


import os
from dotenv import load_dotenv

import decibri
from assemblyai.streaming.v3 import (
    StreamingClient,
    StreamingClientOptions,
    StreamingEvents,
    StreamingParameters,
)

BATCH_BYTES = 3200   # 100 ms of int16 mono at 16 kHz
MIN_BYTES = 1600     # 50 ms; AssemblyAI streaming v3 minimum

load_dotenv()

client = StreamingClient(
    StreamingClientOptions(api_key=os.environ["ASSEMBLYAI_API_KEY"])
)

def on_turn(client, event):
    if event.end_of_turn and event.transcript:
        print(event.transcript)

def on_error(client, error):
    print(f"AssemblyAI error: {error}")

client.on(StreamingEvents.Turn, on_turn)
client.on(StreamingEvents.Error, on_error)

client.connect(StreamingParameters(sample_rate=16000, speech_model="u3-rt-pro"))

mic = decibri.Microphone(sample_rate=16000, channels=1)

def audio_iter():
    buffer = bytearray()
    for chunk in mic:
        buffer.extend(chunk)
        while len(buffer) >= BATCH_BYTES:
            yield bytes(buffer[:BATCH_BYTES])
            del buffer[:BATCH_BYTES]
    if len(buffer) >= MIN_BYTES:
        yield bytes(buffer)

print("Listening... (Ctrl+C to stop)")
try:
    with mic:
        client.stream(audio_iter())
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    client.disconnect(terminate=True)

'use strict';
require('dotenv').config();

const { Microphone } = require('decibri');
const { AssemblyAI } = require('assemblyai');

const BATCH_BYTES = 3200;  // 100 ms of int16 mono at 16 kHz
const MIN_BYTES = 1600;    // 50 ms; AssemblyAI streaming v3 minimum

const run = async () => {
  const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

  const transcriber = client.streaming.transcriber({
    speechModel: 'u3-rt-pro',
    sampleRate: 16_000,
  });

  transcriber.on('open', ({ id }) => {
    console.log('AssemblyAI connected. Session:', id);
  });

  transcriber.on('turn', (turn) => {
    if (turn.end_of_turn && turn.transcript) {
      console.log(turn.transcript);
    }
  });

  transcriber.on('error', (err) => {
    console.error('AssemblyAI error:', err);
  });

  transcriber.on('close', (code, reason) => {
    console.log('Connection closed:', code, reason);
  });

  await transcriber.connect();

  const mic = new Microphone({ sampleRate: 16000, channels: 1 });

  let buffer = Buffer.alloc(0);
  mic.on('data', (chunk) => {
    buffer = Buffer.concat([buffer, chunk]);
    while (buffer.length >= BATCH_BYTES) {
      transcriber.sendAudio(buffer.subarray(0, BATCH_BYTES));
      buffer = buffer.subarray(BATCH_BYTES);
    }
  });

  mic.on('error', (err) => {
    console.error('Mic error:', err.message);
  });

  process.on('SIGINT', async () => {
    console.log('\nStopping...');
    mic.stop();
    if (buffer.length >= MIN_BYTES) {
      transcriber.sendAudio(buffer);
    }
    await transcriber.close();
    process.exit(0);
  });

  console.log('Listening... (Ctrl+C to stop)\n');
};

run().catch(console.error);

Configuration options

The transcriber options control how AssemblyAI processes your audio. Names differ slightly between SDKs (Python uses snake_case, Node uses camelCase) but the meaning is identical. Here are the key ones:

Option	Value	Description
`speech_model` / `speechModel`	`'u3-rt-pro'`	Required. Universal-3 Pro Streaming model.
`sample_rate` / `sampleRate`	`16000`	Must match decibri's sample rate.

Additional options such as keyterm prompting and speaker diarization are available. See the AssemblyAI streaming documentation for the complete list.