
Stream Audio

Neon · 2.8.31+ · 1.7+

Using the receive_audio_frame method, you can receive audio frames, which you can play live, record, or use for real-time analysis such as speech-to-text or sound analysis.

The data returned is an instance of AudioFrame.

AudioFrame(
    av_frame=<av.AudioFrame pts=None, 1024 samples at 8000Hz, mono, fltp at 0x1189e0ac0>,
    timestamp_unix_seconds=1758211800.8221593,
    resampler=<av.audio.resampler.AudioResampler object at 0x1189e04c0>
)
AudioFrame

Bases: NamedTuple

An audio frame with timestamp information.

This class represents an audio frame from the audio stream with associated timestamp information. It wraps an av.AudioFrame from the PyAV library.

Note

Audio in Neon is streamed as mono fltp at 8 kHz; this class wraps the decoded packets as av.AudioFrame instances.

Attributes:

av_frame instance-attribute

av_frame: AudioFrame

The audio frame.

datetime property

datetime: datetime

Get timestamp as a datetime object.

resampler instance-attribute

resampler: AudioResampler

A reference to a shared AudioResampler instance.

timestamp_unix_ns property

timestamp_unix_ns: int

Get timestamp in nanoseconds since Unix epoch.

timestamp_unix_seconds instance-attribute

timestamp_unix_seconds: float

Timestamp in seconds since Unix epoch.

to_ndarray

to_ndarray(*args: Any, **kwargs: Any) -> NDArray

Convert the audio frame to a NumPy array.

Source code in src/pupil_labs/realtime_api/streaming/audio.py
def to_ndarray(self, *args: Any, **kwargs: Any) -> npt.NDArray:
    """Convert the audio frame to a NumPy array."""
    return self.av_frame.to_ndarray(*args, **kwargs)

to_resampled_ndarray

to_resampled_ndarray(*args: Any, **kwargs: Any) -> Iterator[NDArray]

Convert the audio frame to a resampled s16 NumPy array.

Source code in src/pupil_labs/realtime_api/streaming/audio.py
def to_resampled_ndarray(self, *args: Any, **kwargs: Any) -> Iterator[npt.NDArray]:
    """Convert the audio frame to a resampled s16 NumPy array"""
    for frame in self.resampler.resample(self.av_frame):
        yield frame.to_ndarray(*args, **kwargs)

By default, the audio signal is streamed in mono using the AAC codec. The stream is downsampled from the original 48 kHz source to a sampling rate of 8 kHz to save bandwidth, and uses a 32-bit floating-point planar (fltp) format.
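As a rough sanity check (simple arithmetic, not from the API itself), downsampling cuts the raw PCM data rate before AAC compression by a factor of six:

```python
# Raw PCM data rate for the streamed format (before AAC compression).
SAMPLE_RATE_STREAM = 8_000   # Hz, downsampled stream
SAMPLE_RATE_SOURCE = 48_000  # Hz, original source
BYTES_PER_SAMPLE = 4         # 32-bit float (fltp)
CHANNELS = 1                 # mono

stream_rate = SAMPLE_RATE_STREAM * BYTES_PER_SAMPLE * CHANNELS   # bytes/s streamed
source_rate = SAMPLE_RATE_SOURCE * BYTES_PER_SAMPLE * CHANNELS   # bytes/s at source
print(f"raw PCM: {stream_rate} B/s streamed vs {source_rate} B/s at source")
```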

The audio stream does not have its own RTSP stream; it is multiplexed with the video, so this client creates a virtual sensor component using the Scene Camera stream.

Working with Audio Data

You can receive audio frames and convert them to NumPy arrays using the to_ndarray method, then feed them to any audio library of your choice, such as librosa, for analysis.
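For instance, a minimal sketch of measuring a frame's loudness with plain NumPy. A synthetic float32 array stands in for the real `to_ndarray()` output, which for mono fltp audio has shape `(1, n_samples)`:

```python
import numpy as np

# Synthetic stand-in for audio_frame.to_ndarray(): mono fltp -> shape (1, n_samples)
rng = np.random.default_rng(0)
samples = rng.uniform(-0.5, 0.5, size=(1, 1024)).astype(np.float32)

# Root-mean-square level, and the same value in decibels relative to full scale
rms = float(np.sqrt(np.mean(samples**2)))
dbfs = 20 * np.log10(rms + 1e-12)
print(f"RMS: {rms:.3f} ({dbfs:.1f} dBFS)")
```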

Check the whole example code here
stream_audio.py
from pupil_labs.realtime_api.simple import discover_one_device

# Look for devices. Returns as soon as it has found the first device.
print("Looking for the next best device...")
device = discover_one_device(max_search_duration_seconds=10)
if device is None:
    print("No device found.")
    raise SystemExit(-1)
print(f"Found device: {device}")
device.streaming_start("audio")  # optional, if not called, stream is started on-demand
try:
    while True:
        audio_frame = device.receive_audio_frame(timeout_seconds=5)
        if audio_frame:
            print(audio_frame)
except KeyboardInterrupt:
    pass
finally:
    print("Stopping...")
    # device.streaming_stop()  # optional, if not called, stream is stopped on close
    device.close()  # explicitly stop auto-update

Playing Audio

Real-time audio playback can be tricky. In these examples we use SoundDevice, a library that consumes NumPy arrays and plays them back with low latency. Its only caveat is that it does not accept the 32-bit floating-point planar (fltp) audio format, so we have to resample it first.
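At the sample-format level, converting fltp to s16 amounts to scaling floats in [-1.0, 1.0] to the signed 16-bit range. A minimal NumPy sketch of just that scaling step (the resampler also handles rates and channel layouts, which this does not):

```python
import numpy as np

# Float samples in [-1.0, 1.0], as delivered by the fltp stream
float_samples = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)

# Scale to the signed 16-bit range and clip to avoid overflow at +1.0
int16_samples = np.clip(float_samples * 32768, -32768, 32767).astype(np.int16)
print(int16_samples)
```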

For convenience, we include a PyAV AudioResampler object in the AudioFrame class. It is lazily instantiated, and calling to_resampled_ndarray converts the av.AudioFrame to a NumPy array in signed 16-bit integer (s16) format. We also include an AudioPlayer class that handles audio buffering and playback in a background thread, using a circular buffer to ensure smooth playback without glitches or silence; you can find it in the audio_player.py file.
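The circular-buffer idea behind AudioPlayer can be sketched as follows. This is a simplified, hypothetical MiniRingBuffer for illustration only, not the RingBuffer shipped in audio_player.py:

```python
import numpy as np

class MiniRingBuffer:
    """Simplified fixed-capacity ring buffer for mono int16 samples."""

    def __init__(self, capacity: int):
        self._data = np.zeros(capacity, dtype=np.int16)
        self._read = 0   # next index to read from
        self._write = 0  # next index to write to
        self.size = 0    # number of valid samples currently stored

    def write(self, samples: np.ndarray) -> None:
        for s in samples:  # naive per-sample loop, for clarity over speed
            self._data[self._write] = s
            self._write = (self._write + 1) % len(self._data)
            if self.size == len(self._data):
                # Buffer full: overwrite the oldest sample, advance read pointer
                self._read = (self._read + 1) % len(self._data)
            else:
                self.size += 1

    def read(self, n: int) -> np.ndarray:
        n = min(n, self.size)
        out = np.empty(n, dtype=np.int16)
        for i in range(n):
            out[i] = self._data[self._read]
            self._read = (self._read + 1) % len(self._data)
        self.size -= n
        return out

buf = MiniRingBuffer(capacity=8)
buf.write(np.arange(5, dtype=np.int16))
chunk = buf.read(3)  # oldest samples come out first
print(chunk, buf.size)
```

The producer (the network thread writing decoded frames) and the consumer (the audio callback reading fixed-size blocks) can then run at different rates without blocking each other.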

You can find a simple example below that streams audio and plays it back using the AudioPlayer class.

AudioPlayer

AudioPlayer(samplerate: int, channels: int, dtype: str = 'int16')

Bases: Thread

A threaded, low-latency audio player using a shared RingBuffer.

Methods:

  • add_data

    Directly write data to the shared RingBuffer.

  • close

    Signal the thread to stop and clean up resources.

  • get_buffer_size

    Get the current number of samples in the buffer for debugging.

  • run

    Run the main entrypoint for the thread.

Source code in src/pupil_labs/realtime_api/audio_player.py
def __init__(self, samplerate: int, channels: int, dtype: str = "int16"):
    super().__init__(daemon=True)
    self.samplerate = samplerate
    self.channels = channels
    self.dtype = dtype

    self._stop_event = threading.Event()
    self._buffer = RingBuffer(
        capacity=1024,
        dtype=np.int16,
        channels=channels,
    )
    self.stream: sd.OutputStream | None = None

add_data

add_data(data: NDArray[int16]) -> None

Directly write data to the shared RingBuffer.

Source code in src/pupil_labs/realtime_api/audio_player.py
def add_data(self, data: npt.NDArray[np.int16]) -> None:
    """Directly write data to the shared RingBuffer."""
    self._buffer.write(data)

close

close() -> None

Signal the thread to stop and clean up resources.

Source code in src/pupil_labs/realtime_api/audio_player.py
def close(self) -> None:
    """Signal the thread to stop and clean up resources."""
    logging.debug("Closing audio player...")
    self._stop_event.set()
    self.join()  # Wait for the thread to finish
    logging.info("Audio player closed.")

get_buffer_size

get_buffer_size() -> int

Get the current number of samples in the buffer for debugging.

Source code in src/pupil_labs/realtime_api/audio_player.py
def get_buffer_size(self) -> int:
    """Get the current number of samples in the buffer for debugging."""
    return self._buffer.size

run

run() -> None

Run the main entrypoint for the thread.

Source code in src/pupil_labs/realtime_api/audio_player.py
def run(self) -> None:
    """Run the main entrypoint for the thread."""
    try:
        self.stream = sd.OutputStream(
            samplerate=self.samplerate,
            channels=self.channels,
            dtype=self.dtype,
            callback=self._callback,
            blocksize=0,  # Let the device choose the optimal size for low latency
            latency="low",
        )
        with self.stream:
            logging.debug("Audio stream started.")
            self._stop_event.wait()  # Wait until the close() method is called
    except Exception:
        logging.exception("Error in audio thread.")
    finally:
        logging.debug("Audio stream closed.")
Check the whole example code here
stream_audio_and_play.py
from pupil_labs.realtime_api.audio_player import AudioPlayer
from pupil_labs.realtime_api.simple import discover_one_device


def main():
    # Look for devices. Returns as soon as it has found the first device.
    print("Looking for the next best device...")
    device = discover_one_device(max_search_duration_seconds=10)
    if device is None:
        print("No device found.")
        raise SystemExit(-1)
    print(f"Found device: {device}")
    device.streaming_start(
        "audio"
    )  # optional, if not called, stream is started on-demand
    player: AudioPlayer | None = None
    try:
        while True:
            audio_frame = device.receive_audio_frame(timeout_seconds=5)
            if audio_frame:
                if not player:
                    # Initialize the player with the correct parameters from first frame
                    player = AudioPlayer(
                        samplerate=audio_frame.av_frame.sample_rate,
                        channels=audio_frame.av_frame.layout.nb_channels,
                        dtype="int16",
                    )
                    # Start the player process
                    player.start()
                    print(
                        f"Audio stream parameters: "
                        f"Sample Rate: {audio_frame.av_frame.sample_rate}, "
                        f"Channels: {audio_frame.av_frame.layout.nb_channels}, "
                        f"Layout: {audio_frame.av_frame.layout.name}"
                    )
                    print("Started audio playback.")
                else:
                    player.add_data(next(audio_frame.to_resampled_ndarray()).T)

    except KeyboardInterrupt:
        pass
    finally:
        print("Stopping...")
        # device.streaming_stop()  # optional, if not called, stream is stopped on close
        device.close()  # explicitly stop auto-update
        if player:
            player.close()


if __name__ == "__main__":
    main()

Note

You can also use a different audio library, such as PyAudio or pygame, to play back the audio data. Note that PyAudio may require installing PortAudio, and pygame is better suited for game development.

Playing Video and Audio

Here you can find an example that shows how to play both video with gaze overlay and audio using OpenCV and the AudioPlayer class.

Check the whole example code here
stream_video_gaze_and_audio.py
import cv2
import numpy as np

# Workaround for https://github.com/opencv/opencv/issues/21952
cv2.imshow("cv/av bug", np.zeros(1))
cv2.destroyAllWindows()

from pupil_labs.realtime_api.audio_player import AudioPlayer  # noqa: E402
from pupil_labs.realtime_api.simple import discover_one_device  # noqa: E402

# NOTE: Audio playback is done in a separate thread with SoundDevice and a circular
# buffer to avoid blocking the main thread and ensure smooth playback.
# An AudioPlayer class is provided in the realtime_api package for this purpose.


def main():
    # Look for devices. Returns as soon as it has found the first device.
    print("Looking for the next best device...")
    device = discover_one_device(max_search_duration_seconds=10)
    if device is None:
        print("No device found.")
        raise SystemExit(-1)

    print(f"Connecting to {device}...")
    player = AudioPlayer(samplerate=8000, channels=1, dtype="int16")
    try:
        player.start()
        while True:
            matched = device.receive_matched_scene_video_frame_and_audio(
                timeout_seconds=5
            )
            if matched is None:
                continue

            frame, audio, gaze = matched

            # We add all audio frames to the player's queue
            for audio_frame in audio:
                player.add_data(next(audio_frame.to_resampled_ndarray()).T)

            buffer_fill_ms = 1000 * player.get_buffer_size() / player.samplerate

            # We display the number of audio frames received with the video frame
            time_diff_ms = [
                frame.timestamp_unix_seconds - af.timestamp_unix_seconds for af in audio
            ]
            if audio:
                cv2.putText(
                    frame.bgr_pixels,
                    (
                        f"Audio frames: {len(audio)} / "
                        f"Buffer: {buffer_fill_ms:.0f} ms / "
                        f"Mean diff. audio-scene: {np.mean(time_diff_ms) * 1000:.0f} ms"
                    ),
                    (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    1,
                    (0, 255, 0),
                    2,
                )
            else:
                cv2.putText(
                    frame.bgr_pixels,
                    f"No audio / Buffer: {buffer_fill_ms:.0f} ms",
                    (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    1,
                    (0, 0, 255),
                    2,
                )
            if gaze:
                cv2.circle(
                    frame.bgr_pixels,
                    (int(gaze.x), int(gaze.y)),
                    radius=80,
                    color=(0, 0, 255),
                    thickness=15,
                )
            cv2.imshow("Scene camera and audio", frame.bgr_pixels)
            if cv2.waitKey(1) & 0xFF == 27:
                break
    except KeyboardInterrupt:
        pass
    finally:
        print("Stopping...")
        player.close()
        device.close()  # explicitly stop auto-update


if __name__ == "__main__":
    main()

Bonus

In the Async API examples, you can also find how to use the audio stream and plot its spectrum using the librosa library.
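If you prefer to avoid an extra dependency, a magnitude spectrum can also be computed with plain NumPy. A sketch using a synthetic 440 Hz tone at the stream's 8 kHz rate (real analysis would use the arrays returned by to_ndarray instead):

```python
import numpy as np

SAMPLE_RATE = 8_000  # Hz, matches the Neon audio stream
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE                  # one second of samples
signal = np.sin(2 * np.pi * 440 * t).astype(np.float32)   # 440 Hz test tone

# Magnitude spectrum via the real FFT; bins run from 0 Hz up to Nyquist (4 kHz)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / SAMPLE_RATE)
peak_hz = freqs[np.argmax(spectrum)]
print(f"Spectral peak at {peak_hz:.0f} Hz")
```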