
Stream Audio

Neon · 2.8.31+ · 1.7+

Using the receive_audio_frame method, you can receive audio frames, which you can play live, record, or use for real-time analysis such as speech-to-text or sound analysis.

The data returned is an instance of AudioFrame.

AudioFrame(
    av_frame=<av.AudioFrame pts=None, 1024 samples at 8000Hz, mono, fltp at 0x1189e0ac0>,
    timestamp_unix_seconds=1758211800.8221593,
    resampler=<av.audio.resampler.AudioResampler object at 0x1189e04c0>
)
AudioFrame

Bases: NamedTuple

An audio frame with timestamp information.

This class represents an audio frame from the audio stream with associated timestamp information. It wraps an av.AudioFrame from the PyAV library.

Note

Audio in Neon is streamed as mono fltp at 8 kHz; this class wraps the decoded packets as av.AudioFrame instances.

Attributes:

av_frame instance-attribute

av_frame: AudioFrame

The audio frame.

datetime property

datetime: datetime

Get timestamp as a datetime object.

resampler instance-attribute

resampler: AudioResampler

A reference to a shared AudioResampler instance.

timestamp_unix_ns property

timestamp_unix_ns: int

Get timestamp in nanoseconds since Unix epoch.

timestamp_unix_seconds instance-attribute

timestamp_unix_seconds: float

Timestamp in seconds since Unix epoch.

to_ndarray

to_ndarray(*args: Any, **kwargs: Any) -> NDArray

Convert the audio frame to a NumPy array.

Source code in src/pupil_labs/realtime_api/streaming/audio.py
def to_ndarray(self, *args: Any, **kwargs: Any) -> npt.NDArray:
    """Convert the audio frame to a NumPy array."""
    return self.av_frame.to_ndarray(*args, **kwargs)

to_resampled_ndarray

to_resampled_ndarray(*args: Any, **kwargs: Any) -> Iterator[NDArray]

Convert the audio frame to a resampled s16 NumPy array.

Source code in src/pupil_labs/realtime_api/streaming/audio.py
def to_resampled_ndarray(self, *args: Any, **kwargs: Any) -> Iterator[npt.NDArray]:
    """Convert the audio frame to a resampled s16 NumPy array"""
    for frame in self.resampler.resample(self.av_frame):
        yield frame.to_ndarray(*args, **kwargs)

By default, the audio signal is streamed in mono using the AAC codec. The stream is downsampled from the original 48 kHz source to a sampling rate of 8 kHz to save bandwidth, and uses a 32-bit floating-point planar (fltp) format.
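As a rough sanity check (simple arithmetic, not from the API itself), downsampling cuts the raw PCM data rate before AAC compression by a factor of six:

```python
# Raw PCM data rate for the streamed format (before AAC compression).
SAMPLE_RATE_STREAM = 8_000   # Hz, downsampled stream
SAMPLE_RATE_SOURCE = 48_000  # Hz, original source
BYTES_PER_SAMPLE = 4         # 32-bit float (fltp)
CHANNELS = 1                 # mono

stream_rate = SAMPLE_RATE_STREAM * BYTES_PER_SAMPLE * CHANNELS   # bytes/s streamed
source_rate = SAMPLE_RATE_SOURCE * BYTES_PER_SAMPLE * CHANNELS   # bytes/s at source
print(f"raw PCM: {stream_rate} B/s streamed vs {source_rate} B/s at source")
```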

The audio stream does not have its own RTSP stream; it is multiplexed with the video, so this client creates a virtual sensor component using the Scene Camera stream.

Working with Audio Data

You can receive audio frames and convert them to NumPy arrays using the to_ndarray method, then feed them to any audio library of your choice, such as librosa, for analysis.
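For instance, a minimal sketch of measuring a frame's loudness with plain NumPy. A synthetic float32 array stands in for the real `to_ndarray()` output, which for mono fltp audio has shape `(1, n_samples)`:

```python
import numpy as np

# Synthetic stand-in for audio_frame.to_ndarray(): mono fltp -> shape (1, n_samples)
rng = np.random.default_rng(0)
samples = rng.uniform(-0.5, 0.5, size=(1, 1024)).astype(np.float32)

# Root-mean-square level, and the same value in decibels relative to full scale
rms = float(np.sqrt(np.mean(samples**2)))
dbfs = 20 * np.log10(rms + 1e-12)
print(f"RMS: {rms:.3f} ({dbfs:.1f} dBFS)")
```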

Check the whole example code here
stream_audio.py
from pupil_labs.realtime_api.simple import discover_one_device

# Look for devices. Returns as soon as it has found the first device.
print("Looking for the next best device...")
device = discover_one_device(max_search_duration_seconds=10)
if device is None:
    print("No device found.")
    raise SystemExit(-1)
print(f"Found device: {device}")
device.streaming_start("audio")  # optional, if not called, stream is started on-demand
try:
    while True:
        audio_frame = device.receive_audio_frame(timeout_seconds=5)
        if audio_frame:
            print(audio_frame)
except KeyboardInterrupt:
    pass
finally:
    print("Stopping...")
    # device.streaming_stop()  # optional, if not called, stream is stopped on close
    device.close()  # explicitly stop auto-update

Playing Audio

Real-time audio playback can be tricky. In these examples we use SoundDevice, a library that consumes NumPy arrays and plays them back with low latency. Its only caveat is that it does not accept the 32-bit floating-point planar (fltp) audio format, so we have to resample it first.
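At the sample-format level, converting fltp to s16 amounts to scaling floats in [-1.0, 1.0] to the signed 16-bit range. A minimal NumPy sketch of just that scaling step (the resampler also handles rates and channel layouts, which this does not):

```python
import numpy as np

# Float samples in [-1.0, 1.0], as delivered by the fltp stream
float_samples = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)

# Scale to the signed 16-bit range and clip to avoid overflow at +1.0
int16_samples = np.clip(float_samples * 32768, -32768, 32767).astype(np.int16)
print(int16_samples)
```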

For convenience, we include a PyAV AudioResampler object in the AudioFrame class. It is lazily instantiated, and calling to_resampled_ndarray converts the av.AudioFrame to a NumPy array in signed 16-bit integer (s16) format. We also include an AudioPlayer class that handles audio buffering and playback in a background thread, using a circular buffer to ensure smooth playback without glitches or silence; you can find it in the audio_player.py file.
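The circular-buffer idea behind AudioPlayer can be sketched as follows. This is a simplified, hypothetical MiniRingBuffer for illustration only, not the RingBuffer shipped in audio_player.py:

```python
import numpy as np

class MiniRingBuffer:
    """Simplified fixed-capacity ring buffer for mono int16 samples."""

    def __init__(self, capacity: int):
        self._data = np.zeros(capacity, dtype=np.int16)
        self._read = 0   # next index to read from
        self._write = 0  # next index to write to
        self.size = 0    # number of valid samples currently stored

    def write(self, samples: np.ndarray) -> None:
        for s in samples:  # naive per-sample loop, for clarity over speed
            self._data[self._write] = s
            self._write = (self._write + 1) % len(self._data)
            if self.size == len(self._data):
                # Buffer full: overwrite the oldest sample, advance read pointer
                self._read = (self._read + 1) % len(self._data)
            else:
                self.size += 1

    def read(self, n: int) -> np.ndarray:
        n = min(n, self.size)
        out = np.empty(n, dtype=np.int16)
        for i in range(n):
            out[i] = self._data[self._read]
            self._read = (self._read + 1) % len(self._data)
        self.size -= n
        return out

buf = MiniRingBuffer(capacity=8)
buf.write(np.arange(5, dtype=np.int16))
chunk = buf.read(3)  # oldest samples come out first
print(chunk, buf.size)
```

The producer (the network thread writing decoded frames) and the consumer (the audio callback reading fixed-size blocks) can then run at different rates without blocking each other.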

You can find a simple example below that streams audio and plays it back using the AudioPlayer class.

AudioPlayer

AudioPlayer(samplerate: int, channels: int, dtype: str = 'int16')

Bases: Thread

A threaded, low-latency audio player using a shared RingBuffer.

Methods:

  • add_data

    Directly write data to the shared RingBuffer.

  • close

    Signal the thread to stop and clean up resources.

  • get_buffer_size

    Get the current number of samples in the buffer for debugging.

  • run

    Run the main entrypoint for the thread.

Source code in src/pupil_labs/realtime_api/audio_player.py
def __init__(self, samplerate: int, channels: int, dtype: str = "int16"):
    super().__init__(daemon=True)
    self.samplerate = samplerate
    self.channels = channels
    self.dtype = dtype

    self._stop_event = threading.Event()
    self._buffer = RingBuffer(
        capacity=1024,
        dtype=np.int16,
        channels=channels,
    )
    self.stream: sd.OutputStream | None = None

add_data

add_data(data: NDArray[int16]) -> None

Directly write data to the shared RingBuffer.

Source code in src/pupil_labs/realtime_api/audio_player.py
def add_data(self, data: npt.NDArray[np.int16]) -> None:
    """Directly write data to the shared RingBuffer."""
    self._buffer.write(data)

close

close() -> None

Signal the thread to stop and clean up resources.

Source code in src/pupil_labs/realtime_api/audio_player.py
def close(self) -> None:
    """Signal the thread to stop and clean up resources."""
    logging.debug("Closing audio player...")
    self._stop_event.set()
    self.join()  # Wait for the thread to finish
    logging.info("Audio player closed.")

get_buffer_size

get_buffer_size() -> int

Get the current number of samples in the buffer for debugging.

Source code in src/pupil_labs/realtime_api/audio_player.py
def get_buffer_size(self) -> int:
    """Get the current number of samples in the buffer for debugging."""
    return self._buffer.size

run

run() -> None

Run the main entrypoint for the thread.

Source code in src/pupil_labs/realtime_api/audio_player.py
def run(self) -> None:
    """Run the main entrypoint for the thread."""
    try:
        self.stream = sd.OutputStream(
            samplerate=self.samplerate,
            channels=self.channels,
            dtype=self.dtype,
            callback=self._callback,
            blocksize=0,  # Let the device choose the optimal size for low latency
            latency="low",
        )
        with self.stream:
            logging.debug("Audio stream started.")
            self._stop_event.wait()  # Wait until the close() method is called
    except Exception:
        logging.exception("Error in audio thread.")
    finally:
        logging.debug("Audio stream closed.")
Check the whole example code here
stream_audio_and_play.py
from pupil_labs.realtime_api.audio_player import AudioPlayer
from pupil_labs.realtime_api.simple import discover_one_device


def main():
    # Look for devices. Returns as soon as it has found the first device.
    print("Looking for the next best device...")
    device = discover_one_device(max_search_duration_seconds=10)
    if device is None:
        print("No device found.")
        raise SystemExit(-1)
    print(f"Found device: {device}")
    device.streaming_start(
        "audio"
    )  # optional, if not called, stream is started on-demand
    player: AudioPlayer | None = None
    try:
        while True:
            audio_frame = device.receive_audio_frame(timeout_seconds=5)
            if audio_frame:
                if not player:
                    # Initialize the player with the correct parameters from first frame
                    player = AudioPlayer(
                        samplerate=audio_frame.av_frame.sample_rate,
                        channels=audio_frame.av_frame.layout.nb_channels,
                        dtype="int16",
                    )
                    # Start the player process
                    player.start()
                    print(
                        f"Audio stream parameters: "
                        f"Sample Rate: {audio_frame.av_frame.sample_rate}, "
                        f"Channels: {audio_frame.av_frame.layout.nb_channels}, "
                        f"Layout: {audio_frame.av_frame.layout.name}"
                    )
                    print("Started audio playback.")
                else:
                    player.add_data(next(audio_frame.to_resampled_ndarray()).T)

    except KeyboardInterrupt:
        pass
    finally:
        print("Stopping...")
        # device.streaming_stop()  # optional, if not called, stream is stopped on close
        device.close()  # explicitly stop auto-update
        if player:
            player.close()


if __name__ == "__main__":
    main()

Note

You can also use a different audio library, such as PyAudio or pygame, to play back the audio data. Note that PyAudio may require installing PortAudio, and pygame is better suited for game development.

Playing Video and Audio

Here you can find an example that shows how to play both video with gaze overlay and audio using OpenCV and the AudioPlayer class.

Check the whole example code here
stream_video_gaze_and_audio.py
import cv2
import numpy as np

# Workaround for https://github.com/opencv/opencv/issues/21952
cv2.imshow("cv/av bug", np.zeros(1))
cv2.destroyAllWindows()

from pupil_labs.realtime_api.audio_player import AudioPlayer  # noqa: E402
from pupil_labs.realtime_api.simple import discover_one_device  # noqa: E402

# NOTE: Audio playback is done in a separate thread with SoundDevice and a circular
# buffer to avoid blocking the main thread and ensure smooth playback.
# An AudioPlayer class is provided in the realtime_api package for this purpose.


def main():
    # Look for devices. Returns as soon as it has found the first device.
    print("Looking for the next best device...")
    device = discover_one_device(max_search_duration_seconds=10)
    if device is None:
        print("No device found.")
        raise SystemExit(-1)

    print(f"Connecting to {device}...")
    player = AudioPlayer(samplerate=8000, channels=1, dtype="int16")
    try:
        player.start()
        while True:
            matched = device.receive_matched_scene_video_frame_and_audio(
                timeout_seconds=5
            )
            if matched is None:
                continue

            frame, audio, gaze = matched

            # We add all audio frames to the player's queue
            for audio_frame in audio:
                player.add_data(next(audio_frame.to_resampled_ndarray()).T)

            buffer_fill_ms = 1000 * player.get_buffer_size() / player.samplerate

            # We display the number of audio frames received with the video frame
            time_diff_ms = [
                frame.timestamp_unix_seconds - af.timestamp_unix_seconds for af in audio
            ]
            if audio:
                cv2.putText(
                    frame.bgr_pixels,
                    (
                        f"Audio frames: {len(audio)} / "
                        f"Buffer: {buffer_fill_ms:.0f} ms / "
                        f"Mean diff. audio-scene: {np.mean(time_diff_ms) * 1000:.0f} ms"
                    ),
                    (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    1,
                    (0, 255, 0),
                    2,
                )
            else:
                cv2.putText(
                    frame.bgr_pixels,
                    f"No audio / Buffer: {buffer_fill_ms:.0f} ms",
                    (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    1,
                    (0, 0, 255),
                    2,
                )
            if gaze:
                cv2.circle(
                    frame.bgr_pixels,
                    (int(gaze.x), int(gaze.y)),
                    radius=80,
                    color=(0, 0, 255),
                    thickness=15,
                )
            cv2.imshow("Scene camera and audio", frame.bgr_pixels)
            if cv2.waitKey(1) & 0xFF == 27:
                break
    except KeyboardInterrupt:
        pass
    finally:
        print("Stopping...")
        player.close()
        device.close()  # explicitly stop auto-update


if __name__ == "__main__":
    main()

Bonus

In the Async API examples, you can also find how to use the audio stream and plot its spectrum using the librosa library.
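If you prefer to avoid an extra dependency, a magnitude spectrum can also be computed with plain NumPy. A sketch using a synthetic 440 Hz tone at the stream's 8 kHz rate (real analysis would use the arrays returned by to_ndarray instead):

```python
import numpy as np

SAMPLE_RATE = 8_000  # Hz, matches the Neon audio stream
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE                  # one second of samples
signal = np.sin(2 * np.pi * 440 * t).astype(np.float32)   # 440 Hz test tone

# Magnitude spectrum via the real FFT; bins run from 0 Hz up to Nyquist (4 kHz)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / SAMPLE_RATE)
peak_hz = freqs[np.argmax(spectrum)]
print(f"Spectral peak at {peak_hz:.0f} Hz")
```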