video

pupil_labs.video package.

A high-level wrapper of PyAV providing an easy to use interface to video data.

Modules:

  • frame
  • indexing
  • reader
  • writer

Classes:

AudioFrame dataclass

AudioFrame(av_frame: AudioFrame, time: float, index: int, source: Any)

Bases: BaseFrame

Methods:

  • to_ndarray

    Convert the audio samples of the AudioFrame to a numpy array.

Attributes:

  • av_frame (AudioFrame) –

    the original av.AudioFrame for this frame

  • index (int) –

    index of frame

  • source (Any) –

    source of this frame, e.g. reader or filename

  • time (float) –

    timestamp of frame

av_frame instance-attribute

av_frame: AudioFrame

the original av.AudioFrame for this frame

index instance-attribute

index: int

index of frame

source instance-attribute

source: Any

source of this frame, e.g. reader or filename

time instance-attribute

time: float

timestamp of frame

to_ndarray

to_ndarray() -> NDArray[float64]

Convert the audio samples of the AudioFrame to a numpy array.

Source code in src/pupil_labs/video/frame.py
def to_ndarray(self) -> npt.NDArray[np.float64]:
    """Convert the audio samples of the AudioFrame to a numpy array."""
    return cast(npt.NDArray[np.float64], self.av_frame.to_ndarray())
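`to_ndarray` returns the raw samples as `float64`, ready for numerical processing. A sketch computing the RMS level of a frame; the sample array and its `(channels, samples)` shape are synthetic stand-ins for real decoded audio:

```python
import numpy as np

# Synthetic mono samples standing in for AudioFrame.to_ndarray() output.
samples = np.array([[0.0, 0.5, -0.5, 0.5]], dtype=np.float64)  # shape: (channels, samples)

# Root-mean-square level of the frame.
rms = float(np.sqrt(np.mean(samples**2)))
```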

Reader

Reader(source: Path | str, stream: Literal['video'] = 'video', container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None, logger: Logger | None = None)
Reader(source: Path | str, stream: Literal['audio'] = 'audio', container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None, logger: Logger | None = None)
Reader(source: Path | str, stream: Literal['audio', 'video'] | tuple[Literal['audio', 'video'], int] = 'video', container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None, logger: Optional[Logger] = None)

Bases: Generic[ReaderFrameType]

Parameters:

  • source (Path | str) –

    Path to a video file. Can be a local path or an HTTP address.

  • stream (Literal['audio', 'video'] | tuple[Literal['audio', 'video'], int], default: 'video' ) –

    The stream to read from, either "audio" or "video". If the video file contains multiple streams of the desired kind, a tuple can be provided to specify which stream to use, e.g. ("audio", 2) to use the audio stream at index 2.

  • container_timestamps (Optional[ContainerTimestamps | list[float]] | None, default: None ) –

    Array containing the timestamps of the video frames in container time (equal to PTS * time_base). If not provided, timestamps will be inferred from the container. Providing pre-loaded values can speed up initialization for long videos by avoiding demuxing of the entire video to obtain PTS.

  • logger (Optional[Logger], default: None ) –

    Python logger to use. Decreases performance.

Attributes:

  • audio (Reader[AudioFrame] | None) –

    Returns a Reader providing access to the audio data of the video only.

  • average_rate (float) –

    Return the average framerate of the video in Hz.

  • by_container_timestamps (Indexer[ReaderFrameType]) –

    Time-based access to video frames using container timestamps.

  • container_timestamps (ContainerTimestamps) –

    Frame timestamps in container time.

  • duration (float) –

    Return the duration of the video in seconds.

  • filename (str) –

    Return the filename of the video

  • gop_size (int) –

    Return the number of frames per keyframe in a video

  • height (int | None) –

    Height of the video in pixels.

  • pts (list[int]) –

    Return all presentation timestamps in video.time_base

  • rate (Fraction | int | None) –

    Return the framerate of the video in Hz.

  • source (Any) –

    Return the source of the video

  • video (Reader[VideoFrame] | None) –

    Returns a Reader providing access to the video data of the video only.

  • width (int | None) –

    Width of the video in pixels.

Source code in src/pupil_labs/video/reader.py
def __init__(
    self,
    source: Path | str,
    stream: Literal["audio", "video"]
    | tuple[Literal["audio", "video"], int] = "video",
    container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None,
    logger: Optional[Logger] = None,
):
    """Create a reader for a video file.

    Args:
        source: Path to a video file. Can be a local path or an HTTP address.
        stream: The stream to read from, either "audio" or "video". If the video file
            contains multiple streams of the desired kind, a tuple can be provided
            to specify which stream to use, e.g. `("audio", 2)` to use the audio
            stream at index `2`.
        container_timestamps: Array containing the timestamps of the video frames in
            container time (equal to PTS * time_base). If not provided, timestamps
            will be inferred from the container. Providing pre-loaded values can
            speed up initialization for long videos by avoiding demuxing of the
            entire video to obtain PTS.
        logger: Python logger to use. Decreases performance.

    """
    self._container_timestamps: ContainerTimestamps | None = None
    if container_timestamps is not None:
        if isinstance(container_timestamps, list):
            container_timestamps = np.array(container_timestamps)
        self.container_timestamps = container_timestamps

    self.lazy_frame_slice_limit = LAZY_FRAME_SLICE_LIMIT
    self._times_were_provided = container_timestamps is not None
    self._source = source
    self._logger = logger or DEFAULT_LOGGER
    self.stats = Stats()

    if not isinstance(stream, tuple):
        stream = (stream, 0)
    self._stream_kind, self._stream_index = stream

    self._log = bool(logger)
    self._is_at_start = True
    self._last_processed_dts = -maxsize
    self._partial_pts = list[int]()
    self._partial_dts = list[int]()
    self._partial_pts_to_index = dict[int, int]()
    self._all_pts_are_loaded = False
    self._decoder_frame_buffer = deque[AVFrame]()
    self._current_decoder_index: int | None = -1
    self._indexed_frames_buffer: deque[ReaderFrameType] = deque(maxlen=1000)
    # TODO(dan): can we avoid it?
    # this forces loading the gopsize on initialization to set the buffer length
    assert self.gop_size

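The `container_timestamps` argument above expects container time, i.e. PTS * time_base, in seconds. A minimal sketch of that conversion; the `time_base` and PTS values here are illustrative, not taken from the library:

```python
from fractions import Fraction

# Hypothetical values: a 30 fps stream with a common 1/90000 time base
# advances its PTS by 3000 ticks per frame.
time_base = Fraction(1, 90000)
pts = [0, 3000, 6000, 9000]

# Container time is PTS * time_base, expressed in seconds.
container_timestamps = [float(p * time_base) for p in pts]
```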
audio cached property

audio: Reader[AudioFrame] | None

Returns a Reader providing access to the audio data of the video only.

average_rate property

average_rate: float

Return the average framerate of the video in Hz.

by_container_timestamps cached property

by_container_timestamps: Indexer[ReaderFrameType]

Time-based access to video frames using container timestamps.

Container time is measured in seconds relative to the beginning of the video. Accordingly, the first frame typically has timestamp 0.0.

When accessing a specific key, e.g. reader[t], a frame with this exact timestamp needs to exist; otherwise an IndexError is raised. When accessing a slice, e.g. reader[a:b], an ArrayLike is returned such that a <= frame.time < b for every frame.

Large slices are returned as a lazy view, which avoids immediately loading all frames into RAM.

Note that numerical imprecisions of float numbers can lead to issues when accessing individual frames by their container timestamp. It is recommended to prefer indexing frames via slices.
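The half-open a <= frame.time < b slice semantics can be sketched with the standard bisect module; the timestamps below are made up for illustration and are not produced by the library:

```python
from bisect import bisect_left

# Illustrative, sorted container timestamps for five frames.
times = [0.0, 0.0333, 0.0667, 0.1, 0.1333]

def frames_in_range(times: list[float], a: float, b: float) -> list[int]:
    """Return indices i with a <= times[i] < b, mirroring the slice semantics."""
    return list(range(bisect_left(times, a), bisect_left(times, b)))
```

Selecting by a range instead of an exact key sidesteps the float-imprecision issue noted above, e.g. `frames_in_range(times, 0.03, 0.11)` picks frames 1 through 3.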

container_timestamps deletable property writable

container_timestamps: ContainerTimestamps

Frame timestamps in container time.

Container time is measured in seconds relative to the beginning of the video. Accordingly, the first frame typically has timestamp 0.0.

If these values were not provided when creating the Reader, they will be inferred from the video container.

duration property

duration: float

Return the duration of the video in seconds.

If the duration is not available in the container, it will be calculated from the frame timestamps.

filename property

filename: str

Return the filename of the video

gop_size cached property

gop_size: int

Return the number of frames per keyframe in a video

height property

height: int | None

Height of the video in pixels.

pts cached property

pts: list[int]

Return all presentation timestamps in video.time_base

rate property

rate: Fraction | int | None

Return the framerate of the video in Hz.

source property

source: Any

Return the source of the video

video cached property

video: Reader[VideoFrame] | None

Returns a Reader providing access to the video data of the video only.

width property

width: int | None

Width of the video in pixels.

VideoFrame dataclass

VideoFrame(av_frame: VideoFrame, time: float, index: int, source: Any)

Bases: BaseFrame

Methods:

  • to_ndarray

    Convert the image of the VideoFrame to a numpy array.

Attributes:

  • av_frame (VideoFrame) –

    the original av.VideoFrame for this frame

  • bgr (NDArray[uint8]) –

    Numpy image array in BGR format

  • gray (NDArray[uint8]) –

    Numpy image array in gray format

  • index (int) –

    index of frame

  • rgb (NDArray[uint8]) –

    Numpy image array in RGB format

  • source (Any) –

    source of this frame, e.g. reader or filename

  • time (float) –

    timestamp of frame

av_frame instance-attribute

av_frame: VideoFrame

the original av.VideoFrame for this frame

bgr property

bgr: NDArray[uint8]

Numpy image array in BGR format

gray property

gray: NDArray[uint8]

Numpy image array in gray format

index instance-attribute

index: int

index of frame

rgb property

rgb: NDArray[uint8]

Numpy image array in RGB format

source instance-attribute

source: Any

source of this frame, e.g. reader or filename

time instance-attribute

time: float

timestamp of frame

to_ndarray

to_ndarray(pixel_format: PixelFormat) -> NDArray[uint8]

Convert the image of the VideoFrame to a numpy array.

Source code in src/pupil_labs/video/frame.py
def to_ndarray(self, pixel_format: PixelFormat) -> npt.NDArray[np.uint8]:
    """Convert the image of the VideoFrame to a numpy array."""
    # TODO: add caching for decoded frames?
    return av_frame_to_ndarray_fast(self.av_frame, pixel_format)
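The bgr, rgb and gray properties expose the same image in different formats; the relation between BGR and RGB is a reversal of the channel axis. A numpy sketch with a made-up 1x2 image, not a decoded frame:

```python
import numpy as np

# A tiny 1x2 BGR image: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis converts BGR to RGB.
rgb = bgr[..., ::-1]
```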

Writer

Writer(path: str | Path, lossless: bool = False, fps: int | None = None, bit_rate: int = 2000000, logger: Logger | None = None)

Parameters:

  • path (str | Path) –

    The path to write the video to.

  • lossless (bool, default: False ) –

    If True, the video will be encoded in lossless H264.

  • fps (int | None, default: None ) –

    The desired framerate of the video.

  • bit_rate (int, default: 2000000 ) –

    The desired bit rate of the video.

  • logger (Logger | None, default: None ) –

    Python logger to use. Decreases performance.

Methods:

  • write_image

    Write an image to the video.

Source code in src/pupil_labs/video/writer.py
def __init__(
    self,
    path: str | Path,
    lossless: bool = False,
    fps: int | None = None,
    bit_rate: int = 2_000_000,
    logger: Logger | None = None,
) -> None:
    """Video writer for creating videos from image arrays.

    Args:
        path: The path to write the video to.
        lossless: If True, the video will be encoded in lossless H264.
        fps: The desired framerate of the video.
        bit_rate: The desired bit rate of the video.
        logger: Python logger to use. Decreases performance.

    """
    self.path = path
    self.lossless = lossless
    self.fps = fps
    self.bit_rate = bit_rate
    self.logger = logger or DEFAULT_LOGGER
    self.container = av.open(self.path, "w")

write_image

write_image(image: NDArray[uint8], time: Optional[float] = None, pix_fmt: Optional[PixelFormat] = None) -> None

Write an image to the video.

Parameters:

  • image (NDArray[uint8]) –

    The image to write. Can have 1 or 3 channels.

  • time (Optional[float], default: None ) –

    The time of the frame in seconds.

  • pix_fmt (Optional[PixelFormat], default: None ) –

    The pixel format of the image. If None, the pixel format will be gray for 1-channel images and bgr24 for 3-channel images.

Source code in src/pupil_labs/video/writer.py
def write_image(
    self,
    image: npt.NDArray[np.uint8],
    time: Optional[float] = None,
    pix_fmt: Optional[PixelFormat] = None,
) -> None:
    """Write an image to the video.

    Args:
        image: The image to write. Can have 1 or 3 channels.
        time: The time of the frame in seconds.
        pix_fmt: The pixel format of the image. If None, the pixel format will be
            `gray` for 1-channel images and `bgr24` for 3-channel images.

    """
    if pix_fmt is None:
        pix_fmt = "bgr24"
        if image.ndim == 2:
            pix_fmt = "gray"

    frame = av.VideoFrame.from_ndarray(image, str(pix_fmt))
    self.write_frame(frame, time=time)
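The pixel-format default shown above can be restated as a small standalone helper; this is a sketch of the same logic, not part of the library API:

```python
from typing import Optional

import numpy as np

def default_pix_fmt(image: np.ndarray, pix_fmt: Optional[str] = None) -> str:
    """Mirror write_image's default: gray for 2-D (1-channel) input, bgr24 otherwise."""
    if pix_fmt is None:
        return "gray" if image.ndim == 2 else "bgr24"
    return pix_fmt
```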