video

pupil_labs.video package.

A high-level wrapper of PyAV providing an easy to use interface to video data.

Modules:

  • frame
  • indexing
  • reader
  • writer

Classes:

AudioFrame dataclass

AudioFrame(av_frame: AudioFrame, time: float, index: int, source: Any)

Bases: BaseFrame

Methods:

  • to_ndarray

    Convert the audio samples of the AudioFrame to a numpy array.

Attributes:

  • av_frame (AudioFrame) –

    the original av.AudioFrame for this frame

  • index (int) –

    index of frame

  • source (Any) –

    source of this frame, e.g. reader or filename

  • time (float) –

    timestamp of frame

av_frame instance-attribute

av_frame: AudioFrame

the original av.AudioFrame for this frame

index instance-attribute

index: int

index of frame

source instance-attribute

source: Any

source of this frame, e.g. reader or filename

time instance-attribute

time: float

timestamp of frame

to_ndarray

to_ndarray() -> NDArray[float64]

Convert the audio samples of the AudioFrame to a numpy array.

Source code in src/pupil_labs/video/frame.py
def to_ndarray(self) -> npt.NDArray[np.float64]:
    """Convert the audio samples of the AudioFrame to a numpy array."""
    return cast(npt.NDArray[np.float64], self.av_frame.to_ndarray())
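`to_ndarray` returns the raw samples as `float64`, ready for numerical processing. A sketch computing the RMS level of a frame; the sample array and its `(channels, samples)` shape are synthetic stand-ins for real decoded audio:

```python
import numpy as np

# Synthetic mono samples standing in for AudioFrame.to_ndarray() output.
samples = np.array([[0.0, 0.5, -0.5, 0.5]], dtype=np.float64)  # shape: (channels, samples)

# Root-mean-square level of the frame.
rms = float(np.sqrt(np.mean(samples**2)))
```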

Reader

Reader(source: Path | str, stream: Literal['video'] = 'video', container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None, logger: Logger | None = None)
Reader(source: Path | str, stream: Literal['audio'] = 'audio', container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None, logger: Logger | None = None)
Reader(source: Path | str, stream: Literal['audio', 'video'] | tuple[Literal['audio', 'video'], int] = 'video', container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None, logger: Optional[Logger] = None)

Bases: Generic[ReaderFrameType]

Parameters:

  • source (Path | str) –

    Path to a video file. Can be a local path or an HTTP address.

  • stream (Literal['audio', 'video'] | tuple[Literal['audio', 'video'], int], default: 'video' ) –

    The stream to read from, either "audio" or "video". If the video file contains multiple streams of the desired kind, a tuple can be provided to specify which stream to use, e.g. ("audio", 2) to use the audio stream at index 2.

  • container_timestamps (Optional[ContainerTimestamps | list[float]] | None, default: None ) –

    Array containing the timestamps of the video frames in container time (equal to PTS * time_base). If not provided, timestamps will be inferred from the container. Providing pre-loaded values can speed up initialization for long videos by avoiding demuxing of the entire video to obtain PTS.

  • logger (Optional[Logger], default: None ) –

    Python logger to use. Decreases performance.

Attributes:

  • audio (Reader[AudioFrame] | None) –

    Returns a Reader providing access to the audio data of the video only.

  • average_rate (float) –

    Return the average framerate of the video in Hz.

  • by_container_timestamps (Indexer[ReaderFrameType]) –

    Time-based access to video frames using container timestamps.

  • container_timestamps (ContainerTimestamps) –

    Frame timestamps in container time.

  • duration (float) –

    Return the duration of the video in seconds.

  • filename (str) –

    Return the filename of the video

  • gop_size (int) –

    Return the number of frames per keyframe in a video

  • height (int | None) –

    Height of the video in pixels.

  • pts (list[int]) –

    Return all presentation timestamps in video.time_base

  • rate (Fraction | int | None) –

    Return the framerate of the video in Hz.

  • source (Any) –

    Return the source of the video

  • video (Reader[VideoFrame] | None) –

    Returns a Reader providing access to the video data of the video only.

  • width (int | None) –

    Width of the video in pixels.

Source code in src/pupil_labs/video/reader.py
def __init__(
    self,
    source: Path | str,
    stream: Literal["audio", "video"]
    | tuple[Literal["audio", "video"], int] = "video",
    container_timestamps: Optional[ContainerTimestamps | list[float]] | None = None,
    logger: Optional[Logger] = None,
):
    """Create a reader for a video file.

    Args:
        source: Path to a video file. Can be a local path or an HTTP address.
        stream: The stream to read from, either "audio" or "video". If the video file
            contains multiple streams of the desired kind, a tuple can be provided
            to specify which stream to use, e.g. `("audio", 2)` to use the audio
            stream at index `2`.
        container_timestamps: Array containing the timestamps of the video frames in
            container time (equal to PTS * time_base). If not provided, timestamps
            will be inferred from the container. Providing pre-loaded values can
            speed up initialization for long videos by avoiding demuxing of the
            entire video to obtain PTS.
        logger: Python logger to use. Decreases performance.

    """
    self._container_timestamps: ContainerTimestamps | None = None
    if container_timestamps is not None:
        if isinstance(container_timestamps, list):
            container_timestamps = np.array(container_timestamps)
        self.container_timestamps = container_timestamps

    self.lazy_frame_slice_limit = LAZY_FRAME_SLICE_LIMIT
    self._times_were_provided = container_timestamps is not None
    self._source = source
    self._logger = logger or DEFAULT_LOGGER
    self.stats = Stats()

    if not isinstance(stream, tuple):
        stream = (stream, 0)
    self._stream_kind, self._stream_index = stream

    self._log = bool(logger)
    self._is_at_start = True
    self._last_processed_dts = -maxsize
    self._partial_pts = list[int]()
    self._partial_dts = list[int]()
    self._partial_pts_to_index = dict[int, int]()
    self._all_pts_are_loaded = False
    self._decoder_frame_buffer = deque[AVFrame]()
    self._current_decoder_index: int | None = -1
    self._indexed_frames_buffer: deque[ReaderFrameType] = deque(maxlen=1000)
    # TODO(dan): can we avoid it?
    # this forces loading the gopsize on initialization to set the buffer length
    assert self.gop_size

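The `container_timestamps` argument above expects container time, i.e. PTS * time_base, in seconds. A minimal sketch of that conversion; the `time_base` and PTS values here are illustrative, not taken from the library:

```python
from fractions import Fraction

# Hypothetical values: a 30 fps stream with a common 1/90000 time base
# advances its PTS by 3000 ticks per frame.
time_base = Fraction(1, 90000)
pts = [0, 3000, 6000, 9000]

# Container time is PTS * time_base, expressed in seconds.
container_timestamps = [float(p * time_base) for p in pts]
```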
audio cached property

audio: Reader[AudioFrame] | None

Returns a Reader providing access to the audio data of the video only.

average_rate property

average_rate: float

Return the average framerate of the video in Hz.

by_container_timestamps cached property

by_container_timestamps: Indexer[ReaderFrameType]

Time-based access to video frames using container timestamps.

Container time is measured in seconds relative to the beginning of the video. Accordingly, the first frame typically has timestamp 0.0.

When accessing a specific key, e.g. reader[t], a frame with this exact timestamp needs to exist; otherwise an IndexError is raised. When accessing a slice, e.g. reader[a:b], an ArrayLike is returned such that a <= frame.time < b for every frame.

Large slices are returned as a lazy view, which avoids immediately loading all frames into RAM.

Note that numerical imprecisions of float numbers can lead to issues when accessing individual frames by their container timestamp. It is recommended to prefer indexing frames via slices.
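The half-open a <= frame.time < b slice semantics can be sketched with the standard bisect module; the timestamps below are made up for illustration and are not produced by the library:

```python
from bisect import bisect_left

# Illustrative, sorted container timestamps for five frames.
times = [0.0, 0.0333, 0.0667, 0.1, 0.1333]

def frames_in_range(times: list[float], a: float, b: float) -> list[int]:
    """Return indices i with a <= times[i] < b, mirroring the slice semantics."""
    return list(range(bisect_left(times, a), bisect_left(times, b)))
```

Selecting by a range instead of an exact key sidesteps the float-imprecision issue noted above, e.g. `frames_in_range(times, 0.03, 0.11)` picks frames 1 through 3.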

container_timestamps deletable property writable

container_timestamps: ContainerTimestamps

Frame timestamps in container time.

Container time is measured in seconds relative to the beginning of the video. Accordingly, the first frame typically has timestamp 0.0.

If these values were not provided when creating the Reader, they will be inferred from the video container.

duration property

duration: float

Return the duration of the video in seconds.

If the duration is not available in the container, it will be calculated from the frame timestamps.

filename property

filename: str

Return the filename of the video

gop_size cached property

gop_size: int

Return the number of frames per keyframe in a video

height property

height: int | None

Height of the video in pixels.

pts cached property

pts: list[int]

Return all presentation timestamps in video.time_base

rate property

rate: Fraction | int | None

Return the framerate of the video in Hz.

source property

source: Any

Return the source of the video

video cached property

video: Reader[VideoFrame] | None

Returns a Reader providing access to the video data of the video only.

width property

width: int | None

Width of the video in pixels.

VideoFrame dataclass

VideoFrame(av_frame: VideoFrame, time: float, index: int, source: Any)

Bases: BaseFrame

Methods:

  • to_ndarray

    Convert the image of the VideoFrame to a numpy array.

Attributes:

  • av_frame (VideoFrame) –

    the original av.VideoFrame for this frame

  • bgr (NDArray[uint8]) –

    Numpy image array in BGR format

  • gray (NDArray[uint8]) –

    Numpy image array in gray format

  • index (int) –

    index of frame

  • rgb (NDArray[uint8]) –

    Numpy image array in RGB format

  • source (Any) –

    source of this frame, e.g. reader or filename

  • time (float) –

    timestamp of frame

av_frame instance-attribute

av_frame: VideoFrame

the original av.VideoFrame for this frame

bgr property

bgr: NDArray[uint8]

Numpy image array in BGR format

gray property

gray: NDArray[uint8]

Numpy image array in gray format

index instance-attribute

index: int

index of frame

rgb property

rgb: NDArray[uint8]

Numpy image array in RGB format

source instance-attribute

source: Any

source of this frame, e.g. reader or filename

time instance-attribute

time: float

timestamp of frame

to_ndarray

to_ndarray(pixel_format: PixelFormat) -> NDArray[uint8]

Convert the image of the VideoFrame to a numpy array.

Source code in src/pupil_labs/video/frame.py
def to_ndarray(self, pixel_format: PixelFormat) -> npt.NDArray[np.uint8]:
    """Convert the image of the VideoFrame to a numpy array."""
    # TODO: add caching for decoded frames?
    return av_frame_to_ndarray_fast(self.av_frame, pixel_format)
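The bgr, rgb and gray properties expose the same image in different formats; the relation between BGR and RGB is a reversal of the channel axis. A numpy sketch with a made-up 1x2 image, not a decoded frame:

```python
import numpy as np

# A tiny 1x2 BGR image: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis converts BGR to RGB.
rgb = bgr[..., ::-1]
```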

Writer

Writer(path: str | Path, lossless: bool = False, fps: int | None = None, bit_rate: int = 2000000, logger: Logger | None = None)

Parameters:

  • path (str | Path) –

    The path to write the video to.

  • lossless (bool, default: False ) –

    If True, the video will be encoded in lossless H264.

  • fps (int | None, default: None ) –

    The desired framerate of the video.

  • bit_rate (int, default: 2000000 ) –

    The desired bit rate of the video.

  • logger (Logger | None, default: None ) –

    Python logger to use. Decreases performance.

Methods:

  • write_image

    Write an image to the video.

Source code in src/pupil_labs/video/writer.py
def __init__(
    self,
    path: str | Path,
    lossless: bool = False,
    fps: int | None = None,
    bit_rate: int = 2_000_000,
    logger: Logger | None = None,
) -> None:
    """Video writer for creating videos from image arrays.

    Args:
        path: The path to write the video to.
        lossless: If True, the video will be encoded in lossless H264.
        fps: The desired framerate of the video.
        bit_rate: The desired bit rate of the video.
        logger: Python logger to use. Decreases performance.

    """
    self.path = path
    self.lossless = lossless
    self.fps = fps
    self.bit_rate = bit_rate
    self.logger = logger or DEFAULT_LOGGER
    self.container = av.open(self.path, "w")

write_image

write_image(image: NDArray[uint8], time: Optional[float] = None, pix_fmt: Optional[PixelFormat] = None) -> None

Write an image to the video.

Parameters:

  • image (NDArray[uint8]) –

    The image to write. Can have 1 or 3 channels.

  • time (Optional[float], default: None ) –

    The time of the frame in seconds.

  • pix_fmt (Optional[PixelFormat], default: None ) –

    The pixel format of the image. If None, the pixel format will be gray for 1-channel images and bgr24 for 3-channel images.

Source code in src/pupil_labs/video/writer.py
def write_image(
    self,
    image: npt.NDArray[np.uint8],
    time: Optional[float] = None,
    pix_fmt: Optional[PixelFormat] = None,
) -> None:
    """Write an image to the video.

    Args:
        image: The image to write. Can have 1 or 3 channels.
        time: The time of the frame in seconds.
        pix_fmt: The pixel format of the image. If None, the pixel format will be
            `gray` for 1-channel images and `bgr24` for 3-channel images.

    """
    if pix_fmt is None:
        pix_fmt = "bgr24"
        if image.ndim == 2:
            pix_fmt = "gray"

    frame = av.VideoFrame.from_ndarray(image, str(pix_fmt))
    self.write_frame(frame, time=time)
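The pixel-format default shown above can be restated as a small standalone helper; this is a sketch of the same logic, not part of the library API:

```python
from typing import Optional

import numpy as np

def default_pix_fmt(image: np.ndarray, pix_fmt: Optional[str] = None) -> str:
    """Mirror write_image's default: gray for 2-D (1-channel) input, bgr24 otherwise."""
    if pix_fmt is None:
        return "gray" if image.ndim == 2 else "bgr24"
    return pix_fmt
```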