Description
VideoDataset using a GeneratorVideo does not work.
Context
I'm trying to create a video using GeneratorVideo to see if I can free up some memory. I already succeeded with SequenceVideo (which works quite well, by the way), then refactored the node to use a generator that yields frames and a GeneratorVideo.
Steps to Reproduce
# catalog.yml
test_video:
  type: video.VideoDataset
  filepath: data/03_primary/test.mp4
# nodes.py
from collections.abc import Generator

from PIL import Image
from kedro_datasets.video.video_dataset import GeneratorVideo


def make_video() -> GeneratorVideo:
    """Makes a video with three frames: one red, one green and one blue at 1 fps"""

    def frames() -> Generator[Image.Image, None, None]:
        w, h = 256, 256
        red_frame = Image.new("RGB", (w, h), (255, 0, 0))
        green_frame = Image.new("RGB", (w, h), (0, 255, 0))
        blue_frame = Image.new("RGB", (w, h), (0, 0, 255))
        frames = [red_frame, green_frame, blue_frame]
        yield from frames

    return GeneratorVideo(frames(), length=None, fps=1)
# pipeline.py
from kedro.pipeline import Pipeline, pipeline, node

from .nodes import make_video


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([node(make_video, inputs=None, outputs="test_video")])
Expected Result
A colorful video similar to this one (the preview does not work; I hope it does once published):
test.mp4
Actual Result
This error!
kedro.io.core.DatasetError: Failed while saving data to dataset VideoDataset(filepath=<removed>, protocol=file).
'Image' object has no attribute 'fps'
If one changes the node to use a SequenceVideo like so:
from PIL import Image
from kedro_datasets.video.video_dataset import SequenceVideo


def make_video() -> SequenceVideo:
    """Makes a video with three frames
    one red, one green and one blue at 1 fps"""

    def frames() -> list:
        w, h = 256, 256
        red_frame = Image.new("RGB", (w, h), (254, 0, 0))
        green_frame = Image.new("RGB", (w, h), (0, 254, 0))
        blue_frame = Image.new("RGB", (w, h), (0, 0, 254))
        frames = [red_frame, green_frame, blue_frame, blue_frame]
        return frames

    return SequenceVideo(frames(), fps=1)
It works well.
Here is my debugging report:
While running the pipeline, when the program reaches kedro.runner._run_node_sequential:528, the code does
    items = zip(it.cycle(keys), interleave(*streams))
where streams is a list containing my GeneratorVideo, which gets iterated in the chaining. The problem is that the GeneratorVideo is itself an Iterator, so this operation turns it into an iterator of Image.Image frames, and each frame is then passed on its own to catalog.save(name, data). VideoDataset then takes control and fails instantly, because its input is no longer a GeneratorVideo or a SequenceVideo; it is now a single Image.
From here I have no more clue about how this can be fixed, though :_)
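The behaviour described in the debugging report can be sketched without Kedro or PIL installed. Everything below is illustrative: FakeGeneratorVideo is a hypothetical stand-in for GeneratorVideo, interleave is a simplified stand-in for the helper the runner calls, and items_to_save only imitates the quoted zip line. The check that routes iterator outputs into that branch is an assumption, not something confirmed above.

```python
import itertools as it
from collections.abc import Iterator


class FakeGeneratorVideo:
    """Hypothetical stand-in for GeneratorVideo: an iterator of frames with an fps."""

    def __init__(self, frames, fps):
        self._gen = iter(frames)
        self.fps = fps

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)


def interleave(*streams):
    """Simplified stand-in: yield from each stream in turn until the shortest ends."""
    return it.chain.from_iterable(zip(*streams))


def items_to_save(outputs):
    """Imitates the runner line quoted above (the isinstance check is an assumption)."""
    if all(isinstance(d, Iterator) for d in outputs.values()):
        keys = list(outputs.keys())
        streams = list(outputs.values())
        # Each element of the stream becomes its own (name, data) pair to save.
        return list(zip(it.cycle(keys), interleave(*streams)))
    return list(outputs.items())


# Strings stand in for PIL frames to keep the sketch dependency-free.
video = FakeGeneratorVideo(["red", "green", "blue"], fps=1)
saved = items_to_save({"test_video": video})

for name, data in saved:
    # The dataset would receive a bare frame here, which has no fps attribute.
    print(name, data, hasattr(data, "fps"))
```

Under these assumptions, the save step sees three separate frames rather than one video object, which is consistent with the "'Image' object has no attribute 'fps'" error.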
Your Environment
- Kedro version used (pip show kedro or kedro -V): 0.19.9
- Python version used (python -V): Python 3.11.9
- Operating system and version: Linux 6.8.0-48-generic 22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux