PartitionedDataset has inconsistent lazy behavior #4037

Closed
@BielStela

Description

Hi! While experimenting I found this behavior, which is more of a nuance than a bug. I get that adding a check in user code to work around it is not a big deal, but I wanted to note it anyway, so here is the bug report 😃🌈

PartitionedDataset returns callables only when the incoming dataset is read from disk. If the data comes from an in-memory dataset, the partitions are already materialized, so they are neither lazy nor callable.
As a result, nodes have to check whether each partition is callable or not.
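
For illustration, the check that nodes currently need could be sketched as follows. This is only a user-side workaround under stated assumptions, not part of kedro: `materialize_partitions` is a hypothetical helper name, the two dictionaries are plain-Python stand-ins for what a node receives in each case, and the approach assumes the partition data itself is never callable.

```python
from typing import Any, Callable, Dict, Union

# A partition value is either the data itself or a zero-argument loader.
Partition = Union[Any, Callable[[], Any]]


def materialize_partitions(parts: Dict[str, Partition]) -> Dict[str, Any]:
    """Hypothetical helper: return loaded data for lazy and eager partitions alike."""
    return {
        # Call the value only if it is a loader; otherwise use it as-is.
        partition_id: part() if callable(part) else part
        for partition_id, part in parts.items()
    }


# Stand-ins for the two cases described in this issue (not real kedro objects).
from_disk = {"a": lambda: [1, 2], "b": lambda: [3]}  # lazy: values are load functions
from_memory = {"a": [1, 2], "b": [3]}                # eager: values are the data

# Both inputs end up as the same dictionary of loaded data.
assert materialize_partitions(from_disk) == materialize_partitions(from_memory)
```

Calling such a helper at the top of a node hides the difference, but it also loads every partition up front, so the laziness of the on-disk case is lost.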

Context

Sometimes, to speed things up, I remove intermediate disk writes, but this changes the behavior of the PartitionedDataset.

Steps to Reproduce

Load a partitioned dataset from disk, then load it from memory, and compare the partitions the node receives.

Expected Result

Both should behave the same and should not leak implementation details into the node that handles the dataset partitions.

Actual Result

Typically an error like:

object is not callable

or, the other way around, if I'm not expecting a function:

'function' object has no attribute 'x'
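
Both failure modes can be reproduced without kedro using plain-Python stand-ins. The sketch below is an assumption-laden simulation, not the actual pipeline: `lazy_load`, `node_expecting_lazy`, and `node_expecting_data` are hypothetical names, and the dictionaries only mimic the shape of the partitions in each case.

```python
from types import SimpleNamespace


def lazy_load():
    """Stand-in for a partition load function; returns an object with attribute x."""
    return SimpleNamespace(x=1)


# Stand-ins for the two shapes a node may receive (not real kedro objects).
from_disk = {"part_0": lazy_load}                # values are load functions
from_memory = {"part_0": SimpleNamespace(x=1)}   # values are the data itself


def node_expecting_lazy(parts):
    # Written for the on-disk case: calls every partition before using it.
    return [load().x for load in parts.values()]


def node_expecting_data(parts):
    # Written for the in-memory case: uses every partition directly.
    return [part.x for part in parts.values()]


errors = []
# Feed each node the shape it was NOT written for and record the failure.
for node, parts in [(node_expecting_lazy, from_memory),
                    (node_expecting_data, from_disk)]:
    try:
        node(parts)
    except (TypeError, AttributeError) as exc:
        errors.append(f"{type(exc).__name__}: {exc}")
        print(errors[-1])
```

The first mismatch raises a `TypeError` of the "object is not callable" kind, and the second raises the `AttributeError` quoted above.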

Your Environment

  • kedro, version 0.19.6
  • Python 3.10.12
  • Linux 6.5.0-44-generic~22.04.1-Ubuntu
