
How to correctly create uri_file data assets? #70

Open
@Gabriel2409


I am trying to save a data asset as a uri_file, but the dataset is incorrectly saved as a uri_folder when I launch kedro azureml run.

I have the following catalog:

projects_train_raw_local:
  type: pandas.CSVDataSet
  filepath: data/01_raw/dataset.csv

projects_train_raw:
  type: kedro_azureml.datasets.AzureMLAssetDataSet
  azureml_dataset: projects_train_raw
  root_dir: data/00_azurelocals/
  versioned: True
  azureml_type: uri_file
  dataset:
    type: pandas.CSVDataSet
    filepath: "dataset.csv"

and the following pipeline, which just loads the local file and saves it:

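from kedro.pipeline import Pipeline, node


def create_pipeline(**kwargs) -> Pipeline:
    # Single identity node: load the local CSV and save it as the Azure ML data asset.
    return Pipeline(
        nodes=[
            node(
                func=lambda x: x,
                inputs="projects_train_raw_local",
                outputs="projects_train_raw",
                name="create_train_dataasset",
            )
        ]
    )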

I expected a new data asset to be created on Azure as a uri_file. However, I get the following info on Azure:
image

image

It seems my file is not saved correctly. If I am not mistaken, this corresponds to the following part of cli.py, which writes a dummy output.txt inside each output path and so treats that path as a folder:

    # 2. Save dummy outputs
    # In distributed computing, it will only happen on nodes with rank 0
    if not pipeline_data_passing and is_distributed_master_node():
        for data_path in azure_outputs.values():
            (Path(data_path) / "output.txt").write_text("#getindata")
    else:
        logger.info("Skipping saving Azure outputs on non-master distributed nodes")
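To illustrate what I mean, here is a minimal sketch of my own (not the plugin's code; the helper names and the temporary paths are made up for the example). It mimics the write in the snippet above and suggests that the write can only succeed when data_path is a directory, which is why I suspect the output ends up as a folder:

```python
import tempfile
from pathlib import Path


def write_dummy_output(data_path: str) -> Path:
    """Mimic the quoted loop body: write output.txt inside data_path."""
    target = Path(data_path) / "output.txt"
    target.write_text("#getindata")
    return target


def demo() -> dict:
    """Try the write against a folder-like and a file-like output path."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical folder-style output (roughly what a uri_folder looks like).
        folder_output = Path(tmp) / "folder_output"
        folder_output.mkdir()
        results["folder"] = write_dummy_output(str(folder_output)).name

        # Hypothetical file-style output (roughly what a uri_file looks like).
        file_output = Path(tmp) / "dataset.csv"
        file_output.write_text("a,b\n1,2\n")
        try:
            write_dummy_output(str(file_output))
            results["file"] = "written"
        except OSError as exc:
            # A regular file cannot contain output.txt, so the write fails.
            results["file"] = type(exc).__name__
    return results


if __name__ == "__main__":
    print(demo())
```

In this sketch the folder case writes output.txt as expected, while the file case raises an OSError subclass instead of writing anything. I have not checked how the plugin mounts the output path for a uri_file, so this only shows the general behaviour of that write.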

How can I correctly create a uri_file data asset?

Labels: help wanted