Welcome to `kedro_datasets`, the home of Kedro's data connectors. Here you will find `AbstractDataset` implementations, created by QuantumBlack and external contributors, that power Kedro's `DataCatalog`.
`kedro-datasets` is a Python plugin. To install it, run:

`pip install kedro-datasets`
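One way to confirm that the installation worked is to ask Python for the installed distribution's version. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of `kedro-datasets`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "kedro-datasets") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed.

    Hypothetical helper for illustration; it is not part of kedro-datasets.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


print(installed_version())  # a version string, or None if kedro-datasets is missing
```

Knowing the installed version is also useful later on, because the spelling of the optional extras changed in version 3.0.0.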
Datasets are organised into groups, e.g. `pandas`, `spark` and `pickle`. Each group has a collection of datasets, e.g. `pandas.CSVDataset`, `pandas.ParquetDataset` and more. You can install the dependencies for an entire group as follows:

`pip install "kedro-datasets[<group>]"`
This installs Kedro-Datasets together with the dependencies for that dataset group. For example, if your workflow depends on the data types in `pandas`, run `pip install "kedro-datasets[pandas]"` to install Kedro-Datasets and the dependencies for the datasets in the `pandas` group.
To limit installation to the dependencies of a single dataset, run:

`pip install "kedro-datasets[<group>-<dataset>]"`

For example, if your workflow requires `pandas.ExcelDataset`, run `pip install "kedro-datasets[pandas-exceldataset]"` to install its dependencies.
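Both extras patterns above can be produced mechanically from a group name and an optional dataset name. As a minimal sketch, the hypothetical helper `install_command` below (not part of `kedro-datasets`) builds the corresponding `pip install` command, lowercasing names to match the dataset-level naming used from version 3.0.0 onwards.

```python
from typing import Optional


def install_command(group: str, dataset: Optional[str] = None) -> str:
    """Return a pip command that installs kedro-datasets with group- or dataset-level extras.

    Hypothetical helper for illustration; it is not part of kedro-datasets.
    """
    extra = group.lower()
    if dataset is not None:
        # Dataset-level extras follow the "<group>-<dataset>" pattern, in lowercase.
        extra = f"{extra}-{dataset.lower()}"
    return f'pip install "kedro-datasets[{extra}]"'


print(install_command("pandas"))                  # pip install "kedro-datasets[pandas]"
print(install_command("pandas", "ExcelDataset"))  # pip install "kedro-datasets[pandas-exceldataset]"
```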
From `kedro-datasets` version 3.0.0 onwards, the names of the optional dataset-level dependencies are normalised to follow [PEP 685](https://peps.python.org/pep-0685/): the `.` character is replaced with `-` and the names are lowercase. For example, if your requirements file contains `kedro-datasets[pandas.ExcelDataset]`, you must change it to `kedro-datasets[pandas-exceldataset]`.
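Updating an existing requirements file can be scripted. The sketch below assumes PEP 685's rule for extras, which reuses the PEP 503 normalisation (runs of `-`, `_` and `.` collapse to a single `-`, then the name is lowercased). The helpers `normalise_extra` and `migrate_requirement` are hypothetical and only handle the simple `kedro-datasets[...]` form shown in the example above.

```python
import re


def normalise_extra(name: str) -> str:
    """Normalise an extra name: runs of '-', '_' and '.' become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Captures the package name and the comma-separated extras of "kedro-datasets[...]".
_KEDRO_DATASETS_EXTRAS = re.compile(r"(kedro[-_.]datasets)\[([^\]]*)\]", re.IGNORECASE)


def migrate_requirement(line: str) -> str:
    """Rewrite the kedro-datasets extras in one requirements line to normalised names.

    Hypothetical helper for illustration; text outside the brackets is left unchanged.
    """

    def _fix(match: re.Match) -> str:
        extras = [e.strip() for e in match.group(2).split(",") if e.strip()]
        normalised = ",".join(normalise_extra(e) for e in extras)
        return f"{match.group(1)}[{normalised}]"

    return _KEDRO_DATASETS_EXTRAS.sub(_fix, line)


print(migrate_requirement("kedro-datasets[pandas.ExcelDataset, pandas.ParquetDataset]>=3.0"))
# kedro-datasets[pandas-exceldataset,pandas-parquetdataset]>=3.0
```

Lines that do not mention `kedro-datasets` pass through untouched, so the function can be applied to every line of a requirements file.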
We support a range of data connectors, including CSV, Excel, Parquet, Feather, HDF5, JSON, Pickle, SQL tables, SQL queries, Spark DataFrames and more. Images are supported too.
These data connectors are built on the APIs of `pandas`, `spark`, `networkx`, `matplotlib`, `yaml` and more.
The Data Catalog allows you to work with a range of file formats on local file systems, network file systems, cloud object stores, and Hadoop.
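File-based datasets in `kedro-datasets` typically select the file system from the protocol prefix of their `filepath` (via `fsspec`), so the same dataset type can point at a local path, an `s3://` bucket or an `hdfs://` location. As a rough sketch of that convention only, the hypothetical `split_protocol` helper below separates the prefix from the path and treats unprefixed paths as local files; it is not Kedro's own implementation, and the bucket and host names are made up.

```python
from typing import Tuple


def split_protocol(filepath: str) -> Tuple[str, str]:
    """Split a dataset filepath into (protocol, path).

    Hypothetical helper: paths without a "<protocol>://" prefix are treated as local files.
    """
    if "://" in filepath:
        protocol, path = filepath.split("://", 1)
        return protocol.lower(), path
    return "file", filepath


for fp in [
    "data/01_raw/companies.csv",          # local file system
    "s3://my-bucket/raw/companies.csv",   # cloud object store
    "hdfs://namenode/raw/companies.csv",  # Hadoop
]:
    print(split_protocol(fp))
# ('file', 'data/01_raw/companies.csv')
# ('s3', 'my-bucket/raw/companies.csv')
# ('hdfs', 'namenode/raw/companies.csv')
```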
Here is a full list of supported data connectors and APIs.
Take a look at our instructions on how to create your own `AbstractDataset` implementation.
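The overall shape of a custom dataset is a class that knows how to load data, save data and describe itself. The sketch below runs without Kedro: it defines a small stand-in base class, `StandInAbstractDataset`, in place of `kedro.io.AbstractDataset`, plus a hypothetical `JSONDictDataset`. In a real project you would subclass Kedro's `AbstractDataset` instead, and the exact methods to override (for example `load`/`save` versus the older `_load`/`_save`) depend on your Kedro version, so follow the instructions above for the details.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class StandInAbstractDataset(ABC):
    """Minimal stand-in for kedro.io.AbstractDataset so this sketch runs without Kedro.

    This is NOT Kedro's class; in a project you would subclass kedro.io.AbstractDataset.
    """

    @abstractmethod
    def load(self) -> Any: ...

    @abstractmethod
    def save(self, data: Any) -> None: ...

    @abstractmethod
    def _describe(self) -> Dict[str, Any]: ...


class JSONDictDataset(StandInAbstractDataset):
    """Hypothetical dataset that stores a dict as a JSON file on the local disk."""

    def __init__(self, filepath: str) -> None:
        self._filepath = Path(filepath)

    def load(self) -> Dict[str, Any]:
        return json.loads(self._filepath.read_text(encoding="utf-8"))

    def save(self, data: Dict[str, Any]) -> None:
        # Create parent folders so saving to a fresh location works.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_text(json.dumps(data), encoding="utf-8")

    def _describe(self) -> Dict[str, Any]:
        # Kedro uses a description like this when printing or logging a dataset.
        return {"filepath": str(self._filepath)}


# Usage: round-trip a dict through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    dataset = JSONDictDataset(str(Path(tmp) / "params.json"))
    dataset.save({"alpha": 0.1})
    loaded = dataset.load()
    description = dataset._describe()

print(loaded)  # {'alpha': 0.1}
```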
Want to help build Kedro-Datasets? Check out our guide to contributing.
Kedro-Datasets is licensed under the Apache 2.0 License.
- The Kedro-Datasets package follows the NEP 29 Python version support policy.