GigaSpeech on HuggingFace #117

Open
@dophist

The GigaSpeech dataset is now available on the HuggingFace Hub.

Highlights of GigaSpeech on HuggingFace

  • easy to use (a two-liner in python)
  • smoother and faster downloads from the US & EU, with support for on-the-fly downloading during training
  • preprocessed:
    • decompressed
    • short audio files (.wav) are segmented and extracted from the raw long audio
    • supervisions are extracted from the raw metadata.json
  • subsets can be downloaded separately (e.g. XS/S/M/L/XL for training, DEV/TEST for benchmarking)
  • users can even listen to audio samples via HuggingFace's dataset viewer

How-to
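
As a rough illustration of the "two-liner" and the on-the-fly downloading mentioned above, here is a minimal sketch using the `datasets` library. The Hub id `speechcolab/gigaspeech`, the lowercase subset names, and the `train` split name are assumptions not stated in this issue, and `load_gigaspeech` is a hypothetical helper added only for illustration; check the dataset card for the authoritative usage.

```python
# Assumptions (not stated in this issue): the Hub dataset id and the
# lowercase subset names used as configuration names.
DATASET_ID = "speechcolab/gigaspeech"  # assumed Hub repository id
SUBSETS = ("xs", "s", "m", "l", "xl", "dev", "test")  # assumed config names


def load_gigaspeech(subset="xs", streaming=False, **kwargs):
    """Load one GigaSpeech subset from the Hub (hypothetical helper).

    With streaming=True, samples are fetched on the fly instead of
    downloading the whole subset first.
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, subset, streaming=streaming, **kwargs)


if __name__ == "__main__":
    # Requires `pip install datasets` and network access; if the dataset
    # is gated, log in to the Hub first and accept its terms.
    gs = load_gigaspeech("xs", streaming=True)
    # "train" is an assumed split name for the training subsets.
    print(next(iter(gs["train"])))
```

Without the helper, the core of this is just the import plus a single `load_dataset(...)` call; the wrapper only adds a check on the subset name.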

Useful links

Credits

Many thanks to the Dataset Team and Speech Team at HuggingFace, particularly @polinaeterna, @patrickvonplaten, and @sanchit-gandhi. Thanks to their work, GigaSpeech has become more accessible to the entire speech community!
