The sound dataset was gathered from the Free Spoken Digit Dataset (FSDD) repository: https://github.com/Jakobovski/free-spoken-digit-dataset
analysis.py uses MNIST (Modified National Institute of Standards and Technology), a dataset of handwritten digits, for pre-analysis purposes.
YouTube tutorial playlist on music generation: https://youtube.com/playlist?list=PL-wATfeyAMNpEyENTc-tVH5tfLGKtSWPp&si=53DtJN6I_OKJFAr-
step 0 - Understand the vanilla autoencoder, which consists of an encoder and a decoder (a minimal Keras sketch follows this list).
- build an encoder
- build a decoder
- combine and make the autoencoder
- train the autoencoder
- test the autoencoder with the MNIST dataset
- plot the testing results
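A minimal sketch of step 0, assuming a small dense Keras architecture on MNIST; the layer sizes, `latent_dim`, and training hyperparameters are illustrative choices, not the exact values used in the tutorial:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 2  # size of the bottleneck (illustrative)

# encoder: 28x28 image -> latent vector
encoder_input = keras.Input(shape=(28, 28, 1))
x = layers.Flatten()(encoder_input)
x = layers.Dense(256, activation="relu")(x)
latent = layers.Dense(latent_dim, name="bottleneck")(x)
encoder = keras.Model(encoder_input, latent, name="encoder")

# decoder: latent vector -> reconstructed 28x28 image
decoder_input = keras.Input(shape=(latent_dim,))
x = layers.Dense(256, activation="relu")(decoder_input)
x = layers.Dense(28 * 28, activation="sigmoid")(x)
decoder_output = layers.Reshape((28, 28, 1))(x)
decoder = keras.Model(decoder_input, decoder_output, name="decoder")

# combine encoder + decoder and train the autoencoder to reconstruct its input
autoencoder = keras.Model(encoder_input, decoder(encoder(encoder_input)))
autoencoder.compile(optimizer="adam", loss="mse")

(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32")[..., np.newaxis] / 255.0
x_test = x_test.astype("float32")[..., np.newaxis] / 255.0

autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
                validation_data=(x_test, x_test))
```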
step 1 - Implement Variational Autoencoder (VAE)
- modify the encoder component (reparameterise the bottleneck -> z = mu + sigma * epsilon, with epsilon sampled from a standard normal)
- modify the loss function: reconstruction error (RMSE) + closed-form Kullback-Leibler (KL) divergence (see the sketch after this list)
- train the VAE
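A sketch of the two VAE modifications: the reparameterised bottleneck z = mu + sigma * epsilon, and a loss combining the reconstruction error with the closed-form KL divergence. The `reconstruction_weight` value and the standalone-function form are assumptions for illustration, not the tutorial's exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Bottleneck: sample z = mu + sigma * epsilon, with epsilon ~ N(0, I)."""
    def call(self, inputs):
        mu, log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(mu))
        return mu + tf.exp(0.5 * log_var) * epsilon

def vae_loss(x, x_reconstructed, mu, log_var, reconstruction_weight=1000.0):
    # reconstruction term: RMSE per example
    reconstruction = tf.sqrt(
        tf.reduce_mean(tf.square(x - x_reconstructed), axis=[1, 2, 3]))
    # closed-form KL divergence between N(mu, sigma^2) and N(0, I)
    kl = -0.5 * tf.reduce_sum(
        1 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)
    return tf.reduce_mean(reconstruction_weight * reconstruction + kl)
```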
step 2 - Preprocessing Audio Datasets
- use the Free Spoken Digit Dataset (FSDD), a dataset of recorded spoken digits (a sketch of the preprocessing components follows this list)
- implement Loader and Padder for file processing
- implement LogSpectrogramExtractor to preprocess audio files as spectrograms
- implement MinMaxNormaliser
- implement the Preprocessing Pipeline
- implement Saver
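A rough sketch of the step 2 components using librosa; the parameter values (sample rate, duration, frame_size, hop_length) and class interfaces are assumptions for illustration:

```python
import os
import librosa
import numpy as np

class Loader:
    """Load an audio file as a mono signal at a fixed sample rate and duration."""
    def __init__(self, sample_rate=22050, duration=0.74, mono=True):
        self.sample_rate = sample_rate
        self.duration = duration
        self.mono = mono

    def load(self, file_path):
        signal, _ = librosa.load(file_path, sr=self.sample_rate,
                                 duration=self.duration, mono=self.mono)
        return signal

class Padder:
    """Right-pad a signal with zeros so every example has the same length."""
    def right_pad(self, signal, num_missing_samples):
        return np.pad(signal, (0, num_missing_samples), mode="constant")

class LogSpectrogramExtractor:
    """STFT -> magnitude -> log (dB) spectrogram."""
    def __init__(self, frame_size=512, hop_length=256):
        self.frame_size = frame_size
        self.hop_length = hop_length

    def extract(self, signal):
        stft = librosa.stft(signal, n_fft=self.frame_size,
                            hop_length=self.hop_length)[:-1]  # drop last bin for an even shape
        return librosa.amplitude_to_db(np.abs(stft))

class MinMaxNormaliser:
    """Scale spectrogram values to [0, 1]; keep the original min/max per file
    so the normalisation can be inverted at sound-generation time."""
    def normalise(self, array):
        return (array - array.min()) / (array.max() - array.min())

class Saver:
    """Save a preprocessed spectrogram as a .npy file."""
    def __init__(self, output_dir):
        self.output_dir = output_dir

    def save(self, spectrogram, file_name):
        np.save(os.path.join(self.output_dir, file_name + ".npy"), spectrogram)

def preprocess(file_path, loader, padder, extractor, normaliser, saver):
    """Pipeline: load -> pad -> log spectrogram -> normalise -> save."""
    signal = loader.load(file_path)
    expected = int(loader.sample_rate * loader.duration)
    if len(signal) < expected:
        signal = padder.right_pad(signal, expected - len(signal))
    spectrogram = extractor.extract(signal)
    norm_spectrogram = normaliser.normalise(spectrogram)
    saver.save(norm_spectrogram, os.path.splitext(os.path.basename(file_path))[0])
```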
step 3 - Training a VAE with speech data in Keras
- load the Free Spoken Digit Dataset (FSDD) (see the sketch after this list)
- reshape the data
- train the VAE
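A sketch of step 3, assuming the preprocessed spectrograms were saved as .npy files; the `spectrograms_path` and training hyperparameters are illustrative, and `vae` is assumed to be a Keras model from step 1 whose loss (including the KL term) is attached inside the model:

```python
import os
import numpy as np

def load_fsdd(spectrograms_path):
    spectrograms = []
    for root, _, file_names in os.walk(spectrograms_path):
        for file_name in sorted(file_names):
            if file_name.endswith(".npy"):
                spectrograms.append(np.load(os.path.join(root, file_name)))
    x_train = np.array(spectrograms)
    # reshape from (num_examples, n_bins, n_frames) to
    # (num_examples, n_bins, n_frames, 1) so Keras treats each spectrogram
    # as a single-channel image
    return x_train[..., np.newaxis]

x_train = load_fsdd("dataset/spectrograms")  # hypothetical path
vae.fit(x_train, x_train, batch_size=32, epochs=150, shuffle=True)
```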
step 4 - Sound Generation with VAE
- build a SoundGenerator class (sketched after this list)
- Implement a generate.py script
- generate Sound from Spectrograms
- the parameters and weights of the model trained in step 3 are saved in the model folder
- use the FSDD as the data for sound generation
- download and use the step 4 folder, as this is the final step and contains the final version of the files for running the model
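A sketch of the SoundGenerator idea: decode spectrograms with the trained VAE, undo the min-max normalisation back to the original dB range, and invert the magnitude spectrogram to a waveform with Griffin-Lim. The `min_max_values` interface (per-file original min/max stored during preprocessing) and the `hop_length` value mirror the assumptions in the step 2 sketch:

```python
import librosa
import numpy as np

class SoundGenerator:
    def __init__(self, vae, hop_length=256):
        self.vae = vae                # trained Keras VAE from step 3
        self.hop_length = hop_length  # must match the STFT hop used in preprocessing

    def generate(self, spectrograms, min_max_values):
        # reconstruct normalised log spectrograms with the trained VAE
        generated_spectrograms = self.vae.predict(spectrograms)
        return self.convert_spectrograms_to_audio(generated_spectrograms,
                                                  min_max_values)

    def convert_spectrograms_to_audio(self, spectrograms, min_max_values):
        signals = []
        for spectrogram, min_max in zip(spectrograms, min_max_values):
            log_spectrogram = spectrogram[:, :, 0]  # drop the channel axis
            # undo min-max normalisation back to the original dB range
            denorm = (log_spectrogram * (min_max["max"] - min_max["min"])
                      + min_max["min"])
            amplitudes = librosa.db_to_amplitude(denorm)  # dB -> linear magnitude
            # Griffin-Lim estimates the phase and inverts the magnitude STFT
            signal = librosa.griffinlim(amplitudes, hop_length=self.hop_length)
            signals.append(signal)
        return signals
```

A generate.py script would then load the saved model and min/max values, call `SoundGenerator.generate`, and write the resulting signals to .wav files (for example with soundfile.write).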