An implementation of a variational autoencoder (VAE) that runs on both SMILES and SELFIES. It includes code that compares the two representations using reconstruction quality, diversity, and latent space validity as metrics.
Dependencies are pytorch, rdkit, and pyyaml, which can be installed using Conda.
- chemistry_vae.py: the main file; contains the model definitions, the data processing, and the training.
- settings.yml: a file containing the hyperparameters of the model and the training. Also configures the VAE to run on either SMILES or SELFIES.
- data_loader.py: contains helper methods that convert SMILES and SELFIES into integer-encoded or one-hot encoded vectors.
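To illustrate what integer encoding and one-hot encoding of molecular strings can look like, here is a minimal, dependency-free sketch. It is not the code in data_loader.py: the function names (`tokenize_smiles`, `tokenize_selfies`, `build_alphabet`, `integer_encode`, `one_hot_encode`) are hypothetical, the SMILES tokenizer is a simplified character-level split, and padding with a dedicated symbol is an assumption made for the example.

```python
import re


def tokenize_smiles(smiles):
    """Simplified character-level split (assumption: multi-character
    atoms such as 'Cl' are split into separate characters)."""
    return list(smiles)


def tokenize_selfies(selfies):
    """Split a SELFIES string into its bracketed symbols,
    e.g. '[C][=O]' becomes ['[C]', '[=O]']."""
    return re.findall(r"\[[^\]]*\]", selfies)


def build_alphabet(token_lists, pad_token):
    """Collect the sorted set of symbols seen in the data and
    append the padding symbol as the last entry."""
    symbols = sorted({tok for toks in token_lists for tok in toks})
    return symbols + [pad_token]


def integer_encode(tokens, alphabet, max_len, pad_token):
    """Map each token to its index in the alphabet, padding to max_len."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    padded = tokens + [pad_token] * (max_len - len(tokens))
    return [index[tok] for tok in padded]


def one_hot_encode(int_encoded, alphabet_size):
    """Turn a list of indices into a list of one-hot rows."""
    return [
        [1 if col == idx else 0 for col in range(alphabet_size)]
        for idx in int_encoded
    ]


# Example usage on two small SELFIES strings (illustrative data only).
selfies_strings = ["[C][C][O]", "[C][=O]"]
token_lists = [tokenize_selfies(s) for s in selfies_strings]
pad = "[nop]"  # padding symbol chosen for this sketch
alphabet = build_alphabet(token_lists, pad)
max_len = max(len(toks) for toks in token_lists)

for toks in token_lists:
    ints = integer_encode(toks, alphabet, max_len, pad)
    one_hot = one_hot_encode(ints, len(alphabet))
    print(ints, one_hot)
```

Padding every molecule to a common length gives each one-hot matrix the same shape (`max_len` rows by alphabet-size columns), which is what allows a batch of molecules to be stacked into a single tensor for a model.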
- Python 3.7
- CPU and GPU supported
For comments, bug reports or feature ideas, please send an email to [email protected] and [email protected]