This package is a Python wrapper for the DPMMSubClustersStreaming.jl Julia package.
It accompanies our paper, Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data (AISTATS 2022).
- Install Julia from: https://julialang.org/downloads/platform
- Add our DPMMSubClustersStreaming package from within a Julia terminal via the Julia package manager:
] add DPMMSubClustersStreaming
- Install our dpmmpythonStreaming package in Python: pip install dpmmpythonStreaming
- Add Environment Variables:
- On Linux, add the path to the Julia executable to the "PATH" environment variable (e.g., in .bashrc add: export PATH=$PATH:$HOME/julia/julia-1.6.0/bin).
- On Windows, add the path to the Julia executable to the "PATH" environment variable (e.g., C:\Users\<USER>\AppData\Local\Programs\Julia\Julia-1.6.0\bin).
- Install PyJulia from within a Python terminal:
import julia; julia.install()
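Since the steps above depend on the Julia executable being reachable through "PATH", a quick check from Python can help before going further. The following is a minimal sketch using only the standard library; the helper names find_julia and julia_version are hypothetical and are not part of dpmmpythonStreaming or PyJulia.

```python
import shutil
import subprocess


def find_julia(executable="julia"):
    """Return the full path of the Julia executable found on PATH, or None."""
    return shutil.which(executable)


def julia_version(path):
    """Run `<path> --version` and return whatever Julia prints, stripped."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_julia()
    if path is None:
        # Nothing named "julia" is on PATH: revisit the environment-variable step.
        print("julia was not found on PATH")
    else:
        print(f"found {path}: {julia_version(path)}")
```

If the script reports that Julia was not found, opening a new terminal after editing the environment variables is usually worth trying first, since already-open shells keep the old "PATH".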
Usage example:
# Start the Julia runtime before importing the wrapper.
from julia.api import Julia
jl = Julia(compiled_modules=False)
from dpmmpythonStreaming.dpmmwrapper import DPMMPython
from dpmmpythonStreaming.priors import niw
import numpy as np

# Generate synthetic Gaussian data and split it column-wise into two batches.
data, gt = DPMMPython.generate_gaussian_data(10000, 2, 10, 100.0)
batch1 = data[:, 0:5000]
batch2 = data[:, 5000:]

# Fit an initial model on the first batch.
prior = DPMMPython.create_prior(2, 0, 1, 1, 1)
model = DPMMPython.fit_init(batch1, 100.0, prior=prior, verbose=True, burnout=5, gt=None, epsilon=0.0000001)
labels = DPMMPython.get_labels(model)

# Update the model with the second batch and read the new labels.
model = DPMMPython.fit_partial(model, 1, 2, batch2)
labels = DPMMPython.get_labels(model)
print(labels)
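The example above splits the data into two batches by hand. For more than two batches, the column ranges can be generated instead. This is a small sketch under that assumption; batch_bounds is a hypothetical helper, not a function of the package, and it only computes index ranges without calling Julia.

```python
def batch_bounds(n_samples, batch_size):
    """Yield (start, stop) column ranges that cover n_samples in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_samples, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_samples)


bounds = list(batch_bounds(10000, 4000))
print(bounds)  # [(0, 4000), (4000, 8000), (8000, 10000)]
```

Each range can slice the data as data[:, start:stop]. Following the example above, the first slice would go to DPMMPython.fit_init and the later ones to DPMMPython.fit_partial.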
For any questions: [email protected]
Contributions, feature requests, suggestions, etc. are welcome.
If you use this code for your work, please cite the following:
@inproceedings{dinari2022streaming,
title={Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data},
author={Dinari, Or and Freifeld, Oren},
booktitle={International Conference on Artificial Intelligence and Statistics},
year={2022}
}