In this project, we implement some baseline active learning strategies in R, experiment on the well-known Iris dataset, highlight some insights, and suggest future directions for active learning in the statistics domain.
Active_Learning_in_R
.
├── Active_Learning_in_R_Rcode.Rmd (Source code)
├── iris.data (data)
├── Active_Learning_in_R_Rcode_Simulation.pdf (Source code simulation using RMarkdown)
├── Active_Learning_in_R.pdf (A seemingly formal report)
├── Active_Learning_in_R_PPTslides.pptx (A short talk and user guide)
├── Active_Learning_in_R_Presentation.mp4 (A short talk and user guide)
└── README.md
I also published this project on Medium, if you are interested in taking a look.
We implement two basic strategies: "Uncertainty Sampling" and "Random Sampling".
Uncertainty Sampling: select the point about which the current model is least confident. One criterion is to pick the point nearest to the current decision boundary.
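A minimal sketch of the nearest-to-boundary criterion, assuming the model is a binomial `glm` fit (the function name `query_uncertain` and its arguments are hypothetical, not taken from the project's source). For logistic regression the decision boundary sits at a predicted probability of 0.5, so the least confident point is the one whose probability is closest to 0.5.

```r
# Hypothetical helper, not the project's actual code.
# model: a fitted binomial glm (assumption)
# pool:  a data frame of unlabeled rows with the same predictors as the model
query_uncertain <- function(model, pool) {
  # Predicted probability of the positive class for every pool point
  p <- predict(model, newdata = pool, type = "response")
  # p = 0.5 is the decision boundary, so the smallest |p - 0.5|
  # marks the least confident prediction
  unname(which.min(abs(p - 0.5)))
}
```

The function returns a row position within `pool`, which the caller can then move from the unlabeled pool to the labeled set.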
Random Sampling: query a point at random from the unlabeled pool.
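As a rough illustration, random querying can be written in a few lines of base R. The helper name `query_random` and the representation of the pool as a vector of row indices are assumptions made for this sketch.

```r
# Hypothetical helper, not the project's actual code.
# pool_idx: vector of row indices that are still unlabeled (assumption)
query_random <- function(pool_idx) {
  # Draw a position with sample.int so that a pool of length one is handled
  # correctly (sample(x, 1) would treat a single number x as 1:x)
  pool_idx[sample.int(length(pool_idx), 1)]
}
```

Random sampling serves as the baseline against which an informed strategy such as uncertainty sampling can be compared.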
The model we currently use is logistic regression, which is a classifier for binary classification problems.
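The sketch below shows how a pool-based loop with logistic regression might look in base R. Several choices here are assumptions rather than the project's actual setup: the use of `glm`, the reduction of Iris to two classes (versicolor vs. virginica), the two sepal predictors, the initial labeled set, and the query budget of 20.

```r
# Sketch only. The binary subset is an assumption: the source does not say
# how the three species are reduced to two classes.
data(iris)
df <- subset(iris, Species != "setosa")
df$y <- as.integer(df$Species == "virginica")   # 1 = virginica, 0 = versicolor

# Hypothetical initial labeled set: first five rows of each class
labeled <- c(1:5, 51:55)
pool    <- setdiff(seq_len(nrow(df)), labeled)

for (i in 1:20) {                               # hypothetical query budget
  # Refit logistic regression on the currently labeled rows
  fit <- glm(y ~ Sepal.Length + Sepal.Width,
             data = df[labeled, ], family = binomial)
  # Uncertainty sampling: pick the pool point closest to p = 0.5
  p    <- predict(fit, newdata = df[pool, ], type = "response")
  pick <- pool[which.min(abs(p - 0.5))]
  # Move the queried point from the pool to the labeled set
  labeled <- c(labeled, pick)
  pool    <- setdiff(pool, pick)
}
```

Replacing the `pick` line with a random draw from `pool` gives the random sampling baseline under the same loop.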