Female patients at least 21 years old. 768 patient observation rows, 10 columns -> 9 feature columns + 1 diabetes column (True/False)
the closer the data is to what you are predicting, the better
the data needs to be formatted the way the algorithm needs it
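As a rough illustration of what "formatted the way we need it" can mean, the sketch below turns a True/False diabetes column into 1/0 and casts the feature values to numbers. The column names (`num_preg`, `glucose`, `age`, `diabetes`) and the three sample rows are made up for the example; they are not taken from the real file.

```python
import csv
import io

# Hypothetical sample in the shape the notes describe: feature columns
# plus a True/False diabetes column. Names and values are invented.
RAW = """num_preg,glucose,age,diabetes
1,85,31,False
8,183,32,True
0,137,33,True
"""

def load_rows(text):
    """Parse CSV text into (features, label) pairs with numeric types."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Map the True/False text to 1/0 so an algorithm can use it.
        label = 1 if record.pop("diabetes") == "True" else 0
        features = {name: float(value) for name, value in record.items()}
        rows.append((features, label))
    return rows

rows = load_rows(RAW)
print(len(rows), rows[0])
```

Whether the label should be 1/0 or stay boolean depends on the library used later; the point is only that the conversion happens once, before training.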
- learning type
- result
- complexity
- basic vs enhanced
"Use the ML Workflow to process and transform Pima Indian data to create a prediction model. This model must predict which people are likely to develop diabetes with 70% or greater accuracy"
Prediction Model => Supervised ML (binary output)
we will stick to basic algos.
- Naive Bayes (we use this)
- Logistic Regression
- Decision Tree
split the data: 70% training + 30% testing
train the model on the training data with the chosen algo
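A minimal sketch of the 70/30 split followed by training, using only the Python standard library. It assumes a Gaussian variant of Naive Bayes and uses synthetic stand-in data (768 rows, two made-up features), not the real Pima data; the function names `train_test_split`, `fit_gaussian_nb` and `predict` are hypothetical helpers written for this example.

```python
import math
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_gaussian_nb(train):
    """Estimate the prior, feature means and variances for each class."""
    model = {}
    for label in {y for _, y in train}:
        xs = [x for x, y in train if y == label]
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        # Small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*xs), means)
        ]
        model[label] = (n / len(train), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Synthetic stand-in data: 768 rows, 2 invented features, binary label.
rng = random.Random(0)
rows = []
for _ in range(768):
    label = rng.random() < 0.35
    center = 140.0 if label else 110.0
    rows.append(([rng.gauss(center, 15.0), rng.gauss(33.0, 10.0)], label))

train, test = train_test_split(rows)
model = fit_gaussian_nb(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(len(train), len(test), round(accuracy, 3))
```

The test rows are held out until the end, so the accuracy printed is measured on data the model never saw during training. In practice a library such as scikit-learn would likely replace the hand-written helpers.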
after training the model and making predictions for the test data, we can improve the prediction of true positives:
- adjust current algo
- get more data or improve the data
- switch algorithm and check which algo suits best
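To compare the options above, it helps to count true positives explicitly rather than looking at accuracy alone. The sketch below assumes "prediction of true positives" is tracked through recall (true positives divided by all actual positives); the label lists are invented for illustration, and `confusion_counts` and `summarize` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def summarize(y_true, y_pred):
    """Return accuracy, recall and precision as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Recall: share of actual positives the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Precision: share of predicted positives that were correct.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Invented labels: True means the patient has diabetes.
y_true = [True, True, True, False, False, False, False, True]
y_pred = [True, False, True, False, True, False, False, False]

counts = confusion_counts(y_true, y_pred)
report = summarize(y_true, y_pred)
print(counts, report)
```

Running the same summary after each change (tuned algorithm, more data, a different algorithm) gives a like-for-like comparison, provided the test set stays the same.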