Chatbot using Support Vector Machine and Cosine Similarity
This project requires the following packages:
Django == 2.1.2
django-cors-headers
requests == 2.9.1
numpy == 1.15.4
pandas == 0.23.4
scikit-learn == 0.20.1
scipy == 1.1.0
The dataset was collected from the Informatika UNPAD and UNPAD websites and saved as QA.csv.
To use your own dataset, place the file in the folder code/data/csv
and change this line of code:
dataset = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data", "csv", "changetoyourfilename.csv")
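As a rough illustration of the step above, the sketch below builds the same data/csv path from a configurable filename and loads the file with pandas. The helper names dataset_path and load_dataset, the demo file, and the column names question, answer, and label are all hypothetical; the actual layout of QA.csv is not described here.

```python
import os
import tempfile

import pandas as pd


def dataset_path(base_dir, filename="QA.csv"):
    """Build <base_dir>/data/csv/<filename>, mirroring the path used above."""
    return os.path.join(base_dir, "data", "csv", filename)


def load_dataset(path):
    """Read the CSV and drop rows that contain missing values."""
    return pd.read_csv(path).dropna().reset_index(drop=True)


# Demo with a throwaway directory and made-up rows (not the real QA.csv).
base_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(base_dir, "data", "csv"))
demo_path = dataset_path(base_dir, "demo.csv")
pd.DataFrame(
    {
        "question": ["Example question one?", "Example question two?"],
        "answer": ["Example answer one.", "Example answer two."],
        "label": ["topic_a", "topic_b"],
    }
).to_csv(demo_path, index=False)

df = load_dataset(demo_path)
print(df.head())
```

In the project itself, base_dir would presumably be os.path.dirname(os.path.abspath(__file__)), as in the line above.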
Because the chatbot is web-based, it uses UI code from Fabio Ottaviani, with some changes to fit the program.
The Support Vector Machine is implemented with sklearn.
Cosine Similarity is computed with sklearn.
The flow of this program is:
- Observe the source websites to collect the dataset
- Preprocess the dataset:
  - Tokenization
  - Stop word removal
  - Stemming
- Run a grid search to find the optimal value of the SVM parameter C
- Train the SVM using stratified 10-fold cross-validation
- Get the user's message and preprocess it
- Predict the label of the user's message
- Calculate the Cosine Similarity score between the user's message and each dataset entry that has the same label as the predicted label
- Select the entry with the highest score
- Reply with the entry selected by the Cosine Similarity process
- If the highest score is below 0.5, reply with a default message instead
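The flow above can be sketched with sklearn roughly as follows. This is a minimal illustration, not the project's actual code. It makes several assumptions: the toy English questions are made up, TF-IDF features and a linear kernel are used, the grid search and the stratified 10-fold step are combined in one call, and preprocessing is reduced to tokenization plus a tiny stop word list, with no stemming. The names preprocess, reply, and DEFAULT_REPLY are hypothetical.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

DEFAULT_REPLY = "Sorry, I do not understand the question."
STOP_WORDS = {"the", "is", "a", "i", "me"}  # tiny illustrative list


def preprocess(text):
    """Tokenize and remove stop words (stemming is omitted in this sketch)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


# Made-up toy data standing in for QA.csv: 10 questions per label,
# so that stratified 10-fold splitting is possible.
topics = {
    "fee": ["tuition fee", "payment deadline"],
    "library": ["library hours", "borrowing books"],
}
prefixes = ["what is the", "tell me the", "info about the",
            "i need the", "please explain the"]
questions, answers, labels = [], [], []
for label, phrases in topics.items():
    for phrase in phrases:
        for prefix in prefixes:
            questions.append(prefix + " " + phrase)
            answers.append("Answer about " + phrase)
            labels.append(label)

# TF-IDF features are an assumption of this sketch.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([preprocess(q) for q in questions])

# Grid search over C, scored with stratified 10-fold cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=cv)
grid.fit(X, labels)
model = grid.best_estimator_  # refit on all data with the best C


def reply(message, threshold=0.5):
    """Predict the label, then pick the most similar entry with that label."""
    vec = vectorizer.transform([preprocess(message)])
    label = model.predict(vec)[0]
    idx = [i for i, entry_label in enumerate(labels) if entry_label == label]
    scores = cosine_similarity(vec, X[idx])[0]
    best = scores.argmax()
    if scores[best] < threshold:
        return DEFAULT_REPLY
    return answers[idx[best]]


print(grid.best_params_)
print(reply("what is the tuition fee"))
print(reply("zzz qqq"))
```

Restricting the similarity search to entries that share the predicted label keeps the comparison small, and the 0.5 threshold guards against replying with a poor match when the message resembles nothing in the dataset.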
- Nurul Ilma Asfiya N - Initial work - nurulilmaan
- Indonesian Preprocessing - PySastrawi
- Banking FAQ Bot - MrJay10
- FAQ Chatbot - donowhy
- Direct Messaging UI - Fabio Ottaviani