This is our repository for the final submission of the Aries Project. The project applies neural networks and a few other architectures, and is built around the following three models:
The first model is based on a CNN architecture and determines whether the person in front of the camera is confident enough to make eye contact with it. It annotates the iris to locate the eyes on screen. The data was collected with a general image-capturing algorithm that uses your laptop's camera, and the model relies on the following libraries: tensorflow, keras, cv2, numpy, uuid, and matplotlib.pyplot. Since the complete dataset was generated by us, we can trust the accuracy of the model. Initially, only 30 images were manually labelled; the rest were produced by augmentation and labelled automatically by the algorithm, growing the dataset from 30 to 5042 images. The model uses a ResNet (Residual Network) architecture to reduce the chance of vanishing/exploding gradients, which arise when a neural network has a very large number of layers.
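A minimal sketch of the webcam capture step, assuming OpenCV's default camera device and a `data/images` output folder (both assumptions, not confirmed by the repository); uuid gives each frame a collision-free file name:

```python
import os
import uuid
import cv2

IMAGES_PATH = os.path.join("data", "images")  # assumed output folder
os.makedirs(IMAGES_PATH, exist_ok=True)

cap = cv2.VideoCapture(0)  # default laptop webcam
for _ in range(30):  # the 30 manually labelled seed images
    ret, frame = cap.read()
    if not ret:
        break
    # uuid1() guarantees a unique file name for every capture
    cv2.imwrite(os.path.join(IMAGES_PATH, f"{uuid.uuid1()}.jpg"), frame)
    cv2.imshow("capture", frame)
    if cv2.waitKey(500) & 0xFF == ord("q"):  # half-second gap between shots
        break
cap.release()
cv2.destroyAllWindows()
```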
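To illustrate why ResNet helps with vanishing/exploding gradients, here is a generic Keras residual block (an illustrative sketch, not the repository's exact model): the skip connection gives gradients a path that bypasses the convolutions entirely.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Identity-style residual block; assumes x already has `filters` channels."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])  # skip connection around the conv stack
    return layers.ReLU()(y)
```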
The second model uses libraries like wave and PyAudio to record, play, and handle files in .wav format. It then uses Google's Speech-to-Text model to convert the wave files to text. The final part of this model uses the enchant library, which accesses an American English dictionary to verify that the spelled words are correct.
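A minimal sketch of this record, transcribe, spell-check pipeline. The transcription step here goes through the speech_recognition wrapper around Google's service, which is an assumption; the repository only says Google's Speech-to-Text model is used. File name, sample rate, and recording length are illustrative.

```python
import wave
import pyaudio
import speech_recognition as sr  # assumed wrapper for Google's Speech-to-Text
import enchant

RATE, CHUNK, SECONDS = 16000, 1024, 5  # illustrative recording parameters

# 1. Record SECONDS of microphone audio to a .wav file with PyAudio + wave.
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream(); stream.close(); p.terminate()

with wave.open("speech.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# 2. Transcribe the wave file (assumption: via speech_recognition's Google backend).
r = sr.Recognizer()
with sr.AudioFile("speech.wav") as source:
    transcript = r.recognize_google(r.record(source))

# 3. Check each transcribed word against the American English dictionary.
d = enchant.Dict("en_US")
misspelled = [w for w in transcript.split() if not d.check(w)]
print(f"{len(misspelled)} misspelled word(s): {misspelled}")
```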
An overall speaker-efficiency score is calculated from the duration of speech, the duration of the speaker's eye contact, and the number of misspelled words.
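The repository does not spell out the scoring formula; as a minimal sketch, assuming a simple weighted combination of the three measurements, it could look like this (the weights and the function itself are hypothetical):

```python
def speaker_efficiency(speech_sec, eye_contact_sec, misspelled, total_words):
    # Hypothetical 0.6/0.4 weights: the exact formula is not documented,
    # so this only illustrates combining the three metrics into one score.
    eye_ratio = eye_contact_sec / speech_sec if speech_sec else 0.0
    spelling_acc = 1.0 - misspelled / total_words if total_words else 0.0
    return 100.0 * (0.6 * eye_ratio + 0.4 * spelling_acc)

# e.g. 2 min of speech, 90 s of eye contact, 3 misspellings out of 200 words
print(speaker_efficiency(120, 90, 3, 200))  # -> 84.4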