This portfolio is a compilation of all the Data Science and Machine Learning projects I have completed for academic, self-learning and hobby purposes.
- Email: [email protected]
NLP - Quora Question Pair Similarity using NBC, LSTM, and BERT
In this project I used Machine Learning models ranging from basic to complex, some pre-trained and some trained from scratch. I analyzed each model's performance against the computational resources and time it required, to compare the models' efficiency.
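The NBC, LSTM and BERT models are not reproduced in this README. The sketch below illustrates the kind of efficiency measurement described above. It times a simple word-overlap (Jaccard) duplicate detector. That detector is a hypothetical stand-in for illustration, not one of the project's models. The 0.5 threshold and all function names are assumptions.

```python
import re
import time


def tokens(text):
    """Lowercase a question and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(q1, q2):
    """Share of distinct tokens the two questions have in common (0.0 to 1.0)."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_duplicates(pairs, threshold=0.5):
    """Hypothetical baseline: label a pair duplicate (1) when overlap >= threshold."""
    return [int(jaccard(q1, q2) >= threshold) for q1, q2 in pairs]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds) for efficiency comparisons."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


pairs = [
    ("How do I learn Python?", "How can I learn Python?"),
    ("What is AI?", "Where is Paris?"),
]
preds, seconds = timed(predict_duplicates, pairs)
print(preds, f"{seconds:.6f}s")
```

The same `timed` wrapper can be placed around any model's training or inference call. That keeps time measurements comparable across models of very different cost.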
NLP - TripAdvisor Topics and Reviews Sentiment Analysis using Latent Dirichlet Allocation
The objective of this project is to analyze the distribution of topics and the sentiments expressed in TripAdvisor reviews. LDA shows how strongly each keyword is associated with each topic, and the pyLDAvis visualization offers insight into the sentiment around specific keywords.
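The README does not state which LDA implementation the project uses. To make the idea concrete, here is a minimal collapsed Gibbs sampler for LDA in plain Python. The toy tokenized reviews, the `alpha`/`beta` hyperparameters and the function names are illustrative assumptions. In practice, reviews would be cleaned and stop-word filtered, then passed to an LDA library whose output pyLDAvis can visualize.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-topic word distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # topic counts per document
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                              # total words per topic
    assignments = []

    # Start from a random topic for every word token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the full conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic-word probabilities; each topic sums to 1 over the vocabulary.
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta) for w in vocab}
        for k in topics
    ]


def top_words(phi, n=3):
    """Most probable words per topic, similar to what pyLDAvis lists per topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in phi]


reviews = [
    ["pool", "clean", "room"],
    ["breakfast", "buffet", "coffee"],
    ["room", "clean", "bed"],
    ["coffee", "breakfast", "eggs"],
]
phi = lda_gibbs(reviews, num_topics=2, iterations=100)
print(top_words(phi))
```

The sampler is stochastic, so topic numbering and exact top words depend on the seed and iteration count.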
Interpret American Sign Language using Deep Learning with Parallel Computing
In this project we trained a couple of Deep Learning models to recognize American Sign Language and measured how efficiently they train. The experiments recorded training time across different numbers of GPUs with respect to the number of training epochs, the data distribution and the type of parallelization used.
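Two calculations underlie an experiment like this. The first is splitting the data across workers for data parallelism; the strided split below is similar in spirit to PyTorch's `DistributedSampler`. The second is turning measured training times into speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. This is a sketch: the function names are hypothetical, and the timings in the usage example are made-up placeholders, not results from this project.

```python
def shard(indices, num_workers, rank):
    """Strided slice of sample indices assigned to one worker (data parallelism)."""
    return indices[rank::num_workers]


def scaling_report(times_by_gpus):
    """Map GPU count p -> (speedup T(1)/T(p), efficiency speedup/p).

    `times_by_gpus` must contain the single-GPU baseline under key 1.
    """
    baseline = times_by_gpus[1]
    report = {}
    for p, t in sorted(times_by_gpus.items()):
        speedup = baseline / t
        report[p] = (speedup, speedup / p)
    return report


# Hypothetical wall-clock seconds per epoch, for illustration only.
report = scaling_report({1: 120.0, 2: 60.0, 4: 40.0})
for gpus, (speedup, efficiency) in report.items():
    print(f"{gpus} GPU(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Efficiency below 100% at higher GPU counts typically reflects communication and synchronization overhead between workers.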
Store Sales Data Analysis for Corporacion Favorita
This repository contains a robust serverless architecture for data analysis that can scale to a large number of users by running AWS Lambda functions concurrently. It is built mainly on Amazon S3, Amazon ECR and AWS Lambda. A frontend web application built with Streamlit provides quick access to the analysis.
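Below is a minimal sketch of one Lambda function in such an architecture: it reads a CSV from S3 and returns per-store sales totals. Several details are assumptions rather than the repository's code: the event shape `{"bucket": ..., "key": ...}`, the column names `store_nbr` and `sales` (taken from the public Favorita store-sales data), and the function names. The S3 client can be injected so the logic can be tested without AWS credentials.

```python
import csv
import io
import json
from collections import defaultdict


def aggregate_sales(csv_text):
    """Sum the 'sales' column per 'store_nbr' (column names are assumed)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["store_nbr"]] += float(row["sales"])
    return dict(totals)


def lambda_handler(event, context, s3_client=None):
    """Hypothetical Lambda entry point: read a CSV object from S3, return totals as JSON."""
    if s3_client is None:
        import boto3  # bundled with the AWS Lambda Python runtimes

        s3_client = boto3.client("s3")
    obj = s3_client.get_object(Bucket=event["bucket"], Key=event["key"])
    csv_text = obj["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps(aggregate_sales(csv_text))}
```

Each invocation is stateless and reads its input from S3. Lambda can therefore serve many users simply by running more concurrent instances of the same function.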
SEVIR - Weather NowCasting Pipeline
In this project, we deployed the code from the paper "SEVIR: A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology" for nowcasting with radar and satellite imagery. We used a pretrained model, migrated the data from AWS to GCP, and designed and implemented the nowcasting pipeline. Nowcasting is the task of generating short-term weather forecasts, such as radar echoes, precipitation and cloud coverage, using meteorological knowledge. The models take as input 13 VIL (vertically integrated liquid) images sampled every 5 minutes and generate the next 12 images in the sequence, covering the following hour.
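The input/output layout described above (13 observed frames, 12 predicted frames, 5 minutes apart) can be sketched as a sliding window over a time-ordered frame sequence. In this sketch, frames are placeholders (plain integers here, 2-D VIL arrays in practice). The stride and the function names are assumptions, not the pipeline's code.

```python
from datetime import datetime, timedelta

FRAME_INTERVAL = timedelta(minutes=5)
N_IN, N_OUT = 13, 12  # 13 observed frames in, 12 predicted frames out


def make_windows(frames, n_in=N_IN, n_out=N_OUT, stride=1):
    """Split a time-ordered frame sequence into (input, target) pairs."""
    windows = []
    for start in range(0, len(frames) - n_in - n_out + 1, stride):
        x = frames[start:start + n_in]
        y = frames[start + n_in:start + n_in + n_out]
        windows.append((x, y))
    return windows


def target_times(last_input_time, n_out=N_OUT):
    """Timestamps of the predicted frames following the last observed frame."""
    return [last_input_time + FRAME_INTERVAL * (i + 1) for i in range(n_out)]


pairs = make_windows(list(range(25)))  # 25 frames -> exactly one window
print(len(pairs), target_times(datetime(2019, 6, 1, 12, 0))[-1])
```

Twelve outputs at 5-minute spacing span 60 minutes, which is why the predicted frames cover the following hour.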
NLP - Text Generation using a Corpus of Choice
The notebook provides the full code to load data and to build, train and generate text with an LSTM (Long Short-Term Memory) network, a type of recurrent neural network (RNN). It can be easily adapted to any text corpus, and after training on a large dataset the model can generate new, realistic text.
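The LSTM itself is not reproduced here. The sketch below shows two steps that typically surround such a model. The first is turning a corpus into fixed-length training sequences with next-character targets. The second is sampling the next character from the model's output scores with a temperature. Character-level modeling, the sequence length and the function names are assumptions; the notebook may work at the word level.

```python
import math
import random


def build_sequences(text, seq_len):
    """Character-level (input indices, next-char index) pairs plus the vocabulary."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    inputs, targets = [], []
    for i in range(len(text) - seq_len):
        inputs.append([char_to_idx[c] for c in text[i:i + seq_len]])
        targets.append(char_to_idx[text[i + seq_len]])
    return inputs, targets, chars


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs)[0]


inputs, targets, chars = build_sequences("hello", 2)
print(inputs, targets, chars)
```

Lower temperatures make generation more conservative by concentrating probability on the most likely character. Higher temperatures produce more varied but less coherent text.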
NLP - Classify 20 Newsgroups using Naive Bayes, LSTM, and BERT
This Jupyter notebook classifies news posts into one of the 20 Newsgroups categories using Machine Learning algorithms ranging from basic to complex. It reports each model's training time and accuracy, making it possible to compare their efficiency.
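The simplest model in this comparison, Naive Bayes, fits in a few lines. The sketch below is a multinomial Naive Bayes classifier with add-alpha (Laplace) smoothing, written from scratch for illustration; the notebook likely uses a library implementation instead. The toy tokenized posts are hypothetical. The labels `comp.graphics` and `sci.space` are two of the real 20 Newsgroups categories.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            word_counts[y].update(doc)
        class_docs = Counter(labels)
        n, v = len(labels), len(self.vocab)
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            self.log_prior[c] = math.log(class_docs[c] / n)
            total = sum(word_counts[c].values())
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / (total + self.alpha * v))
                for w in self.vocab
            }
        return self

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            # Words never seen in training are ignored.
            scores[c] = self.log_prior[c] + sum(
                self.log_likelihood[c][w] for w in doc if w in self.vocab
            )
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


docs = [["gpu", "graphics", "driver"], ["nasa", "orbit", "launch"],
        ["graphics", "card"], ["launch", "rocket", "orbit"]]
labels = ["comp.graphics", "sci.space", "comp.graphics", "sci.space"]
model = MultinomialNB().fit(docs, labels)
print(model.predict([["graphics", "driver"], ["orbit", "rocket"]]))
```

Training is a single counting pass over the data. This is why Naive Bayes is typically far cheaper to train than an LSTM or BERT, which is the efficiency trade-off the notebook examines.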
FinTech - Stock Financial Analysis and Prediction using Machine Learning
In this project I applied various financial analysis techniques in Python to support fundamental decision making. I then explored trading strategies based on observations from moving averages computed on the data, and on Machine Learning models trained on historical data to predict prices or generate trading signals.
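A common moving-average strategy is the crossover. It emits a buy signal when a short-window average crosses above a long-window average and a sell signal when it crosses below. The sketch below illustrates that rule. The window lengths and function names are hypothetical, and the project's own strategies may differ.

```python
def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    out, running = [], 0.0
    for i, p in enumerate(prices):
        running += p
        if i >= window:
            running -= prices[i - window]  # drop the price leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_signals(prices, short=3, long=5):
    """+1 when the short SMA crosses above the long SMA, -1 when below, else 0."""
    fast, slow = sma(prices, short), sma(prices, long)
    signals = [0] * len(prices)
    for i in range(1, len(prices)):
        if fast[i - 1] is None or slow[i - 1] is None:
            continue  # not enough history yet
        prev_diff = fast[i - 1] - slow[i - 1]
        diff = fast[i] - slow[i]
        if prev_diff <= 0 < diff:
            signals[i] = 1
        elif prev_diff >= 0 > diff:
            signals[i] = -1
    return signals


prices = [10, 9, 8, 7, 6, 7, 8, 9, 10]
print(crossover_signals(prices, short=2, long=4))
```

On this toy series the only signal is a buy (+1) at index 6, where the falling trend reverses.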
Rain Prediction in Australia using Machine Learning
We try to answer the question of whether or not it will rain tomorrow in Australia, using the Rain in Australia dataset. We implemented k-Nearest Neighbors (kNN), Decision Tree and Random Forest classifiers with Python and Scikit-Learn.
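To show how the kNN classifier decides, here is a from-scratch sketch. The project itself uses Scikit-Learn. The sketch min-max scales the features and then takes a majority vote among the k nearest training rows by Euclidean distance. The two features (think of scaled afternoon humidity and cloud cover), the toy rows and the "Yes"/"No" labels are hypothetical.

```python
import math
from collections import Counter


def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training rows."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


train_X = [[0.9, 0.8], [0.85, 0.9], [0.2, 0.1], [0.3, 0.2], [0.1, 0.3]]
train_y = ["Yes", "Yes", "No", "No", "No"]
print(knn_predict(train_X, train_y, [0.8, 0.85]))
```

Scaling matters for kNN because distances would otherwise be dominated by features with large numeric ranges, such as pressure in hPa compared with a 0 to 1 fraction.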
MAVEN - COVID Virus Evolution Simulator
This project simulates the evolution of variants of a positive-sense single-stranded RNA virus (SARS-CoV-2).
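The simulator's own model is not described here. As a heavily simplified illustration of variant formation, the sketch below uses a neutral, substitution-only model. A fixed-size population is resampled each generation, Wright-Fisher style, and every copied base mutates to a different nucleotide with a given probability. There is no selection, insertion, deletion or recombination. The ancestor sequence, rates and function names are hypothetical, and the sequence is not taken from SARS-CoV-2.

```python
import random

BASES = "ACGU"  # RNA nucleotides (uracil in place of thymine)


def mutate(genome, rate, rng):
    """Copy an RNA genome, substituting each base with probability `rate`."""
    out = []
    for base in genome:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(ancestor, generations, pop_size, rate, seed=0):
    """Resample a fixed-size population each generation from mutated parent copies."""
    rng = random.Random(seed)
    population = [ancestor] * pop_size
    for _ in range(generations):
        population = [mutate(rng.choice(population), rate, rng) for _ in range(pop_size)]
    return population


def hamming(a, b):
    """Number of positions where two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


ancestor = "AUGGCUACG" * 5
population = simulate(ancestor, generations=10, pop_size=20, rate=0.01)
print(len(set(population)), "distinct variants;",
      max(hamming(ancestor, g) for g in population), "max substitutions")
```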
- PROGRAMMING LANGUAGES: PYTHON, SQL, JAVA, C++, R, JAVASCRIPT, MATLAB, SAS
- DATABASE: ER STUDIO, SNOWFLAKE, BIGQUERY, RDBMS, POSTGRESQL, SPARK, NOSQL, DATAFORM
- ETL & BI SKILLS: DATA INTEGRATION, MODELING & WAREHOUSING PIPELINES, LOOKER, TABLEAU, POWER BI
- TOOLS: GIT CI/CD, JIRA, DOCKER, FASTAPI, AWS, GCP, AZURE, HPC – SLURM, DATAFLOW, ALTERYX, TALEND
- ML FRAMEWORKS: NUMPY, PANDAS, SCIKIT-LEARN, SCIPY, MATPLOTLIB, SEABORN, PLOTLY, KERAS, PYTORCH, TENSORFLOW, STATISTICAL, CLASSIFICATION & REGRESSION ANALYSIS & MODELING, NLP – LLM, BERT, LDA, NLTK