Hey guys! My name is R M Srinivas. In this project I analyse and predict the 1, 3, and 5 year returns of 1064 mutual funds in India. I scraped the data from the most visited website for mutual fund investments. I tested the following regression models: a linear model, SGD Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge, MLP Regressor, and Lasso (linear model). I then selected the best performing model, performed hyperparameter tuning, and deployed an interactive application that can generate the visualization and send an email with the visualization to the user's email address.
Extraction (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/scraper%20and%20extraction.py): In this file I used Beautiful Soup to extract data from the most visited site for studying, analysing, and investing in mutual funds. I extracted 20 columns for 1064 mutual funds. I tried to extract the data in such a way that little cleaning would be needed afterwards, and then saved the file as raw_data.xlsx.
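Here is a minimal sketch of what table extraction with Beautiful Soup can look like. It is not the code from the linked file: the HTML snippet, the `funds` table id, the column names, and the `parse_fund_table` helper are all made up for illustration, since the real site's markup is not described here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fund listing page; the real site's
# structure, ids, and columns are assumptions.
SAMPLE_HTML = """
<table id="funds">
  <tr><th>Fund</th><th>AUM (cr)</th><th>1Y Return</th></tr>
  <tr><td>Alpha Equity Fund</td><td>1,250.5</td><td>12.4</td></tr>
  <tr><td>Beta Debt Fund</td><td>830</td><td>6.1</td></tr>
</table>
"""


def parse_fund_table(html):
    """Return one dict per data row, keyed by the header cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="funds")
    rows = table.find_all("tr")
    # The first row holds the column names.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(headers, cells)))
    return records


funds = parse_fund_table(SAMPLE_HTML)
```

The values come back as strings (for example `"1,250.5"`), so numeric conversion is still needed afterwards.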
Transform (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Tranformation.py): The data did not need many cleaning steps. I cleaned the raw data from the step above and removed the few columns that had more than 30% missing values (np.nan). I converted the column holding the fund's AUM in crore (cr) to float and saved the file as cleaned_data.xlsx.
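The two cleaning rules above (drop columns with more than 30% missing values, convert AUM to float) can be sketched with pandas roughly like this. The column names, the sample rows, and the `clean_raw_data` helper are assumptions for illustration, and the Excel read/write is left out so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def clean_raw_data(df, aum_col="aum_cr", max_missing=0.30):
    """Drop sparse columns and convert the AUM column to float."""
    # Share of missing values per column, between 0 and 1.
    missing_share = df.isna().mean()
    # Keep only columns with at most `max_missing` missing values.
    kept = df.loc[:, missing_share <= max_missing].copy()
    # Strip thousands separators, then convert the AUM strings to float.
    kept[aum_col] = (
        kept[aum_col].astype(str).str.replace(",", "", regex=False).astype(float)
    )
    return kept


# Made-up sample data, not taken from the scraped file.
raw = pd.DataFrame({
    "fund": ["A", "B", "C", "D"],
    "aum_cr": ["1,250.5", "830", "42.75", "9,000"],
    "mostly_missing": [np.nan, np.nan, 1.0, np.nan],  # 75% missing, dropped
    "one_missing": [0.5, np.nan, 0.7, 0.9],           # 25% missing, kept
})
cleaned = clean_raw_data(raw)
```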
Load (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/data_storage.py): In this file I uploaded the data to a Heroku server for storage, using PostgreSQL to send and save the data. Every time this file runs, it takes the updated data, drops the existing table if it exists, and then adds the updated table to the server.
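The drop-and-reload pattern can be sketched as below. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` in place of PostgreSQL. The table name, columns, and `replace_table` helper are hypothetical. With PostgreSQL the same SQL idea applies through a driver such as psycopg2, which uses `%s` placeholders instead of `?`.

```python
import sqlite3


def replace_table(conn, rows):
    """Drop the funds table if present, then recreate it from `rows`."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS mutual_funds")
    cur.execute(
        "CREATE TABLE mutual_funds (fund TEXT, aum_cr REAL, return_1y REAL)"
    )
    cur.executemany("INSERT INTO mutual_funds VALUES (?, ?, ?)", rows)
    conn.commit()


# In-memory database standing in for the remote server.
conn = sqlite3.connect(":memory:")
replace_table(conn, [("Alpha", 1250.5, 12.4), ("Beta", 830.0, 6.1)])
# A second run replaces the earlier data instead of appending to it.
replace_table(conn, [("Gamma", 42.75, 9.9)])
stored = conn.execute("SELECT fund, aum_cr FROM mutual_funds").fetchall()
```

Dropping and recreating keeps the stored table identical to the latest file, at the cost of losing any history of earlier runs.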
Go through the following links for the individual ipynb files.
5 year returns models testing ->
3 year returns models testing ->
1 year returns models testing ->
Let's first understand the relation of our target variable (returns over a period of 1, 3, and 5 years) with the remaining variables, starting with some basic definitions. AUM, or Assets Under Management, is the total funds that a mutual fund scheme holds.
What does NAV mean? The performance of a particular scheme of a Mutual Fund is denoted by Net Asset Value (NAV). In simple words, NAV is the market value of the securities held by the scheme. Mutual Funds invest the money collected from investors in securities markets.
Risk of the fund: Mutual fund schemes are not guaranteed or assured return products. Investment in mutual fund units involves investment risks such as trading volumes, settlement risk, liquidity risk, and default risk, including the possible loss of principal. All of this is considered and the fund is rated accordingly.
Minimum Investment: It's the minimum amount required to invest in a mutual fund.
Type of the fund: Funds are classified by how their investments are diversified: equity funds, debt funds, hybrid funds, solution-based funds, etc.
Here is some basic information about the columns from the describe function. We can see that there are outliers in a few columns, so let's go ahead, investigate those columns, and treat them.
I tried replacing the values greater than 0.85 with the mean and with the median, and also normalized each column, then compared the results, which I have documented in the bottom section of the table.
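One way to read the treatment above is sketched here, under two assumptions that the text does not confirm: that 0.85 means the 0.85 quantile of each column, and that normalization means min-max scaling to the range 0 to 1. The helper names and the sample values are made up.

```python
import pandas as pd


def replace_high_values(col, q=0.85, how="median"):
    """Replace values above the q-th quantile with the column mean or median."""
    threshold = col.quantile(q)
    # The fill value is computed on the full column, outliers included.
    fill = col.median() if how == "median" else col.mean()
    # Keep values at or below the threshold; everything else gets the fill
    # value (NaN entries also fail the comparison and are filled).
    return col.where(col <= threshold, fill)


def min_max_normalize(col):
    """Scale a column linearly so its minimum is 0 and its maximum is 1."""
    return (col - col.min()) / (col.max() - col.min())


# Made-up column with one obvious outlier.
aum = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 500.0])
treated_median = replace_high_values(aum, how="median")
treated_mean = replace_high_values(aum, how="mean")
normalized = min_max_normalize(treated_median)
```

Because the mean is pulled up by the outlier itself, the median variant usually gives the gentler replacement on skewed columns like this one.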
Here is a table which shows the testing scores of various models on the 5 Year returns target variable.
Here is a table which shows the testing scores of various models on the 3 Year returns target variable.
Here is a table which shows the testing scores of various models on the 1 Year returns target variable.
In the above images for the 1, 3, and 5 year returns model testing, the best model according to the scores obtained is the Random Forest Regressor, so I performed hyperparameter tuning on it individually for each target to get the best results. After that, I pickled the models so they can be run in the deployment phase of the project.
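The compare, tune, and pickle workflow can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the notebook code: the R^2 test score, the use of `GridSearchCV`, the parameter grid, and the reduced list of candidate models are all assumptions.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the fund features and one returns target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
y = 10 * X[:, 0] + 5 * np.sin(6 * X[:, 1]) + rng.normal(0, 0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit each candidate and record its R^2 score on the held-out split.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in candidates.items()
}

# Tune the random forest with a small, hypothetical parameter grid.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
    scoring="r2",
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Serialize the tuned model; pickle.dump(best_model, file) writes it to disk.
model_bytes = pickle.dumps(best_model)
```

Repeating the search once per target (1, 3, and 5 year returns) gives three separately tuned models, each pickled on its own.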
Here are the final graphs for each target after hyperparameter optimization, along with the feature importance graph.
Front end application(https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Deployment.py)
Using Streamlit, I created the following application.
The above application has a sidebar for moving through the 5 different pages. The Definition page has the basic information about the various fund-related terms. After that there is a series of 3 pages which predict the returns based on the inputs provided. In the back end, when each page is opened, the respective model saved in pickle format is loaded, and the user inputs are normalized and converted to get the prediction. The last page has all the visualizations and analysis with descriptions. I created a requirements.txt for future deployment of the project onto AWS or Heroku.
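The back-end step on each prediction page (load the pickled model, normalize the inputs, predict) can be sketched like this. The feature names, their min/max ranges, the min-max normalization, and the helper functions are assumptions, and a throwaway model trained on synthetic data stands in for the real pickled one.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical input fields and the ranges used to scale them to 0..1.
FEATURES = ["aum_cr", "nav", "min_investment"]
FEATURE_MIN = np.array([0.0, 0.0, 100.0])
FEATURE_MAX = np.array([1000.0, 100.0, 5000.0])


def normalize_inputs(user_inputs):
    """Turn a dict of raw inputs into one min-max scaled feature row."""
    raw = np.array([user_inputs[name] for name in FEATURES], dtype=float)
    scaled = (raw - FEATURE_MIN) / (FEATURE_MAX - FEATURE_MIN)
    return scaled.reshape(1, -1)


def predict_return(model_bytes, user_inputs):
    """Load a pickled model and predict from normalized user inputs."""
    model = pickle.loads(model_bytes)
    return float(model.predict(normalize_inputs(user_inputs))[0])


# Throwaway model on synthetic data, standing in for a saved pickle file.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(100, 3))
y = 8 * X[:, 0] + 2 * X[:, 1]
model_bytes = pickle.dumps(
    RandomForestRegressor(n_estimators=20, random_state=1).fit(X, y)
)

prediction = predict_return(
    model_bytes, {"aum_cr": 500.0, "nav": 50.0, "min_investment": 1000.0}
)
```

In a Streamlit page the input dictionary would come from widgets such as `st.number_input`, and the scaling has to match whatever was applied to the training data.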
Let me know if you have any suggestions. You can contact me on this email - [email protected]