Mini Proj Modified
BACHELOR OF TECHNOLOGY
In
Computer Science and Engineering
By
M. Bhavya Lakshmi Siridevi (19A81A05M5)
K. Chaitanya (19A81A05L2)
K.V.N.S.S. Krishna Sreekar (19A81A05K7)
Mrs. G. Prasanthi, M.Tech., Assistant Professor
Department of Computer Science and Engineering (Accredited by N.B.A.)
SRI VASAVI ENGINEERING COLLEGE (Autonomous)
(Affiliated to JNTUK, Kakinada)
Pedatadepalli, Tadepalligudem-534101, A.P.
2021-22
SRI VASAVI ENGINEERING COLLEGE (Autonomous)
Department of Computer Science and Engineering
Pedatadepalli, Tadepalligudem
Mrs. G. Prasanthi, M.Tech.    Dr. D. Jaya Kumari, M.Tech., Ph.D.
External Examiner
DECLARATION
Project Associates
M. Bhavya Lakshmi Siridevi (19A81A05M5)
K. Chaitanya (19A81A05L2)
K.V.N.S.S. Krishna Sreekar (19A81A05K7)
ACKNOWLEDGEMENT
First and foremost, we sincerely salute our esteemed institute, “SRI VASAVI
ENGINEERING COLLEGE”, for giving us this golden opportunity to fulfill our cherished
dream of becoming engineers.
Our sincere gratitude goes to our project guide, Mrs. G. Prasanthi, Department of Computer
Science and Engineering, for her timely cooperation and valuable suggestions while
we carried out this project.
We express our sincere thanks and heartfelt gratitude to Dr. D. Jaya Kumari, Professor &
Head of the Department of Computer Science and Engineering, for permitting us to do our
project.
We express our sincere thanks and heartfelt gratitude to Dr. G.V.N.S.R. Ratnakara Rao,
Principal, for providing a favourable environment and supporting us during the
development of this project.
Our special thanks go to the management and all the teaching and non-teaching staff
members of the Department of Computer Science and Engineering for their support and
cooperation in various ways during our project work. It is our pleasure to acknowledge the
help of all those respected individuals.
We would also like to express our gratitude to our parents and friends, who helped us complete this
project.
Project Associates
M. Bhavya Lakshmi Siridevi (19A81A05M5)
K. Chaitanya (19A81A05L2)
K.V.N.S.S. Krishna Sreekar (19A81A05K7)
TABLE OF CONTENTS
1 INTRODUCTION
1.1 Introduction
1.2 Scope
1.3 Objective
2 LITERATURE SURVEY
3 SYSTEM STUDY AND ANALYSIS
4 SYSTEM DESIGN
4.3 UML Diagrams
5 TECHNOLOGIES
6 IMPLEMENTATION
7 TESTING
7.2.1 Unit Testing
8 OUTPUTS
9 CONCLUSION AND FURTHER WORK
10 REFERENCES
ABSTRACT
Banks earn a major part of their profits from loans. Although many people apply
for loans, it is hard to select the genuine applicants who will repay them.
When the process is done manually, mistakes can be made in selecting genuine
applicants. We are therefore developing a loan prediction system using machine learning, so
that the system automatically selects eligible candidates. This is helpful to both bank staff
and applicants, and the time taken to sanction a loan is drastically reduced. In this
project we predict loan status using a machine learning algorithm, the
Decision Tree.
CHAPTER – 1
INTRODUCTION
1.1 Introduction
Loans are the core business of banks. The main portion of a bank's profit comes directly
from the profit earned on loans. Although a bank approves a loan after a rigorous
process of verification and validation, there is still no assurance that the chosen
applicant is the right one. This process also takes extra time when done manually.
With machine learning techniques, we can predict whether a particular applicant is safe, and
the whole validation process can be automated. Loan prediction is helpful for
bank employees as well as for applicants.
1.2 Scope
1.3 Objective
The purpose of developing the Loan Prediction system is to computerize the traditional
way of predicting the loan manually. The main objective of this paper is to predict
CHAPTER – 2
LITERATURE SURVEY
CHAPTER – 3
SYSTEM STUDY AND ANALYSIS
Loan prediction is a classification problem in which we need to classify whether a loan will
be approved or not. Classification refers to a predictive modeling problem where a class label is
predicted for a given example of input data. We therefore took up Loan Prediction as our
project, to help predict loan status correctly.
In the existing system, bank employees check the details of each applicant manually and grant
loans to eligible applicants. Checking the details of all applicants takes a lot of time. Its drawbacks are:
Time consuming.
Less efficient.
More manual work required.
Human error may occur because all details are checked manually.
A loan may be granted to an ineligible applicant.
To deal with this problem, we developed an automatic loan prediction system using a machine learning
technique, the decision tree. We train the machine on a previous dataset so that
it can analyze and learn the process. The machine then checks for eligible
applicants and gives us the result.
3.4 Advantages of Proposed System
The time taken to sanction a loan will be reduced.
More accurate.
The whole process will be automated, so human error will be avoided.
Eligible applicants will be sanctioned loans without delay.
3.7 Non-Functional Requirements
In systems engineering and requirements engineering, a non-functional requirement is a
requirement that specifies criteria that can be used to judge the operation of a system, rather
than specific behaviours. Non-functional requirements include quantitative constraints, such as
response time or accuracy.
Accuracy and Precision
The system should perform its process with accuracy and precision to avoid problems.
Reliability
The system is reliable because of the qualities it inherits from the chosen
platform, Python.
Security
The system should be secure and should protect applicants' privacy.
Performance
The performance characteristics of the system are outlined here:
Response time (average, maximum)
Throughput (frames processed per second)
Accuracy
Resource utilization (memory, disk, camera)
Maintainability
The maintenance group should be able to fix any problem that occurs suddenly.
3.8 System Requirements
3.8.1 Software requirements
Hard Disk : 256 GB
RAM : 8 GB
Processor : i3 or above
CHAPTER – 4
SYSTEM DESIGN
1. Data Sets:
The dataset collected for predicting loan default customers is split into a training
set and a testing set. Generally, an 80:20 ratio is applied to split the data into the training set and the testing set. The
model is built on the training set using the Decision Tree, and predictions are then made on
the test set to measure its accuracy.
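The 80:20 split described above can be sketched as follows. This is a minimal illustration on a small hypothetical table, not the project's dataset; the column names (`ApplicantIncome`, `LoanAmount`, `Credit_History`, `Loan_Status`) and the `random_state` value are assumptions chosen for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real loan dataset.
df = pd.DataFrame({
    "ApplicantIncome": [2500, 4000, 6000, 1500, 8000, 3000, 5200, 4100, 2900, 7000],
    "LoanAmount": [100, 120, 200, 80, 250, 110, 150, 130, 95, 220],
    "Credit_History": [1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
    "Loan_Status": [1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
})

X = df.drop(columns="Loan_Status")  # input attributes
y = df["Loan_Status"]               # class label to predict

# test_size=0.2 keeps 80% of the rows for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))
```

Fixing `random_state` makes the split repeatable, which helps when comparing models on the same test rows.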
2. Data Preprocessing:
The collected data may contain missing values that can lead to inconsistency. To
obtain better results, the data needs to be preprocessed, which improves the effectiveness of the
algorithm. We should remove outliers and convert the variables into a suitable form. To
address these issues we use the map function.
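A minimal sketch of this kind of preprocessing is shown below, assuming pandas is used. The column names, the choice of mode and median for filling gaps, and the category-to-number mappings are illustrative assumptions rather than the project's actual choices.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with missing values and text categories.
df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "LoanAmount": [120.0, np.nan, 100.0, 140.0],
})

# Fill a missing category with the most frequent value (assumed strategy).
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
# Fill a missing number with the column median (assumed strategy).
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

# Convert text categories to numbers with Series.map.
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})
df["Education"] = df["Education"].map({"Graduate": 1, "Not Graduate": 0})
print(df)
```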
3. Correlating attributes:
Based on the correlation among attributes, it was observed which applicants are more likely to pay back
their loans. The individually significant attributes include property area,
education, loan amount, and lastly credit history, which by intuition is
considered important. The correlation among attributes can be identified using corplot and
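One way to inspect correlation among attributes is a correlation matrix. The sketch below uses pandas' `DataFrame.corr` on hypothetical values; the column names and numbers are assumptions for illustration and are not results from the project.

```python
import pandas as pd

# Hypothetical numeric attributes; Loan_Status is 1 for approved, 0 otherwise.
df = pd.DataFrame({
    "Credit_History": [1, 1, 0, 1, 0, 1, 0, 1],
    "LoanAmount": [100, 150, 120, 130, 200, 110, 90, 160],
    "Loan_Status": [1, 1, 0, 1, 0, 1, 0, 1],
})

corr = df.corr()  # pairwise Pearson correlation

# Rank attributes by the strength of their correlation with the label.
ranked = (
    corr["Loan_Status"]
    .drop("Loan_Status")
    .abs()
    .sort_values(ascending=False)
)
print(ranked)
```

The same matrix can be passed to a heatmap plot if a visual view is preferred.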
4. Building the classification model using the Decision Tree algorithm:
The Decision Tree algorithm is used to predict loan defaulters and non-defaulters.
It is effective because it gives good results on classification problems. It is
intuitive, easy to implement, and provides interpretable predictions. Its error can be estimated
on held-out data; the out-of-bag estimate applies to bagged ensembles such as random forests
rather than to a single tree. It is relatively easy to tune.
It gave the highest accuracy for this problem.
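The model-building step can be sketched with scikit-learn's `DecisionTreeClassifier`. The data here is synthetic, generated under an assumed rule (approval follows credit history), so the accuracy it prints says nothing about the real dataset; the feature names and `max_depth=3` are also assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: approval is assumed to depend only on credit history.
rng = np.random.default_rng(0)
n = 200
credit_history = rng.integers(0, 2, size=n)
loan_amount = rng.integers(50, 300, size=n)
X = np.column_stack([credit_history, loan_amount])
y = credit_history.copy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
# export_text prints the learned rules, which is what makes a tree interpretable.
rules = export_text(model, feature_names=["Credit_History", "LoanAmount"])
print(accuracy)
print(rules)
```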
5. Data Prediction:
Prediction refers to the output of an algorithm after it has been trained on a
historical dataset and applied to new data to forecast the likelihood of a
particular outcome.
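As an illustration of applying a trained model to new data, the sketch below fits a tree on a few hypothetical historical rows and then scores two unseen applicants. The feature layout (`[Credit_History, LoanAmount]`) and all values are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical history: each row is [Credit_History, LoanAmount].
X_hist = np.array([
    [1, 100], [1, 150], [0, 120], [1, 130],
    [0, 200], [0, 90], [1, 160], [0, 140],
])
y_hist = np.array([1, 1, 0, 1, 0, 0, 1, 0])  # 1 = approved, 0 = rejected

model = DecisionTreeClassifier(random_state=0).fit(X_hist, y_hist)

# Two new applicants that were not part of the training data.
new_applicants = np.array([[1, 125], [0, 125]])
labels = model.predict(new_applicants)       # predicted class per applicant
proba = model.predict_proba(new_applicants)  # class probabilities per applicant
print(labels)
print(proba)
```

`predict` returns the class label, while `predict_proba` returns the estimated likelihood of each class, which matches the "likelihood of a particular outcome" described above.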
4.3 Internal architecture:
A use case diagram is dynamic in nature, so there should be some internal or external
factors driving the interactions. These internal and external agents are known as actors. A use
case diagram consists of actors, use cases, and their relationships. The diagram is used to
model a system or subsystem of an application. A single use case diagram captures a particular
functionality of a system; hence, to model the entire system, a number of use case diagrams are
used.
Use case diagrams are used to gather the requirements of a system, including internal and
external influences. These requirements are mostly design requirements. Hence, when a system
is analyzed to gather its functionalities, use cases are prepared and actors are identified.
When this initial task is complete, use case diagrams are modelled to present the outside view.
The sequence diagram represents the flow of messages in the system as a time-ordered sequence of
events between the lifelines that take part at run time. In UML, a lifeline is drawn
as a vertical dashed line beneath its participant, and messages are drawn as horizontal arrows
between lifelines. It incorporates iterations as well as branching.
CHAPTER – 5
TECHNOLOGIES
Advantages of Python:
Extensive Libraries: Python ships with an extensive standard library that contains code for
various purposes such as regular expressions, documentation generation, unit testing, web browsers,
threading, databases, CGI, email, image manipulation, and more. So we don't have to write the
complete code for these manually.
Extensible: Python can be extended with other languages. You can
write some of your code in languages like C++ or C. This comes in handy, especially in
projects.
Embeddable: Complementary to extensibility, Python is embeddable as well. You can put
your Python code in the source code of a different language, like C++. This lets us add
scripting capabilities to our code in the other language.
Simple and Easy: When working with Java, you may have to create a class to print ‘Hello
World’. But in Python, just a print statement will do. It is also quite easy to learn, understand,
and code. This is why when people pick up Python, they have a hard time adjusting to other
more verbose languages like Java.
Readable: Because it is not such a verbose language, reading Python is much like reading
English. This is the reason why it is so easy to learn, understand, and code. It also does not need
curly braces to define blocks, and indentation is mandatory. These further aid the readability of
the code.
Free and Open-Source: Python is freely available. Not only can
you download Python for free, but you can also download its source code, make changes to it,
and even distribute it. It comes with an extensive collection of libraries to help you with
your tasks.
Portable: When you code your project in a language like C++, you may need to make some
changes to it if you want to run it on another platform. But it isn't the same with Python. Here,
you need to code only once, and you can run it anywhere. This is called Write Once Run
Anywhere (WORA). However, you need to be careful not to include any system-dependent
features.
Interpreted: Lastly, Python is an interpreted language. Since statements are
executed one by one, debugging is easier than in compiled languages.
Disadvantages of Python:
Speed Limitations: Python code is executed line by line. Because
Python is interpreted, execution is often slow. This, however, isn't a
problem unless speed is a focal point for the project. In other words, unless high speed is
a requirement, the benefits offered by Python are enough to outweigh its speed
limitations.
Design Restrictions: Python is dynamically typed. This means that you
don't need to declare the type of a variable while writing the code. It uses duck typing:
if it looks like a duck, it must be a duck. While this is easy
on programmers during coding, it can raise runtime errors.
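The duck-typing trade-off described above can be seen in a short sketch. The class and function names are hypothetical and exist only for this illustration.

```python
class Duck:
    def quack(self):
        return "Quack"


class Robot:
    def beep(self):
        return "Beep"


def make_it_quack(thing):
    # No type is declared: any object with a quack() method is accepted.
    return thing.quack()


result = make_it_quack(Duck())  # works, Duck has quack()

try:
    make_it_quack(Robot())      # Robot has no quack(), so this fails at run time
    error = None
except AttributeError as exc:
    error = exc

print(result)
print(type(error).__name__)
```

Nothing flags the mistake before the program runs; the error only appears when the offending call is executed.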
1) NumPy:
• NumPy is a library for the Python programming language that adds support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level mathematical
functions to operate on these arrays.
• It also has functions for working in the domains of linear algebra, Fourier transforms, and
matrices. NumPy was created in 2005 by Travis Oliphant. It is an open source project that can
be used freely, and it is the fundamental package for scientific computing with Python.
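A brief sketch of the array and linear algebra features mentioned above, using arbitrary example values:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2-D array (matrix)
b = np.array([5.0, 6.0])                # 1-D array (vector)

scaled = a * 2                # element-wise arithmetic, no explicit loop
column_means = a.mean(axis=0) # mean of each column
x = np.linalg.solve(a, b)     # solve the linear system a @ x = b

print(scaled)
print(column_means)
print(x)
```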
2) Pandas:
• pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for manipulating
numerical tables and time series. It is free software released under the three-clause BSD
license.
• pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool, built on top of the Python programming language.
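As a small illustration of these table operations, the sketch below filters and aggregates a hypothetical loan table; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical loan applications.
df = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Rural"],
    "LoanAmount": [120, 80, 200, 150, 100],
    "Loan_Status": ["Y", "N", "Y", "Y", "N"],
})

# Row selection with a boolean condition.
approved = df[df["Loan_Status"] == "Y"]

# Group rows by area and average the loan amount in each group.
mean_by_area = df.groupby("Property_Area")["LoanAmount"].mean()

print(len(approved))
print(mean_by_area)
```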
3) Matplotlib:
• Matplotlib is a plotting library for the Python programming language and its numerical
extension NumPy, designed to work with the broader SciPy stack. Matplotlib provides several
kinds of plots, such as line and bar plots.
4) Seaborn:
Seaborn is a Python visualization library for making statistical graphics. It is built on top of
matplotlib, is closely integrated with pandas data structures, and provides a high-level interface
for drawing attractive and informative statistical graphics. Here is some of the functionality that
Seaborn offers:
• A dataset-oriented API for examining relationships between multiple variables
• Specialized support for using categorical variables to show observations or aggregate
statistics
• Options for visualizing univariate or bivariate distributions and for comparing them
between subsets of data
• Automatic estimation and plotting of linear regression models for different kinds of
dependent variables
• Convenient views onto the overall structure of complex datasets
• High-level abstractions for structuring multi-plot grids that let you easily build complex
visualizations
5) Scikit-Learn:
Scikit-learn is a free software machine learning library for the Python programming language.
It features various classification, regression and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to
interoperate with the Python numerical and scientific libraries NumPy and SciPy.
• The Jupyter Notebook is an open source web application that you can use to create and share
documents that contain live code, equations, visualizations, and text. Jupyter Notebook is
maintained by the people at Project Jupyter.
• Jupyter Notebooks are a spin-off project from the IPython project, which used
to have an IPython Notebook project itself. The name Jupyter comes from the core
programming languages that it supports: Julia, Python, and R. Jupyter ships with the IPython
kernel, which allows you to write your programs in Python, but there are currently over 100
other kernels that you can also use.
5.3 Google Colab Notebook
• Explore and run machine learning code with Google Colab notebooks, a cloud
computational environment that enables reproducible and collaborative analysis.
• One of the advantages of using notebooks as your data science workbench is that you can
easily add data sources from thousands of publicly available datasets or even upload your own.
You can also use output files from another notebook as a data source.
CHAPTER – 6
IMPLEMENTATION
Advantages:
• Compared to other algorithms, decision trees require less effort for data preparation
during pre-processing.
• A decision tree does not require normalization of data.
• Missing values in the data also do not affect the process of building a decision
tree to any considerable extent.
• A decision tree model is very intuitive and easy to explain to technical teams
as well as stakeholders.
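The point about normalization can be checked with a small experiment: a tree splits on thresholds, so rescaling a feature by a positive constant moves the thresholds but not the resulting predictions. The sketch below uses synthetic data with an assumed labelling rule and assumed feature names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic features: income and loan amount in their raw units.
rng = np.random.default_rng(1)
income = rng.integers(1000, 10000, size=100).astype(float)
loan = rng.integers(50, 300, size=100).astype(float)
y = (income > 4000).astype(int)  # assumed rule used only to create labels

X_raw = np.column_stack([income, loan])
X_scaled = np.column_stack([income / 1000.0, loan / 100.0])  # same data, rescaled

tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_raw, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_scaled, y)

# Two new applicants, expressed in raw and in rescaled units.
new_raw = np.array([[2000.0, 150.0], [9000.0, 150.0]])
new_scaled = np.array([[2.0, 1.5], [9.0, 1.5]])

pred_raw = tree_raw.predict(new_raw)
pred_scaled = tree_scaled.predict(new_scaled)
same = np.array_equal(pred_raw, pred_scaled)
print(pred_raw, pred_scaled, same)
```

Distance-based or gradient-based models would generally not behave this way, which is why they usually need scaled inputs.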
Disadvantages:
• A small change in the data can cause a large change in the structure of the decision tree,
causing instability.
• For a decision tree, calculations can sometimes become far more complex than for other
algorithms.
• Decision tree training is relatively expensive, as both its complexity and the time taken are higher.
• The decision tree algorithm is less suited to regression and predicting continuous
values.
6.3 Code
First, we import all the modules needed for training our model.
# Data handling
import numpy as np
import pandas as pd
# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
# Notebook magic: render plots inline (valid in Jupyter/Colab, not in a plain .py script)
%matplotlib inline
# Model selection, metrics and the classifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
Loading Dataset
df_train = pd.read_csv('/content/train_u6lujuX_CVtuZ9i (1).csv')
df_test = pd.read_csv('/content/test_Y3wMUE5_7gLdaTN.csv')
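The imports above include `GridSearchCV`, `f1_score`, and `DecisionTreeClassifier`. One way these pieces might fit together is sketched below on synthetic data; the parameter grid, the labelling rule, and the feature layout are assumptions for illustration and are not the project's actual training code.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data: [Credit_History, ApplicantIncome].
rng = np.random.default_rng(42)
n = 300
credit_history = rng.integers(0, 2, size=n)
income = rng.integers(1000, 10000, size=n)
X = np.column_stack([credit_history, income])
y = ((credit_history == 1) & (income > 3000)).astype(int)  # assumed rule

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical grid of tree settings to try with 5-fold cross-validation.
param_grid = {"max_depth": [1, 2, 3, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, scoring="f1", cv=5
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_f1 = f1_score(y_test, search.predict(X_test))
print(best_params)
print(test_f1)
```

After fitting, `search.predict` uses the best tree found, refitted on the whole training set.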
CHAPTER – 7
TESTING
CHAPTER – 8
OUTPUTS
Plotting all graphs
Accuracy
CHAPTER – 9
CONCLUSION AND FURTHER WORK
Conclusion
Loan companies grant loans after a thorough verification and validation
process. However, they cannot know with absolute certainty whether the applicant will be able
to repay the loan without difficulty. We therefore introduced the Loan Prediction System, which
allows them to choose the most deserving applicants quickly, easily, and efficiently, and
gives the bank a distinct benefit. In this project we have reviewed the process of building
a Loan Approval Prediction System. Data Collection, Exploratory Data Analysis, Data
Preprocessing, Model Building, and Model Testing are the analytical processes involved in
building this system.
Further Work
For further research, applicants' age, past health records, and
type of occupation could be used to evaluate the uncertainty in repaying
debts, and possible defaults on corporate loans for businesses and startups could be forecasted.
Another method could also be developed to forecast defaulters on different types of loans.