
A MINI-PROJECT REPORT ON

The Loan Prediction using Machine Learning

Submitted in partial fulfilment for the award of the degree of

BACHELOR OF TECHNOLOGY
In
Computer Science and Engineering
By
M.Bhavya Lakshmi Siridevi (19A81A05M5)

K.Chaitanya (19A81A05L2)

K.V.N.S.S.Krishna Sreekar (19A81A05K7)

Under the Esteemed Supervision of

Mrs. G. Prasanthi, M.Tech., Assistant Professor

Department of Computer Science and Engineering (Accredited by N.B.A.)
SRI VASAVI ENGINEERING COLLEGE (Autonomous)
(Affiliated to JNTUK, Kakinada)
Pedatadepalli, Tadepalligudem-534101, A.P.
2021-22
SRI VASAVI ENGINEERING COLLEGE (Autonomous)
Department of Computer Science and Engineering

Pedatadepalli, Tadepalligudem

This is to certify that the Project Report entitled “The Loan Prediction using Machine Learning”, submitted by M.Bhavya Lakshmi Siridevi (19A81A05M5), K.Chaitanya (19A81A05L2) and K.V.N.S.S.Krishna Sreekar (19A81A05K7) for the award of the degree of Bachelor of Technology in the Department of Computer Science and Engineering during the academic year 2021-2022, is a record of bonafide work carried out by them under the guidance of Mrs. G. Prasanthi.

Name of Project Guide                Head of the Department

Mrs. G. Prasanthi, M.Tech.           Dr. D. Jaya Kumari, M.Tech., Ph.D.

Assistant Professor                  Professor & HOD

External Examiner

DECLARATION

We hereby declare that the project report entitled “The Loan Prediction Using Machine Learning”, submitted by us to Sri Vasavi Engineering College (Autonomous), Tadepalligudem, affiliated to JNTUK Kakinada, in partial fulfilment of the requirement for the award of the degree of B.Tech in Computer Science and Engineering, is a record of bonafide project work carried out by us under the guidance of Mrs. G. Prasanthi. We further declare that the work reported in this project has not been submitted, and will not be submitted, either in part or in full, for the award of any other degree in this institute or any other institute or university.

Project Associates
M.Bhavya Lakshmi Siridevi (19A81A05M5)
K.Chaitanya (19A81A05L2)
K.V.N.S.S.Krishna Sreekar (19A81A05K7)
ACKNOWLEDGEMENT

First and foremost, we sincerely salute our esteemed institute “SRI VASAVI ENGINEERING COLLEGE” for giving us this golden opportunity to fulfil our cherished dream of becoming engineers.
We express our sincere gratitude to our project guide, Mrs. G. Prasanthi, Department of Computer Science and Engineering, for her timely cooperation and valuable suggestions while we carried out this project.
We express our sincere thanks and heartfelt gratitude to Dr. D. Jaya Kumari, Professor & Head of the Department of Computer Science and Engineering, for permitting us to do our project.
We express our sincere thanks and heartfelt gratitude to Dr. G.V.N.S.R. Ratnakara Rao, Principal, for providing a favourable environment and supporting us during the development of this project.
Our special thanks go to the management and to all the teaching and non-teaching staff members of the Department of Computer Science and Engineering for their support and cooperation in various ways during our project work. It is our pleasure to acknowledge the help of all those respected individuals.
We would like to express our gratitude to our parents and friends, who helped us complete this project.

Project Associates
M.Bhavya Lakshmi Siridevi (19A81A05M5)
K.Chaitanya (19A81A05L2)
K.V.N.S.S.Krishna Sreekar (19A81A05K7)
TABLE OF CONTENTS

S.NO TITLE

ABSTRACT

1 INTRODUCTION
1.1 Introduction
1.2 Scope
1.3 Objective

2 LITERATURE SURVEY

3 SYSTEM STUDY AND ANALYSIS
3.1 Problem Statement
3.2 Existing System
3.3 Limitations of the Existing System
3.4 Proposed System
3.5 Advantages of Proposed System
3.6 Functional Requirements
3.7 Non-Functional Requirements
3.8 System Requirements
3.8.1 Software Requirements
3.8.2 Hardware Requirements

4 SYSTEM DESIGN
4.1 System Design
4.2 System Architecture
4.3 Internal Architecture
4.4 UML Diagrams

5 TECHNOLOGIES
5.1.1 About Python
5.1.2 Required Python Libraries
5.2 Jupyter Notebook
5.3 Google Colab Notebook

6 IMPLEMENTATION
6.1 Implementation Steps
6.2 Decision Tree Algorithm
6.3 Code

7 TESTING
7.1 Introduction to Testing
7.2 Types of Testing
7.2.1 Unit Testing
7.2.1.1 Black Box Testing
7.2.1.2 White Box Testing
7.2.2 Integration Testing
7.2.3 Functional Testing
7.2.4 System Testing
7.3 Test Strategy and Design
7.4 Test Objectives
7.5 Features to be Tested

8 OUTPUTS

9 CONCLUSION AND FURTHER WORK

10 REFERENCES
ABSTRACT

Banks earn a major part of their profits from loans. Although many people apply for loans, it is hard to select the genuine applicants who will repay them. When the selection is done manually, misjudgements can lead to the wrong applicants being selected. We therefore developed a loan prediction system using machine learning, so that the system automatically selects the eligible candidates. This is helpful to both bank staff and applicants, and the time taken to sanction a loan can be greatly reduced. In this project we predict the loan status using a machine learning algorithm, namely the Decision Tree.
CHAPTER-1
INTRODUCTION

1.1 Introduction

Lending is the core business of banks, and the main portion of a bank's profit comes directly from the loans it gives. Although a bank approves a loan only after a rigorous process of verification and validation, there is still no guarantee that the chosen applicant is the right one. Carrying out this process manually also takes a lot of time. Using machine learning techniques, we can predict whether a particular applicant is safe or not, and the whole validation process can be automated. Loan prediction is therefore helpful both for bank staff and for the applicants.

1.2 Scope

The Scope of the project includes:

• Assists the lender in analyzing the situation.

• Gives better service to users.

• Reduces the risk by choosing the right person.

• Saves time and money for the lender.

1.3 Objective

The purpose of developing the Loan Prediction system is to automate the traditional, manual way of predicting loan approval. The main objective of this project is to predict whether granting a loan to a particular person will be safe or not.
CHAPTER – 2
LITERATURE SURVEY

The Loan Prediction Using Machine Learning: Ashlesha Vaidya; Pidikiti Supriya, Myneedi Pavani, Nagarapu Saisushma
Ashlesha Vaidya used logistic regression as a probabilistic and predictive approach to loan approval prediction. The author pointed out that artificial neural networks and logistic regression are the most commonly used techniques for loan prediction. Logistic regression still has its limitations: it requires a large sample of data for parameter estimation, and it requires the variables to be independent of each other; otherwise the model tends to overweight the importance of the mutually dependent variables.
Pidikiti Supriya, Myneedi Pavani and Nagarapu Saisushma covered the following modules in their paper: data collection and preprocessing, application of machine learning models, and training and testing. Outlier detection and removal, as well as imputation of missing values, were done during the preprocessing stage. To predict the current status of a loan application, SVM, DT, KNN and gradient boosting models were used. The dataset was divided into training and testing sets using an 80:20 split. The experiments concluded that the Decision Tree achieved significantly higher loan prediction accuracy than the other models.
CHAPTER–3
SYSTEM STUDY AND ANALYSIS

3.1 Problem Statement

Loan prediction is a classification problem in which we need to classify whether a loan will be approved or not. Classification refers to a predictive modelling problem where a class label is predicted for a given example of input data. We therefore took up Loan Prediction as our project, to help predict the loan status correctly.

3.2 Existing System

In the existing system, bank employees check the details of each applicant manually and give the loan to eligible applicants. Checking the details of all applicants takes a lot of time.

3.3 Limitations of the Existing System

Time consuming.
Less efficient.
More manual work required.
Human errors may occur because all details are checked manually.
A loan may be assigned to an ineligible applicant.

3.4 Proposed System

To deal with this problem, we developed an automatic loan prediction system using a machine learning technique, namely the Decision Tree. We train the model on a dataset of previous loan applications so that it can learn the patterns in the approval process. The trained model then checks whether a new applicant is eligible and gives us the result.

3.5 Advantages of Proposed System
• The time period for loan sanctioning will be reduced.
• More accurate than manual checking.
• The whole process will be automated, so human error will be reduced.
• Eligible applicants will be sanctioned loans without delay.

3.6 Functional Requirements

In software engineering, a functional requirement defines a function of a software system or its component. A function is described as a set of inputs, the behaviour, and outputs. Functional requirements may be calculations, technical details, data manipulation and processing, and other specific functionality that define what a system is supposed to accomplish. Behavioural requirements describing all the cases where the system uses the functional requirements are captured in use cases. Generally, functional requirements are expressed in the form “system shall do <requirement>”. The plan for implementing functional requirements is detailed in the system design. In requirements engineering, functional requirements specify particular results of a system, and they drive the application architecture of the system. A requirements analyst generates use cases after gathering and validating a set of functional requirements. The hierarchy of functional requirements is: user/stakeholder request -> feature -> use case. The functional requirements of this project, which provide the information to the user, are:
• The system should be able to load the training and test datasets.
• The system should be able to load the essential Python libraries.
• The loan status should be predicted.
• The predicted status should be accurate.

3.7 Non-Functional Requirements
In systems engineering and requirements engineering, a non-functional requirement is a requirement that specifies criteria that can be used to judge the operation of a system, rather than specific behaviours. Non-functional requirements include quantitative constraints, such as response time or accuracy.

Accuracy and Precision
The system should make accurate and precise predictions, to avoid wrong loan decisions.

Reliability
The system is built on Python and its widely used, well-tested libraries, which contributes to its reliability.

Security
The system should be secure and should protect the applicants' privacy.

Performance
The performance characteristics of the system are outlined here:
• Response time (average, maximum)
• Throughput (predictions made per second)
• Accuracy
• Resource utilization (memory, disk)

Maintainability
The maintenance group should be able to fix any problem that occurs suddenly.
3.8 System Requirements
3.8.1 Software requirements

• Operating System : Windows 7 and above

• Coding Language : Python

• Technology : Machine Learning

• Algorithm : Decision Tree

3.8.2 Hardware Requirements


• System : Computer

• Hard Disk : 256 GB

• RAM : 8 GB

• Processor : i3 or above
CHAPTER – 4
SYSTEM DESIGN

4.1 System Design


In loan prediction using machine learning, the system is designed to use the knowledge learned from past data to better process similar input in the future. An ML algorithm is one that can learn from experience (observed examples) with respect to some class of tasks and a performance measure. Classification, which is also referred to as a pattern recognition technique, is a crucial task by which machines “learn” to automatically recognize complex patterns, to differentiate between exemplars based on their different patterns, and to make intelligent decisions that give the proper output with maximum accuracy. This design approach includes the following steps:

1. Data Sets:
The dataset collected for predicting loan defaulters is split into a training set and a testing set. Generally, an 80:20 ratio is used to split the data into the training set and the testing set. The Decision Tree model is built on the training set, and its accuracy is then measured by predicting the labels of the test set.
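A minimal sketch of the 80:20 split described above, using scikit-learn's `train_test_split`. The ten-row table and its values are hypothetical stand-ins for the loan dataset, chosen only so that the split sizes are easy to check.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the loan dataset (10 applicants)
df = pd.DataFrame({
    "Credit_History": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "LoanAmount": [120, 90, 150, 100, 200, 130, 110, 95, 160, 140],
    "Loan_Status": [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})
X = df.drop("Loan_Status", axis=1)  # input features
y = df["Loan_Status"]               # target label

# test_size=0.20 keeps 20% of the rows for testing, i.e. an 80:20 split;
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

print(len(X_train), len(X_test))
```

With ten rows, `test_size=0.20` leaves 8 rows for training and 2 for testing, and no row appears in both sets.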

2. Data Preprocessing:
The collected data may contain missing values that can lead to inconsistency. To get better results, the data needs to be preprocessed, which improves the effectiveness of the algorithm. We should remove the outliers and convert the variables into a suitable (numeric) form. To detect these issues, we use chart (plotting) functions.
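The preprocessing described above can be sketched with pandas as follows. The toy rows are hypothetical, and the text does not say which outlier rule was used, so the 1.5 × IQR rule below is an assumption; filling with the mode or median mirrors what the project code does for its own columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with missing values and one extreme LoanAmount
df = pd.DataFrame({
    "Gender": ["Male", np.nan, "Female", "Male", "Male"],
    "LoanAmount": [120.0, np.nan, 100.0, 110.0, 700.0],
})

# 1) Fill missing values: mode for a categorical column, median for a numeric one
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

# 2) Drop rows whose LoanAmount lies outside 1.5 * IQR (assumed outlier rule)
q1, q3 = df["LoanAmount"].quantile([0.25, 0.75])
iqr = q3 - q1
inside = df["LoanAmount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[inside].copy()

# 3) Convert the categorical variable to numbers
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 2})
print(df)
```

Here the missing gender becomes the most frequent value, the missing amount becomes the median (115.0), and the 700.0 row is removed as an outlier.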

3. Correlating attributes:
Based on the correlation among attributes, it was observed which applicants are more likely to pay back their loans. The individually significant attributes include property area, education, loan amount and, lastly, credit history, which is intuitively considered important. The correlation among attributes can be identified using correlation plots (corrplot) and box plots in the Python platform [1].
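The report uses corrplot and box plots to inspect these relationships; the numbers behind such plots can also be computed directly with pandas. This is a sketch on hypothetical, already-encoded rows (the column names follow the loan dataset, the values are made up): it prints the approval rate for each credit-history group and the Pearson correlation of each attribute with the loan status.

```python
import pandas as pd

# Hypothetical encoded rows: Credit_History (1 = meets guidelines), Loan_Status (1 = approved)
df = pd.DataFrame({
    "Credit_History": [1, 1, 1, 1, 0, 0, 1, 0],
    "Education":      [1, 2, 1, 1, 2, 1, 2, 2],
    "Loan_Status":    [1, 1, 1, 0, 0, 0, 1, 0],
})

# Approval rate within each Credit_History group
approval_rate = df.groupby("Credit_History")["Loan_Status"].mean()
print(approval_rate)

# Pearson correlation of every attribute with the target
corr_with_target = df.corr()["Loan_Status"].drop("Loan_Status")
print(corr_with_target.sort_values(ascending=False))
```

In this toy data 80% of applicants with a good credit history are approved and none without it, while Education has zero correlation with the status. In a notebook, `sns.heatmap(df.corr(), annot=True)` would draw the same correlation matrix as a plot.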
4. Building the classification model using Decision tree algorithm:

For the problem of predicting loan defaulters and non-defaulters, the Decision Tree algorithm is used. It is effective because it gives good results on classification problems. It is intuitive, easy to implement and provides interpretable predictions. Unlike a random forest, a single decision tree does not produce an out-of-bag error estimate, so its error is estimated on the held-out test set. It is relatively easy to tune. The literature surveyed in Chapter 2 reports that it gives the highest accuracy for this problem.
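One way to tune the tree, sketched with scikit-learn's `GridSearchCV` (imported, but not used, in the project code). The synthetic data from `make_classification` and the grid of `max_depth` and `min_samples_leaf` values are assumptions for illustration, not settings taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the encoded loan table
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# Try a few tree depths and leaf sizes with 5-fold cross-validation on the training set
param_grid = {"max_depth": [2, 3, 4, 5, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

# The best combination is refitted on the whole training set and scored on the test set
print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```

Limiting the depth or requiring larger leaves usually gives a smaller, more stable tree, at the cost of some flexibility.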

5. Data Prediction:
Prediction refers to the output of an algorithm after it has been trained on a historical dataset and applied to new data, when forecasting the likelihood of a particular outcome.
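A minimal sketch of this step: a tree fitted on historical, already-encoded applications is applied to new applicants. The table, its values and the applicants are hypothetical; `predict` returns the most likely class and `predict_proba` the estimated likelihood of each class.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical historical applications (already numerically encoded)
history = pd.DataFrame({
    "Credit_History": [1, 1, 0, 0, 1, 0, 1, 0],
    "Married":        [1, 2, 1, 2, 1, 1, 2, 2],
    "Loan_Status":    [1, 1, 0, 0, 1, 0, 1, 0],
})
features = ["Credit_History", "Married"]

# Train on the historical data
model = DecisionTreeClassifier(random_state=0)
model.fit(history[features], history["Loan_Status"])

# New, unseen applicants with the same feature columns
new_applicants = pd.DataFrame({"Credit_History": [1, 0], "Married": [2, 1]})
print(model.predict(new_applicants))        # predicted status (1 = approve)
print(model.predict_proba(new_applicants))  # estimated likelihood of each class
```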

4.2 System Architecture:

4.3 Internal Architecture:

4.4 UML Diagrams


The Unified Modelling Language allows the software engineer to express an analysis model using a modelling notation that is governed by a set of syntactic, semantic and pragmatic rules. A UML system is represented using five different views that describe the system from distinctly different perspectives. Each view is defined by a set of diagrams.

4.4.1 Use Case Diagram

The use case diagram is dynamic in nature; there should be some internal or external factors for making the interactions. These internal and external agents are known as actors. Use case diagrams consist of actors, use cases and their relationships. The diagram is used to model the system/subsystem of an application. A single use case diagram captures a particular functionality of a system. Hence, to model the entire system, a number of use case diagrams are used.

Purpose of Use Case Diagrams

Use case diagrams are used to gather the requirements of a system, including internal and external influences. These requirements are mostly design requirements. Hence, when a system is analyzed to gather its functionalities, use cases are prepared and actors are identified. When the initial task is complete, use case diagrams are modelled to present the outside view.

In brief, the purposes of use case diagrams are as follows:

• Used to gather the requirements of a system.

• Used to get an outside view of a system.

• Identify the external and internal factors influencing the system.

• Show the interaction among the requirements and actors.

4.4.1.1 Use Case Diagram for Loan Prediction

4.4.2 Sequence Diagram

The sequence diagram represents the flow of messages in the system and is also termed an event diagram. It helps in envisioning several dynamic scenarios. It portrays the communication between any two lifelines as a time-ordered sequence of events in which these lifelines take part at run time. In UML, a lifeline is represented by a vertical dashed line, and each message is represented by a horizontal arrow between lifelines. It incorporates iterations as well as branching.

Purpose of a Sequence Diagram

1. To model high-level interaction among active objects within a system.

2. To model interaction among objects inside a collaboration realizing a use case.

3. To model either generic interactions or specific instances of interaction.

4.4.2.1 Sequence Diagram for Loan Prediction
Chapter-5
TECHNOLOGIES

5.1.1 About Python


Python is one of the most widely used multi-purpose, high-level programming languages.
Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together. Python's simple, easy to
learn syntax emphasizes readability and therefore reduces the cost of program maintenance.
Python supports modules and packages, which encourages program modularity and code reuse.
The Python interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed. The biggest strength of
Python is its huge collection of standard and third-party libraries, which can be used for the following:
• Machine Learning
• GUI Applications (like Kivy, Tkinter, PyQt, etc.)
• Web frameworks like Django (used by Instagram, among others)
• Image processing (like OpenCV, Pillow)
• Web scraping (like Scrapy, BeautifulSoup, Selenium)
• Test frameworks
• Multimedia

Advantages of Python:
Extensive Libraries: Python ships with an extensive standard library containing code for various purposes like regular expressions, documentation generation, unit testing, web browsers, threading, databases, CGI, email, image manipulation, and more. So, we don’t have to write the complete code for these tasks manually.

Extensible: Python can be extended with other languages. You can write some of your code in languages like C++ or C. This comes in handy, especially in performance-critical parts of a project.
Embeddable: Complementary to extensibility, Python is embeddable as well. You can put your Python code in the source code of a different language, like C++. This lets us add scripting capabilities to our code in the other language.

Improved Productivity: The language’s simplicity and extensive libraries make programmers more productive than languages like Java and C++ do. Also, you need to write less code to get more things done.

Simple and Easy: When working with Java, you may have to create a class to print ‘Hello World’. But in Python, just a print statement will do. It is also quite easy to learn, understand, and code. This is why people who pick up Python often have a hard time adjusting to other, more verbose languages like Java.

Readable: Because it is not a verbose language, reading Python is much like reading English. This is why it is so easy to learn, understand, and code. It also does not need curly braces to define blocks; instead, indentation is mandatory. This further aids the readability of the code.

Object-Oriented: This language supports both the procedural and object-oriented programming paradigms. While functions help us with code reusability, classes and objects let us model the real world. A class allows the encapsulation of data and functions into one.

Free and Open-Source: As mentioned earlier, Python is freely available. Not only can you download Python for free, but you can also download its source code, make changes to it, and even distribute it. It downloads with an extensive collection of libraries to help you with your tasks.

Portable: When you code your project in a language like C++, you may need to make some changes to it if you want to run it on another platform. This is not the case with Python: you need to code only once, and you can run it anywhere. This is called Write Once Run Anywhere (WORA). However, you need to be careful not to include any system-dependent features.

Interpreted: Lastly, Python is an interpreted language. Since statements are executed one by one, debugging can be easier than in compiled languages.

Disadvantages of Python:

Speed Limitations: We have seen that Python code is executed line by line. Because Python is interpreted, this often results in slow execution. This, however, isn’t a problem unless speed is a focal point for the project. In other words, unless high speed is a requirement, the benefits offered by Python outweigh its speed limitations.

Weak in Mobile Computing and Browsers: While it serves as an excellent server-side language, Python is rarely seen on the client side. Besides that, it is rarely ever used to implement smartphone-based applications. One such application is called Carbonnelle.

Design Restrictions: As you know, Python is dynamically typed. This means that you don’t need to declare the type of a variable while writing the code. It uses duck typing. But wait, what’s that? Well, it just means that if it looks like a duck, it must be a duck. While this is easy on programmers during coding, it can raise runtime errors.
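The duck-typing trade-off can be shown in a few lines of Python; the class and function names are made up for illustration. Any object with a `quack()` method is accepted, and a missing method is only detected when the call actually runs.

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # Not a Duck subclass, but it has a quack() method, so duck typing accepts it
    def quack(self) -> str:
        return "Beep quack!"


def make_it_quack(thing) -> str:
    # No type check: any object that has a quack() method works
    return thing.quack()


print(make_it_quack(Duck()))
print(make_it_quack(Robot()))

try:
    make_it_quack(42)  # int has no quack(); the error only appears at run time
except AttributeError as err:
    print("runtime error:", err)
```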

Underdeveloped Database Access Layers: Compared to more widely used technologies like JDBC (Java Database Connectivity) and ODBC (Open Database Connectivity), Python’s database access layers are a bit underdeveloped. Consequently, it is less often applied in huge enterprises.

5.1.2 Required Python Libraries:


1) NumPy
2) Pandas
3) Matplotlib
4) Seaborn
5) Scikit-learn
1) NumPy:

• NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
• NumPy is a Python library used for working with arrays. It also has functions for working in the domain of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely. It is the fundamental package for scientific computing with Python.
2) Pandas:
• Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.
• Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.
3) Matplotlib:
• Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.

• Matplotlib is an amazing visualization library in Python for 2D plots of arrays. Matplotlib is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. Matplotlib offers several kinds of plots, such as line, bar, scatter and histogram plots.
4) Seaborn:

Seaborn is a library for making statistical graphics in Python. It is built on top of Matplotlib and closely integrated with Pandas data structures. Here is some of the functionality that Seaborn offers:
• A dataset-oriented API for examining relationships between multiple variables
• Specialized support for using categorical variables to show observations or aggregate statistics
• A high-level interface, built on top of Matplotlib, for drawing attractive and informative statistical graphics
• Options for visualizing univariate or bivariate distributions and for comparing them between subsets of data
• Automatic estimation and plotting of linear regression models for different kinds of dependent variables
• Convenient views onto the overall structure of complex datasets
• High-level abstractions for structuring multi-plot grids that let you easily build complex visualizations

5) Scikit-Learn:
Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

5.2 Jupyter Notebook

• The Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and text. Jupyter Notebook is maintained by the people at Project Jupyter.
• Jupyter Notebooks are a spin-off project from the IPython project, which used to have an IPython Notebook project itself. The name Jupyter comes from the core programming languages it supports: Julia, Python, and R. Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but there are currently over 100 other kernels that you can also use.
5.3 Google Colab Notebook
• Google Colab notebooks let you explore and run machine learning code in a cloud computational environment that enables reproducible and collaborative analysis.
• One of the advantages of using notebooks as your data science workbench is that you can easily add data from publicly available datasets or upload your own. You can also use output files from another notebook as a data source.
Chapter 6
IMPLEMENTATION

6.1 Implementation Steps:

1) Import the necessary packages into the notebook.
2) Load the dataset required for the implementation.
3) Preprocess the loaded dataset.
4) Clean and filter the data according to the requirements.
5) Select features on the basis of their relation with loan approval.
6) Train the model using the Decision Tree algorithm.
7) Once the model is trained, input the test data and predict the output labels.
8) Calculate the accuracy of the model.

6.2 DECISION TREE ALGORITHM


The Decision Tree algorithm belongs to the family of supervised learning algorithms. The goal of using a Decision Tree is to build a model that can predict the class or value of the target variable by learning simple decision rules inferred from the training data. When using a Decision Tree to predict a class label for a record, we begin at the root of the tree. The value of the root attribute is compared with the record's value for that attribute. Based on the comparison, we follow the branch corresponding to that value and move to the next node. The internal nodes of a decision tree denote the different attributes, the branches between the nodes tell us the possible values that these attributes can have in the observed samples, and the terminal (leaf) nodes give us the final decision.
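The root-to-leaf procedure described above can be sketched in plain Python. The tree here is written by hand with hypothetical attributes, thresholds and labels; it is not the tree learned in this project. Binary threshold tests of the form `value <= threshold`, as used by scikit-learn's trees, are assumed.

```python
# A hand-built decision tree: internal nodes test one attribute, leaves hold a class label.
# The attributes, thresholds and labels are hypothetical, chosen only for illustration.
tree = {
    "attribute": "Credit_History", "threshold": 0.5,
    "left": {"label": "N"},                       # Credit_History <= 0.5
    "right": {                                    # Credit_History > 0.5
        "attribute": "LoanAmount", "threshold": 300,
        "left": {"label": "Y"},                   # LoanAmount <= 300
        "right": {"label": "N"},                  # LoanAmount > 300
    },
}


def predict(node: dict, record: dict) -> str:
    """Start at the root and follow branches until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]         # the record's value for this attribute
        if value <= node["threshold"]:            # compare it with the node's split value
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]                          # the leaf holds the final decision


print(predict(tree, {"Credit_History": 1, "LoanAmount": 150}))
print(predict(tree, {"Credit_History": 0, "LoanAmount": 150}))
```

The first applicant passes the credit-history test and asks for a modest amount, so the traversal ends at a "Y" leaf; the second fails at the root and ends at "N".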

Advantages:
• Compared to other algorithms, decision trees require less effort for data preparation during pre-processing.
• A decision tree does not require normalization of data.

• A decision tree does not require scaling of data either.

• Some decision tree implementations can handle missing values directly, so missing values tend to affect tree building less than they affect many other algorithms.

• A decision tree model is very intuitive and easy to explain to technical teams as well as stakeholders.
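The claim that a decision tree needs no scaling can be checked with a small experiment: splits depend only on the order of each feature's values, and multiplying a feature by a positive constant keeps that order. This sketch uses synthetic data (an assumption, not the project dataset) and scales by 1024, a power of two, so that floating-point rounding cannot blur the comparison.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic integer-valued features; the label depends on the first two features
X = rng.integers(0, 100, size=(200, 3)).astype(float)
y = (X[:, 0] + X[:, 1] > 100).astype(int)
X_new = rng.integers(0, 100, size=(50, 3)).astype(float)  # unseen records

# Rescale every feature by 1024 (exact in floating point)
scale = 1024.0
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X * scale, y)

# Both trees split on the same features; only the thresholds are scaled
same = np.array_equal(tree_raw.predict(X_new), tree_scaled.predict(X_new * scale))
print("identical predictions on unseen data:", same)
```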

Disadvantages:
• A small change in the data can cause a large change in the structure of the decision tree, causing instability.

• For a decision tree, the calculations can sometimes become far more complex than for other algorithms.

• A decision tree can sometimes take more time to train.

• Decision tree training is relatively expensive, as the complexity and time taken are higher.

• A single decision tree is less suited to regression and predicting continuous values, because it predicts piecewise-constant values.

6.3 Code

1. Import the libraries and load the dataset

First, we import all the modules needed to train our model.
# Core data handling and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter/Colab magic command: display plots inside the notebook
%matplotlib inline
# scikit-learn: data splitting, model tuning, the decision tree model and metrics
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score
from sklearn import metrics

Loading Dataset
df_train = pd.read_csv('/content/train_u6lujuX_CVtuZ9i (1).csv')
df_test = pd.read_csv('/content/test_Y3wMUE5_7gLdaTN.csv')

2. Preprocess the data


# Inspect the size and the first rows of the data (shown as notebook output)
df_train.shape
df_test.shape
df_train.head()
# Count and percentage of missing values in each column
total = df_train.isnull().sum().sort_values(ascending=False)
percent = (df_train.isnull().sum()/df_train.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data.head(20)
# Fill missing categorical values with the most frequent value (mode)
# and missing LoanAmount values with the median
df_train['Gender'] = df_train['Gender'].fillna(
    df_train['Gender'].dropna().mode().values[0])
df_train['Married'] = df_train['Married'].fillna(
    df_train['Married'].dropna().mode().values[0])
df_train['Dependents'] = df_train['Dependents'].fillna(
    df_train['Dependents'].dropna().mode().values[0])
df_train['Self_Employed'] = df_train['Self_Employed'].fillna(
    df_train['Self_Employed'].dropna().mode().values[0])
df_train['LoanAmount'] = df_train['LoanAmount'].fillna(
    df_train['LoanAmount'].dropna().median())
df_train['Loan_Amount_Term'] = df_train['Loan_Amount_Term'].fillna(
    df_train['Loan_Amount_Term'].dropna().mode().values[0])
df_train['Credit_History'] = df_train['Credit_History'].fillna(
    df_train['Credit_History'].dropna().mode().values[0])
# Encode the categorical values as numbers (the Y/N loan status becomes 1/0)
code_numeric = {'Male': 1, 'Female': 2, 'Yes': 1, 'No': 2,
                'Graduate': 1, 'Not Graduate': 2,
                'Urban': 3, 'Semiurban': 2, 'Rural': 1,
                'Y': 1, 'N': 0, '3+': 3}
df_train = df_train.replace(code_numeric)
df_test = df_test.replace(code_numeric)
# Loan_ID is only an identifier, so it is not used as a feature
df_train.drop('Loan_ID', axis=1, inplace=True)
# Make every remaining column numeric ('Dependents' still holds strings such as '0')
df_train = df_train.apply(pd.to_numeric)

3. Select the features and the target
# Assumption: the target column is named 'Loan_Status' (Y/N, encoded above as 1/0)
X = df_train.drop('Loan_Status', axis=1)
y = df_train['Loan_Status']

4. Train the model
# Hold out 20% of the rows for testing (the 80:20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
ypred_tree = tree.predict(X_test)
print("The model has successfully trained")

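5. Evaluating the model

# F1 score for the positive class (approved loans, label 1)
f1 = f1_score(y_test, ypred_tree)
# Accuracy: the fraction of test applicants whose loan status was predicted correctly
accuracy = metrics.accuracy_score(y_test, ypred_tree)
print("F1 score:", f1)
print("Accuracy:", accuracy)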
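The evaluation step reports an F1 score, which is not the same quantity as accuracy. A worked example with hypothetical labels: 6 of the 8 predictions are correct, so accuracy is 6/8 = 0.75; for the approved class there are 4 true positives, 1 false positive and 1 false negative, so precision = recall = 4/5 and F1 = 0.8.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true and predicted loan statuses (1 = approved, 0 = rejected)
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)  # correct predictions / all predictions
f1 = f1_score(y_true, y_pred)         # harmonic mean of precision and recall for class 1
print(acc, f1)
```

Reporting both numbers, and naming each one correctly, avoids presenting an F1 score as accuracy.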
CHAPTER – 7
TESTING

7.1 INTRODUCTION TO TESTING


Testing is a process that reveals errors in a program. It is the major quality measure employed during software development. During testing, the program is executed with a set of test cases, and the output of the program for the test cases is evaluated to determine whether the program is performing as expected.

7.2 Types of Testing

7.2.1 Unit Testing


Unit testing is done on individual modules as they are completed and become executable. It is confined only to the designer's requirements. Each module can be tested using the following two strategies:

7.2.1.1 Black Box Testing


In this strategy, test cases are generated as input conditions that fully exercise all
functional requirements of the program. This testing is used to find errors in the
following categories:
• Incorrect or missing functions
• Interface errors
• Errors in data structures or external database access
• Performance errors
• Initialization and termination errors
In this testing, only the output is checked for correctness; the logical flow of the data is not
checked.
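As a sketch of black-box unit testing in this project, the categorical encoding step can be tested purely through its inputs and expected outputs. The function name encode, the CASES table and the test function are hypothetical; the dictionary is a copy of code_numeric from the preprocessing step. The test targets the "incorrect or missing functions" category.

```python
# Copy of the code_numeric dictionary from the preprocessing step.
CODE_NUMERIC = {'Male': 1, 'Female': 2, 'Yes': 1, 'No': 2,
                'Graduate': 1, 'Not Graduate': 2, 'Urban': 3, 'Semiurban': 2,
                'Rural': 1, 'Y': 1, 'N': 0, '3+': 3}


def encode(value):
    """Hypothetical named version of the lambda passed to applymap."""
    return CODE_NUMERIC.get(value) if value in CODE_NUMERIC else value


# Black-box cases: only inputs and expected outputs, no knowledge of internals.
CASES = [('Male', 1), ('Female', 2), ('Graduate', 1), ('Not Graduate', 2),
         ('Semiurban', 2), ('Rural', 1), ('Y', 1), ('N', 0), ('3+', 3)]


def test_encode_known_categories():
    for raw, expected in CASES:
        assert encode(raw) == expected, f"{raw!r} encoded incorrectly"


test_encode_known_categories()
print("all black-box cases passed")
```

Because the test function follows the test_ naming convention, the same file can also be collected by pytest.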

7.2.1.2 White Box Testing


In this strategy, test cases are generated from the logic of each module by drawing flow graphs of that
module, and every logical decision is tested in all cases. It is used to generate test
cases for the following purposes:
• Guarantee that all independent paths have been executed at least once.
• Exercise all logical decisions on their true and false sides.
• Execute all loops at their boundaries and within their operational bounds.
• Exercise internal data structures to ensure their validity.
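A white-box sketch for the same encoding step, written from its logic rather than its specification: the conditional has a true side (value present in the dictionary) and a false side (value passed through unchanged), and a hypothetical loop version, encode_column, is run at zero, one and many iterations. The names encode and encode_column are illustrative assumptions, not functions from the project.

```python
import math

# Copy of the code_numeric dictionary from the preprocessing step.
CODE_NUMERIC = {'Male': 1, 'Female': 2, 'Yes': 1, 'No': 2,
                'Graduate': 1, 'Not Graduate': 2, 'Urban': 3, 'Semiurban': 2,
                'Rural': 1, 'Y': 1, 'N': 0, '3+': 3}


def encode(value):
    # True side: known category -> numeric code; false side: value unchanged
    return CODE_NUMERIC.get(value) if value in CODE_NUMERIC else value


def encode_column(values):
    """Hypothetical loop version of applymap, applied to a single column."""
    return [encode(v) for v in values]


# True side of the decision
assert encode('Urban') == 3
# False side: values missing from the dictionary pass through unchanged
assert encode(5000) == 5000
assert encode('0') == '0'        # string digits such as '0' are not dictionary keys
assert math.isnan(encode(float('nan')))  # NaN is hashable, so the membership test is safe
# Loop boundaries: zero, one and many iterations
assert encode_column([]) == []
assert encode_column(['Rural']) == [1]
assert encode_column(['Y', 'N', 'Y']) == [1, 0, 1]
print("all white-box cases passed")
```

The '0' case shows why path coverage matters: only '3+' is converted by the dictionary, so any other Dependents value stays a string after this step.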

7.2.2 Integration Testing


Integration testing ensures that software components and subsystems work together as a whole. It tests the
interfaces between all the modules to make sure that they behave properly when integrated
together. In this project, that means checking that the data produced by the preprocessing step can be consumed by the model-training and evaluation steps.
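A minimal integration-test sketch, assuming pandas and scikit-learn are installed. The functions preprocess and train are hypothetical stand-ins for the project's preprocessing and training steps, the four-row data set is invented, and CODE_NUMERIC is only a subset of the project's code_numeric dictionary. Per-column Series.map is used in place of applymap. The checks target the interface between the two modules: no missing values or strings should reach the model, and every prediction should be a valid label.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Subset of the project's encoding dictionary (assumption for this sketch)
CODE_NUMERIC = {'Male': 1, 'Female': 2, 'Yes': 1, 'No': 2, 'Y': 1, 'N': 0}


def preprocess(df):
    """Hypothetical preprocessing module: fill gaps with the mode, then encode."""
    df = df.copy()
    for col in df.columns:
        df[col] = df[col].fillna(df[col].mode()[0])
        df[col] = df[col].map(lambda s: CODE_NUMERIC.get(s, s))
    return df


def train(df, target='Loan_Status'):
    """Hypothetical training module: fit a decision tree on the preprocessed frame."""
    X = df.drop(target, axis=1)
    y = df[target]
    return DecisionTreeClassifier(random_state=0).fit(X, y)


# Invented rows; column names follow the loan data set used in the project
raw = pd.DataFrame({
    'Gender': ['Male', 'Female', None, 'Male'],
    'Married': ['Yes', 'No', 'Yes', None],
    'Credit_History': [1.0, 0.0, 1.0, None],
    'Loan_Status': ['Y', 'N', 'Y', 'N'],
})

clean = preprocess(raw)
model = train(clean)
preds = model.predict(clean.drop('Loan_Status', axis=1))

# Integration checks on the interface between preprocessing and training
assert clean.isnull().sum().sum() == 0
assert all(pd.api.types.is_numeric_dtype(t) for t in clean.dtypes)
assert set(preds) <= {0, 1}
print(list(preds))
```

If the preprocessing module ever let a string or NaN through, the fit call would fail here rather than later in the full pipeline.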

7.2.3 Functional Testing


Functional tests provide systematic demonstrations that the functions tested are available as
specified by the business and technical requirements, system documentation, and user
manuals. Functional testing is centered on the following items:
• Valid Input: identified classes of valid input must be accepted.
• Invalid Input: identified classes of invalid input must be rejected.
• Functions: identified functions must be exercised.
• Output: identified classes of application outputs must be exercised.
• Systems/Procedures: interfacing systems or procedures must be invoked.
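The valid-input and invalid-input items can be sketched as a record validator plus two functional checks. The field names come from the project's data set, but validate_applicant, the allowed values and the "positive number" rule are assumptions made for illustration only.

```python
import math

# Assumed validation rules; field names are from the data set, limits are hypothetical.
ALLOWED_VALUES = {
    'Self_Employed': {'Yes', 'No'},
    'Credit_History': {0.0, 1.0},
}
POSITIVE_FIELDS = ('LoanAmount', 'Loan_Amount_Term')


def validate_applicant(record):
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if record.get(field) not in allowed:
            problems.append(f"{field}: unexpected value {record.get(field)!r}")
    for field in POSITIVE_FIELDS:
        value = record.get(field)
        # Reject missing, non-numeric, NaN, infinite and non-positive values
        if not isinstance(value, (int, float)) or not math.isfinite(value) or value <= 0:
            problems.append(f"{field}: must be a positive number, got {value!r}")
    return problems


valid = {'Self_Employed': 'No', 'Credit_History': 1.0,
         'LoanAmount': 120, 'Loan_Amount_Term': 360}
invalid = dict(valid, Self_Employed='Maybe', LoanAmount=-5)

print(validate_applicant(valid))         # []
print(len(validate_applicant(invalid)))  # 2
```

Returning a list of problems instead of a single True/False lets a functional test assert on exactly which class of invalid input was rejected.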

7.2.4 System Testing


System testing involves in-house testing of the entire system before delivery to the user. Its aim is to
assure the user that the system meets all the requirements of the client's specifications.

7.2.5 Acceptance Testing

Acceptance testing is pre-delivery testing in which the entire system is tested on real-world data and
usage to find errors.

7.3 Test Strategy and Design


A test strategy is designed for all the different types of functionality and hardware by
determining the effort and cost required to achieve the objectives of the system. For any
project, the test strategy can be prepared by:
• Defining the scope of the testing.
• Identifying the type of testing required.
• Identifying risks and issues.
• Creating test logistics.
7.4 Test Objectives
The test objective is the overall goal of test execution. Objectives are
defined so that the system is bug-free and ready for use by the end users. A test
objective can be defined by identifying the software features that need to be tested and the
goal each feature must achieve for the test to be considered successful.

7.5 Features to be Tested


• Loan-eligible applicants should be predicted.
• Predictions should be accurate.
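Both features can be expressed as small checks on the model's output. In this sketch, split_by_eligibility and meets_accuracy_target are hypothetical helpers, the applicant IDs and labels are made up, and the 0.75 target is an arbitrary example value, not a figure from this project.

```python
def split_by_eligibility(loan_ids, predictions):
    """Group applicant IDs by predicted label (1 = eligible, 0 = not eligible)."""
    eligible = [i for i, p in zip(loan_ids, predictions) if p == 1]
    not_eligible = [i for i, p in zip(loan_ids, predictions) if p == 0]
    return eligible, not_eligible


def meets_accuracy_target(y_true, y_pred, target):
    """True if the share of correct predictions reaches the chosen target."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) >= target


# Feature 1: every applicant is predicted either eligible or not eligible
ids = ['A1', 'A2', 'A3', 'A4']   # made-up applicant IDs
preds = [1, 0, 1, 1]             # made-up model output
eligible, not_eligible = split_by_eligibility(ids, preds)
assert sorted(eligible + not_eligible) == sorted(ids)

# Feature 2: predictions reach the example accuracy target (3 of 4 correct)
y_true = [1, 0, 0, 1]
assert meets_accuracy_target(y_true, preds, target=0.75)
print(eligible, not_eligible)  # ['A1', 'A3', 'A4'] ['A2']
```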

CHAPTER 8
OUTPUTS

Checking Null Values

Filling the null values

Plotting all graphs

Accuracy

Loan-eligible and not-eligible candidates (CSV)

CHAPTER 9
CONCLUSION AND FURTHER WORK

Conclusion
Loan companies grant loans after a thorough verification and validation
process. However, they cannot know with absolute certainty whether an applicant will be able
to repay the loan without difficulty. We therefore introduced a loan prediction system that
allows them to choose the most deserving applicants quickly, easily, and efficiently, which
benefits the bank directly. In this project we have reviewed the process of building
a loan approval prediction system. Data collection, exploratory data analysis, data
preprocessing, model building, and model testing are the analytical processes involved in
building this system.

Further Work
In future work, applicants' age, past health records, and type of occupation
could be used to evaluate the uncertainty of debt repayment, and possible defaults on
corporate loans to businesses and startups could be forecast. Another method could also be
developed to forecast defaulters on different types of loans.

