Credit Card Frauds


A Project Report

On

CREDIT CARD FRAUD DETECTION USING ISOLATION FOREST ALGORITHM
Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR, ANANTHAPURAMU
In Partial Fulfillment of the Requirements for the Award of the Degree of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE & ENGINEERING
Submitted By
M. KEERTHI - (17699A0527)
M. SATISH - (17699A0547)
C. SREE HARITHA - (17699A0551)
V. THULASIRAM - (17699A0557)
Under the Guidance of
Dr. V. Arun, Ph.D, Associate Professor
Department of Computer Science & Engineering

MADANAPALLE INSTITUTE OF TECHNOLOGY & SCIENCE


(UGC – AUTONOMOUS)
(Affiliated to JNTUA, Ananthapuramu)
(Accredited by NBA, Approved by AICTE, New Delhi)
AN ISO 9001:2008 Certified Institution
P. B. No: 14, Angallu, Madanapalle – 517325
2020-2021


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that the project work entitled “CREDIT CARD FRAUD
DETECTION USING ISOLATION FOREST ALGORITHM” is a bonafide work
carried out by

M. KEERTHI - (17699A0527)
M. SATISH - (17699A0547)
C. SREE HARITHA - (17699A0551)
V. THULASIRAM - (17699A0557)

Submitted in partial fulfillment of the requirements for the award of degree Bachelor of
Technology in the stream of Computer Science & Engineering in Madanapalle
Institute of Technology and Science, Madanapalle, affiliated to Jawaharlal Nehru
Technological University Anantapur, Ananthapuramu during the academic year
2020-2021

Guide                          Head of the Department


Dr. V. Arun, Ph.D              Dr. R. Kalpana
Associate Professor,           Professor and Head,
Department of CSE              Department of CSE

Submitted for the University examination held on:

Internal Examiner External Examiner


Date: Date:

ACKNOWLEDGEMENT

We sincerely thank the MANAGEMENT of Madanapalle Institute of Technology and Science
for providing excellent infrastructure and lab facilities that helped us to complete this
project.

We sincerely thank Dr. C. Yuvaraj, M.E., Ph.D., Principal, for guiding and providing
facilities for the successful completion of our project at Madanapalle Institute of
Technology and Science, Madanapalle.

We express our deep sense of gratitude to Dr. R. Kalpana, M. Tech., Ph.D., Professor and
Head of the Department of CSE, for his continuous support in making necessary
arrangements for the successful completion of the Project.

We express our sincere thanks to the Internship Coordinator, Dr. R. Anand Kumar,
M. Tech., Ph.D., for his tremendous support for the successful completion of the Project.

We express our deep sense of gratitude to Dr. V. Arun, Ph.D., Project Coordinator, for
his guidance and encouragement that helped us to complete this project.

We express our deep gratitude to our guide Dr. V. Arun, Ph.D., Associate Professor,
Department of CSE, for his guidance and encouragement that helped us to complete this
project.

We also wish to place on record our gratefulness to the other faculty of the CSE
Department, and also to our friends and our parents, for their help and cooperation
during our project work.

RECOGNISED RESEARCH CENTER

Plagiarism Verification Certificate

This is to certify that the B. Tech Project report titled “CREDIT CARD FRAUD
DETECTION USING ISOLATION FOREST ALGORITHM”, submitted by MALLELA KEERTHI
(REGD. NO: 17699A0527), MENTA SATISH (REGD. NO: 17699A0547), CHEEDELLA SREE
HARITHA (REGD. NO: 17699A0551) and VEERAMANGALAM THULASI RAM (REGD. NO:
17699A0557), has been evaluated using the anti-plagiarism software URKUND, and based
on the analysis report generated by the software, the report’s similarity index is found
to be 20%.

The following is the URKUND report for the project report consisting of 38 pages.

Dean RRC

DECLARATION

We hereby declare that the results embodied in this project, “CREDIT CARD FRAUD
DETECTION USING ISOLATION FOREST ALGORITHM”, were obtained by us under the
guidance of Dr. V. Arun, Ph.D, Associate Professor, Dept. of CSE, in partial fulfillment
of the requirements for the award of Bachelor of Technology in Computer Science &
Engineering from Jawaharlal Nehru Technological University Anantapur, Ananthapuramu,
and that we have not submitted the same to any other University/institute for the award
of any other degree.

Date :
Place :

PROJECT ASSOCIATES
M. KEERTHI
M. SATISH
C. SREE HARITHA
V. THULASIRAM

I certify that above statement made by the students is correct to the best of my
knowledge.

Date : Guide :

INDEX

S.NO TOPIC

1. INTRODUCTION
1.1 Motivation
1.2 Problem Definition
1.3 Objective of the Project
1.4 Organization of Documentation

2. LITERATURE SURVEY
2.1 Introduction
2.2 Existing System
2.3 Disadvantages of Existing System
2.4 Proposed System
2.5 Advantages over Existing System

3. ANALYSIS
3.1 Introduction
3.2 Software Requirement Specification
3.3 Content diagram of Project

4. DESIGN
4.1 Introduction
4.2 ER/UML Diagrams
4.3 Module Design and Organization
4.4 Conclusion

5. IMPLEMENTATION AND RESULTS
5.1 Introduction
5.2 Implementation of key functions
5.3 Method of Implementation
5.4 Output Screens and Result Analysis
5.5 Conclusion

6. TESTING AND VALIDATION
6.1 Introduction
6.2 Design of Test cases and Scenarios
6.3 Validation
6.4 Conclusion

7. CONCLUSION
7.1 Conclusion

8. REFERENCES

List of Figures/Screens/Tables

S.NO   Figure   Name of the figure

1      3.3.1    Content Diagram
2      4.2.1    Use Case Diagram
3      4.2.2    Class Diagram
4      4.2.3    Sequence Diagram
5      4.2.4    Collaboration Diagram
6      4.3.1    Organization of project
7      4.3.2    Isolation Forest Algorithm
8      5.3.1    Correlation matrix
9      5.4.1    Output Screen
ABSTRACT

The project is mainly focused on credit card fraud detection in the real world. The
increase in the number of transactions, and in the number of people making them, has led
to many fraudulent acts. The motive of people who commit these frauds is to obtain goods
without paying, or to obtain funds from an account without the permission of the account
holder. Implementing fraud detection systems has become important for every credit card
issuing bank in order to reduce its losses and its risk. The most challenging situation in
business is that neither the card nor the card holder is present at the time of purchase.
This makes it impractical for the merchant to confirm whether the customer is the original
card holder or not. Many researchers have proposed algorithms to detect fraud and reduce
losses. In this paper a comparison of the Local Outlier Factor and Isolation Forest
algorithms is presented with clear experimental results and methodologies. After complete
analysis we obtained an accuracy of 97% for the Local Outlier Factor and 76% for the
Isolation Forest algorithm.

CHAPTER-1
INTRODUCTION

1.1 MOTIVATION

Long queues are common in grocery stores, and at the time of shopping customers face
many issues such as insufficient balance and loss of cash; they also have to choose the
best product among all the products on offer. Credit cards were introduced to make this
process easier, and they help a great deal in making our work fast. They also carry a
disadvantage, as more and more fraud is being committed with these cards.

1.2 PROBLEM DEFINITION

Credit card fraud detection is hard, but it is very useful for preventing or reducing
heavy losses. Fraud detection algorithms can be written using Artificial Intelligence,
Fuzzy Logic, Data Mining and Machine Learning; among these approaches we consider
Machine Learning the best suited for implementing credit card fraud detection, so our
proposed system uses Machine Learning. A large amount of data is transferred during
online transactions. Credit cards are used mostly for shopping in everyday life, and with
the vast usage of credit cards, credit card fraud is also increasing. Hence this theft is
one of the biggest risks to customers as well as shopkeepers, and above all it causes
problems for banks. These issues have become common nowadays because the card holder
learns that he has lost his money only after the transaction, when the knowledge is of no
use. In this paper we show that there are algorithms which can be used to reduce these
types of fraud, and we compare them using performance metrics such as F1 score, recall,
accuracy and precision. These metrics help us to understand the differences between the
algorithms, so that an algorithm suited to the situation can be chosen and implemented
smoothly. Credit card fraud occurs mainly in online shopping, because online buying
requires very little information; by using the Local Outlier Factor and Isolation Forest
algorithms we can reduce these types of fraud.

1.3 OBJECTIVES OF PROJECT

The credit card fraud detection problem involves modelling past credit card
transactions with the knowledge of the ones that turned out to be fraud. This model is
then used to identify whether a new transaction is fraudulent or not. Our aim here is to
detect fraudulent transactions while minimising incorrect fraud classifications.

1.4 ORGANISATION OF DOCUMENTATION

1.4.1 Feasibility Study

Preliminary investigation examines project feasibility, that is, the likelihood that the
system will be useful to the organization. The main objective of the feasibility study is
to test the technical, operational and economic feasibility of adding new modules and
debugging the old running system. All systems are feasible if they are given unlimited
resources and infinite time. The feasibility study portion of the preliminary investigation
covers the following aspects:
• Technical Feasibility
• Operation Feasibility
• Economic Feasibility

Technical Feasibility

The technical issues usually raised during the feasibility stage of the investigation
include the following.

Operation Feasibility

Operational feasibility includes user friendliness, reliability, security, portability,
availability and maintainability of the software used in the project.

Economic Feasibility

Analysis of a project's costs and revenue in an effort to determine whether or not it is
logical and possible to complete.

CHAPTER-2
LITERATURE SURVEY

2.1 INTRODUCTION
Credit card fraud detection can draw on many machine learning algorithms, which are of
two kinds: classification algorithms and regression algorithms. An algorithm is chosen
according to the problem we are facing, and each algorithm has performance metrics that
tell us which one is best. As described below, several algorithms already exist, each with
certain drawbacks.

2.2 EXISTING SYSTEM


Since credit card fraud detection is a highly researched field, there are many different
algorithms and techniques for performing credit card fraud detection. One of the earliest
systems is a CCFD system using a Markov model. Other existing algorithms used in credit
card fraud detection systems include the cost-sensitive decision tree (CSDT), support
vector machine (SVM), random forest, and so on. Credit card fraud detection (CCFD) has
also been proposed using neural networks. The existing credit card fraud detection system
using a neural network follows the whale swarm optimization algorithm to obtain an initial
value. It then uses a BP network to rectify the values which are found to be in error.
These techniques have some serious drawbacks, such as decreasing accuracy levels, lack of
efficiency, and sometimes classifying normal transactions as fraudulent transactions and
vice versa. These drawbacks are overcome in this credit card fraud detection system using
the whale algorithm and the SMOTE technique.

2.3 DISADVANTAGES OF EXISTING SYSTEM

1. In this paper a new comparison measure that reasonably represents the gains and
losses due to fraud detection is proposed.
2. A cost-sensitive method which is based on Bayes minimum risk is presented using the
proposed cost measure.

2.4 PROPOSED SYSTEM

In this proposed project we designed a protocol or a model to detect fraudulent activity
in credit card transactions. This system is capable of providing most of the essential
features required to distinguish fraudulent from legitimate transactions. As technology
changes, it becomes difficult to track the behaviour and pattern of fraudulent
transactions. With the rise of machine learning, artificial intelligence and other related
fields of information technology, it becomes feasible to automate this process and to save
some of the intensive amount of labour that is put into detecting credit card fraud.

2.5 ADVANTAGES OF PROPOSED SYSTEM

1. Isolation Forest ranks the importance of variables in a regression or classification
problem in a natural way.
2. The 'Amount' feature is the transaction amount. The 'Class' feature is the target for
the binary classification; it takes the value 1 for the positive case (fraud) and 0 for
the negative case (not fraud).

CHAPTER-3

ANALYSIS

3.1 INTRODUCTION
Credit card fraud detection is hard, but it is very useful for preventing or reducing
heavy losses. Fraud detection algorithms can be written using Artificial Intelligence,
Fuzzy Logic, Data Mining and Machine Learning; among these approaches we consider
Machine Learning the best suited for credit card fraud detection, so our proposed system
uses Machine Learning. A large amount of data is transferred during online transactions.
Credit cards are used mostly for shopping in everyday life, and with the heavy use of
credit cards, credit card fraud is also increasing. This theft is therefore one of the
biggest threats to customers as well as shopkeepers, and above all it causes problems for
banks. These issues have become common nowadays because the card holder realises that
he has lost his money only after the transaction, when the knowledge is of no use. In this
paper we show that there are algorithms which can be used to reduce these kinds of fraud,
and we compare them using performance metrics such as F1 score, recall, accuracy and
precision. These metrics help us to understand the differences between the algorithms, so
that an algorithm suited to the situation can be implemented smoothly. Credit card fraud
occurs mainly in online shopping, because online buying requires very little information;
by using the Local Outlier Factor and Isolation Forest algorithms we can reduce these
kinds of fraud.

3.2 HARDWARE AND SOFTWARE DESCRIPTION


The requirements specification is a technical specification of requirements for the
software product. It is the first step in the requirements analysis process; it lists the
requirements of a particular software system, including functional, performance and
security requirements. The purpose of the software requirements specification is to
provide a detailed overview of the software project, its parameters and goals.

HARDWARE REQUIREMENTS

• Processor - 64 bit
• RAM-8GB
• Hard Disk - 1TB
• OS - Windows 10
• Key Board - Standard Windows Keyboard

Hardware Description

Random-access memory: A random-access memory (RAM) device allows data items to be
read or written in almost the same amount of time irrespective of the physical location of
the data inside the memory. In contrast, with other direct-access data storage media such
as hard disks, CD-RWs, DVD-RWs and the older magnetic tapes and drum memory, the time
required to read and write data items varies significantly depending on their physical
location on the recording medium, because of mechanical limitations such as media
rotation speeds and arm movement.

Read-only memory: Read-only memory (ROM) is a type of non-volatile memory used in
computers and other electronic devices.

64-bit processor: A microprocessor with a word size of 64 bits is used for this project.

SOFTWARE REQUIREMENTS
• Jupyter Notebook
• Anaconda
• Scikit-learn
• SciPy
• Packages:
  • Pandas
  • Matplotlib
  • NumPy

Software Description

Anaconda Python:
The Anaconda distribution comes with more than 1,000 data packages as well as the conda
package and virtual environment manager and a graphical interface, Anaconda Navigator,
so it eliminates the need to learn how to install each library independently. The open
source data packages can be individually installed from the Anaconda repository with the
conda install command or with the pip install command that is installed with Anaconda.
Pip packages provide many of the features of conda packages, and in most cases they can
work together. You can also make your own custom packages using the conda build
command, and you can share them with others by uploading them to Anaconda Cloud, PyPI
or other repositories. The default installation of Anaconda2 includes Python 2.7 and
Anaconda3 includes Python 3.6. However, you can create new environments that include any
version of Python packaged with conda.

Anaconda Navigator:
Anaconda Navigator is a desktop graphical user interface (GUI) included in the Anaconda
distribution that allows you to launch applications and manage conda packages,
environments and channels without using command-line commands. Navigator can search
for packages on Anaconda Cloud or in a local Anaconda Repository, install them in an
environment, run the packages and update them. It is available for Windows, macOS and
Linux. Navigator is automatically installed when you install Anaconda version 4.0.0 or
higher. The following applications are available by default in Navigator:
• QTConsole
• Spyder
• VSCode
• Glueviz
• Jupyter Notebook
• Orange 3 App
• Rodeo
• RStudio

Conda:
Conda is an open source, cross-platform, language-agnostic package manager and
environment management system that installs, runs, and updates packages and their
dependencies.

It allows users to:
• Install different versions of binary software packages and any required libraries
appropriate for their computing platform
• Switch between package versions
• Download and install updates from a software repository.

It was created for Python programs, but it can package and distribute software for any
language (e.g., R), including multi-language projects. The conda package and environment
manager is included in all versions of Anaconda, Miniconda and Anaconda Repository.

Anaconda Cloud:
Anaconda Cloud is a package management service by Anaconda where you can find, store
and share public and private notebooks, environments, and conda and PyPI packages.
Cloud hosts useful Python packages, notebooks and environments for a wide variety of
applications. You do not need to log in, or to have a Cloud account, to search for public
packages, download them and install them. You can build new packages using the Anaconda
Client command line interface (CLI), then manually or automatically upload the packages
to Cloud.

Scikit-Learn:
• Simple and efficient tools for data mining and data analysis
• Accessible to everybody and reusable in various contexts
• Built on NumPy, SciPy and Matplotlib
• It is a package from which all estimator classes can be imported
• Syntax: from sklearn.<module> import ClassName
Eg: from sklearn.neighbors import KNeighborsClassifier

Python_Packages:
Numpy:
NumPy is one of the best mathematical and scientific computing libraries for Python.
TensorFlow and various other platforms use NumPy internally for performing several
operations on tensors. Perhaps the most important feature of NumPy is its array
interface. This interface can be used to express images, sound waves or other raw binary
streams as arrays of real numbers with N dimensions. Knowledge of NumPy is very
important for Machine Learning and Data Science.

Pandas:
Pandas is an open-source, BSD-licensed Python library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming language.
Python with Pandas is used in a wide range of fields, both academic and commercial,
including finance, economics, statistics, analytics, etc.

Matplotlib:
Matplotlib is a plotting library for Python. It is used along with NumPy to provide an
environment that is an effective open source alternative to MATLAB. It can also be used
with graphics toolkits like PyQt and wxPython. It is typically imported with
from matplotlib import pyplot as plt. Here pyplot is the most important module in the
matplotlib library and is used to plot 2D data. For example, pyplot can be used to plot
the equation y = 2x + 5.
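
As a small illustration of pyplot, the sketch below plots the line y = 2x + 5 mentioned
above. The function names line_points and plot_line, the range of x values and the use of
the non-interactive "Agg" backend are assumptions made for this example; they do not come
from the project code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
from matplotlib import pyplot as plt
import numpy as np


def line_points(start=1, stop=11):
    """Return x values start..stop-1 and the matching y = 2x + 5 values."""
    x = np.arange(start, stop)
    y = 2 * x + 5
    return x, y


def plot_line():
    """Draw y = 2x + 5 on a new figure and return the figure object."""
    x, y = line_points()
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title("y = 2x + 5")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig


fig = plot_line()
```

In a Jupyter notebook the figure is normally displayed inline; in a script,
fig.savefig("line.png") would write it to a file (the file name is only an example).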

SciPy:

SciPy contains modules for optimization, linear algebra, integration, interpolation,
special functions, FFT, signal and image processing, ODE solvers and other tasks common
in science and engineering. SciPy builds on the NumPy array object and is part of the
NumPy stack, which includes tools like Matplotlib, pandas and SymPy, and an expanding set
of scientific computing libraries. This NumPy stack has similar users to other
applications such as MATLAB, GNU Octave, and Scilab. The NumPy stack is also sometimes
referred to as the SciPy stack.

3.3 CONTENT DIAGRAM OF PROJECT

BLOCK DIAGRAM

Figure 3.3.1: Content Diagram

WORKING OF THE PROJECT:

3.3.1 DATASET
We downloaded the dataset from Kaggle.com; the downloaded dataset is in CSV format. As
shown in the figure above, before implementing the algorithms we need to analyse the
complete dataset. This dataset has 28,481 transactions made by customers. After checking
the transactions, they are classified into two classes, namely fraudulent transactions
and non-fraudulent transactions. Some of the sensitive information is not given in the
dataset as it is confidential.
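
A first look at such a dataset usually means counting how many rows fall into each class.
The sketch below is a minimal illustration with pandas; the column names "Amount" and
"Class", the helper class_summary and the file name in the comment are assumptions for
this example, and the five-row frame is made-up stand-in data, not the Kaggle file.

```python
import pandas as pd


def class_summary(data, label_col="Class"):
    """Count fraudulent (1) and valid (0) rows and their ratio."""
    counts = data[label_col].value_counts()
    n_fraud = int(counts.get(1, 0))
    n_valid = int(counts.get(0, 0))
    ratio = n_fraud / n_valid if n_valid else float("nan")
    return {"rows": len(data), "fraud": n_fraud, "valid": n_valid,
            "outlier_fraction": ratio}


# Made-up stand-in data so the sketch runs without the real CSV file.
demo = pd.DataFrame({
    "Amount": [12.5, 250.0, 3.99, 1800.0, 42.0],
    "Class": [0, 0, 0, 1, 0],
})
summary = class_summary(demo)

# With the downloaded file one would instead write, for example:
# data = pd.read_csv("creditcard.csv")   # hypothetical file name
# summary = class_summary(data)
```

The fraud-to-valid ratio computed here is the kind of value that is often passed to an
outlier detector as its expected contamination.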

3.3.2 DATA PRE-PROCESSING


When we talk about data, we usually think of large datasets with a huge number of rows
and columns. While that is a likely scenario, it is not always the case: data can come in
many different forms, such as structured tables, images, audio files, videos, and so on.
Machines do not understand free text, image, or video data as it is; they understand 1s
and 0s. So it probably will not be enough to put on a slideshow of all our images and
expect our machine learning model to get trained just by that!

3.3.3 FEATURE EXTRACTION


The next step is feature extraction, which is an attribute reduction process. Unlike
feature selection, which ranks the existing attributes according to their predictive
significance, feature extraction actually transforms the attributes. The transformed
attributes, or features, are linear combinations of the original attributes. Finally, our
models are trained using a classifier algorithm. We use the index module of the Natural
Language Toolkit library in Python. We use the labelled dataset gathered. The rest of our
labelled data will be used to evaluate the models. A few machine learning algorithms were
used to classify the pre-processed data.
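
To make the phrase "linear combinations of the original attributes" concrete, the sketch
below uses principal component analysis (PCA) from scikit-learn. PCA is chosen here only
as one common example of feature extraction; the report does not state which technique
was applied, and the random data is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 rows, 5 attributes, one of them nearly redundant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)

# Extract 2 new features from the 5 original attributes.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

# Each extracted feature is a linear combination of the centred original
# attributes; the weights are the rows of pca.components_.
Z_manual = (X - pca.mean_) @ pca.components_.T
```

Here Z and Z_manual agree up to floating-point error, which shows that the new features
are weighted sums of the old ones rather than a subset of them.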

3.3.5 CLASSIFIER SECTION

3.3.5.1 Local Outlier Factor


The Local Outlier Factor algorithm is used to find anomalous data points by measuring
the local deviation of a given data point in comparison with its neighbours. Locality is
given by the nearest neighbours, whether far or near, while the measure of density is
obtained from their distances. Outlying samples are divided into two kinds, global and
local outliers. An object that has a large distance to its neighbours is a global outlier;
otherwise it is a local outlier.
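
The idea can be tried out with scikit-learn's LocalOutlierFactor. The sketch below is an
illustration on synthetic two-dimensional points, not the project's actual code; the
values n_neighbors=20 and contamination=0.01 and the injected outliers are assumptions
chosen for the example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: a dense cluster plus two points placed far away from it.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([inliers, outliers])

# Compare each point's local density with that of its 20 nearest neighbours.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
labels = lof.fit_predict(X)             # -1 marks an outlier, 1 an inlier
scores = lof.negative_outlier_factor_   # more negative = more anomalous
```

The two injected points sit in a much sparser region than their neighbours, so they
receive the most negative scores and the label -1.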

3.3.5.2 Isolation Forest


This algorithm is tree based and is capable of detecting outliers. It is highly useful,
it differs from other existing algorithms, and very well performing models are built
using it. The algorithm isolates observations by selecting a feature at random and then
selecting a split value between the maximum and minimum values of the selected feature.
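
A minimal sketch of this procedure with scikit-learn's IsolationForest is given below. It
runs on synthetic points; the parameter values (100 trees, contamination=0.01,
random_state=0) and the injected outliers are assumptions for illustration and are not
taken from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: a dense cluster plus three points placed far away from it.
rng = np.random.default_rng(7)
inliers = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
outliers = np.array([[7.0, 7.0], [-8.0, 6.0], [6.5, -7.5]])
X = np.vstack([inliers, outliers])

# Each tree splits on a random feature at a random value between that
# feature's minimum and maximum; isolated points need fewer splits.
forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = forest.fit_predict(X)          # -1 marks an outlier, 1 an inlier
scores = forest.decision_function(X)    # lower = more anomalous
```

Points that can be separated from the rest with only a few random splits end up with low
scores, which is why the three distant points are labelled -1.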

CHAPTER-4
DESIGN

4.1 INTRODUCTION
Hence we can conclude that we should find fraudulent cases while doing business, so that
customers will not lose their money and many of the banking problems will be reduced. In
this paper two machine learning techniques, Local Outlier Factor and Isolation Forest, are
implemented in Python. As the Isolation Forest algorithm has the highest precision, we may
expect better results from it in detecting fraud.

4.2 ER/UML DIAGRAMS

Use Case Diagram:


Use case diagrams are usually referred to as behaviour diagrams, used to describe a set
of actions (use cases) that some system or systems (subject) should or can perform in
collaboration with one or more external users of the system (actors). Each use case
should provide some observable and valuable result to the actors or other stakeholders of
the system.

Fig 4.2.1 Usecase Diagram

Class Diagram:

The class diagram is the most widely used UML diagram and the main foundation of any
object-oriented solution. It shows the classes within a system, their attributes and
operations, and the relationships between the classes. Classes are grouped together to
create class diagrams when diagramming large systems.

Fig 4.2.2 Class Diagram

Sequence Diagram:

Sequence diagrams, also called event diagrams or event scenarios, illustrate how
processes interact with each other by showing calls between different objects in a
sequence. These diagrams have two dimensions: vertical and horizontal. The vertical lines
show the sequence of messages and calls in chronological order, and the horizontal
elements show object instances where the messages are relayed.

Fig 4.2.3 Sequence Diagram

Collaboration Diagram:

A collaboration diagram, also called a communication diagram or interaction diagram, is
an illustration of the relationships and interactions among software objects in the
Unified Modeling Language (UML). The concept is more than a decade old, although it has
been refined as modeling paradigms have evolved.

Fig 4.2.4 Collaboration Diagram

4.3 MODULE DESIGN AND ORGANIZATION

4.3.1 Organization of project

Local Outlier Factor


The Local Outlier Factor algorithm is used to find anomalous data points by measuring
the local deviation of a given data point in comparison with its neighbours. Locality is
given by the nearest neighbours, whether far or near, while the measure of density is
obtained from their distances. Outlying samples are divided into two kinds, global and
local outliers. An object that has a large distance to its neighbours is a global outlier;
otherwise it is a local outlier.

Isolation Forest

This algorithm is tree based and is capable of detecting outliers. It is highly useful,
it differs from other existing algorithms, and very well performing models are built
using it. The algorithm isolates observations by selecting a feature at random and then
selecting a split value between the maximum and minimum values of the selected feature.

4.3.2 ISOLATION FOREST ALGORITHM
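
For reference, the anomaly score used by Isolation Forest can be written down explicitly.
The formula below follows the definition in the original Isolation Forest paper by Liu,
Ting and Zhou (2008), not anything stated in this report. Here h(x) is the number of
splits needed to isolate observation x in one tree, E[h(x)] is its average over all trees,
n is the number of samples used to build each tree, and c(n) is a normalising constant.

```latex
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```

Scores close to 1 indicate anomalies, because such points are isolated after very few
splits, while scores well below 0.5 indicate normal observations.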

4.4 CONCLUSION
Thus we can conclude that we should find fraudulent cases while doing business, so that
customers will not lose their money and many of the banking problems will be reduced. In
this paper two machine learning techniques, Local Outlier Factor and Isolation Forest, are
implemented in Python. As the Isolation Forest algorithm has the highest precision, we may
expect better results from it in detecting fraud.

CHAPTER-5
IMPLEMENTATION AND RESULTS

5.1 INTRODUCTION
Hence we can conclude that we should find fraudulent cases while doing business, so that
customers will not lose their money and many of the banking problems will be reduced. In
this paper two machine learning techniques, Local Outlier Factor and Isolation Forest, are
implemented in Python. As the Isolation Forest algorithm has the highest precision, we may
expect better results from it in detecting fraud.

5.2 IMPLEMENTATION OF KEY FUNCTIONS


Many classification tasks use simple evaluation metrics like accuracy to compare the
performance of models, since accuracy is a simple measure to implement and generalizes to
more than just binary labels. However, accuracy has one major drawback: it assumes that
there is an equal representation of examples from every class, and for skewed datasets
like ours accuracy is a misleading metric. It does not give a true picture of performance,
so accuracy is not a correct measure of effectiveness in our case. To classify the
transactions as fraud or non-fraud we need other measures of correctness, namely:

• Precision
• Recall
• F1-score
• Support

All of these measures depend on the actual and predicted class, so we draw a 2×2
confusion matrix to understand them better.

True Positive (TP):


These qualities are remedially anticipated positive that implies worth of both
genuine class and anticipated class are YES.
33
True Negative (TN):
These qualities are restoratively anticipated Negative that implies worth of both
genuine class and anticipated class are NO.

False Positive (FP):
When the actual class is NO and the predicted class is YES.

False Negative (FN):
When the actual class is YES and the predicted class is NO.

False positives and false negatives occur when the actual class contradicts the
predicted class.

Precision: It is the ratio of correctly predicted positive observations to the
total predicted positive observations, TP / (TP + FP).

Recall: It is the ratio of correctly predicted positive observations to all
observations in the actual class YES, TP / (TP + FN).

F1 Score: It is the harmonic mean of Precision and Recall. Therefore this score
takes both false positives and false negatives into account.

Support: It is the number of occurrences of each class in the true target values.
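
The definitions above can be sketched in a few lines of Python. This is a minimal illustration and not the project's own code: the function name `classification_metrics` and the counts passed to it are hypothetical, and F1 is computed with the usual harmonic-mean formula 2PR / (P + R).

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall, F1 and support for the positive (fraud) class.

    A ratio whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    support = tp + fn  # number of actual fraud cases
    return precision, recall, f1, support


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
precision, recall, f1, support = classification_metrics(tp=30, fp=10, fn=20)
print(precision, recall, support)
print(round(f1, 4))
```

With these made-up counts, precision is 30/40, recall is 30/50 and the support of the fraud class is 50.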


The Isolation Forest produced a total of 71 errors with an accuracy of 99.75 percent,
while Local Outlier Factor produced a total of 97 errors with an accuracy of 99.65
percent.

5.3 METHOD OF IMPLEMENTATION

5.4 RESULTS

5.5 CONCLUSION
Hence we can conclude that fraudulent cases must be detected while doing business, so
that customers do not lose their money and many banking issues are reduced. In this
paper, two machine learning techniques are used in Python: Local Outlier Factor and
Isolation Forest. As the Isolation Forest algorithm has the highest precision, we may
expect better results in detecting fraud.

CHAPTER-6
TESTING AND VALIDATION

6.1 INTRODUCTION

INTRODUCTION TO TESTING

The following are the testing methodologies:

• Unit Testing.
• Integration Testing.
• User Acceptance Testing.
• Output Testing.

6.2 TYPES OF TESTING

Unit Testing
Unit testing focuses verification effort on the smallest unit of software design, that is,
the module. Unit testing exercises specific paths in a module's control structure to ensure
complete coverage and maximum error detection. This test focuses on each module
individually, ensuring that it functions properly as a unit. Hence the name unit testing.
During this testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths
are tested for the expected results. All error handling paths are also tested.
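
A unit test of this kind can be sketched with Python's standard `unittest` module. The `accuracy` function below is a hypothetical module under test, introduced here only for illustration and not part of the project code; one test covers a normal processing path and the other an error handling path.

```python
import unittest


def accuracy(errors, total):
    """Hypothetical module under test: fraction of correctly labelled rows."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - errors) / total


class AccuracyTest(unittest.TestCase):
    def test_expected_result(self):
        # Normal processing path: 1 error in 4 rows gives 0.75.
        self.assertEqual(accuracy(1, 4), 0.75)

    def test_error_handling_path(self):
        # Error handling path: an empty dataset must be rejected.
        with self.assertRaises(ValueError):
            accuracy(0, 0)


# Run the suite programmatically so the example also works when imported.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccuracyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test exercises the module in isolation, so a failure points directly at the unit that caused it.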

Integration Testing
Integration testing addresses the issues associated with the dual problems of verification
and program construction. After the software has been integrated, a set of high-order tests
is conducted. The main objective of this testing process is to take unit-tested modules
and build a program structure that has been dictated by design.

User Acceptance Testing
User acceptance of a system is the key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with the
prospective system users at the time of developing and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.

Output Testing
After performing the validation testing, the next step is output testing of the
proposed system, since no system could be useful if it does not produce the required
output in the specified format. The outputs generated or displayed by the system under
consideration are tested by asking the users about the format they require. Hence the
output format is considered in two ways: one is on screen and the other is in printed
format.

6.3 VALIDATION

The final step involves validation testing, which determines whether the software
functions as the user expected. The end user rather than the system developer conducts
this test; most software developers use a process called “Alpha and Beta Testing” to
uncover errors that only the end user seems able to find. The completion of the entire
project is based on the full satisfaction of the end users. In the project, validation
testing is carried out in various forms.

6.4 CONCLUSION

Hence we can conclude that fraudulent cases must be detected while doing business, so
that customers do not lose their money and many banking issues are reduced. In this
paper, two machine learning techniques are used in Python: Local Outlier Factor and
Isolation Forest. As the Isolation Forest algorithm has the highest precision, we may
expect better results in detecting fraud.

CHAPTER-7

CONCLUSION

7.1 CONCLUSION

Hence we can conclude that fraudulent cases must be detected while doing business, so
that customers do not lose their money and many banking issues are reduced. In this
paper, two machine learning techniques are used in Python: Local Outlier Factor and
Isolation Forest. As the Isolation Forest algorithm has the highest precision, we may
expect better results in detecting fraud.


Published Paper
