Credit Card Frauds
2009-2013
BONAFIDE CERTIFICATE
This is to certify that the project work entitled “CREDIT CARD FRAUD
DETECTION USING ISOLATION FOREST ALGORITHM” is a bonafide work
carried out by
M. KEERTHI - (17699A0527)
M. SATISH - (17699A0547)
C. SREE HARITHA - (17699A0551)
V. THULASIRAM - (17699A0557)
Submitted in partial fulfillment of the requirements for the award of degree Bachelor of
Technology in the stream of Computer Science & Engineering in Madanapalle
Institute of Technology and Science, Madanapalle, affiliated to Jawaharlal Nehru
Technological University Anantapur, Ananthapuramu during the academic year
2020-2020
ACKNOWLEDGEMENT
We sincerely thank Dr. C. Yuvaraj, M.E., Ph.D., Principal for guiding and
providing facilities for the successful completion of our project at Madanapalle Institute
of Technology and Science, Madanapalle.
RECOGNISED RESEARCH CENTER
This is to certify that the B. Tech Project report titled, “CREDIT CARD FRAUD
based on the analysis report generated by the software, the report’s similarity index is
found to be 20%.
The following is the URKUND report for the project report consisting of 38 pages.
Dean RRC
DECLARATION
Date :
Place :
PROJECT ASSOCIATES
M. KEERTHI
M. SATISH
C. SREE HARITHA
V. THULASIRAM
I certify that above statement made by the students is correct to the best of my
knowledge.
Date : Guide :
INDEX
1. INTRODUCTION
1.1 Motivation
1.2 Problem Definition
1.3 Objective of the Project
1.4 Organization of Documentation
2. LITERATURE SURVEY
2.1 Introduction
2.2 Existing System
2.3 Disadvantages of Existing System
2.4 Proposed System
2.5 Advantages over Existing System
3. ANALYSIS
3.1 Introduction
3.2 Software Requirement Specification
3.3 Content diagram of Project
4. DESIGN
4.1 Introduction
4.2 ER/UML Diagrams
4.3 Module Design and Organization
4.4 Conclusion
5. IMPLEMENTATION AND RESULTS
5.1 Introduction
5.2 Implementation of key functions
5.3 Method of Implementation
5.4 Output Screens and Result Analysis
5.5 Conclusion
List of Figures/Screens/Tables
5.3.1 Correlation matrix
ABSTRACT
This project focuses on credit card fraud detection in the real world. The growth in the
number of transactions and of card users has led to a rise in fraudulent activity.
Fraudsters aim to obtain goods without paying, or to withdraw funds from an account
without the permission of the account holder. Implementing fraud detection systems has
therefore become important for every credit card issuing bank in order to reduce its
losses and its risk. The most challenging situation in business is that neither the card nor
the cardholder is present at the time of purchase, which makes it impractical for the
merchant to confirm whether the customer is the genuine cardholder. Researchers have
proposed many algorithms to detect fraud and reduce losses. This report compares the
Local Outlier Factor and Isolation Forest algorithms, with experimental results and
methodology. After complete analysis we obtained an accuracy of 97% for the Local
Outlier Factor and 76% for the Isolation Forest algorithm.
CHAPTER-1
INTRODUCTION
1.1 MOTIVATION
Long queues are common in grocery stores, and at the time of shopping customers face
many problems such as insufficient balance and loss of cash; along with this they must
choose the best product among all the products. Credit cards were introduced to make
this process easier, and they help a great deal in making our work faster. There is also a
disadvantage in using these cards, as more frauds are occurring with them.
Credit card fraud detection is hard, but it is most useful for preventing or reducing large
losses. Fraud detection algorithms can be written using Artificial Intelligence, Fuzzy
Logic, Data Mining and Machine Learning; among these approaches we consider
Machine Learning the most suitable for credit card fraud detection, so our proposed
system uses Machine Learning. A large amount of data is transferred during online
transactions. Credit cards are used mostly for shopping in everyday life, and with their
widespread use credit card fraud has also increased. This theft is one of the biggest risks
to customers as well as shopkeepers, and above all it creates problems for banks. Such
incidents have become common nowadays, because the cardholder learns that money has
been lost only after the transaction, when it is too late. In this report we show that these
frauds can be reduced using algorithms, and we compare the algorithms using
performance metrics such as F1 score, recall, accuracy and precision. These metrics
reveal the differences between the algorithms, so that an algorithm can be chosen to suit
the situation. Credit card fraud occurs mainly in online shopping, because online
purchases require very little information; by using the Local Outlier Factor and Isolation
Forest algorithms we can reduce these types of fraud.
1.3 OBJECTIVES OF PROJECT
The credit card fraud detection problem involves modelling past credit card transactions
with the knowledge of the ones that turned out to be fraud. This model is then used to
identify whether a new transaction is fraudulent or not. Our aim here is to detect
fraudulent transactions while minimising incorrect fraud classifications.
Preliminary investigation examines project feasibility, that is, the likelihood that the
system will be useful to the organisation. The main objective of the feasibility study is to
test the technical, operational and economic feasibility of adding new modules and
debugging the old running system. All systems are feasible if they are given unlimited
resources and infinite time. There are three aspects in the feasibility study portion of the
preliminary investigation:
Technical Feasibility
Operation Feasibility
Economic Feasibility
Technical Feasibility
Technical feasibility covers the technical issues usually raised during the feasibility
stage of the investigation.
Operation Feasibility
Operational feasibility covers the user-friendliness, reliability, security, portability,
availability and maintainability of the software used in the project.
Economic Feasibility
Economic feasibility is the assessment of a project's costs and revenue in an effort to
determine whether or not it is logical and possible to complete.
CHAPTER-2
LITERATURE SURVEY
2.1 INTRODUCTION
Credit card fraud detection can draw on many machine learning algorithms, which are of
two kinds: classification algorithms and regression algorithms. An algorithm is chosen
according to the problem at hand, and performance metrics are used to determine which
algorithm is best. As mentioned below, several algorithms already exist, each with
certain drawbacks.
1. In one existing paper, a new comparison measure that realistically represents the
monetary gains and losses due to fraud detection is proposed.
2. A cost-sensitive method based on Bayes minimum risk is presented using the
proposed cost measure.
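The Bayes minimum risk idea in item 2 can be sketched as a decision rule that flags a transaction only when flagging has the lower expected cost. This is a minimal illustration, not the surveyed paper's implementation: the function name, the cost model (a fixed administrative cost for every flagged transaction, and the loss of the full amount for every missed fraud) and the example numbers are all assumptions.

```python
def bayes_minimum_risk(p_fraud, amount, admin_cost):
    """Return 1 (flag as fraud) when flagging has the lower expected cost.

    Assumed cost model (hypothetical, for illustration only):
      - flagging a transaction always costs `admin_cost`
      - passing a fraudulent transaction loses its full `amount`
      - passing a legitimate transaction costs nothing
    """
    risk_if_flagged = admin_cost           # paid whether or not it is fraud
    risk_if_passed = p_fraud * amount      # amount lost with probability p_fraud
    return 1 if risk_if_flagged < risk_if_passed else 0


# Same fraud probability, different amounts at stake.
large = bayes_minimum_risk(p_fraud=0.10, amount=500.0, admin_cost=5.0)
small = bayes_minimum_risk(p_fraud=0.10, amount=20.0, admin_cost=5.0)
print("flag large transaction:", large)
print("flag small transaction:", small)
```

Under this assumed cost model the same fraud probability leads to different decisions, because the expected loss scales with the transaction amount.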
2.4 PROPOSED SYSTEM
CHAPTER-3
ANALYSIS
3.1 INTRODUCTION
Credit card fraud detection is hard, but it is valuable for preventing or reducing large
losses. Fraud detection algorithms can be written using Artificial Intelligence, Fuzzy
Logic, Data Mining and Machine Learning; among these approaches we consider
Machine Learning the most suitable for credit card fraud detection, so our proposed
system uses Machine Learning. A large amount of data is transferred during online
transactions. Credit cards are used mostly for everyday shopping, and with their
widespread use credit card fraud has also increased. This theft is one of the greatest risks
to customers as well as shopkeepers, and above all it creates problems for banks. Such
incidents have become common nowadays, because the cardholder realises that money
has been lost only after the transaction, when it is too late. In this report we show that
these frauds can be reduced using algorithms, and we compare the algorithms using
performance metrics such as F1 score, recall, accuracy and precision. These metrics
reveal the differences between the algorithms, so that an algorithm can be chosen to suit
the situation. Credit card fraud occurs mainly in online shopping, because online
purchases require very little information; by using the Local Outlier Factor and Isolation
Forest algorithms we can reduce these kinds of fraud.
HARDWARE REQUIREMENTS
• Processor - 64 bit
• RAM-8GB
• Hard Disk - 1TB
• OS - Windows 10
• Key Board - Standard Windows Keyboard
Hardware Description
Read-only memory (ROM) is a type of non-volatile memory used in computers and
other electronic devices.
64-bit processor: a microprocessor with a word size of 64 bits is used for this project.
Random-access memory: a random-access memory device allows data items to be read
or written in almost the same amount of time irrespective of the physical location of the
data inside the memory. In contrast, with other direct-access data storage media such as
hard disks, CD-RWs, DVD-RWs and the older magnetic tapes and drum memory, the
time required to read and write data items varies significantly depending on their
physical location on the recording medium, because of mechanical limitations such as
media rotation speeds and arm movement.
SOFTWARE REQUIREMENTS
Jupyter Notebook
Anaconda
Scikit-learn
SciPy
Packages:
Pandas
Matplotlib
NumPy
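Before running the project it can help to confirm that the listed packages are present in the active environment. The following is a minimal sketch using only the Python standard library; the helper name installed_versions and the distribution names in REQUIRED are assumptions chosen to match the list above.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed to match the list above).
REQUIRED = ["numpy", "pandas", "matplotlib", "scipy", "scikit-learn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed can then be added with the conda install or pip install command.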
Software Description
Anaconda Python:
The Anaconda distribution comes with more than 1,000 data packages as well as the
conda package and virtual environment manager and a graphical interface called
Anaconda Navigator, so it eliminates the need to learn how to install each library
independently. The open-source data packages can be individually installed from the
Anaconda repository with the conda install command or using the pip install command
that is installed with Anaconda. Pip packages provide many of the features of conda
packages, and in most cases they can work together. You can also make your own
custom packages using the conda build command, and you can share them with others
by uploading them to Anaconda Cloud, PyPI or other repositories. The default
installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python 3.6.
However, you can create new environments that include any version of Python
packaged with conda.
Anaconda Navigator:
Anaconda Navigator is a desktop graphical user interface (GUI) included in the
Anaconda distribution that allows you to launch applications and manage conda
packages, environments and channels without using command-line commands.
Navigator can search for packages on Anaconda Cloud or in a local Anaconda
Repository, install them in an environment, run the packages and update them. It is
available for Windows, macOS and Linux. Navigator is automatically installed when
you install Anaconda version 4.0.0 or higher. The following applications are available
by default in Navigator:
• QTConsole
• Spyder
• VSCode
• Glueviz
• Jupyter Notebook
• Orange 3 App
• Rodeo
• RStudio
Conda:
Conda is an open-source, cross-platform, language-agnostic package manager and
environment management system that installs, runs, and updates packages and their
dependencies.
Anaconda Cloud:
Anaconda Cloud is a package management service by Anaconda where you can find,
store and share public and private notebooks, environments, and conda and PyPI
packages. Cloud hosts useful Python packages, notebooks and environments for a wide
variety of applications. You do not need to log in or to have a Cloud account to search
for public packages, download and install them. You can build new packages using the
Anaconda Client command-line interface (CLI), then manually or automatically upload
the packages to Cloud.
Scikit-Learn:
• Simple and efficient tools for data mining and data analysis
• Accessible to everybody and reusable in various contexts
• Built on NumPy, SciPy and Matplotlib
• It is a package from which all of its algorithm classes can be imported
• Syntax: from sklearn.modulename import ClassName
Eg: from sklearn.neighbors import KNeighborsClassifier
Python_Packages:
Numpy:
NumPy is one of the best mathematical and scientific computing libraries for Python.
TensorFlow and various other platforms use NumPy internally for performing several
operations on tensors. Perhaps the most important feature of NumPy is its array
interface. This interface can be used to express images, sound waves or other raw binary
streams as arrays of real numbers with N dimensions. Knowledge of NumPy is very
important for Machine Learning and Data Science.
Pandas:
Pandas is an open-source, BSD-licensed Python library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming
language. Python with Pandas is used in a wide range of academic and commercial
fields, including finance, economics, statistics and analytics.
Matplotlib:
Matplotlib is a plotting library for Python. It is used along with NumPy to provide an
environment that is an effective open-source alternative to MATLAB. It can also be used
with graphics toolkits like PyQt and wxPython. The plotting interface is imported with:
from matplotlib import pyplot as plt. Here pyplot is the principal module of the
matplotlib library and is used to plot 2D data; for example, it can plot the equation
y = 2x + 5.
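The script that plotted y = 2x + 5 did not survive in this copy of the report, so the following is a reconstruction rather than the original code. The range of x values and the axis labels are assumptions.

```python
import numpy as np
from matplotlib import pyplot as plt

# x values 1..10 (an assumed range) and the corresponding y = 2x + 5.
x = np.arange(1, 11)
y = 2 * x + 5

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("y = 2x + 5")
ax.set_xlabel("x")
ax.set_ylabel("y")

# In an interactive session, call plt.show() to display the figure,
# or fig.savefig("line.png") to write it to a file.
```

In a Jupyter Notebook the figure is normally rendered below the cell without an explicit call to plt.show().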
Scipy:
SciPy contains modules for optimization, linear algebra, integration, interpolation,
special functions, FFT, signal and image processing, ODE solvers and other tasks
common in science and engineering. SciPy builds on the NumPy array object and is part
of the NumPy stack, which includes tools like Matplotlib, pandas and SymPy, and an
expanding set of scientific computing libraries. This NumPy stack has similar users to
other applications such as MATLAB, GNU Octave, and Scilab. The NumPy stack is
also sometimes referred to as the SciPy stack.
BLOCK DIAGRAM
3.3.1 DATASET
The dataset was downloaded from Kaggle.com in CSV format. As shown in the above
figure, the complete dataset must be examined before the algorithms are implemented.
The dataset contains 28,481 transactions made by customers. After checking, the
transactions are classified into two classes, namely fraudulent transactions and
non-fraudulent transactions. Some of the sensitive information is not provided in the
dataset because it is confidential.
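A first examination of such a dataset usually counts the transactions in each class. The sketch below uses pandas on a tiny in-memory stand-in for the downloaded file; the column names Amount and Class, the 0/1 label encoding and the helper class_summary are assumptions, since the report does not list the dataset's columns.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the downloaded CSV file. The column names and
# the encoding (1 = fraudulent, 0 = non-fraudulent) are assumptions.
SAMPLE_CSV = """Amount,Class
12.50,0
250.00,0
8.99,0
1999.00,1
43.10,0
"""


def class_summary(df, label_col="Class"):
    """Return (non-fraudulent count, fraudulent count, fraud fraction)."""
    counts = df[label_col].value_counts()
    n_valid = int(counts.get(0, 0))
    n_fraud = int(counts.get(1, 0))
    return n_valid, n_fraud, n_fraud / len(df)


# For the real data, replace the StringIO object with the path to the CSV file.
data = pd.read_csv(io.StringIO(SAMPLE_CSV))
n_valid, n_fraud, fraud_fraction = class_summary(data)
print(f"non-fraudulent: {n_valid}, fraudulent: {n_fraud}, "
      f"fraud fraction: {fraud_fraction:.4f}")
```

The fraud fraction is a useful number to record, because outlier detectors typically take the expected share of anomalies as a parameter.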
Outlier patterns are divided into two types: global outliers and local outliers. An object
that has a large distance to its neighbours is a global outlier; otherwise it is a local
outlier.
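The distinction between local and global outliers can be illustrated with scikit-learn's LocalOutlierFactor on synthetic points. This is a sketch on made-up data, not the report's experiment: the two clusters, the two planted points and the parameter n_neighbors=20 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=(50, 2))    # tightly packed cluster
sparse = rng.normal(loc=5.0, scale=1.0, size=(50, 2))   # loosely packed cluster
local_outlier = np.array([[0.0, 1.0]])     # near the dense cluster, yet far by its spacing
global_outlier = np.array([[20.0, 20.0]])  # far away from every other point
X = np.vstack([dense, sparse, local_outlier, global_outlier])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)               # -1 marks an outlier, 1 an inlier
scores = -lof.negative_outlier_factor_    # larger score = more anomalous

print("local outlier :", labels[-2], round(float(scores[-2]), 2))
print("global outlier:", labels[-1], round(float(scores[-1]), 2))
```

The point at (0, 1) is only about one unit from the dense cluster, a distance that would be unremarkable inside the sparse cluster; it is flagged because Local Outlier Factor compares each point's density with that of its own neighbours.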
CHAPTER-4
DESIGN
4.1 INTRODUCTION
Hence we can conclude that fraudulent cases must be detected while doing business, so
that customers do not lose their money and many banking issues are reduced. This
report uses two machine learning techniques, the Local Outlier Factor and the Isolation
Forest algorithm, implemented in Python. As the Isolation Forest algorithm has the
highest precision, we may expect better results from it in detecting fraud.
4.2 ER DIAGRAMS
Fig 4.2.1 Usecase Diagram
Class Diagram:
The class diagram is the most widely used UML diagram and the principal foundation of
any object-oriented solution. It shows the classes within a system, their attributes and
operations, and the relationships between the classes. Classes are grouped together to
create class diagrams when diagramming large systems.
Sequence Diagram:
Sequence diagrams, also called event diagrams or event scenarios, illustrate how
processes interact with each other by showing calls between different objects in a
sequence. These diagrams have two dimensions: vertical and horizontal. The vertical
lines show the sequence of messages and calls in chronological order, and the horizontal
elements show the object instances to which the messages are relayed.
Collaboration Diagram:
4.3.1 Organization of project
Isolation Forest
This algorithm is tree-based and is capable of detecting outliers. It is very useful, it
differs from the other existing outlier detection algorithms, and well-performing models
have been built using it. The algorithm isolates observations by randomly selecting a
feature and then randomly selecting a split value between the maximum and minimum
values of the selected feature.
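The isolation step described above can be sketched in plain Python by counting how many random splits are needed before a point is alone. This is a simplified illustration under stated assumptions, not the full Isolation Forest algorithm: it follows a single point instead of building and storing trees, it does not subsample the data, and the function names, the synthetic data and the parameter values are hypothetical.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))          # select a feature at random
    values = [row[feature] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:                                 # this feature cannot split the data
        return depth
    split = rng.uniform(lo, hi)                  # random value between min and max
    # Keep only the side of the split that contains the point.
    if point[feature] < split:
        side = [row for row in data if row[feature] < split]
    else:
        side = [row for row in data if row[feature] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def average_depth(point, data, n_trees=100, seed=0):
    """Average isolation depth over several independent random partitions."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Synthetic data: 200 ordinary points around (50, 50) and one extreme point.
data_rng = random.Random(42)
normal_points = [(data_rng.gauss(50, 5), data_rng.gauss(50, 5)) for _ in range(200)]
outlier = (500.0, 500.0)
data = normal_points + [outlier]

outlier_depth = average_depth(outlier, data)
inlier_depth = average_depth(normal_points[0], data)
print(f"average splits to isolate the outlier: {outlier_depth:.2f}")
print(f"average splits to isolate an inlier:   {inlier_depth:.2f}")
```

Points that are isolated after only a few splits are the likely anomalies. In practice the complete algorithm is available in scikit-learn as sklearn.ensemble.IsolationForest.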
4.4 CONCLUSION
Thus we can conclude that fraudulent cases must be detected while doing business, so
that customers do not lose their money and many banking issues are reduced. This
report uses two machine learning techniques, the Local Outlier Factor and the Isolation
Forest algorithm, implemented in Python. As the Isolation Forest algorithm has the
highest precision, we may expect better results from it in detecting fraud.
CHAPTER-5
IMPLEMENTATION AND RESULTS
5.1 INTRODUCTION
Fraudulent cases must be detected while doing business, so that customers do not lose
their money and many banking issues are reduced. This report uses two machine
learning techniques, the Local Outlier Factor and the Isolation Forest algorithm,
implemented in Python. As the Isolation Forest algorithm has the highest precision, we
may expect better results from it in detecting fraud. The algorithms are evaluated using
the following measures:
Precision
Recall
F1-score
Support
All of these performance measures depend on the actual and the predicted class, so we
draw a 2×2 confusion matrix to learn more about them.
F1 Score: It is the harmonic mean of Precision and Recall. Therefore this score takes
both false negatives and false positives into account.
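The measures listed above can all be computed from the four cells of the 2×2 confusion matrix. The following is a minimal sketch in plain Python; the function names and the ten example labels are made up for illustration and are not results from the report.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating label 1 (fraud) as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 score and support for the fraud class."""
    tn, fp, fn, tp = confusion_matrix_2x2(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,   # number of actual fraud cases
    }


# Illustrative labels only: 1 = fraudulent, 0 = non-fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

On a heavily imbalanced dataset accuracy alone can be misleading, because predicting every transaction as non-fraudulent already scores highly; precision, recall and F1 for the fraud class are therefore reported alongside it.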
5.3 METHOD OF IMPLEMENTATION
5.4 RESULTS
5.5 CONCLUSION
Consequently we can conclude that fraudulent cases must be detected while doing
business, so that customers do not lose their money and many banking issues are
reduced. This report uses two machine learning techniques, the Local Outlier Factor and
the Isolation Forest algorithm, implemented in Python. As the Isolation Forest algorithm
has the highest precision, we may expect better results from it in detecting fraud.
CHAPTER-6
TESTING AND VALIDATION
6.1 INTRODUCTION
INTRODUCTION TO TESTING
UNIT TESTING
Unit testing focuses verification effort on the smallest unit of software design, that is,
the module. Unit testing exercises specific paths in a module's control structure to
ensure complete coverage and maximum error detection. This test focuses on each
module individually, ensuring that it functions properly as a unit. Hence the name Unit
Testing.
During this testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths
are tested for the expected results. All error handling paths are also tested.
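A unit test of this kind can be written with Python's built-in unittest module. The sketch below tests one small function in isolation, covering an expected result, a perfect result and an error-handling path; the function f1_score and the test cases are hypothetical examples, not the project's actual test suite.

```python
import unittest


def f1_score(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


class F1ScoreTest(unittest.TestCase):
    def test_perfect_prediction(self):
        self.assertEqual(f1_score(tp=5, fp=0, fn=0), 1.0)

    def test_known_value(self):
        # 2 * 2 / (2 * 2 + 1 + 1) = 4 / 6
        self.assertAlmostEqual(f1_score(tp=2, fp=1, fn=1), 4 / 6)

    def test_no_positive_cases(self):
        # Error-handling path: the function must not divide by zero.
        self.assertEqual(f1_score(tp=0, fp=0, fn=0), 0.0)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(F1ScoreTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print("all tests passed:", result.wasSuccessful())
```

Each test method checks one path through the unit, which mirrors the goal of exercising the important processing paths and the error handling paths separately.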
Integration Testing
Integration testing addresses the issues associated with the dual problems of verification
and program construction. After the software has been integrated, a set of high-order
tests is conducted. The main objective of this testing process is to take unit-tested
modules and build a program structure that has been dictated by the design.
changes wherever required. The system developed provides a friendly user interface that
can easily be understood even by a person who is new to the system.
Output Testing
After performing the validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the
specified format. The outputs generated or displayed by the system under consideration
are tested by asking the users about the format they require. Hence the output format is
considered in two ways: one on screen and the other in printed format.
6.3 VALIDATION
The final step involves validation testing, which determines whether the software
functions as the user expects. The end user, rather than the system developer, conducts
this test. Most software developers use a process called “Alpha and Beta Testing” to
uncover errors that only the end user seems able to find. The completion of the entire
project is based on the full satisfaction of the end users. In the project, validation testing
is carried out in various forms.
6.4 CONCLUSION
Hence we can conclude that fraudulent cases must be detected while doing business, so
that customers do not lose their money and many banking issues are reduced. This
report uses two machine learning techniques, the Local Outlier Factor and the Isolation
Forest algorithm, implemented in Python. As the Isolation Forest algorithm has the
highest precision, we may expect better results from it in detecting fraud.
CHAPTER-7
CONCLUSION
7.1 CONCLUSION
Hence we can conclude that we must find fraudulent cases while doing business, so that
customers do not lose their money and many banking issues are reduced. This report
uses two machine learning techniques, the Local Outlier Factor and the Isolation Forest
algorithm, implemented in Python. As the Isolation Forest algorithm has the highest
precision, we may expect better results from it in detecting fraud.
Published Paper