Credit Card EDA: Authored by

The document presents an analysis of credit card application data to identify patterns indicating whether an applicant would have difficulty paying installments. Various data exploration and visualization techniques were used to analyze correlations between attributes and compare defaulters and non-defaulters. The analysis found that younger, unmarried applicants who recently started a job, have low education levels, smaller loan amounts, no children, and lower external source scores were more likely to default on payments.

Uploaded by

ARTEMIS STYLES
Copyright
© All Rights Reserved
Credit Card EDA

Authored by: Rajashree Roy & Vyas Chauhan

This case study aims to identify patterns which
indicate whether an applicant has difficulty paying their
installments. Identifying these patterns
(indicators or driving factors) can help
the credit card company to identify applicants
who are likely to default.

Introduction
This presentation describes our approach to the
analysis and aims to
share findings which can help the company to
identify applicants who are likely to default on the
loan.
Approach to case study
1. Data activities such as import, cleaning and manipulation are done using the pandas (pd) library in Python.
2. Data inspection is done using the DataFrame methods
   1. info()
   2. head()
   3. describe()
3. Outlier detection, in general, in this case study is done using
   1. Box plots for numerical data
   2. value_counts() for categorical data
4. Outlier removal is done using the quantile() method*
5. Data visualization is done using matplotlib, seaborn and pandas. The visualization techniques most widely used are
   1. Correlation heatmaps
   2. Countplots
   3. distplot (histogram)
   4. Barplots, etc.
6. Understanding trends and making different comparisons helped us interpret the analysis and formulate statements.
* Not all outliers were eliminated. We chose to eliminate only those outliers which were very far from the main group of observations.
Specific cases had their few outliers removed.
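The inspection and outlier steps above can be sketched roughly as follows. This is only an illustration on made-up numbers, not the authors' actual notebook: the synthetic AMT_CREDIT values, the injected extreme value and the 0.98 cutoff (matching the "top 2%" mentioned later in the deck) are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one numeric column; the values are made up for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"AMT_CREDIT": rng.normal(500_000, 100_000, size=1_000)})
df.loc[0, "AMT_CREDIT"] = 50_000_000  # one extreme value far from the rest

# The inspection calls are methods of the DataFrame, not of the pandas module.
df.info()
print(df.head())
print(df.describe())

# Keep only rows at or below the 98th percentile (assumed cutoff).
cutoff = df["AMT_CREDIT"].quantile(0.98)
trimmed = df[df["AMT_CREDIT"] <= cutoff]
print(len(df), "rows before,", len(trimmed), "rows after trimming")
```

A one-sided cutoff like this only removes very large values, which fits the note that just the far-off outliers were eliminated.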
Data Imbalance
• Data imbalance refers to a problem associated with classification in the machine learning ecosystem.
• Data is considered to be imbalanced when there is a large amount of data (observations) for one class and far fewer observations for one or more other classes.
• Imbalanced data can introduce bias into the classifications made by machine learning.
• In this case study we found that the data imbalance is very significant in the 'TARGET' attribute of 'application_data.csv'.
• Data is imbalanced by 11:1.
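A ratio like 11:1 can be read off the class counts. The sketch below uses a toy TARGET column rather than application_data.csv, and it assumes that 0 (no payment difficulty) is the majority class and 1 the minority class.

```python
import pandas as pd

# Toy TARGET column: 11 non-defaulters (0) for every defaulter (1). Made-up data.
target = pd.Series([0] * 1100 + [1] * 100, name="TARGET")

counts = target.value_counts()
ratio = counts.loc[0] / counts.loc[1]  # majority count divided by minority count

print(counts)
print(f"imbalance ratio = {ratio:.0f}:1")
```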
Heat map based on correlation between sample attributes from the current application
• First we plotted a heatmap to identify correlations.
• We took this approach in order to find the correlation between sample attributes from the current application.
• Observations:
  • AMT_ANNUITY & AMT_GOODS_PRICE are significantly linearly correlated
  • AMT_CREDIT & AMT_GOODS_PRICE are also correlated
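A minimal sketch of how such a correlation heatmap could be produced. The data is synthetic and deliberately built so that the three amounts move together. The helper name plot_heatmap and the colour settings are our own choices for illustration, not taken from the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic amounts constructed to be linearly related (illustration only).
rng = np.random.default_rng(1)
goods = rng.uniform(100_000, 1_000_000, size=500)
df = pd.DataFrame({
    "AMT_GOODS_PRICE": goods,
    "AMT_CREDIT": goods * 1.1 + rng.normal(0, 20_000, size=500),
    "AMT_ANNUITY": goods * 0.05 + rng.normal(0, 5_000, size=500),
})

corr = df.corr()  # pairwise Pearson correlation by default
print(corr.round(2))


def plot_heatmap(corr_matrix):
    """Hypothetical helper: draw the matrix with seaborn (imported only when called)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()
```

Calling plot_heatmap(corr) draws the figure. Values close to 1 off the diagonal correspond to the strongly correlated pairs listed in the observations.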
How many times has the current defaulter applied before?
• The quantile method was used to deal with outliers, removing the top 2% of the data.
• There are 4400 applicants who have applied only once before.
• But most of the current defaulters have applied two or more times.
Family status comparison of defaulters vs non-defaulters
[Figure panels: Number of family members of the current applicants with respect to applicant's difficulty to pay installments; Number of children of the current applicants with respect to applicant's difficulty to pay installments]
• Most applicants with payment difficulty have 2 family members.
• Probability of defaulting is highest in the case of no children.
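Counts and probabilities per group can differ, so it may help to compute both. In the sketch below the column name CNT_CHILDREN is an assumption (it does not appear in the slides) and the rows are made up, so the numbers only illustrate the calculation.

```python
import pandas as pd

# Toy data: number of children and TARGET (1 = payment difficulty). Made-up rows.
df = pd.DataFrame({
    "CNT_CHILDREN": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
    "TARGET":       [1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
})

# Per group: number of defaulters, number of applicants, and share of defaulters.
summary = df.groupby("CNT_CHILDREN")["TARGET"].agg(
    defaulters="sum",
    applicants="count",
    default_rate="mean",
)
print(summary)
```

The mean of a 0/1 column is the share of ones, which is why default_rate can be read as the probability of defaulting within each group.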
Level of education of Target – 1 & 0
• Defaulters are likely not to have completed their higher education.
Total income vs Income type of Target – 1 & 0
According to the figure:
• Most applicants with high income belong to the commercial associate group.
• Applicants from the working income group have similar total income irrespective of their payment difficulty.
• No students are defaulters, unlike applicants on maternity leave.
- Contract status comparison of applicants with records
- Contract status comparison based on gender
According to the graphs shown above:
• Females who have no difficulty in payment are most likely to have their contracts approved.
• Most applicants who have difficulty in payment do not have their contracts approved, compared to other applicants.
Can an applicant's age be an indicator of being a defaulter?
• Yes, it can be.
• The number of applicants with financial difficulties gradually decreases as age increases.
• Most applicants with financial difficulties are in the age group of around 25 to 35 years.
Credit amount of defaulters & non-defaulters
[Figure panels: Credit amount of all defaulters; Credit amount of all non-defaulters]
• Almost all of the loan defaulters have a credit amount of less than 15L INR.
• Up to 75% of all loan defaulters have a credit amount of up to 7L INR.
• Around 50% of all defaulters have taken a loan amount of up to INR 448,600.
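Statements of the form "50% of defaulters are below X" and "75% are below Y" correspond to the median and the third quartile. A minimal sketch on made-up rows, which do not reproduce the figures on this slide:

```python
import pandas as pd

# Toy data: TARGET (1 = defaulter) and credit amount in INR. Made-up values.
df = pd.DataFrame({
    "TARGET":     [1, 1, 1, 1, 1, 0, 0, 0],
    "AMT_CREDIT": [200_000, 300_000, 450_000, 700_000, 1_200_000,
                   500_000, 900_000, 1_500_000],
})

# Restrict to defaulters, then take the 50th and 75th percentiles.
defaulter_credit = df.loc[df["TARGET"] == 1, "AMT_CREDIT"]
q = defaulter_credit.quantile([0.5, 0.75])
print(q)
```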
[Figure panels: Applicants with previous record having difficulty paying installments; Other applicants with previous record]
• Applicants likely to be defaulters are the ones who are new to their current job.
• Around 85% of all defaulters started their current job around 2500 days (about 6.8 years) ago or less.
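The conversion from days to years, and a share such as "85% within 2500 days", can be checked along these lines. The tenure values are made up and the series name tenure_days is hypothetical, so the resulting share is not the 85% quoted on the slide.

```python
import pandas as pd

DAYS_PER_YEAR = 365.25
years = round(2500 / DAYS_PER_YEAR, 1)
print(years)  # 6.8

# Made-up job tenures (in days) for ten defaulters, for illustration only.
tenure_days = pd.Series([120, 400, 900, 1500, 2100, 2400, 3000, 5200, 800, 1900])

# Share of these applicants who started their current job within the last 2500 days.
share_recent = (tenure_days <= 2500).mean()
print(f"{share_recent:.0%} started within 2500 days")
```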
Applicants with payment difficulty vs others, i.e. Target – 1 vs Target – 0, with respect to 'EXT_SOURCE'
• The higher the EXT_SOURCE_1 score, the lower the chances of the applicant being a defaulter.
• The lower the EXT_SOURCE_2 score, the higher the chances of default.
• The lower the EXT_SOURCE_3 score, the higher the chances of the applicant defaulting.
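One simple way to compare the EXT_SOURCE scores of the two groups is to look at a per-group summary such as the median. The sketch below uses made-up scores chosen so that defaulters score lower. It shows the comparison only, not the plots used in this deck.

```python
import pandas as pd

# Made-up scores: the first three rows are non-defaulters, the last three defaulters.
df = pd.DataFrame({
    "TARGET":       [0, 0, 0, 1, 1, 1],
    "EXT_SOURCE_1": [0.70, 0.55, 0.80, 0.30, 0.45, 0.20],
    "EXT_SOURCE_2": [0.65, 0.60, 0.75, 0.35, 0.50, 0.25],
    "EXT_SOURCE_3": [0.60, 0.70, 0.50, 0.40, 0.30, 0.20],
})

score_cols = ["EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3"]
medians = df.groupby("TARGET")[score_cols].median()
print(medians)

# True for a column when defaulters (TARGET = 1) have the lower median score.
print(medians.loc[1] < medians.loc[0])
```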
Family status of the applicant when they were applying for the loan
• Most of the applicants who have no difficulty in payment are married.

Final Outcome of Analysis: Risky or Safe bet?
• Young applicants (around 25 to 35 years old) who are unmarried and have started their current job recently (less than 6 years ago) are more likely to default.
• Applicants who have not completed secondary or higher education are more likely to default.
• Default is more likely when the loan credit amount is lower than 4.5 Lac INR.
• Applicants with no children are likely to default.
• Applicants with a higher EXT_SOURCE_1 score are a safe bet.
• Applicants with lower EXT_SOURCE_2 and EXT_SOURCE_3 scores are more likely to default.
