Credit Card EDA: Authored by
Introduction
This presentation describes our approach to the
analysis and shares findings that can help the
company identify applicants who are likely to
default on a loan.
Approach to case study
1. Data activities such as import, cleaning, and manipulation are done using the pandas (pd)
library in Python
2. Data inspection is done using
1. DataFrame.info()
2. DataFrame.head()
3. DataFrame.describe()
3. Outlier detection in this case study is generally done using
1. Box plots for numerical data
2. Series.value_counts() for categorical data
4. Outlier removal is done using the quantile() method*
5. Data visualization is done using matplotlib, seaborn, and pandas. The most widely used
visualization techniques are
1. Correlation heatmaps
2. Countplots
3. distplot (histogram)
4. Barplots, etc.
6. Understanding trends and comparing groups helped us interpret the analysis and
formulate our statements.
* Not all outliers were eliminated. We chose to eliminate only those outliers that lay very far from the main group of observations.
In specific cases, only a few outliers were removed.
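The inspection and outlier-removal steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the code used in the case study: the column names `income` and `contract_type` and the 0.95 quantile cutoff are assumptions chosen for the example.

```python
import pandas as pd

# Toy data standing in for the real application records (values are made up).
df = pd.DataFrame({
    "income": [40_000, 52_000, 48_000, 61_000, 45_000, 9_000_000],  # last value is far off the group
    "contract_type": ["Cash", "Cash", "Revolving", "Cash", "Cash", "Cash"],
})

# Inspection: column types, first rows, summary statistics.
df.info()
print(df.head())
print(df.describe())

# Categorical column: frequency of each level, to spot rare or odd categories.
print(df["contract_type"].value_counts())

# Outlier removal: keep rows at or below an upper quantile.
# The 0.95 cutoff is an assumed value for this sketch only.
upper = df["income"].quantile(0.95)
trimmed = df[df["income"] <= upper]
print(len(df), "rows before,", len(trimmed), "rows after")
```

A high cutoff like this removes only values that sit very far from the rest, which matches the intent of the footnote; a lower cutoff would discard more of the legitimate upper tail.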
Data Imbalance
Data imbalance is a problem in machine learning
classification in which the classes are represented
by very unequal numbers of observations
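One way to quantify imbalance is to compare class counts. The sketch below assumes a binary column named `TARGET` (1 for applicants with payment difficulties, 0 otherwise); that name and the values are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical binary target: 1 = payment difficulties, 0 = no difficulties.
target = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], name="TARGET")

counts = target.value_counts()                 # absolute count per class
shares = target.value_counts(normalize=True)   # fraction of rows per class

# Ratio of the majority class to the minority class.
imbalance_ratio = counts.max() / counts.min()

print(counts)
print(shares)
print("imbalance ratio:", imbalance_ratio)
```

A ratio well above 1 means the majority class dominates, so comparisons between the two groups are better made on proportions than on raw counts.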
[Figure: Number of family members of the current applicants with respect to applicant's difficulty to pay installments]
[Figure: Number of children of the current applicants with respect to applicant's difficulty to pay installments]
Yes, it can be
The number of applicants with financial difficulties
gradually decreases as age increases.
Most applicants with financial difficulties are in
the 25 to 35 age group.
Credit amount of defaulters & non-defaulters
[Figure: Credit amount of all defaulters]
[Figure: Credit amount of all non-defaulters]
Risky or Safe bet?
Applicants with a higher EXT_SOURCE_1 score are a safe bet.
Applicants with lower EXT_SOURCE_2 and EXT_SOURCE_3 scores are
more likely to default.
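A comparison of this kind can be sketched by averaging the scores within each group. The numbers below are invented for illustration and are not results from the case study; the `TARGET` column name (1 = defaulter, 0 = non-defaulter) is also an assumption.

```python
import pandas as pd

# Made-up scores for illustration only; not taken from the case-study data.
df = pd.DataFrame({
    "TARGET": [0, 0, 0, 1, 1, 1],  # assumed label: 1 = defaulter, 0 = non-defaulter
    "EXT_SOURCE_2": [0.70, 0.65, 0.60, 0.30, 0.35, 0.25],
    "EXT_SOURCE_3": [0.55, 0.60, 0.65, 0.20, 0.30, 0.25],
})

# Mean score per group; lower means for TARGET == 1 would support the statement above.
mean_scores = df.groupby("TARGET")[["EXT_SOURCE_2", "EXT_SOURCE_3"]].mean()
print(mean_scores)
```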