Business Analytics
Business Analytics Life Cycle
Learning Objectives
By the end of this lesson, you will be able to:
Explore business analytics life cycle phases to define the
roadmap and achieve business goals
List the challenges faced in each phase
Perform and outline business analytics life cycle phases with
the help of loan default prediction case study
Business Analytics Life Cycle (BALC)
Introduction
BALC is a framework that describes the process of using data and analytics to drive business decisions.
The phases involved are:
• Business understanding
• Data collection
• Data exploration
• Data modeling
• Data deployment
• Monitoring and maintenance
Business Understanding: Overview
This phase involves understanding and addressing the business problem or opportunity.
• Identifying the stakeholders
• Establishing the goals and objectives
• Defining the scope of the problem
Business Understanding: Example
A retail company wants to improve its customer retention. The phase would involve:
• Defining the problem
• Identifying stakeholders
• Establishing the scope
• Defining success metrics
• Identifying data sources
• Determining objectives
Business Understanding: Challenges
• Ambiguous problem definition
• Insufficient domain expertise
• Inadequate stakeholder involvement
• Limited data availability
• Frequent changes in business needs
Data Collection: Overview
Data is collected from various sources, including internal and external sources.
Steps for preparing data for analysis:
• Cleanse
• Integrate
• Transform
Data Collection: Example
After completing the business understanding phase, the retail company will collect data.
Data collected can be related to:
• Customer transactions
• Demographics
• Customer feedback
• Social media
• Competitors
• Website traffic
Data Collection: Challenges
• Data quality
• Data availability
• Data security and privacy
• Data integration
• Data volume
To overcome these challenges, it is essential to have a structured approach to data collection.
Data Exploration: Overview
The goal of data exploration is to gain insights and identify patterns, trends, and outliers that can
inform subsequent analysis.
Data exploration techniques:
• Descriptive statistics
• Data visualization
• Correlation analysis
• Data cleaning
• Outlier detection
Data Exploration: Example
After completing the data collection phase, the retail company will explore the collected data.
Examples of data exploration techniques are:
• Descriptive statistics: Calculate summary statistics
• Data visualization: Create visualizations
• Correlation analysis: Calculate correlation coefficients between variables
• Outlier detection: Identify and investigate outliers
• Data cleaning: Identify and address missing values
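These techniques can be sketched with pandas on a small, hypothetical customer dataset (column names and values below are illustrative, not from the case study):

```python
import numpy as np
import pandas as pd

# Hypothetical customer data standing in for the retail company's dataset
df = pd.DataFrame({
    "spend": [120.0, 95.0, 130.0, np.nan, 5000.0, 110.0],
    "visits": [12, 9, 14, 7, 480, 11],
})

summary = df.describe()                   # descriptive statistics
corr = df["spend"].corr(df["visits"])     # correlation analysis
missing = df.isna().sum()                 # data cleaning: count missing values

# Outlier detection: flag rows whose "visits" z-score exceeds 2
z = (df["visits"] - df["visits"].mean()) / df["visits"].std()
outliers = df[z.abs() > 2]
```

Here the row with 480 visits stands out as the single outlier, and the missing `spend` value is surfaced by the missing-value count.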
Data Exploration: Challenges
• Data quality issues
• Data complexity
• Bias and subjectivity
• Data privacy and security
• Time constraints
Data Modeling: Overview
This phase creates a mathematical representation of the data that captures the relationships between different variables.
Types of data models:
• Descriptive
• Predictive
• Prescriptive
Data Modeling: Example
After completing the data exploration phase, the retail company can use the following
data modeling approaches:
• Define the problem
• Clean and preprocess the data
• Select the modeling technique
• Train the model
• Validate the model
• Apply the model
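The steps above can be sketched with scikit-learn; the synthetic data and the choice of logistic regression here are placeholders for illustration, not the case study's actual dataset or model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for cleaned, preprocessed data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # problem definition: binary target

# Select a technique, train, validate, and apply the model
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)            # train
val_acc = accuracy_score(y_val, model.predict(X_val))   # validate
preds = model.predict(X_val)                            # apply
```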
Data Modeling: Challenges
• Data quality
• Data privacy and security
• Overfitting
• Underfitting
• Interpretability
• Model selection
Deployment: Overview
Model deployment is the process of integrating a data model into a production environment to
generate predictions or support decision-making. It involves:
• Preparing the model
• Selecting a deployment environment
• Testing and validation
• Integrating with other systems
• Monitoring and maintenance
Deployment: Challenges
• Model drift
• Scalability
• Integration with existing systems
• Security
• Regulatory compliance
• User adoption
• Data governance
A successful model deployment requires planning, testing, and maintenance to meet business needs.
Monitoring and Maintenance: Overview
It is essential for ensuring the accuracy, reliability, and usefulness of data-driven insights.
Some key considerations are:
• Performance monitoring
• Data quality monitoring
• Model validation
• Continuous improvement
• Data security
It is an ongoing process that requires regular attention and adjustment.
Monitoring and Maintenance: Techniques
• Performance monitoring
• Error analysis
• Feedback loops
• Automated testing
• Versioning
• Regular retraining
• Security monitoring
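One common, library-agnostic performance-monitoring check is the Population Stability Index (PSI), which compares the distribution of a feature or model score at deployment time with its current distribution. This sketch is a generic illustration added here, not part of the original deck:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a current sample.
    Values near 0 suggest a stable distribution; > 0.2 often signals drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) for empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
baseline = rng.normal(0, 1, 5000)   # scores at deployment time
stable = rng.normal(0, 1, 5000)     # later scores, same distribution
shifted = rng.normal(0.5, 1, 5000)  # later scores after drift
```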
Case Study: Loan Default Prediction
Problem Statement
When customers fail to pay their loans on time, banks suffer losses. These losses, which amount to
millions of dollars every year, have a significant impact on a country's economic growth.
In this case study, you will predict whether a person will default on a loan by examining various
factors such as location, loan balance, funded amount, and more.
A training and testing dataset of 67,463 rows by 35 columns and 28,913 rows by 34 columns,
respectively, is provided.
Source: [Link]
Data Description
• ID (Int): Unique ID of a representative
• Loan amount (Int): Loan amount applied for
• Funded amount (Int): Loan amount funded
• Funded amount investor (Float): Loan amount approved by the investors
• Term (Int): Term of the loan (in months)
• Batch enrolled (Object): Batch numbers assigned to representatives
• Interest rate (Float): Interest rate (%) on the loan
• Grade (Object): Grade by the bank
• Subgrade (Object): Subgrade by the bank
• Employment duration (Object): Duration of employment
• Home ownership (Float): Ownership of home
• Verification status (Object): Income verification by the bank
• Payment plan (Object): If any payment plan has been started against the loan
• Loan title (Object): Loan title provided
Data Description
• Revolving balance (Int): Total credit revolving balance
• Revolving utilities (Float): Amount of credit a representative is using relative to the revolving balance
• Total accounts (Int): Total number of credit lines available in a representative's credit line
• Initial list status (Object): Unique listing status of the loan (W for waiting and F for forwarded)
• Open account (Int): Number of open credit lines in the representative's credit line
• Public record (Int): Number of derogatory public records
• Debit to income (Float): Ratio of the representative's total monthly debt repayment divided by self-reported monthly income, excluding mortgage
• Delinquency two years (Int): Number of 30+ days delinquencies in the past two years
• Inquires in six months (Int): Total number of inquiries in the last six months
• Total received interest (Float): Total interest received to date
• Total received late fee (Float): Total late fees received to date
Data Description
• Recoveries (Float): Post charge-off gross recovery
• Collection recovery fee (Float): Post charge-off collection fee
• Collection 12 months medical (Int): Total collections in the last 12 months, excluding medical collections
• Application type (Object): Indicates whether the representative is an individual or joint
• Last week's pay (Int): Indicates how long (in weeks) a representative has paid EMI after the batch enrolled
• Accounts delinquent (Int): Number of accounts on which the representative is delinquent
• Total collection amount (Int): Total collection amount from all accounts
• Total current balance (Int): Total current balance from all accounts
• Total revolving credit limit (Int): Total revolving credit limit
• Loan status (Int): 1 = defaulter, 0 = non-defaulter (target feature)
Data Understanding
There are 67,463 observations and 35 features in the training dataset.
• Out of 35 features, there are:
o 9 features of datatype float
o 17 features of datatype int
o 9 features of datatype object
• Feature ID is the identifier
• Loan Status is the target feature
Data Understanding: Target
The target variable distribution indicates imbalanced data:
• Non-defaulters: 90.75% (61,222)
• Defaulters: 9.25% (6,241)
Problems with Imbalanced Data
Imbalanced data refers to a situation where the distribution of classes in the dataset is unequal. Some of the common problems are:
• Difficult to detect rare events
• Biased model performance
• Inaccurate evaluation metrics
Techniques to Address Imbalanced Data
• Oversampling
• Undersampling
• Cost-sensitive learning
• Changing the performance metric
• Ensemble learning
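Random oversampling, for instance, can be done with plain pandas by resampling the minority class with replacement; the tiny frame below is hypothetical:

```python
import pandas as pd

# Hypothetical imbalanced frame: 0 = non-defaulter, 1 = defaulter
df = pd.DataFrame({"x": range(20), "loan_status": [0] * 18 + [1] * 2})

minority = df[df["loan_status"] == 1]
majority = df[df["loan_status"] == 0]

# Random oversampling: draw minority rows with replacement until balanced
oversampled = minority.sample(n=len(majority), replace=True, random_state=0)
balanced = pd.concat([majority, oversampled], ignore_index=True)
```

Note that oversampling should be applied only to the training split, never to the test data, to avoid leaking duplicated rows into evaluation.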
Data Exploration: Examples
Univariate analysis [charts: distributions of individual features such as Interest rate and Debit to income]
Data Exploration: Examples
Bivariate analysis [charts]
Data Preparation
Missing values: There are no missing values in the data.
Duplicate values: There are no duplicate values in the data.
Low variance features:
1. Constant features (variance = 0)
2. Quasi-constant features (variance below a small threshold, such as 0.02)
• Feature accounts delinquent has variance = 0
• Collection 12 months medical and accounts delinquent are quasi-constant features.
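A sketch of flagging low-variance features with pandas; the sample values and the 0.21 cutoff below are assumptions chosen so the toy frame mirrors the constant and quasi-constant features named above:

```python
import pandas as pd

# Hypothetical frame with a constant and a quasi-constant column
df = pd.DataFrame({
    "accounts_delinquent": [0, 0, 0, 0, 0],          # constant (variance = 0)
    "collection_12m_medical": [0, 0, 0, 0, 1],       # quasi-constant
    "interest_rate": [11.1, 12.4, 9.8, 14.2, 10.3],  # informative feature
})

variances = df.var()
threshold = 0.21  # assumed cutoff for this sketch
low_variance = variances[variances <= threshold].index.tolist()
reduced = df.drop(columns=low_variance)
```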
Data Preparation
Outliers and anomalies can be detected using:
• Percentile method
• IQR method
• Box plot method
Per box plots, the following features have outliers:
• Funded amount investor
• Interest rate
• Home ownership
• Open account
• Revolving balance
• Total accounts
• Total received interest
• Total received late fee
• Recoveries
• Collection recovery fee
• Total collection amount
• Total current balance
• Total revolving credit limit
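The IQR method mentioned above can be sketched as follows; the values are made up, and the 1.5 × IQR fences are the conventional choice:

```python
import pandas as pd

# Hypothetical "interest rate" values with one extreme point
s = pd.Series([10.5, 11.0, 11.2, 10.8, 11.5, 10.9, 35.0])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # IQR fences
outliers = s[(s < lower) | (s > upper)]
```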
Hypothesis Generation
Check if the Target variable has a significant correlation with the Input features
Hypothesis Generation
Check if there is any kind of pattern between the Initial list status and the Loan status
Hypothesis Generation
Check if Subgrade is associated with the Loan status
Hypothesis Generation
On similar lines, you can check the effect of the following features on the target feature, that is, loan status:
• Application type
• Collection 12 months medical
• Term
• Employment duration
• Public record
• Inquiries - six months
• Grade
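Associations like these between a categorical feature and loan status are commonly checked with a chi-square test of independence; the counts below are hypothetical, not from the dataset:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical contingency table of loan status by grade (illustration only)
table = pd.DataFrame(
    {"non_default": [500, 420, 300], "default": [20, 45, 90]},
    index=["A", "B", "C"],
)

chi2, p_value, dof, expected = chi2_contingency(table)
associated = p_value < 0.05  # reject independence at the 5% level
```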
Outlier Treatment
Once outliers are identified, you need to decide on the appropriate treatment.
• Removal
• Transformation
• Imputation
By using these options, outliers can be treated.
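Transformation can take the form of percentile capping (winsorization); a sketch with assumed values and cutoffs:

```python
import pandas as pd

# Hypothetical skewed feature; cap at the 5th/95th percentiles (transformation)
s = pd.Series([100, 120, 110, 115, 105, 9000])

low, high = s.quantile(0.05), s.quantile(0.95)
capped = s.clip(lower=low, upper=high)  # winsorization-style capping
```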
Feature Encoding
It is the process of converting categorical variables into numerical values that can be used for
analysis or modeling. Techniques for feature encoding are:
• One-hot
• Label
• Ordinal
Binary Encoding
This technique creates binary columns for a categorical variable by using binary numbers. Other feature encoding techniques are:
1. Count
2. Target
3. Hashing
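A sketch of ordinal and one-hot encoding with pandas, using two of the case study's categorical features; the A < B < C grade order and the sample values are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({"grade": ["A", "B", "A", "C"],
                   "initial_list_status": ["W", "F", "F", "W"]})

# Ordinal encoding for Grade (assumed order A < B < C)
grade_order = {"A": 0, "B": 1, "C": 2}
df["grade_ord"] = df["grade"].map(grade_order)

# One-hot encoding for a nominal feature
df = pd.get_dummies(df, columns=["initial_list_status"])
```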
Categorical features and their unique value counts:
• Batch enrolled: 41
• Grade: 7
• Subgrade: 35
• Employment duration: 3
• Verification status: 3
• Payment plan: 1
• Loan title: 109
• Initial list status: 2
• Application type: 2
Data Pre-processing
The feature-wise roadmap for data pre-processing is as follows:
• Batch enrolled: Remove the "BAT" prefix and typecast to int
• Grade: Ordinal
• Subgrade: Ordinal, but too many unique values
• Employment duration: Manually typecast
• Verification status: Manually typecast
• Payment plan: Drop
• Loan title: Too many unique values
• Initial list status: Binary nominal
• Application type: Binary nominal
Model Selection
A number of models are tried and tested before deciding which one gives the best result.
Loan Default Prediction
Decision tree performance: [chart]
Bagging classifier performance: [chart]
Loan Default Prediction
Boosting algorithm performance: [chart]
Logistic regression performance: [chart]
Final Model
• You will use the XGBoost model as it gives the best results.
• The next step would be to fine-tune the model for better precision and recall.
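A sketch of the model-comparison step, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost (same boosting family; xgboost's XGBClassifier exposes a compatible fit/predict interface) and synthetic data in place of the loan dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic target with a feature interaction that a linear model misses
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Try several models and compare cross-validated accuracy
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    for name, model in [
        ("logistic", LogisticRegression()),
        ("boosting", GradientBoostingClassifier(random_state=0)),
    ]
}
```

Fine-tuning would then proceed with a hyperparameter search (for example, `GridSearchCV` over tree depth and learning rate) scored on precision and recall.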
Model Deployment
Production machine learning flow:
Business inputs → Data science (data engineering) → Packaging → Pipeline hardening → Model hardening → Deploy → Monitoring
Supporting components: Model security, Model governance, Model catalog, Feature catalog, Data catalog
Model Deployment: Approach
Considerations:
• Modularity
• Reproducibility
• Scalability
• Extensibility
• Testing
• Automation
ML architectures:
• Train by the batch; predict on the fly; serve via REST API
• Train by the batch; predict by the batch; serve through a shared database
• Train and predict by streaming
• Train by the batch; predict on the mobile (or by other clients)
Model Deployment: Comparison
Model Deployment: High-Level Architecture
• Evaluation layer
• Scoring layer
• Feature layer
• Data layer
Monitoring and Maintenance
Monitoring
Production machine learning needs:
• A monitoring mechanism that is model agnostic
• Instrumentation of both the data flowing in and the model performance metrics coming out
• Collection of performance metrics
Key Takeaways
The BALC is a framework that describes the process of using
data and analytics to drive business decisions.
The business understanding phase involves understanding the
business problem or opportunity that needs to be addressed.
The data collected from various sources is summarized and
visualized to understand the key characteristics of a dataset.
Successful model deployment requires planning, testing, and maintenance to meet business needs.