
Predictive Analytics & Predictive Modelling

Predictive analytics involves using advanced statistical methods to identify predictive variables and build models that forecast future trends in business performance. It encompasses both logic-driven models, which are based on business knowledge and relationships, and data-driven models that utilize collected data to establish quantitative relationships. Various methodologies, including regression analysis, data mining, and clustering, are employed to develop these predictive models and analyze data effectively.

Uploaded by kajin718
Copyright © All Rights Reserved

What Are Predictive Analytics?

Chapter objectives:
• Explain what logic-driven models are used for in business analytics (BA).
• Describe what a cause-and-effect diagram is used for in BA.
• Explain the difference between logic-driven and data-driven models.
• Explain how data mining can aid in BA.
• Explain why neural networks can be helpful in determining both associations and classification tasks required in some BA analyses.
• Explain how clustering is undertaken in BA.
• Explain how step-wise regression can be useful in BA.
• Explain how to use R-Squared adjusted statistics in BA.

6.1 Introduction
In Chapter 1, "What Are Business Analytics?" we defined predictive analytics as an application of advanced statistical, information software, or operations research methods to identify predictive variables and build predictive models to identify trends and relationships not readily observed in the descriptive analytic analysis. Knowing that relationships exist explains why one set of independent variables (predictive variables) influences dependent variables like business performance. Chapter 1 further explained that the purpose of the descriptive analytics step is to position decision makers to build predictive models designed to identify and predict future trends.
Picture a situation in which big data files are available from a firm's sales and customer information (responses to differing types of advertisements, customer surveys on product quality, customer surveys on supply chain performance, sale prices, and so on). Assume also that a previous descriptive analytic analysis suggests there is a relationship between certain customer variables, but there is a need to precisely establish a quantitative relationship between sales and customer behavior. Satisfying this need requires exploration into the big data to first establish whether a measurable, quantitative relationship does in fact exist and then develop a statistically valid model with which to predict future events. This is what the predictive analytics step in BA seeks to achieve.
Many methods can be used in this step of the BA process. Some serve only to sort or classify big data into manageable files from which to later build a precise quantitative model. As previously mentioned in Chapter 3, "What Resource Considerations Are Important to Support Business Analytics?" predictive modeling and analysis might consist of the use of methodologies, including those found in forecasting, sampling and estimation, statistical inference, data mining, and regression analysis. A commonly used methodology is multiple regression. (See Appendixes A, "Statistical Tools," and E, "Forecasting," for a discussion on multiple regression and ANOVA testing.) This methodology is ideal for establishing whether a statistical relationship exists between the predictive variables found in the descriptive analysis and the dependent variable one seeks to forecast. An example of its use will be presented in the last section of this chapter.
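The multiple-regression idea can be sketched in plain Python. This is a hypothetical illustration, not the procedure in Appendix A: it estimates coefficients by ordinary least squares through the normal equations \( (X^\top X)\beta = X^\top y \). The helper names `solve` and `fit_ols` and the advertising, price, and sales figures are invented for the example; the sales values were generated exactly from \( 50 + 3x_1 - 2x_2 \), so the fit should recover those coefficients. The sketch stops at estimation and does not run the t-tests or ANOVA needed to judge statistical significance.

```python
# Hypothetical sketch: multiple regression by ordinary least squares,
# solving the normal equations (X^T X) b = X^T y in pure Python.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, b2, ...] for the model y = b0 + b1*x1 + b2*x2 + ..."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept b0
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: advertising spend and unit price as predictors of sales.
ad_spend = [1, 2, 3, 4, 5, 6]
price = [10, 9, 11, 8, 12, 7]
sales = [33, 38, 37, 46, 41, 54]  # generated from 50 + 3*ad - 2*price

coef = fit_ols(list(zip(ad_spend, price)), sales)
print("intercept=%.3f ad=%.3f price=%.3f" % tuple(coef))
```

With real data the coefficients will not be exact, which is why the significance tests discussed in Appendix A still matter.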
Although single or multiple regression models can often be used to forecast a trend line into the future, sometimes regression is not practical. In such cases, other forecasting methods, such as exponential smoothing or smoothing averages, can be applied as predictive analytics to develop needed forecasts of business activity. (See Appendix E.) Whatever methodology is used, the identification of future trends or forecasts is the principal output of the predictive analytics step in the BA process.
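As a rough sketch of the smoothing approaches named above, the functions below implement simple exponential smoothing, \( F_{t+1} = \alpha A_t + (1 - \alpha) F_t \), and a simple moving average (one common reading of "smoothing averages"). The function names, the demand series, and the choice to seed the first forecast with the first observation are assumptions for illustration; Appendix E may set these up differently.

```python
# Hypothetical sketch of two smoothing forecasts.


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: F[t+1] = alpha * A[t] + (1 - alpha) * F[t].

    Returns one forecast per observed period plus a final forecast for the
    next, not-yet-observed period. The first forecast is seeded with the
    first observation (an assumed convention).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = series[0]
    forecasts = [forecast]
    for actual in series:
        forecast = alpha * actual + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


def moving_average(series, n):
    """Forecast the next period as the mean of the last n observations."""
    if not 1 <= n <= len(series):
        raise ValueError("n must be between 1 and len(series)")
    return sum(series[-n:]) / n


demand = [100, 110, 105, 120]  # hypothetical monthly demand
print(exponential_smoothing(demand, alpha=0.5))
print(moving_average(demand, n=3))
```

A larger alpha makes the exponential forecast react faster to recent changes; a smaller n does the same for the moving average.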

6.2 Predictive Modeling

Predictive modeling means developing models that can be used to forecast or predict future events. In business analytics, models can be developed based on logic or data.

6.2.1 Logic-Driven Models


A logic-driven model is one based on experience, knowledge, and logical relationships of variables and constants connected to the desired business performance outcome situation. The question here is how to put variables and constants together to create a model that can predict the future. Doing this requires business experience. Model building requires an understanding of business systems and the relationships of variables and constants that seek to generate a desirable business performance outcome. To help conceptualize the relationships inherent in a business system, diagramming methods can be helpful. For example, the cause-and-effect diagram is a visual aid diagram that permits a user to hypothesize relationships between potential causes of an outcome (see Figure 6.1). This diagram lists potential causes in terms of human, technology, policy, and process resources in an effort to establish some basic relationships that impact business performance. The diagram is used by tracing contributing and relational factors from the desired business performance goal back to possible causes, thus allowing the user to better picture sources of potential causes that could affect the performance. This diagram is sometimes referred to as a fishbone diagram because of its appearance.

Figure 6.1 Cause-and-effect diagram* (cause categories: Environment, Materials, Methods, Human Resources, Technology; effect: Poor business performance)

*Source: Adapted from Figure 5 in Schniederjans et al. (2014), p. 201.

Another useful diagram to conceptualize potential relationships with business performance variables is called the influence diagram. According to Evans (2013, pp. 228–229), influence diagrams can be useful to conceptualize the relationships of variables in the development of models. An example of an influence diagram is presented in Figure 6.2. It maps the relationship of variables and a constant to the desired business performance outcome of profit. From such a diagram, it is easy to translate the information into a quantitative model with constants and variables that define profit in this situation:

\[ \text{Profit} = \text{Revenue} - \text{Cost} \]

or

\[ \text{Profit} = (\text{Unit Price} \times \text{Quantity Sold}) - [\text{Fixed Cost} + (\text{Variable Cost} \times \text{Quantity Sold})] \]

or

\[ P = (UP \times QS) - [FC + (VC \times QS)] \]

Figure 6.2 An influence diagram (desired business performance: PROFIT (P), determined by the variables REVENUE (R) and COST (C); revenue depends on the variables UNIT PRICE (UP) and QUANTITY SOLD (QS); cost depends on the constant FIXED COST (FC) and the variable VARIABLE COSTS (VC))
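The profit model translates directly into code. In this minimal sketch the function names and dollar figures are hypothetical; `break_even_quantity` is an extra step not in the text, obtained by setting \( P = 0 \) and solving for \( QS = FC / (UP - VC) \).

```python
# Hypothetical sketch of the profit model P = (UP x QS) - [FC + (VC x QS)].


def profit(unit_price, quantity_sold, fixed_cost, variable_cost):
    revenue = unit_price * quantity_sold               # R = UP x QS
    cost = fixed_cost + variable_cost * quantity_sold  # C = FC + VC x QS
    return revenue - cost                              # P = R - C


def break_even_quantity(unit_price, fixed_cost, variable_cost):
    """Quantity at which P = 0, i.e. QS = FC / (UP - VC); requires UP > VC."""
    if unit_price <= variable_cost:
        raise ValueError("unit price must exceed variable cost per unit")
    return fixed_cost / (unit_price - variable_cost)


# Hypothetical figures: $40 unit price, $25 variable cost, $12,000 fixed cost.
print(profit(40, 1000, 12000, 25))
print(break_even_quantity(40, 12000, 25))
```

Writing the model this way makes "what-if" questions (a price change, a new fixed cost) a matter of changing one argument.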

The relationships in this simple example are based on fundamental business knowledge. Consider, however, how complex cost functions might become without some idea of how they are mapped together. It is necessary to be knowledgeable about the business systems being modeled in order to capture the relevant business behavior. Cause-and-effect diagrams and influence diagrams provide tools to conceptualize relationships, variables, and constants, but it often takes many other methodologies to explore and develop predictive models.

6.2.2 Data-Driven Models


Logic-driven modeling is often used as a first step to establish relationships through data-driven models (using data collected from many sources to quantitatively establish model relationships). To avoid duplication of content and focus on conceptual material in the chapters, most of the computational aspects and some computer usage content are relegated to the appendixes. In addition, some of the methodologies are illustrated in the case problems presented in this book. Please refer to the Additional Information column in Table 6.1 to obtain further information on the use and application of the data-driven models.

Table 6.1 Data-Driven Models

| Data-Driven Models | Possible Applications | Additional Information |
|---|---|---|
| Sampling and Estimation | Generate statistical confidence intervals to define limitations and boundaries on future forecasts for other forecasting models. | Chapter 5, "What Are Descriptive Analytics?"; Appendix A, "Statistical Tools"; Appendix E, "Forecasting" |
| Regression Analysis | (1) Create a predictive equation useful for time series forecasting. (2) Weed out predictive variables in forecasting models that add little to predicting values. (3) Generate a trend line for forecasting. | Chapter 6, "What Are Predictive Analytics?"; Chapter 8, "A Final Case Study Illustration"; Appendix E |
| Correlation Analysis | (1) Assess variable relationships. (2) Weed out predictive variables in forecasting models that add little to predicting values. | Chapter 6; Appendix E |
| Probability Distributions | (1) Estimate trend behavior that follows certain types of probability distributions. (2) Conduct statistical tests to confirm significance of variables. | Chapter 5; Appendix A |
| Predictive Modeling and Analysis | Fit linear and nonlinear models to data to use the models for forecasting. | Appendix A; Appendix E |
| Forecasting Models | Those listed in this table and others such as smoothing models can be used to forecast values. | Appendix E |
| Simulation | Project future behavior in variables by simulating the past behavior found in probability distributions. | Appendix F, "Simulation" |
Modeling Relationships and Trends in Data
Understanding both the mathematics and the descriptive properties of different functional relationships is important in building predictive analytical models. We often begin by creating a chart of the data to understand them and choose the appropriate type of functional relationship to incorporate into an analytical model. For cross-sectional data, we use a scatter chart; for time-series data, we use a line chart.
Common types of mathematical functions in predictive analytical models include the following:

• Linear function: \( y = a + bx \). Linear functions show steady increases or decreases over the range of x. This is the simplest type of function used in predictive models. It is easy to understand and, over small ranges of values, can approximate behavior rather well.
• Logarithmic function: \( y = \ln(x) \). Logarithmic functions are used when the rate of change in a variable increases or decreases quickly and then levels out, such as with diminishing returns to scale. Logarithmic functions are often used in marketing models where constant percentage increases in advertising, for instance, result in constant, absolute increases in sales.
• Polynomial function: \( y = ax^2 + bx + c \) (second order, quadratic function), \( y = ax^3 + bx^2 + cx + d \) (third order, cubic function), and so on. A second-order polynomial is parabolic in nature and has only one hill or valley; a third-order polynomial has one or two hills or valleys. Revenue models that incorporate price elasticity are often polynomial functions.
• Power function: \( y = ax^b \). Power functions define phenomena that increase at a specific rate. Learning curves that express improving times in performing a task are often modeled with power functions having \( a > 0 \) and \( b < 0 \).
• Exponential function: \( y = ab^x \). Exponential functions have the property that y rises or falls at constantly increasing rates. For example, the perceived brightness of a lightbulb grows at a decreasing rate as the wattage increases. In this case, a would be a positive number and b would be between 0 and 1. The exponential function is often defined as \( y = ae^x \), where \( b = e \), the base of natural logarithms (approximately 2.71828).
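The functional forms above can be written as small Python functions, which makes it easy to experiment with how each one behaves. These helpers are hypothetical; `logarithmic` adds optional parameters a and b (defaulting to the text's \( y = \ln(x) \)) so that it also covers the \( y = a + b\ln(x) \) form used by trendline fits, and the learning-curve parameters are made up.

```python
import math

# Hypothetical helpers for the functional forms listed above.


def linear(x, a, b):
    return a + b * x                  # y = a + bx


def logarithmic(x, a=0.0, b=1.0):
    return a + b * math.log(x)        # y = ln(x) when a = 0, b = 1; needs x > 0


def polynomial(x, coeffs):
    """Evaluate a polynomial whose coefficients are given highest order first (Horner's rule)."""
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y


def power(x, a, b):
    return a * x ** b                 # y = a x^b


def exponential(x, a, b=math.e):
    return a * b ** x                 # y = a b^x; the default b = e gives y = a e^x


# A learning curve: time per unit falls as cumulative units grow (a > 0, b < 0).
for units in (1, 4, 16):
    print(units, power(units, a=10.0, b=-0.5))
```

With \( b = -0.5 \), each fourfold increase in cumulative units halves the time per unit in this made-up curve.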

The Excel Trendline tool provides a convenient method for determining the best-fitting functional relationship among these alternatives for a set of data. First, click the chart to which you wish to add a trendline; this will display the Chart Tools menu. Select the Chart Tools Design tab, and then click Add Chart Element from the Chart Layouts group. From the Trendline submenu, you can select one of the options (Linear is the most common) or More Trendline Options. If you select More Trendline Options, you will get the Format Trendline pane in the worksheet (see Figure 8.1). A simpler way of doing all this is to right click on the data series in the chart and choose Add Trendline from the pop-up menu; try it! Select the radio button for the type of functional relationship you wish to fit to the data. Check the boxes for Display Equation on chart and Display R-squared value on chart. You may then close the Format Trendline pane. Excel will display the results on the chart you have selected; you may move the equation and R-squared value for better readability by dragging them to a different location. To clear a trendline, right click on it and select Delete.

Figure 8.1 Excel Format Trendline Pane (Trendline Options: Exponential, Linear, Logarithmic, Polynomial, Power, Moving Average; Trendline Name; Forecast Forward and Backward periods; Set Intercept; Display Equation on chart; Display R-squared value on chart)
R² (R-squared) is a measure of the "fit" of the line to the data. The value of R² will be between 0 and 1. The larger the value of R², the better the fit. We will discuss this further in the context of regression analysis.
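To see what the R² displayed by the Trendline tool measures, the sketch below fits a least-squares line and computes \( R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}} \), the share of variation in y explained by the line. The data points and function names are hypothetical; for a straight-line least-squares fit with an intercept, this value always falls between 0 and 1.

```python
# Hypothetical sketch: fit a least-squares linear trendline y = a + bx
# and compute R^2 = 1 - SS_residual / SS_total.


def linear_trendline(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def r_squared(y, fitted):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4]
y = [2, 4, 5, 4]  # hypothetical observations
a, b = linear_trendline(x, y)
r2 = r_squared(y, [a + b * xi for xi in x])
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r2:.3f}")
```

Points lying exactly on a line would give R² = 1; the scatter in these made-up points leaves about half the variation unexplained.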
Trendlines can be used to model relationships between variables and understand how the dependent variable behaves as the independent variable changes. For example, the demand-prediction models that we introduced in Chapter 1 (Examples 1.7 and 1.8) would generally be developed by analyzing data.

Modeling a Price-Demand Function

A market research study has collected data on sales volumes for different levels of pricing of a particular product. The data and a scatter diagram are shown in Figure 8.2 (Excel file Price-Sales Data). The relationship between price and sales clearly appears to be linear, so a linear trendline was fit to the data. The resulting model is

\[ \text{Sales} = 20{,}512 - 9.5116 \times \text{Price} \]

If the price is $125, we can estimate the level of sales as

\[ \text{Sales} = 20{,}512 - 9.5116 \times 125 = 19{,}323 \]

This model can be used as the demand function in other marketing or financial analyses.
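The same least-squares calculation can be applied to the Price-Sales data listed in Figure 8.2. The demand values below are transcribed from that figure; on these values the fitted slope, intercept, and R² should come out very close to the printed model (Sales = 20,512 − 9.5116 × Price, R² = 0.833), with any difference in the last digit of the slope most likely due to rounding or transcription. The variable names are illustrative.

```python
# Linear trendline for the Price-Sales data transcribed from Figure 8.2.
prices = list(range(50, 260, 10))  # $50 to $250 in $10 steps
demand = [
    19964.09, 19706.85, 20240.83, 19698.31, 20095.81, 19390.99, 19430.07,
    19273.69, 18716.38, 18925.36, 19484.78, 18934.38, 18915.77, 18893.37,
    18961.62, 18443.29, 18811.98, 18561.92, 18158.62, 18412.56, 17771.39,
]

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(demand) / n
sxx = sum((p - mean_x) ** 2 for p in prices)
sxy = sum((p - mean_x) * (d - mean_y) for p, d in zip(prices, demand))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 of the fitted line
fitted = [intercept + slope * p for p in prices]
ss_res = sum((d - f) ** 2 for d, f in zip(demand, fitted))
ss_tot = sum((d - mean_y) ** 2 for d in demand)
r2 = 1 - ss_res / ss_tot

sales_at_125 = intercept + slope * 125  # the $125 what-if from the example
print(f"Sales = {intercept:,.0f} + ({slope:.4f}) x Price, R^2 = {r2:.3f}")
print(f"Estimated sales at $125: {sales_at_125:,.0f}")
```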

Trendlines are also used extensively in modeling trends over time, that is, when the variable x in the functional relationships represents time. For example, an analyst for an airline needs to predict where fuel prices are going, and an investment analyst would want to predict the price of stocks or key economic indicators.

Predicting Crude Oil Prices

Figure 8.3 shows a chart of historical data on crude oil prices on the first Friday of each month from January 2006 through June 2008 (data are in the Excel file Crude Oil Prices). Using the Trendline tool, we can try to fit the various functions to these data (here x represents the number of months starting with January 2006). The results are as follows:

• Exponential: \( y = 50.49e^{\ldots x} \), \( R^2 = 0.664 \)
• Logarithmic: \( y = 13.02\ln(x) + 39.60 \), \( R^2 = 0.382 \)
• Polynomial (second order): \( y = 0.130x^2 - 2.40x + 68.01 \), \( R^2 = 0.905 \)
• Polynomial (third order): \( y = 0.005x^3 - 0.111x^2 + 0.648x + 59.497 \), \( R^2 = 0.928 \)
• Power: \( y = 45.96x^{0.0169} \), \( R^2 = 0.397 \)

The best-fitting model among these, which has the largest R², is the third-order polynomial, shown in Figure 8.4.

Figure 8.2 Price-Sales Data Scatter Chart with Fitted Linear Function (fitted trendline \( y = -9.5116x + 20512 \), \( R^2 = 0.833 \); horizontal axis: Price)

| Price | Demand |
|---|---|
| $50.00 | 19964.09 |
| $60.00 | 19706.85 |
| $70.00 | 20240.83 |
| $80.00 | 19698.31 |
| $90.00 | 20095.81 |
| $100.00 | 19390.99 |
| $110.00 | 19430.07 |
| $120.00 | 19273.69 |
| $130.00 | 18716.38 |
| $140.00 | 18925.36 |
| $150.00 | 19484.78 |
| $160.00 | 18934.38 |
| $170.00 | 18915.77 |
| $180.00 | 18893.37 |
| $190.00 | 18961.62 |
| $200.00 | 18443.29 |
| $210.00 | 18811.98 |
| $220.00 | 18561.92 |
| $230.00 | 18158.62 |
| $240.00 | 18412.56 |
| $250.00 | 17771.39 |

Figure 8.3 Chart of Crude Oil Prices Data (vertical axis: Price, $0.00 to $140.00)

Be cautious when using polynomial functions. The R² value will continue to increase as the order of the polynomial increases; that is, a third-order polynomial will provide a better fit than a second-order polynomial, and so on. Higher-order polynomials will generally not be very smooth and will be difficult to interpret visually. Thus, we don't recommend going beyond a third-order polynomial when fitting data. Use your eye to make a good judgment!
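The warning about polynomial order can be checked numerically. Because a lower-order polynomial is a special case of a higher-order one, the least-squares R² cannot fall as the order rises, even when the extra terms only chase noise. This hypothetical sketch fits orders 1 to 3 to invented data using the normal equations, which are adequate for a small example like this but become ill-conditioned for high orders.

```python
# Hypothetical sketch: least-squares polynomial fits of rising order on the
# same data, showing that R^2 does not fall as the order increases.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(x, y, degree):
    """Least-squares coefficients [c0, c1, ..., cd] for y = c0 + c1 x + ... + cd x^d."""
    k = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(x, y, coeffs):
    mean_y = sum(y) / len(y)
    fitted = [sum(c * xi ** p for p, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = list(range(1, 9))            # hypothetical periods 1..8
y = [3, 4, 4, 6, 9, 8, 12, 15]   # hypothetical observations
r2_by_degree = {d: r_squared(x, y, polyfit(x, y, d)) for d in (1, 2, 3)}
for d, r2 in r2_by_degree.items():
    print(f"order {d}: R^2 = {r2:.4f}")
```

A rising R² therefore says little on its own about whether the extra order is worth it, which is why the text recommends judging the curve visually as well.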
Of course, the proper model to use depends on the scope of the data. As the chart in Figure 8.3 shows, oil prices were relatively stable until early 2007 and then began to increase rapidly. If the early data are included, the long-term functional relationship may not adequately express the short-term trend. For example, fitting a model to only the data beginning with January 2007 yields these models:

• Exponential: \( y = 50.56e^{0.044x} \), \( R^2 = 0.969 \)
• Polynomial (second order): \( y = 0.121x^2 + 1.23x + 53.48 \), \( R^2 = 0.968 \)
• Linear: \( y = 3.55x + 45.76 \), \( R^2 = 0.944 \)

The difference in prediction can be significant. For example, predicting the price beyond the last data point, at x = 36, yields $172.25 for the third-order polynomial fit with all the data and $246.45 for the exponential model with only the recent data. Thus, you must be careful to select the proper amount of data for the analysis. The question then becomes one of choosing the best assumptions for the model. Is it reasonable to assume that prices would increase exponentially, or perhaps at a slower rate, such as with the linear model fit? Or would they level off and start falling? Clearly, factors other than historical trends would enter into this choice. In the latter half of 2008, oil prices plunged; thus, all predictive models are risky.
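The prediction comparison can be reproduced by evaluating the printed models at \( x = 36 \). The coefficients are the rounded values from the example (the minus sign on the \( x^2 \) term of the third-order polynomial is the reading that reproduces $172.25), so results are approximate; the second-order and linear recent-data values are computed the same way and are not quoted in the text. The function names are illustrative.

```python
import math

# Models from the crude oil example, using the rounded coefficients shown there.


def third_order_all_data(x):
    return 0.005 * x**3 - 0.111 * x**2 + 0.648 * x + 59.497


def exponential_recent(x):
    return 50.56 * math.exp(0.044 * x)


def quadratic_recent(x):
    return 0.121 * x**2 + 1.23 * x + 53.48


def linear_recent(x):
    return 3.55 * x + 45.76


x = 36  # the prediction point used in the text
models = {
    "third-order polynomial, all data": third_order_all_data,
    "exponential, recent data": exponential_recent,
    "second-order polynomial, recent data": quadratic_recent,
    "linear, recent data": linear_recent,
}
for name, model in models.items():
    print(f"{name}: ${model(x):.2f}")
```

Models with nearly identical R² on the fitted range can still diverge sharply once they are extrapolated, which is the point the example makes.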

1. State the common types of mathematical functions used in predictive analytics and their properties.
2. Explain how to use the Trendline tool in Excel.
3. What does R² measure?

Figure 8.4 Polynomial Fit of Crude Oil Prices Data (\( R^2 = 0.9282 \))
