Data Visualization
Quantitative Business Analysis
Introduction
• Data visualization involves:
– Creating a summary table for the data.
– Generating charts to help interpret, analyze, and learn from the data.
• Uses of data visualization:
– Helpful for identifying data errors.
– Reduces the size of your data set by highlighting important relationships
and trends in the data.
Effective Design Techniques
OVERVIEW OF DATA VISUALIZATION
Overview of Data Visualization
Effective Design Techniques:
• Data-ink ratio: Measures the proportion of what Tufte terms “data-ink” to
the total amount of ink used in a table or chart.
– Edward R. Tufte first described the data-ink ratio.
– Helpful for creating effective tables and charts for data visualization:
• Data-ink: Ink used in a table or chart that is necessary to convey the meaning of
the data to the audience.
• Non-data-ink: Ink used in a table or chart that serves no useful purpose in
conveying the data to the audience.
Overview of Data Visualization
Table 3.1: Example of a Day Sales Day Sales
Low Data-Ink Ratio 1 150 11 170
Table 2 170 12 160
3 140 13 290
Scarf Sales
4 150 14 200
5 180 15 210
6 180 16 110
7 210 17 90
8 230 18 140
9 140 19 150
10 200 20 230
Overview of Data Visualization
Figure 3.3: Example of a Low Data-Ink Ratio Chart
Overview of Data Visualization
Table 3.2: Increasing the Data-Ink Ratio by Removing
Unnecessary Gridlines Day Sales Day Sales
Scarf Sales 1 150 11 170
2 170 12 160
3 140 13 290
4 150 14 200
5 180 15 210
6 180 16 110
7 210 17 90
8 230 18 140
9 140 19 150
10 200 20 230
Overview of Data Visualization
Figure 3.4: Increasing the Data-Ink Ratio by Adding Labels to
Axes and Removing Unnecessary Lines and Labels
Table Design Principles
Crosstabulation
PivotTables in Excel
Recommended PivotTables in Excel
TABLES
Tables
Tables should be used when:
1. The reader needs to refer to specific numerical values.
2. The reader needs to make precise comparisons between different values
and not just relative comparisons.
3. The values being displayed have different units or very different magnitudes.
Tables
Table 3.3: Table Showing Exact Values for Costs and Revenues
by Month for Gossamer Industries
Month
1 2 3 4 5 6 Total
Costs ($) 48,123 56,458 64,125 52,158 54,718 50,985 326,567
Revenues ($) 64,124 66,128 67,125 48,178 51,785 55,687 353,027
Tables
Figure 3.5: Line Chart of Monthly Costs and Revenues at
Gossamer Industries
Tables
Figure 3.6: Combined Line
Chart and Table for Monthly
Costs and Revenues at
Gossamer Industries
Tables
Table 3.4: Table Displaying Head Count, Costs, and Revenues
at Gossamer Industries
Month 1 Month 2 Month 3 Month 4 Month 5 Month 6 Total
Head Count 8 9 10 9 9 9
Costs ($) 48,123 56,458 64,125 52,158 54,718 50,985 326,567
Revenues ($) 64,124 66,128 67,125 48,178 51,785 55,687 353,027
Tables
Table Design Principles:
– Avoid using vertical lines in a table unless they are necessary for clarity.
– Horizontal lines are generally necessary only for separating column titles
from data values or when indicating that a calculation has taken place.
Tables
Figure 3.7: Comparing Different Table Designs
Tables
Table 3.5: Larger Table Showing Revenues by Location for 12
Months of Data
Revenues by Location ($) Month 1 Month 2 Month 3 Month 4 Month 5 Month 6
Temple 8,987 8,595 8,958 6,718 8,066 8,574
Killeen 8,212 9,143 8,714 6,869 8,150 8,891
Waco 11,603 12,063 11,173 9,622 8,912 9,553
Belton 7,671 7,617 7,896 6,899 7,877 6,621
Granger 7,642 7,744 7,836 5,833 6,002 6,728
Harker Heights 5,257 5,326 4,998 4,304 4,106 4,980
Gatesville 5,316 5,245 5,056 3,317 3,852 4,026
Lampasas 5,266 5,129 5,022 3,022 3,088 4,289
Academy 4,170 5,266 7,472 1,594 1,732 2,025
Total 64,124 66,128 67,125 48,178 51,785 55,687
Costs ($) 48,123 56,458 64,125 52,158 54,718 50,985
Tables
Table 3.5 (cont.)
Revenues by
Location ($) Month 7 Month 8 Month 9 Month 10 Month 11 Month 12 Total
Temple 8,701 9,490 9,610 9,262 9,875 11,058 107,895
Killeen 8,766 9,193 9,603 10,374 10,456 10,982 109,353
Waco 11,943 12,947 12,925 14,050 14,300 13,877 142,967
Belton 7,765 7,720 7,824 7,938 7,943 7,047 90,819
Granger 7,848 7,717 7,646 7,620 7,728 8,013 88,357
Harker Heights 5,084 5,061 5,186 5,179 4,955 5,326 59,763
Gatesville 5,135 5,132 5,052 5,271 5,304 5,154 57,859
Lampasas 5,110 5,073 4,978 5,343 4,984 5,315 56,620
Academy 8,772 1,956 3,304 3,090 3,579 2,487 45,446
Total 69,125 64,288 66,128 68,128 69,125 69,258 759,079
Costs ($) 57,898 62,050 65,215 61,819 67,828 69,558 710,935
Tables
• Crosstabulation: A useful type of table for describing data of
two variables.
• PivotTable: A crosstabulation in Microsoft Excel.
Tables
Table 3.6: Quality Rating and Meal Price for 300 Los Angeles
Restaurants
Restaurant Quality Rating Meal Price ($) Wait Time (min)
1 Good 18 5
2 Very Good 22 6
3 Good 28 1
4 Excellent 38 74
5 Very Good 33 6
6 Good 28 5
7 Very Good 19 11
8 Very Good 11 9
9 Very Good 23 13
10 Good 13 1
Tables
Table 3.7: Crosstabulation of Quality Rating and Meal Price for
300 Los Angeles Restaurants
Meal Price
Quality Rating $10–19 $20–29 $30–39 $40–49 Total
Good 42 40 2 0 84
Very Good 34 64 46 6 150
Excellent 2 14 28 22 66
Total 78 118 76 28 300
• The greatest number of restaurants in the sample (64) have a very good rating and a meal price
in the $20–29 range.
• Only two restaurants have an excellent rating and a meal price in the $10–19 range.
• The right and bottom margins of the crosstabulation give the frequencies of quality rating and
meal price separately.
Tables
Figure 3.8: Excel Worksheet
Containing Restaurant Data
Tables
Figure 3.9: Initial
PivotTable Field
List and PivotTable
Field Report for the
Restaurant Data
Tables
Figure 3.10: Completed
PivotTable Field List
and a Portion of the
PivotTable Report for
the Restaurant Data
(Columns H:AK Are
Hidden)
Tables
Figure 3.11: Final
PivotTable Report
for the Restaurant
Data
Tables
Figure 3.12:
Percent
Frequency
Distribution as a
PivotTable for the
Restaurant Data
Tables
Figure 3.13: PivotTable Report for the Restaurant Data with
Average Wait Times Added
Scatter Charts
Recommended Charts in Excel Bubble Charts
Line Charts Heat Maps
Bar Charts and Column Charts Additional Charts for Multiple Variables
A Note on Pie Charts and PivotCharts in Excel
Three-Dimensional Charts
CHARTS
Charts
• Charts (or graphs): Visual methods of displaying data.
• Scatter chart: Graphical presentation of the relationship
between two quantitative variables.
• Trendline: A line that provides an approximation of the
relationship between the variables.
• Line chart: A line connects the points in the chart.
– Useful for time series data collected over a period of time (minutes,
hours, days, years, etc.).
Charts
Table 3.8: Sample Data for the San Francisco Electronics Store
No. of Commercials Sales ($100s)
Week x y
1 2 50
2 5 57
3 1 41
4 3 54
5 4 54
6 1 38
7 5 63
8 3 48
9 4 59
10 2 46
Charts
Figure 3.17:
Scatter Chart for
the San Francisco
Electronics Store
Charts
Month Sales ($100s)
Table 3.9: Monthly Jan 135
Feb 145
Sales Data of Air Mar 175
Compressors at Apr 180
May 160
Kirkland Industries Jun 135
Jul 210
Aug 175
Sep 160
Oct 120
Nov 115
Dec 120
Charts
Figure 3.19: Scatter Chart and Line Chart for Monthly Sales
Data at Kirkland Industries
Charts
Table 3.10: Regional Sales Data by Month for Air Compressors
at Kirkland Industries Month
Jan
Sales ($100s) North
95
Sales ($100s) South
40
Feb 100 45
Mar 120 55
Apr 115 65
May 100 60
Jun 85 50
Jul 135 75
Aug 110 65
Sep 100 60
Oct 50 70
Nov 40 75
Dec 40 80
Charts
Figure 3.21: Line
Chart of Regional
Sales Data at
Kirkland
Industries
Charts
Sparkline: Special type of line chart:
– Minimalist type of line chart that can be placed directly into a cell in Excel.
– Contains no axes; they display only the line for the data.
– Takes up very little space and can be effectively used to provide information
on overall trends for time series data.
Charts
Figure 3.22:
Sparklines for the
Regional Sales
Data at Kirkland
Industries
Charts
• Bar Charts: Use horizontal bars to display the magnitude of
the quantitative variable.
• Column Charts: Use vertical bars to display the magnitude of
the quantitative variable.
• Bar and column charts are very helpful in making comparisons
between categorical variables.
Charts
Figure 3.23: Bar Charts for
Accounts Managed Data
Gentry manages the greatest
number of accounts and
Williams the fewest.
Charts
Figure 3.24:
Sorted Bar Chart
for Accounts
Managed Data
Charts
Figure 3.25: Bar
Chart with Data
Labels for
Accounts
Managed Data
Charts
• Pie chart: Common form of chart used to compare categorical
data.
• Bubble chart: Graphical means of visualizing three variables in
a two-dimensional graph that sometimes is a preferred
alternative to a 3-D graph.
• Heat map: A two-dimensional graphical representation of data
that uses different shades of color to indicate magnitude.
Charts
Figure 3.26: Pie Chart
of Accounts Managed
Charts
Table 3.11: Sample Data on Billionaires per Country
Billionaires per 10M Per Capita No. of
Country Residents Income Billionaires
United States 54.7 $54,600 1,764
China 1.5 $12,880 213
Germany 12.5 $45,888 103
India 0.7 $ 5,855 90
Russia 6.2 $24,850 88
Mexico 1.2 $17,881 15
Charts
Figure 3.27: Bubble
Chart Comparing
Billionaires by Country
Charts
Figure 3.28:
Heat Map and
Sparklines for
Same-Store
Sales Data
Charts
Additional Charts for Multiple Variables:
– Stacked-column chart: Allows the reader to compare the relative values of
quantitative variables for the same category in a bar chart.
– Clustered-column (or bar) chart: An alternative chart to stacked-column
chart for comparing quantitative variables.
– Scatter-chart matrix: Useful chart for displaying multiple variables.
Charts
Figure 3.29: Stacked-Column Chart for Regional Sales Data for Kirkland Industries
Charts
Figure 3.30: Comparing
Stacked-, Clustered-,
and Multiple-Column
Charts for the Regional
Sales Data for Kirkland
Industries
Charts
Table 3.12: Data for New York City Sub-boroughs
Percentage
Median Monthly College
Area Rent ($) Graduates (%) Poverty Rate (%) Travel Time (min)
Astoria 1,106 36.8 15.9 35.4
Bay Ridge 1,082 34.3 15.6 41.9
Bayside/Little Neck 1,243 41.3 7.6 40.6
Bedford Stuyvesant 822 21.0 34.2 40.5
Bensonhurst 876 17.7 14.4 44.0
Borough Park 980 26.0 27.6 35.3
Charts
Table 3.12: Data for New York City Sub-boroughs (cont.)
Percentage
Median Monthly College
Area Rent ($) Graduates (%) Poverty Rate (%) Travel Time (min)
Brooklyn Heights/Fort Greene 1,086 55.3 17.4 34.5
Brownsville/Ocean Hill 714 11.6 36.0 40.3
Bushwick 945 13.3 33.5 35.5
Central Harlem 665 30.6 27.1 25.0
Chelsea/Clinton/Midtown 1,624 66.1 12.7 43.7
Coney Island 786 27.2 20.0 46.3
Charts
Figure 3.31: Scatter-Chart
Matrix for New York City
Rent Data
Charts
PivotCharts in Excel:
PivotChart: To summarize and analyze data with both a crosstabulation and
charting, Excel pairs PivotCharts with PivotTables.
Charts
Figure 3.32: PivotTable and PivotChart for the Restaurant
Data
Advanced Charts
Geographic Information Systems Charts
ADVANCED DATA VISUALIZATION
Advanced Data Visualization
Advanced Charts:
• Parallel-coordinates plot: Chart for examining data with more than two
variables:
– Includes a different vertical axis for each variable.
– Each observation is represented by drawing a line on the parallel-
coordinates plot connecting each vertical axis.
– The height of the line on each vertical axis represents the value
taken by that observation for the variable corresponding to the
vertical axis.
• Treemap: Useful for visualizing hierarchical data along multiple dimensions.
Advanced Data Visualization
Figure 3.33:
Parallel-Coordinates
Plot for Baseball
Data
Advanced Data Visualization
Figure 3.34:
SmartMoney’s
Map of the
Market as an
Example of a
Treemap
Advanced Data Visualization
Figure 3.35: Treemap Created in Excel for Top 100 Global
Companies Data
Advanced Data Visualization
Geographic Information Systems Charts:
– Geographic information system (GIS): A system that merges maps and
statistics to present data collected over different geographic areas.
– Helps in interpreting data and observing patterns.
Advanced Data Visualization
Figure 3.36: GIS
Chart for Cincinnati
Zoo Member Data
Advanced Data Visualization
Figure 3.38:
Completed 3D
Map Created in
Excel for World
GDP Data
Principles of Effective Data Dashboards
Applications of Data Dashboards
DATA DASHBOARDS
Data Dashboards
• Data dashboard: Data-visualization tool that illustrates
multiple metrics and automatically updates these metrics as
new data become available.
Principles of Effective Data Dashboards:
• Key performance indicators (KPIs) in dashboards:
– Automobile dashboard: Current speed, Fuel level, and oil pressure.
– Business dashboard: Financial position, inventory on hand, customer
service metrics.
Data Dashboards
Principles of Effective Data Dashboards (continued):
– Should provide timely summary information on KPIs that are important
to the user.
– Should present all KPIs as a single screen that a user can quickly scan to
understand the business’s current state of operations.
– The KPIs displayed in the data dashboard should convey meaning to its
user and be related to the decisions the user makes.
– A data dashboard should call attention to unusual measures that may
require attention.
– Color should be used to call attention to specific values to differentiate
categorical variables, but the use of color should be restrained.
Data Dashboards
Figure 3.39: Data Dashboard
for the Grogan Oil
Information Technology Call
Center