Commit 174fea1

Added bar plots
1 parent efd7efd commit 174fea1

2 files changed: 88 additions, 4 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ To install all of the libraries, run the commands in the "install.txt" file. The
 - **Scatterplot:** A scatter plot is a graphical method of displaying the relationship between data points. Each feature variable is assigned an axis. Each data point in the dataset is then plotted based on its feature values.
 - **Beeswarm Plot:** A beeswarm plot is a two-dimensional visualisation technique where data points are plotted relative to a fixed reference axis so that no two data points overlap. The beeswarm plot is a useful technique when we wish to see not only the measured values of interest for each data point, but also the distribution of these values.
 - **Cumulative Distribution Function:** The cumulative distribution function (CDF) gives the probability that a variable takes a value less than or equal to x. For example, we may wish to see what percentage of the data has a certain feature variable that is less than or equal to x.
+- **Bar Plots:** Classic bar plots are well suited to visualising and comparing summary statistics, especially statistics of different feature variables.
 
 #### Statistics
 - **Mean and Median:** Both of these show a type of "average" or "center" value for a particular feature variable. The mean is the arithmetic average of the values; the median, however, is much more robust to outliers, which may pull the mean far away from the majority of the values.
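The robustness claim in the Mean and Median entry can be checked with a small example. This is a minimal sketch using Python's standard `statistics` module; the numbers are hypothetical illustration values, not taken from the wine dataset.

```python
import statistics

# Hypothetical feature values: five typical measurements (assumed for illustration).
values = [12.8, 13.0, 13.1, 13.2, 13.4]

# The same measurements plus a single extreme outlier.
with_outlier = values + [130.0]

mean_clean = statistics.mean(values)
median_clean = statistics.median(values)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

# The outlier drags the mean far from the bulk of the data,
# while the median stays close to the typical values.
print("without outlier: mean =", mean_clean, "median =", median_clean)
print("with outlier:    mean =", mean_outlier, "median =", median_outlier)
```

With one outlier added, the mean moves well outside the range of the typical values, while the median stays between the two middle measurements.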

explore_wine_data.py

Lines changed: 87 additions & 4 deletions
@@ -5,6 +5,9 @@
 import matplotlib.pyplot as plt
 from sklearn.datasets import load_wine
 
+
+# ------------------------------------------------------------------------------------------------
+
 # Read in the data
 # NOTE that this loads as a dictionary
 wine_data = load_wine()
@@ -19,15 +22,21 @@
 
 print("The wine dataset has " + str(num_features) + " features")
 print(wine_data.feature_names)
-print("The wine dataset has " + str(num_classes) + " classes")
+print("The wine dataset has " + str(num_classes) + " categories")
 print(wine_data.target_names)
 
 
 # Put everything into a Pandas DataFrame
-data = pd.DataFrame(data=np.c_[train_data, train_labels], columns=wine_data.feature_names + ['class'])
+data = pd.DataFrame(data=np.c_[train_data, train_labels], columns=wine_data.feature_names + ['category'])
 # print(tabulate(data, headers='keys', tablefmt='psql'))
 
+# ------------------------------------------------------------------------------------------------
+
+
+
+
 
+# ------------------------------------------------------------------------------------------------
 
 # Create histogram
 hist_feature_name='color_intensity'
@@ -38,7 +47,72 @@
 plt.xlabel(hist_feature_name)
 plt.show()
 
+# ------------------------------------------------------------------------------------------------
+
+
+
+
+
+
+
+# ------------------------------------------------------------------------------------------------
+
+# Create grouped bar plot
+
+
+var_name_1 = 'alcohol'
+var_name_2 = 'color_intensity'
+
+
+# Setting the positions and width for the bars
+pos = list(range(num_classes))
+width = 0.1
+
+# Create the figure and axes
+fig, ax = plt.subplots(figsize=(10,5))
+
+# Center each x tick between the two bars of its group
+ax.set_xticks([p + 0.5 * width for p in pos])
+ax.set_xticklabels(list(range(num_classes)))
+
+class_0_data = data[data.category==0]
+alcohol_values_0 = class_0_data[var_name_1].values
+mean_alcohol_0 = np.mean(alcohol_values_0)
+color_values_0 = class_0_data[var_name_2].values
+mean_color_0 = np.mean(color_values_0)
+
+class_1_data = data[data.category==1]
+alcohol_values_1 = class_1_data[var_name_1].values
+mean_alcohol_1 = np.mean(alcohol_values_1)
+color_values_1 = class_1_data[var_name_2].values
+mean_color_1 = np.mean(color_values_1)
 
+class_2_data = data[data.category==2]
+alcohol_values_2 = class_2_data[var_name_1].values
+mean_alcohol_2 = np.mean(alcohol_values_2)
+color_values_2 = class_2_data[var_name_2].values
+mean_color_2 = np.mean(color_values_2)
+
+plt.bar(pos, [mean_alcohol_0, mean_alcohol_1, mean_alcohol_2], width, alpha=1.0, color='#EE3224', label=var_name_1)
+plt.bar([p + width for p in pos], [mean_color_0, mean_color_1, mean_color_2], width, alpha=1.0, color='#F78F1E', label=var_name_2)
+
+
+plt.legend([var_name_1, var_name_2], loc='upper left')
+
+plt.show()
+
+
+# ------------------------------------------------------------------------------------------------
+
+
+
+
+
+
+
+
+
+# ------------------------------------------------------------------------------------------------
 
 # Create scatterplot
 scatter_feature_name_1='color_intensity'
@@ -52,18 +126,27 @@
 
 
 # Create scatterplot matrix
-fig = sns.pairplot(data=data[['alcohol', 'color_intensity', 'malic_acid', 'magnesium', 'class']], hue='class')
+fig = sns.pairplot(data=data[['alcohol', 'color_intensity', 'malic_acid', 'magnesium', 'category']], hue='category')
 
 plt.show()
 
+# ------------------------------------------------------------------------------------------------
 
 
+
+# ------------------------------------------------------------------------------------------------
+
 # Create bee swarm plot
-sns.swarmplot(x='class', y='total_phenols', data=data)
+sns.swarmplot(x='category', y='total_phenols', data=data)
 plt.show()
 
+# ------------------------------------------------------------------------------------------------
+
+
+
 
 
+# ------------------------------------------------------------------------------------------------
 
 # Cumulative Distribution Function Plots
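
The hunk ends at the `# Cumulative Distribution Function Plots` comment, so the plotting code itself is not visible in this diff. To illustrate what an empirical CDF computes, here is a minimal sketch and not the script's actual implementation: the helper name `ecdf` and the sample values are hypothetical, and plain Python is used so the example runs without the dataset.

```python
def ecdf(values):
    """Return sorted values xs and cumulative fractions ys, with ys[i] = (i + 1) / n."""
    xs = sorted(values)
    n = len(xs)
    # ys[i] is the fraction of observations at or below xs[i]
    # (for tied values, the last occurrence carries the exact fraction).
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


# Hypothetical measurements standing in for one feature column of the dataset.
sample = [2.1, 3.4, 1.7, 2.8, 3.9]
xs, ys = ecdf(sample)

for x, y in zip(xs, ys):
    print(f"{y:.0%} of values are <= {x}")
```

The resulting pairs could be drawn as a step curve with matplotlib, for example `plt.step(xs, ys, where="post")`. A column such as `data['total_phenols'].values` could be passed in place of `sample`.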