Neural Network Parameter Estimation with the Conjugate Gradient Method
Continuing with the Coursera Machine Learning material. Last time, I used the conjugate gradient method (Conjugate Gradient: CG) for logistic regression parameter estimation (2014/4/15). This time, I applied the conjugate gradient method to parameter estimation for a more complex neural network, a multilayer perceptron.
Earlier, when I experimented with handwritten digit recognition using a multilayer perceptron (2014/2/1), I did not use the conjugate gradient method; instead I used gradient descent (Gradient Descent) and wrote the parameter update rules myself.
self.weight1 -= learning_rate * np.dot(delta1.T, x)
self.weight2 -= learning_rate * np.dot(delta2.T, z)
Gradient descent has drawbacks: unless the learning rate is set to an appropriate value, convergence is slow or the iteration diverges. The conjugate gradient method used this time (2014/4/14) does not require choosing a learning rate and is faster than gradient descent. An implementation is available in scipy.optimize, so I use that. As before, the example task is handwritten digit recognition on MNIST.
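As a quick reminder of the interface, here is a minimal sketch of how scipy.optimize.fmin_cg takes the objective function and, optionally, its gradient. The quadratic objective is made up for illustration and is not part of this article's script.

import numpy as np
from scipy import optimize

# toy objective f(x) = (x0 - 3)^2 + (x1 + 1)^2, minimum at (3, -1)
def f(x):
    return (x[0] - 3) ** 2 + (x[1] + 1) ** 2

# its gradient
def fprime(x):
    return np.array([2 * (x[0] - 3), 2 * (x[1] + 1)])

x_opt = optimize.fmin_cg(f, np.zeros(2), fprime=fprime)
print x_opt   # should be close to [3, -1]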
The neural network cost function
Python's conjugate gradient routines live in the scipy.optimize module. You have to supply the function to be minimized and its derivative. Stated without derivation, the cost function of the neural network is the somewhat intimidating expression below.
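This is the regularized cross-entropy cost from Coursera Machine Learning Week 5, written out here in LaTeX; the unit counts 400/25/10 are the ones used in the course, while the script below uses slightly different sizes.

J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\bigl(h_\Theta(x^{(i)})\bigr)_k + \bigl(1 - y_k^{(i)}\bigr) \log\Bigl(1 - \bigl(h_\Theta(x^{(i)})\bigr)_k\Bigr) \right] + \frac{\lambda}{2m} \left[ \sum_{j=1}^{25} \sum_{k=1}^{400} \bigl(\Theta^{(1)}_{j,k}\bigr)^2 + \sum_{j=1}^{10} \sum_{k=1}^{25} \bigl(\Theta^{(2)}_{j,k}\bigr)^2 \right]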
Here, m is the number of training examples, K is the number of output units, y_k^{(i)} is the teacher signal for the k-th output unit (in 1-of-K encoding), and h_\Theta(x^{(i)}) is the hypothesis function, which in a neural network is the prediction produced by forward propagation. The parameter \Theta to be estimated merges the weights between the input and hidden layers (\Theta^{(1)}) and the weights between the hidden and output layers (\Theta^{(2)}).
This neural network uses the cross-entropy error function instead of the squared error function. It is also regularized to prevent overfitting: the second term is the regularization term, which squares and sums all of the network's weights so that the weight values are kept from growing huge. 400 is the number of input units, 25 the number of hidden units, and 10 the number of output units (these are changed slightly in the script below). Written in Python, the cost computation looks like the code below. The network structure is assumed to be three layers: input, hidden, and output.
(Quoted from Coursera Machine Learning, Week 5)
J = 0
for i in range(m):
    xi = X[i, :]
    yi = y[i]

    # forward propagation
    a1 = xi
    z2 = np.dot(Theta1, a1)
    a2 = sigmoid(z2)
    a2 = np.hstack((1, a2))
    z3 = np.dot(Theta2, a2)
    a3 = sigmoid(z3)

    J += sum(-yi * safe_log(a3) - (1 - yi) * safe_log(1 - a3))

J /= m

# regularization term
temp = 0.0
for j in range(hid_size):
    for k in range(1, in_size + 1):  # the weights for the bias term are not included
        temp += Theta1[j, k] ** 2
for j in range(num_labels):
    for k in range(1, hid_size + 1):  # the weights for the bias term are not included
        temp += Theta2[j, k] ** 2
J += lam / (2.0 * m) * temp
The neural network estimates the parameters (Theta1 and Theta2) that minimize this cost function. Theta1 holds the weights between the input layer and the hidden layer, and Theta2 the weights between the hidden layer and the output layer.
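Because scipy's optimizers work on a single parameter vector, Theta1 and Theta2 are flattened into one vector and restored to matrices inside the cost function. A minimal sketch of that round trip, using the same shapes as the complete script at the end of this post:

import numpy as np

in_size, hid_size, num_labels = 28 * 28, 25, 10

# each weight matrix has an extra column for the bias term
Theta1 = np.zeros((hid_size, in_size + 1))
Theta2 = np.zeros((num_labels, hid_size + 1))

# flatten both matrices into a single parameter vector for the optimizer
nn_params = np.hstack((np.ravel(Theta1), np.ravel(Theta2)))

# restore the matrices inside the cost function
Theta1 = nn_params[0:(in_size + 1) * hid_size].reshape((hid_size, in_size + 1))
Theta2 = nn_params[(in_size + 1) * hid_size:].reshape((num_labels, hid_size + 1))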
Derivative of the cost function
To use scipy's conjugate gradient method, you need the partial derivatives of the cost function in addition to the cost function itself. In a neural network, backpropagation lets you compute these partial derivatives efficiently.
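In the notation of Coursera Week 5 (reconstructed here; it is what the code below implements), the partial derivatives of the regularized cost are

\frac{\partial J(\Theta)}{\partial \Theta^{(l)}_{ij}} = \frac{1}{m} \Delta^{(l)}_{ij} + \frac{\lambda}{m} \Theta^{(l)}_{ij} \quad (j \ge 1), \qquad \frac{\partial J(\Theta)}{\partial \Theta^{(l)}_{ij}} = \frac{1}{m} \Delta^{(l)}_{ij} \quad (j = 0),

where the accumulator is updated for each training example as \Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T.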
Here, the capital Δ is a matrix that can be computed from each layer's error δ and activation a, accumulated over all the training data. Written in Python, it looks like the following.
for i in range(m):
    # forward propagation
    # (omitted)

    # backpropagation
    delta3 = a3 - yi
    delta2 = np.dot(Theta2.T, delta3) * sigmoidGradient(np.hstack((1, z2)))
    delta2 = delta2[1:]  # drop the element corresponding to the bias term

    # the products below have to be vector x vector = matrix operations,
    # so a reshape into column vectors is needed
    # (passing -1 for the number of rows lets numpy fill it in)
    delta2 = delta2.reshape((-1, 1))
    delta3 = delta3.reshape((-1, 1))
    a1 = a1.reshape((-1, 1))
    a2 = a2.reshape((-1, 1))

    # accumulate the deltas (without regularization)
    Theta1_grad += np.dot(delta2, a1.T)
    Theta2_grad += np.dot(delta3, a2.T)

# regularization term of the partial derivatives
Theta1_grad /= m
Theta1_grad[:, 1:] += (lam / m) * Theta1[:, 1:]
Theta2_grad /= m
Theta2_grad[:, 1:] += (lam / m) * Theta2[:, 1:]
Computing the cost function and its partial derivatives in a single function
Last time, when I used the conjugate gradient method for logistic regression (2014/4/15), I prepared the cost function J() and the function gradient() returning its partial derivatives separately and passed them to fmin_cg().
# parameter estimation with Conjugate Gradient
theta = optimize.fmin_cg(J, initial_theta, fprime=gradient, args=(X, y))
With a neural network, computing the cost function requires forward propagation. Computing the partial derivatives requires backpropagation, which in turn has to be preceded by forward propagation. If the cost function and the partial derivatives are computed in separate functions, the forward propagation is therefore duplicated and efficiency suffers. In such cases, instead of fmin_cg() you can use the more general minimize() interface (scipy 0.11.0 or later). With minimize() you only have to pass a single function that returns both the cost and its partial derivatives (nnCostFunction in the sample).
# parameter estimation with Conjugate Gradient
# in the NN the cost function and its partial derivatives share computation,
# so both are returned by a single function (nnCostFunction)
# in that case, use minimize() instead of fmin_cg()
# minimize() requires scipy 0.11.0 or later
res = optimize.minimize(fun=nnCostFunction, x0=initial_nn_params, method="CG",
                        jac=True, options={'maxiter': 20, 'disp': True},
                        args=(in_size, hid_size, num_labels, X, y, lam))
Batch learning
Last time (2014/2/1) I used stochastic gradient descent (Stochastic Gradient Descent). As shown below, it picks one training example at random and updates the parameters once per example.
for k in range(epochs):
    # pick one training example at random
    i = np.random.randint(X.shape[0])
    x = X[i]
    # (snip)
    # update the parameters
    self.weight1 -= learning_rate * np.dot(delta1.T, x)
    self.weight2 -= learning_rate * np.dot(delta2.T, z)
This time I use batch learning instead of stochastic gradient descent: the error is accumulated over all the training data and the parameters are updated once per pass. This is repeated until convergence.
for i in range(m):  # m is the number of training examples
    xi = X[i, :]
    yi = y[i]
    # accumulate the error
    Theta1_grad += np.dot(delta2, a1.T)
    Theta2_grad += np.dot(delta3, a2.T)
This method moves straight toward the optimum, but it has the drawback that each update becomes slow as the training set grows. MNIST has 70000 examples, but batch updates are painfully slow, so I limited the training set to 5000 examples. Running the training, you can see the value of the cost function decreasing as the updates are repeated, so the optimization is working. As an intermediate between stochastic gradient descent and batch learning, mini-batch learning seems to be commonly used; I'd like to try it later (a rough sketch of the idea follows).
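A minimal, self-contained sketch of mini-batch updates, using a toy linear model purely to illustrate the update schedule (all names and values here are illustrative and not part of the script below): each update uses a small random subset of the data instead of a single example (SGD) or the whole set (batch).

import numpy as np

# toy data: y = 2*x + 1 plus noise (illustration only)
np.random.seed(0)
m = 1000
X = np.hstack((np.ones((m, 1)), np.random.rand(m, 1)))   # bias column + one feature
y = 2 * X[:, 1] + 1 + 0.1 * np.random.randn(m)

theta = np.zeros(2)
learning_rate = 0.5
batch_size = 100   # assumed mini-batch size

for epoch in range(50):
    perm = np.random.permutation(m)           # reshuffle every epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # gradient of the mean squared error over this mini-batch only
        grad = np.dot(Xb.T, np.dot(Xb, theta) - yb) / len(idx)
        theta -= learning_rate * grad          # one update per mini-batch

print theta   # should be close to [1, 2]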
Output like the following is produced. The accuracy was 99% on the training data and 92% on the test data.
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.336109
         Iterations: 70
         Function evaluations: 201
         Gradient evaluations: 201
*** training set accuracy
[[498   0   0   1   1   0   1   0   0   0]
 [  0 584   0   0   1   0   0   1   0   1]
 [  1   0 473   1   4   1   4   3   1   0]
 [  0   1   1 480   0   3   0   1   3   0]
 [  1   1   0   0 446   0   0   1   0   0]
 [  1   0   1   2   4 465   2   0   3   0]
 [  0   0   0   0   1   0 459   0   0   0]
 [  0   0   3   0   1   1   0 525   1   2]
 [  0   1   1   2   0   0   0   0 505   0]
 [  0   0   0   4   2   0   0   2   0 498]]
             precision    recall  f1-score   support

        0.0       0.99      0.99      0.99       501
        1.0       0.99      0.99      0.99       587
        2.0       0.99      0.97      0.98       488
        3.0       0.98      0.98      0.98       489
        4.0       0.97      0.99      0.98       449
        5.0       0.99      0.97      0.98       478
        6.0       0.98      1.00      0.99       460
        7.0       0.98      0.98      0.98       533
        8.0       0.98      0.99      0.99       509
        9.0       0.99      0.98      0.99       506

avg / total       0.99      0.99      0.99      5000

*** test set accuracy
[[203   0   2   1   1   0   2   1   0   0]
 [  0 226   1   0   0   3   0   0   1   1]
 [  2   2 175   0   2   0   4   1   5   1]
 [  2   1   6 168   0  10   0   2   0   2]
 [  1   3   1   0 183   0   7   0   0   5]
 [  1   0   2   6   2 165   3   2   5   5]
 [  2   2   1   0   3   1 187   0   1   0]
 [  0   0   2   1   2   0   0 196   0   3]
 [  1   4   2   2   0   5   3   0 159   4]
 [  4   0   0   3   8   4   0   5   0 179]]
             precision    recall  f1-score   support

        0.0       0.94      0.97      0.95       210
        1.0       0.95      0.97      0.96       232
        2.0       0.91      0.91      0.91       192
        3.0       0.93      0.88      0.90       191
        4.0       0.91      0.92      0.91       200
        5.0       0.88      0.86      0.87       191
        6.0       0.91      0.95      0.93       197
        7.0       0.95      0.96      0.95       204
        8.0       0.93      0.88      0.91       180
        9.0       0.90      0.88      0.89       203

avg / total       0.92      0.92      0.92      2000
Visualizing the hidden layer
The hidden-layer weights can be visualized with the code below.
# visualize the hidden units
displayData(Theta1[:, 1:])
The hidden layer of the neural network seems to capture features of the digit images like this, although there is hardly any trace of the original digits left... I wonder whether training on face images would give more interesting pictures. I'd like to try face image recognition next.
The complete script
https://github.com/sylvan5/PRML/blob/master/ch5/mlp_cg.py
#coding: utf-8
import numpy as np
from scipy import optimize
from matplotlib import pyplot
from sklearn.datasets import load_digits, fetch_mldata
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report

"""
Handwritten digit recognition with a regularized multilayer perceptron
- batch learning
- parameter optimization with the conjugate gradient method
"""

def displayData(X):
    """
    Pick 100 random samples from the data and visualize them.
    The image data is assumed to be 28x28 pixels.
    """
    # pick 100 samples at random
    sel = np.random.permutation(X.shape[0])
    sel = sel[:100]
    X = X[sel, :]
    for index, data in enumerate(X):
        pyplot.subplot(10, 10, index + 1)
        pyplot.axis('off')
        image = data.reshape((28, 28))
        pyplot.imshow(image, cmap=pyplot.cm.gray_r,
                      interpolation='nearest')
    pyplot.show()

def randInitializeWeights(L_in, L_out):
    """
    Return a weight matrix initialized with random values
    in the range (-epsilon_init, +epsilon_init).
    """
    # note: the incoming layer gets a bias term, hence the +1
    epsilon_init = 0.12
    W = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init
    return W

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def sigmoidGradient(z):
    return sigmoid(z) * (1 - sigmoid(z))

def safe_log(x, minval=0.0000000001):
    return np.log(x.clip(min=minval))

def nnCostFunction(nn_params, *args):
    """Compute the cost function of the NN and its partial derivatives."""
    in_size, hid_size, num_labels, X, y, lam = args

    # restore all the NN parameters to matrix form
    Theta1 = nn_params[0:(in_size + 1) * hid_size].reshape((hid_size, in_size + 1))
    Theta2 = nn_params[(in_size + 1) * hid_size:].reshape((num_labels, hid_size + 1))

    # partial derivatives of the parameters
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)

    # number of training examples
    m = X.shape[0]

    # add a column of 1s for the bias term to the training data
    X = np.hstack((np.ones((m, 1)), X))

    # convert the teacher labels to 1-of-K encoding
    lb = LabelBinarizer()
    lb.fit(y)
    y = lb.transform(y)

    J = 0
    for i in range(m):
        xi = X[i, :]
        yi = y[i]

        # forward propagation
        a1 = xi
        z2 = np.dot(Theta1, a1)
        a2 = sigmoid(z2)
        a2 = np.hstack((1, a2))
        z3 = np.dot(Theta2, a2)
        a3 = sigmoid(z3)

        J += sum(-yi * safe_log(a3) - (1 - yi) * safe_log(1 - a3))

        # backpropagation
        delta3 = a3 - yi
        delta2 = np.dot(Theta2.T, delta3) * sigmoidGradient(np.hstack((1, z2)))
        delta2 = delta2[1:]  # drop the element corresponding to the bias term

        # the products below have to be vector x vector = matrix operations,
        # so a reshape into column vectors is needed
        # (passing -1 for the number of rows lets numpy fill it in)
        delta2 = delta2.reshape((-1, 1))
        delta3 = delta3.reshape((-1, 1))
        a1 = a1.reshape((-1, 1))
        a2 = a2.reshape((-1, 1))

        # accumulate the deltas (without regularization)
        Theta1_grad += np.dot(delta2, a1.T)
        Theta2_grad += np.dot(delta3, a2.T)

    J /= m

    # regularization term
    temp = 0.0
    for j in range(hid_size):
        for k in range(1, in_size + 1):  # the weights for the bias term are not included
            temp += Theta1[j, k] ** 2
    for j in range(num_labels):
        for k in range(1, hid_size + 1):  # the weights for the bias term are not included
            temp += Theta2[j, k] ** 2
    J += lam / (2.0 * m) * temp

    # regularization term of the partial derivatives
    Theta1_grad /= m
    Theta1_grad[:, 1:] += (lam / m) * Theta1[:, 1:]
    Theta2_grad /= m
    Theta2_grad[:, 1:] += (lam / m) * Theta2[:, 1:]

    # flatten into a vector
    grad = np.hstack((np.ravel(Theta1_grad), np.ravel(Theta2_grad)))

    print "J =", J

    return J, grad

def predict(Theta1, Theta2, X):
    m = X.shape[0]
    num_labels = Theta2.shape[0]

    # forward propagation
    X = np.hstack((np.ones((m, 1)), X))
    h1 = sigmoid(np.dot(X, Theta1.T))
    h1 = np.hstack((np.ones((m, 1)), h1))
    h2 = sigmoid(np.dot(h1, Theta2.T))

    return np.argmax(h2, axis=1)

if __name__ == "__main__":
    # number of input units
    in_size = 28 * 28

    # number of hidden units
    hid_size = 25

    # number of output units (= number of labels)
    num_labels = 10

    # load the training data
    digits = fetch_mldata('MNIST original', data_home=".")
    X = digits.data
    X = X.astype(np.float64)
    X /= X.max()
    y = digits.target

    # shuffle the data
    p = np.random.permutation(len(X))
    X = X[p, :]
    y = y[p]

    # select the first 5000 samples
    X_train = X[:5000, :]
    y_train = y[:5000]

    # visualize the data
    displayData(X_train)

    # initialize the parameters randomly
    initial_Theta1 = randInitializeWeights(in_size, hid_size)
    initial_Theta2 = randInitializeWeights(hid_size, num_labels)

    # flatten the parameters into a single vector
    initial_nn_params = np.hstack((np.ravel(initial_Theta1), np.ravel(initial_Theta2)))

    # regularization coefficient
    lam = 1.0

    # compute the cost in the initial state
    J, grad = nnCostFunction(initial_nn_params, in_size, hid_size, num_labels, X_train, y_train, lam)

    # parameter estimation with Conjugate Gradient
    # the NN cost function and its partial derivatives share computation,
    # so both are returned by the same function (nnCostFunction)
    # in that case, use minimize() instead of fmin_cg()
    # minimize() requires scipy 0.11.0 or later
    res = optimize.minimize(fun=nnCostFunction, x0=initial_nn_params, method="CG",
                            jac=True, options={'maxiter': 70, 'disp': True},
                            args=(in_size, hid_size, num_labels, X_train, y_train, lam))
    nn_params = res.x

    # unpack the parameters
    Theta1 = nn_params[0:(in_size + 1) * hid_size].reshape((hid_size, in_size + 1))
    Theta2 = nn_params[(in_size + 1) * hid_size:].reshape((num_labels, hid_size + 1))

    # visualize the hidden units
    displayData(Theta1[:, 1:])

    # predict labels for the training data and compute accuracy
    pred = predict(Theta1, Theta2, X_train)
    print "*** training set accuracy"
    print confusion_matrix(y_train, pred)
    print classification_report(y_train, pred)

    # select test data from a range that does not overlap the training data and compute accuracy
    print "*** test set accuracy"
    X_test = X[10000:12000, :]
    y_test = y[10000:12000]
    pred = predict(Theta1, Theta2, X_test)
    print len(pred)
    print confusion_matrix(y_test, pred)
    print classification_report(y_test, pred)