Trying out the CNTK 2.0 tutorials
This time I am working through the tutorial roughly as a translation. There are bound to be mistranslations, but the aim is to grasp the general outline.
The text below includes a translation of the CNTK 2.0 tutorial and may contain translation errors; if you find a problem, I would appreciate it if you pointed it out.
from IPython.display import Image
CNTK 102: Classifying data with a feedforward network
In CNTK 101 we classified the data with logistic regression; this time we perform the classification with a feedforward network. That means solving the same problem as in CNTK 101.
The problem (same as CNTK 101)
The CNTK 101 tutorial classified data using logistic regression. The example taken up there was deciding, from two features (age and tumor size), whether a tumor is cancerous: with age on the X axis and tumor size on the Y axis, we determine whether the tumor is malignant. The data can be plotted as in the graph below, where the blue dots are benign tumors and the red dots are malignant ones. Furthermore, by drawing a regression line as in the figure, we can obtain a boundary between benign and malignant.
# Figure 1
Image(url="https://www.cntk.ai/jup/cancer_data_plot.jpg", width=400, height=400)
In CNTK 101 we used regression to search for a straight line that separates the data. In real-world problems, however, data that can be separated by a linear function is rare, and a single linear model falls short as a way of deriving good features. The goal of this tutorial is to derive a non-linear classifier by combining the answers of several linear functions (reusing the regression principle from CNTK 101).
Approach
Any learning algorithm goes through five stages: reading the data, preprocessing the data, creating the model, learning the model parameters, and evaluating the model (testing/prediction). Only the third stage, model creation, differs in this tutorial from CNTK 101: this time we use a feedforward network.
Feedforward network model
We train a feedforward model using the same dataset as in the regression tutorial. This model uses several regression units so that, when classifying data too complex for a single regression unit, it can still set suitable decision boundaries. Below is a typical network layout.
# Figure 2
Image(url="https://upload.wikimedia.org/wikipedia/en/5/54/Feed_forward_neural_net.gif", width=200, height=200)
A feedforward neural network is a network in which no cycles occur between the nodes. The feedforward neural network was the first and simplest type of artificial neural network to appear. In this network, information travels in a single direction only, forward: from the input nodes, through the hidden nodes (if any), to the output nodes. No cycles exist anywhere in the network.
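The "information flows only forward" idea can be sketched as a chain of plain function applications (a toy NumPy sketch with made-up layer sizes, independent of the CNTK code later in this tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(2)        # input nodes (2 features)
W1 = rng.standard_normal((2, 3))  # weights: input -> hidden
W2 = rng.standard_normal((3, 1))  # weights: hidden -> output

h = np.tanh(x @ W1)               # hidden nodes; depend only on the input
y = h @ W2                        # output node; nothing feeds back into x or h

print(y.shape)  # (1,)
```

Each layer depends only on the layers before it, which is exactly the "no cycles" property described above.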
In this tutorial we go through the steps needed to complete the five stages outlined above and to finish testing the model with test data; what we use differs only in the model-creation stage.
# Import the relevant components
from __future__ import print_function # Use a function definition from future version (say 3.x from 2.7 interpreter)
import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np
import sys
import os

import cntk as C
In the block below, we look for environment variables defined on the machine that specify the target to use when this tutorial's script is run with CNTK, and select and set the appropriate target device (GPU vs CPU). Otherwise we use CNTK's default policy of picking the best device available for execution (GPU if available, CPU otherwise).
# Select the right target device when this notebook is being tested:
if 'TEST_DEVICE' in os.environ:
    if os.environ['TEST_DEVICE'] == 'cpu':
        C.device.try_set_default_device(C.device.cpu())
    else:
        C.device.try_set_default_device(C.device.gpu(0))
Data generation
This section can be skipped if you have understood CNTK 101; feel free to move on to the next section, "Model creation".
We create synthetic data similar to the cancer sample, using Python's numpy library. The data generated here has two features (and so can be represented in two dimensions), and each data point belongs to one of two classes (benign data: blue, malignant data: red).

In this example the training data carries a label (blue or red) corresponding to each point (here we use two features: age and tumor size). We represent the two classes with the labels 0 and 1, so this is a binary classification problem.
# Ensure we always get the same amount of randomness
np.random.seed(0)

# Define the data dimensions
input_dim = 2
num_output_classes = 2
Input and labels

In this tutorial we create the synthetic data using the numpy library. In problems the reader will face in the real world, one observation exists for each set of feature data (age and tumor size). Here, each observation would not be two-dimensional but higher-dimensional (if further features existed); in CNTK these are represented as tensors. More advanced tutorials also introduce how to handle data of higher dimensionality.
# Helper function to generate a random data sample
def generate_random_data_sample(sample_size, feature_dim, num_classes):
    # Create synthetic data using NumPy.
    Y = np.random.randint(size=(sample_size, 1), low=0, high=num_classes)

    # Make sure that the data is separable
    X = (np.random.randn(sample_size, feature_dim)+3) * (Y+1)
    X = X.astype(np.float32)

    # converting class 0 into the vector "1 0 0",
    # class 1 into vector "0 1 0", ...
    class_ind = [Y==class_number for class_number in range(num_classes)]
    Y = np.asarray(np.hstack(class_ind), dtype=np.float32)
    return X, Y
# Create the input variables denoting the features and the label data. Note: the input
# does not need additional info on number of observations (Samples) since CNTK first creates
# only the network topology
mysamplesize = 64
features, labels = generate_random_data_sample(mysamplesize, input_dim, num_output_classes)
Let's visualize the data.

Note: if importing matplotlib.pyplot fails, run conda install matplotlib to fix the pyplot version-dependency problem.
# Plot the data
import matplotlib.pyplot as plt
%matplotlib inline

# given this is a 2 class
colors = ['r' if l == 0 else 'b' for l in labels[:,0]]

plt.scatter(features[:,0], features[:,1], c=colors)
plt.xlabel("Scaled age (in yrs)")
plt.ylabel("Tumor size (in cm)")
plt.show()
Model creation

Our feedforward network is relatively simple, with two hidden layers (num_hidden_layers); each hidden layer has 50 hidden nodes (hidden_layers_dim).
# Figure 3
Image(url="http://cntk.ai/jup/feedforward_network.jpg", width=200, height=200)
The green nodes above form the hidden layers; in this example there are 50 nodes per layer, and the number of hidden layers is 2. So what do the following values become?
- num_hidden_layers
- hidden_layers_dim
Note: the figure above does not show the bias nodes (explained in the regression tutorial). Each hidden layer does have a bias node.
num_hidden_layers = 2
hidden_layers_dim = 50
The network input and output are the following.

- input variable (a key CNTK concept)
The input variable is the container for the observations (data points, or samples; the blue/red dots in our example), which differ between model training and model evaluation (i.e. testing). Thus the shape of
input
must match the shape of the data that will be provided. For example, if each data point were an image 10 pixels tall by 5 pixels wide, the input feature dimensionality would be 2 (represented by the image's height and width). Similarly, in our example the input dimensions are age and tumor size, i.e. input_dim = 2. Other tutorials also deal with data of higher dimensionality.
Question: What is the input dimensionality for this model? This is essential for understanding CNTK's network and model variables.
# The input variable (representing 1 observation, in our example of age and size) $\bf{x}$ which
# in this case has a dimension of 2.
#
# The label variable has a dimensionality equal to the number of output classes in our case 2.

input = C.input_variable(input_dim)
label = C.input_variable(num_output_classes)
Setting up the feedforward network
Let's build the feedforward network one step at a time.
The first layer takes the input feature vector [tex: \bf{x}] of dimension input_dim (written [tex: m]) and passes it toward the hidden layer, which receives the evidence of dimension hidden_layers_dim (written here as [tex: n]). Each feature in the input layer is connected to the weight nodes, represented by a matrix [tex: \bf{W}] of shape ([tex: m \times n]). The first step is to compute the evidence for the entire feature set:

[tex: \bf{z}_1 = \bf{x} \cdot \bf{W} + \bf{b}]

Note: bold type is used here to denote matrices and vectors.

Here [tex: \bf{b}] is the bias vector of dimension [tex: n].
The linear_layer function performs the following two operations:

- multiply the feature data ([tex: \bf{x}]) by the weights ([tex: \bf{W}])
- add the bias term ([tex: \bf{b}])
def linear_layer(input_var, output_dim):
    input_dim = input_var.shape[0]

    weight = C.parameter(shape=(input_dim, output_dim))
    bias = C.parameter(shape=(output_dim))

    return bias + C.times(input_var, weight)
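The same bias-plus-times computation can be checked in plain NumPy (shapes chosen to match the example: m = 2 input features, n = 50 hidden nodes; this is an illustration, not CNTK code):

```python
import numpy as np

m, n = 2, 50                       # input_dim and hidden_layers_dim

x = np.random.randn(m)             # one observation as a feature vector
W = np.random.randn(m, n) * 0.1    # weight matrix of shape (m, n)
b = np.zeros(n)                    # bias vector of dimension n

z1 = x @ W + b                     # the evidence: times(x, W) plus the bias
print(z1.shape)  # (50,)
```

The evidence has one value per hidden node, which is why its dimension equals hidden_layers_dim.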
The next step is to transform the evidence (the output of the linear layer) with a non-linear function, the so-called "activation function". We choose an activation function and map the evidence through it (there are various activation functions to choose from); the sigmoid and tanh functions have historically been used. In this tutorial we use the sigmoid. The output of the sigmoid is often used as the input to the next layer, or as the output of the final layer.
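For reference, the sigmoid itself is tiny; a minimal NumPy version (an illustration only, since CNTK already provides C.sigmoid):

```python
import numpy as np

def sigmoid(z):
    # Maps any real evidence value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                      # 0.5
print(sigmoid(np.array([-10.0, 10.0])))  # values close to 0 and 1
```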
Question: try the other non-linear functions and get familiar with how to use them.
def dense_layer(input_var, output_dim, nonlinearity):
    l = linear_layer(input_var, output_dim)
    return nonlinearity(l)
To create a fully connected classifier, the hidden layers must be chained one after another as they are created: the output of the first layer becomes the input of the next. Since the network in this tutorial uses only two layers, the code looks like this:
h1 = dense_layer(input_var, hidden_layer_dim, sigmoid)
h2 = dense_layer(h1, hidden_layer_dim, sigmoid)
To evaluate networks with more layers quickly, a description like the following is preferable:
h = dense_layer(input_var, hidden_layer_dim, sigmoid)
for i in range(1, num_hidden_layers):
    h = dense_layer(h, hidden_layer_dim, sigmoid)
# Define a multilayer feedforward classification model
def fully_connected_classifier_net(input_var, num_output_classes, hidden_layer_dim,
                                   num_hidden_layers, nonlinearity):
    h = dense_layer(input_var, hidden_layer_dim, nonlinearity)
    for i in range(1, num_hidden_layers):
        h = dense_layer(h, hidden_layer_dim, nonlinearity)

    return linear_layer(h, num_output_classes)
The network output z is used as the output of the entire network.
# Create the fully connected classifier
z = fully_connected_classifier_net(input, num_output_classes, hidden_layers_dim,
                                   num_hidden_layers, C.sigmoid)
While the network description above helps you understand how to build a network from CNTK's primitive functions, there is also a way to use the library to build networks more quickly and easily. CNTK provides layers for the common parts (like Lego blocks), which makes it simple to design networks composed of standard layers. For example, dense_layer can simply be replaced by the Dense layer function when composing a deep model. Passing the input variable (input) to the model returns the network's output.
Suggested task: check the model defined above by comparing the output of the create_model function with the encapsulation of the code written above.
def create_model(features):
    with C.layers.default_options(init=C.layers.glorot_uniform(), activation=C.sigmoid):
        h = features
        for _ in range(num_hidden_layers):
            h = C.layers.Dense(hidden_layers_dim)(h)
        last_layer = C.layers.Dense(num_output_classes, activation = None)

        return last_layer(h)

z = create_model(input)
Training the model parameters

Now that the network is complete, we next train the model parameters. First, the output [tex: \bf{z}_{\rm final\_layer}] obtained from the network is converted into probabilities:

[tex: \bf{p} = {\rm softmax}\left( \bf{z}_{\rm final\_layer} \right)]
Training
To carry out training, we need to compute how far the values obtained through the network differ from the actual answers; that is, the probabilities obtained through the network must be pushed closer to the actual labels. This function is called the "loss function", and it indicates the error between the learned model and the actual training set. Here we use cross-entropy:

[tex: H(p) = -\sum_{j=1}^{C} y_j \log(p_j)]
[tex: p] is the probability generated by the softmax function, and [tex: y] denotes the label.
The labels provided with the data are also called "ground-truth labels".
In this tutorial there are only two label classes, so the label has two elements (equal to the value of num_output_classes, [tex: C]). Generally speaking, if the task at hand classifies data into [tex: C] different classes, the label has [tex: C] elements, all of which are 0 except for the element of the class the data point represents, which is 1.
Understanding the cross-entropy function is strongly recommended.
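As a concrete illustration of the loss, here is a standalone NumPy sketch of cross-entropy with softmax (an illustration only, not CNTK's implementation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy_with_softmax(z, y):
    # y is the one-hot ground-truth label; p are the predicted probabilities
    p = softmax(z)
    return -np.sum(y * np.log(p))

z = np.array([2.0, 0.5])           # network output (evidence) for C = 2 classes
good = cross_entropy_with_softmax(z, np.array([1.0, 0.0]))  # label matches the prediction
bad = cross_entropy_with_softmax(z, np.array([0.0, 1.0]))   # label does not match
print(good < bad)  # True: the loss is smaller when prediction and label agree
```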
loss = C.cross_entropy_with_softmax(z, label)
Evaluation
To evaluate the classification, we compare the network's output values, expressed as a vector of evidence (converted into probabilities by the softmax function), against the label; the output has as many elements as there are classes, so the two can be compared directly.
eval_error = C.classification_error(z, label)
There are various techniques for minimizing the error, but the best known is "Stochastic Gradient Descent (SGD)". Typically, this method starts with the model parameters set to random initial values. The SGD optimizer computes the error between the predicted labels and the ground-truth labels and uses gradient descent to generate the model parameters for the next iteration.
The model-parameter update above is attractive in that it does not have to use the entire dataset, loaded in memory, for each update. This technique updates the weights using a smaller number of data points, which makes training on much larger datasets possible. However, updating the values with a single sample at a time can make the results differ greatly from iteration to iteration. So instead we load several small sets of data, take the average of their errors and losses, and update the model parameters with that. This is called the minibatch method.
With minibatches, we draw on the larger training dataset: we repeat the model-parameter updates using different training samples, reducing the loss (and the processing time) as we go. When the error rate becomes small and large updates no longer occur, the model's training is complete.
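Schematically, one minibatch update averages the per-sample gradients before applying a single parameter step. A toy sketch on a linear least-squares problem (the model, data, and learning rate here are made up for illustration; this is not the CNTK trainer):

```python
import numpy as np

np.random.seed(0)
w_true = np.array([1.0, -2.0])           # hypothetical ground-truth parameters
X = np.random.randn(25, 2)               # one minibatch of 25 samples
y = X @ w_true                           # targets generated from w_true

w = np.zeros(2)                          # initial model parameters
lr = 0.1                                 # learning rate

for _ in range(300):
    residual = X @ w - y                 # prediction error per sample
    grad = 2 * X.T @ residual / len(X)   # gradient averaged over the minibatch
    w -= lr * grad                       # one update per minibatch, scaled by lr

print(np.round(w, 2))                    # converges toward w_true = [1.0, -2.0]
```

Averaging over the minibatch smooths out the iteration-to-iteration variation that single-sample updates would show.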
One of the keys to the optimization is the value called learning_rate. It determines how much the values computed in each iteration influence the model update. This value will be covered in detail in later tutorials.
Based on this information, let's configure the trainer.
# Instantiate the trainer object to drive the model training
learning_rate = 0.5
lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)
learner = C.sgd(z.parameters, lr_schedule)
trainer = C.Trainer(z, (loss, eval_error), [learner])
First we create a few support functions so that the training results can be visualized.
# Define a utility function to compute the moving average sum.
# A more efficient implementation is possible with np.cumsum() function
def moving_average(a, w=10):
    if len(a) < w:
        return a[:]    # Need to send a copy of the array
    return [val if idx < w else sum(a[(idx-w):idx])/w for idx, val in enumerate(a)]


# Defines a utility that prints the training progress
def print_training_progress(trainer, mb, frequency, verbose=1):
    training_loss = "NA"
    eval_error = "NA"

    if mb%frequency == 0:
        training_loss = trainer.previous_minibatch_loss_average
        eval_error = trainer.previous_minibatch_evaluation_average
        if verbose:
            print ("Minibatch: {}, Train Loss: {}, Train Error: {}".format(mb, training_loss, eval_error))

    return mb, training_loss, eval_error
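The comment above notes that np.cumsum() allows a more efficient implementation; one possible vectorized equivalent (a sketch that mirrors the helper's behaviour, including passing the first w values through unchanged):

```python
import numpy as np

def moving_average_cumsum(a, w=10):
    a = np.asarray(a, dtype=float)
    if len(a) < w:
        return list(a)                   # copy, like the helper above
    c = np.cumsum(np.insert(a, 0, 0.0))  # c[i] == sum(a[:i])
    win = (c[w:] - c[:-w]) / w           # mean of each length-w window
    # First w entries are kept as-is; entry idx >= w averages a[idx-w:idx]
    return list(a[:w]) + list(win[:-1])
```

For a list like [1, 2, 3, 4] with w=2 this yields [1.0, 2.0, 1.5, 2.5], matching the list-comprehension version.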
Running the training

Let's run the training. Each iteration feeds in 25 samples, which is the same as the minibatch size; here we train on 20,000 data points in total. Incidentally, in a real scenario labeled data like this would be provided in advance; we would then use 70% of the entire dataset for training and keep the rest for evaluating the model.
# Initialize the parameters for the trainer
minibatch_size = 25
num_samples = 20000
num_minibatches_to_train = num_samples / minibatch_size
# Run the trainer and perform model training
training_progress_output_freq = 20

plotdata = {"batchsize":[], "loss":[], "error":[]}

for i in range(0, int(num_minibatches_to_train)):
    features, labels = generate_random_data_sample(minibatch_size, input_dim, num_output_classes)

    # Specify the input variables mapping in the model to actual minibatch data for training
    trainer.train_minibatch({input : features, label : labels})
    batchsize, loss, error = print_training_progress(trainer, i,
                                                     training_progress_output_freq, verbose=0)

    if not (loss == "NA" or error =="NA"):
        plotdata["batchsize"].append(batchsize)
        plotdata["loss"].append(loss)
        plotdata["error"].append(error)
Let's look at how the training error varies across minibatches. By checking the values during training, we can see that the loss function is decreasing. Upticks in the value indicate that data was mispredicted in that iteration, which can occur when new values are fed in during training.
To reduce this phenomenon, one option is to increase the minibatch size. Ideally, using the entire dataset in every iteration would eliminate it altogether, and the loss function would then decrease over the iterations without oscillating. However, that technique requires computing the gradient over the entire dataset and applying a correspondingly huge number of model-parameter updates. In this simple example the cost is not that large, but with real-world data, where multiple passes over the whole dataset are needed for training, care must be taken that computing the parameter updates does not become prohibitively expensive.
Therefore we set the minibatch size to a small value, so as not to burden SGD while processing a large dataset. Later tutorials also introduce techniques for raising the computational efficiency on real datasets using CNTK's optimizers.
# Compute the moving average loss to smooth out the noise in SGD
plotdata["avgloss"] = moving_average(plotdata["loss"])
plotdata["avgerror"] = moving_average(plotdata["error"])

# Plot the training loss and the training error
import matplotlib.pyplot as plt

plt.figure(1)
plt.subplot(211)
plt.plot(plotdata["batchsize"], plotdata["avgloss"], 'b--')
plt.xlabel('Minibatch number')
plt.ylabel('Loss')
plt.title('Minibatch run vs. Training loss')
plt.show()

plt.subplot(212)
plt.plot(plotdata["batchsize"], plotdata["avgerror"], 'r--')
plt.xlabel('Minibatch number')
plt.ylabel('Label Prediction Error')
plt.title('Minibatch run vs. Label Prediction Error')
plt.show()
Evaluation / testing

Now the training of the network is complete. Let's evaluate the trained network using data that was not used during training. This is often called testing. We evaluate the network with a new dataset and measure the average error and loss. For this we use trainer.test_minibatch.
# Generate new data
test_minibatch_size = 25
features, labels = generate_random_data_sample(test_minibatch_size, input_dim, num_output_classes)

trainer.test_minibatch({input : features, label : labels})
0.04
This error rate is comparable to the error rate during training, which tells us that this network works very effectively even on data it has never observed before. This is the key to avoiding the phenomenon called overfitting.
So far we have been measuring and aggregating the error. For every observation, the evaluation function returns the probability of each class. When the default parameters of this tutorial are used, a 2-element vector is returned for each observation. We compute this by passing the network's output through the softmax function.
Why do we need to pass the network's output through the softmax function?
The way the network is configured, it emits the outputs of the activation nodes (for example, the green layer in Figure 4). The output nodes (the orange nodes in Figure 4) convert the activations into probabilities. A simple and effective way to do this is to pass the activations through the softmax function.
# Figure 4
Image(url="http://cntk.ai/jup/feedforward_network.jpg", width=200, height=200)
out = C.softmax(z)
predicted_label_probs = out.eval({input : features})

print("Label :", [np.argmax(label) for label in labels])
print("Predicted:", [np.argmax(row) for row in predicted_label_probs])
Label : [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0]
Predicted: [1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0]