A continuation of yesterday's article.
First, a walkthrough of the code, introducing Lasagne's gotchas along the way.
After that: deep learning (or deep neural networks? I'm not quite confident which name is correct), in short, once you try to train a model that mixes in convolution + max-pooling + dropout, the gotchas multiply, so I'll cover the caveats there along with sample code.
```python
import numpy
import lasagne
import theano
import theano.tensor as T

def digits_dataset(test_N=400):
    import sklearn.datasets
    data = sklearn.datasets.load_digits()

    # shuffle a permutation of record indices, reproducibly
    numpy.random.seed(0)
    z = numpy.arange(data.data.shape[0])
    numpy.random.shuffle(z)

    X = data.data[z>=test_N, :]
    y = numpy.array(data.target[z>=test_N], dtype=numpy.int32)
    test_X = data.data[z<test_N, :]
    test_y = numpy.array(data.target[z<test_N], dtype=numpy.int32)
    return X, y, test_X, test_y

X, y, test_X, test_y = digits_dataset()
N, input_dim = X.shape  # 1397 records, 64 features (8x8)
```
Building the dataset.
Unremarkable code, and yet one gotcha is already hiding in it.
There is nothing special about the features X: a plain ndarray of shape (records × dimensions) is all you need.
But when you specify softmax as the nonlinearity of Lasagne's output layer, the target labels y fed to that output layer are implicitly required to be int32.
If you carelessly pass in an ordinary int ndarray instead, you get scolded with:
```
TypeError: ('Bad input argument to theano function with name "****.py:**" at index 1(0-based)', 'TensorType(int32, vector) cannot store a value of dtype int64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to int32, or 2) set "allow_input_downcast=True" when calling "function".', array([...
```
and since the message gives no clue about which part of your code it is actually complaining about, you are in for some head-scratching.
When this error appears, wrap the labels y in numpy.array(*, dtype=numpy.int32).
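To make it concrete, a minimal sketch (the values here are made up for illustration):

```python
import numpy

y = numpy.array([0, 1, 2, 3])          # on most 64-bit platforms this is int64
y = numpy.array(y, dtype=numpy.int32)  # the explicit cast the error message asks for
print(y.dtype)                         # => int32
```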
```python
#### model
n_classes = 10
batch_size = 100

l_in = lasagne.layers.InputLayer(
    shape=(batch_size, input_dim),
)
l_hidden1 = lasagne.layers.DenseLayer(
    l_in,
    num_units=512,
    nonlinearity=lasagne.nonlinearities.rectify,
)
l_hidden2 = lasagne.layers.DenseLayer(
    l_hidden1,
    num_units=64,
    nonlinearity=lasagne.nonlinearities.rectify,
)
model = lasagne.layers.DenseLayer(
    l_hidden2,
    num_units=n_classes,
    nonlinearity=lasagne.nonlinearities.softmax,
)
```
A model is defined by starting from the input layer, lasagne.layers.InputLayer, and chaining layers together, each lasagne.layers.* call taking the previous layer as its first argument; the declaration of the final, appropriately chosen output layer then serves directly as the reference to the whole model.
Specifying the input dimensionality via the input layer's shape is natural enough, but you also have to specify, at model-definition time, how many records will be fed in at once. This is gotcha number... what are we up to now?
It is presumably an implementation convenience, but the Lasagne way is to pick a suitable batch_size and run training on the input data batch_size records at a time. If you instead set shape to the total number of records in the input data, you no longer have to split the data during training, which tidies up part of the code (hooray!), but then the training amounts to full-batch steepest descent and becomes painfully slow.
So you need to choose a sensible batch_size. Too small, and speed drops while training never settles down, with loss and accuracy bouncing around; too large, and it gets slow again, and the leftover records are wasted.
For sufficiently large data, my impression is that around 500 is right (mnist.py uses batch_size=600). Since this sample has fewer than 2,000 records, I set batch_size=100.
The output layer depends on what you want the neural network to do, but for what is probably the most common case, a multiclass classifier, you use lasagne.layers.DenseLayer as in the sample, setting num_units to the number of classes and nonlinearity to lasagne.nonlinearities.softmax.
I could have derived the number of classes with y.max()+1 or the like, but since I know it, I just wrote the literal.
```python
#### loss function
objective = lasagne.objectives.Objective(model,
    loss_function=lasagne.objectives.categorical_crossentropy)
X_batch = T.matrix('x')
y_batch = T.ivector('y')
loss_train = objective.get_loss(X_batch, target=y_batch)
```
Generating the objective function from the neural network model you defined is arguably where Lasagne truly shines.
Helpfully, somewhere along the way, by the time this Objective has been instantiated, the SharedVariables that store the parameters have all been prepared for you.
loss_train 㯠Theano ã® expression ã«ãªã£ã¦ããã®ã§ãtheano.function ã«é£ãããã°å®è¡ã³ã¼ãã«ã³ã³ãã¤ã«æ¸ã¿ã®é¢æ°ãå¾ããããTheano ããã¼ã
The rest of the job is to build whatever functions and processing you need from this loss_train.
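If Theano's expression-then-compile flow is unfamiliar, here is a standalone toy example (my own illustration, not part of the sample):

```python
import theano
import theano.tensor as T

a = T.dscalar('a')              # a symbolic scalar
expr = a ** 2 + 1               # builds an expression; nothing is computed yet
f = theano.function([a], expr)  # compiles the expression into executable code
print(f(3.0))                   # => 10.0
```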
```python
#### update function
learning_rate = 0.01
momentum = 0.9
all_params = lasagne.layers.get_all_params(model)
updates = lasagne.updates.nesterov_momentum(
    loss_train, all_params, learning_rate, momentum)

#### training
train = theano.function(
    [X_batch, y_batch], loss_train,
    updates=updates
)
```
Training in Lasagne updates the parameters through Theano's updates mechanism.
Lasagne will build the update function you pass to updates for you; just call the appropriate lasagne.updates.*.
Lasagne also provides plain SGD, but lasagne.updates.nesterov_momentum gives you SGD with Nesterov momentum. The idea is that the parameters will probably keep moving for a while in the direction they moved last time, so the point at which the gradient is taken is shifted slightly in that direction. It is a refinement of SGD aimed at deep networks, and it makes convergence faster... I think? mnist.py uses it, so this sample follows suit. A toy sketch of the rule follows.
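This is a minimal sketch of that update rule on a toy problem, my own illustration rather than Lasagne's actual implementation (which uses an equivalent rewritten form):

```python
# Minimize f(x) = 0.5 * x**2. Nesterov momentum takes the gradient at the
# "looked-ahead" point x + mu*v instead of at the current point x.
def grad(x):
    return x  # derivative of 0.5 * x**2

x, v = 5.0, 0.0
learning_rate, mu = 0.1, 0.9
for _ in range(100):
    v = mu * v - learning_rate * grad(x + mu * v)  # velocity update with look-ahead
    x += v
print(x)  # converges toward the minimum at 0
```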
lasagne.layers.get_all_params pops up rather out of nowhere here, but it simply returns the list of SharedVariables that store the parameters.
The update function of course has to know which parameters it is updating, so I understand why this is necessary; I'm just not convinced the user should have to write it themselves (wry smile).
It may feel odd that the learning rate and the momentum parameter of Nesterov momentum are supplied as fixed constants.
I did confirm that you can make them variable by declaring them as T.dscalar and passing them as arguments to theano.function (a sketch follows below), but that was harder to control than vanilla SGD and the results didn't improve much, so I give them as constants, the same as mnist.py.
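Concretely, the variable-rate version I tried looks roughly like this. It is a sketch reusing the names defined in the sample above (loss_train, all_params, X_batch, y_batch, momentum); the decay schedule in the final comment is just an example, not something from mnist.py:

```python
lr = T.dscalar('lr')  # symbolic learning rate in place of the 0.01 constant
updates = lasagne.updates.nesterov_momentum(
    loss_train, all_params, lr, momentum)
train = theano.function(
    [X_batch, y_batch, lr], loss_train,
    updates=updates
)
# e.g. train(X[ns], y[ns], 0.01 * 0.95 ** i)  # decay the rate each epoch
```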
In mnist.py, this train function is written so that the data lives in SharedVariables passed via givens; the function's argument is then just an index designating which slice of the data that call should operate on. Data transfer is said to be the dominant cost, especially when using a GPU, so the mnist.py approach is surely the more reliably efficient one.
For this sample I chose the simple form that passes the data as arguments, both because I wanted to keep the code plain and because the environment I was trying Lasagne in had no GPGPU available (heh). For reference, the mnist.py style is sketched below.
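This is my sketch of that style, again reusing the symbols defined above (X, y, batch_size, X_batch, y_batch, loss_train, updates), not code from the original article:

```python
import numpy
import theano
import theano.tensor as T

# Keep the data in SharedVariables (resident on the GPU when one is available)
# and let each call pick out its mini-batch by index through givens.
X_shared = theano.shared(numpy.asarray(X, dtype=theano.config.floatX))
y_shared = theano.shared(y)  # already int32

index = T.lscalar('index')
train = theano.function(
    [index], loss_train,
    updates=updates,
    givens={
        X_batch: X_shared[index * batch_size:(index + 1) * batch_size],
        y_batch: y_shared[index * batch_size:(index + 1) * batch_size],
    })
# train(j) then trains on the j-th mini-batch with no per-call data transfer.
```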
```python
#### prediction
loss_eval = objective.get_loss(X_batch, target=y_batch, deterministic=True)
pred = T.argmax(
    lasagne.layers.get_output(model, X_batch, deterministic=True), axis=1)
accuracy = T.mean(T.eq(pred, y_batch), dtype=theano.config.floatX)
test = theano.function([X_batch, y_batch], [loss_eval, accuracy])
```
Here we define the functions used for prediction.
The deterministic parameter is, I believe, passed straight through to noise layers such as Dropout, and setting it to True turns off the random dropping; in other words, dropout happens only during training.
Beyond that there should be nothing puzzling.
```python
#### inference
numpy.random.seed()
nlist = numpy.arange(N)
for i in xrange(100):
    numpy.random.shuffle(nlist)
    for j in xrange(N / batch_size):
        ns = nlist[batch_size*j:batch_size*(j+1)]
        train_loss = train(X[ns], y[ns])
    loss, acc = test(test_X, test_y)
    print("%d: train_loss=%.4f, test_loss=%.4f, test_accuracy=%.4f" % (i+1, train_loss, loss, acc))
```
With all the preparation finally done, we can run inference; but since the training function only accepts batch_size records at a time, this is the part you have to code up properly yourself. That said, it's readable enough at a glance, so it should be fine.
Since we're passing the data as arguments anyway, I shuffle the order in which it is passed. The remainder that doesn't divide evenly by batch_size goes unused in that iteration, but by randomizing, I expect the unused records not to be biased.
Evaluating the test data needs no batch_size split, so a single call does the job (for now, anyway...).
That is probably the simplest way to use Lasagne.
But if you want to do more deep-learning-flavored image processing with Lasagne, one more round of effort is required.
The digits dataset used in the sample consists of 8x8 images, so here is sample code for a model that takes them as input and mixes in convolution + max-pooling + dropout.
```python
import numpy
import lasagne
import theano
import theano.tensor as T

# dataset
def digits_dataset(input_width, input_height, test_N=400):
    import sklearn.datasets
    data = sklearn.datasets.load_digits()
    N = data.data.shape[0]
    X = data.data.reshape(N, 1, input_width, input_height)
    y = numpy.array(data.target, dtype=numpy.int32)

    numpy.random.seed(0)
    z = numpy.arange(data.data.shape[0])
    numpy.random.shuffle(z)
    test_X = X[z<test_N]
    test_y = y[z<test_N]
    X = X[z>=test_N]
    y = y[z>=test_N]
    return X, y, test_X, test_y

n_classes = 10
input_width = input_height = 8
X, y, test_X, test_y = digits_dataset(input_width, input_height)
N = X.shape[0]
test_N = test_X.shape[0]
print(X.shape, test_X.shape)

#### model
batch_size = 100

l_in = lasagne.layers.InputLayer(
    shape=(batch_size, 1, input_width, input_height),
)
l_conv1 = lasagne.layers.Conv2DLayer(
    l_in,
    num_filters=8,
    filter_size=(3, 3),
    nonlinearity=lasagne.nonlinearities.rectify,
    W=lasagne.init.GlorotUniform(),
)
l_pool1 = lasagne.layers.MaxPool2DLayer(l_conv1, pool_size=(2, 2))
l_hidden1 = lasagne.layers.DenseLayer(
    l_pool1,
    num_units=256,
    nonlinearity=lasagne.nonlinearities.rectify,
)
l_hidden1_dropout = lasagne.layers.DropoutLayer(l_hidden1, p=0.2)
l_hidden2 = lasagne.layers.DenseLayer(
    l_hidden1_dropout,
    num_units=64,
    nonlinearity=lasagne.nonlinearities.rectify,
)
model = lasagne.layers.DenseLayer(
    l_hidden2,
    num_units=n_classes,
    nonlinearity=lasagne.nonlinearities.softmax,
)

#### loss function
objective = lasagne.objectives.Objective(model,
    loss_function=lasagne.objectives.categorical_crossentropy)
X_batch = T.tensor4('x')
y_batch = T.ivector('y')
loss_train = objective.get_loss(X_batch, target=y_batch)

#### update function
learning_rate = 0.01
momentum = 0.9
all_params = lasagne.layers.get_all_params(model)
updates = lasagne.updates.nesterov_momentum(
    loss_train, all_params, learning_rate, momentum)

#### training
train = theano.function(
    [X_batch, y_batch], loss_train,
    updates=updates
)

#### prediction
loss_eval = objective.get_loss(X_batch, target=y_batch, deterministic=True)
pred = T.argmax(
    lasagne.layers.get_output(model, X_batch, deterministic=True), axis=1)
accuracy = T.mean(T.eq(pred, y_batch), dtype=theano.config.floatX)
test = theano.function([X_batch, y_batch], [loss_eval, accuracy])

#### inference
numpy.random.seed()
nlist = numpy.arange(N)
for i in xrange(100):
    numpy.random.shuffle(nlist)
    for j in xrange(N / batch_size):
        ns = nlist[batch_size*j:batch_size*(j+1)]
        train_loss = train(X[ns], y[ns])
    result = []
    for j in xrange(test_N / batch_size):
        j1, j2 = batch_size*j, batch_size*(j+1)
        result.append(test(test_X[j1:j2], test_y[j1:j2]))
    loss, acc = numpy.mean(result, axis=0)
    print("%d: train_loss=%.4f, test_loss=%.4f, test_accuracy=%.4f" % (i+1, train_loss, loss, acc))
```
It looks a lot like the original sample code, but quite a few details differ, so let me focus on those and wrap up this Lasagne tutorial.
- When the input is 2-dimensional data, it must be passed as a 4-dimensional tensor. The quickest way to see this is the line in digits_dataset() that reshapes X. The shape is (number of records, 1, width, height), and for some reason the second dimension has to be 1 (I haven't looked into why; a Theano constraint? Presumably it is the channel axis: 1 for a grayscale image, 3 for RGB).
- In the model definition, Conv2DLayer, MaxPool2DLayer, and DropoutLayer let you express convolution, max-pooling, and dropout. These should be clear from the code, so I'll skip the explanation.
- In the loss function definition, the variable X_batch representing the input data was T.matrix in the 1-dimensional case; for 2-dimensional data it becomes T.tensor4.
- In the 1-dimensional case only the update function was constrained by batch_size, but with 2-dimensional data the constraint extends to the loss function itself. That means test can no longer be called in one shot, so here too you have to split, loop, and average the results. Unlike training, testing can't just randomly subsample, so it is desirable that the test data size be an integer multiple of batch_size (one simple workaround is sketched after this list).
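If the test set is not such a multiple, the simplest workaround I can think of (my suggestion, not part of the original code) is to trim it to the largest multiple, at the cost of never evaluating the leftover rows:

```python
usable = (test_N // batch_size) * batch_size  # largest multiple of batch_size
test_X, test_y = test_X[:usable], test_y[:usable]
test_N = usable
```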