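I tried handwritten digit recognition on MNIST using PHP-ML, a machine learning library. I presented this as a lightning talk (LT) at the 第120回 PHP勉強会 (the 120th PHP Study Group meetup) held on November 29, so I am publishing the slides here.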
For the LT, I built a simple demo so that people could try recognizing handwritten digits with the trained model. On the website below, draw a digit with the mouse and click the "実行" (Run) button to see the classification result.
Handwritten digit recognition demo
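PHP-ML is a machine learning library implemented in PHP. At present it implements a number of representative algorithms for classification, regression, clustering, and so on. Because almost everything is implemented in PHP*1, my impression is that it is hard to use in practice on large-scale data, in terms of both processing speed and memory consumption. It may still be a good fit for playing with simple problems or for reading the code to learn.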
PHP-ML - Machine Learning library for PHP
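This time I applied logistic regression to the MNIST dataset for model training and classification. Multiclass classification is handled by a built-in one-vs-rest scheme, so the caller does not need to do anything special. On the other hand, the logistic regression implementation currently seems to have several bugs, and I needed to make a few fixes to get it working correctly. The code with those fixes is in my repository.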
[Added 2017-12-05] This has been fixed on the master branch.
Unfortunately, I could not go into the implementation details during the LT, so I explain them in detail in this article.
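To make the one-vs-rest idea mentioned above concrete: one binary classifier is trained per digit, and the digit whose classifier reports the highest score wins. The following is only a generic sketch of that decision rule, not PHP-ML's code; `predictOneVsRest` and the score values are hypothetical and introduced here for illustration.

```php
<?php
// Hypothetical helper (not part of PHP-ML): given one score per class,
// e.g. the probability reported by each "digit k vs. the rest" classifier,
// return the class label with the highest score.
function predictOneVsRest(array $scoresByClass): int
{
    if (count($scoresByClass) === 0) {
        throw new InvalidArgumentException('At least one class score is required.');
    }

    $bestLabel = null;
    $bestScore = -INF;
    foreach ($scoresByClass as $label => $score) {
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestLabel = $label;
        }
    }

    return $bestLabel;
}

// Made-up scores for the ten digit classifiers (index = digit).
$scores = [0.02, 0.10, 0.05, 0.81, 0.01, 0.30, 0.03, 0.07, 0.40, 0.12];
$digit = predictOneVsRest($scores);
echo "Predicted digit: ", $digit, "\n";
```

With these made-up scores the classifier for "3" is the most confident, so 3 is returned.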
Converting the MNIST dataset to PHP-ML's input format
First, download the MNIST dataset used here from its website.
MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges
In PHP-ML, the CsvDataset class reads a dataset in CSV format consisting of features and a label.
CSV Dataset - PHP-ML - Machine Learning library for PHP
So, extract the downloaded MNIST dataset and convert it into a CSV file in this format. A script like the following does the conversion. I covered the details in a previous article. Each pixel value ranges from 0 to 255, so the script divides by 255 to convert it to a floating-point number between 0 and 1.
Converting MNIST handwritten digit data to image files - y_uti's blog
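#!/bin/bash
# Convert the gzipped MNIST IDX files to CSV: 784 scaled pixel values, then the label.
# od separates values with whitespace, so awk's default field splitting is used.
for l in train t10k; do
  paste -d' ' \
    <(gzip -dc "$l-images-idx3-ubyte.gz" | od -An -v -tu1 -j16 -w784) \
    <(gzip -dc "$l-labels-idx1-ubyte.gz" | od -An -v -tu1 -j8 -w1) \
  | awk '{ for (i = 1; i < NF; i++) printf("%f,", $i / 255); printf("%d\n", $NF); }' >"$l.csv"
done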
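The `-j16` and `-j8` options in the script skip the IDX file headers: 16 bytes for the image files and 8 bytes for the label files. As a minimal sketch of what the image header contains, the following PHP decodes it as four big-endian 32-bit integers. `parseIdxImageHeader` is a hypothetical helper name introduced here, and the example feeds it a synthetic header rather than a real MNIST file.

```php
<?php
// Hypothetical helper: decode the 16-byte header of an IDX3 image file.
// Layout (big-endian 32-bit integers): magic number, image count, rows, columns.
function parseIdxImageHeader(string $bytes): array
{
    if (strlen($bytes) < 16) {
        throw new InvalidArgumentException('An IDX3 header needs at least 16 bytes.');
    }

    $header = unpack('Nmagic/Ncount/Nrows/Ncols', substr($bytes, 0, 16));
    if ($header['magic'] !== 2051) { // 0x00000803 marks an unsigned-byte, 3-dimensional file
        throw new InvalidArgumentException('Not an IDX3 unsigned-byte image file.');
    }

    return $header;
}

// Synthetic data: a header describing 60,000 images of 28x28 pixels,
// followed by one all-zero image.
$data = pack('N4', 2051, 60000, 28, 28) . str_repeat("\0", 28 * 28);
$info = parseIdxImageHeader($data);
printf("%d images of %dx%d pixels\n", $info['count'], $info['rows'], $info['cols']);
```

The label files use the same scheme with a shorter header (magic number 2049 followed by the item count), which is why the script skips only 8 bytes there.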
The result is a CSV file with 785 columns per row. Columns 1 through 784 hold the pixel values of a 28x28 grayscale image. The last column indicates which digit, 0 through 9, the image represents.
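As a quick sanity check of this layout, a row can be split into its 784 features and its label in plain PHP. This is only an illustrative sketch and not how CsvDataset is implemented; `parseMnistCsvRow` is a hypothetical helper, and the row below is synthetic.

```php
<?php
// Hypothetical helper: split one converted CSV row into [features, label].
// Assumes plain comma-separated numbers without quoting, as the script above emits.
function parseMnistCsvRow(string $line, int $numFeatures = 784): array
{
    $columns = explode(',', trim($line));
    if (count($columns) !== $numFeatures + 1) {
        throw new InvalidArgumentException(
            sprintf('Expected %d columns, got %d.', $numFeatures + 1, count($columns))
        );
    }

    $label = (int) array_pop($columns);          // last column: the digit 0-9
    $features = array_map('floatval', $columns); // first 784 columns: pixels in [0, 1]

    return [$features, $label];
}

// Synthetic row: 783 black pixels, one white pixel, and the label 7.
$line = implode(',', array_fill(0, 783, '0.000000')) . ",1.000000,7\n";
[$features, $label] = parseMnistCsvRow($line);
printf("%d features, label %d\n", count($features), $label);
```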
Training the model
A program that trains the model can be implemented as follows*2.
<?php
require_once __DIR__ . '/vendor/autoload.php';

use \Phpml\Classification\Linear\LogisticRegression;
use \Phpml\Dataset\CsvDataset;
use \Phpml\ModelManager;

// Load the training dataset
$dataset = new CsvDataset('train.csv', 784, false);

// Train the model
$classifier = new LogisticRegression(100, false, LogisticRegression::BATCH_TRAINING);
$classifier->train($dataset->getSamples(), $dataset->getTargets());

// Save the trained model
$modelManager = new ModelManager();
$modelManager->saveToFile($classifier, 'model.dat');
First, the program loads the dataset from the CSV file. The first two arguments are required and specify the file name and the number of feature dimensions. The third argument, false, indicates that the CSV file has no header row.
The core of the training step creates an instance of the LogisticRegression class and runs training. The constructor arguments specify the maximum number of iterations, whether to normalize the data, and the training method. ONLINE_TRAINING and CONJUGATE_GRAD_TRAINING can also be passed as the third argument, but as far as I tried, CONJUGATE_GRAD_TRAINING did not work well. In addition, the fourth and later arguments specify the cost function and the regularization term. See the comments in the source code for details.
php-ml/LogisticRegression.php at 0.5.0 · php-ai/php-ml · GitHub
The trained model can be saved to a file with the ModelManager class. Saving the model lets you separate model training from the classification step that uses the trained result.
Model training takes only a simple program like this, but actually running it requires substantial computing resources. In my test, with PHP 7.2.0 RC5 and memory_limit=1024M, training on train.csv ran out of memory.
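To get a rough feel for why memory_limit=1024M is tight, one can measure how much memory a small block of 784-element float rows occupies in PHP and extrapolate to 60,000 rows. This is a back-of-the-envelope sketch only: `estimateDatasetBytes` is a hypothetical helper, the result depends on the PHP version, and it covers just the raw feature array, not whatever additional copies or working data the library allocates during training.

```php
<?php
// Hypothetical helper: build $probeRows rows of $cols floats, measure the
// memory they occupy, and scale the per-row cost up to $rows rows.
function estimateDatasetBytes(int $rows, int $cols, int $probeRows = 100): int
{
    $before = memory_get_usage();

    $probe = [];
    for ($i = 0; $i < $probeRows; ++$i) {
        $row = [];
        for ($j = 0; $j < $cols; ++$j) {
            $row[] = $j / 255.0; // arbitrary float values; only the storage cost matters
        }
        $probe[] = $row;
    }

    $bytesPerRow = (memory_get_usage() - $before) / $probeRows;
    unset($probe);

    return (int) round($bytesPerRow * $rows);
}

$bytes = estimateDatasetBytes(60000, 784);
printf("Estimated size of 60,000 x 784 floats: %.0f MiB\n", $bytes / 1048576);
```

Because a PHP array element takes noticeably more than the 8 bytes of a raw double, the estimate comes out well above the roughly 359 MiB that 60,000 x 784 packed doubles would need.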
Classification
Classifying the test data with the trained model can be implemented as follows. The first half of the program is the classification itself, and the second half evaluates the classification accuracy.
<?php
require_once __DIR__ . '/vendor/autoload.php';

use \Phpml\Classification\Linear\LogisticRegression;
use \Phpml\Dataset\CsvDataset;
use \Phpml\Metric\Accuracy;
use \Phpml\Metric\ConfusionMatrix;
use \Phpml\ModelManager;

// Load the dataset to classify
$dataset = new CsvDataset('t10k.csv', 784, false);

// Load the trained model
$modelManager = new ModelManager();
$classifier = $modelManager->restoreFromFile('model.dat');

// Classify
$predicted = $classifier->predict($dataset->getSamples());

// Compute the accuracy
$accuracy = Accuracy::score($dataset->getTargets(), $predicted);
echo 'Accuracy = ', $accuracy * 100, "%\n";

// Compute the confusion matrix
$confusionMatrix = ConfusionMatrix::compute($dataset->getTargets(), $predicted);
foreach ($confusionMatrix as $row) {
    array_walk($row, function ($col) {
        printf('%4d ', $col);
    });
    echo "\n";
}
First, the program loads the dataset from the CSV file, just as in model training. Next, it uses the ModelManager class to load the model that was saved to a file. With these, the predict method of the LogisticRegression class performs the classification. The resulting $predicted is an array whose elements each take a value from 0 to 9 and represent the classification result for the corresponding sample in the dataset.
The second half of the program uses the Accuracy and ConfusionMatrix classes to compute and print the accuracy and the confusion matrix. Metrics like these let you evaluate how good the trained model is.
Accuracy - PHP-ML - Machine Learning library for PHP
Confusion Matrix - PHP-ML - Machine Learning library for PHP
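To show what these two metrics compute, here is a dependency-free sketch on a toy example with three classes. It is not PHP-ML's implementation; the function names `accuracy` and `confusionMatrix` and the label arrays are hypothetical, and integer labels starting at 0 are assumed.

```php
<?php
// Fraction of samples whose predicted label equals the true label.
function accuracy(array $actual, array $predicted): float
{
    $hits = 0;
    foreach ($actual as $i => $label) {
        if ($label === $predicted[$i]) {
            ++$hits;
        }
    }
    return $hits / count($actual);
}

// $matrix[$t][$p] counts the samples with true label $t that were predicted as $p.
function confusionMatrix(array $actual, array $predicted, int $numClasses): array
{
    $matrix = array_fill(0, $numClasses, array_fill(0, $numClasses, 0));
    foreach ($actual as $i => $label) {
        ++$matrix[$label][$predicted[$i]];
    }
    return $matrix;
}

// Toy data: six samples, three classes.
$actual    = [0, 1, 2, 2, 1, 0];
$predicted = [0, 2, 2, 2, 1, 1];

$score = accuracy($actual, $predicted);
$matrix = confusionMatrix($actual, $predicted, 3);

printf("Accuracy = %.2f%%\n", $score * 100);
foreach ($matrix as $row) {
    echo implode(' ', $row), "\n";
}
```

Four of the six toy samples are classified correctly, and the off-diagonal entries show which classes were confused with each other.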
Example run
Here is what running the programs above looks like. My environment did not have enough resources to train on all 60,000 samples, so I take the first 10,000 rows of the CSV as the training data.
$ mv train.csv train-all.csv
$ head -n 10000 train-all.csv >train.csv
Train the model with the program above, then classify t10k.csv.
$ php train.php
$ php predict.php
Accuracy = 84.95%
 953    0    3    6    1    4   10    2    1    0
   0 1105   14    3    0    6    4    0    3    0
   5    9  905   40   11    6   20   21   12    3
   5    1   25  916    3   42    3   10    2    3
   1    1   11    2  926    4   10    6    1   20
  15    4    5   41   16  780   14    7    6    4
   8    3   14    2   11   25  892    3    0    0
   6    9   39    7    5    5    1  945    0   11
  14   14  109   94   18  285   19   28  388    5
   9    7   10   24  121   55    1   97    0  685
The accuracy was 84.95%. In the confusion matrix, each row corresponds to the true label and each column to the predicted label, both in order from 0 to 9. For example, the value 14 at row 2, column 3 means that 14 samples that are actually "1" were classified as "2". Entries on the diagonal are samples whose true digit and predicted digit match, that is, successful classifications. The diagonal sums to 8,495 out of 10,000 samples in total, which gives the accuracy of 84.95%. Beyond the overall accuracy, the confusion matrix also shows, for example, that "1" is classified well while "8" has extremely low accuracy.
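The per-digit reading of the matrix can be checked with a few lines of PHP. The sketch below copies the matrix from the run above and computes the overall accuracy from the diagonal as well as the fraction of each true digit that was classified correctly (the per-class recall). The function names are hypothetical and introduced only for this illustration.

```php
<?php
// Confusion matrix from the run above: rows are true digits, columns are predictions.
$matrix = [
    [953,    0,   3,   6,   1,   4,  10,   2,   1,   0],
    [  0, 1105,  14,   3,   0,   6,   4,   0,   3,   0],
    [  5,    9, 905,  40,  11,   6,  20,  21,  12,   3],
    [  5,    1,  25, 916,   3,  42,   3,  10,   2,   3],
    [  1,    1,  11,   2, 926,   4,  10,   6,   1,  20],
    [ 15,    4,   5,  41,  16, 780,  14,   7,   6,   4],
    [  8,    3,  14,   2,  11,  25, 892,   3,   0,   0],
    [  6,    9,  39,   7,   5,   5,   1, 945,   0,  11],
    [ 14,   14, 109,  94,  18, 285,  19,  28, 388,   5],
    [  9,    7,  10,  24, 121,  55,   1,  97,   0, 685],
];

// Overall accuracy: sum of the diagonal divided by the number of samples.
function accuracyFromMatrix(array $matrix): float
{
    $correct = 0;
    $total = 0;
    foreach ($matrix as $digit => $row) {
        $correct += $row[$digit];
        $total += array_sum($row);
    }
    return $correct / $total;
}

// Per-class recall: diagonal entry divided by the row sum.
function recallPerClass(array $matrix): array
{
    $recall = [];
    foreach ($matrix as $digit => $row) {
        $recall[$digit] = $row[$digit] / array_sum($row);
    }
    return $recall;
}

$overall = accuracyFromMatrix($matrix);
$recall = recallPerClass($matrix);

printf("Accuracy = %.2f%%\n", $overall * 100);
foreach ($recall as $digit => $value) {
    printf("digit %d: %.1f%%\n", $digit, $value * 100);
}
```

For this matrix, 1,105 of the 1,135 samples of "1" are correct (about 97.4%), whereas only 388 of the 974 samples of "8" are (about 39.8%); the largest share of the misclassified "8" samples, 285, ended up as "5".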
Mini-batch training
If you use the partialTrain method instead of train, training continues from the previous training result as its initial state. Let's use this feature to try mini-batch training. First, split the full 60,000-row train.csv into pieces. Here I used the split command to split it into chunks of 1,000 rows, which produces 60 files, train-00 through train-59.
$ split -l 1000 -d train.csv train-
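The key property that mini-batch training relies on is that each call continues from the weights left by the previous call instead of starting over. The toy class below sketches that behavior for a one-feature binary logistic regression that takes one gradient step per batch. It is a hypothetical illustration written for this article, not PHP-ML's LogisticRegression, and the data and learning rate are made up.

```php
<?php
// Hypothetical toy model: binary logistic regression on a single feature.
// Every partialTrain() call takes one gradient step starting from the
// current weights, so training state carries over between batches.
final class TinyLogisticRegression
{
    private $weight = 0.0;
    private $bias = 0.0;
    private $learningRate;

    public function __construct(float $learningRate = 0.5)
    {
        $this->learningRate = $learningRate;
    }

    public function partialTrain(array $samples, array $targets): void
    {
        $n = count($samples);
        if ($n === 0) {
            return;
        }

        $gradWeight = 0.0;
        $gradBias = 0.0;
        foreach ($samples as $i => $x) {
            $error = $this->probability($x) - $targets[$i];
            $gradWeight += $error * $x;
            $gradBias += $error;
        }

        $this->weight -= $this->learningRate * $gradWeight / $n;
        $this->bias -= $this->learningRate * $gradBias / $n;
    }

    public function probability(float $x): float
    {
        return 1.0 / (1.0 + exp(-($this->weight * $x + $this->bias)));
    }

    public function predict(float $x): int
    {
        return $this->probability($x) >= 0.5 ? 1 : 0;
    }
}

// Two made-up batches: negative inputs belong to class 0, positive ones to class 1.
$batches = [
    [[-2.0, 1.0], [0, 1]],
    [[-1.0, 2.0], [0, 1]],
];

$model = new TinyLogisticRegression();
for ($epoch = 0; $epoch < 50; ++$epoch) {
    foreach ($batches as [$samples, $targets]) {
        $model->partialTrain($samples, $targets);
    }
}

printf("predict(-1.5) = %d, predict(1.5) = %d\n", $model->predict(-1.5), $model->predict(1.5));
```

The outer loop over epochs and the inner loop over batches mirror the double-loop structure used with partialTrain, only on a much smaller scale.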
The mini-batch training implementation is shown below. It first creates an instance of the LogisticRegression class and then trains in a double loop. The inner loop loads each of the 1,000-sample datasets in turn and trains the model on it. Repeating this 10 times makes up the whole training run.
<?php
require_once __DIR__ . '/vendor/autoload.php';

use \Phpml\Classification\Linear\LogisticRegression;
use \Phpml\Dataset\CsvDataset;
use \Phpml\ModelManager;

$classifier = new LogisticRegression(1, false, LogisticRegression::BATCH_TRAINING);

// Outer loop: 10 passes over the data; inner loop: the 60 files of 1,000 samples each
for ($n = 0; $n < 10; ++$n) {
    for ($k = 0; $k < 60; ++$k) {
        $dataset = new CsvDataset(__DIR__ . '/train-' . sprintf('%02d', $k), 784, false);
        $classifier->partialTrain($dataset->getSamples(), $dataset->getTargets());
    }
}

$modelManager = new ModelManager();
$modelManager->saveToFile($classifier, 'model-minibatch.dat');
An example run is shown below*3. Because mini-batch training learns from all 60,000 samples, it achieved higher accuracy.
$ php train-minibatch.php
$ php predict-minibatch.php
Accuracy = 91.08%
 961    0    0    1    0    2   10    2    4    0
   0 1102    2    2    1    2    4    1   21    0
  11   11  886   19   12    6   14   15   50    8
   7    0   17  909    2   27    5   11   25    7
   4    3    5    1  908    2    9    1   13   36
  11    4    2   28   11  772   20    7   28    9
  14    3    2    2   11   18  902    1    5    0
   5   14   20    5    9    3    3  935    4   30
   9   10    6   16   14   35   12    9  857    6
  10    9    2   13   47   13    1   25   13  876
*1: For SVM, a libsvm binary is bundled with the library, which works by executing that binary.
*2: PHP-ML can be installed with composer, so the program assumes it is located under the vendor directory. Modify the require_once line as needed.
*3: predict-minibatch.php is exactly the same as the program shown earlier, except for the model file name it uses.