Estimating the Character Encoding of Documents with Unknown Encoding Using a CNN
Why I was late for the Advent Calendar
- The end of the year was too busy
- Several ideas I had been counting on as material did not seem to produce decent results
- Loss of the original URL
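Estimating the character encoding from a byte sequence
A topic that goes viral on Twitter from time to time: even though machine learning gets this much hype, browsers still occasionally show mojibake, and Excel garbles text when you feed it UTF8. This is hardly a state in which cultured, minimum human rights are protected.
In practice, if you try to estimate the encoding with rules, Shift JIS and EUC do not use completely disjoint byte information, so I assume people have been coping with hand-written heuristic rules. This is exactly the kind of use case where machine learning is very effective.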
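To make the overlap between Shift JIS and EUC concrete, here is a minimal sketch using Python's built-in codecs (not the tooling used in this post). The two-byte value is chosen purely for illustration: it decodes without error under both encodings, so byte validity alone cannot tell them apart.

```python
# Two bytes that are valid in BOTH encodings, so a validity check cannot decide.
ambiguous = b"\xb1\xb2"

# Shift JIS reads each byte as a single-byte half-width katakana character.
as_sjis = ambiguous.decode("shift_jis")

# EUC-JP reads the same two bytes as one double-byte kanji.
as_euc = ambiguous.decode("euc_jp")

print(len(as_sjis), "characters as Shift JIS")
print(len(as_euc), "character as EUC-JP")
```

A rule-based detector therefore has to fall back on context or frequency heuristics for inputs like this one.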
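Every time this comes up I have replied, "No, I think this can be done," so here I would like to run a proof-of-concept experiment.
Which machine learning algorithm is suitable?
Scraping news sites yields a large amount of UTF8 text.
Starting from this text, the nkf command is used to convert it to the euc and sjis encodings, producing versions of the same text in several character encodings.
In Python and some other languages, handling anything other than UTF8 as text tends to break, but the data can be read if it is treated as a byte sequence, and the byte sequence seems likely to show some distinguishing features (hypothesis).
Vectorizing the byte sequence and classifying it with a CNN, as in text classification, looks like a good approach.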
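The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses Python's built-in codecs in place of nkf, and the helper names (`make_variants`, `bytes_to_indices`) and the padding scheme are hypothetical. The actual indexing done by `17-unicode_vector.py` is not shown in this post and may differ.

```python
MAX_LEN = 300  # assumed to correspond to the first dimension of the model input


def make_variants(text):
    """Encode one string into the three encodings to be classified."""
    return {
        "utf8": text.encode("utf-8"),
        "euc": text.encode("euc_jp", errors="ignore"),
        "sjis": text.encode("shift_jis", errors="ignore"),
    }


def bytes_to_indices(raw, max_len=MAX_LEN):
    """Truncate or pad to max_len; byte values are shifted by 1 so 0 means padding."""
    ids = [b + 1 for b in raw[:max_len]]
    return ids + [0] * (max_len - len(ids))


sample = "文字コードを推定する"
variants = make_variants(sample)
vectors = {name: bytes_to_indices(raw) for name, raw in variants.items()}

for name, raw in variants.items():
    # The same text yields different byte patterns per encoding.
    print(name, len(raw), raw[:6].hex())
```

Reading files in binary mode (`open(path, "rb")`) gives the same kind of `bytes` object without any decoding step, which is what makes non-UTF8 input safe to handle.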
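Network
I built the network by adapting the VGG architecture.
Objective function
When the decision is borderline, I want the probabilities to be output faithfully, so instead of softmax the network outputs three sigmoids, and the loss is the binary cross entropy of each output.
The outputs are easy to interpret, so this is a technique I personally use often.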
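The difference between the two output layers can be illustrated in plain Python. This is a sketch with made-up logits, not values from the trained model, and the `binary_cross_entropy` helper only approximates what a deep learning framework computes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean of the per-output binary cross entropy terms."""
    losses = []
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)


# Hypothetical logits for an input the network is unsure about.
logits = [-3.0, -2.5, -4.0]
target = [0, 1, 0]

sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

print("sigmoid:", sig, "sum =", sum(sig))
print("softmax:", soft, "sum =", sum(soft))
print("loss   :", binary_cross_entropy(target, sig))
```

With these logits, softmax is forced to distribute a total probability of 1 and still reports one class above 0.5, while the three independent sigmoids all stay low, which makes the uncertainty visible.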
Code
The full code is on github.
The CBRD function is modeled on the way osciiart builds it.
from keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                          MaxPool1D, Flatten, Dense)
from keras.models import Model

def CBRD(inputs, filters=64, kernel_size=3, droprate=0.5):
    # Conv1D -> BatchNormalization -> ReLU.
    # Note: droprate is accepted but not used in this excerpt.
    x = Conv1D(filters, kernel_size, padding='same',
               kernel_initializer='random_normal')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x

# Input: a sequence of 300 positions with 95 features each
input_tensor = Input(shape=(300, 95))
x = input_tensor
x = CBRD(x, 2)
x = CBRD(x, 2)
x = MaxPool1D()(x)
x = CBRD(x, 4)
x = CBRD(x, 4)
x = MaxPool1D()(x)
x = CBRD(x, 8)
x = CBRD(x, 8)
x = MaxPool1D()(x)
x = CBRD(x, 16)
x = CBRD(x, 16)
x = CBRD(x, 16)
x = MaxPool1D()(x)
x = CBRD(x, 32)
x = CBRD(x, 32)
x = CBRD(x, 32)
x = MaxPool1D()(x)
x = Flatten()(x)
# Three independent sigmoid outputs, one per encoding
x = Dense(3, name='dense_last', activation='sigmoid')(x)
model = Model(inputs=input_tensor, outputs=x)
model.compile(loss='binary_crossentropy', optimizer='adam')
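As a sanity check on the architecture, the tensor shapes can be traced by hand. The sketch below is plain arithmetic, assuming the Keras defaults for `MaxPool1D()` (pool size 2, no padding) and that `padding='same'` convolutions keep the sequence length unchanged.

```python
length, channels = 300, 95          # Input(shape=(300, 95))
stage_filters = [2, 4, 8, 16, 32]   # filter count used before each MaxPool1D

for filters in stage_filters:
    channels = filters   # the CBRD blocks change only the channel count
    length //= 2         # MaxPool1D halves the length, rounding down
    print("after pooling:", (length, channels))

flat = length * channels     # size of the Flatten() output
dense_params = flat * 3 + 3  # weights plus biases of the final Dense(3)
print("flatten:", flat, "dense_last parameters:", dense_params)
```

The filter counts are tiny compared with an image VGG, which is consistent with the small model size reported at the end of this post.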
Dataset
I used news corpora from nifty news and niconico news.
The data is provided as a split, compressed zip archive.
If you try this yourself and the performance seems poor, the properties of your corpus probably do not match this one in the first place, so retraining is a reasonable option.
https://github.com/GINK03/keras-cnn-character-code-detection/tree/master/dataset
Preprocessing
Extract the contents of the dataset stored in dbm as text files
$ python3 14-make_files.py
Create the euc dataset with nkf (run with Python2)
$ python2 15-make_euc.py
Create the sjis dataset with nkf (run with Python2)
$ python2 16-make_shiftjis.py
Assign an index to each byte representation (run with Python3)
$ python3 17-unicode_vector.py
Build the final dataset and store it in a KVS (LevelDB required)
$ python3 18-make_pair.py
Training
$ python3 19-train.py --train
Accuracy on the test data
The data is managed by hash value, and items whose hash starts with 7 are used as test data.
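A hash-prefix split like this can be sketched as follows. The hash function and the keys used in the repository are not stated here, so SHA-256 and the `doc-0000` style identifiers are assumptions made only for illustration.

```python
import hashlib


def is_test_sample(key):
    """Return True when the hex digest of the key starts with '7'."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return digest.startswith("7")


# Hypothetical document identifiers.
keys = ["doc-%04d" % i for i in range(1000)]

test_keys = [k for k in keys if is_test_sample(k)]
train_keys = [k for k in keys if not is_test_sample(k)]

print(len(train_keys), "train /", len(test_keys), "test")
```

Because the split depends only on the key, it is reproducible across runs, and with a hexadecimal digest roughly one sixteenth of the data ends up in the test set.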
Train on 464 samples, validate on 36 samples
Epoch 1/1
464/464 [==============================] - 1s 1ms/step - loss: 2.1088e-05 - val_loss: 2.8882e-06
val_loss is on the order of 1e-6, which is sufficiently small.
Accuracy
Validating on 1000 items from the dataset whose hash values start with 7 gave 99.9% (wow).
$ python3 19-train.py --precision
actual precision 99.9
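How a figure like this can be derived from three sigmoid outputs is sketched below. The label order, the `predict_label` and `accuracy_percent` helpers, and the sample outputs are all hypothetical. The script's actual evaluation code is not shown in this post.

```python
LABELS = ["utf8", "euc", "sjis"]  # hypothetical order of the three outputs


def predict_label(probs):
    """Pick the encoding whose sigmoid output is largest."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best]


def accuracy_percent(all_probs, true_labels):
    """Percentage of samples whose predicted label matches the true label."""
    hits = sum(predict_label(p) == t for p, t in zip(all_probs, true_labels))
    return 100.0 * hits / len(true_labels)


# Made-up model outputs for four documents.
outputs = [
    [0.98, 0.01, 0.02],
    [0.03, 0.97, 0.05],
    [0.02, 0.40, 0.55],
    [0.60, 0.30, 0.10],  # wrong: the true label is euc
]
truth = ["utf8", "euc", "sjis", "euc"]

print(accuracy_percent(outputs, truth))  # 75.0
```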
Prediction
$ python3 19-train.py --predict --file=${FILE_PATH}
Example
$ python3 19-train.py --predict --file=../keras-mojibake-grabled/eucs/000000123.txt
Using TensorFlow backend.
this document is EUC. # <- classified as EUC
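Conclusion
The model itself is only 151 kbytes, which is quite compact, and the accuracy is good enough for practical use.
In Microsoft Excel and similar tools the character encoding is not detected and the text comes out garbled, which costs me a few minutes every time. The network is deep but lightweight, so I think it could be embedded in such tools.
I hope that applying machine learning in practice like this makes everyday life a little better.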