Introduction

I'm nikkie, and I believe that if you keep at it, something will come of it.

Since the end of December 2019, I have been writing one blog post a week on natural language processing topics.
This week I worked through a tutorial on the newer way of handling data in TensorFlow.

Tutorial: "Loading text with tf.data"

The problem setting is multi-class classification of English text.

- Data: three English translations of Homer's Iliad
  - cowper, derby, butler
- Inference: determine which translation a given piece of text comes from

I proceeded in the following steps:

- Download the texts
- Load the texts into a dataset
- Encode the text lines as numbers
- Split the dataset into batches for testing and training (this is where I got stuck 😢)
- Build and train the model

Environment
```
$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.14.6
BuildVersion:   18G103
$ # Note: no GPU was used to run the code
$ python -V  # using a virtual environment created with venv
Python 3.7.3
$ pip list  # excerpt, filtered with grep
tensorflow          2.1.0
tensorflow-datasets 1.3.2
matplotlib          3.1.2
```
- The cell numbers in the IPython snippets jump around because I also ran code that isn't shown here while writing this post.
- I ran everything without fixing the random seeds. To guarantee reproducibility, I believe you would fix numpy's seed, call tensorflow.random.set_seed, and pass the seed argument to the shuffle method (not done here; a sketch follows below).
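As a rough sketch of that idea (my assumption of what seed fixing would look like; untested in this post):

```python
import numpy as np
import tensorflow as tf

np.random.seed(42)      # fix numpy's seed
tf.random.set_seed(42)  # fix TensorFlow's global seed

# and pass an explicit seed when shuffling a Dataset, e.g.:
# dataset = dataset.shuffle(BUFFER_SIZE, seed=42, reshuffle_each_iteration=False)
```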
What is tf.data, anyway?

I had gotten my first exposure to tf.data at the TensorFlow 2.0 tutorial at PyCon Singapore, which I attended in October 2019.¹

According to that tutorial, the purpose of tf.data is to fix the situation where, during deep learning, the CPU and GPU take turns sitting idle:

- Without tf.data: the CPU prepares data while the GPU waits, then the GPU trains while the CPU waits
- With tf.data: data preparation on the CPU and training on the GPU proceed at the same time (a pipeline)

Handling the data as a tf.data.Dataset resolves the state in which the CPU and GPU idle alternately.
On top of that, I think a uniform way of handling data (a common interface) is a benefit in its own right. A small pipelining sketch follows below.
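As a minimal sketch of the pipelining idea (my own illustration, not from the tutorial; in TF 2.1 the autotuning constant lives under tf.data.experimental):

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
dataset = dataset.map(lambda x: x * 2)  # CPU-side preparation work
# prefetch lets the CPU prepare upcoming elements while the
# accelerator is still consuming the current ones
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
```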
1. Downloading the texts

Using tf.keras.utils.get_file, we specify the URLs of the texts (the three English translations of the Iliad) and download them.
```python
In [4]: DIRECTORY_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/illiad/'

In [5]: FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']

In [6]: for name in FILE_NAMES:
   ...:     text_dir = tf.keras.utils.get_file(name, origin=DIRECTORY_URL+name)
   ...:
```
- Behavior: downloads the data if it is not already in the cache
- Required argument fname: the file name (xxx.txt)
- Required argument origin: the URL of the text
- Where the data is cached is specified with the cache_dir and cache_subdir arguments
  - This time both were left at their defaults (cache_dir is None, cache_subdir is 'datasets')
  - ⇒ it was downloaded to $HOME/.keras/datasets/ ²
- Return value: the path pointing to the downloaded file (a path on the local machine)
- You can confirm that the files were downloaded to the cache above (see the sketch after this list)
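For illustration, here is the same call with the cache-related defaults spelled out explicitly (my own sketch; the printed path is a hypothetical example):

```python
path = tf.keras.utils.get_file(
    'butler.txt',
    origin=DIRECTORY_URL + 'butler.txt',
    cache_dir=None,           # default: ~/.keras
    cache_subdir='datasets',  # default
)
print(path)  # e.g. /Users/<name>/.keras/datasets/butler.txt
```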
The tutorial states:

> The text files used in this tutorial have undergone some typical preprocessing tasks, such as removing headers and footers, line numbers, and chapter titles.

So text preprocessing apparently happens before tf.data comes in (my assumption: you preprocess into the same format as butler.txt and friends, with words separated by single spaces).

```
$ head -n3 .keras/datasets/butler.txt  # note: the current directory is the home directory
Sing, O goddess, the anger of Achilles son of Peleus, that brought
countless ills upon the Achaeans. Many a brave soul did it send
hurrying down to Hades, and many a hero did it yield a prey to dogs and
```

```
$ wc -l .keras/datasets/*.txt
   12131 .keras/datasets/butler.txt  # 24%
   19142 .keras/datasets/cowper.txt  # 39%
   18333 .keras/datasets/derby.txt   # 37%
   49606 total
```
Step 1 does not use tf.data yet.

2. Loading the texts into a dataset

Each text is loaded as a tf.data.TextLineDataset.³

TextLineDataset is designed to create a dataset from a text file, in which each line of the original file is one example (from the opening of the tutorial).
```python
In [9]: def labeler(example, index):
   ...:     return example, tf.cast(index, tf.int64)
   ...:

In [10]: labeled_data_sets = []

In [11]: for i, file_name in enumerate(FILE_NAMES):
    ...:     lines_dataset = tf.data.TextLineDataset(os.path.join(parent_dir, file_name))
    ...:     labeled_dataset = lines_dataset.map(lambda ex: labeler(ex, i))
    ...:     labeled_data_sets.append(labeled_dataset)
    ...:
```
We create a TextLineDataset by specifying the required argument filenames (the first line inside the for block).

Then we attach a label to each line of the text file (the second line inside the for block).

The correspondence between labels and translations is as follows:

| Label | Translation |
|---|---|
| 0 | cowper |
| 1 | derby |
| 2 | butler |

(The label is the index in the FILE_NAMES list.)
Using TextLineDataset.map, the labeler function is applied to every element of lines_dataset.
The labeler function casts the label's type to tf.int64.

labeled_data_sets is ordered as follows:

- [0]: the text of the cowper translation
- [1]: the text of the derby translation
- [2]: the text of the butler translation
```python
In [12]: len(labeled_data_sets)
Out[12]: 3
```
Since the data is still separated by class, we join it into a single Dataset (all_labeled_data) with TextLineDataset.concatenate, updating all_labeled_data with the result of each concatenate as we go.
```python
In [16]: all_labeled_data = labeled_data_sets[0]

In [17]: for labeled_dataset in labeled_data_sets[1:]:
    ...:     all_labeled_data = all_labeled_data.concatenate(labeled_dataset)
    ...:
```
In the concatenated state the labels appear in order 0 → 1 → 2, so we rearrange them with TextLineDataset.shuffle.
```python
In [23]: all_labeled_data = all_labeled_data.shuffle(
    ...:     BUFFER_SIZE, reshuffle_each_iteration=False)
```
After shuffling, we look at the first three elements of all_labeled_data.
```python
In [18]: for ex in all_labeled_data.take(3):
    ...:     print(ex)
    ...:
(<tf.Tensor: shape=(), dtype=string, numpy=b'when his true comrade fell at the hands of the Trojans, and he now lies'>, <tf.Tensor: shape=(), dtype=int64, numpy=2>)
(<tf.Tensor: shape=(), dtype=string, numpy=b"On his broad palm, and darkness veil'd his eyes.">, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'All-bounteous Mercury clandestine there'>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
```
Each ex taken out of the return value of TextLineDataset.take (a Dataset) is a tuple, and the element at index 1 of the tuple represents the label (apparently something called a tf.Tensor object).
Looking at the label values (numpy), the label of the first text is 2, so the data was indeed shuffled.
```python
In [23]: type(ex)
Out[23]: tuple

In [26]: ex[0]  # represents the text
Out[26]: <tf.Tensor: shape=(), dtype=string, numpy=b'All-bounteous Mercury clandestine there'>

In [27]: ex[1]  # represents the label
Out[27]: <tf.Tensor: shape=(), dtype=int64, numpy=0>
```
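A side note on reshuffle_each_iteration=False, as I understand it (my own toy check, not from the tutorial): it fixes the shuffled order across passes over the dataset, which matters later when take/skip carve out stable test/train splits.

```python
ds = tf.data.Dataset.range(5).shuffle(5, reshuffle_each_iteration=False)
first_pass = [int(x) for x in ds]
second_pass = [int(x) for x in ds]
assert first_pass == second_pass  # the order is the same on every pass
```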
With that, we have bundled everything into a single Dataset of text lines, each tagged with a label representing its class.

3. Encoding text lines as numbers

Machine learning models deal with numbers rather than words, so the strings need to be converted into lists of numbers.
As with the Reuters dataset,⁴ that means turning each text into a sequence of integers.

There were two steps:

- Build the vocabulary
- Encode the texts

Building the vocabulary

Using tfds.features.text.Tokenizer, we extract the words contained in the texts.
```python
In [28]: sample_tokenizer = tfds.features.text.Tokenizer()

In [29]: ex[0].numpy()
Out[29]: b'All-bounteous Mercury clandestine there'

In [30]: sample_tokenizer.tokenize(ex[0].numpy())
Out[30]: ['All', 'bounteous', 'Mercury', 'clandestine', 'there']
```
In each element (a tuple) of the TextLineDataset, index 0 represents the text.
The text itself can be retrieved with the numpy() method.
Passing it to Tokenizer.tokenize returns a list containing the words.

We repeat this for every text and use the collected words as the vocabulary.
Putting the return values of tokenize into a set removes duplicates.
```python
In [32]: tokenizer = tfds.features.text.Tokenizer()

In [33]: vocabulary_set = set()

In [34]: for text_tensor, _ in all_labeled_data:
    ...:     some_tokens = tokenizer.tokenize(text_tensor.numpy())
    ...:     vocabulary_set.update(some_tokens)
    ...:

In [35]: vocab_size = len(vocabulary_set)

In [36]: vocab_size
Out[36]: 17178
```
When handling Japanese text, it seems you would set the alphanum_only argument to False when initializing the Tokenizer (sketched below).
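A hedged sketch of that option, which I have not actually tried; the sample sentence is my own:

```python
# My assumption: alphanum_only=False keeps tokens that contain
# non-alphanumeric characters instead of silently dropping them.
ja_tokenizer = tfds.features.text.Tokenizer(alphanum_only=False)
ja_tokenizer.tokenize('テキスト の 前処理')  # pre-segmented (space-separated) text
```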
Encoding the texts

We create an encoder by passing vocabulary_set to tfds.features.text.TokenTextEncoder.
```python
In [37]: encoder = tfds.features.text.TokenTextEncoder(vocabulary_set)
```
The TokenTextEncoder encoder:

- is initialized by passing the required argument vocab_list
- given a text via its encode method, returns the converted list of integers (see the small check below)
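As a quick check on the sample line from earlier (the exact integers depend on the vocabulary order, so they differ from run to run):

```python
# e.g., in this run:
encoder.encode(ex[0].numpy())  # b'All-bounteous Mercury clandestine there'
# -> [14260, 10012, 1468, 907, 9474]  (one integer per word)
```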
Using this encoder, we replace each text in all_labeled_data with a list of integers.
```python
In [43]: def encode(text_tensor, label):
    ...:     encoded_text = encoder.encode(text_tensor.numpy())
    ...:     return encoded_text, label
    ...:

In [44]: def encode_map_fn(text, label):
    ...:     encoded_text, label = tf.py_function(
    ...:         encode, inp=[text, label], Tout=(tf.int64, tf.int64))
    ...:     encoded_text.set_shape([None])
    ...:     label.set_shape([])
    ...:     return encoded_text, label
    ...:

In [45]: all_encoded_data = all_labeled_data.map(encode_map_fn)
```
I didn't dig into tf.py_function this time, but I did find the following statement in the documentation⁵:

> If you want Python code to run every time tf.function is called, tf.py_function is exactly what you need.

It seems to be used here so that the encode function can be applied to each element of all_labeled_data via the map method.

As for the set_shape part, I don't understand it well enough yet to put into words, except that it is apparently something that matters in relation to tf.function; my current reading is sketched below.
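My unverified reading of set_shape: tf.py_function runs arbitrary Python, so TensorFlow cannot infer the shapes of its outputs, and set_shape restores the static shape information so that downstream operations such as padded_batch know what to expect. The shapes reported later are consistent with this:

```python
# After encode_map_fn, each element should be (1-D int64 vector, scalar label),
# matching encoded_text.set_shape([None]) and label.set_shape([])
tf.compat.v1.data.get_output_shapes(all_encoded_data)
# -> (TensorShape([None]), TensorShape([]))
```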
Checking the encoded texts
```python
In [40]: for ex in all_encoded_data.take(3):
    ...:     print(ex)
    ...:
(<tf.Tensor: shape=(15,), dtype=int64, numpy=
array([ 7516,  2349, 11186, 13393,  4276,  1931,  1495,  8889, 16159,
        1495,  6530,   423,  5728,  1429, 12401])>, <tf.Tensor: shape=(), dtype=int64, numpy=2>)
(<tf.Tensor: shape=(10,), dtype=int64, numpy=
array([ 4287,  2349,   384,  3282,   423,  8741, 14597,  2713,  2349,  5192])>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
(<tf.Tensor: shape=(5,), dtype=int64, numpy=array([14260, 10012,  1468,   907,  9474])>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
```
4. Splitting the dataset into batches for testing and training

We create a small test dataset and a larger training set.

- The test data is taken from the head with TextLineDataset.take
- The training data is taken with TextLineDataset.skip, skipping past the test data at the head (a toy illustration follows below)
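A toy illustration of the take/skip semantics (my own, not from the tutorial):

```python
ds = tf.data.Dataset.range(5)
[int(x) for x in ds.take(2)]  # [0, 1]     <- like the test split
[int(x) for x in ds.skip(2)]  # [2, 3, 4]  <- like the train split
```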
The training data and the test data are each turned into batches.

As we saw earlier, the integer lists converted from the texts have different lengths (because the original texts contain different numbers of words).
For batching, shorter lists are padded with 0 so that everything in a batch has the same size.

Where I got stuck: TextLineDataset.padded_batch

Running the code exactly as written in the tutorial raises an error, because the second required positional argument, padded_shapes, is not supplied.
```python
In [41]: train_data = all_encoded_data.skip(TAKE_SIZE).shuffle(BUFFER_SIZE)

In [42]: train_data = train_data.padded_batch(BATCH_SIZE)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-8afff0f1d82a> in <module>
----> 1 train_data = train_data.padded_batch(BATCH_SIZE)

TypeError: padded_batch() missing 1 required positional argument: 'padded_shapes'
```
About the padded_shapes argument, the documentation says:

> A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. tf.compat.v1.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension in each batch.

What pointed me to a solution was this line of code from an issue:
```python
output_shapes_train = tf.compat.v1.data.get_output_shapes(ds_train)
```
We get the output shapes of all_encoded_data with tf.compat.v1.data.get_output_shapes and pass them to the padded_shapes argument of padded_batch.
```python
In [45]: output_shapes = tf.compat.v1.data.get_output_shapes(all_encoded_data)

In [46]: output_shapes
Out[46]: (TensorShape([None]), TensorShape([]))

In [47]: train_data = train_data.padded_batch(BATCH_SIZE, output_shapes)
```
Let's look at the first of the batches in train_data.
```python
In [52]: for sample_text, sample_label in train_data.take(1):
    ...:     print(sample_text)
    ...:     print('-' * 40)
    ...:     print(sample_label)
    ...:
tf.Tensor(
[[ 9310   722 10783 ...     0     0     0]
 [ 5766 12211 13137 ...     0     0     0]
 [ 2584 10652  4521 ...     0     0     0]
 ...
 [ 7497  8881  8781 ...     0     0     0]
 [ 6450  7516  1495 ...     0     0     0]
 [ 1785  2713 15027 ...     0     0     0]], shape=(64, 17), dtype=int64)
----------------------------------------
tf.Tensor(
[1 0 0 0 2 2 2 1 0 0 0 0 2 2 0 0 0 0 2 1 1 0 1 0 1 1 1 0 1 1 1 1 1 2 2 0 2
 0 1 0 2 2 2 2 1 2 1 0 2 1 0 0 0 0 0 2 1 1 0 1 0 2 0 0], shape=(64,), dtype=int64)
```
The integer sequences in sample_text now end in 0s, so their lengths line up.
They are grouped in units of BATCH_SIZE (my understanding is that this grouping is what a batch is).

The test data is batched in the same way.
```python
In [55]: test_data = all_encoded_data.take(TAKE_SIZE)

In [56]: test_data = test_data.padded_batch(BATCH_SIZE, output_shapes)
```
Since 0 is now used as a new token id (for padding), we increase the vocabulary size by one.
```python
In [57]: vocab_size += 1
```
5. Building and training the model

It bothered me that the tutorial uses the test data for validation during training, so I split the data three ways: test, validation, and training.
```python
In [82]: test_data = all_encoded_data.take(TAKE_SIZE)

In [83]: test_data = test_data.padded_batch(BATCH_SIZE, output_shapes)

In [84]: train_data = all_encoded_data.skip(TAKE_SIZE).shuffle(BUFFER_SIZE)

In [85]: val_data = train_data.take(TAKE_SIZE)

In [86]: val_data = val_data.padded_batch(BATCH_SIZE, output_shapes)

In [87]: train_data = train_data.skip(TAKE_SIZE).shuffle(BUFFER_SIZE)

In [88]: train_data = train_data.padded_batch(BATCH_SIZE, output_shapes)
```
The model follows the tutorial.
```python
In [89]: model = tf.keras.Sequential(
    ...:     [
    ...:         tf.keras.layers.Embedding(vocab_size, 64),
    ...:         tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    ...:         tf.keras.layers.Dense(64, activation='relu'),
    ...:         tf.keras.layers.Dense(64, activation='relu'),
    ...:         tf.keras.layers.Dense(3, activation='softmax')
    ...:     ]
    ...: )
```
- Embedding layer
- LSTM (wrapped in Bidirectional)
- Dense 1
- Dense 2
- Dense 3 (takes a softmax to classify into one of the 3 classes)
```python
In [90]: model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
    ...:     metrics=['accuracy'])

In [91]: history = model.fit(train_data, epochs=3, validation_data=val_data)
Epoch 1/3
619/Unknown - 21s 33ms/step - loss: 0.5528 - accuracy: 0.72992020-01-19 13:15:39.947590: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
2020-01-19 13:15:48.626987: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
619/619 [==============================] - 29s 47ms/step - loss: 0.5528 - accuracy: 0.7299 - val_loss: 0.3283 - val_accuracy: 0.8696
Epoch 2/3
617/619 [============================>.] - ETA: 0s - loss: 0.3209 - accuracy: 0.85992020-01-19 13:16:15.284238: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
619/619 [==============================] - 27s 43ms/step - loss: 0.3207 - accuracy: 0.8599 - val_loss: 0.2153 - val_accuracy: 0.9172
Epoch 3/3
615/619 [============================>.] - ETA: 0s - loss: 0.2440 - accuracy: 0.89502020-01-19 13:16:41.786026: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
619/619 [==============================] - 27s 43ms/step - loss: 0.2438 - accuracy: 0.8951 - val_loss: 0.1813 - val_accuracy: 0.9298
```
Accuracy on the test data was 84.0%, just as in the tutorial!
```python
In [92]: eval_loss, eval_acc = model.evaluate(test_data)
     79/Unknown - 2s 23ms/step - loss: 0.3762 - accuracy: 0.83962020-01-19 13:21:13.484423: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
     79/Unknown - 2s 23ms/step - loss: 0.3762 - accuracy: 0.8396

In [93]: f'Eval loss: {eval_loss}, Eval accuracy: {eval_acc}'
Out[93]: 'Eval loss: 0.37618066295038294, Eval accuracy: 0.8396000266075134'
```
Using the plot_accuracy function⁶ I wrote in the past, I visualized how the training went (a sketch of an equivalent helper follows below).

The accuracy on the training data is below the accuracy on the validation data, so my impression is that the model is still underfit.
It looks like epochs could be increased a bit more.
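The actual plot_accuracy lives in the article linked in the last footnote; a minimal sketch of an equivalent helper (my reconstruction under that assumption, not the original code) could look like this:

```python
import matplotlib.pyplot as plt

def plot_accuracy(history):
    """Plot training and validation accuracy per epoch from a Keras History."""
    plt.plot(history.history['accuracy'], label='accuracy')
    plt.plot(history.history['val_accuracy'], label='val_accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()
    plt.show()
```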
Thoughts

This was my first time working with tf.data in earnest.
In this blog series so far, I had handled text by converting it into numpy arrays with tf.keras.preprocessing.text.Tokenizer.

tf.data involves many related pieces, such as Tensor and tf.function, that I haven't fully grasped yet, but the uniform way of handling data looks convenient, so I want to keep working with it (it should also expand the range of code I can read!).

My impression is that keras's Tokenizer and tfds's Tokenizer and TokenTextEncoder overlap in their roles. If anyone has insight into which one is preferable, I would love to hear it!
What I learned

- How to encode English text into numbers using TensorFlow alone
- How to use the Embedding and LSTM layers, which I hear are effective for text (a first step, at least?)
Future Works (my list of topics for later)

- Try handling Japanese text with tf.data
  - Starting by following "tf.data.Dataset apiでテキスト (自然言語処理) の前処理をする方法をまとめる - Qiita"
- The other tutorials in the documentation
- Look into how Tensors are handled
- Look into tf.function
- Update my Reuters articles (handle the data as a Dataset and update the model)
- Try one-hot encoding the labels in this tutorial
- The figure in the slides appears to be taken from the documentation and has apparently been updated since; the explanation in the documentation is here: Better performance with the tf.data API | TensorFlow Core ↩
- From the get_file documentation: "By default the file at the url origin is downloaded to the cache_dir ~/.keras, placed in the cache_subdir datasets, and given the filename fname." ↩
- Since issubclass(tf.data.TextLineDataset, tf.data.Dataset) is True, tf.data.TextLineDataset is a subclass of tf.data.Dataset (that is, it can handle data in a pipeline just like tf.data.Dataset). ref: the documentation for the built-in function issubclass ↩
- [前処理編] ロイター通信のデータセットを用いて、ニュースをトピックに分類するモデル(MLP)をkerasで作る(TensorFlow 2系) - Qiita ↩
- ref: [モデル構築編] ロイター通信のデータセットを用いて、ニュースをトピックに分類するモデル(MLP)をkerasで作る(TensorFlow 2系) - Qiita ↩