BERT入門 (Introduction to BERT)
Speaker: Kaggle ranking 627th of 125,564 (top 0.5%) as of November 2019; also active on SIGNATE.
https://www.slideshare.net/matsukenbook/signate-108228406
The GLUE benchmark leaderboard:
https://gluebenchmark.com/leaderboard
GLUE tasks (Name / Metric):
• CoLA (The Corpus of Linguistic Acceptability): Matthews corr. Judge whether a sentence is grammatically acceptable.
• SST-2 (The Stanford Sentiment Treebank): Accuracy. Sentence-level sentiment classification.
• MRPC (Microsoft Research Paraphrase Corpus): Accuracy. Judge whether a sentence pair is a paraphrase.
• STS-B (Semantic Textual Similarity Benchmark): Pearson-Spearman corr. Score the similarity of a sentence pair.
• QQP (Quora Question Pairs): Accuracy. Judge whether two questions are duplicates.
• MultiNLI Matched (MNLI-m): Accuracy. 3-class entailment (contradiction, entailment, neutral); test genres match the training genres.
• MultiNLI Mismatched (MNLI-mm): Accuracy. Same 3-class task; test genres differ from the training genres.
• Question NLI (QNLI): Accuracy. Judge whether a Wikipedia sentence answers a given question.
• Recognizing Textual Entailment (RTE): Accuracy. Two-class textual entailment.
• Winograd NLI (WNLI): Accuracy. Pronoun-resolution entailment task.
• Diagnostics Main: Matthews corr. 3-class (contradiction, entailment, neutral) diagnostic set, evaluated with the model trained on the MNLI data.
BERT is trained in two stages: pre-training on large unlabeled text with two tasks, Masked Language Model (MLM) and Next Sentence Prediction (NSP), followed by fine-tuning on each downstream task. The next slides walk through the input tokenization and the two pre-training tasks.
Tokenization example:

Sentence 1:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
text: two kids are playing in a swimming pool with a green colored crocodile float .
ids: 2048 4268 2024 2652 1999 1037 5742 4770 2007 1037 2665 6910 21843 14257 1012

Sentence 2:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12
text: two kids push an in ##fl ##atable crocodile around in a pool .
ids: 2048 4268 5245 2019 1999 10258 27892 21843 2105 1999 1037 4770 1012
BertTokenizer splits each sentence into WordPiece tokens and converts each token to its vocabulary id. A word that is not in the vocabulary is broken into subwords, and the '##' prefix marks a subword that continues the previous token: 'inflatable' becomes 'in', '##fl', '##atable'.
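A quick check of this subword split with the library tokenizer (a minimal sketch; expected output shown as comments, matching the ids in the table above):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print(tokenizer.tokenize("inflatable"))
# ['in', '##fl', '##atable']
print(tokenizer.convert_tokens_to_ids(['in', '##fl', '##atable']))
# [1999, 10258, 27892]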
Tokenizing the concatenated sentence pair (with [CLS] and [SEP] markers) yields:
['[CLS]', 'two', 'kids', 'are', 'playing', 'in', 'a', 'swimming', 'pool', 'with', 'a',
'green', 'colored', 'crocodile', 'float', '.', '[SEP]', 'two', 'kids', 'push', 'an',
'in', '##fl', '##atable', 'crocodile', 'around', 'in', 'a', 'pool', '.', '[SEP]']
Pre-training task 1: Masked Language Model (MLM)
15% of the input tokens are picked at random. Of the picked tokens, 80% are replaced with [MASK], 10% are replaced with a random word (e.g. 'playing' becomes 'dog'), and 10% are left unchanged. The model predicts the original token at every picked position (pred1, pred2, ...), and each prediction is trained with a cross-entropy loss.
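A minimal sketch of this masking rule (the helper name and the tokens/vocab inputs are illustrative, not from the slides):

import random

# Sketch of BERT's MLM masking: pick 15% of tokens, then 80% [MASK],
# 10% random word, 10% unchanged. `vocab` is a list of candidate tokens.
def mask_tokens(tokens, vocab, mask_rate=0.15, seed=None):
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]") or rng.random() >= mask_rate:
            continue
        targets[i] = tok                   # the model must predict this original token
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"           # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random word
        # else: 10% leave the token unchanged
    return masked, targets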
Pre-training task 2: Next Sentence Prediction (NSP)
The input is a sentence pair "[CLS] A [SEP] B [SEP]". In half of the training pairs B really follows A (label IsNext); in the other half B is a random sentence (label NotNext). The hidden state at the [CLS] position is used to predict the label (pred1), again with a cross-entropy loss.
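A minimal sketch of how NSP training pairs can be built (illustrative helper; `sentences` is assumed to be an ordered list of sentences from one document):

import random

def make_nsp_example(sentences, i, rng=random):
    a = sentences[i]
    if rng.random() < 0.5:
        return a, sentences[i + 1], "IsNext"    # 50%: the true next sentence
    # 50%: a random sentence (in this sketch it may occasionally be the true one)
    return a, rng.choice(sentences), "NotNext"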
18
#
unk_token = “[UNK]”, #
sep_token = “[SEP]”, #
pad_token = “[PAD]”, #
cls_token = “[CLS]”, #
mask_token = "[MASK]", # pre-training
https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L141-L145
Hands-on with Hugging Face Transformers
https://huggingface.co/transformers/index.html
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text1 = "[CLS] Two kids are playing in a swimming pool with a green colored crocodile float. [SEP]"
text2 = "Two kids push an inflatable crocodile around in a pool. [SEP]"
tokenized_text = tokenizer.tokenize(text1 + " " + text2)
print(tokenized_text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
pos_sep = tokenized_text.index("[SEP]") + 1  # position just after the first [SEP] (end of sentence 1)
segments_ids = [0]*pos_sep + [1]*(len(indexed_tokens)-pos_sep)
tokens_tensor = tf.Variable([indexed_tokens])
segments_tensors = tf.Variable([segments_ids])
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
print(model.summary())
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
print(outputs)
The token ids and the segment ids (0 for the first sentence, 1 for the second) are packed into tensors and passed to TFBertForSequenceClassification, which is loaded with the bert-base-uncased pre-trained weights.
https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L32-L52
Vocabulary files downloaded by from_pretrained:
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
"bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
"bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
"bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
"bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
…
"bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/vocab.txt",
"bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/vocab.txt",
}
}
https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_tf_bert.py#L32-L52
Pre-trained weight files:
TF_BERT_PRETRAINED_MODEL_ARCHIVE_MAP = {
"bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5",
"bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-tf_model.h5",
"bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-tf_model.h5",
"bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-tf_model.h5",
"bert-base-multilingual-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-tf_model.h5",
"bert-base-multilingual-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-tf_model.h5",
…
"bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/tf_model.h5",
"bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/tf_model.h5",
}
The first element of the returned tuple, outputs[0], holds the classification logits with shape (batch_size, n_class).

Default BertConfig hyperparameters (bert-base):
vocab_size=30522,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
layer_norm_eps=1e-12,
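A quick way to inspect these defaults in code (BertConfig() corresponds to the bert-base architecture):

from transformers import BertConfig

config = BertConfig()
print(config.hidden_size)          # 768
print(config.num_hidden_layers)    # 12
print(config.num_attention_heads)  # 12
print(config.intermediate_size)    # 3072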
Class structure of TFBertForSequenceClassification:

TFBertForSequenceClassification
  TFBertMainLayer
    TFBertEmbedding
    TFBertEncoder (TFBertLayer x 12)
      TFBertLayer
        TFBertAttention (TFBertSelfAttention + TFBertSelfOutput)
        TFBertIntermediate
        TFBertOutput
    TFBertPooler

The following slides walk down this hierarchy component by component.
TFBertMainLayer
Inputs: input_ids, position_ids, token_type_ids and attention_mask, each of shape (n_seq,), or pre-computed input_embeds of shape (n_seq, dim).
TFBertEmbedding followed by TFBertEncoder produces sequence_output (n_seq, dim). TFBertPooler extracts the first position ([CLS]) of sequence_output and applies a Dense layer, giving pooled_output (dim,).
On top of BERT itself, TFBertForSequenceClassification adds only a Dropout and a Dense layer mapping pooled_output to the (n_class,) classification output.
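A minimal sketch of the pooler plus classification head (shapes as on the slide; the tanh activation follows the library's pooler implementation):

import tensorflow as tf

dim, n_class = 768, 2
pooler = tf.keras.layers.Dense(dim, activation="tanh")   # TFBertPooler: Dense + tanh
dropout = tf.keras.layers.Dropout(0.1)
classifier = tf.keras.layers.Dense(n_class)

sequence_output = tf.random.normal((1, 512, dim))        # dummy (batch, n_seq, dim)
pooled_output = pooler(sequence_output[:, 0])            # first ([CLS]) position -> (batch, dim)
logits = classifier(dropout(pooled_output, training=False))  # (batch, n_class)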
TFBertEmbedding
input_ids (n_seq,) gather rows from the word_embeddings weight matrix (vocab_size, dim); position_ids index position_embeddings (max_position_embeddings=512, dim); token_type_ids index token_type_embeddings (type_vocab_size=2, dim). The three resulting (n_seq, dim) tensors are summed, then LayerNormalization and Dropout are applied, producing the initial hidden_states (n_seq, dim) with hidden_size=768 per token ([CLS], two, kids, ..., [SEP], [PAD], ...).
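A minimal sketch of the three-way embedding sum (sizes taken from the config slide; Dropout omitted for brevity):

import tensorflow as tf

vocab_size, type_vocab_size, max_pos, dim = 30522, 2, 512, 768
word_emb = tf.keras.layers.Embedding(vocab_size, dim)       # [word_embeddings]
pos_emb = tf.keras.layers.Embedding(max_pos, dim)           # [position_embeddings]
type_emb = tf.keras.layers.Embedding(type_vocab_size, dim)  # [token_type_embeddings]
layer_norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)

input_ids = tf.constant([[2048, 4268, 2024]])               # "two kids are"
position_ids = tf.range(tf.shape(input_ids)[1])[None, :]    # 0, 1, 2, ...
token_type_ids = tf.zeros_like(input_ids)                   # all sentence 0

hidden_states = layer_norm(
    word_emb(input_ids) + pos_emb(position_ids) + type_emb(token_type_ids)
)  # (batch, n_seq, dim)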
TFBertEncoder
The encoder is a stack of num_hidden_layers=12 identical TFBertLayer blocks. Each block receives hidden_states (n_seq, dim) together with the attention_mask (n_seq,) and returns hidden_states of the same shape, which feed the next block.
TFBertLayer
• TFBertAttention: TFBertSelfAttention (next slide) followed by TFBertSelfOutput, which applies Dense and Dropout and then LayerNormalization of the result added back to the block input (a residual connection).
• TFBertIntermediate: Dense expanding dim to intermediate_size=3072, with gelu activation.
• TFBertOutput: Dense projecting back to dim, Dropout, then LayerNormalization of the result added to the attention output (a second residual connection).
All block inputs and outputs have shape (n_seq, dim).
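A minimal sketch of one such block (Dropout omitted; `self_attention` stands for the attention sketch shown after the next slide):

import tensorflow as tf

dim, intermediate_size = 768, 3072
attn_dense = tf.keras.layers.Dense(dim)                       # TFBertSelfOutput Dense
intermediate = tf.keras.layers.Dense(intermediate_size, activation="gelu")
out_dense = tf.keras.layers.Dense(dim)                        # TFBertOutput Dense
ln1 = tf.keras.layers.LayerNormalization(epsilon=1e-12)
ln2 = tf.keras.layers.LayerNormalization(epsilon=1e-12)

def bert_layer(hidden, mask, self_attention):
    attn = self_attention(hidden, mask)        # (batch, n_seq, dim)
    attn = ln1(attn_dense(attn) + hidden)      # residual connection + LayerNorm
    ffn = out_dense(intermediate(attn))        # (batch, n_seq, dim)
    return ln2(ffn + attn)                     # second residual + LayerNorm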
TFBertSelfAttention
Three Dense layers project hidden_states (n_seq, dim) into Query (Q), Key (K) and Value (V). Each projection is reshaped into n_head heads of size dim/n_head: (n_head, n_seq, dim/n_head). Multiplying Q by the transposed K gives attention scores of shape (n_head, n_seq, n_seq). The attention_mask is added to the scores so that padding positions are suppressed, softmax is taken over the Key axis, and Dropout is applied, yielding the attention weights (n_head, n_seq, n_seq). The weights are multiplied by V to give (n_head, n_seq, dim/n_head), which is reshaped back to hidden_states of shape (n_seq, dim).
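A minimal sketch of this multi-head attention path (scaled dot-product with 1/sqrt(dim/n_head) as in the Transformer; `mask` is assumed to be a float (batch, n_seq) tensor with 1 for tokens and 0 for padding):

import tensorflow as tf

n_head, dim = 12, 768
head_dim = dim // n_head  # 64
wq, wk, wv = (tf.keras.layers.Dense(dim) for _ in range(3))

def split_heads(x):  # (batch, n_seq, dim) -> (batch, n_head, n_seq, head_dim)
    b, s = tf.shape(x)[0], tf.shape(x)[1]
    return tf.transpose(tf.reshape(x, (b, s, n_head, head_dim)), (0, 2, 1, 3))

def self_attention(hidden, mask):
    q, k, v = split_heads(wq(hidden)), split_heads(wk(hidden)), split_heads(wv(hidden))
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(float(head_dim))
    scores += (1.0 - mask[:, None, None, :]) * -10000.0  # suppress padding positions
    weights = tf.nn.softmax(scores, axis=-1)             # (batch, n_head, n_seq, n_seq)
    ctx = tf.matmul(weights, v)                          # (batch, n_head, n_seq, head_dim)
    b, s = tf.shape(hidden)[0], tf.shape(hidden)[1]
    return tf.reshape(tf.transpose(ctx, (0, 2, 1, 3)), (b, s, dim))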
Multi-head split
The Q, K and V matrices of shape (max_position_embeddings=512, hidden_size=768) are split along the hidden dimension into n_head=12 heads of dim/n_head=64 each, and attention is computed independently in every head.
Attention scores: Q times K-transposed
Within each head, every Query row vector q_i is dotted with every Key row vector k_j, so the product has shape (n_seq, n_seq): entry (i, j) is the raw score of how strongly token i attends to token j.
Softmax over the Key axis
Softmax normalizes each Query row of the score matrix into weights that sum to 1. For example, the row for the Query token 'kids' becomes the distribution of its attention over all Key tokens.
Weighted sum of Values
The attention weights multiply the Value matrix: output row i is the weighted sum of all Value row vectors, sum_j weight_ij * v_j. Concatenating the heads restores the (max_position_embeddings=512, hidden_size=768) shape.
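A toy worked example of the three steps above (scores, softmax, weighted sum) with n_seq=3 and head_dim=2; the numbers are illustrative only:

import numpy as np

Q = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

scores = Q @ K.T / np.sqrt(Q.shape[1])                                # (3, 3) raw scores
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
output = weights @ V                                                  # each row: weighted sum of V rows
print(output)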
References
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. https://arxiv.org/abs/1810.04805
Transformers: https://huggingface.co/transformers
3rd-party pre-trained models: https://huggingface.co/models