BERT入門 (Introduction to BERT)
Speaker: Kaggle ranking 627th of 125,564 (top 0.5%) as of November 2019; also active on SIGNATE.
https://www.slideshare.net/matsukenbook/signate-108228406
The GLUE benchmark leaderboard:
https://gluebenchmark.com/leaderboard
GLUE tasks (Name / Metric):
• CoLA (The Corpus of Linguistic Acceptability): Matthews corr. Judge whether a sentence is grammatically acceptable.
• SST-2 (The Stanford Sentiment Treebank): Accuracy. Sentence-level sentiment classification.
• MRPC (Microsoft Research Paraphrase Corpus): Accuracy. Judge whether a sentence pair is a paraphrase.
• STS-B (Semantic Textual Similarity Benchmark): Pearson-Spearman corr. Score the similarity of a sentence pair.
• QQP (Quora Question Pairs): Accuracy. Judge whether two questions are duplicates.
• MultiNLI Matched (MNLI-m): Accuracy. 3-class entailment (contradiction, entailment, neutral); test genres match the training genres.
• MultiNLI Mismatched (MNLI-mm): Accuracy. Same 3-class task; test genres differ from the training genres.
• Question NLI (QNLI): Accuracy. Judge whether a Wikipedia sentence answers a given question.
• Recognizing Textual Entailment (RTE): Accuracy. Two-class textual entailment.
• Winograd NLI (WNLI): Accuracy. Pronoun-resolution entailment task.
• Diagnostics Main: Matthews corr. 3-class (contradiction, entailment, neutral) diagnostic set, evaluated with the model trained on the MNLI data.
BERT is trained in two stages: pre-training on large unlabeled text with two tasks, Masked Language Model (MLM) and Next Sentence Prediction (NSP), followed by fine-tuning on each downstream task. The next slides walk through the input tokenization and the two pre-training tasks.
Tokenization example:

Sentence 1:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
text: two kids are playing in a swimming pool with a green colored crocodile float .
ids: 2048 4268 2024 2652 1999 1037 5742 4770 2007 1037 2665 6910 21843 14257 1012

Sentence 2:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12
text: two kids push an in ##fl ##atable crocodile around in a pool .
ids: 2048 4268 5245 2019 1999 10258 27892 21843 2105 1999 1037 4770 1012
BertTokenizer splits each sentence into WordPiece tokens and converts each token to its vocabulary id. A word that is not in the vocabulary is broken into subwords, and the '##' prefix marks a subword that continues the previous token: 'inflatable' becomes 'in', '##fl', '##atable'.
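A quick check of this subword split with the library tokenizer (a minimal sketch; expected output shown as comments, matching the ids in the table above):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print(tokenizer.tokenize("inflatable"))
# ['in', '##fl', '##atable']
print(tokenizer.convert_tokens_to_ids(['in', '##fl', '##atable']))
# [1999, 10258, 27892]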
Tokenizing the concatenated sentence pair (with [CLS] and [SEP] markers) yields:
['[CLS]', 'two', 'kids', 'are', 'playing', 'in', 'a', 'swimming', 'pool', 'with', 'a',
'green', 'colored', 'crocodile', 'float', '.', '[SEP]', 'two', 'kids', 'push', 'an',
'in', '##fl', '##atable', 'crocodile', 'around', 'in', 'a', 'pool', '.', '[SEP]']
Pre-training task 1: Masked Language Model (MLM)
15% of the input tokens are picked at random. Of the picked tokens, 80% are replaced with [MASK], 10% are replaced with a random word (e.g. 'playing' becomes 'dog'), and 10% are left unchanged. The model predicts the original token at every picked position (pred1, pred2, ...), and each prediction is trained with a cross-entropy loss.
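A minimal sketch of this masking rule (the helper name and the tokens/vocab inputs are illustrative, not from the slides):

import random

# Sketch of BERT's MLM masking: pick 15% of tokens, then 80% [MASK],
# 10% random word, 10% unchanged. `vocab` is a list of candidate tokens.
def mask_tokens(tokens, vocab, mask_rate=0.15, seed=None):
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]") or rng.random() >= mask_rate:
            continue
        targets[i] = tok                   # the model must predict this original token
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"           # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random word
        # else: 10% leave the token unchanged
    return masked, targets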
Pre-training task 2: Next Sentence Prediction (NSP)
The input is a sentence pair "[CLS] A [SEP] B [SEP]". In half of the training pairs B really follows A (label IsNext); in the other half B is a random sentence (label NotNext). The hidden state at the [CLS] position is used to predict the label (pred1), again with a cross-entropy loss.
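A minimal sketch of how NSP training pairs can be built (illustrative helper; `sentences` is assumed to be an ordered list of sentences from one document):

import random

def make_nsp_example(sentences, i, rng=random):
    a = sentences[i]
    if rng.random() < 0.5:
        return a, sentences[i + 1], "IsNext"    # 50%: the true next sentence
    # 50%: a random sentence (in this sketch it may occasionally be the true one)
    return a, rng.choice(sentences), "NotNext"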
18
#
unk_token = “[UNK]”, #
sep_token = “[SEP]”, #
pad_token = “[PAD]”, #
cls_token = “[CLS]”, #
mask_token = "[MASK]", # pre-training
https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L141-L145
Hands-on with Hugging Face Transformers
https://huggingface.co/transformers/index.html
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text1 = "[CLS] Two kids are playing in a swimming pool with a green colored crocodile float. [SEP]"
text2 = "Two kids push an inflatable crocodile around in a pool. [SEP]"
tokenized_text = tokenizer.tokenize(text1 + " " + text2)
print(tokenized_text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
pos_sep = tokenized_text.index("[SEP]") + 1  # position just after the first [SEP] (end of sentence 1)
segments_ids = [0]*pos_sep + [1]*(len(indexed_tokens)-pos_sep)
tokens_tensor = tf.Variable([indexed_tokens])
segments_tensors = tf.Variable([segments_ids])
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
print(model.summary())
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
print(outputs)
The token ids and the segment ids (0 for the first sentence, 1 for the second) are packed into tensors and passed to TFBertForSequenceClassification, which is loaded with the bert-base-uncased pre-trained weights.
https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L32-L52
Vocabulary files downloaded by from_pretrained:
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
"bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
"bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
"bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
"bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
…
"bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/vocab.txt",
"bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/vocab.txt",
}
}
https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_tf_bert.py#L32-L52
Pre-trained weight files:
TF_BERT_PRETRAINED_MODEL_ARCHIVE_MAP = {
"bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5",
"bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-tf_model.h5",
"bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-tf_model.h5",
"bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-tf_model.h5",
"bert-base-multilingual-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-tf_model.h5",
"bert-base-multilingual-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-tf_model.h5",
…
"bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/tf_model.h5",
"bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/tf_model.h5",
}
The first element of the returned tuple, outputs[0], holds the classification logits with shape (batch_size, n_class).

Default BertConfig hyperparameters (bert-base):
vocab_size=30522,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
layer_norm_eps=1e-12,
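A quick way to inspect these defaults in code (BertConfig() corresponds to the bert-base architecture):

from transformers import BertConfig

config = BertConfig()
print(config.hidden_size)          # 768
print(config.num_hidden_layers)    # 12
print(config.num_attention_heads)  # 12
print(config.intermediate_size)    # 3072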
Class structure of TFBertForSequenceClassification:

TFBertForSequenceClassification
  TFBertMainLayer
    TFBertEmbedding
    TFBertEncoder (TFBertLayer x 12)
      TFBertLayer
        TFBertAttention (TFBertSelfAttention + TFBertSelfOutput)
        TFBertIntermediate
        TFBertOutput
    TFBertPooler

The following slides walk down this hierarchy component by component.
TFBertMainLayer
Inputs: input_ids, position_ids, token_type_ids and attention_mask, each of shape (n_seq,), or pre-computed input_embeds of shape (n_seq, dim).
TFBertEmbedding followed by TFBertEncoder produces sequence_output (n_seq, dim). TFBertPooler extracts the first position ([CLS]) of sequence_output and applies a Dense layer, giving pooled_output (dim,).
On top of BERT itself, TFBertForSequenceClassification adds only a Dropout and a Dense layer mapping pooled_output to the (n_class,) classification output.
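A minimal sketch of the pooler plus classification head (shapes as on the slide; the tanh activation follows the library's pooler implementation):

import tensorflow as tf

dim, n_class = 768, 2
pooler = tf.keras.layers.Dense(dim, activation="tanh")   # TFBertPooler: Dense + tanh
dropout = tf.keras.layers.Dropout(0.1)
classifier = tf.keras.layers.Dense(n_class)

sequence_output = tf.random.normal((1, 512, dim))        # dummy (batch, n_seq, dim)
pooled_output = pooler(sequence_output[:, 0])            # first ([CLS]) position -> (batch, dim)
logits = classifier(dropout(pooled_output, training=False))  # (batch, n_class)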
TFBertEmbedding
input_ids (n_seq,) gather rows from the word_embeddings weight matrix (vocab_size, dim); position_ids index position_embeddings (max_position_embeddings=512, dim); token_type_ids index token_type_embeddings (type_vocab_size=2, dim). The three resulting (n_seq, dim) tensors are summed, then LayerNormalization and Dropout are applied, producing the initial hidden_states (n_seq, dim) with hidden_size=768 per token ([CLS], two, kids, ..., [SEP], [PAD], ...).
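A minimal sketch of the three-way embedding sum (sizes taken from the config slide; Dropout omitted for brevity):

import tensorflow as tf

vocab_size, type_vocab_size, max_pos, dim = 30522, 2, 512, 768
word_emb = tf.keras.layers.Embedding(vocab_size, dim)       # [word_embeddings]
pos_emb = tf.keras.layers.Embedding(max_pos, dim)           # [position_embeddings]
type_emb = tf.keras.layers.Embedding(type_vocab_size, dim)  # [token_type_embeddings]
layer_norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)

input_ids = tf.constant([[2048, 4268, 2024]])               # "two kids are"
position_ids = tf.range(tf.shape(input_ids)[1])[None, :]    # 0, 1, 2, ...
token_type_ids = tf.zeros_like(input_ids)                   # all sentence 0

hidden_states = layer_norm(
    word_emb(input_ids) + pos_emb(position_ids) + type_emb(token_type_ids)
)  # (batch, n_seq, dim)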
TFBertEncoder
The encoder is a stack of num_hidden_layers=12 identical TFBertLayer blocks. Each block receives hidden_states (n_seq, dim) together with the attention_mask (n_seq,) and returns hidden_states of the same shape, which feed the next block.
TFBertLayer
• TFBertAttention: TFBertSelfAttention (next slide) followed by TFBertSelfOutput, which applies Dense and Dropout and then LayerNormalization of the result added back to the block input (a residual connection).
• TFBertIntermediate: Dense expanding dim to intermediate_size=3072, with gelu activation.
• TFBertOutput: Dense projecting back to dim, Dropout, then LayerNormalization of the result added to the attention output (a second residual connection).
All block inputs and outputs have shape (n_seq, dim).
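A minimal sketch of one such block (Dropout omitted; `self_attention` stands for the attention sketch shown after the next slide):

import tensorflow as tf

dim, intermediate_size = 768, 3072
attn_dense = tf.keras.layers.Dense(dim)                       # TFBertSelfOutput Dense
intermediate = tf.keras.layers.Dense(intermediate_size, activation="gelu")
out_dense = tf.keras.layers.Dense(dim)                        # TFBertOutput Dense
ln1 = tf.keras.layers.LayerNormalization(epsilon=1e-12)
ln2 = tf.keras.layers.LayerNormalization(epsilon=1e-12)

def bert_layer(hidden, mask, self_attention):
    attn = self_attention(hidden, mask)        # (batch, n_seq, dim)
    attn = ln1(attn_dense(attn) + hidden)      # residual connection + LayerNorm
    ffn = out_dense(intermediate(attn))        # (batch, n_seq, dim)
    return ln2(ffn + attn)                     # second residual + LayerNorm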
TFBertSelfAttention
Three Dense layers project hidden_states (n_seq, dim) into Query (Q), Key (K) and Value (V). Each projection is reshaped into n_head heads of size dim/n_head: (n_head, n_seq, dim/n_head). Multiplying Q by the transposed K gives attention scores of shape (n_head, n_seq, n_seq). The attention_mask is added to the scores so that padding positions are suppressed, softmax is taken over the Key axis, and Dropout is applied, yielding the attention weights (n_head, n_seq, n_seq). The weights are multiplied by V to give (n_head, n_seq, dim/n_head), which is reshaped back to hidden_states of shape (n_seq, dim).
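A minimal sketch of this multi-head attention path (scaled dot-product with 1/sqrt(dim/n_head) as in the Transformer; `mask` is assumed to be a float (batch, n_seq) tensor with 1 for tokens and 0 for padding):

import tensorflow as tf

n_head, dim = 12, 768
head_dim = dim // n_head  # 64
wq, wk, wv = (tf.keras.layers.Dense(dim) for _ in range(3))

def split_heads(x):  # (batch, n_seq, dim) -> (batch, n_head, n_seq, head_dim)
    b, s = tf.shape(x)[0], tf.shape(x)[1]
    return tf.transpose(tf.reshape(x, (b, s, n_head, head_dim)), (0, 2, 1, 3))

def self_attention(hidden, mask):
    q, k, v = split_heads(wq(hidden)), split_heads(wk(hidden)), split_heads(wv(hidden))
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(float(head_dim))
    scores += (1.0 - mask[:, None, None, :]) * -10000.0  # suppress padding positions
    weights = tf.nn.softmax(scores, axis=-1)             # (batch, n_head, n_seq, n_seq)
    ctx = tf.matmul(weights, v)                          # (batch, n_head, n_seq, head_dim)
    b, s = tf.shape(hidden)[0], tf.shape(hidden)[1]
    return tf.reshape(tf.transpose(ctx, (0, 2, 1, 3)), (b, s, dim))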
Multi-head split
The Q, K and V matrices of shape (max_position_embeddings=512, hidden_size=768) are split along the hidden dimension into n_head=12 heads of dim/n_head=64 each, and attention is computed independently in every head.
Attention scores: Q times K-transposed
Within each head, every Query row vector q_i is dotted with every Key row vector k_j, so the product has shape (n_seq, n_seq): entry (i, j) is the raw score of how strongly token i attends to token j.
Softmax over the Key axis
Softmax normalizes each Query row of the score matrix into weights that sum to 1. For example, the row for the Query token 'kids' becomes the distribution of its attention over all Key tokens.
Weighted sum of Values
The attention weights multiply the Value matrix: output row i is the weighted sum of all Value row vectors, sum_j weight_ij * v_j. Concatenating the heads restores the (max_position_embeddings=512, hidden_size=768) shape.
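A toy worked example of the three steps above (scores, softmax, weighted sum) with n_seq=3 and head_dim=2; the numbers are illustrative only:

import numpy as np

Q = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

scores = Q @ K.T / np.sqrt(Q.shape[1])                                # (3, 3) raw scores
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
output = weights @ V                                                  # each row: weighted sum of V rows
print(output)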
References
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. https://arxiv.org/abs/1810.04805
Transformers: https://huggingface.co/transformers
3rd-party pre-trained models: https://huggingface.co/models