Automatic Knowledge Acquisition and Inference with Deep Neural Networks (深層ニューラルネットワークによる知識の自動獲得・推論), 2016-06-30

3. From word sequences to labels: $\hat{y} = \operatorname{argmax}_{y \in Y} P(y \mid \boldsymbol{x})$
The movie is the best I’ve ever seen!
The movie is coming soon on cinemas.
This movie is rubbish!!!
• Models: naive Bayes, perceptron, logistic regression, support vector machines
(Figure: a word sequence $\boldsymbol{x}$ is fed to the model $P(y \mid \boldsymbol{x})$, which outputs the predicted label $\hat{y}$.)
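A minimal sketch of this setting (mine, not from the slides; the toy training texts and labels are invented, and bag-of-words logistic regression stands in for any of the models above):

```python
# Label prediction y_hat = argmax_y P(y | x) with logistic regression
# over bag-of-words features. The tiny training set is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the best movie I have ever seen",   # positive
    "a truly great and moving film",     # positive
    "this movie is rubbish",             # negative
    "boring and a waste of time",        # negative
]
train_labels = ["pos", "pos", "neg", "neg"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# predict_proba estimates P(y | x); predict takes the argmax over labels
print(model.predict(["The movie is the best I've ever seen!"]))  # expected: ['pos']
print(model.predict_proba(["This movie is rubbish!!!"]))
```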
4. From word sequences to label sequences: $\hat{\boldsymbol{y}} = \operatorname{argmax}_{\boldsymbol{y} \in Y^m} P(\boldsymbol{y} \mid \boldsymbol{x})$
• Models: hidden Markov models, conditional random fields, encoder-decoder
• Search: pointwise prediction, dynamic programming, beam search, …
(Input) In March 2005, the New York Times acquired About, Inc.
(POS) IN NNP CD DT NNP NNP NNP VBD NNP NNP
(Chunks) O B-NP I-NP B-NP I-NP I-NP I-NP B-VP B-NP B-NP
(Translation) 2005年 3月 , ニューヨーク・タイムズ は About 社 を 買収 し た .
(Dialogue) I heard Google and Yahoo were among the other bidders.
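For the dynamic-programming search mentioned above, here is a minimal sketch (mine, not from the slides) of Viterbi decoding over a toy HMM; the states, probabilities, and words are invented for illustration:

```python
import math

# Viterbi: find argmax_y P(y | x) over label sequences by dynamic
# programming. All probabilities below are toy values.
states = ["NP", "VP"]
start = {"NP": 0.7, "VP": 0.3}
trans = {"NP": {"NP": 0.6, "VP": 0.4}, "VP": {"NP": 0.7, "VP": 0.3}}
emit = {"NP": {"times": 0.4, "acquired": 0.1, "about": 0.5},
        "VP": {"times": 0.1, "acquired": 0.8, "about": 0.1}}

def viterbi(words):
    # score[s] = log-probability of the best path ending in state s
    score = {s: math.log(start[s]) + math.log(emit[s][words[0]]) for s in states}
    back = []
    for w in words[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][w])
            pointers[s] = best
        back.append(pointers)
    # follow back-pointers from the best final state
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi(["times", "acquired", "about"]))  # ['NP', 'VP', 'NP']
```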
7. Advances of DNNs in natural language processing
• Distributed representations of words
• Encoder-decoder
• Composition of distributed representations
• Attention
(Figure: each technique illustrated on the phrase "very good movie"; one panel composes the word vectors into a single phrase vector, and another translates the phrase into "とても 良い 映画".)
8. Learning distributed representations of words:
Skip-gram with negative sampling (Mikolov+ 13)
(Figure: skip-gram with negative sampling on the corpus fragment "pubs offer draught beer, cider, and wine", shown for $h = 2$, $k = 1$.
• Each word has a word vector $\boldsymbol{v}_w$ ($d$-dimensional) and a prediction vector $\tilde{\boldsymbol{v}}_c$ ($d$-dimensional).
• Update the vectors so that the context words within the window are predicted: push their inner products with the target word toward $+\infty$.
• Sample $k$ words from the unigram distribution and update so that they are not predicted (negative examples): push their inner products toward $-\infty$. The same word may happen to be sampled more than once.)
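A compact sketch of this update scheme (my own illustration; word2vec details such as frequent-word subsampling and the smoothed unigram noise distribution are omitted, and negatives are drawn uniformly here):

```python
# Skip-gram with negative sampling, schematically: raise v_w · v~_c for
# observed (target, context) pairs, lower it for k sampled negatives.
import numpy as np

rng = np.random.default_rng(0)
corpus = "pubs offer draught beer cider and wine".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
d, h, k, lr = 25, 2, 1, 0.05            # dimension, window, negatives, step

V = rng.normal(scale=0.1, size=(len(vocab), d))  # word vectors v_w
C = rng.normal(scale=0.1, size=(len(vocab), d))  # prediction vectors v~_c

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    for t, word in enumerate(corpus):
        w = idx[word]
        for ctx in corpus[max(0, t - h):t] + corpus[t + 1:t + 1 + h]:
            c = idx[ctx]
            vw = V[w].copy()
            # positive pair: push v_w · v~_c toward +infinity
            g = lr * (1.0 - sigmoid(vw @ C[c]))
            V[w] += g * C[c]
            C[c] += g * vw
            # negatives: push v_w · v~_n toward -infinity; sampling with
            # replacement means the same word can be drawn again
            for n in rng.integers(0, len(vocab), size=k):
                vw = V[w].copy()
                g = lr * -sigmoid(vw @ C[n])
                V[w] += g * C[n]
                C[n] += g * vw

# words occurring in similar contexts drift toward similar vectors
print(V[idx["beer"]] @ V[idx["cider"]])
```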
16. Relational knowledge (Nickel+ 16)
• A set of (subject, predicate, object) triples
Leonard Nimoy was an actor who played the character Spock
in the science-fiction movie Star Trek
Subject (s)      Predicate (r)   Object (t)
Leonard_Nimoy    profession      Actor
Leonard_Nimoy    starredIn       Star_Trek
Leonard_Nimoy    played          Spock
Spock            characterIn     Star_Trek
Star_Trek        genre           Science_Fiction
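As a minimal illustration (mine, not from the slides), the triples above can be held as a plain set of tuples with a naive pattern query:

```python
# Relational knowledge as a set of (subject, predicate, object) triples,
# with a trivial query helper. Data taken from the table above.
triples = {
    ("Leonard_Nimoy", "profession", "Actor"),
    ("Leonard_Nimoy", "starredIn", "Star_Trek"),
    ("Leonard_Nimoy", "played", "Spock"),
    ("Spock", "characterIn", "Star_Trek"),
    ("Star_Trek", "genre", "Science_Fiction"),
}

def query(s=None, r=None, t=None):
    """Return all triples matching the given (possibly None) slots."""
    return [(a, b, c) for (a, b, c) in triples
            if s in (None, a) and r in (None, b) and t in (None, c)]

print(query(s="Leonard_Nimoy"))               # everything known about Nimoy
print(query(r="genre", t="Science_Fiction"))  # which entities are sci-fi
```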
17. Factorizing relational knowledge (into distributed representations)
• RESCAL (Nickel+ 11)
• $\mathrm{score}(s, r, t) = \boldsymbol{x}_s^{\top} W_r \boldsymbol{x}_t$
• $\boldsymbol{x}_s \in \mathbb{R}^d$, $\boldsymbol{x}_t \in \mathbb{R}^d$, $W_r \in \mathbb{R}^{d \times d}$
• TransE (Bordes+ 13)
• $\mathrm{score}(s, r, t) = -\lVert \boldsymbol{x}_s + \boldsymbol{w}_r - \boldsymbol{x}_t \rVert_2^2$
• $\boldsymbol{x}_s \in \mathbb{R}^d$, $\boldsymbol{w}_r \in \mathbb{R}^d$, $\boldsymbol{x}_t \in \mathbb{R}^d$
• When minimizing the max-margin loss function:
$J = \sum_{(s,r,t) \in D} \max\bigl(0,\; 1 - \mathrm{score}(s, r, t) + \mathrm{score}(s^*, r, t^*)\bigr)$
where $D$ is the set of triples in the knowledge base and $(s^*, r, t^*)$ is a triple not in $D$ (a negative example).
(Figure: TransE intuition: one relation vector "capital" translates Japan to Tokyo and UK to London, e.g. the triple Japan capital Tokyo.)
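A small sketch (mine, with random toy embeddings in place of learned ones) of the two scoring functions and the max-margin comparison of a positive triple against a corrupted one:

```python
# RESCAL and TransE scores plus the hinge term of the max-margin loss.
import numpy as np

rng = np.random.default_rng(0)
d = 8
x_s, x_t, x_neg = rng.normal(size=(3, d))   # entity embeddings
w_r = rng.normal(size=d)                    # TransE relation vector
W_r = rng.normal(size=(d, d))               # RESCAL relation matrix

def score_rescal(xs, Wr, xt):
    # score(s, r, t) = x_s^T W_r x_t  (bilinear form)
    return xs @ Wr @ xt

def score_transe(xs, wr, xt):
    # score(s, r, t) = -||x_s + w_r - x_t||_2^2  (translation distance)
    return -np.sum((xs + wr - xt) ** 2)

def max_margin(pos, neg, margin=1.0):
    # hinge: want score(pos) to exceed score(neg) by at least `margin`
    return max(0.0, margin - pos + neg)

pos = score_transe(x_s, w_r, x_t)
neg = score_transe(x_s, w_r, x_neg)   # corrupted tail (s, r, t*)
print(max_margin(pos, neg))
```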
25. Reading comprehension with dynamic entity representations (Kobayashi+ 16)
Context: Once X1 was the U.S. president. X1 faced criticism for affairs.
Later X1 was divorced with the wife X2.
Question: [?] was the wife of the president.
(Figure: each occurrence of an entity yields one vector: three for X1, one for X2.
• A bidirectional LSTM encodes a vector for each occurrence of X (the vectors of the first and last words are concatenated as well).
• Attention merges the vectors of X from its different contexts into a single representation of X1 and of X2, initialized by max-pooling.
• The model answers with the entity whose representation has the larger inner product with the representation of [?].)
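A schematic sketch (mine; random vectors stand in for the BiLSTM encodings of each occurrence) of the final selection step, merging an entity's per-context vectors with attention and answering by inner product:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=d)                      # representation of [?]
contexts = {"X1": rng.normal(size=(3, d)),  # three occurrences of X1
            "X2": rng.normal(size=(1, d))}  # one occurrence of X2

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def merge(vectors, query):
    # attention: weight each occurrence vector by its match with the query
    return softmax(vectors @ query) @ vectors

# answer with the entity whose merged vector best matches [?]
scores = {name: merge(vecs, q) @ q for name, vecs in contexts.items()}
print(max(scores, key=scores.get))
```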
35. References
• D Bahdanau, K Cho, Y Bengio: Neural Machine Translation by Jointly Learning to Align and Translate, in ICLR
(2015)
• Y Bengio, R Ducharme, P Vincent, C Janvin: A Neural Probabilistic Language Model, Journal of Machine Learning
Research, Vol. 3, pp. 1137–1155 (2003)
• A Bordes, N Usunier, A Garcia-Duran, J Weston, O Yakhnenko: Translating Embeddings for Modeling Multi-
relational Data, in Proc. of NIPS, pp. 2787–2795 (2013)
• D Chen, J Bolton, C D Manning: A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task, in
Proc. of ACL (to appear) (2016)
• K Cho, van B Merrienboer, C Gulcehre, D Bahdanau, F Bougares, H Schwenk, Y Bengio: Learning Phrase
Representations using RNN Encoder–Decoder for Statistical Machine Translation, in Proc. of EMNLP, pp. 1724–
1734 (2014)
• W W Cohen: TensorLog: A Differentiable Deductive Database, CoRR, Vol. abs/1605.06523 (2016)
• R Collobert, J Weston: A Unified Architecture for Natural Language Processing: Deep Neural Networks with
Multitask Learning, in Proc. of ICML, pp. 160–167 (2008)
• A Graves: Generating Sequences With Recurrent Neural Networks, CoRR, Vol. abs/1308.0850 (2013)
• M Lee, X He, W-T Yih, J Gao, L Deng, P Smolensky: Reasoning in Vector Space: An Exploratory Study of Question
Answering, in Proc. of ICLR (2016)
• M-T Luong, H Pham, C D Manning: Effective Approaches to Attention-based Neural Machine Translation, in Proc.
of EMNLP, pp. 1412–1421 (2015)
• K Guu, J Miller, P Liang: Traversing Knowledge Graphs in Vector Space, in Proc. of EMNLP, pp. 318–327 (2015)
• T Mikolov, I Sutskever, K Chen, G S Corrado, J Dean: Distributed Representations of Words and Phrases and their
Compositionality, in Proc. of NIPS, pp. 3111–3119 (2013)
• K M Hermann, T Kocisky, E Grefenstette, L Espeholt, W Kay, M Suleyman, P Blunsom: Teaching machines to read
and comprehend, in Proc. of NIPS, pp. 1684–1692 (2015)
36. References (continued)
• F Hill, A Bordes, S Chopra, J Weston: The Goldilocks Principle: Reading Children's Books with Explicit Memory
Representations, in Proc. of ICLR (2016)
• S Kobayashi, R Tian, N Okazaki, K Inui: Dynamic Entity Representation with Max-pooling Improves Machine
Reading, in Proc. of NAACL-HLT, pp. 850–855 (2016)
• A Kumar, P Ondruska, M Iyyer, J Bradbury, I Gulrajani, V Zhong, R Paulus, R Socher: Ask Me Anything: Dynamic
Memory Networks for Natural Language Processing, in Proc. of ICML (2016)
• M Nickel, V Tresp, H-P Kriegel: A Three-Way Model for Collective Learning on Multi-Relational Data, in Proc. of
ICML, pp. 809–816 (2011)
• M Nickel, K Murphy, V Tresp, E Gabrilovich: A Review of Relational Machine Learning for Knowledge Graphs,
Proceedings of the IEEE, 104(1):11–33 (2016)
• P Rajpurkar, J Zhang, K Lopyrev, P Liang: SQuAD: 100,000+ Questions for Machine Comprehension of Text, CoRR,
Vol. abs/1606.05250 (2016)
• D Paperno, G Kruszewski, A Lazaridou, Q Pham, R Bernardi, S Pezzelle, M Baroni, G Boleda, R Fernandez: The
LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context, in Proc. of ACL (2016)
• S Riedel, L Yao, A McCallum: Latent Relation Representations for Universal Schemas, in Proc. of ICLR (2013)
• P Smolensky: Tensor product variable binding and the representation of symbolic structures in connectionist
systems, Artificial Intelligence, 46(1-2) (1990)
• S Sukhbaatar, A Szlam, J Weston, R Fergus: End-to-End Memory Networks, in Proc. of NIPS (2015)
• I Sutskever, J Martens, G Hinton: Generating Text with Recurrent Neural Networks, in Proc. of ICML, pp. 1017–1024
(2011)
• I Sutskever, O Vinyals, Q V Le: Sequence to Sequence Learning with Neural Networks, in Proc. of NIPS, pp. 3104–
3112 (2014)
• S Takase, N Okazaki, K Inui: Composing Distributed Representations of Relational Patterns, in Proc. of ACL (2016)
• K Toutanova, D Chen, P Pantel, H Poon, P Choudhury, M Gamon: Representing Text for Joint Embedding of Text
and Knowledge Bases, in Proc. of EMNLP, pp. 1499–1509 (2015)
• J Weston, A Bordes, S Chopra, A M Rush, B van Merrienboer, A Joulin, T Mikolov: Towards AI-Complete Question
Answering: A Set of Prerequisite Toy Tasks, in Proc. of ICLR (2016)