1. This document discusses the history and recent developments in natural language processing and deep learning. It covers seminal NLP papers from the 1990s through the 2000s and the rise of neural network approaches to NLP from 2003 onward.
2. Recent years have seen increased research and investment in deep learning, with many large companies establishing AI labs or acquiring deep learning startups in 2013-2014 to focus on neural network techniques.
3. The document outlines some popular deep learning architectures for NLP tasks, including neural language models, word2vec, sequence-to-sequence learning, and memory networks. It also introduces the Chainer deep learning framework for Python.
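To make the neural-language-model and Chainer topics concrete, here is a minimal, hypothetical sketch of a recurrent neural language model in the spirit of [Mikolov+10], written with Chainer (assuming the Chainer 2+ API). It is not code from the deck itself; the class name RNNLM, the layer sizes, the vocabulary size, and the one-step training snippet are all illustrative assumptions.

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L


class RNNLM(chainer.Chain):
    """Predict the next word ID from the current word ID."""

    def __init__(self, n_vocab, n_embed=100, n_hidden=200):
        super(RNNLM, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_embed)  # word ID -> embedding vector
            self.lstm = L.LSTM(n_embed, n_hidden)     # recurrent hidden state
            self.out = L.Linear(n_hidden, n_vocab)    # hidden state -> vocabulary scores

    def reset_state(self):
        self.lstm.reset_state()

    def __call__(self, x):
        h = self.embed(x)
        h = self.lstm(h)
        return self.out(h)


# Toy usage: a single training step on one (current word, next word) pair.
model = RNNLM(n_vocab=1000)
optimizer = chainer.optimizers.SGD(lr=0.1)
optimizer.setup(model)

x = np.array([3], dtype=np.int32)   # current word ID (illustrative)
t = np.array([7], dtype=np.int32)   # next word ID, i.e. the prediction target

model.reset_state()
loss = F.softmax_cross_entropy(model(x), t)
model.cleargrads()
loss.backward()
optimizer.update()

A real setup would feed minibatches of word IDs, truncate backpropagation through time across sentences, and track perplexity on held-out text; the single-pair step above only illustrates the forward pass, loss, and update cycle.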
38. 2. Major companies moving into deep learning
! 2013/3: Google acquires Geoffrey Hinton's startup DNNresearch
! 2013/4: Baidu establishes its Institute of Deep Learning
! 2013/8, 10: Yahoo! acquires IQ Engines and LookFlow
! 2013/12: Facebook opens an AI lab headed by Yann LeCun
! 2014/1: Google acquires DeepMind
! 2014/5: Andrew Ng joins Baidu as Chief Scientist
! 2014/8: IBM unveils its SyNAPSE neuromorphic chip
61. References (1/5)
! [Brown+93] Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, Vol. 19 (2), 1993.
! [Berger+96] Adam L. Berger, Vincent J. Della Pietra, Stephen A. Della Pietra. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, Vol. 22 (1), 1996.
! [Lafferty+01] John Lafferty, Andrew McCallum, Fernando C. N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML 2001.
62. References (2/5)
! [Blei+03] David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet Allocation. JMLR, Vol. 3, 2003.
! [Teh06] Yee Whye Teh. A Hierarchical Bayesian Language Model based on Pitman-Yor Processes. ACL 2006.
! [Clarke+06] James Clarke, Mirella Lapata. Constraint-Based Sentence Compression: An Integer Programming Approach. COLING/ACL 2006.
! [Riedel+06] Sebastian Riedel, James Clarke. Incremental Integer Linear Programming for Non-projective Dependency Parsing. COLING/ACL 2006.
63. References (3/5)
! [Koo+10] Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, David Sontag. Dual Decomposition for Parsing with Non-Projective Head Automata. EMNLP 2010.
! [Rush+10] Alexander M. Rush, David Sontag, Michael Collins, Tommi Jaakkola. On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing. EMNLP 2010.
! [Bengio+03] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin. A Neural Probabilistic Language Model. JMLR, 2003.
64. References (4/5)
! [Mikolov+10] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan "Honza" Cernocky, Sanjeev Khudanpur. Recurrent neural network based language model. Interspeech 2010.
! [Mikolov+13] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. CoRR, 2013.
! [Socher+12] Richard Socher, Brody Huval, Christopher D. Manning, Andrew Y. Ng. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP 2012.
! [Sutskever+14] I. Sutskever, O. Vinyals, Q. V. Le. Sequence to Sequence Learning with Neural Networks. NIPS 2014.
65. References (5/5)
! [Vinyals+15] O. Vinyals, A. Toshev, S. Bengio, D. Erhan. Show and Tell: A Neural Image Caption Generator. arXiv:1411.4555, 2014.
! [Weston+15] J. Weston, S. Chopra, A. Bordes. Memory Networks. ICLR 2015.
! [Sukhbaatar+15] S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. End-To-End Memory Networks. arXiv:1503.08895, 2015.