This slide deck was used by our colleague Umemoto at an internal technical study session.
It explains the Transformer, an architecture that has attracted much attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that originated in the Graduate School of Mathematical Sciences at the University of Tokyo. We apply modern mathematics to bring new, advanced AI systems to solutions in a wide range of fields. Our job is to think about how to use AI well to make work more efficient and to produce results that are genuinely useful to people.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems enables us to provide solutions to tough, complex problems. At Arithmer we believe it is our job to realize the potential of AI by improving work efficiency and producing results that are useful to society.
[DL Reading Group] Efficiently Modeling Long Sequences with Structured State Spaces (Deep Learning JP)
This document summarizes a research paper on modeling long-range dependencies in sequence data using structured state space models and deep learning. The proposed S4 model (1) derives recurrent and convolutional representations of state space models, (2) improves long-term memory using HiPPO matrices, and (3) efficiently computes state space model convolution kernels. Experiments show S4 outperforms existing methods on various long-range dependency tasks, achieves fast and memory-efficient computation comparable to efficient Transformers, and performs competitively as a general sequence model.
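To make the recurrent/convolutional duality mentioned above concrete, here is a minimal NumPy sketch (not the paper's S4 implementation; the matrices, sizes, and random system are illustrative assumptions) showing that a discretized state space model produces the same output whether it is unrolled as a recurrence or applied as a convolution with kernel K_k = C A^k B.

```python
# Minimal sketch: a discretized state space model computed two ways --
# as a recurrence and as a convolution -- to illustrate the equivalence
# the paper exploits. The random system below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 16                       # state size, sequence length
A = rng.normal(size=(N, N)) * 0.1  # discretized state matrix
B = rng.normal(size=(N, 1))        # input matrix
C = rng.normal(size=(1, N))        # output matrix
u = rng.normal(size=L)             # input sequence

# (1) Recurrent view: x_k = A x_{k-1} + B u_k,  y_k = C x_k
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    x = A @ x + B * u[k]
    y_rec.append((C @ x).item())

# (2) Convolutional view: y = K * u with kernel K_k = C A^k B
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(L)])
y_conv = [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(L)]

print(np.allclose(y_rec, y_conv))  # True: both views give the same output
```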
An introduction to the AAAI 2023 paper "Are Transformers Effective for Time Series Forecasting?" and the HuggingFace blog post "Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)".
The document discusses hyperparameter optimization in machine learning models. It introduces various hyperparameters that can affect model performance, and notes that as models become more complex, the number of hyperparameters increases, making manual tuning difficult. It formulates hyperparameter optimization as a black-box optimization problem to minimize validation loss and discusses challenges like high function evaluation costs and lack of gradient information.
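As a concrete illustration of that black-box view, the sketch below treats the validation loss as an objective with no gradient information and tunes a single hyperparameter by random search; the ridge-regression objective and synthetic data are assumptions for illustration, not the tuning setup from the slides.

```python
# Minimal sketch of hyperparameter optimization as black-box minimization
# of a validation loss, using random search on a synthetic regression task.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

def validation_loss(log_alpha: float) -> float:
    """Black-box objective: train ridge regression with the given
    regularization strength and return the validation MSE."""
    alpha = 10.0 ** log_alpha
    w = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(5), X_tr.T @ y_tr)
    return float(np.mean((X_val @ w - y_val) ** 2))

# Random search: no gradients of the objective are available, so we simply
# sample hyperparameter values and keep the best one found.
best = min((validation_loss(a), a) for a in rng.uniform(-4, 2, size=50))
print(f"best val loss {best[0]:.4f} at log10(alpha) = {best[1]:.2f}")
```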
Weighting of acoustic cues shifts to frication duration in identification of ... (Keiichi Yasu)
K. Yasu, T. Arai, K. Kobayashi and M. Shindo, “Weighting of acoustic cues shifts to frication duration in identification of fricatives/affricates when auditory properties are degraded due to aging,” In Proc. of the Interspeech, 3152−3156, Lyon, 2013.
Preferred Networks is a Japanese AI startup founded in 2014 that develops deep learning technologies. They presented at CEATEC JAPAN 2018 on their research using convolutional neural networks for computer vision tasks like object detection. They discussed techniques like residual learning and how they have achieved state-of-the-art results on datasets like COCO by training networks on large amounts of data using hundreds of GPUs.
Preferred Networks was founded in 2014 and has focused on deep learning research, developing the Chainer and CuPy frameworks. It has applied its technologies to areas including computer vision, natural language processing, and robotics. The company aims to build AI that is helpful, harmless, and honest through techniques like constitutional AI that help ensure systems behave ethically and avoid potential issues like bias, privacy concerns, and loss of control.
Preferred Networks was founded in 2014 and has developed technologies such as Chainer and CuPy. It focuses on neural networks, natural language processing, computer vision, and GPU computing. The company aims to build general-purpose AI through machine learning and has over 500 employees located in Tokyo and San Francisco.
This document discusses Preferred Networks' open source activities over the past year. It notes that Preferred Networks published 10 blog posts and tech talks on open source topics and uploaded 3 videos to their YouTube channel. It also mentions growing their open source community to over 120 members and contributors across 3 major open source projects. The document concludes by reaffirming Preferred Networks' commitment to open source software, blogging, and tech talks going forward.
1. This document discusses the history and recent developments in natural language processing and deep learning. It covers seminal NLP papers from the 1990s through 2000s and the rise of neural network approaches for NLP from 2003 onward.
2. Recent years have seen increased research and investment in deep learning, with many large companies establishing AI labs in 2012-2014 to focus on neural network techniques.
3. The document outlines some popular deep learning architectures for NLP tasks, including neural language models, word2vec, sequence-to-sequence learning, and memory networks. It also introduces the Chainer deep learning framework for Python.
1. The document discusses knowledge representation and deep learning techniques for knowledge graphs, including embedding models like TransE and TransH as well as neural network models (a TransE-style scoring sketch follows this list).
2. It provides an overview of methods for tasks like link prediction, question answering, and language modeling using recurrent neural networks and memory networks.
3. The document references several papers on knowledge graph embedding models and their applications to natural language processing tasks.
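For concreteness, here is a minimal sketch of the TransE scoring function and margin ranking loss in the spirit of [Bordes+13]; the toy triples, embedding dimension, and the absence of an actual training loop are illustrative simplifications, not the original implementation.

```python
# Minimal TransE-style sketch: a triple (h, r, t) is scored by how close
# h + r lands to t, and training pushes true triples below corrupted ones.
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, dim = 5, 2, 8
E = rng.normal(scale=0.1, size=(n_ent, dim))   # entity embeddings
R = rng.normal(scale=0.1, size=(n_rel, dim))   # relation embeddings

def score(h: int, r: int, t: int) -> float:
    """TransE plausibility score: smaller means more plausible."""
    return float(np.linalg.norm(E[h] + R[r] - E[t]))

def margin_loss(pos, neg, margin=1.0) -> float:
    """Margin ranking loss: the positive triple should score lower
    (closer) than the corrupted one by at least the margin."""
    return max(0.0, margin + score(*pos) - score(*neg))

positive = (0, 1, 2)   # a known triple (head, relation, tail)
negative = (0, 1, 3)   # the same triple with a corrupted tail entity
print(margin_loss(positive, negative))
```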
This document provides an overview of preferred natural language processing infrastructure and techniques. It discusses recurrent neural networks, statistical machine translation tools like GIZA++ and Moses, voice recognition systems from NICT and NTT, topic modeling using latent Dirichlet allocation, dependency parsing with minimum spanning trees, and recursive neural networks for natural language tasks. References are provided for several papers on these methods.
1. The document discusses the history and recent developments in natural language processing and deep learning. It provides an overview of seminal NLP papers from the 1990s to 2010s and deep learning architectures from 2003 to present.
2. Key deep learning models discussed include neural language models, word2vec, convolutional neural networks, and LSTMs. The document also notes the increasing interest and research in deep learning starting in 2012 by tech companies like Google, Facebook and Baidu.
3. Application examples mentioned include search engines, conversational agents, social media and news summarization tools.
EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...Yuya Unno
1. The document presents the Multi Sense Skip-gram (MSSG) model for learning multiple embeddings per word in vector space.
2. MSSG assigns a separate embedding to each sense of a word using a context vector. It extends the Skip-gram model by learning sense-specific embeddings.
3. The Non-Parametric MSSG (NP-MSSG) model extends MSSG with a non-parametric approach that creates new context clusters as needed rather than fixing their number, allowing an unbounded number of senses per word (a minimal sense-selection sketch follows this list).
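The following minimal sketch illustrates the sense-selection step that distinguishes an MSSG-style model from the plain Skip-gram model; the array shapes, random initialization, and helper names are assumptions for illustration, not code from the paper.

```python
# Minimal sketch of MSSG-style sense selection: each word keeps K sense
# embeddings and K context-cluster centers, and the sense used for a training
# step is the one whose cluster center best matches the average context vector.
import numpy as np

rng = np.random.default_rng(0)
vocab, K, dim = 100, 3, 16
sense_emb   = rng.normal(size=(vocab, K, dim))  # one embedding per sense
cluster_ctr = rng.normal(size=(vocab, K, dim))  # context cluster centers
global_emb  = rng.normal(size=(vocab, dim))     # context (output) embeddings

def select_sense(word: int, context: list[int]) -> int:
    """Pick the sense whose context cluster is most similar to the
    average embedding of the surrounding context words."""
    ctx = global_emb[context].mean(axis=0)
    sims = cluster_ctr[word] @ ctx
    return int(np.argmax(sims))

# The chosen sense embedding would then be trained with the usual
# Skip-gram objective against the context words.
print(select_sense(word=7, context=[3, 15, 42, 8]))
```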
Guidance for beginners and experts on how to set up a Windows driver developm... (Atomu Hidaka)
This explains how to build a Windows driver development environment that can be used immediately by beginners and experts alike. The author, who has extensive experience developing various Windows drivers, shows the latest and simplest ways to use Visual Studio and WDK.
This is a presentation given by the Tsukuba University of Technology Alexa Skills Development Team at a JAWS user group. It also touches briefly on development for visually impaired people.
16. Distributional Hypothesis (分布仮説)
- Words that occur in the same contexts have the same meaning.
- Most work on learning word meanings from data is based, at least in part, on this hypothesis.
"The Distributional Hypothesis is that words that occur in the same contexts tend to have similar meanings (Harris, 1954)." (from the ACL wiki)
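A tiny worked example of the hypothesis, using raw co-occurrence counts and cosine similarity on an invented four-sentence corpus (an illustrative sketch, not part of the original slides):

```python
# Words are represented by counts of the words they co-occur with; words that
# appear in similar contexts end up with similar vectors.
from collections import Counter
import math

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "a cat and a dog play",
]

window = 2
cooc = {}  # word -> Counter of context words within the window
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        ctx = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        cooc.setdefault(w, Counter()).update(ctx)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# "cat" and "dog" share contexts ("the", "drinks", ...), so their vectors
# are more similar than those of "cat" and "milk".
print(cosine(cooc["cat"], cooc["dog"]), cosine(cooc["cat"], cooc["milk"]))
```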
97. References
- [Evert10] Stefan Evert. Distributional Semantic Models. NAACL 2010 Tutorial.
- [Mikolov+13a] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. CoRR, 2013.
- [Morin+05] Frederic Morin, Yoshua Bengio. Hierarchical Probabilistic Neural Network Language Model. AISTATS, 2005.
- [Mikolov+13c] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.
98. References
- [Kim+13] Joo-Kyung Kim, Marie-Catherine de Marneffe. Deriving adjectival scales from continuous space word representations. EMNLP, 2013.
- [Mikolov+13d] Tomas Mikolov, Quoc V. Le, Ilya Sutskever. Exploiting Similarities among Languages for Machine Translation. CoRR, 2013.
- [Neelakantan+14] Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, Andrew McCallum. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. EMNLP, 2014.
- [Le+14] Quoc Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. ICML, 2014.
99. References
- [Hochreiter+97] Sepp Hochreiter, Jurgen Schmidhuber. Long Short-Term Memory. Neural Computation 9(8), 1997.
- [Mikolov+10] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Honza Cernocky, Sanjeev Khudanpur. Recurrent neural network based language model. Interspeech, 2010.
- [Graves13] Alex Graves. Generating Sequences With Recurrent Neural Networks. arXiv:1308.0850, 2013.
- [Vinyals+15a] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Show and tell: A neural image caption generator. CVPR, 2015.
100. References
- [Sutskever+14] Ilya Sutskever, Oriol Vinyals, Quoc V. Le. Sequence to Sequence Learning with Neural Networks. NIPS, 2014.
- [Vinyals+15b] Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton. Grammar as a foreign language. ICLR, 2015.
- [Vinyals+15c] Oriol Vinyals, Quoc Le. A Neural Conversational Model. ICML, 2015.
101. References
- [Socher+11] Richard Socher, Cliff Lin, Andrew Y. Ng, Christopher D. Manning. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML, 2011.
- [Socher+12] Richard Socher, Brody Huval, Christopher D. Manning, Andrew Y. Ng. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP, 2012.
- [Socher+13] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng, Chris Potts. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP, 2013.
- [Tai+15] Kai Sheng Tai, Richard Socher, Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL, 2015.
102. References
- [Bordes+11] A. Bordes, J. Weston, R. Collobert, Y. Bengio. Learning structured embeddings of knowledge bases. AAAI, 2011.
- [Bordes+13] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. NIPS, 2013.
- [Fan+14] M. Fan, Q. Zhou, E. Chang, T. F. Zheng. Transition-based Knowledge Graph Embedding with Relational Mapping Properties. PACLIC, 2014.
- [Wang+14] Z. Wang, J. Zhang, J. Feng, Z. Chen. Knowledge Graph Embedding by Translating on Hyperplanes. AAAI, 2014.
- [Bordes&Weston14] A. Bordes, J. Weston. Embedding Methods for Natural Language Processing. EMNLP 2014 Tutorial.
103. References
- [Peng+15a] Baolin Peng, Kaisheng Yao. Recurrent Neural Networks with External Memory for Language Understanding. arXiv:1506.00195, 2015.
- [Weston+15] J. Weston, S. Chopra, A. Bordes. Memory Networks. ICLR, 2015.
- [Sukhbaatar+15] Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus. End-To-End Memory Networks. arXiv:1503.08895, 2015.
- [Kumar+15] Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Ishaan Gulrajani, Richard Socher. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. arXiv:1506.07285, 2015.
- [Peng+15b] Baolin Peng, Zhengdong Lu, Hang Li, Kam-Fai Wong. Towards Neural Network-based Reasoning. arXiv:1508.05508, 2015.
- [Kiros+15] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler. Skip-Thought Vectors. arXiv:1506.06726, 2015.