These slides were used by Umemoto of our company at an in-house technical study session.
They explain the Transformer, an architecture that has attracted much attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that originated from the Graduate School of Mathematical Sciences at the University of Tokyo. We apply modern mathematics to bring new, advanced AI systems to solutions in a wide range of fields. Our job is to think about how to use AI well to make work more efficient and to produce results that are useful to people.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems enables us to provide solutions to tough, complex problems. At Arithmer, we believe it is our job to realize the functions of AI by improving work efficiency and producing more useful results for society.
1. The document discusses probabilistic modeling and variational inference. It introduces concepts such as Bayes' rule, marginalization, and conditioning.
2. An equation for the evidence lower bound (ELBO) is derived: the log likelihood of the data decomposes into the Kullback-Leibler divergence between the approximate and true posteriors plus an expected log-likelihood term (see the worked decomposition below).
3. Variational autoencoders are then discussed, where the approximate posterior is parameterized by a neural network and optimized to maximize the evidence lower bound. The latent variables are modeled as Gaussian distributions.
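For reference, the decomposition mentioned in point 2 can be written out as follows (standard notation: x is the data, z the latent variables, q the approximate posterior; this is a sketch of the usual derivation, not copied from the slides):

\log p(x) = \mathrm{KL}\!\left( q(z \mid x) \,\|\, p(z \mid x) \right) + \underbrace{\mathbb{E}_{q(z \mid x)}\!\left[ \log p(x, z) - \log q(z \mid x) \right]}_{\text{ELBO}}

Since the KL term is non-negative, the ELBO lower-bounds \log p(x). It can be rearranged into the form a VAE optimizes, a reconstruction term plus a regularizer toward the prior:

\mathrm{ELBO} = \mathbb{E}_{q(z \mid x)}\!\left[ \log p(x \mid z) \right] - \mathrm{KL}\!\left( q(z \mid x) \,\|\, p(z) \right)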
Tensor Decomposition and its Applications
Keisuke OTAKI
This document discusses tensor factorizations and decompositions and their applications in data mining. It introduces tensors as multi-dimensional arrays and covers 2nd-order tensors (matrices) and 3rd-order tensors. It describes how decompositions such as the Tucker model and the CANDECOMP/PARAFAC (CP) model break a tensor into a small core and factor components that can be used to interpret the data. It also discusses the singular value decomposition (SVD) as a way to decompose matrices and reduce dimensionality while approximating the original matrix.
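As a minimal illustration of the SVD-based low-rank approximation mentioned above (a sketch using NumPy; the matrix shape and target rank are arbitrary choices, not values from the slides):

    import numpy as np

    # Truncated SVD: keep only the top-k singular values/vectors to obtain
    # a rank-k approximation of the original matrix.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 50))            # example data matrix (arbitrary)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    k = 10                                        # target rank (arbitrary choice)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]          # rank-k reconstruction

    # The Frobenius error of the approximation equals the norm of the
    # discarded singular values.
    print(np.linalg.norm(X - X_k, "fro"))
    print(np.sqrt(np.sum(s[k:] ** 2)))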
Paper introduction: Multi-dataset Training of Transformers for Robust Action Recognition
Toru Tamaki
Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen, "Multi-dataset Training of Transformers for Robust Action Recognition", NeurIPS 2022
https://arxiv.org/abs/2209.12362
https://openreview.net/forum?id=aGFQDrNb-KO
Introduction to Chainer v1.11, a framework for neural networks. Slides used for the student seminar on July 20, 2016, at the Sugiyama-Sato lab at the University of Tokyo.
This document outlines Chainer's development plans, including past releases from versions 1.0 to 1.5, an apology for installation complications, and new policies and release schedules from version 1.6 onward. Key points include making installation easier, maintaining backward compatibility, releasing minor versions every six weeks and revision versions every two weeks, and potential future features such as profiling, debugging tools, and isolating CuPy.
1) The document discusses the development history and planned features of Chainer, a deep learning framework.
2) It describes Chainer's transition to a new model structure that uses Links and Chains to define networks in a more modular and reusable way (a minimal sketch follows this list).
3) The new structure allows network definitions to be saved, loaded, and composed more easily than the previous FunctionSet/Optimizer approach.
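As a rough sketch of the Link/Chain style described above (Chainer 1.x era API; the layer sizes and class name are illustrative assumptions, not taken from the slides):

    import numpy as np
    import chainer
    import chainer.functions as F
    import chainer.links as L
    from chainer import optimizers

    # A small network defined as a Chain of Links; each Link holds its own
    # parameters, so the whole model can be composed and reused as a unit.
    class MLP(chainer.Chain):
        def __init__(self):
            super(MLP, self).__init__(
                l1=L.Linear(784, 100),   # input -> hidden (sizes are arbitrary)
                l2=L.Linear(100, 10),    # hidden -> output
            )

        def __call__(self, x):
            h = F.relu(self.l1(x))
            return self.l2(h)

    model = MLP()
    optimizer = optimizers.SGD()
    optimizer.setup(model)               # optimizer is set up on the Chain itself

    x = np.zeros((1, 784), dtype=np.float32)
    y = model(x)                          # forward pass on a dummy input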
References
P. Bachman, D. Precup. Data Generation as Sequential Decision Making. NIPS, 2015.
J. Bornschein, Y. Bengio. Reweighted Wake-Sleep. ICLR, 2015.
P. Dayan, G. E. Hinton, R. M. Neal, R. S. Zemel. The Helmholtz Machine. Neural Computation 7, 889-904, 1995.
I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative Adversarial Nets. NIPS, 2014.
K. Gregor, I. Danihelka, A. Graves, D. Wierstra. DRAW: A Recurrent Neural Network For Image Generation. ICML, 2015.
G. E. Hinton, P. Dayan, B. J. Frey, R. M. Neal. The wake-sleep algorithm for unsupervised neural networks. Science, vol. 268, pp. 1158-1161, 1995.
D. P. Kingma, M. Welling. Auto-Encoding Variational Bayes. ICLR, 2014.
A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow. Adversarial Autoencoders. arXiv:1511.05644, 2015.
V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu. Recurrent Models of Visual Attention. NIPS, 2014.
A. Radford, L. Metz, S. Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016.
D. J. Rezende, S. Mohamed, D. Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.