These slides were used by Umemoto of our company at an in-house technical study session.
They explain the Transformer, an architecture that has been attracting much attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that grew out of the Graduate School of Mathematical Sciences at the University of Tokyo. We apply modern mathematics to bring new, advanced AI systems to solutions in a wide range of fields. Our job is to think about how to use AI effectively to make work more efficient and to produce results that are useful to people.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems enables us to provide solutions to tough, complex problems. At Arithmer, we believe it is our job to realize the potential of AI by improving work efficiency and producing more useful results for society.
Deep Reinforcement Learning from Scratch (NLP2018 lecture slides) / Introduction of Deep Reinforcement Learning (Preferred Networks)
An introduction to deep reinforcement learning, presented at a domestic NLP conference.
These are the lecture slides from the 24th Annual Meeting of the Association for Natural Language Processing (NLP2018).
http://www.anlp.jp/nlp2018/#tutorial
This document discusses generative adversarial networks (GANs) and their relationship to reinforcement learning. It begins with an introduction to GANs, explaining how they can generate images without explicitly defining a probability distribution by using an adversarial training process. The second half discusses how GANs are related to actor-critic models and inverse reinforcement learning in reinforcement learning. It explains how GANs can be viewed as training a generator to fool a discriminator, similar to how policies are trained in reinforcement learning.
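To make the adversarial training process concrete, the following is a minimal sketch of a GAN training step in PyTorch; the network sizes, optimizer settings, and the real_batch input are illustrative assumptions rather than anything specified in the slides.

```python
import torch
import torch.nn as nn

# Minimal GAN training step (sketch): G maps noise to samples, D scores real vs. fake.
# G, D, latent_dim, and real_batch are placeholders, not taken from the original slides.
latent_dim = 100
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    b = real_batch.size(0)
    # 1) Train the discriminator to separate real data from generated data.
    z = torch.randn(b, latent_dim)
    fake = G(z).detach()
    loss_d = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Train the generator to fool the discriminator (analogous to a policy
    #    being trained against a learned critic in reinforcement learning).
    z = torch.randn(b, latent_dim)
    loss_g = bce(D(G(z)), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```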
This document summarizes recent research on applying self-attention mechanisms from Transformers to domains other than language, such as computer vision. It discusses models that use self-attention for images, including ViT, DeiT, and T2T, which apply Transformers to divided image patches. It also covers more general attention modules, such as the Perceiver, which aim to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities through frozen weights, showing they can function as universal computation engines.
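To illustrate what "applying Transformers to divided image patches" looks like in code, here is a minimal ViT-style patch embedding sketch; the image size, patch size, and embedding dimension are illustrative assumptions, not values taken from the models above.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each patch to a token
    embedding, as in ViT-style models (the sizes below are illustrative assumptions)."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch and applying a linear layer.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)       # (B, num_patches, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        return torch.cat([cls, x], dim=1) + self.pos_embed  # token sequence for a Transformer encoder

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 197, 768]): 196 patch tokens plus one class token
```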
[DL Reading Group] Efficiently Modeling Long Sequences with Structured State Spaces (Deep Learning JP)
This document summarizes a research paper on modeling long-range dependencies in sequence data using structured state space models and deep learning. The proposed S4 model (1) derives recurrent and convolutional representations of state space models, (2) improves long-term memory using HiPPO matrices, and (3) efficiently computes state space model convolution kernels. Experiments show S4 outperforms existing methods on various long-range dependency tasks, achieves fast and memory-efficient computation comparable to efficient Transformers, and performs competitively as a general sequence model.
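The recurrent and convolutional representations of a state space model that S4 relies on can be sketched in a few lines of NumPy; the matrices below are random placeholders rather than the structured HiPPO matrices the paper uses, and the naive kernel computation stands in for S4's efficient algorithm.

```python
import numpy as np

# Discrete linear state space model: x_k = A x_{k-1} + B u_k,  y_k = C x_k.
# A, B, C are random placeholders; S4 uses HiPPO-based structured matrices and
# computes the convolution kernel far more efficiently than this naive loop.
rng = np.random.default_rng(0)
N, L = 4, 16                          # state size, sequence length
A = rng.normal(size=(N, N)) * 0.3
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)

# Recurrent view: step the hidden state once per input.
x, y_rec = np.zeros((N, 1)), []
for k in range(L):
    x = A @ x + B * u[k]
    y_rec.append((C @ x).item())

# Convolutional view: y = K * u with kernel K_j = C A^j B.
K = np.array([(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(L)])
y_conv = np.convolve(u, K)[:L]

print(np.allclose(y_rec, y_conv))     # True: both views produce the same output
```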
Anomaly detection with deep learning can be used for fraud detection by finding abnormal patterns in data, such as bad credit card transactions or fake locations. Deep learning is well suited for anomaly detection because it can learn complex patterns from large amounts of data, learn feature representations that are robust to noise, and learn cross-domain patterns. Techniques for anomaly detection include unsupervised methods using autoencoder reconstruction error and supervised methods using RNNs that learn from labeled time series data to predict anomalies. Production systems for anomaly detection can use streaming data from sources like Kafka, with neural networks consuming the streaming updates.
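As a sketch of the unsupervised approach described above, the following PyTorch snippet scores samples by autoencoder reconstruction error; the layer sizes, training data, and percentile threshold are illustrative assumptions, not the architecture from the slides.

```python
import torch
import torch.nn as nn

# Unsupervised anomaly detection sketch: train an autoencoder on "normal" data,
# then flag inputs whose reconstruction error is unusually large.
class AutoEncoder(nn.Module):
    def __init__(self, n_features=30):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, 4))
        self.decoder = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal_data = torch.randn(1024, 30)            # stand-in for normal transactions

for _ in range(50):                            # train to reconstruct normal data
    recon = model(normal_data)
    loss = ((recon - normal_data) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def anomaly_scores(x):
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)   # per-sample reconstruction error

threshold = anomaly_scores(normal_data).quantile(0.99)      # e.g. flag the top 1% as anomalous
print(anomaly_scores(torch.randn(5, 30) * 5) > threshold)   # far-out samples score higher
```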
Strata Beijing 2017: Jumpy, a Python interface for nd4j (Adam Gibson)
GPUs should complement, not replace, the Hadoop ecosystem for big data workloads. Replacing the entire big data stack would be too costly. The presenter believes GPUs are best suited for accelerated computation and a few other use cases to gain an initial foothold in the market. Existing Python interfaces to machine learning frameworks rely too heavily on network communication and serialization, introducing significant overhead. Nd4j and Jumpy provide alternatives that use direct C++ interfaces and pointers for lower latency between Python and deep learning operations on CPU and GPU.
Anomaly detection in deep learning (Updated), English (Adam Gibson)
This document discusses anomaly detection in deep learning. It begins by defining what an anomaly is, such as abnormal patterns in data for fraud detection. It then discusses techniques for anomaly detection using unsupervised autoencoders and supervised recurrent neural networks. Finally, it provides an example reference architecture for an anomaly detection pipeline that ingests data from external sources using NiFi, sends it to Kafka, makes predictions using deep learning models, indexes predictions in Elasticsearch using Logstash, and renders the data in Kibana.
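For the streaming part of such a pipeline, a minimal sketch of the scoring stage with the kafka-python client is shown below; the topic names, broker address, threshold, and score_record() helper are all assumptions for illustration, with ingestion (NiFi) and indexing/visualization (Logstash, Elasticsearch, Kibana) sitting on either side of this step.

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

# Scoring stage only: consume records from Kafka, score them with a trained model,
# and publish flagged records to an output topic for downstream indexing.
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def score_record(record):
    # Placeholder: in practice this would call the trained anomaly model,
    # e.g. the autoencoder reconstruction error shown earlier.
    return record.get("amount", 0.0) / 1000.0

for message in consumer:
    record = message.value
    record["anomaly_score"] = score_record(record)
    if record["anomaly_score"] > 0.5:          # illustrative threshold
        producer.send("anomalies", record)     # downstream: Logstash -> Elasticsearch -> Kibana
```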