These slides were used by Umemoto of our company at an in-house technical study session.
They explain the Transformer, an architecture that has been attracting attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that began at the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to bring advanced AI systems into solutions across a wide range of fields. We see it as our job to work out how to use AI effectively to make work more efficient and to produce results that are genuinely useful to people.
A technique for deliberately suppressing overfitting
■ Looking at the values of the coefficients actually computed for the N = 10 example, we obtain the table below.
  - For M = 9, the higher-order terms have conspicuously large absolute values. This is over-adjustment of the parameters and can be regarded as a sign of overfitting.

Table of the coefficients (rows are w_0, ..., w_9; NaN means the term does not exist for that order M)

             M=0        M=1          M=3             M=9
w_0    -0.02844   0.498661    -0.575134       -0.528572
w_1         NaN  -1.054202    12.210765      151.946893
w_2         NaN        NaN   -29.944028    -3569.939743
w_3         NaN        NaN    17.917824    34234.907567
w_4         NaN        NaN          NaN  -169228.812728
w_5         NaN        NaN          NaN   478363.615824
w_6         NaN        NaN          NaN  -804309.985246
w_7         NaN        NaN          NaN   795239.975974
w_8         NaN        NaN          NaN  -426702.757987
w_9         NaN        NaN          NaN    95821.189286
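As a rough illustration (this is not the original course code), the sketch below fits polynomials of order M = 0, 1, 3, and 9 to N = 10 noisy samples and tabulates the coefficients. The sine-curve-plus-noise dataset and the use of np.polyfit and pandas are assumptions made here for the example, so the numbers will differ from the table above, but the M = 9 column shows the same blow-up of the higher-order coefficients.

    # Minimal sketch: compare polynomial coefficients for several orders M.
    import numpy as np
    import pandas as pd

    np.random.seed(0)
    N = 10
    x = np.linspace(0, 1, N)
    t = np.sin(2 * np.pi * x) + np.random.normal(scale=0.3, size=N)  # assumed toy dataset

    columns = {}
    for M in (0, 1, 3, 9):
        # np.polyfit returns coefficients from the highest order down,
        # so reverse them to get w_0, w_1, ..., w_M.
        # (The degree-9 fit to 10 points is ill-conditioned and may emit a RankWarning.)
        w = np.polyfit(x, t, M)[::-1]
        columns[f"M={M}"] = pd.Series(w, index=[f"w_{m}" for m in range(M + 1)])

    # Coefficients that do not exist for a given order appear as NaN, as in the table above.
    print(pd.DataFrame(columns))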
■ Therefore, if we determine the coefficients under the condition of minimizing an error function modified as shown below, using a suitable constant λ, overfitting becomes less likely to occur even when the order is high.
  - The optimal value of λ has to be determined by trial and error.
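The modified error function itself does not appear in this transcript. Assuming the standard L2 (ridge) penalty that is usually used in this polynomial-fitting example, it would take a form like the following (the 1/2 and λ/2 factors are one common convention, not necessarily the exact form on the original slide):

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \Bigl( \sum_{m=0}^{M} w_m x_n^{m} - t_n \Bigr)^{2} + \frac{\lambda}{2} \sum_{m=0}^{M} w_m^{2}

The larger λ is, the more strongly large coefficients are penalized, which suppresses the kind of blow-up seen in the M = 9 column above; λ = 0 recovers the ordinary unregularized least-squares fit.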