This document discusses neural networks and deep learning concepts such as artificial neurons, edges, weights, biases, activation functions, backpropagation, optimization algorithms like stochastic gradient descent, and neural network architectures like convolutional neural networks. It provides examples of neural network calculations and discusses tasks like image classification using datasets such as ImageNet and CIFAR-10.
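Since the document walks through neural-network calculations, here is a minimal sketch of the basic unit those calculations build on: a single artificial neuron computing a weighted sum of inputs plus a bias, passed through an activation function. The values and function name are illustrative assumptions, not taken from the document's own examples.

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus a bias,
    passed through a ReLU activation."""
    return np.maximum(0.0, np.dot(w, x) + b)

# Two inputs, illustrative weights and bias.
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
print(neuron(x, w, b=0.1))  # ReLU(0.5*1 - 0.25*2 + 0.1) = ReLU(0.1) = 0.1
```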
- The document discusses linear regression models and methods for estimating coefficients, including ordinary least squares and regularization methods like ridge regression and lasso regression.
- It explains how lasso regression, unlike ordinary least squares and ridge regression, drives some of the coefficient estimates exactly to zero, allowing for variable selection (illustrated in the sketch below).
- An example using crime rate data shows how lasso regression can select a more parsimonious model than other methods by setting some coefficients to zero.
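A minimal sketch of that zeroing-out behavior using scikit-learn on synthetic data (the feature setup and alpha values are assumptions for illustration, not the crime-rate data from the document): lasso drives the irrelevant coefficients exactly to zero, while ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first 3 of 10 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

print("lasso:", np.round(lasso.coef_, 2))  # irrelevant coefficients driven exactly to 0
print("ridge:", np.round(ridge.coef_, 2))  # shrunk, but nonzero everywhere
```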
KDD2016 reading group: https://atnd.org/events/80771
Paper: "Why Should I Trust You?": Explaining the Predictions of Any Classifier
Authors: M. T. Ribeiro, S. Singh, and C. Guestrin
Paper link: http://www.kdd.org/kdd2016/subtopic/view/why-should-i-trust-you-explaining-the-predictions-of-any-classifier
Lecture slides by Keisuke Fukuda (PFN) for the University of Tokyo graduate course "融合情報学特別講義Ⅲ" (Special Lectures on Interdisciplinary Informatics III), October 19, 2022.
- Introduction to Preferred Networks
- Our developments to date
- Our research & platform
- Simulation × AI
39. Passive Aggressive (continued)
- The PA optimization problem has a closed-form solution, so the model can be updated as
  w_{i+1} := w_i + y L(x, y, w) / (‖x‖² + 1/C) · x
- In the common update form w_{i+1} := w_i + η α A y x, this corresponds to
  - η = 1
  - α = L(x, y, w) / (‖x‖² + 1/C)
  - A = I
- Since α ∝ L(x, y, w), the update width is proportional to the size of the error
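A minimal sketch of this closed-form update in Python (the function name and the use of hinge loss for L are my assumptions; labels are taken to be ±1):

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One Passive-Aggressive (PA-II style) step for a binary label y in {-1, +1}.

    The step size alpha is the hinge loss divided by (||x||^2 + 1/C),
    so the update grows with how badly the example was misclassified.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss L(x, y, w)
    alpha = loss / (np.dot(x, x) + 1.0 / C)   # closed-form step size
    return w + alpha * y * x                  # A = I, so the update is alpha * y * x
```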
63. Problem: fusing distributed and online learning is hard
- Online learning updates the model frequently
- If online learning is distributed as-is, the cost of synchronizing the model becomes very large
[Figure: timelines comparing batch and online learning, each repeating Learn → Model → Update. Batch learning synchronizes only at model updates, so parallelization is easy; online learning updates so frequently that parallelization is hard.]
75. UPDATE
- Data are sent to an arbitrary server
- The local model is updated based on the data
- The data are not shared (see the sketch after the figure)
[Figure: data are distributed randomly or consistently across servers; each server starts from the same initial model and maintains its own local model (Local model 1, Local model 2).]
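A minimal sketch of the UPDATE step (the `Server` class, random routing, and the PA-style local update are illustrative assumptions; any online learner would fit here):

```python
import random
import numpy as np

class Server:
    """One node holding a local linear model; raw data never leaves it."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def local_update(self, x, y, C=1.0):
        # Passive-Aggressive step, as in the sketch above.
        loss = max(0.0, 1.0 - y * np.dot(self.w, x))
        self.w += loss / (np.dot(x, x) + 1.0 / C) * y * x

def update_phase(servers, stream):
    """UPDATE: route each (x, y) to one server, chosen randomly here
    (consistent hashing on a data key also works), and update only
    that server's local model."""
    for x, y in stream:
        random.choice(servers).local_update(x, y)
```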
76. MIX
- Each server sends the diff of its local model
- The model diffs are merged and redistributed
- A model diff is much smaller than the model itself, so the transfer cost is small (see the sketch below)
[Figure: each server computes Model diff i = Local model i − Initial model; the diffs are merged into a single Merged diff, and each server's next model is Mixed model = Initial model + Merged diff.]
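Continuing the UPDATE sketch above, a minimal MIX step. Merging by plain averaging is an assumption for illustration; the actual merge strategy can differ.

```python
def mix_phase(servers, initial_w):
    """MIX: each server sends only (local model - initial model).
    The diffs are merged (averaged here) and the mixed model is
    redistributed; it becomes the initial model of the next round."""
    diffs = [srv.w - initial_w for srv in servers]  # small relative to the full model
    merged = sum(diffs) / len(diffs)                # merge the diffs
    mixed = initial_w + merged
    for srv in servers:
        srv.w = mixed.copy()
    return mixed
```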