This document summarizes a presentation on the bandit problem and algorithms to solve it. The presentation will:
1) Explain what the bandit problem is and provide a simple example.
2) Describe algorithms for solving the bandit problem, including epsilon-greedy and Thompson sampling (see the sketch after this list).
3) Discuss how to apply bandit algorithms to problems that include contextual information.
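As a rough illustration of the first of those algorithms, here is a minimal epsilon-greedy sketch (my own, under assumed Bernoulli-reward arms; the presentation's code is not shown):

```python
# Minimal epsilon-greedy bandit sketch: exploit the best-looking arm,
# but explore a uniformly random arm with probability eps.
import random

def epsilon_greedy(true_probs, steps=10_000, eps=0.1):
    n_arms = len(true_probs)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if random.random() < eps:
            arm = random.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total += reward
    return total / steps

print(epsilon_greedy([0.2, 0.5, 0.7]))  # average reward approaches the best arm's 0.7
```

With eps = 0.1 the agent exploits the empirically best arm about 90% of the time, so the average reward approaches, but never quite reaches, the best arm's true rate.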
In real-world deployments of machine learning, it often happens that a model cannot be used, despite its high predictive accuracy, because it is a black box. These slides explain the LIME paper, which was proposed to provide the rationale behind individual predictions, something machine learning models are poor at showing on their own.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "'Why should I trust you?' Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
Anomaly detection with deep learning can be used for fraud detection by finding abnormal patterns in data, such as fraudulent credit card transactions or spoofed locations. Deep learning is well suited to anomaly detection because it can learn complex patterns from large amounts of data, learn feature representations that are robust to noise, and capture cross-domain patterns. Techniques include unsupervised methods based on autoencoder reconstruction error and supervised methods that use RNNs to learn from labeled time-series data and predict anomalies. Production systems can consume streaming data from sources such as Kafka, with neural networks processing the streaming updates. A minimal sketch of the reconstruction-error approach follows.
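This is a minimal PyTorch sketch of the unsupervised approach mentioned above (not the presentation's code; the architecture and hidden size are arbitrary assumptions):

```python
# Score anomalies by autoencoder reconstruction error: points the model
# reconstructs poorly are flagged as anomalous.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features: int, n_hidden: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_scores(model: AutoEncoder, X: torch.Tensor) -> torch.Tensor:
    # Higher mean squared reconstruction error = more anomalous.
    with torch.no_grad():
        return ((model(X) - X) ** 2).mean(dim=1)
```

After training the autoencoder on (mostly normal) data, a threshold on `anomaly_scores` separates normal from anomalous points.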
The document provides an introduction and background about the speaker, Kenichi Matsui. It discusses his career experience working for several large companies in software development, communications, and consulting. It then covers some of his current responsibilities related to data analysis and machine learning as a data scientist and group manager. Specific topics covered include an overview of data science skills and roles, machine learning techniques like classification and regression, and data analysis competitions.
Kaggle Google Quest Q&A Labeling retrospective lightning talk (LT) slides: 47th place solution, by Ken'ichi Matsui.
The document discusses approaches tried to improve a question-answering competition model, including pre-training on additional data, modifying the model architecture by changing layers or heads, and using different loss functions or features. Various models were experimented with, such as BERT, RoBERTa, ALBERT, and XLNet. Concatenating the question and answer encodings, however, did not work as expected.
Two sentences are tokenized and encoded by a BERT model. The first sentence describes two kids playing with a green crocodile float in a swimming pool. The second sentence describes two kids pushing an inflatable crocodile around in a pool. The tokenized sentences are passed through the BERT model, which outputs the encoded representations of the token sequences.
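A minimal sketch of that pipeline with the Hugging Face transformers library (the checkpoint name and exact sentence wording are assumptions; the presentation's own code is not shown):

```python
# Tokenize two sentences and pass them through BERT to get per-token encodings.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentences = [
    "Two kids are playing with a green crocodile float in a swimming pool.",
    "Two kids push an inflatable crocodile around in a pool.",
]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")
outputs = model(**inputs)
# One encoded vector per token, per sentence: (batch, seq_len, hidden_size)
print(outputs.last_hidden_state.shape)
```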
1) The document discusses univariate distribution relationships and provides code examples that generate random samples from Bernoulli, binomial, and normal distributions and plot them.
2) The code draws samples under varying parameter values and plots the empirical distributions alongside the theoretical ones (a minimal sketch follows this list).
3) Confidence intervals for the normal distribution are also calculated and printed, based on the sample size, probability, and the theoretical normal distribution's parameters.
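The document's own code is not reproduced here; the following Python sketch, with assumed parameter values, illustrates the same idea of comparing an empirical distribution to the theoretical one, plus a normal-approximation confidence interval:

```python
# Compare an empirical binomial distribution to its theoretical pmf,
# and print a confidence interval from the matching normal approximation.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, p, size = 30, 0.3, 10_000

samples = rng.binomial(n, p, size=size)          # empirical Binomial(n, p) draws
ks = np.arange(n + 1)
plt.hist(samples, bins=np.arange(n + 2) - 0.5, density=True,
         alpha=0.5, label="empirical")
plt.plot(ks, stats.binom.pmf(ks, n, p), "o-", label="theoretical pmf")
plt.legend()
plt.show()

# 95% confidence interval of the normal approximation N(np, np(1-p))
lo, hi = stats.norm.interval(0.95, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
print(lo, hi)
```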
This document contains a summary of 3 papers on deep residual networks and squeeze-and-excitation networks:
1. Kaiming He et al. "Deep Residual Learning for Image Recognition" which introduced residual networks for image recognition.
2. Andreas Veit et al. "Residual Networks Behave Like Ensembles of Relatively Shallow Networks" which analyzed how residual networks behave like ensembles.
3. Jie Hu et al. "Squeeze-and-Excitation Networks" which introduced squeeze-and-excitation blocks to help convolutional networks learn channel dependencies.
The document also references the PyTorch ResNet implementation and provides URLs to the first and third papers. It contains non-English (Japanese) text as well. A minimal sketch of a squeeze-and-excitation block follows.
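To make the third paper's idea concrete, here is a minimal PyTorch sketch of a squeeze-and-excitation block (my own sketch following Hu et al.; the reduction ratio r=16 is the paper's default, everything else is an assumed simplification):

```python
# Squeeze-and-excitation: pool each channel to a scalar ("squeeze"), then
# learn per-channel gates that rescale the feature maps ("excitation").
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: global average pool per channel
        w = self.fc(s).view(b, c, 1, 1)   # excitation: channel-wise gates in (0, 1)
        return x * w                      # rescale the feature maps
```

The gating lets the network model dependencies between channels, which is the channel-recalibration effect the paper describes.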
The document discusses generative models and their ability to generate realistic images, audio, and text which can be used to augment datasets. It outlines how generative models work by learning the underlying patterns and structures from large amounts of data to generate new examples that resemble the training data. The document also cautions that generative models are still narrow and more work needs to be done to build models that capture the full complexity and diversity of the real world.
The document describes various probability distributions that can arise from combining Bernoulli random variables. It shows how a binomial distribution emerges from summing Bernoulli random variables, and how Poisson, normal, chi-squared, exponential, gamma, and inverse gamma distributions can approximate the binomial as the number of Bernoulli trials increases. Code examples in R are provided to simulate sampling from these distributions and compare the simulated distributions to their theoretical probability density functions.
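The document's examples are in R; this is an equivalent Python sketch (with assumed parameter values) of the first relationship, a binomial distribution emerging as a sum of Bernoulli trials:

```python
# A sum of n Bernoulli(p) variables has the same distribution as Binomial(n, p).
import numpy as np

rng = np.random.default_rng(0)
n, p, size = 50, 0.2, 100_000

bernoulli_sums = rng.binomial(1, p, size=(size, n)).sum(axis=1)  # sum of Bernoullis
binomial = rng.binomial(n, p, size=size)                         # direct binomial draws

# The two empirical distributions should agree closely.
for k in range(5):
    print(k, (bernoulli_sums == k).mean(), (binomial == k).mean())
```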
This document discusses precision and recall, which are metrics used to evaluate the performance of classification models. Precision measures the proportion of predicted positive instances that are actually positive, while recall measures the proportion of actual positive instances that are correctly predicted to be positive. The document also presents formulas for calculating precision, recall, and the harmonic mean of precision and recall.
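In the usual notation (TP, FP, and FN for true positives, false positives, and false negatives), the formulas being described are:

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

The harmonic mean of precision and recall is the F1 score, which penalizes models that trade one metric too heavily for the other.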
The document appears to discuss Bayesian statistical modeling and inference. It includes definitions of terms like the correlation coefficient (ρ), bivariate normal distributions, and binomial distributions. It shows the setup of a Bayesian hierarchical model with multivariate normal outcomes and estimates of the model parameters, including the correlations (ρA and ρB) between two groups of bivariate data.
- The document discusses random number generation and probability distributions. It presents methods for generating random numbers from Bernoulli, binomial, beta, and multinomial distributions using random bits produced by linear congruential generators (see the sketch after this list).
- Graphical examples are shown comparing histograms of generated random samples to theoretical probability density functions. Code examples in R demonstrate how to simulate random number generation from various discrete distributions.
- The goal is to introduce different methods for random number generation from basic discrete distributions that are important for modeling random phenomena and Monte Carlo simulations.
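The document's examples are in R; here is an equivalent Python sketch (with assumed LCG constants, the common Numerical Recipes values) of using a linear congruential generator to draw Bernoulli and binomial samples:

```python
# A linear congruential generator yields uniforms in [0, 1), which are then
# thresholded to produce Bernoulli draws and summed to produce binomial draws.

def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
    """Yield uniform floats in [0, 1) from a linear congruential generator."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

def bernoulli(u: float, p: float) -> int:
    return 1 if u < p else 0

gen = lcg(seed=42)
p, n = 0.3, 10
# One Binomial(n, p) draw is the sum of n Bernoulli(p) draws.
sample = sum(bernoulli(next(gen), p) for _ in range(n))
print(sample)
```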
8. 4.1.1 k-nearest-neighbor method: unlabeled data
Data set: \( D = \{x^{(1)}, x^{(2)}, \cdots, x^{(N)}\} \)
Number of samples: \( N \); number of features: \( M \)
This data set is assumed to contain no anomalous samples, or at most an overwhelmingly small number of them.
Newly acquired data point: \( x' \) (the point to be judged)
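As a rough illustration of this setup (my own sketch, not the slides' code), the anomaly score of a new point can be taken as its distance to its k-th nearest neighbor in \( D \):

```python
# Score a new point x_new by the distance to its k-th nearest neighbor in the
# (assumed mostly normal) data set D; far-from-the-bulk points score high.
import numpy as np

def knn_anomaly_score(D: np.ndarray, x_new: np.ndarray, k: int = 5) -> float:
    dists = np.linalg.norm(D - x_new, axis=1)  # distances to all N samples
    return float(np.sort(dists)[k - 1])        # k-th smallest distance

rng = np.random.default_rng(0)
D = rng.normal(size=(1000, 3))                    # N=1000 samples, M=3 features
print(knn_anomaly_score(D, np.zeros(3)))          # near the bulk: small score
print(knn_anomaly_score(D, 10 * np.ones(3)))      # far away: large score
```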
45. Today's highlight: checking it ourselves
\[
\psi_1^{(n)}(A) \equiv \sum_{i \in \mathcal{N}(n)} d_A^2\!\left(x^{(n)}, x^{(i)}\right)
\]
where the sum runs over the k-nearest-neighbor data of \( x^{(n)} \). Writing \( x^{(n,i)} \equiv x^{(n)} - x^{(i)} \),
\[
\psi_1^{(n)}(A)
= \sum_{i \in \mathcal{N}(n)} \left(x^{(n)} - x^{(i)}\right)^{T} A \left(x^{(n)} - x^{(i)}\right)
= \sum_{i \in \mathcal{N}(n)} x^{(n,i)T} A\, x^{(n,i)}
= \sum_{i \in \mathcal{N}(n)} \sum_{k=1}^{M} \sum_{l=1}^{M} a_{kl}\, x_k^{(n,i)} x_l^{(n,i)}
\]
Differentiating with respect to the matrix \( A \) (note that \( a_{kl} \) enters symmetrically!):
\[
\frac{\partial \psi_1^{(n)}(A)}{\partial a_{kl}}
= \sum_{i \in \mathcal{N}(n)} x_k^{(n,i)} x_l^{(n,i)}
= \sum_{i \in \mathcal{N}(n)} \left(x_k^{(n)} - x_k^{(i)}\right)\left(x_l^{(n)} - x_l^{(i)}\right)
\]
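In the spirit of the slide title, here is a small numpy check (my own, not the slides' code) that this analytic derivative matches a finite-difference estimate:

```python
# Finite-difference check of d psi_1 / d a_kl = sum_i (x_n - x_i)_k (x_n - x_i)_l
import numpy as np

rng = np.random.default_rng(0)
M, k = 4, 3
x_n = rng.normal(size=M)
neighbors = rng.normal(size=(k, M))

def psi1(A):
    return sum((x_n - x_i) @ A @ (x_n - x_i) for x_i in neighbors)

A = np.eye(M)
analytic = sum(np.outer(x_n - x_i, x_n - x_i) for x_i in neighbors)

eps, numeric = 1e-6, np.zeros((M, M))
for p in range(M):
    for q in range(M):
        E = np.zeros((M, M)); E[p, q] = eps      # perturb one entry of A
        numeric[p, q] = (psi1(A + E) - psi1(A)) / eps

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```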
46. Today's highlight: checking it ourselves
\[
\sum_{i \in \mathcal{N}(n)} \left(x_k^{(i)} - x_k^{(n)}\right)\left(x_l^{(i)} - x_l^{(n)}\right)
\]
With \( x^{(n)} \) playing the role of the center and the sum running over the k-nearest-neighbor data, this somehow starts to look like a covariance!
47. Today's highlight: checking it ourselves
Redefining the objective function to be minimized:
\[
\psi(A) \equiv \frac{1}{N} \sum_{n=1}^{N} \left[ \psi_1^{(n)}(A) + \psi_2^{(n)}(A) \right]
= \frac{1}{N} \sum_{n=1}^{N} \psi_1^{(n)}(A) + \frac{1}{N} \sum_{n=1}^{N} \psi_2^{(n)}(A)
= \psi_1(A) + \psi_2(A)
\]
Considering the matrix derivative of the first term, we obtain
\[
\frac{\partial \psi_1(A)}{\partial a_{kl}}
= \frac{1}{N} \sum_{n=1}^{N} \sum_{i \in \mathcal{N}(n)} \left(x_k^{(i)} - x_k^{(n)}\right)\left(x_l^{(i)} - x_l^{(n)}\right)
\]
48. Today's highlight: checking it ourselves
Defining \( C^{(i,j)} \equiv (e_i - e_j)(e_i - e_j)^T \), the derivative
\[
\frac{\partial \psi_1(A)}{\partial a_{kl}}
= \frac{1}{N} \sum_{n=1}^{N} \sum_{i \in \mathcal{N}(n)} \left(x_k^{(i)} - x_k^{(n)}\right)\left(x_l^{(i)} - x_l^{(n)}\right)
\]
can be written compactly in matrix form as
\[
\frac{\partial \psi_1(A)}{\partial A} = \frac{1}{N} X C X^T
\]
since \( x^{(i)} - x^{(n)} = X(e_i - e_n) \), where \( X \) collects the samples as columns and \( C = \sum_{n=1}^{N} \sum_{i \in \mathcal{N}(n)} C^{(i,n)} \).
49. Today's highlight: checking it ourselves
\[
\psi_2^{(n)}(A) \equiv \sum_{j \in \mathcal{N}(n)} \sum_{l=1}^{N} I\!\left[y^{(l)} \neq y^{(n)}\right]
\left[ 1 + d_A^2\!\left(x^{(n)}, x^{(j)}\right) - d_A^2\!\left(x^{(n)}, x^{(l)}\right) \right]_+
\]
Here the j-sum runs over the k-nearest neighbors with the same label, while the l-sum covers all points with a different label. Differentiating with respect to the matrix \( A \):
\[
\frac{\partial \psi_2^{(n)}(A)}{\partial a_{pq}}
= \sum_{j \in \mathcal{N}(n)} \sum_{l=1}^{N} I\!\left[y^{(l)} \neq y^{(n)}\right]
\left[ \frac{\partial d_A^2\!\left(x^{(n)}, x^{(j)}\right)}{\partial a_{pq}}
- \frac{\partial d_A^2\!\left(x^{(n)}, x^{(l)}\right)}{\partial a_{pq}} \right]_+
\]
\[
= \sum_{j \in \mathcal{N}(n)} \sum_{l=1}^{N} I\!\left[y^{(l)} \neq y^{(n)}\right]
\left[ \left(x_p^{(n)} - x_p^{(j)}\right)\left(x_q^{(n)} - x_q^{(j)}\right)
- \left(x_p^{(n)} - x_p^{(l)}\right)\left(x_q^{(n)} - x_q^{(l)}\right) \right]_+
\]
(Figure: sample points around \( x^{(n)} \) marked same/different label; the j-series are the same-label k-nearest neighbors, the l-series are the different-label points.)
50. Eigenvalue decomposition of A
Once the matrix \( A \) has been updated via equations (4.12) and (4.13), perform the eigenvalue decomposition \( A = U \Lambda U^T \) and update \( A \) as follows:
\[
A \leftarrow U [\Lambda]_+ U^T
\]
Here \( [\Lambda]_+ \) means replacing the negative eigenvalues with 0, which keeps \( A \) positive semidefinite. The picture is similar to dimensionality reduction with principal component analysis.
52. Algorithm
[Iterate] Execute the following updates, checking for convergence after each pass, and repeat until convergence; on convergence, output the matrix \( A^* \) at that point. The step size \( \eta \) is updated at every iteration.
\[
A \leftarrow A - \eta \frac{\partial \psi(A)}{\partial A} \quad \text{... update minimizing } \psi(A)
\]
\[
A = U \Lambda U^T \quad \text{... eigenvalue computation}
\]
\[
A \leftarrow U [\Lambda]_+ U^T \quad \text{... discard negative eigenvalues}
\]
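A minimal numpy sketch of this loop (my own, not the slides' code), assuming a `grad_psi(A)` function that computes \( \partial\psi(A)/\partial A \) as derived above and a simple decaying step size:

```python
# Projected gradient descent on psi(A): take a gradient step, then project A
# back onto the positive semidefinite cone by clipping negative eigenvalues.
import numpy as np

def optimize_metric(A0, grad_psi, eta0=0.1, max_iter=1000, tol=1e-6):
    A = A0
    for t in range(1, max_iter + 1):
        eta = eta0 / t                          # step size updated each iteration
        A_new = A - eta * grad_psi(A)           # minimization update
        lam, U = np.linalg.eigh(A_new)          # eigenvalue computation
        lam = np.clip(lam, 0.0, None)           # discard negative eigenvalues
        A_new = U @ np.diag(lam) @ U.T
        if np.linalg.norm(A_new - A) < tol:     # convergence check
            return A_new                        # output A*
        A = A_new
    return A
```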