Presenter: Shane (Seungwhan) Moon
PhD student, Language Technologies Institute, School of Computer Science
Carnegie Mellon University
3/2/2016

How it works
AlphaGo vs. European Champion (Fan Hui, 2-dan*)
October 5–9, 2015 <Official match>
- Time limit: 1 hour
- AlphaGo wins (5:0)
* rank
AlphaGo vs. World Champion (Lee Sedol, 9-dan)
March 9–15, 2016 <Official match>
- Time limit: 2 hours
Venue: Four Seasons Hotel, Seoul
Image source: Josun Times, Jan 28th, 2015
Lee Sedol
Photo source: Maeil Economics, 2013/04; wiki
Computer Go AI?

Computer Go AI – Definition
s (state), d = 1
(e.g., we can represent the board in a matrix-like form:)

0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

* The actual model uses other features than board positions as well
Computer Go AI – Definition
s (state), d = 1, d = 2
a (action)
Given s, pick the best a
Computer Go Artificial Intelligence: s → a → s'
Computer Go AI – An Implementation Idea?
d = 1, d = 2, …
How about simulating all possible board positions?
Computer Go AI – An Implementation Idea?
d = 1, d = 2, d = 3, …, d = maxD
Run the simulation until the game ends, then report the win/lose results
Computer Go AI – An Implementation Idea?
d = 1, d = 2, d = 3, …, d = maxD
Run the simulation until the game ends, then report the win/lose results
e.g., it wins 13 times if the next stone gets placed here; 37,839 times for another candidate; 431,320 times for yet another
Choose the “next action / stone” that has the most win-counts in the full-scale simulation, as in the sketch below
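A toy sketch of this brute-force idea, with a hypothetical stand-in game in place of real Go rules (the helper functions here are purely for illustration):

```python
import random
from collections import Counter

# Stand-in "game": states are tuples of moves; real Go rules are omitted.
MAX_DEPTH = 6             # "maxD" in the slides
ACTIONS = list(range(4))  # tiny action space instead of 361 board points

def game_over(state):
    return len(state) >= MAX_DEPTH

def playout_result(rng):
    # Hypothetical win/lose oracle reported when a simulation ends.
    return rng.random() < 0.5

def count_wins(n_playouts=10_000, seed=0):
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_playouts):
        first = rng.choice(ACTIONS)          # candidate "next stone"
        state = (first,)
        while not game_over(state):          # simulate until the game ends
            state += (rng.choice(ACTIONS),)
        if playout_result(rng):
            wins[first] += 1                 # accumulate win counts per first move
    return wins

wins = count_wins()
print(wins, "-> choose", max(wins, key=wins.get))
```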
This is NOT possible; it is said that the possible configurations of the board exceed the number of atoms in the universe

Key: To Reduce the Search Space
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction)
d = 1, d = 2, d = 3, …, d = maxD. Win? Loss?
IF there is a model that can tell you that these moves are not common / probable (e.g., by experts, etc.) …
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction)
Remove these from the search candidates in advance (breadth reduction), as sketched below
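If such a model exists, breadth reduction can be as simple as keeping only its top-scoring moves. A minimal sketch, assuming a hypothetical model output p(a|s) over the 361 points:

```python
import numpy as np

def prune_actions(p_a_given_s, keep_top_k=5):
    """Keep only the moves the model considers probable; everything else
    is removed from the search candidates in advance (breadth reduction)."""
    return np.argsort(p_a_given_s)[::-1][:keep_top_k]

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(361))   # stand-in for a trained model's p(a|s)
print(prune_actions(p))           # search proceeds only over these candidates
```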
Reducing Search Space
2. Position evaluation ahead of time (Depth Reduction)
Instead of simulating until the maximum depth …
Reducing Search Space
2. Position evaluation ahead of time (Depth Reduction)
d = 1, d = 2, d = 3: V = 1, V = 2, V = 10
IF there is a function that can measure V(s): “board evaluation of state s”
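A minimal sketch of the depth-cutoff idea, assuming hypothetical V (board evaluation) and children (state expansion) functions; the negamax sign flip is one common way to alternate players, not something the slides specify:

```python
def search_value(state, depth, max_depth, V, children):
    """Depth reduction sketch: stop at a cutoff depth and call V(s)
    instead of simulating to the end of the game."""
    if depth == max_depth:
        return V(state)                         # evaluate ahead of time
    # Negamax-style: a position good for the opponent is bad for us.
    return max(-search_value(c, depth + 1, max_depth, V, children)
               for c in children(state))

# Toy usage: states are integers, children fan out, V is arbitrary.
print(search_value(0, 0, max_depth=3,
                   V=lambda s: (s % 7) / 7.0,
                   children=lambda s: [3 * s + i for i in (1, 2, 3)]))
```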
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction)
2. Position evaluation ahead of time (Depth Reduction)
1. Reducing “action candidates”
Learning: P(next action | current state) = P(a | s)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Current State → Prediction Model → Next State
s1 → s2, s2 → s3, s3 → s4
Data: online Go experts (5~9 dan), 160K games, 30M board positions
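Since the data comes as consecutive board states, the expert's action can be recovered as the point where the board changed. A rough preprocessing sketch (it ignores captures, which remove stones as well):

```python
import numpy as np

def to_training_pairs(states):
    """Turn a recorded game (consecutive board matrices s1, s2, s3, ...)
    into supervised (current state, next action) pairs."""
    pairs = []
    for s_cur, s_next in zip(states, states[1:]):
        diff = np.argwhere(s_next != s_cur)       # points where the board changed
        move = tuple(int(x) for x in diff[0])     # assume one new stone (ignores captures)
        pairs.append((s_cur, move))
    return pairs

s1 = np.zeros((9, 9), dtype=np.int8)
s2 = s1.copy(); s2[2, 6] = 1        # black plays at (2, 6)
s3 = s2.copy(); s3[4, 4] = -1       # white replies at (4, 4)
print([move for _, move in to_training_pairs([s1, s2, s3])])   # [(2, 6), (4, 4)]
```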
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Board
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Action
There are 19 × 19 = 361 possible actions (with different probabilities)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Two ways to frame the prediction model:
f: s → a (map the current board matrix directly to the next action)
g: s → p(a|s), then pick a = argmax_a p(a|s)
(e.g., the current board as a 0/1/-1 matrix; the model outputs a probability for each point, such as 0.4, 0.2, 0.1, …)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Deep Learning (13-layer CNN) as the prediction model g: s → p(a|s); play a = argmax_a p(a|s)
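A sketch of what such a convolutional policy network might look like, here in PyTorch; the layer width and the single input plane are illustrative assumptions (the real model uses many more input feature planes):

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Sketch of a convolutional policy network g: s -> p(a|s).
    13 conv layers to echo the slide; sizes are assumptions, not the
    paper's exact architecture."""
    def __init__(self, in_planes=1, width=64, n_layers=13):
        super().__init__()
        layers = [nn.Conv2d(in_planes, width, 5, padding=2), nn.ReLU()]
        for _ in range(n_layers - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(width, 1, 1)]        # one logit per board point
        self.body = nn.Sequential(*layers)

    def forward(self, s):                          # s: (batch, planes, 19, 19)
        logits = self.body(s).flatten(1)           # (batch, 361)
        return torch.softmax(logits, dim=1)        # p(a|s) over 361 actions

net = PolicyNet()
s = torch.zeros(1, 1, 19, 19)                      # empty board
p = net(s)
a = p.argmax(dim=1)                                # a = argmax_a p(a|s)
print(p.shape, a)
```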
Convolutional Neural Network (CNN)
A CNN is a powerful model for image recognition tasks; it abstracts the input image through convolution layers
Convolutional Neural Network (CNN)
AlphaGo uses this CNN model (a similar architecture) to evaluate board positions; it learns “some” spatial invariance
Go: abstraction is the key to winning
CNN: abstraction is its forte
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Training: Expert Moves Imitator Model (w/ CNN): Current Board → Next Action
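A minimal sketch of this supervised training step, using a stand-in linear model in place of the CNN; the objective is cross-entropy between the predicted move distribution and the move the expert actually played:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model: flattened board -> logits over 361 moves (illustrative).
model = nn.Sequential(nn.Flatten(), nn.Linear(19 * 19, 361))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def supervised_step(boards, expert_moves):
    """One imitation step: push p(a|s) toward the expert's actual move."""
    logits = model(boards)
    loss = F.cross_entropy(logits, expert_moves)   # expert move index per board
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

boards = torch.zeros(8, 19, 19)                    # toy batch of positions
moves = torch.randint(0, 361, (8,))                # toy "expert" move indices
print(supervised_step(boards, moves))
```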
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Expert Moves Imitator Model (w/ CNN) VS Expert Moves Imitator Model (w/ CNN)
Improving by playing against itself
Return: board positions, win/lose info
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Training: Expert Moves Imitator Model (w/ CNN), board position → win/loss
Loss: z = -1; Win: z = +1
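A sketch of the self-play update as a REINFORCE-style policy gradient: moves from won games (z = +1) are made more probable, moves from lost games (z = -1) less probable. The stand-in model and learning rate are illustrative, not the paper's exact recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in policy, as before: board -> logits over 361 moves.
policy = nn.Sequential(nn.Flatten(), nn.Linear(19 * 19, 361))
opt = torch.optim.SGD(policy.parameters(), lr=0.01)

def reinforce_step(boards, moves_played, z):
    """Scale each move's log-probability by the game outcome z = +/-1."""
    log_p = F.log_softmax(policy(boards), dim=1)
    chosen = log_p.gather(1, moves_played.unsqueeze(1)).squeeze(1)
    loss = -(z * chosen).mean()                    # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

boards = torch.zeros(4, 19, 19)
moves = torch.randint(0, 361, (4,))
z = torch.tensor([1.0, 1.0, -1.0, -1.0])           # win / win / loss / loss
print(reinforce_step(boards, moves, z))
```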
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Updated Model ver 1.1 VS Updated Model ver 1.3
Return: board positions, win/lose info
It uses the same topology as the expert moves imitator model, just with updated parameters
Older models vs. newer models
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Updated Model ver 1.3 VS Updated Model ver 1.7
Updated Model ver 1.5 VS Updated Model ver 2.0
Updated Model ver 3204.1 VS Updated Model ver 46235.2
Return: board positions, win/lose info
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Updated Model ver 1,000,000 VS Expert Moves Imitator Model
The final model wins 80% of the time when playing against the first model
2. Board Evaluation
Training: Updated Model ver 1,000,000; Board Position → Win / Loss (0~1)
Value Prediction Model (Regression): adds a regression layer to the model
Predicts values between 0~1
Close to 1: a good board position; close to 0: a bad board position
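A sketch of the regression objective, again with a stand-in network: V(s) is squashed into 0~1 and fit to the self-play outcomes by mean squared error:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in value network: board -> a single score in (0, 1).
value_net = nn.Sequential(nn.Flatten(), nn.Linear(19 * 19, 1), nn.Sigmoid())
opt = torch.optim.SGD(value_net.parameters(), lr=0.01)

def value_step(boards, outcomes):
    """Fit V(s) to observed game results: 1 = win, 0 = loss."""
    v = value_net(boards).squeeze(1)
    loss = F.mse_loss(v, outcomes)                 # regression, not classification
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

boards = torch.zeros(4, 19, 19)
outcomes = torch.tensor([1.0, 0.0, 1.0, 1.0])      # win/loss labels
print(value_step(boards, outcomes))
```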
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction) → Policy Network
2. Board Evaluation (Depth Reduction) → Value Network
Looking ahead (w/ Monte Carlo Tree Search)
Action Candidates Reduction (Policy Network) + Board Evaluation (Value Network)
Rollout: a faster version of estimating p(a|s) → uses shallow networks (3 ms → 2 µs)
Results
Elo rating system; performance with different combinations of AlphaGo components
Takeaways
Networks trained for one task (with different loss objectives) can be reused for several other tasks
Lee Sedol 9-dan vs. AlphaGo

Lee Sedol 9-dan vs. AlphaGo: Energy Consumption
Lee Sedol:
- Recommended calories for a man per day: ~2,500 kCal
- Assumption: Lee consumes his entire daily calories in this one game
2,500 kCal × 4,184 J/kCal ≈ 10M [J]
AlphaGo:
- Assumption: CPU ~100 W, GPU ~300 W
- 1,202 CPUs, 176 GPUs
170,000 J/sec × 5 hr × 3,600 sec/hr ≈ 3,000M [J]
A very, very rough calculation ;)
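A quick back-of-the-envelope check of these numbers, under the slide's own assumptions:

```python
# Back-of-the-envelope check of the slide's figures (same assumptions).
lee_joules = 2_500 * 4_184                    # kCal * J/kCal ~= 10M J
alphago_watts = 1_202 * 100 + 176 * 300       # CPUs at ~100 W, GPUs at ~300 W
alphago_joules = alphago_watts * 5 * 3_600    # ~5 hours of play, in seconds
print(f"Lee: ~{lee_joules/1e6:.0f}M J, AlphaGo: ~{alphago_joules/1e6:,.0f}M J")
# -> Lee: ~10M J, AlphaGo: ~3,114M J (the slide rounds to ~3,000M J)
```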
AlphaGo is estimated to be around ~5-dan
Multiple machines = European-champion level
Taking CPU / GPU resources to virtually infinity?
But Google has promised not to use more CPUs/GPUs for the game with Lee than they used against Fan Hui
No one knows how it will converge
AlphaGo learns millions of Go games every day
AlphaGo will presumably converge to some point eventually.
However, the Nature paper does not report how AlphaGo’s performance improves as a function of the number of self-play games.
What if AlphaGo learns Lee’s game strategy?
Google said they won’t use Lee’s game plays as AlphaGo’s training data.
Even if it did, it would not be easy to shift a model trained on millions of data points with just a few games against Lee (prone to over-fitting, etc.)
AlphaGo’s Weakness?
AlphaGo – How It Works
Presenter: Shane (Seungwhan) Moon
PhD student, Language Technologies Institute, School of Computer Science
Carnegie Mellon University
me@shanemoon.com
3/2/2016
Reference
• Silver, David, et al. “Mastering the game of Go with deep neural networks and tree search.” Nature 529.7587 (2016): 484-489.