The slides go through the implementation details of Google Deepmind's AlphaGo, a computer Go AI that defeated the European champion. The slides are targeted for beginners in the machine learning area.
Korean version (한국어 버젼): http://www.slideshare.net/ShaneSeungwhanMoon/ss-59226902
AlphaGo Zero is an AI agent created by DeepMind to master the game of Go without human data or expertise. It uses reinforcement learning through self-play with the following key aspects:
1. It uses a single deep neural network that predicts both the next move and the winner of the game from the current board position. This dual network is trained solely through self-play reinforcement learning.
2. The neural network improves the Monte Carlo tree search used to select moves. The search uses the network predictions to guide selection and backup of information during search.
3. Training involves repeated self-play games to generate data, then using this data to update the neural network parameters through gradient descent. The updated network plays
1) The document discusses AlphaGo and its use of machine learning techniques like deep neural networks, reinforcement learning, and Monte Carlo tree search to master the game of Go.
2) AlphaGo uses reinforcement learning to learn Go strategies and evaluate board positions by playing many games against itself. It also uses deep neural networks and convolutional neural networks to pattern-match board positions and Monte Carlo tree search to simulate future moves and strategies.
3) By combining these techniques, AlphaGo was able to defeat top human Go players by developing an intuitive understanding of the game and strategizing several moves in advance.
1) Alpha Zero was an AI developed by DeepMind that achieved master level play in the games of chess, shogi, and Go without relying on human data or prior knowledge.
2) It was able to achieve this by using a new form of deep reinforcement learning that allowed it to learn to play solely from games of self-play, starting from random play.
3) Alpha Zero demonstrated superhuman performance in chess, shogi, and Go by defeating previous champion programs in these games, despite being provided no domain knowledge except the game rules.
This document discusses Go and strategies for developing Go-playing AI programs. It summarizes the state space and game tree complexity of Go compared to other games. Early Go programs used rule-based strategies and domain knowledge. More recent programs like AlphaGo use neural networks trained through reinforcement learning from self-play to predict moves and evaluate board positions, combined with Monte Carlo tree search to achieve superhuman performance at Go.
AlphaZero is an AI system created by DeepMind that achieved superhuman ability in the games of chess, shogi, and Go without relying on human data. It uses a new form of deep reinforcement learning combined with Monte Carlo tree search to learn from games generated by self-play. AlphaZero was able to master each game to superhuman level in a matter of hours, defeating the previous world-champion programs in each case. It represents a major advance in unsupervised, self-taught machine learning.
The document discusses how AlphaGo, a computer program developed by DeepMind, was able to defeat world champion Lee Sedol at the game of Go. It achieved this through a combination of deep learning and tree search techniques. Four deep neural networks were used: three convolutional networks to reduce the action space and search depth through imitation learning, self-play reinforcement learning, and value prediction; and a smaller network for faster simulations. This combination of deep learning and search allowed AlphaGo to master the complex game of Go, demonstrating the capabilities of modern AI.
The document provides an introduction and overview of AlphaGo Zero, including:
- AlphaGo Zero achieved superhuman performance at Go without human data by using self-play reinforcement learning.
- It uses a policy network and Monte Carlo tree search to select moves. The network is trained through self-play games using its own policy and value outputs as training labels.
- Experiments showed AlphaGo Zero outperformed previous AlphaGo versions and human-trained networks, and continued improving with deeper networks and more self-play training.
This document discusses techniques for fine-tuning large pre-trained language models without access to a supercomputer. It describes the history of transformer models and how transfer learning works. It then outlines several techniques for reducing memory usage during fine-tuning, including reducing batch size, gradient accumulation, gradient checkpointing, mixed precision training, and distributed data parallelism approaches like ZeRO and pipelined parallelism. Resources for implementing these techniques are also provided.
Deep Reinforcement Learning and Its ApplicationsBill Liu
What is the most exciting AI news in recent years? AlphaGo!
What are key techniques for AlphaGo? Deep learning and reinforcement learning (RL)!
What are application areas for deep RL? A lot! In fact, besides games, deep RL has been making tremendous achievements in diverse areas like recommender systems and robotics.
In this talk, we will introduce deep reinforcement learning, present several applications, and discuss issues and potential solutions for successfully applying deep RL in real life scenarios.
https://www.aicamp.ai/event/eventdetails/W2021042818
알파고의 작동 원리를 설명한 슬라이드입니다.
English version: http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
- 비전공자 분들을 위한 티저: 바둑 인공지능은 과연 어떻게 만들까요? 딥러닝 딥러닝 하는데 그게 뭘까요? 바둑 인공지능은 또 어디에 쓰일 수 있을까요?
- 전공자 분들을 위한 티저: 알파고의 main components는 재밌게도 CNN (Convolutional Neural Network), 그리고 30년 전부터 유행하던 Reinforcement learning framework와 MCTS (Monte Carlo Tree Search) 정도입니다. 새로울 게 없는 재료들이지만 적절히 활용하는 방법이 신선하네요.
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...Joonhyung Lee
An introduction to DeepMind's newest board-game playing AI, AlphaZero.
I have improved significantly on my previous presentation in https://www.slideshare.net/ssuserc416e2/alphago-zero-mastering-the-game-of-go-without-human-knowledge, which had several errors (some rather glaring, such as the temperature equation for simulated annealing). Also, DeepMind released far more details in their new Science paper for AlphaZero.
One comment I would like to add is that the AlphaGo Zero used for comparison in this paper is a very weak version, not the final version. Thus, AlphaGo Zero is still SOTA for Go.
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex FridmanPeerasak C.
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
Watch video: https://youtu.be/zR11FLZ-O9M
First lecture of MIT course 6.S091: Deep Reinforcement Learning, introducing the fascinating field of Deep RL. For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow TensorFlow code tutorials on our GitHub repo.
INFO:
Website: https://deeplearning.mit.edu
CONNECT:
- If you enjoyed this video, please subscribe to this channel.
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
Principles of Artificial Intelligence & Machine LearningJerry Lu
Artificial intelligence has captivated me since I worked on projects at Google that ranged from detecting fraud on Google Cloud to predicting subscriber retention on YouTube Red. Looking to broaden my professional experience, I then entered the world of venture capital by joining Baidu Ventures as its first summer investment associate where I got to work with amazingly talented founders building AI-focused startups.
Now at the Wharton School at the University of Pennsylvania, I am looking for opportunities to meet people with interesting AI-related ideas and learn about the newest innovations within the AI ecosystem. Within the first two months of business school, I connected with Nicholas Lind, a second-year Wharton MBA student who interned at IBM Watson as a data scientist. Immediately recognizing our common passion for AI, we produced a lunch-and-learn about AI and machine learning (ML) for our fellow classmates.
Using the following deck, we sought to:
- define artificial intelligence and describe its applications in business
- decode buzzwords such as “deep learning” and “cognitive computing”
- highlight analytical techniques and best practices used in AI / ML
- ultimately, educate future AI leaders
The lunch-and-learn was well received. When it became apparent that it was the topic at hand and not so much the free pizzas that attracted the overflowing audience, I was amazed at the level of interest. It was reassuring to hear that classmates were interested in learning more about the technology and its practical applications in solving everyday business challenges. Nick and I are now laying a foundation to make these workshops an ongoing effort so that more people across the various schools of engineering, design, and Penn at large can benefit.
With its focus on quantitative rigor, Wharton already feels like a perfect fit for me. In the next two years, I look forward to engaging with like-minded people, both in and out of the classroom, sharing my knowledge about AI with my peers, and learning from them in turn. By working together to expand Penn’s reach and reputation with respect to this new frontier, I’m confident that we can all grow into next-generation leaders who help drive companies forward in an era of artificial intelligence.
I’d love to hear what you think. If you found this post or the deck useful, please recommend them to your friends and colleagues!
Semantic segmentation with Convolutional Neural Network ApproachesUMBC
In this project, we propose methods for semantic segmentation with the deep learning state-of-the-art models. Moreover,
we want to filterize the segmentation to the specific object in specific application. Instead of concentrating on unnecessary objects we
can focus on special ones and make it more specialize and effecient for special purposes. Furtheromore, In this project, we leverage
models that are suitable for face segmentation. The models that are used in this project are Mask-RCNN and DeepLabv3. The
experimental results clearly indicate that how illustrated approach are efficient and robust in the segmentation task to the previous work
in the field of segmentation. These models are reached to 74.4 and 86.6 precision of Mean of Intersection over Union. The visual
Results of the models are shown in Appendix part.
An introduction to reinforcement learningJie-Han Chen
This document provides an introduction and overview of reinforcement learning. It begins with a syllabus that outlines key topics such as Markov decision processes, dynamic programming, Monte Carlo methods, temporal difference learning, deep reinforcement learning, and active research areas. It then defines the key elements of reinforcement learning including policies, reward signals, value functions, and models of the environment. The document discusses the history and applications of reinforcement learning, highlighting seminal works in backgammon, helicopter control, Atari games, Go, and dialogue generation. It concludes by noting challenges in the field and prominent researchers contributing to its advancement.
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree SearchKarel Ha
the presentation of the article "Mastering the game of Go with deep neural networks and tree search" given at the Optimization Seminar 2015/2016
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/p4rnlhoewbedkjg/AlphaGo.pdf?dl=0
- The corresponding leaflet is available at http://www.slideshare.net/KarelHa1/leaflet-for-the-talk-on-alphago
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
Reinforcement learning is a machine learning technique that involves trial-and-error learning. The agent learns to map situations to actions by trial interactions with an environment in order to maximize a reward signal. Deep Q-networks use reinforcement learning and deep learning to allow agents to learn complex behaviors directly from high-dimensional sensory inputs like pixels. DQN uses experience replay and target networks to stabilize learning from experiences. DQN has achieved human-level performance on many Atari 2600 games.
This presentation contains an introduction to reinforcement learning, comparison with others learning ways, introduction to Q-Learning and some applications of reinforcement learning in video games.
Reinforcement Learning : A Beginners TutorialOmar Enayet
This document provides an overview of reinforcement learning concepts including:
1) It defines the key components of a Markov Decision Process (MDP) including states, actions, transitions, rewards, and discount rate.
2) It describes value functions which estimate the expected return for following a particular policy from each state or state-action pair.
3) It discusses several elementary solution methods for reinforcement learning problems including dynamic programming, Monte Carlo methods, temporal-difference learning, and actor-critic methods.
This document provides an overview of generative adversarial networks (GANs). It explains that GANs were introduced in 2014 and involve two neural networks, a generator and discriminator, that compete against each other. The generator produces synthetic data to fool the discriminator, while the discriminator learns to distinguish real from synthetic data. As they train, the generator improves at producing more realistic outputs that match the real data distribution. Examples of GAN applications discussed include image generation, text-to-image synthesis, and face aging.
Reinforcement Learning (RL) approaches to deal with finding an optimal reward based policy to act in an environment (Charla en Inglés)
However, what has led to their widespread use is its combination with deep neural networks (DNN) i.e., deep reinforcement learning (Deep RL). Recent successes on not only learning to play games but also superseding humans in it and academia-industry research collaborations like for manipulation of objects, locomotion skills, smart grids, etc. have surely demonstrated their case on a wide variety of challenging tasks.
With application spanning across games, robotics, dialogue, healthcare, marketing, energy and many more domains, Deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
This document provides an agenda for a presentation on deep learning, neural networks, convolutional neural networks, and interesting applications. The presentation will include introductions to deep learning and how it differs from traditional machine learning by learning feature representations from data. It will cover the history of neural networks and breakthroughs that enabled training of deeper models. Convolutional neural network architectures will be overviewed, including convolutional, pooling, and dense layers. Applications like recommendation systems, natural language processing, and computer vision will also be discussed. There will be a question and answer section.
AI, Machine Learning and Deep Learning - The OverviewSpotle.ai
The deck takes you into a fascinating journey of Artificial Intelligence, Machine Learning and Deep Learning, dissect how they are connected and in what way they differ. Supported by illustrative case studies, the deck is your ready reckoner on the fundamental concepts of AI, ML and DL.
Explore more videos, masterclasses with global experts, projects and quizzes on https://spotle.ai/learn
This document summarizes generative adversarial networks (GANs) and their applications. It begins by introducing GANs and how they work by having a generator and discriminator play an adversarial game. It then discusses several variants of GANs including DCGAN, LSGAN, conditional GAN, and others. It provides examples of applications such as image-to-image translation, text-to-image synthesis, image generation, and more. It concludes by discussing major GAN variants and potential future applications like helping children learn to draw.
Usage of AI and machine learning models is likely to become more commonplace as larger swaths of the economy embrace automation and data-driven decision-making. While these predictive systems can be quite accurate, they have been treated as inscrutable black boxes in the past, that produce only numeric predictions with no accompanying explanations. Unfortunately, recent studies and recent events have drawn attention to mathematical and sociological flaws in prominent weak AI and ML systems, but practitioners usually don’t have the right tools to pry open machine learning black-boxes and debug them.
This presentation introduces several new approaches to that increase transparency, accountability, and trustworthiness in machine learning models. If you are a data scientist or analyst and you want to explain a machine learning model to your customers or managers (or if you have concerns about documentation, validation, or regulatory requirements), then this presentation is for you!
Deep Learning - The Past, Present and Future of Artificial IntelligenceLukas Masuch
The document provides an overview of deep learning, including its history, key concepts, applications, and recent advances. It discusses the evolution of deep learning techniques like convolutional neural networks, recurrent neural networks, generative adversarial networks, and their applications in computer vision, natural language processing, and games. Examples include deep learning for image recognition, generation, segmentation, captioning, and more.
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...Ruairi de Frein
An article from the Telecommunications Software & Systems Group, Waterford Institute of Technology, Ireland describing algorithms for distributed Formal Concept Analysis
ABSTRACT
While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration we introduce a distributed approach for performing formal concept mining. Our method has its novelty in that we use a light-weight MapReduce runtime called Twister which is better suited to iterative algorithms than recent distributed approaches. First, we describe the theoretical foundations underpinning our distributed formal concept analysis approach. Second, we provide a representative exemplar of how a classic centralized algorithm can be implemented in a distributed fashion using our methodology: we modify Ganter's classic algorithm by introducing a family of MR* algorithms, namely MRGanter and MRGanter+ where the prefix denotes the algorithm's lineage. To evaluate the factors that impact distributed algorithm performance, we compare our MR* algorithms with the state-of-the-art. Experiments conducted on real datasets demonstrate that MRGanter+ is efficient, scalable and an appealing algorithm for distributed problems.
Accepted for publication at the International Conference for Formal Concept Analysis 2012.
Project participants: Biao Xu, Ruairí de Fréin, Eric Robson, Mícheál Ó Foghlú
Ruairí de Fréin: rdefrein (at) gmail (dot) com
bibtex:
@incollection{
year={2012},
isbn={978-3-642-29891-2},
booktitle={Formal Concept Analysis},
volume={7278},
series={Lecture Notes in Computer Science},
editor={Domenach, Florent and Ignatov, DmitryI. and Poelmans, Jonas},
doi={10.1007/978-3-642-29892-9_26},
title={Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework},
url={http://dx.doi.org/10.1007/978-3-642-29892-9_26},
publisher={Springer Berlin Heidelberg},
keywords={Formal Concept Analysis; Distributed Mining; MapReduce},
author={Xu, Biao and Fréin, Ruairí and Robson, Eric and Ó Foghlú, Mícheál},
pages={292-308}
}
DOWNLOAD
The article Arxiv: http://arxiv.org/abs/1210.2401
The document discusses neural networks, generative adversarial networks, and image-to-image translation. It begins by explaining how neural networks learn through forward propagation, calculating loss, and using the loss to update weights via backpropagation. Generative adversarial networks are introduced as a game between a generator and discriminator, where the generator tries to fool the discriminator and vice versa. Image-to-image translation uses conditional GANs to translate images from one domain to another, such as maps to aerial photos.
Deep Reinforcement Learning and Its ApplicationsBill Liu
What is the most exciting AI news in recent years? AlphaGo!
What are key techniques for AlphaGo? Deep learning and reinforcement learning (RL)!
What are application areas for deep RL? A lot! In fact, besides games, deep RL has been making tremendous achievements in diverse areas like recommender systems and robotics.
In this talk, we will introduce deep reinforcement learning, present several applications, and discuss issues and potential solutions for successfully applying deep RL in real life scenarios.
https://www.aicamp.ai/event/eventdetails/W2021042818
알파고의 작동 원리를 설명한 슬라이드입니다.
English version: http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
- 비전공자 분들을 위한 티저: 바둑 인공지능은 과연 어떻게 만들까요? 딥러닝 딥러닝 하는데 그게 뭘까요? 바둑 인공지능은 또 어디에 쓰일 수 있을까요?
- 전공자 분들을 위한 티저: 알파고의 main components는 재밌게도 CNN (Convolutional Neural Network), 그리고 30년 전부터 유행하던 Reinforcement learning framework와 MCTS (Monte Carlo Tree Search) 정도입니다. 새로울 게 없는 재료들이지만 적절히 활용하는 방법이 신선하네요.
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...Joonhyung Lee
An introduction to DeepMind's newest board-game playing AI, AlphaZero.
I have improved significantly on my previous presentation in https://www.slideshare.net/ssuserc416e2/alphago-zero-mastering-the-game-of-go-without-human-knowledge, which had several errors (some rather glaring, such as the temperature equation for simulated annealing). Also, DeepMind released far more details in their new Science paper for AlphaZero.
One comment I would like to add is that the AlphaGo Zero used for comparison in this paper is a very weak version, not the final version. Thus, AlphaGo Zero is still SOTA for Go.
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex FridmanPeerasak C.
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
Watch video: https://youtu.be/zR11FLZ-O9M
First lecture of MIT course 6.S091: Deep Reinforcement Learning, introducing the fascinating field of Deep RL. For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow TensorFlow code tutorials on our GitHub repo.
INFO:
Website: https://deeplearning.mit.edu
CONNECT:
- If you enjoyed this video, please subscribe to this channel.
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
Principles of Artificial Intelligence & Machine LearningJerry Lu
Artificial intelligence has captivated me since I worked on projects at Google that ranged from detecting fraud on Google Cloud to predicting subscriber retention on YouTube Red. Looking to broaden my professional experience, I then entered the world of venture capital by joining Baidu Ventures as its first summer investment associate where I got to work with amazingly talented founders building AI-focused startups.
Now at the Wharton School at the University of Pennsylvania, I am looking for opportunities to meet people with interesting AI-related ideas and learn about the newest innovations within the AI ecosystem. Within the first two months of business school, I connected with Nicholas Lind, a second-year Wharton MBA student who interned at IBM Watson as a data scientist. Immediately recognizing our common passion for AI, we produced a lunch-and-learn about AI and machine learning (ML) for our fellow classmates.
Using the following deck, we sought to:
- define artificial intelligence and describe its applications in business
- decode buzzwords such as “deep learning” and “cognitive computing”
- highlight analytical techniques and best practices used in AI / ML
- ultimately, educate future AI leaders
The lunch-and-learn was well received. When it became apparent that it was the topic at hand and not so much the free pizzas that attracted the overflowing audience, I was amazed at the level of interest. It was reassuring to hear that classmates were interested in learning more about the technology and its practical applications in solving everyday business challenges. Nick and I are now laying a foundation to make these workshops an ongoing effort so that more people across the various schools of engineering, design, and Penn at large can benefit.
With its focus on quantitative rigor, Wharton already feels like a perfect fit for me. In the next two years, I look forward to engaging with like-minded people, both in and out of the classroom, sharing my knowledge about AI with my peers, and learning from them in turn. By working together to expand Penn’s reach and reputation with respect to this new frontier, I’m confident that we can all grow into next-generation leaders who help drive companies forward in an era of artificial intelligence.
I’d love to hear what you think. If you found this post or the deck useful, please recommend them to your friends and colleagues!
Semantic segmentation with Convolutional Neural Network ApproachesUMBC
In this project, we propose methods for semantic segmentation with the deep learning state-of-the-art models. Moreover,
we want to filterize the segmentation to the specific object in specific application. Instead of concentrating on unnecessary objects we
can focus on special ones and make it more specialize and effecient for special purposes. Furtheromore, In this project, we leverage
models that are suitable for face segmentation. The models that are used in this project are Mask-RCNN and DeepLabv3. The
experimental results clearly indicate that how illustrated approach are efficient and robust in the segmentation task to the previous work
in the field of segmentation. These models are reached to 74.4 and 86.6 precision of Mean of Intersection over Union. The visual
Results of the models are shown in Appendix part.
An introduction to reinforcement learningJie-Han Chen
This document provides an introduction and overview of reinforcement learning. It begins with a syllabus that outlines key topics such as Markov decision processes, dynamic programming, Monte Carlo methods, temporal difference learning, deep reinforcement learning, and active research areas. It then defines the key elements of reinforcement learning including policies, reward signals, value functions, and models of the environment. The document discusses the history and applications of reinforcement learning, highlighting seminal works in backgammon, helicopter control, Atari games, Go, and dialogue generation. It concludes by noting challenges in the field and prominent researchers contributing to its advancement.
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree SearchKarel Ha
the presentation of the article "Mastering the game of Go with deep neural networks and tree search" given at the Optimization Seminar 2015/2016
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/p4rnlhoewbedkjg/AlphaGo.pdf?dl=0
- The corresponding leaflet is available at http://www.slideshare.net/KarelHa1/leaflet-for-the-talk-on-alphago
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
Reinforcement learning is a machine learning technique that involves trial-and-error learning. The agent learns to map situations to actions by trial interactions with an environment in order to maximize a reward signal. Deep Q-networks use reinforcement learning and deep learning to allow agents to learn complex behaviors directly from high-dimensional sensory inputs like pixels. DQN uses experience replay and target networks to stabilize learning from experiences. DQN has achieved human-level performance on many Atari 2600 games.
This presentation contains an introduction to reinforcement learning, comparison with others learning ways, introduction to Q-Learning and some applications of reinforcement learning in video games.
Reinforcement Learning : A Beginners TutorialOmar Enayet
This document provides an overview of reinforcement learning concepts including:
1) It defines the key components of a Markov Decision Process (MDP) including states, actions, transitions, rewards, and discount rate.
2) It describes value functions which estimate the expected return for following a particular policy from each state or state-action pair.
3) It discusses several elementary solution methods for reinforcement learning problems including dynamic programming, Monte Carlo methods, temporal-difference learning, and actor-critic methods.
This document provides an overview of generative adversarial networks (GANs). It explains that GANs were introduced in 2014 and involve two neural networks, a generator and discriminator, that compete against each other. The generator produces synthetic data to fool the discriminator, while the discriminator learns to distinguish real from synthetic data. As they train, the generator improves at producing more realistic outputs that match the real data distribution. Examples of GAN applications discussed include image generation, text-to-image synthesis, and face aging.
Reinforcement Learning (RL) approaches to deal with finding an optimal reward based policy to act in an environment (Charla en Inglés)
However, what has led to their widespread use is its combination with deep neural networks (DNN) i.e., deep reinforcement learning (Deep RL). Recent successes on not only learning to play games but also superseding humans in it and academia-industry research collaborations like for manipulation of objects, locomotion skills, smart grids, etc. have surely demonstrated their case on a wide variety of challenging tasks.
With application spanning across games, robotics, dialogue, healthcare, marketing, energy and many more domains, Deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
This document provides an agenda for a presentation on deep learning, neural networks, convolutional neural networks, and interesting applications. The presentation will include introductions to deep learning and how it differs from traditional machine learning by learning feature representations from data. It will cover the history of neural networks and breakthroughs that enabled training of deeper models. Convolutional neural network architectures will be overviewed, including convolutional, pooling, and dense layers. Applications like recommendation systems, natural language processing, and computer vision will also be discussed. There will be a question and answer section.
AI, Machine Learning and Deep Learning - The OverviewSpotle.ai
The deck takes you into a fascinating journey of Artificial Intelligence, Machine Learning and Deep Learning, dissect how they are connected and in what way they differ. Supported by illustrative case studies, the deck is your ready reckoner on the fundamental concepts of AI, ML and DL.
Explore more videos, masterclasses with global experts, projects and quizzes on https://spotle.ai/learn
This document summarizes generative adversarial networks (GANs) and their applications. It begins by introducing GANs and how they work by having a generator and discriminator play an adversarial game. It then discusses several variants of GANs including DCGAN, LSGAN, conditional GAN, and others. It provides examples of applications such as image-to-image translation, text-to-image synthesis, image generation, and more. It concludes by discussing major GAN variants and potential future applications like helping children learn to draw.
Usage of AI and machine learning models is likely to become more commonplace as larger swaths of the economy embrace automation and data-driven decision-making. While these predictive systems can be quite accurate, they have been treated as inscrutable black boxes in the past, that produce only numeric predictions with no accompanying explanations. Unfortunately, recent studies and recent events have drawn attention to mathematical and sociological flaws in prominent weak AI and ML systems, but practitioners usually don’t have the right tools to pry open machine learning black-boxes and debug them.
This presentation introduces several new approaches to that increase transparency, accountability, and trustworthiness in machine learning models. If you are a data scientist or analyst and you want to explain a machine learning model to your customers or managers (or if you have concerns about documentation, validation, or regulatory requirements), then this presentation is for you!
Deep Learning - The Past, Present and Future of Artificial IntelligenceLukas Masuch
The document provides an overview of deep learning, including its history, key concepts, applications, and recent advances. It discusses the evolution of deep learning techniques like convolutional neural networks, recurrent neural networks, generative adversarial networks, and their applications in computer vision, natural language processing, and games. Examples include deep learning for image recognition, generation, segmentation, captioning, and more.
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...Ruairi de Frein
An article from the Telecommunications Software & Systems Group, Waterford Institute of Technology, Ireland describing algorithms for distributed Formal Concept Analysis
ABSTRACT
While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration we introduce a distributed approach for performing formal concept mining. Our method has its novelty in that we use a light-weight MapReduce runtime called Twister which is better suited to iterative algorithms than recent distributed approaches. First, we describe the theoretical foundations underpinning our distributed formal concept analysis approach. Second, we provide a representative exemplar of how a classic centralized algorithm can be implemented in a distributed fashion using our methodology: we modify Ganter's classic algorithm by introducing a family of MR* algorithms, namely MRGanter and MRGanter+ where the prefix denotes the algorithm's lineage. To evaluate the factors that impact distributed algorithm performance, we compare our MR* algorithms with the state-of-the-art. Experiments conducted on real datasets demonstrate that MRGanter+ is efficient, scalable and an appealing algorithm for distributed problems.
Accepted for publication at the International Conference for Formal Concept Analysis 2012.
Project participants: Biao Xu, Ruairí de Fréin, Eric Robson, Mícheál Ó Foghlú
Ruairí de Fréin: rdefrein (at) gmail (dot) com
bibtex:
@incollection{
year={2012},
isbn={978-3-642-29891-2},
booktitle={Formal Concept Analysis},
volume={7278},
series={Lecture Notes in Computer Science},
editor={Domenach, Florent and Ignatov, DmitryI. and Poelmans, Jonas},
doi={10.1007/978-3-642-29892-9_26},
title={Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework},
url={http://dx.doi.org/10.1007/978-3-642-29892-9_26},
publisher={Springer Berlin Heidelberg},
keywords={Formal Concept Analysis; Distributed Mining; MapReduce},
author={Xu, Biao and Fréin, Ruairí and Robson, Eric and Ó Foghlú, Mícheál},
pages={292-308}
}
DOWNLOAD
The article Arxiv: http://arxiv.org/abs/1210.2401
The document discusses neural networks, generative adversarial networks, and image-to-image translation. It begins by explaining how neural networks learn through forward propagation, calculating loss, and using the loss to update weights via backpropagation. Generative adversarial networks are introduced as a game between a generator and discriminator, where the generator tries to fool the discriminator and vice versa. Image-to-image translation uses conditional GANs to translate images from one domain to another, such as maps to aerial photos.
First presented at the MSUG Conference on June 4, 2015, this presentation discusses concepts and tools to add to your logistic regression modeling practice and also how to use these concepts and tools.
Akka Streams provides a powerful framework for building scalable and resilient reactive applications at Zalando. Some initial challenges included understanding the graph DSL and handling errors and failures. Lessons were learned around Akka HTTP clients timing out requests and internal buffer sizes. Monitoring streams required building custom stages. Overall, Akka Streams enables easy tuning, scaling, reliability and testability once the reactive paradigm is understood. While it takes time to master, Akka Streams is a potent tool that can manage everything from simple to complex event processing at large scale.
This document provides information about an artificial intelligence course at Sanjivani College of Engineering, including the course objectives, an overview of planning in AI, types of planning strategies like forward and backward state space planning, and an example of the block world planning problem. It also discusses logical representations of planning problems using first-order logic, the STRIPS planning framework, and an algorithm for goal stack planning.
Ropossum is a framework that lets you play the beloved Cut The Rope game as much as you want and the levels will keep coming. You can design your own levels, check your designed levels for playability at real time, ask it to complete your unfinished designs according to your own preferences, or even suggest endless playable design variations according to your initial level design.
Large scale landuse classification of satellite imagerySuneel Marthi
This document summarizes a presentation on classifying land use from satellite imagery. It describes using a neural network to filter out cloudy images, segmenting images with a U-Net model to identify tulip fields, and implementing the workflow with Apache Beam for inference on new images. Examples are shown of detecting large and small tulip fields. Future work proposed includes classifying rock formations using infrared bands and measuring crop health.
The document discusses deep learning and artificial neural networks. It provides an agenda for topics covered, including gradient descent, backpropagation, activation functions, and examples of neural network architectures like convolutional neural networks. It explains concepts like how neural networks learn patterns from data using techniques like stochastic gradient descent to minimize loss functions. Deep learning requires large amounts of processing power and labeled training data. Common deep learning networks are used for tasks like image recognition, object detection, and time series analysis.
Here are the key points about discounting:
- Discounting refers to exponentially decreasing the value of rewards received in the future, compared to rewards in the present. This is done with a discount factor γ between 0 and 1.
- Discounting encourages agents to prefer sooner rewards over later rewards. It helps algorithms converge by ensuring the value of future rewards is finite.
- With discounting, the utility of a sequence of rewards is calculated as:
U = R1 + γR2 + γ^2R3 + ...
- Discounting leads to stationary preferences over reward sequences, meaning the preferences don't change over time. The two ways to define utilities under stationary preferences are additive utility and discounted utility.
Practical AI for Business: Bandit AlgorithmsSC5.io
This document provides an overview of bandit algorithms and their applications. It begins by explaining the multi-armed bandit problem and some basic algorithms like epsilon-greedy to solve it. It then discusses more advanced techniques like Thompson sampling and their benefits over naive approaches. Finally, it outlines several real-world uses of bandit algorithms, including UI optimization, recommendation systems, and multiplayer games. Bandit algorithms provide a powerful way to optimize outcomes in situations where rewards are not immediately revealed.
This document discusses multi-agent reinforcement learning (MARL) concepts and potential military applications. It reviews current MARL research directions including cooperation, constraints, and learning methods. It also briefly explains reinforcement learning concepts like policy, value, and return. Several proposed MARL models are introduced, including QMIX, COMA, and RODE. QMIX is value-based and combines agent Q-values nonlinearly. COMA is policy-based using actor-critic with difference rewards. RODE extends QMIX by learning agent roles to partition action spaces. The document analyzes roles for a 2s3z task environment.
Story points considered harmful - or why the future of estimation is really i...Vasco Duarte
Story points are commonly used for estimating software projects, but this document argues they are flawed and an alternative approach using number of items completed provides a better measure of progress. The document analyzes several claims made to justify using story points and finds them lacking. Data from nine real projects shows a very high correlation between story points and number of items, indicating they measure the same underlying information. Additionally, the number of items metric more accurately predicted the actual project outcomes over multiple sprints in one example project. Overall, the document argues the number of items approach provides equivalent information to story points but with less overhead and greater accuracy.
※다운로드하시면 더 선명한 자료를 보실 수 있습니다.
게임을 더 재미있게 만들기 위해서는 레벨 디자인을 잘 해야 하는데, 이를 도와주는 Puzzle AI를 만들어 가는 과정을 설명합니다. Puzzle AI를 이용해 신규 레벨의 난이도를 예측하여 유저가 만족할 수 있는 레벨로 디자인하는 방법을 알아봅니다.
반복적인 게임 레벨 디자인을 도와주고, 레벨에 대한 객관적인 평가를 하고, 레벨 디자이너와 개발자에게 도움을 줘서 결국에는 고객을 만족시켜 매출에 도움을 줄 수 있는 방법을 알 수 있습니다.
목차
1. 재미있는 게임이란?
2. 적절한 난이도의 게임을 만들기 위한 방법
3. 에이전트를 이용한 게임의 난이도 평가
4. 게임의 난이도 예측
5. 실제 적용
대상
머신러닝, 강화 학습, 게임 AI, 게임 개발, 게임 QA, 레벨 디자인에 관심 있는 분 누구나
■관련 동영상: https://youtu.be/OUg0xcgkhls
The document discusses setting up 3D scenes in OpenGL using matrices. It states that to see a 3D scene, you need to set up the camera, projection, and world matrix. The camera and projection matrices are singletons that apply to all objects, while the world matrix is set separately for each object. It explains the camera matrix sets the camera position and orientation, while the projection matrix handles perspective vs orthographic projections. The world matrix transforms individual objects by scaling, rotating, and translating them. It provides an example of drawing objects by setting their world matrix before rendering.
Demo Videos: www.larry-lai.com/tracking.html
A real-time object tracking algorithm is proposed to cope with the variables of appearance changes like translation, zooming, rotation, panning/tilting, occlusion, luminance change, and blur. The proposed tracking scheme includes three steps. First, regional filter is employed to detect the candidate regions of targets. Next, these candidate regions are scaled to an uniform size for feature extraction. Finally, using feature matching to calculate the similarity between an instance and the target, and then store this instance if recognized as the target. We can see that the instance database would contain object's difference appearances as the tracking time going on. In other words, recognition capability will increase while the database become enlarging. To keep high computation performance, an algorithm with database reduction is proposed to limit the size of database. From our experiments, the proposed tracking system can achieve 30 FPS with resolution 1280x720 on an Intel I5 CPU 2.6GHz.
The presentation discusses whether reinforcement learning (RL) is reaching a tipping point for production use. While RL has achieved superhuman performance in research domains like games, its use in production is still rare due to challenges like high data requirements, online training limitations, and large state/action spaces. However, the talk notes recent progress in areas like distributed training, offline RL, and embeddings that reduce complexity. It identifies three patterns seen in successful production RL: parallelizable simulated tasks, low temporal problems like recommendations, and next-generation optimization. The presentation provides tips for simpler RL approaches and validation challenges when deploying RL models.
John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017 MLconf
John Maxwell, a data scientist at Nordstrom, did his graduate work in international development economics, focusing on field experiments. He has since led research projects in Indonesia and Ethiopia related to microenterprise, developed large mathematical simulation models used for investment decisions by WSDOT, built dynamic pricing algorithms at Thriftbooks.com, and led the development of Nordstrom’s open source a/b testing service: Elwin. He currently focuses on contextual multi-armed bandit problems and machine learning infrastructure at Nordstrom.
Abstract summary
Solving the Contextual Multi-Armed Bandit Problem at Nordstrom:
The contextual multi-armed bandit problem, also known as associative reinforcement learning or bandits with side information, is a useful formulation of the multi-armed bandit problem that takes into account information about arms and users when deciding which arm to pull. The barrier to entry for both understanding and implementing contextual multi-armed bandits in production is high. The literature in this field pulls from disparate sources including (but not limited to) classical statistics, reinforcement learning, and information theory. Because of this, finding material that fills the gap between very basic explanations and academic journal articles is challenging. The goal of this talk is to provide those lacking intermediate materials as well as an example implementation. Specifically, I will explain key findings from some of the more cited papers in the contextual bandit literature, discuss the minimum requirements for implementation, and give an overview of a production system for solving contextual multi-armed bandit problems.
[한국어] Safe Multi-Agent Reinforcement Learning for Autonomous DrivingKiho Suh
모두의연구소에서 “Safe, Multi-agent Reinforcement Learning for Autonomous Driving”이라는 논문을 발표한 자료를 공유합니다. 3월에 Intel에 $15.3 billion(우리나라 돈으로 약 17조원)에 인수된 Mobileye라는 이스라엘 회사의 CEO, VP of Engineering이 쓴 논문입니다. 그리고 얼마전에 Reddit에서 핫이슈가 된 논문인 “Failures of Deep Learning”을 발표한 분들이기도 하고요.
이 논문에서 풀어보자 하는것은 “사람처럼 운전하기” 입니다. 운전은 “multi-agent” game이고 도로에서 다른 agent들은 사람이기 때문에 자율주행자동차가 사람처럼 운전을 해야만 합니다. 운전은 multi-agnet이고 엄청난 양의 가능한 시니라오가 있어서 Markov Assumption이 성립이 안되어서 Imitation Learning, Policy Gradient without Markovity, 그리고 심지어 좀 고전적인(?) 방법인 dynamic programming까지 같이 써서 이 논문의 문제를 풀어봅니다.
Caching for Performance Masterclass: Caching StrategiesScyllaDB
Exploring the tradeoffs of common caching strategies – and a look at the architectural differences.
- Which strategies exist
- When to apply different strategies
- ScyllaDB cache design
Caching for Performance Masterclass: The In-Memory DatastoreScyllaDB
Understanding where in-memory data stores help most and where teams get into trouble.
- Where in the stack to cache
- Memcached as a tool
- Modern cache primitives
Dev Dives: Unlock the future of automation with UiPath Agent BuilderUiPathCommunity
This webinar will offer you a first look at the powerful capabilities of UiPath Agent Builder, designed to streamline your automation processes and enhance your workflow efficiency.
📕 During the session, you will:
- Discover how to build agents with low-code experience, making it accessible for both developers and business users.
- Learn how to leverage automations and activities as tools within your agents, enabling them to handle complex and dynamic workflows.
- Gain insights into the AI Trust Layer, which provides robust management and monitoring capabilities, ensuring trust and transparency in your automation processes.
- See how agents can be deployed and integrated with your existing UiPath cloud and Studio environments.
👨🏫 Speaker:
Zach Eslami, Sr. Manager, Product Management Director, UiPath
⏩ Register for our upcoming Dev Dives March session:
Unleash the power of macOS Automation with UiPath
👉 AMER: https://bit.ly/Dev_Dives_AMER_March
👉 EMEA & APJ:https://bit.ly/Dev_Dives_EMEA_APJ_March
This session was streamed live on February 27, 2025, 15:00 GMT.
Check out future Dev Dives 2025 sessions at:
🚩 https://bit.ly/Dev_Dives_2025
GDG Cloud Southlake #40: Brandon Stokes: How to Build a Great ProductJames Anderson
How to Build a Great Product
Being a tech entrepreneur is about providing a remarkable product or service that serves the needs of its customers better, faster, and cheaper than anything else. The goal is to "make something people want" which we call, product market fit.
But how do we get there? We'll explore the process of taking an idea to product market fit (PMF), how you know you have true PMF, and how your product strategies differ pre-PMF from post-PMF.
Brandon is a 3x founder, 1x exit, ex-banker & corporate strategist, car dealership owner, and alumnus of Techstars & Y Combinator. He enjoys building products and services that impact people for the better.
Brandon has had 3 different careers (banking, corporate finance & strategy, technology) in 7 different industries; Investment Banking, CPG, Media & Entertainment, Telecommunications, Consumer application, Automotive, & Fintech/Insuretech.
He's an idea to revenue leader and entrepreneur that helps organizations build products and processes, hire talent, test & iterate quickly, collect feedback, and grow in unregulated and heavily regulated industries.
DevOps iş təhlükəsizliyi sizi maraqlandırır? İstər developer, istər təhlükəsizlik mühəndisi, istərsə də DevOps həvəskarı olun, bu tədbir şəbəkələşmək, biliklərinizi bölüşmək və DevSecOps sahəsində ən son təcrübələri öyrənmək üçün mükəmməl fürsətdir!
Bu workshopda DevOps infrastrukturlarının təhlükəsizliyini necə artırmaq barədə danışacayıq. DevOps sistemləri qurularkən avtomatlaşdırılmış, yüksək əlçatan və etibarlı olması ilə yanaşı, həm də təhlükəsizlik məsələləri nəzərə alınmalıdır. Bu səbəbdən, DevOps komandolarının təhlükəsizliyə yönəlmiş praktikalara riayət etməsi vacibdir.
Computational Photography: How Technology is Changing Way We Capture the WorldHusseinMalikMammadli
📸 Computational Photography (Computer Vision/Image): How Technology is Changing the Way We Capture the World
Heç düşünmüsünüzmü, müasir smartfonlar və kameralar necə bu qədər gözəl görüntülər yaradır? Bunun sirri Computational Fotoqrafiyasında(Computer Vision/Imaging) gizlidir—şəkilləri çəkmə və emal etmə üsulumuzu təkmilləşdirən, kompüter elmi ilə fotoqrafiyanın inqilabi birləşməsi.
UiPath Automation Developer Associate Training Series 2025 - Session 2DianaGray10
In session 2, we will introduce you to Data manipulation in UiPath Studio.
Topics covered:
Data Manipulation
What is Data Manipulation
Strings
Lists
Dictionaries
RegEx Builder
Date and Time
Required Self-Paced Learning for this session:
Data Manipulation with Strings in UiPath Studio (v2022.10) 2 modules - 1h 30m - https://academy.uipath.com/courses/data-manipulation-with-strings-in-studio
Data Manipulation with Lists and Dictionaries in UiPath Studio (v2022.10) 2 modules - 1h - https:/academy.uipath.com/courses/data-manipulation-with-lists-and-dictionaries-in-studio
Data Manipulation with Data Tables in UiPath Studio (v2022.10) 2 modules - 1h 30m - https:/academy.uipath.com/courses/data-manipulation-with-data-tables-in-studio
⁉️ For any questions you may have, please use the dedicated Forum thread. You can tag the hosts and mentors directly and they will reply as soon as possible.
Understanding Traditional AI with Custom Vision & MuleSoft.pptxshyamraj55
Understanding Traditional AI with Custom Vision & MuleSoft.pptx | ### Slide Deck Description:
This presentation features Atul, a Senior Solution Architect at NTT DATA, sharing his journey into traditional AI using Azure's Custom Vision tool. He discusses how AI mimics human thinking and reasoning, differentiates between predictive and generative AI, and demonstrates a real-world use case. The session covers the step-by-step process of creating and training an AI model for image classification and object detection—specifically, an ad display that adapts based on the viewer's gender. Atulavan highlights the ease of implementation without deep software or programming expertise. The presentation concludes with a Q&A session addressing technical and privacy concerns.
EaseUS Partition Master Crack 2025 + Serial Keykherorpacca127
https://ncracked.com/7961-2/
Note: >> Please copy the link and paste it into Google New Tab now Download link
EASEUS Partition Master Crack is a professional hard disk partition management tool and system partition optimization software. It is an all-in-one PC and server disk management toolkit for IT professionals, system administrators, technicians, and consultants to provide technical services to customers with unlimited use.
EASEUS Partition Master 18.0 Technician Edition Crack interface is clean and tidy, so all options are at your fingertips. Whether you want to resize, move, copy, merge, browse, check, convert partitions, or change their labels, you can do everything with a few clicks. The defragmentation tool is also designed to merge fragmented files and folders and store them in contiguous locations on the hard drive.
UiPath Automation Developer Associate Training Series 2025 - Session 1DianaGray10
Welcome to UiPath Automation Developer Associate Training Series 2025 - Session 1.
In this session, we will cover the following topics:
Introduction to RPA & UiPath Studio
Overview of RPA and its applications
Introduction to UiPath Studio
Variables & Data Types
Control Flows
You are requested to finish the following self-paced training for this session:
Variables, Constants and Arguments in Studio 2 modules - 1h 30m - https://academy.uipath.com/courses/variables-constants-and-arguments-in-studio
Control Flow in Studio 2 modules - 2h 15m - https:/academy.uipath.com/courses/control-flow-in-studio
⁉️ For any questions you may have, please use the dedicated Forum thread. You can tag the hosts and mentors directly and they will reply as soon as possible.
Bedrock Data Automation (Preview): Simplifying Unstructured Data ProcessingZilliz
Bedrock Data Automation (BDA) is a cloud-based service that simplifies the process of extracting valuable insights from unstructured content—such as documents, images, video, and audio. Come learn how BDA leverages generative AI to automate the transformation of multi-modal data into structured formats, enabling developers to build applications and automate complex workflows with greater speed and accuracy.
UiPath Automation Developer Associate Training Series 2025 - Session 1DianaGray10
Welcome to UiPath Automation Developer Associate Training Series 2025 - Session 1.
In this session, we will cover the following topics:
Introduction to RPA & UiPath Studio
Overview of RPA and its applications
Introduction to UiPath Studio
Variables & Data Types
Control Flows
You are requested to finish the following self-paced training for this session:
Variables, Constants and Arguments in Studio 2 modules - 1h 30m - https://academy.uipath.com/courses/variables-constants-and-arguments-in-studio
Control Flow in Studio 2 modules - 2h 15m - https:/academy.uipath.com/courses/control-flow-in-studio
⁉️ For any questions you may have, please use the dedicated Forum thread. You can tag the hosts and mentors directly and they will reply as soon as possible.
UiPath Document Understanding - Generative AI and Active learning capabilitiesDianaGray10
This session focus on Generative AI features and Active learning modern experience with Document understanding.
Topics Covered:
Overview of Document Understanding
How Generative Annotation works?
What is Generative Classification?
How to use Generative Extraction activities?
What is Generative Validation?
How Active learning modern experience accelerate model training?
Q/A
❓ If you have any questions or feedback, please refer to the "Women in Automation 2025" dedicated Forum thread. You can find there extra details and updates.
Not a Kubernetes fan? The state of PaaS in 2025Anthony Dahanne
Kubernetes won the containers orchestration war. But has it made deploying your apps easier?
Let's explore some of Kubernetes extensive app developer tooling, but mainly what the PaaS space looks like in 2025; 18 years after Heroku made it popular.
Is Heroku still around? What about Cloud Foundry?
And what are those new comers (fly.io, railway, porter.sh, etc.) worth?
Did the Cloud giants replace them all?
This is session #3 of the 5-session online study series with Google Cloud, where we take you onto the journey learning generative AI. You’ll explore the dynamic landscape of Generative AI, gaining both theoretical insights and practical know-how of Google Cloud GenAI tools such as Gemini, Vertex AI, AI agents and Imagen 3.
UiPath Agentic Automation Capabilities and OpportunitiesDianaGray10
Learn what UiPath Agentic Automation capabilities are and how you can empower your agents with dynamic decision making. In this session we will cover these topics:
What do we mean by Agents
Components of Agents
Agentic Automation capabilities
What Agentic automation delivers and AI Tools
Identifying Agent opportunities
❓ If you have any questions or feedback, please refer to the "Women in Automation 2025" dedicated Forum thread. You can find there extra details and updates.
UiPath Agentic Automation Capabilities and OpportunitiesDianaGray10
How AlphaGo Works
1. Presenter: Shane
(Seungwhan)
Moon
PhD
student
Language
Technologies
Institute,
School
of
Computer
Science
Carnegie
Mellon
University
3/2/2016
How
it
works
2. AlphaGo vs
European
Champion
(Fan
Hui 2-‐Dan)
October
5
– 9,
2015
<Official
match>
-‐ Time
limit:
1
hour
-‐ AlphaGo Wins (5:0)
*
rank
3. AlphaGo vs
World
Champion
(Lee
Sedol 9-‐Dan)
March
9
– 15,
2016
<Official
match>
-‐ Time
limit:
2
hours
Venue:
Seoul,
Four
Seasons
Hotel
Image
Source: Josun
Times Jan
28th
2015
6. Computer
Go
AI – Definition
s (state)
d
=
1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
=
(e.g.
we
can
represent
the
board
into
a
matrix-‐like
form)
*
The
actual
model
uses
other
features
than
board
positions
as
well
7. Computer
Go
AI
– Definition
s (state)
d
=
1 d
=
2
a (action)
Given
s,
pick
the
best
a
Computer
Go
Artificial
Intelligence
s a s'
8. Computer
Go
AI – An
Implementation
Idea?
d
=
1 d
=
2
…
How
about
simulating
all
possible
board
positions?
9. Computer
Go
AI
– An
Implementation
Idea?
d
=
1 d
=
2
…
d
=
3
…
…
…
…
10. Computer
Go
AI
– An
Implementation
Idea?
d
=
1 d
=
2
…
d
=
3
…
…
…
…
… d
=
maxD
Process
the
simulation
until
the
game
ends,
then
report
win
/
lose
results
11. Computer
Go
AI
– An
Implementation
Idea?
d
=
1 d
=
2
…
d
=
3
…
…
…
…
… d
=
maxD
Process
the
simulation
until
the
game
ends,
then
report
win
/
lose
results
e.g. it
wins
13
times
if
the
next
stone
gets
placed
here
37,839
times
431,320
times
Choose
the
“next
action
/
stone”
that
has
the
most
win-‐counts
in
the
full-‐scale
simulation
12. This
is
NOT
possible;
it
is
said
the
possible
configurations
of
the
board
exceeds
the
number
of
atoms
in
the
universe
14. Reducing
Search
Space
1.
Reducing
“action
candidates”
(Breadth
Reduction)
d
=
1 d
=
2
…
d
=
3
…
…
…
… d
=
maxD
Win?
Loss?
IF
there
is
a
model
that
can
tell
you
that
these
moves
are
not
common
/
probable
(e.g.
by
experts,
etc.)
…
15. Reducing
Search
Space
1.
Reducing
“action
candidates”
(Breadth
Reduction)
d
=
1 d
=
2
…
d
=
3
…
… d
=
maxD
Win?
Loss?
Remove
these
from
search
candidates
in
advance (breadth
reduction)
16. Reducing
Search
Space
2.
Position
evaluation
ahead
of
time
(Depth
Reduction)
d
=
1 d
=
2
…
d
=
3
…
… d
=
maxD
Win?
Loss?
Instead
of
simulating
until
the
maximum
depth ..
17. Reducing
Search
Space
2.
Position
evaluation
ahead
of
time
(Depth
Reduction)
d
=
1 d
=
2
…
d
=
3
…
V
=
1
V
=
2
V
=
10
IF
there
is
a
function
that
can
measure:
V(s):
“board
evaluation
of
state
s”
18. Reducing
Search
Space
1. Reducing
“action
candidates”
(Breadth
Reduction)
2. Position
evaluation
ahead
of
time
(Depth
Reduction)
19. 1.
Reducing
“action
candidates”
Learning:
P
(
next
action
|
current
state
)
=
P
(
a
|
s
)
20. 1.
Reducing
“action
candidates”
(1) Imitating
expert
moves
(supervised
learning)
Current
State
Prediction
Model
Next
State
s1 s2
s2 s3
s3 s4
Data:
Online
Go experts (5~9
dan)
160K games, 30M
board
positions
21. 1.
Reducing
“action
candidates”
(1) Imitating
expert
moves
(supervised
learning)
Prediction
Model
Current
Board Next
Board
22. 1.
Reducing
“action
candidates”
(1) Imitating
expert
moves
(supervised
learning)
Prediction
Model
Current
Board Next
Action
There
are
19
X
19
=
361
possible
actions
(with
different
probabilities)
27. Convolutional
Neural
Network
(CNN)
CNN
is
a
powerful
model
for
image
recognition
tasks;
it
abstracts
out
the
input
image
through
convolution
layers
Image
source
28. Convolutional
Neural
Network
(CNN)
And
they
use
this
CNN
model
(similar
architecture)
to
evaluate
the
board
position;
which learns
“some”
spatial
invariance
30. 1.
Reducing
“action
candidates”
(1) Imitating
expert
moves
(supervised
learning)
Expert
Moves
Imitator
Model
(w/
CNN)
Current
Board Next
Action
Training:
31. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Expert
Moves
Imitator
Model
(w/
CNN)
Expert
Moves
Imitator
Model
(w/
CNN)
VS
Improving
by
playing
against
itself
32. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Expert
Moves
Imitator
Model
(w/
CNN)
Expert
Moves
Imitator
Model
(w/
CNN)
VS
Return:
board
positions, win/lose info
33. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Expert
Moves
Imitator
Model
(w/
CNN)
Board
position win/loss
Training:
Loss
z
=
-‐1
34. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Expert
Moves
Imitator
Model
(w/
CNN)
Training:
z
=
+1
Board
position win/loss
Win
35. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Updated
Model
ver 1.1
Updated
Model
ver 1.3VS
Return:
board
positions, win/lose info
It
uses
the
same
topology
as
the
expert
moves
imitator
model,
and
just
uses
the
updated parameters
Older
models
vs.
newer
models
36. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Updated
Model
ver 1.3
Updated
Model
ver 1.7VS
Return:
board
positions, win/lose info
37. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Updated
Model
ver 1.5
Updated
Model
ver 2.0VS
Return:
board
positions, win/lose info
38. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Updated
Model
ver 3204.1
Updated
Model
ver 46235.2VS
Return:
board
positions, win/lose info
39. 1.
Reducing
“action
candidates”
(2) Improving
through
self-‐plays
(reinforcement
learning)
Updated
Model
ver 1,000,000VS
The
final
model
wins 80%
of
the time
when
playing
against
the
first
model
Expert
Moves
Imitator
Model
41. 2.
Board
Evaluation
Updated
Model
ver 1,000,000
Board
Position
Training:
Win
/
Loss
Win
(0~1)
Value
Prediction
Model
(Regression)
Adds
a regression
layer
to
the
model
Predicts
values
between
0~1
Close
to
1:
a
good
board
position
Close
to
0:
a
bad
board
position
42. Reducing
Search
Space
1. Reducing
“action
candidates”
(Breadth
Reduction)
2. Board
Evaluation (Depth
Reduction)
Policy
Network
Value
Network
43. Looking
ahead
(w/
Monte
Carlo
Search
Tree)
Action
Candidates
Reduction
(Policy
Network)
Board
Evaluation
(Value
Network)
(Rollout):
Faster
version
of
estimating
p(a|s)
à uses shallow
networks
(3
ms à 2µs)
47. Lee
Sedol 9-‐dan vs
AlphaGo
Energy
Consumption
Lee
Sedol AlphaGo
-‐ Recommended calories
for
a man per
day
: ~2,500 kCal
-‐ Assumption: Lee consumes
the
entire
amount
of
per-‐day calories
in
this
one
game
2,500
kCal *
4,184
J/kCal
~=
10M
[J]
-‐ Assumption: CPU:
~100
W,
GPU:
~300 W
-‐ 1,202 CPUs, 176 GPUs
170,000
J/sec
*
5
hr *
3,600
sec/hr
~=
3,000M
[J]
A
very,
very
rough
calculation
;)
48. AlphaGo is
estimated
to
be
around
~5-‐dan
=
multiple
machines European
champion
49. Taking
CPU
/
GPU resources
to
virtually
infinity?
But
Google
has
promised
not
to
use
more
CPU/GPUs
than
they
used
for
Fan
Hui
for
the
game
with
Lee
No
one
knows
how it
will
converge
50. AlphaGo learns
millions
of
Go
games
every
day
AlphaGo will
presumably
converge
to
some
point
eventually.
However,
in
the
Nature
paper
they
don’t
report
how
AlphaGo’s performance
improves
as
a
function
of
times
AlphaGo plays
against
itself
(self-‐plays).
51. What
if
AlphaGo learns
Lee’s
game
strategy
Google
said
they
won’t
use
Lee’s
game
plays
as
AlphaGo’s training
data
Even
if
it
does,
it
won’t
be
easy
to
modify
the
model
trained
over
millions
of
data
points
with
just
a
few
game
plays
with
Lee
(prone
to
over-‐fitting,
etc.)
53. AlphaGo – How
It
Works
Presenter: Shane
(Seungwhan)
Moon
PhD
student
Language
Technologies
Institute,
School
of
Computer
Science
Carnegie
Mellon
University
[email protected]
3/2/2016
54. Reference
• Silver,
David,
et
al.
"Mastering
the
game
of
Go
with
deep
neural
networks
and
tree
search." Nature 529.7587
(2016):
484-‐489.