NeurIPS-2019 Workshop on
Safety and Robustness in Decision Making
December 13th, East Ballroom A
Workshop Summary
Interacting with increasingly sophisticated decision-making systems is becoming an ever larger part of our daily lives. This places an immense responsibility on the designers of these systems to build them in a way that guarantees safe interaction with their users and good performance in the presence of noise, changes in the environment, model misspecification, and uncertainty. Any progress in this area will be a huge step forward in deploying decision-making algorithms in emerging high-stakes applications such as autonomous driving, robotics, power systems, health care, recommendation systems, and finance.
This workshop aims to bring together researchers from academia and industry to discuss the main challenges, describe recent advances, and highlight future research directions pertaining to the development of safe and robust decision-making systems. We aim to highlight new and emerging theoretical and applied research opportunities for the community that arise from the evolving need for decision-making systems and algorithms that guarantee safe interaction and good performance under a wide range of uncertainties in the environment.
The research challenges we are interested in addressing in this workshop include (but are not limited to):
- Counterfactual reasoning and off-policy evaluation.
- Specifying appropriate definitions of safety and robustness, and quantifying/certifying them under model uncertainty.
- Safe exploration.
- Trading off robustness/safety with performance.
- Understanding the connections between risk and robustness, and other related definitions.
Call for Papers
IMPORTANT DATES
- Paper Submission Deadline: September 22, 2019
- Notification of Acceptance: September 30, 2019
- Workshop: Friday, December 13, 2019
SUBMISSION INSTRUCTIONS
Papers submitted to the workshop should be between 4 and 8 pages long, excluding references and appendix, and in NeurIPS 2019 format (NOT ANONYMIZED). Accepted papers will be presented as posters or contributed oral presentations.
Submissions should be sent as a pdf file by email to [email protected]
Organizers
(Google Research)
(Facebook AI Research)
(Technion)
(University of New Hampshire)
(Caltech)
Invited Speakers
(Harvard University)
Title: Combining Statistical Methods with Human Input for Evaluation and Optimization in Batch Settings
Abstract
Statistical methods for off-policy evaluation and counterfactual reasoning will have fundamental limitations based on what assumptions can be made and what kind of exploration is present in the data (some of which is being presented here by other speakers!). In this talk, I'll discuss some recent directions in our lab regarding ways to integrate human experts into the process of policy evaluation and selection in batch settings. The first deals with statistical limitations by seeking a diverse collection of statistically-indistinguishable (with respect to outcome) policies for humans to eventually decide from. The second involves directly integrating human feedback to eliminate or validate specific sources of sensitivity in an off-policy evaluation to get more robust estimates (or at least better understand the source of their non-robustness). More broadly, I will discuss open directions for moving from purely-statistical (e.g. off-policy evaluation) or purely-human (e.g. interpretability-based) approaches for robust/safe decision-making toward combining the advantages of both.
(Henry Ford Technical Fellow at Ford Motor Company)
Title: Practical Approaches to Driving Policy Design for Autonomous Vehicles
Abstract
The presentation deals with some practical facets of applying AI methods to designing driving policies for autonomous vehicles. The relationship between reinforcement learning (RL) based solutions and the use of rule-based and model-based techniques for improving their robustness and safety is discussed. One approach to obtaining explainable RL models by learning alternative rule-based representations is proposed. The presentation also elaborates on opportunities for extending AI driving policy approaches by applying game-theory-inspired methodology to address diverse and unforeseen scenarios and to represent the negotiation aspects of decision making in autonomous driving.
(Cornell University)
Title: Fair Ranking with Biased Data
Abstract
Search engines and recommender systems have become the dominant matchmakers for a wide range of human endeavors -- from online retail to finding romantic partners. Consequently, they carry immense power in shaping markets and allocating opportunity to the participants. In this talk, I will discuss how the machine learning algorithms underlying these systems can produce unfair ranking policies for both exogenous and endogenous reasons. Exogenous reasons often manifest themselves as biases in the training data, which then get reflected in the learned ranking policy and lead to rich-get-richer dynamics. But even when trained with unbiased data, reasons endogenous to the algorithms can lead to unfair allocation of opportunity. To overcome these challenges, I will present new machine learning algorithms that directly address both endogenous and exogenous unfairness.
(Cornell University)
Title: Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning
Abstract
Off-policy evaluation (OPE) is crucial for reinforcement learning in domains like medicine with limited exploration, but OPE is also notoriously difficult because the similarity between trajectories generated by any proposed policy and the observed data diminishes exponentially as horizons grow, a phenomenon known as the curse of horizon. To understand precisely when this curse bites, we consider for the first time the semi-parametric efficiency limits of OPE in Markov decision processes (MDPs), establishing the best-possible estimation errors and characterizing the curse as a problem-dependent rather than method-dependent phenomenon. Efficiency in OPE is crucial because, without exploration, we must use the available data to its fullest. In finite horizons, this analysis shows that standard doubly-robust (DR) estimators are in fact inefficient for MDPs. In infinite horizons, while the curse renders certain problems fundamentally intractable, OPE may be feasible in ergodic time-invariant MDPs. We develop the first OPE estimator that achieves the efficiency limits in both settings, termed Double Reinforcement Learning (DRL). In both finite and infinite horizons, DRL improves upon existing estimators, which we show are inefficient, and leverages problem structure to its fullest in the face of the curse of horizon. We establish many favorable characteristics for DRL, including efficiency even when nuisances are estimated slowly by blackbox models, finite-sample guarantees, and model double robustness.
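For background on the estimators the abstract refers to, the standard finite-horizon doubly-robust OPE estimator can be sketched as follows (the notation is ours, not taken from the talk: \(\pi\) is the evaluation policy, \(\mu\) the behavior policy, \(\hat{q}^{\pi}\) and \(\hat{v}^{\pi}\) estimated value functions, and \(H\), \(\gamma\), \(n\) the horizon, discount factor, and number of trajectories):
\[
\hat{V}_{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{H-1} \gamma^{t} \Big[ \rho_{i,0:t}\,\big(r_{i,t} - \hat{q}^{\pi}(s_{i,t}, a_{i,t})\big) + \rho_{i,0:t-1}\, \hat{v}^{\pi}(s_{i,t}) \Big],
\qquad
\rho_{i,0:t} = \prod_{t'=0}^{t} \frac{\pi(a_{i,t'} \mid s_{i,t'})}{\mu(a_{i,t'} \mid s_{i,t'})},
\]
with the convention \(\rho_{i,0:-1} = 1\). The estimate remains consistent if either the importance weights (the behavior-policy model) or the value-function estimates are correct; the abstract's point is that, despite this double robustness, such estimators do not attain the semi-parametric efficiency bound for MDPs, which is what DRL is designed to achieve.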
(EPFL)
Title: From Data to Decisions: Distributionally Robust Optimization is Optimal
Abstract
We study stochastic optimization problems where the decision-maker cannot observe the distribution of the exogenous uncertainties but has access to a finite set of independent training samples. In this setting, the goal is to find a procedure that transforms the data to an estimate of the expected cost function under the unknown data-generating distribution, i.e., a predictor, and an optimizer of the estimated cost function that serves as a near-optimal candidate decision, i.e., a prescriptor. As functions of the data, predictors and prescriptors constitute statistical estimators. We propose a meta-optimization problem to find the least conservative predictors and prescriptors subject to constraints on their out-of-sample disappointment. The out-of-sample disappointment quantifies the probability that the actual expected cost of the candidate decision under the unknown true distribution exceeds its predicted cost. Leveraging tools from large deviations theory, we prove that this meta-optimization problem admits a unique solution: The best predictor-prescriptor pair is obtained by solving a distributionally robust optimization problem over all distributions within a given relative entropy distance from the empirical distribution of the data.
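To make the final statement concrete, the resulting prescriptor solves a distributionally robust problem of roughly the following form (our notation, a sketch that may differ in detail, including the orientation of the divergence, from the speaker's paper):
\[
\min_{x \in \mathcal{X}} \; \sup_{Q \,:\, D_{\mathrm{KL}}(\hat{P}_n \,\|\, Q) \le r} \; \mathbb{E}_{Q}\big[ c(x, \xi) \big],
\]
where \(\hat{P}_n\) is the empirical distribution of the training samples, \(c(x,\xi)\) is the cost of decision \(x\) under the uncertain parameter \(\xi\), and the radius \(r\) is calibrated so that the out-of-sample disappointment decays at the prescribed rate.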
(University of Texas at Austin)
Title: Scaling Probabilistically Safe Learning to Robotics
Abstract
In recent years, high-confidence reinforcement learning algorithms have enjoyed success in application areas with high-quality models and plentiful data, but robotics remains a challenging domain for scaling up such approaches. Furthermore, very little work has been done on the even more difficult problem of safe imitation learning, in which the demonstrator's reward function is not known. This talk focuses on three recent developments in this emerging area of research: (1) a theory of safe imitation learning; (2) scalable reward inference in the absence of models; (3) efficient off-policy policy evaluation. The proposed algorithms offer a blend of safety and practicality, making a significant step towards safe robot learning with modest amounts of real-world data.
(Stanford University)
Title: On Safe and Efficient Human-robot Interactions via Multi-modal Intent Modeling and Reachability-based Safety Assurance
Abstract
In this talk, I will present a decision-making and control stack for human-robot interactions, using autonomous driving as a motivating example. Specifically, I will first discuss a data-driven approach for learning multimodal interaction dynamics between robot-driven and human-driven vehicles based on recent advances in deep generative modeling. Then, I will discuss how to incorporate such a learned interaction model into a real-time, interaction-aware decision-making framework. The framework is designed to be minimally interventional; in particular, by leveraging backward reachability analysis, it ensures safety even when other cars defy the robot's expectations, without unduly sacrificing performance. I will present recent results from experiments on a full-scale steer-by-wire platform, validating the framework and providing practical insights. I will conclude the talk by providing an overview of related efforts from my group on infusing safety assurances in robot autonomy stacks equipped with learning-based components, with an emphasis on adding structure within robot learning via control-theoretical and formal methods.
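For readers unfamiliar with the term, backward reachability analysis of the kind mentioned above is typically built around a set of the following form (our notation, an illustrative sketch rather than the speaker's exact construction):
\[
\mathcal{R}(\tau) = \Big\{ x_0 \;:\; \forall\, u_R(\cdot)\;\; \exists\, u_H(\cdot),\; t \in [0,\tau] \;\;\text{s.t.}\;\; \xi\big(t;\, x_0, u_R, u_H\big) \in \mathcal{F} \Big\},
\]
where \(\xi(t; x_0, u_R, u_H)\) is the joint robot-human trajectory from state \(x_0\) under robot controls \(u_R\) and human actions \(u_H\), and \(\mathcal{F}\) is the failure set (e.g., collision states). A minimally interventional safety layer lets the nominal, learned policy act freely while the state stays outside \(\mathcal{R}(\tau)\) and applies a safety-preserving control only near its boundary.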
(Georgia Tech)
Title: Recent Advances in Multistage Decision-making under Uncertainty: New Algorithms and Complexity Analysis
Abstract
In this talk, we will review some recent advances in the area of multistage decision making under uncertainty, especially in the domain of stochastic and robust optimization. We will present new algorithmic developments that allow for exactly solving huge-scale stochastic programs with integer recourse decisions, as well as algorithms from a dual perspective that can deal with infeasibility in such problems. This significantly extends the scope of stochastic dual dynamic programming (SDDP) algorithms from convex or binary state variable cases to general nonconvex problems. We will also present a new analysis of the iteration complexity of the proposed algorithms, which settles some open questions regarding the complexity of SDDP.
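As context for the problem class SDDP targets, a multistage stochastic program is usually written in the nested form below (our notation; the talk's setting additionally allows integer recourse decisions and nonconvex stage problems):
\[
\min_{x_1 \in X_1} c_1^{\top} x_1 + \mathbb{E}_{\xi_2}\Big[ \min_{x_2 \in X_2(x_1, \xi_2)} c_2^{\top} x_2 + \mathbb{E}_{\xi_3}\big[ \cdots + \mathbb{E}_{\xi_T}\big[ \min_{x_T \in X_T(x_{T-1}, \xi_T)} c_T^{\top} x_T \big] \cdots \big] \Big],
\]
where \(x_t\) is the stage-\(t\) decision, \(\xi_t\) the uncertainty revealed before stage \(t\), and \(X_t(x_{t-1}, \xi_t)\) the feasible set linking consecutive stages. SDDP approximates the expected cost-to-go functions with cutting planes generated along sampled scenario paths, which is why extending it beyond convex or binary-state settings is nontrivial.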
(Technion)
Title: Visual Plan Imagination - An Interpretable Robot Learning Framework
Abstract
How can we build autonomous robots that operate in unstructured and dynamic environments such as homes or hospitals? This problem has been investigated under several disciplines, including planning (motion planning, task planning, etc.) and reinforcement learning. While both of these fields have witnessed tremendous progress, each has fundamental drawbacks: planning approaches require substantial manual engineering in mapping perception to a formal planning problem, while RL, which can operate directly on raw percepts, is data hungry, cannot generalize to new tasks, and is ‘black box’ in nature.
Motivated by humans’ remarkable capability to imagine and plan complex manipulations of objects, and by recent advances in image generation such as GANs, we present Visual Plan Imagination (VPI), a new computational problem that combines image imagination and planning. In VPI, given off-policy image data from a dynamical system, the task is to ‘imagine’ image sequences that transition the system from start to goal. Thus, VPI focuses on the essence of planning with high-dimensional perception, and abstracts away low-level control and reward engineering. More importantly, VPI provides a safe and interpretable basis for robotic control: before the robot acts, a human inspects the imagined plan the robot will act upon, and can intervene if necessary.
I will describe our approach to VPI based on Causal InfoGAN, a deep generative model that learns features that are compatible with strong planning algorithms. We show that Causal InfoGAN can generate convincing visual plans, and we demonstrate learning to imagine and execute real robot rope manipulation from image data. I will also discuss our VPI simulation benchmarks, and recent efforts in novelty detection, an important component in VPI, and in safe decision making in general.
Workshop Schedule
List of Accepted Papers
- Andrew Bennett, Nathan Kallus. Policy Evaluation with Latent Confounders via Optimal Balance
- Carroll L. Wainwright, Peter Eckersley. SafeLife 1.0: Exploring Side Effects in Complex Environments
- Chao Tang, Yifei Fan, Anthony Yezzi. An Adaptive View of Adversarial Robustness from Test-time Smoothing Defense
- Daniel S. Brown, Scott Niekum. Deep Bayesian Reward Learning from Preferences
- David I. Inouye, Liu Leqi, Joon Sik Kim, Bryon Aragam, Pradeep Ravikumar. Diagnostic Curves for Black Box Models
- David Venuto, Léonard Boussioux, Junhao Wang, Rola Dali, Jhelum Chakravorty, Yoshua Bengio, Doina Precup. Avoidance Learning Using Observational Reinforcement Learning
- Divyam Madaan, Sung Ju Hwang. Adversarial Neural Pruning
- Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof. Combining Reward Information from Multiple Sources
- Eleanor Quint, Dong Xu, Haluk Dogan, Zeynep Hakguder, Stephen Scott, Matthew Dwyer. Formal Language Constraints for Markov Decision Processes
- Gavin S. Hartnett, Andrew J. Lohn, Alexander P. Sedlack. Adversarial Examples for Cost-Sensitive Classifiers
- Geoffroy Dubourg-Felonneau, Omar Darwish, Christopher Parsons, Dàmi Rebergen, John W Cassidy, Nirmesh Patel, Harry Clifford. Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer
- Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan. Scaled Autonomy: Enabling Human Operators to Control Robot Fleets
- Jason Carter, Marek Petrik. Robust Risk-Averse Sequential Decision Making
- Javier Garcia-Barcos, Ruben Martinez-Cantin. Parallel Robust Bayesian Optimization with Off-policy Evaluations
- Jesse Zhang, Brian Cheung, Chelsea Finn, Dinesh Jayaraman, Sergey Levine. Hope For The Best But Prepare For The Worst: Cautious Adaptation In RL Agents
- Julian Bolick, Gino Brunner, Oliver Richter, Roger Wattenhofer. Tunnel Vision Attack on IMPALA - Questioning the Robustness of Reinforcement Learning Agents
- Justin Cosentino, Federico Zaiter, Dan Pei, Jun Zhu. The Search for Sparse, Robust Neural Networks
- Kai Y. Xiao, Sven Gowal, Todd Hester, Rae Jeong, Daniel J. Mankowitz, Yuanyuan Shi, Tsui-Wei Weng. Learning Neural Dynamics Simulators with Adversarial Specification Training
- Mahdieh Abbasi, Changjian Shui, Arezoo Rajabi, Christian Gagné, Rakesh B. Bobba. Metrics for Differentiating Out-of-Distribution Sets
- Marco Gallieri, Seyed Sina Mirrazavi Salehian, Nihat Engin Toklu, Alessio Quaglino, Jonathan Masci, Jan Koutník, Faustino Gomez. Safe Interactive Model-Based Learning
- Mathieu Seurin, Philippe Preux, Olivier Pietquin. "I’m sorry Dave, I’m afraid I can’t do that" Deep-Q Learning From Forbidden Actions
- Matthew Sotoudeh, Aditya V. Thakur. Correcting Deep Neural Networks with Small, Generalizing Patches
- Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi. Striving for Simplicity in Off-Policy Deep Reinforcement Learning
- Sachin Vernekar, Ashish Gaurav, Vahdat Abdelzad, Taylor Denouden, Rick Salay, Krzysztof Czarnecki. Out-of-distribution Detection in Classifiers via Generation
- Sai Kiran Narayanaswami, Nandan Sudarsanam, Balaraman Ravindran. An Active Learning Framework for Efficient Robust Policy Search
- Samuel Daulton, Shaun Singh, Vashist Avadhanula, Drew Dimmery, Eytan Bakshy. Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints
- Stephen Mell, Olivia Brown, Justin Goodwin and Sung-Hyun Son. Safe Predictors for Enforcing Input-Output Specifications
- Vedant Nanda, Junaid Ali, Krishna P. Gummadi, Muhammad Bilal Zafar. Unifying Model Explainability and Accuracy through Reasoning Labels
- Victoria Krakovna, Laurent Orseau, Miljan Martic, Shane Legg. Avoiding Side Effects By Considering Future Tasks
- Yuanyuan Shi, Kai Y. Xiao, Daniel J. Mankowitz, Rae Jeong, Nir Levine, Sven Gowal, Timothy Mann, Todd Hester. Data-Driven Robust Reinforcement Learning for Continuous Control
- Ahana Ghosh, Sebastian Tschiatschek, Hamed Mahdavi, Adish Singla. Towards Deployment of Robust AI Agents for Human-Machine Partnerships
- Ahmadreza Jeddi, Mohammad Javad Shafiee, Michelle Karg, Christian Scharfenberger, Alexander Wong. Learn2Perturb: Improving Adversarial Robustness on Deep Neural Networks through End-to-end Feature Perturbation Learning
- Akhilan Boopathy, Tsui-Wei Weng, Sijia Liu, Pin-Yu Chen, Luca Daniel. DoubleMargin: Efficient Training of Robust and Verifiable Neural Networks
- Alex Tamkin, Ramtin Keramati, Christoph Dann, Emma Brunskill. Distributionally-Aware Exploration for CVaR Bandits
- Ali Baheri, Subramanya Nageshrao, Ilya Kolmanovsky, Anouck Girard, H. Eric Tseng, Dimitar Filev. Deep Q-Learning with Dynamically-Learned Safety Module: A Case Study in Autonomous Driving
- Amir Nazemi, Paul Fieguth. Identifying threatening samples for adversarial robustness
- Anqi Liu, Guanya Shi, Soon-Jo Chung, Anima Anandkumar, Yisong Yue. Robust Regression for Safe Exploration in Control
- Anqi Liu, Hao Liu, Anima Anandkumar, Yisong Yue. Triply Robust Off-Policy Evaluation
- Arushi Jain, Doina Precup. Learning Options using Constrained Return Variance
- Benjamin Eysenbach, Jacob Tyo, Shane Gu, Ruslan Salakhutdinov, Zachary Lipton, Sergey Levine. Reinforcement Learning with Unknown Reward Functions
- Benjie Wang, Stefan Webb, Tom Rainforth. Statistically Robust Neural Network Classification
- Boxiao Chen, Selvaprabu Nadarajah, Stefanus Jasin. Robust Demand Learning
- Chandramouli S Sastry, Sageev Oore. Detecting Out-of-Distribution Examples with In-distribution Examples and Gram Matrices
- Dhruv Ramani, Benjamin Eysenbach. Avoiding Negative Side-Effects And Promoting Safe Exploration With Imaginative Planning
- Dimitrios I. Diochnos, Saeed Mahloujifar, Mohammad Mahmoody. Lower Bounds for Adversarially Robust PAC Learning
- Dotan Di Castro, Joel Oren, Shie Mannor. Practical Risk Measures in Reinforcement Learning
- Doyup Lee, Yeongjae Cheon, Woo-sung Jung. Anomaly Detection in Visual Question Answering
- Elmira Amirloo Abolfathi, Jun Luo, Peyman Yadmellat, Kasra Rezaee. CoachNet: An Adversarial Sampling Approach for Reinforcement Learning
- Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric and Matteo Pirotta. Improved Algorithms for Conservative Exploration in Bandits
- Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric and Matteo Pirotta. Conservative Exploration in Finite Horizon Markov Decision Processes
- I-Te Danny Hung, Chao-Han Huck Yang, Yi Ouyang, Pin-Yu Chen, Chin-Hui Lee, Xiaoli Ma. Toward Resilient Reinforcement Learning: Causal Inference Q-Networks
- Ilija Bogunovic, Andreas Krause, Jonathan Scarlett. Bayesian Optimization with Adversarial Corruption
- Jeet Mohapatra, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel. Towards Verifying Robustness of Neural Networks Against Semantic Perturbations
- Jian Qian, Merwan Barlier, Igor Colin, Ludovic Dos Santos. Safe Reinforcement Learning with Probabilistic Constraints
- John D. Martin, Michal Lyskawinski, Xiaohu Li, Brendan Englot. Stochastically Dominant Distributional Reinforcement Learning
- Karthik Abinav Sankararaman, Anand Louis, Navin Goyal. Robust Identifiability in Linear Structural Equation Models of Causal Inference
- Kyriakos Polymenakos, Luca Laurenti, Andrea Patane, Jan-Peter Calliess, Luca Cardelli, Marta Kwiatkowska, Alessandro Abate, Stephen Roberts. Safety Guarantees for Planning Based on Iterative Gaussian Process
- Lan Hoang, Edward Pyzer-Knapp. Distributional Actor-Critic for Risk-Sensitive Multi-Agent Reinforcement Learning
- Matteo Papini, Andrea Battistello, Marcello Restelli. Safe Exploration in Gaussian Policy Gradient
- Matteo Turchetta, Andreas Krause, Sebastian Trimpe. Robust Model-free Reinforcement Learning with Multi-objective Bayesian Optimization
- Nader Asadi, Amir M. Sarfi, Mehrdad Hosseinzadeh, Sahba Tahsini, Mahdi Eftekhari. Diminishing the Effect of Adversarial Perturbations via Refining Feature Representation
- Nathan Fulton, Nathan Hunt, Nghia Hoang, Subhro Das. Formal Verification of End-to-End Learning in Cyber-Physical Systems: Progress and Challenges
- Nathan Kallus, Masatoshi Uehara. Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes
- Niranjani Prasad, Barbara E Engelhardt, Finale Doshi-Velez. Defining Admissible Rewards for High-Confidence Policy Evaluation
- Patrik Kolaric, Devesh K. Jha, Diego Romeres, Arvind U Raghunathan, Mouhacine Benosman, Daniel Nikovski. Robust Optimization for Trajectory-Centric Model-based Reinforcement Learning
- Philipp Renz, Sepp Hochreiter, Günter Klambauer. Uncertainty Estimation Methods to Support Decision-Making in Early Phases of Drug Discovery
- Prateek Jaiswal, Harsha Honnappa, Vinayak A. Rao. Variational Inference for Risk-Sensitive Decision-Making
- Reazul Hasan Russel, Bahram Behzadian, Marek Petrik. Optimizing Norm-bounded Weighted Ambiguity Sets for Robust MDPs
- Riashat Islam, Raihan Seraj, Samin Yeasar Arnob, Doina Precup. Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
- Ryan-Rhys Griffiths, Miguel Garcia-Ortegon, Alexander A. Aldrick, Alpha A. Lee. Achieving Robustness to Aleatoric Uncertainty with Heteroscedastic Bayesian Optimisation
- Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar. Regret Analysis of Partially Observable Linear Quadratic Systems
- Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody. Adversarially Robust Learning Could Leverage Computational Hardness
- Sebastian East, Marco Gallieri, Jonathan Masci, Jan Koutník, Mark Cannon. Infinite-Horizon Differentiable Model Predictive Control
- Shun Zhang, Edmund H. Durfee, and Satinder Singh. Querying to Find a Safe Policy Under Uncertain Safety Constraints in Markov Decision Processes
- Soheil Sibdari. On the Robustness of Data-driven Pricing Decisions with Revenue Maximization
- Theodore Vasiloudis, Gianmarco De Francisci Morales, Henrik Bostrom. Quantifying Uncertainty in Online Regression Forests
- Wenhao Luo, Ashish Kapoor. Airborne Collision Avoidance Systems with Probabilistic Safety Barrier Certificates
- Wesley Chung, Shin-ichi Maeda, Yasuhiro Fujita. Safe Reinforcement Learning with Adversarial Threat Functions
- Yuh-Shyang Wang, Tsui-Wei Weng, Luca Daniel. Verification of Neural Network Control Policy Under Persistent Adversarial Perturbation
- Z. Cranko, Z. Shi, X. Zhang, S. Kornblith, R. Nock, M. Li, Y. Ma, H. Jin. Certifying Distributional Robustness using Lipschitz Regularisation
- Ziping Xu, Ambuj Tewari. Worst-Case Regret Bound for Perturbation Based Robust Exploration in Reinforcement Learning
- Ibrahim Ben Daya, Mohammad Javad Shafiee, Michelle Karg, Christian Scharfenberger, Alexander Wong. SANER: Efficient Stochastically Activated Network Ensembles for Adversarial Robustness Through Randomized Assembly
- Maxime Wabartha, Andrey Durand, Vincent Francois-Lavet, Joelle Pineau. Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks
- Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom. Sequence-to-Sequence Adversarial Attack for Detecting Inconsistent Neural Explanations
- Oana-Maria Camburu, Brendan Shillingford, Pasquale Minervini, Thomas Lukasiewicz, Phil Blunsom. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations