Tag Archives: Reinforcement Learning

Using RL as a framework to study political issues

Lion Schulz, Rahul Bhui, Political reinforcement learners, Trends in Cognitive Sciences, Volume 28, Issue 3, 2024, Pages 210-222 DOI: 10.1016/j.tics.2023.12.001.

Politics can seem home to the most calculating and yet least rational elements of humanity. How might we systematically characterize this spectrum of political cognition? Here, we propose reinforcement learning (RL) as a unified framework to dissect the political mind. RL describes how agents algorithmically navigate complex and uncertain domains like politics. Through this computational lens, we outline three routes to political differences, stemming from variability in agents' conceptions of a problem, the cognitive operations applied to solve the problem, or the backdrop of information available from the environment. A computational vantage on maladies of the political mind offers enhanced precision in assessing their causes, consequences, and cures.
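As a minimal sketch of how these three routes could be expressed in the simplest RL terms (this is an illustration, not the authors' model): in a tabular Q-learning agent, the subjective reward function stands in for the agent's conception of the problem, the learning rate and exploration temperature for its cognitive operations, and what the agent gets to observe for the information environment. The class names and parameters below are purely illustrative.

```python
# Illustrative sketch: two "political learners" that see the same outcomes but
# value them differently (route 1) and update at different rates (route 2).
# Route 3 (the information environment) would vary which outcomes they observe.
import numpy as np

class PoliticalLearner:
    def __init__(self, n_states, n_actions, reward_fn, lr=0.1, temp=1.0, seed=0):
        self.Q = np.zeros((n_states, n_actions))
        self.reward_fn = reward_fn      # route 1: subjective conception of the problem
        self.lr = lr                    # route 2: cognitive operations (learning rate)
        self.temp = temp                # route 2: exploration temperature
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        prefs = self.Q[state] / self.temp
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        return self.rng.choice(len(probs), p=probs)

    def update(self, state, action, outcome, next_state):
        r = self.reward_fn(outcome)     # same outcome, different subjective reward
        target = r + 0.95 * self.Q[next_state].max()
        self.Q[state, action] += self.lr * (target - self.Q[state, action])

# Two agents valuing the same outcome in opposite ways, learning at different rates:
partisan_a = PoliticalLearner(4, 2, reward_fn=lambda o: +o, lr=0.2)
partisan_b = PoliticalLearner(4, 2, reward_fn=lambda o: -o, lr=0.05)
```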

On the complexities of RL when it confronts the real (natural) world

Toby Wise, Kara Emery, Angela Radulescu, Naturalistic reinforcement learning, Trends in Cognitive Sciences, Volume 28, Issue 2, 2024, Pages 144-158 DOI: 10.1016/j.tics.2023.08.016.

Humans possess a remarkable ability to make decisions within real-world environments that are expansive, complex, and multidimensional. Human cognitive computational neuroscience has sought to exploit reinforcement learning (RL) as a framework within which to explain human decision-making, often focusing on constrained, artificial experimental tasks. In this article, we review recent efforts that use naturalistic approaches to determine how humans make decisions in complex environments that better approximate the real world, providing a clearer picture of how humans navigate the challenges posed by real-world decisions. These studies purposely embed elements of naturalistic complexity within experimental paradigms, rather than focusing on simplification, generating insights into the processes that likely underpin humans' ability to navigate complex, multidimensional real-world environments so successfully.

Including safety learning in RL for improving the sim-to-lab gap

Kai-Chieh Hsu, Allen Z. Ren, Duy P. Nguyen, Anirudha Majumdar, Jaime F. Fisac, Sim-to-Lab-to-Real: Safe reinforcement learning with shielding and generalization guarantees, Artificial Intelligence, Volume 314, 2023 DOI: 10.1016/j.artint.2022.103811.

Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the Safety Bellman Equation based on Hamilton-Jacobi (HJ) reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. Additionally, inheriting from the HJ reachability analysis, the bound accounts for the expectation over the worst-case safety in each environment. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments with varying degrees of photorealism. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
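The core of the Sim-to-Lab transfer step is the supervisory "shield": the task policy acts by default, but its action is overridden by the backup (safety) policy whenever a learned safety value function, trained from the Safety Bellman Equation, deems it unsafe. The sketch below is a hedged interpretation of that scheme, assuming a trained performance policy, a safety critic approximating the HJ reachability value, and a backup policy; the names, signatures, and threshold are illustrative, not the authors' code.

```python
# Minimal sketch of shielded action selection under the assumptions above.
SAFETY_THRESHOLD = 0.0  # value <= 0: the action risks driving the system into the failure set

def shielded_action(obs, perf_policy, backup_policy, safety_value):
    """Supervisory control: prefer the task policy, fall back to the safety policy."""
    a_perf = perf_policy(obs)
    # safety_value(obs, a) approximates the worst-case safety margin (HJ reachability value)
    if safety_value(obs, a_perf) > SAFETY_THRESHOLD:
        return a_perf            # action deemed safe: keep optimizing the task reward
    return backup_policy(obs)    # shield: override with the safety-maximizing action
```

During Lab training, exploration proceeds through this filter, so unsafe candidate actions are blocked before they are executed.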

Adaptation of industrial robots to variations in tasks through RL

Tian Yu, Qing Chang, User-guided motion planning with reinforcement learning for human-robot collaboration in smart manufacturing, Expert Systems with Applications, Volume 209, 2022 DOI: 10.1016/j.eswa.2022.118291.

In today's manufacturing systems, robots are expected to perform increasingly complex manipulation tasks in collaboration with humans. However, current industrial robots are still largely preprogrammed with very little autonomy and still need to be reprogrammed by robotics experts for even slightly changed tasks. Therefore, it is highly desirable that robots can adapt to certain task changes with motion planning strategies so that non-robotics experts can easily work with them in manufacturing environments. In this paper, we propose a user-guided motion planning algorithm in combination with a reinforcement learning (RL) method to enable robots to automatically generate motion plans for new tasks by learning from a few kinesthetic human demonstrations. Features of common human-demonstrated tasks in a specific application environment, e.g., desk assembly or warehouse loading/unloading, are abstracted and saved in a library. A definition of semantic similarity between features in the library and features of a new task is proposed and used to construct the reward function in RL. To achieve an adaptive motion plan in the face of task changes or new task requirements, features in the library are mapped to appropriate task segments based on a motion planning policy trained with Q-learning. A new task can either be learned as a combination of a few features in the library or trigger a request for further human demonstration if the current library is insufficient for the new task. We evaluate our approach on a 6-DOF UR5e robot across multiple tasks and scenarios and demonstrate the effectiveness of our method.
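To make the reward construction concrete, here is a hedged sketch (not the authors' implementation) of Q-learning in which the state is a task segment, the action is the choice of a library feature, and the reward is the semantic similarity between the chosen feature and the segment. The `similarity` function below is a cosine-similarity stand-in for whatever feature-matching metric is actually used.

```python
# Illustrative Q-learning over (task segment, library feature) pairs with a
# similarity-based reward, under the assumptions stated above.
import numpy as np

def similarity(lib_feature, segment_feature):
    # placeholder metric: cosine similarity between feature vectors
    return float(np.dot(lib_feature, segment_feature) /
                 (np.linalg.norm(lib_feature) * np.linalg.norm(segment_feature) + 1e-9))

def q_learning_feature_mapping(library, segments, episodes=500,
                               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((len(segments), len(library)))   # state = segment, action = library feature
    for _ in range(episodes):
        for s, seg_feat in enumerate(segments):
            a = (rng.integers(len(library)) if rng.random() < eps
                 else int(Q[s].argmax()))
            r = similarity(library[a], seg_feat)  # reward from semantic similarity
            s_next = (s + 1) % len(segments)      # segments are visited in task order
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q.argmax(axis=1)  # chosen library feature for each task segment
```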

On the extended use of RL for navigation in UAVs

Fadi AlMahamid, Katarina Grolinger, Autonomous Unmanned Aerial Vehicle navigation using Reinforcement Learning: A systematic review, Engineering Applications of Artificial Intelligence, Volume 115, 2022 DOI: 10.1016/j.engappai.2022.105321.

There is an increasing demand for using Unmanned Aerial Vehicles (UAVs), commonly known as drones, in different applications such as package delivery, traffic monitoring, search and rescue operations, and military combat engagements. In all of these applications, the UAV is used to navigate the environment autonomously, without human interaction, perform specific tasks, and avoid obstacles. Autonomous UAV navigation is commonly accomplished using Reinforcement Learning (RL), where agents act as experts in a domain to navigate the environment while avoiding obstacles. Understanding the navigation environment and algorithmic limitations plays an essential role in choosing the appropriate RL algorithm to solve the navigation problem effectively. Consequently, this study first identifies the main UAV navigation tasks and discusses navigation frameworks and simulation software. Next, RL algorithms are classified and discussed based on the environment, algorithm characteristics, abilities, and applications in different UAV navigation problems, which will help practitioners and researchers select the appropriate RL algorithms for their UAV navigation use cases. Moreover, identified gaps and opportunities will drive UAV navigation research.

Survey on RL applied to cyber-security

Amrin Maria Khan Adawadkar, Nilima Kulkarni, Cyber-security and reinforcement learning — A brief survey, Engineering Applications of Artificial Intelligence, Volume 114, 2022, DOI: 10.1016/j.engappai.2022.105116.

This paper presents a comprehensive literature review on Reinforcement Learning (RL) techniques used in Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS), the Internet of Things (IoT), and Identity and Access Management (IAM). This study reviews scientific documents such as journals and articles, from 2010 to 2021, extracted from the Science Direct, ACM, IEEEXplore, and Springer databases. Most of the research articles published in 2020 and 2021 on cybersecurity and RL address IDS classifiers and resource optimization in IoT. Some datasets used for training RL-based IDS algorithms are NSL-KDD, CICIDS, and AWID. There are few datasets and publications for IAM; the few that exist focus on physical-layer authentication. The current state of the art lacks standard evaluation criteria; however, we have identified parameters such as detection rate, precision, and accuracy that can be used to compare algorithms employing RL. This paper is suitable for new researchers, students, and beginners in the field of RL who want to learn about the field and identify problem areas.

Unexpected consequences of training smart home systems with reinforcement learning: effects on human behaviours

S. Suman, A. Etemad and F. Rivest, Potential Impacts of Smart Homes on Human Behavior: A Reinforcement Learning Approach, IEEE Transactions on Artificial Intelligence, vol. 3, no. 4, pp. 567-580, Aug. 2022 DOI: 10.1109/TAI.2021.3127483.

Smart homes are becoming increasingly popular as a result of advances in machine learning and cloud computing. Devices such as smart thermostats and speakers are now capable of learning from user feedback and adaptively adjusting their settings to human preferences. Nonetheless, these devices might in turn impact human behavior. To investigate the potential impacts of smart homes on human behavior, we simulate a series of hierarchical-reinforcement-learning-based human models capable of performing various activities, namely setting temperature and humidity for thermal comfort, inside a Q-learning-based smart home model. We then investigate the possibility of the human models' behaviors being altered as a result of the smart home and the human model adapting to one another. For our human model, the activities are based on hierarchical reinforcement learning. This allows the human to learn how long it must continue a given activity and decide when to leave it. We then integrate our human model into the environment along with the smart home model and perform rigorous experiments considering various scenarios involving a model of a single human and models of two different humans with the smart home. Our experiments show that with the smart home, the human model can exhibit unexpected behaviors such as frequent changing of activities and an increase in the time required to modify the thermal preferences. With two human models, we interestingly observe that certain combinations of models result in normal behaviors, while other combinations exhibit the same unexpected behaviors as those observed in the single-human experiment.
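The mechanism that produces these effects is two learners adapting to each other at the same time. The toy sketch below (a heavy simplification: a bandit-style stand-in for the paper's hierarchical-RL human and Q-learning smart home, with invented setpoints and rewards) only illustrates that coupling: the "human" and the "home" each update value estimates while the other is still changing, so neither faces a stationary problem.

```python
# Toy coupled-adaptation sketch under the assumptions above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
temps = np.arange(18, 27)          # discrete setpoints (deg C)
Q_home = np.zeros(len(temps))      # smart home's value per setpoint
Q_human = np.zeros(len(temps))     # human model's value for requesting each setpoint
alpha, eps = 0.1, 0.2
true_comfort = 22                  # assumed comfortable temperature

for step in range(2000):
    # human requests a setpoint (its own epsilon-greedy policy)
    h = rng.integers(len(temps)) if rng.random() < eps else int(Q_human.argmax())
    # smart home picks what it currently believes the user wants
    a = rng.integers(len(temps)) if rng.random() < eps else int(Q_home.argmax())
    # feedback: closer to true comfort is better; being ignored costs the human extra effort
    r_home = -abs(temps[a] - true_comfort)
    r_human = -abs(temps[a] - true_comfort) - 0.5 * abs(temps[h] - temps[a])
    Q_home[a] += alpha * (r_home - Q_home[a])
    Q_human[h] += alpha * (r_human - Q_human[h])
```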

Human+machine sequential decision making

Q. Zhang, Y. Kang, Y. -B. Zhao, P. Li and S. You, Traded Control of Human–Machine Systems for Sequential Decision-Making Based on Reinforcement Learning, IEEE Transactions on Artificial Intelligence, vol. 3, no. 4, pp. 553-566, Aug. 2022 DOI: 10.1109/TAI.2021.3127857.

Sequential decision-making (SDM) is a common type of decision-making problem with sequential and multistage characteristics. Among its aspects, the learning and updating of the policy are the main challenges in solving SDM problems. Unlike previous machine autonomy driven by artificial intelligence alone, we improve the control performance of SDM tasks by combining human intelligence and machine intelligence. Specifically, this article presents a paradigm of human–machine traded control systems based on reinforcement learning methods to optimize the solution process of sequential decision problems. By introducing the ideas of an autonomy boundary and credibility assessment, we enable humans and machines at the decision-making level of the system to collaborate more effectively. The arbitration in the human–machine traded control system introduces a Bayesian neural network and a dropout mechanism to account for uncertainty and security constraints. Finally, experiments involving machine traded control and human traded control were carried out. The preliminary experimental results show that our traded control method improves decision-making performance and verifies its effectiveness for SDM problems.
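A common way to realise that credibility assessment is Monte Carlo dropout as an approximate Bayesian estimate of the machine policy's uncertainty: control stays with the machine only while its predictive uncertainty is low. The sketch below is a hedged illustration of that idea; the network, threshold, and function names are assumptions, not the paper's exact design.

```python
# MC-dropout-based arbitration sketch (PyTorch), under the assumptions above.
import torch
import torch.nn as nn

class DropoutPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def arbitrate(obs, machine, human_action, n_samples=30, max_std=0.3):
    """Return the machine's action only if its MC-dropout uncertainty is low enough."""
    machine.train()  # keep dropout active so each forward pass is a posterior-like sample
    with torch.no_grad():
        logits = torch.stack([machine(obs) for _ in range(n_samples)])
    mean, std = logits.mean(dim=0), logits.std(dim=0)
    credible = std.max().item() < max_std        # credibility assessment
    return int(mean.argmax()) if credible else human_action
```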

Action selection strategy for model-free RL based on neurophysiology

D. Wang, S. Chen, Y. Hu, L. Liu and H. Wang, Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 1, pp. 219-233, March 2022 DOI: 10.1109/TCDS.2020.3035778.

Online model-free reinforcement learning (RL) approaches play a crucial role in coping with real-world applications, such as behavioral decision making in robotics. How to balance the exploration and exploitation processes is a central problem in RL: the exploration/exploitation ratio has a great influence on the total learning time and the quality of the learned strategy. Various action selection policies have therefore been presented to balance exploration and exploitation. However, these approaches are rarely regulated automatically and dynamically in response to environment variations. One of the most remarkable self-adaptation mechanisms in animals is their capacity to dynamically switch between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model which simulates the role of the medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavior decision making. The sensory input is transmitted to the MPFC; the ventral tegmental area (VTA) receives a reward and calculates a dopaminergic reinforcement signal; and the feedback categorization neurons in the anterior cingulate cortex (ACC) calculate the vigilance according to that dopaminergic signal. The vigilance is then transmitted to the LPFC to regulate the exploration rate, and finally the exploration rate is transmitted to the thalamus to calculate the corresponding action probability. This action selection mechanism is introduced into the actor–critic model of the basal ganglia and combined with a cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model that selects the agent's actions. Both the simulated comparison with four other traditional action selection policies and the physical experiment results demonstrate the potential of the proposed neuromodulatory model for action selection.
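Stripped of the anatomical mapping, the computational idea is that the magnitude of the reward prediction error (the dopaminergic signal) drives a vigilance variable, which in turn scales the exploration rate of a softmax action selector: surprising outcomes push the agent toward exploration, stable rewards toward exploitation. The actor-critic sketch below is an interpretation of that loop, not the authors' model; all constants are illustrative.

```python
# Simplified vigilance-modulated actor-critic, under the assumptions above.
import numpy as np

class VigilanceActorCritic:
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.95, seed=0):
        self.V = np.zeros(n_states)                 # critic (state values)
        self.H = np.zeros((n_states, n_actions))    # actor preferences
        self.vigilance = 0.5                        # ACC-like surprise estimate
        self.lr, self.gamma = lr, gamma
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        temp = 0.1 + self.vigilance                 # LPFC: vigilance sets the exploration rate
        p = np.exp(self.H[s] / temp)
        p /= p.sum()
        return self.rng.choice(len(p), p=p)         # thalamus: action sampled from probabilities

    def update(self, s, a, r, s_next):
        delta = r + self.gamma * self.V[s_next] - self.V[s]   # VTA: dopaminergic prediction error
        self.V[s] += self.lr * delta
        self.H[s, a] += self.lr * delta
        # ACC: vigilance tracks the magnitude of recent prediction errors
        self.vigilance = 0.9 * self.vigilance + 0.1 * abs(delta)
```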

Using a physical simulator for sampled rollouts in stochastic optimal control

Carius J, Ranftl R, Farshidian F, Hutter M. Constrained stochastic optimal control with learned importance sampling: A path integral approach, The International Journal of Robotics Research. 2022;41(2):189-209, DOI: 10.1177/02783649211047890.

Modern robotic systems are expected to operate robustly in partially unknown environments. This article proposes an algorithm capable of controlling a wide range of high-dimensional robotic systems in such challenging scenarios. Our method is based on the path integral formulation of stochastic optimal control, which we extend with constraint-handling capabilities. Under our control law, the optimal input is inferred from a set of stochastic rollouts of the system dynamics. These rollouts are simulated by a physics engine, placing minimal restrictions on the types of systems and environments that can be modeled. Although sampling-based algorithms are typically not suitable for online control, we demonstrate in this work how importance sampling and constraints can be used to effectively curb the sampling complexity and enable real-time control applications. Furthermore, the path integral framework provides a natural way of incorporating existing control architectures as ancillary controllers for shaping the sampling distribution. Our results reveal that even in cases where the ancillary controller would fail, our stochastic control algorithm provides an additional safety and robustness layer. Moreover, in the absence of an existing ancillary controller, our method can be used to train a parametrized importance sampling policy using data from the stochastic rollouts. The algorithm may thereby bootstrap itself by learning an importance sampling policy offline and then refining it to unseen environments during online control. We validate our results on three robotic systems, including hardware experiments on a quadrupedal robot.
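The control law inferred "from a set of stochastic rollouts" is, at its core, an importance-weighted average of perturbed input sequences, with the ancillary controller supplying the sampling mean and the physics engine supplying the rollouts. The sketch below is a minimal MPPI-style illustration of that path-integral update, not the authors' constrained algorithm; `simulate`, `ancillary`, and `cost` are assumed interfaces standing in for the physics engine, the existing controller, and the task objective.

```python
# Minimal path-integral (MPPI-style) rollout-and-reweight sketch,
# under the assumptions stated above.
import numpy as np

def path_integral_control(x0, ancillary, simulate, cost,
                          horizon=30, samples=256, sigma=0.5, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # ancillary controller shapes the sampling distribution (nominal input sequence)
    u_nom = np.array([ancillary(x0, t) for t in range(horizon)])   # shape (H, u_dim)
    noise = sigma * rng.standard_normal((samples, *u_nom.shape))
    costs = np.empty(samples)
    for k in range(samples):
        x, c = x0, 0.0
        for t in range(horizon):
            x = simulate(x, u_nom[t] + noise[k, t])   # physics-engine rollout step
            c += cost(x)
        costs[k] = c
    # exponentiated-cost weights (softmin over rollout costs)
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # importance-weighted correction of the nominal input sequence
    return u_nom + np.tensordot(w, noise, axes=1)
```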