Pure pursuit with linear velocity regulation

Macenski, S., Singh, S., Martín, F. et al. Regulated pure pursuit for robot path tracking, Auton Robot 47, 685–694 (2023) DOI: 10.1007/s10514-023-10097-6.

The accelerated deployment of service robots has spawned a number of algorithm variations to better handle real-world conditions. Many local trajectory planning techniques have been deployed on practical robot systems successfully. While most formulations of Dynamic Window Approach and Model Predictive Control can progress along paths and optimize for additional criteria, the use of pure path tracking algorithms is still commonplace. Decades later, Pure Pursuit and its variants continue to be one of the most commonly utilized classes of local trajectory planners. However, few Pure Pursuit variants have been proposed with schema for variable linear velocities; they either assume a constant velocity or fail to address the point at all. This paper presents a variant of Pure Pursuit designed with additional heuristics to regulate linear velocities, built atop the existing Adaptive variant. The Regulated Pure Pursuit algorithm makes incremental improvements on state of the art by adjusting linear velocities with particular focus on safety in constrained and partially observable spaces commonly negotiated by deployed robots. We present experiments with the Regulated Pure Pursuit algorithm on industrial-grade service robots. We also provide a high-quality reference implementation that is freely included in the ROS 2 Nav2 framework at https://github.com/ros-planning/navigation2 for fast evaluation.
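
To make the idea of regulating linear velocity concrete, here is a minimal Python sketch. It uses the classic pure pursuit curvature (from a lookahead point expressed in the robot frame) and then slows the robot down on tight turns. The scaling rule and the names `min_radius`, `min_speed`, `regulated_speed` and `command` are illustrative assumptions, not taken from the abstract or from the Nav2 implementation.

```python
def pure_pursuit_curvature(lookahead_x: float, lookahead_y: float) -> float:
    """Curvature of the arc joining the robot origin to a lookahead point.

    The point is given in the robot frame (x forward, y to the left).
    """
    dist_sq = lookahead_x ** 2 + lookahead_y ** 2
    if dist_sq == 0.0:
        return 0.0
    return 2.0 * lookahead_y / dist_sq


def regulated_speed(desired_speed: float, curvature: float,
                    min_radius: float, min_speed: float) -> float:
    """Scale the linear speed down when the turning radius is tight.

    Assumed rule (hypothetical): below min_radius the speed shrinks in
    proportion to radius / min_radius, but never below min_speed.
    """
    if curvature == 0.0:
        return desired_speed
    radius = 1.0 / abs(curvature)
    if radius >= min_radius:
        return desired_speed
    return max(min_speed, desired_speed * radius / min_radius)


def command(lookahead_x: float, lookahead_y: float,
            desired_speed: float = 0.5, min_radius: float = 0.9,
            min_speed: float = 0.1) -> tuple:
    """Return (linear velocity, angular velocity) for one control step."""
    kappa = pure_pursuit_curvature(lookahead_x, lookahead_y)
    v = regulated_speed(desired_speed, kappa, min_radius, min_speed)
    return v, v * kappa
```

With a lookahead point straight ahead the robot keeps its desired speed; with a point far off to the side the curvature is high and the linear speed drops, which is the qualitative behaviour the abstract attributes to the regulated variant.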

UWB for SLAM

H. A. G. C. Premachandra, R. Liu, C. Yuen and U.-X. Tan, UWB Radar SLAM: An Anchorless Approach in Vision Denied Indoor Environments, IEEE Robotics and Automation Letters, vol. 8, no. 9, pp. 5299-5306, Sept. 2023 DOI: 10.1109/LRA.2023.3293354.

LiDAR and cameras are frequently used as sensors for simultaneous localization and mapping (SLAM). However, these sensors are prone to failure under low visibility (e.g. smoke) or in places with reflective surfaces (e.g. mirrors). On the other hand, electromagnetic waves exhibit better penetration properties as the wavelength increases, and are thus not affected by low visibility. Hence, this letter presents ultra-wideband (UWB) radar as an alternative to the existing sensors. UWB is generally known to be used in anchor-tag SLAM systems. One or more anchors are installed in the environment and the tags are attached to the robots. Although this method performs well under low visibility, modifying the existing infrastructure is not always feasible. UWB has also been used in peer-to-peer ranging collaborative SLAM systems. However, this requires more than a single robot and does not cover mapping in low-visibility environments such as smoke. Therefore, the approach presented in this letter depends solely on the UWB transceivers mounted on-board. In addition, an extended Kalman filter (EKF) SLAM is used to solve the SLAM problem at the back-end. Experiments were conducted and demonstrated that the proposed UWB-based radar SLAM is able to map natural point landmarks inside an indoor environment while improving robot localization.

They had to do it: Certified RL (through online reward shaping/definition)

Hosein Hasanbeig, Daniel Kroening, Alessandro Abate, Certified reinforcement learning with logic guidance, Artificial Intelligence, Volume 322, 2023 DOI: 10.1016/j.artint.2023.103949.

Reinforcement Learning (RL) is a widely employed machine learning architecture that has been applied to a variety of control problems. However, applications in safety-critical domains require a systematic and formal approach to specifying requirements as tasks or goals. We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs). The given LTL property is translated into a Limit-Deterministic Generalised Büchi Automaton (LDGBA), which is then used to shape a synchronous reward function on-the-fly. Under certain assumptions, the algorithm is guaranteed to synthesise a control policy whose traces satisfy the LTL specification with maximal probability.
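
The idea of shaping a reward on-the-fly from an automaton can be illustrated with a much simpler stand-in. The Python sketch below uses a small hand-written deterministic automaton for the task "eventually see label a, then eventually see label b" and pays a reward when the automaton first enters an accepting state. This is an assumption-laden simplification: it is not an LDGBA, it has no generalised Büchi acceptance sets, and the class name `TaskAutomaton` and the reward value are hypothetical.

```python
from typing import Dict, Set, Tuple


class TaskAutomaton:
    """Deterministic automaton that runs in sync with the environment."""

    def __init__(self, transitions: Dict[Tuple[str, str], str],
                 initial: str, accepting: Set[str]) -> None:
        self.transitions = transitions
        self.initial = initial
        self.accepting = accepting
        self.state = initial

    def reset(self) -> None:
        """Return to the initial automaton state at the start of an episode."""
        self.state = self.initial

    def step(self, label: str) -> float:
        """Advance on the label of the new environment state.

        Returns 1.0 when an accepting state is entered for the first time,
        0.0 otherwise. Labels with no matching transition leave the
        automaton state unchanged.
        """
        next_state = self.transitions.get((self.state, label), self.state)
        entered = next_state in self.accepting and self.state not in self.accepting
        self.state = next_state
        return 1.0 if entered else 0.0


# "Eventually a, and after that eventually b" as a three-state automaton.
automaton = TaskAutomaton(
    transitions={("q0", "a"): "q1", ("q1", "b"): "q2"},
    initial="q0",
    accepting={"q2"},
)
rewards = [automaton.step(label) for label in ["b", "a", "b", "b"]]
```

An RL agent would learn over pairs of (environment state, automaton state), so that the reward depends on task progress rather than on the environment state alone.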

Meta-RL: given a distribution of tasks, learn a policy capable of adapting to any new task from the task distribution with as little data as possible

Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson, A Survey of Meta-Reinforcement Learning, arXiv:2301.08028 [cs.LG], 2023 DOI: 10.48550/arXiv.2301.08028.

While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning, it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible. In this survey, we describe the meta-RL problem setting in detail as well as its major variations. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, we then survey meta-RL algorithms and applications. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.

Using “empowerment” to better select actions in RL when there are only sparse rewards

Dai, S., Xu, W., Hofmann, A. et al. An empowerment-based solution to robotic manipulation tasks with sparse rewards, Auton Robot 47, 617–633 (2023) DOI: 10.1007/s10514-023-10087-8.

In order to provide adaptive and user-friendly solutions to robotic manipulation, it is important that the agent can learn to accomplish tasks even if it is only provided with very sparse instruction signals. To address the issues reinforcement learning algorithms face when task rewards are sparse, this paper proposes an intrinsic motivation approach that can be easily integrated into any standard reinforcement learning algorithm and can allow robotic manipulators to learn useful manipulation skills with only sparse extrinsic rewards. Through integrating and balancing empowerment and curiosity, this approach shows superior performance compared to other state-of-the-art intrinsic exploration approaches during extensive empirical testing. When combined with other strategies for tackling the exploration challenge, e.g. curriculum learning, our approach is able to further improve the exploration efficiency and task success rate. Qualitative analysis also shows that when combined with diversity-driven intrinsic motivations, this approach can help manipulators learn a set of diverse skills which could potentially be applied to other more complicated manipulation tasks and accelerate their learning process.

Using Deep RL (TRPO) for selecting best interest points in the environment for path planning

Jie Fan, Xudong Zhang, Yuan Zou, Hierarchical path planner for unknown space exploration using reinforcement learning-based intelligent frontier selection, Expert Systems with Applications, Volume 230, 2023 DOI: 10.1016/j.eswa.2023.120630.

Path planning in unknown environments is extremely useful for some specific tasks, such as exploration of outer space planets, search and rescue in disaster areas, home sweeping services, etc. However, existing frontier-based path planners suffer from insufficient exploration, while reinforcement learning (RL)-based ones are confronted with problems in efficient training and effective searching. To overcome the above problems, this paper proposes a novel hierarchical path planner for unknown space exploration using RL-based intelligent frontier selection. Firstly, by decomposing the path planner into a three-layered architecture (including the perception layer, planning layer, and control layer) and using edge detection to find potential frontiers to track, the path search space is shrunk from the whole map to a handful of points of interest, which significantly saves the computational resources in both training and execution processes. Secondly, one of the advanced RL algorithms, trust region policy optimization (TRPO), is used as a judge to select the best frontier for the robot to track, which ensures the optimality of the path planner with a shorter path length. The proposed method is validated through simulation and compared with both classic and state-of-the-art methods. Results show that the training process could be greatly accelerated compared with the traditional deep-Q network (DQN). Moreover, the proposed method has 4.2%–14.3% improvement in exploration region rate and achieves the highest exploration completeness.
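
As a rough illustration of how the search space shrinks from the whole map to a handful of candidate points, the Python sketch below extracts frontier cells from an occupancy grid. It assumes a simple convention (0 free, 1 occupied, -1 unknown) and uses a plain 4-neighbour check in place of the edge detection mentioned in the abstract. The function name `find_frontiers` is hypothetical, and the RL-based selection among the candidates is not modelled.

```python
from typing import List, Tuple

FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # assumed cell encoding


def find_frontiers(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return free cells that touch at least one unknown cell.

    Cells are reported as (row, col) in row-major order.
    """
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
candidates = find_frontiers(grid)
```

In this toy grid only two of the six free cells border unknown space, so a selection policy would only need to score those two candidates.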

Monte Carlo Tree Search (MCTS) with hybrid discrete-continuous beliefs, applied to robotics

M. Barenboim, M. Shienman and V. Indelman, Monte Carlo Planning in Hybrid Belief POMDPs, IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 4410-4417, Aug. 2023 DOI: 10.1109/LRA.2023.3282773.

Real-world problems often require reasoning about hybrid beliefs, over both discrete and continuous random variables. Yet, such a setting has hardly been investigated in the context of planning. Moreover, existing online partially observable Markov decision processes (POMDPs) solvers do not support hybrid beliefs directly. In particular, these solvers do not address the added computational burden due to an increasing number of hypotheses with the planning horizon, which can grow exponentially. As part of this work, we present a novel algorithm, Hybrid Belief Monte Carlo Planning (HB-MCP) that utilizes the Monte Carlo Tree Search (MCTS) algorithm to solve a POMDP while maintaining a hybrid belief. We illustrate how the upper confidence bound (UCB) exploration bonus can be leveraged to guide the growth of hypotheses trees alongside the belief trees. We then evaluate our approach in highly aliased simulated environments where unresolved data association leads to multi-modal belief hypotheses.
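
For readers unfamiliar with the UCB exploration bonus mentioned above, here is a minimal Python sketch of the standard UCB1 selection rule used in many MCTS implementations. How HB-MCP applies the bonus to hypothesis trees is not described in the abstract, so that part is not modelled. The names `ucb_score` and `select_action` and the constant `c` are illustrative assumptions.

```python
import math
from typing import Dict, Tuple


def ucb_score(mean_value: float, parent_visits: int,
              child_visits: int, c: float = 1.0) -> float:
    """Mean value plus an exploration bonus that shrinks with visits."""
    if child_visits == 0:
        return math.inf  # unvisited children are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_action(stats: Dict[str, Tuple[int, float]], c: float = 1.0) -> str:
    """Pick the action with the highest UCB score.

    stats maps each action to (visit count, mean value); the parent
    visit count is taken as the sum of the child visit counts.
    """
    parent_visits = sum(visits for visits, _ in stats.values())
    return max(
        stats,
        key=lambda a: ucb_score(stats[a][1], parent_visits, stats[a][0], c),
    )
```

A rarely visited action can win the selection even with a lower mean value, for example `select_action({"a": (100, 0.5), "b": (1, 0.4)})` prefers `"b"` because its bonus is much larger.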

A survey of guided RL for improving its application on robotics

J. Eßer, N. Bach, C. Jestel, O. Urbann and S. Kerner, Guided Reinforcement Learning: A Review and Evaluation for Efficient and Effective Real-World Robotics [Survey], IEEE Robotics & Automation Magazine, vol. 30, no. 2, pp. 67-85, June 2023 DOI: 10.1109/MRA.2022.3207664.

Recent successes aside, reinforcement learning (RL) still faces significant challenges in its application to the real-world robotics domain. Guiding the learning process with additional knowledge offers a potential solution, thus leveraging the strengths of data- and knowledge-driven approaches. However, this field of research encompasses several disciplines and hence would benefit from a structured overview.

In this article, we propose a concept of guided RL that provides a systematic approach toward accelerating the training process and improving performance for real-world robotics settings. We introduce a taxonomy that structures guided RL approaches and shows how different sources of knowledge can be integrated into the learning pipeline in a practical way. Based on this, we describe available approaches in this field and quantitatively evaluate their specific impact in terms of efficiency, effectiveness, and sim-to-real transfer within the robotics domain.

Comprehensive survey of the history and state of the art of active SLAM

J. A. Placed et al., A Survey on Active Simultaneous Localization and Mapping: State of the Art and New Frontiers, IEEE Transactions on Robotics, vol. 39, no. 3, pp. 1686-1705 DOI: 10.1109/TRO.2023.3248510.

Active simultaneous localization and mapping (SLAM) is the problem of planning and controlling the motion of a robot to build the most accurate and complete model of the surrounding environment. Since the first foundational work in active perception appeared, more than three decades ago, this field has received increasing attention across different scientific communities. This has brought about many different approaches and formulations, and makes a review of the current trends necessary and extremely valuable for both new and experienced researchers. In this article, we survey the state of the art in active SLAM and take an in-depth look at the open challenges that still require attention to meet the needs of modern applications. After providing a historical perspective, we present a unified problem formulation and review the well-established modular solution scheme, which decouples the problem into three stages that identify, select, and execute potential navigation actions. We then analyze alternative approaches, including belief-space planning and deep reinforcement learning techniques, and review related work on multirobot coordination. This article concludes with a discussion of new research directions, addressing reproducible research, active spatial perception, and practical applications, among other topics.

Leveraging the unexplainability and opacity of NNs to generate random numbers

Y. Almardeny, A. Benavoli, N. Boujnah and E. Naredo, A Reinforcement Learning System for Generating Instantaneous Quality Random Sequences, IEEE Transactions on Artificial Intelligence, vol. 4, no. 3, pp. 402-415, June 2023 DOI: 10.1109/TAI.2022.3161893.

Random numbers are essential to most computer applications. Still, producing high-quality random sequences is a big challenge. Inspired by the success of artificial neural networks and reinforcement learning, we propose a novel and effective end-to-end learning system to generate pseudorandom sequences that operates under the upside-down reinforcement learning framework. It is based on manipulating the generalized information entropy metric to derive commands that instantaneously guide the agent toward the optimal random behavior. Using a wide range of evaluation tests, the proposed approach is compared against three state-of-the-art accredited pseudorandom number generators (PRNGs). The experimental results agree with our theoretical study and show that the proposed framework is a promising candidate for a wide range of applications.