Category Archives: Robotics

Continuous POMDPs through belief state sparsification, applied to active SLAM

Elimelech K, Indelman V. Simplified decision making in the belief space using belief sparsification. The International Journal of Robotics Research. 2022;41(5):470-496 DOI: 10.1177/02783649221076381.

In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space, over a possibly high-dimensional state space. Typically, to solve a decision problem, one should identify the optimal action from a set of candidates, according to some objective. We claim that one can often generate and solve an analogous yet simplified decision problem, which can be solved more efficiently. A wise simplification method can lead to the same action selection, or one for which the maximal loss in optimality can be guaranteed. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, as the selected action would finally be applied on the original state. First, we present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a realistic active-SLAM problem and manage to significantly reduce computation time, with no loss in the quality of solution. This work is both fundamental and practical and holds numerous possible extensions.
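To make the idea of "solve a simplified problem, then check the loss" concrete, here is a toy Python sketch. It assumes a Gaussian belief stored as an information matrix `Lam`, candidate actions modeled as unit-noise linear measurements `A`, and an information-gain objective. `diagonal_sparsify` is a crude stand-in that drops all correlations. It is not the authors' sparsification algorithm and, unlike theirs, carries no consistency guarantee, so the sketch measures the loss in optimality on the original belief.

```python
import numpy as np

def info_gain(Lam, A):
    """log det(Lam + A^T A) - log det(Lam).

    For a Gaussian belief with information matrix Lam, this is twice the
    entropy reduction from adding the unit-noise linear measurement rows A
    (an assumed, simplified model of an action's effect).
    """
    sign_post, ld_post = np.linalg.slogdet(Lam + A.T @ A)
    sign_prior, ld_prior = np.linalg.slogdet(Lam)
    assert sign_post > 0 and sign_prior > 0, "information matrices must be PD"
    return ld_post - ld_prior

def diagonal_sparsify(Lam):
    """Crude stand-in for sparsification: drop every off-diagonal entry.

    The diagonal of a positive-definite matrix is positive, so the result is PD.
    """
    return np.diag(np.diag(Lam))

def select_action(Lam, actions):
    scores = [info_gain(Lam, A) for A in actions]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
n = 6
M = rng.normal(size=(n, n))
Lam = M @ M.T + n * np.eye(n)          # positive-definite prior information

# Hypothetical candidate actions: each one measures a single state variable.
actions = [np.eye(n)[[i]] for i in range(n)]

best_orig, scores_orig = select_action(Lam, actions)
best_simpl, _ = select_action(diagonal_sparsify(Lam), actions)

# Loss in optimality, always evaluated on the ORIGINAL belief.
loss = scores_orig[best_orig] - scores_orig[best_simpl]
print(f"original: {best_orig}, simplified: {best_simpl}, loss: {loss:.4f}")
```

A zero loss means the simplified problem picked an action that is optimal for the original one. The point of the paper's sparsification is to make that outcome guaranteed rather than something checked after the fact.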

Hybridizing model-free and model-based in continuous RL, and a nice review of current research and benchmarks in robotics

Pinosky A, Abraham I, Broad A, Argall B, Murphey TD. Hybrid control for combining model-based and model-free reinforcement learning. The International Journal of Robotics Research. 2023;42(6):337-355 DOI: 10.1177/02783649221083331.

We develop an approach to improve the learning capabilities of robotic systems by combining learned predictive models with experience-based state-action policy mappings. Predictive models provide an understanding of the task and the dynamics, while experience-based (model-free) policy mappings encode favorable actions that override planned actions. We refer to our approach of systematically combining model-based and model-free learning methods as hybrid learning. Our approach efficiently learns motor skills and improves the performance of predictive models and experience-based policies. Moreover, our approach enables policies (both model-based and model-free) to be updated using any off-policy reinforcement learning method. We derive a deterministic method of hybrid learning by optimally switching between learning modalities. We adapt our method to a stochastic variation that relaxes some of the key assumptions in the original derivation. Our deterministic and stochastic variations are tested on a variety of robot control benchmark tasks in simulation as well as a hardware manipulation task. We extend our approach for use with imitation learning methods, where experience is provided through demonstrations, and we test the expanded capability with a real-world pick-and-place task. The results show that our method is capable of improving the performance and sample efficiency of learning motor skills in a variety of experimental domains.
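The following is a minimal Python sketch of the switching idea. It is a greedy comparison rule under stated assumptions, not the authors' derivation, which obtains the switching condition from optimal control and also has a stochastic variant. `learned_model`, `stage_cost`, `predicted_cost`, and `hybrid_action` are hypothetical names. The learned model is a 1-D integrator, and the planner's and policy's actions are simply given as numbers.

```python
def learned_model(x, u):
    """Hypothetical learned dynamics: a 1-D integrator x' = x + 0.1 u."""
    return x + 0.1 * u

def stage_cost(x, u):
    return x ** 2 + 0.01 * u ** 2

def predicted_cost(x, u, horizon=5):
    """Cost of holding action u for `horizon` steps, predicted by the model."""
    total = 0.0
    for _ in range(horizon):
        total += stage_cost(x, u)
        x = learned_model(x, u)
    return total

def hybrid_action(x, planned_u, policy_u):
    """Greedy switch: use the model-free action only where the learned model
    predicts it beats the planner's action."""
    if predicted_cost(x, policy_u) < predicted_cost(x, planned_u):
        return policy_u, "model-free"
    return planned_u, "model-based"

print(hybrid_action(1.0, planned_u=-1.0, policy_u=-3.0))  # (-3.0, 'model-free')
print(hybrid_action(1.0, planned_u=-1.0, policy_u=-5.0))  # (-1.0, 'model-based')
```

The learned model acts as the arbiter here. The experience-based policy overrides the plan only where the model predicts it does better.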

How plans influence sensors

McFassel G, Shell DA. Reactivity and statefulness: Action-based sensors, plans, and necessary state. The International Journal of Robotics Research. 2023;42(6):385-411 DOI: 10.1177/02783649221078874.

Typically to a roboticist, a plan is the outcome of other work, a synthesized object that realizes ends defined by some problem; plans qua plans are seldom treated as first-class objects of study. Plans designate functionality: a plan can be viewed as defining a robot’s behavior throughout its execution. This informs and reveals many other aspects of the robot’s design, including: necessary sensors and action choices, history, state, task structure, and how to define progress. Interrogating sets of plans helps in comprehending the ways in which differing executions influence the interrelationships between these various aspects. Revisiting Erdmann’s theory of action-based sensors, a classical approach for characterizing fundamental information requirements, we show how plans (in their role of designating behavior) influence sensing requirements. Using an algorithm for enumerating plans, we examine how some plans for which no action-based sensor exists can be transformed into sets of sensors through the identification and handling of features that preclude the existence of action-based sensors. We are not aware of those obstructing features having been previously identified. Action-based sensors may be treated as standalone reactive plans; we relate them to the set of all possible plans through a lattice structure. This lattice reveals a boundary between plans with action-based sensors and those without. Some plans, specifically those that are not reactive plans and require some notion of internal state, can never have associated action-based sensors. Even so, action-based sensors can serve as a framework to explore and interpret how such plans make use of state.
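The distinction between reactive plans and plans that need internal state can be illustrated with a rough Python check. This is a simplified proxy, not Erdmann's construction or the paper's lattice. It assumes each execution of a plan is logged as a list of (observation, action) pairs, and `needs_internal_state` is a hypothetical helper. If the same observation is ever paired with different actions, no memoryless observation-to-action mapping reproduces the recorded behavior.

```python
from collections import defaultdict

def needs_internal_state(traces):
    """True if the plan's action is not a function of the current observation.

    traces: executions of the plan, each a list of (observation, action) pairs.
    If one observation is ever paired with two different actions, no memoryless
    (reactive) observation-to-action mapping can reproduce the plan.
    """
    actions_seen = defaultdict(set)
    for trace in traces:
        for observation, action in trace:
            actions_seen[observation].add(action)
    return any(len(actions) > 1 for actions in actions_seen.values())

# Corridor A-B-C. Plan 1: walk right and stop at C (reactive).
go_right = [[("A", "right"), ("B", "right"), ("C", "stop")]]
# Plan 2: touch C, then come back and stop at A (must remember the visit to C).
there_and_back = [[("A", "right"), ("B", "right"), ("C", "left"),
                   ("B", "left"), ("A", "stop")]]

print(needs_internal_state(go_right))        # False
print(needs_internal_state(there_and_back))  # True
```

The check only speaks for the recorded traces. A plan that passes it on a few executions could still need state on executions that were never logged.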

POMDPs in robotics: QMDP-Net as a counterpart for the Partially Observable Markov Decision Process (POMDP) whose transition, observation, and reward functions are initially unknown

Collins N, Kurniawati H. Locally connected interrelated network: A forward propagation primitive. The International Journal of Robotics Research. 2023;42(6):371-384 DOI: 10.1177/02783649221093092.

End-to-end learning for planning is a promising approach for finding good robot strategies in situations where the state transition, observation, and reward functions are initially unknown. Many neural network architectures for this approach have shown positive results. Across these networks, seemingly small components have been used repeatedly in different architectures, which means improving the efficiency of these components has great potential to improve the overall performance of the network. This paper aims to improve one such component: the forward propagation module. In particular, we propose the Locally Connected Interrelated Network (LCI-Net), a novel type of locally connected layer with unshared but interrelated weights, to improve the efficiency of learning stochastic transition models for planning and propagating information via the learned transition models. LCI-Net is a small differentiable neural network module that can be plugged into various existing architectures. For evaluation purposes, we apply LCI-Net to VIN and QMDP-Net. VIN is an end-to-end neural network for solving Markov Decision Processes (MDPs) whose transition and reward functions are initially unknown, while QMDP-Net is its counterpart for the Partially Observable Markov Decision Process (POMDP) whose transition, observation, and reward functions are initially unknown. Simulation tests on benchmark problems involving 2D and 3D navigation and grasping indicate promising results: replacing only the forward propagation module with LCI-Net improves the generalisation capability of VIN and QMDP-Net by more than 3× and 10×, respectively.
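A forward propagation module with shared weights differs from a locally connected one roughly as follows. This is a NumPy sketch under stated assumptions. `conv_transition` mimics a VIN-style shared 3x3 kernel, and `locally_connected_transition` gives every grid cell its own kernel. The "interrelated" coupling between the unshared weights, which is LCI-Net's actual contribution, is not modeled here. `value_iteration_step` is a hypothetical illustration of how such a module slots into a value-iteration-style planner.

```python
import numpy as np

def conv_transition(V, kernel):
    """VIN-style propagation: one shared 3x3 kernel slid over the value map."""
    H, W = V.shape
    P = np.pad(V, 1)                      # zero padding at the borders
    out = np.zeros_like(V, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(kernel * P[i:i + 3, j:j + 3])
    return out

def locally_connected_transition(V, weights):
    """Same propagation, but cell (i, j) uses its own kernel weights[i, j]."""
    H, W = V.shape
    P = np.pad(V, 1)
    out = np.zeros_like(V, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(weights[i, j] * P[i:i + 3, j:j + 3])
    return out

def value_iteration_step(V, R, per_action_weights, gamma=0.95):
    """Q_a = R + gamma * T_a(V) for each action a, then V' = max_a Q_a."""
    Q = np.stack([R + gamma * locally_connected_transition(V, w)
                  for w in per_action_weights])
    return Q.max(axis=0)

rng = np.random.default_rng(1)
H, W, n_actions = 5, 5, 4
V = rng.normal(size=(H, W))
R = rng.normal(size=(H, W))
kernel = rng.random((3, 3))

# With every per-cell kernel equal, the locally connected layer reduces to the convolution.
tied = np.broadcast_to(kernel, (H, W, 3, 3))
assert np.allclose(conv_transition(V, kernel), locally_connected_transition(V, tied))

per_action = rng.random((n_actions, H, W, 3, 3))   # unshared weights per action and cell
V_next = value_iteration_step(V, R, per_action)
```

The tied-weights check shows the trade-off. Unshared weights can represent position-dependent transitions that a single convolution kernel cannot, at the cost of many more parameters. LCI-Net's interrelated weights are meant to keep that cost manageable.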

Modifications of Q-learning for better learning of robot navigation

Ee Soong Low, Pauline Ong, Cheng Yee Low, Rosli Omar, Modified Q-learning with distance metric and virtual target on path planning of mobile robot, Expert Systems with Applications, Volume 199, 2022, DOI: 10.1016/j.eswa.2022.117191.

Path planning is an essential element in mobile robot navigation. One of the popular path planners is Q-learning, a type of reinforcement learning that learns with little or no prior knowledge of the environment. Despite the successful implementation of Q-learning reported in numerous studies, its slow convergence associated with the curse of dimensionality may limit the performance in practice. To solve this problem, an Improved Q-learning (IQL) with three modifications is introduced in this study. First, a distance metric is added to Q-learning to guide the agent towards the target. Second, the Q function of Q-learning is modified to overcome dead-ends more effectively. Lastly, the virtual target concept is introduced in Q-learning to bypass dead-ends. Experimental results across twenty types of navigation maps show that the proposed strategies accelerate the learning speed of IQL in comparison with standard Q-learning. Besides, a performance comparison with seven well-known path planners indicates its efficiency in terms of path smoothness, time taken, shortest distance and total distance used.
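A distance metric can nudge tabular Q-learning toward the target, as this minimal Python sketch shows. The form of the term is an assumption: the paper's exact distance metric, its modified Q function, and the virtual-target mechanism are not reproduced. `distance_bonus` is a hypothetical helper that rewards moves reducing the Euclidean distance to the target, and `q_update` is the standard Q-learning update.

```python
import numpy as np

def distance_bonus(pos, next_pos, target, weight=1.0):
    """Hypothetical distance-metric term: positive when a move brings the
    agent closer to the target (Euclidean distance), negative otherwise."""
    d_before = np.linalg.norm(np.subtract(target, pos))
    d_after = np.linalg.norm(np.subtract(target, next_pos))
    return weight * (d_before - d_after)

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage: states index a 1x4 corridor, target at column 3.
Q = np.zeros((4, 2))                                  # actions: 0 = left, 1 = right
r = 0.0 + distance_bonus((0, 0), (0, 1), (0, 3))      # env reward 0, plus bonus
Q = q_update(Q, s=0, a=1, r=r, s_next=1)
```

After this single step, moving right from state 0 already has a positive value (0.1) even though the environment reward was zero. The distance term shortcuts part of the slow, uninformed exploration that the abstract attributes to plain Q-learning.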

A nice summary of SLAM in robotics with Lidar and Cameras

Chghaf, M., Rodriguez, S. & Ouardi, A.E. Camera, LiDAR and Multi-modal SLAM Systems for Autonomous Ground Vehicles: a Survey. J Intell Robot Syst 105, 2 (2022) DOI: 10.1007/s10846-022-01582-8.

Simultaneous Localization and Mapping (SLAM) has been widely studied in recent years for autonomous vehicles. SLAM achieves its purpose by constructing a map of the unknown environment while keeping track of the vehicle's location within it. A major challenge, which is paramount during the design of SLAM systems, lies in the efficient use of onboard sensors to perceive the environment. The most widely applied algorithms are camera-based SLAM and LiDAR-based SLAM. Recent research focuses on the fusion of camera-based and LiDAR-based frameworks, which shows promising results. In this paper, we present a study of commonly used sensors and the fundamental theories behind SLAM algorithms. The study then presents the hardware architectures used to process these algorithms and the performance obtained when possible. Secondly, we highlight state-of-the-art methodologies in each modality and in the multi-modal framework. A brief comparison is then given, followed by future challenges. Additionally, we provide insights into possible fusion approaches that can increase the robustness and accuracy of modern SLAM algorithms, thereby enabling hardware-software co-design of embedded systems that takes into account the algorithmic complexity, the embedded architectures, and real-time constraints.
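Projecting LiDAR points into the camera image with a pinhole model is a common first step in camera-LiDAR fusion. A minimal NumPy sketch follows. The extrinsics `R`, `t` and the intrinsic matrix `K` are made-up calibration values, and the identity rotation assumes the LiDAR points are already expressed in camera axes. Real LiDAR frames usually use a different axis convention, so a real `R` would not be the identity.

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Project an (N, 3) array of LiDAR points to pixel coordinates.

    R (3x3) and t (3,) map LiDAR-frame points into the camera frame
    (x right, y down, z forward); K (3x3) is the camera intrinsic matrix.
    Returns (u, v) pixels and depths for the points in front of the camera.
    """
    pts_cam = points_lidar @ R.T + t        # LiDAR frame -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]    # keep positive depth only
    uvw = pts_cam @ K.T                     # apply intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]           # perspective division
    return uv, pts_cam[:, 2]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)          # assumed: LiDAR axes already aligned with camera axes
t = np.zeros(3)
points = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0], [0.0, 0.0, -5.0]])
uv, depth = project_lidar_to_image(points, R, t, K)
```

Here the point on the optical axis lands on the principal point (320, 240), the point 1 m to the side lands at u = 370, and the point behind the camera is discarded. Attaching these depths to image pixels is what lets a fused system use LiDAR range with camera appearance.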

A nice summary of RL applied to robot navigation

N. Khlif, N. Khraief and S. Belghith, Reinforcement Learning for Mobile Robot Navigation: An Overview, IEEE Information Technologies & Smart Industrial Systems (ITSIS), Paris, France, 2022, pp. 1-7 DOI: 10.1109/ITSIS56166.2022.10118362.

For several years, research has shown that interest in autonomous mobile robots keeps growing. Autonomous mobile robots have long been a subject of discussion, but they are now an emerging topic due to progress in related fields such as autonomous driving and UAVs (drones). Integrating intelligence into robotic systems requires solving various research problems, including one of the most important problems of mobile robotic systems: navigation. Solving the mobile robot navigation problem means answering three questions: Where is the robot? Where is it going? How can it get there? These questions are answered by the basic parts of navigation: localization, mapping and path planning. The paper presents an overview of research on autonomous mobile robot navigation. First, it gives a quick introduction to the various features of navigation. We also discuss machine learning and reinforcement learning in mobile robotics, as well as some path planning techniques. Some future directions are also suggested.

Mixing rule-based and reinforcement learning navigation for robots

Y. Zhu, Z. Wang, C. Chen and D. Dong, Rule-Based Reinforcement Learning for Efficient Robot Navigation With Space Reduction, IEEE/ASME Transactions on Mechatronics, vol. 27, no. 2, pp. 846-857, April 2022 DOI: 10.1109/TMECH.2021.3072675.

For real-world deployments, it is critical to allow robots to navigate in complex environments autonomously. Traditional methods usually maintain an internal map of the environment, and then design several simple rules, in conjunction with a localization and planning approach, to navigate through the internal map. These approaches often involve a variety of assumptions and prior knowledge. In contrast, recent reinforcement learning (RL) methods can provide a model-free, self-learning mechanism as the robot interacts with an initially unknown environment, but are expensive to deploy in real-world scenarios due to inefficient exploration. In this article, we focus on efficient navigation with the RL technique and combine the advantages of these two kinds of methods into a rule-based RL (RuRL) algorithm for reducing the sample complexity and cost of time. First, we use the rule of wall-following to generate a closed-loop trajectory. Second, we employ a reduction rule to shrink the trajectory, which in turn effectively reduces the redundant exploration space. Besides, we give the detailed theoretical guarantee that the optimal navigation path is still in the reduced space. Third, in the reduced space, we utilize the Pledge rule to guide the exploration strategy for accelerating the RL process at the early stage. Experiments conducted on real robot navigation problems in hex-grid environments demonstrate that RuRL can achieve improved navigation performance.
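The Pledge rule used by RuRL to guide early exploration can be sketched on its own in Python. This sketch makes several assumptions. It uses a 4-connected square grid rather than the paper's hex grid. It runs the Pledge rule as a standalone maze-escape procedure, not RuRL's coupling of it with RL. The grid and `pledge` function are hypothetical. The walker keeps a turn counter and leaves a wall only when the counter is back to zero, which is what keeps plain wall-following from circling an obstacle forever.

```python
# Headings in clockwise order: 0 = north, 1 = east, 2 = south, 3 = west.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

def pledge(grid, start, preferred=0, max_steps=1000):
    """Pledge rule on a 4-connected grid where '#' marks an obstacle.

    Walk in the preferred heading; when blocked, turn left and follow the
    obstacle with it on the right-hand side, counting net turns (left = +1,
    right = -1). Leave the obstacle once the count returns to zero. Cells
    outside the grid count as free; the walk ends when it leaves the grid.
    """
    rows, cols = len(grid), len(grid[0])

    def inside(cell):
        return 0 <= cell[0] < rows and 0 <= cell[1] < cols

    def free(cell):
        return not inside(cell) or grid[cell[0]][cell[1]] != "#"

    def ahead(cell, heading):
        dr, dc = MOVES[heading]
        return (cell[0] + dr, cell[1] + dc)

    pos, heading, turns = start, preferred, 0
    path = [pos]
    for _ in range(max_steps):
        if not inside(pos):
            break                                   # escaped the grid
        right = (heading + 1) % 4
        if turns != 0 and free(ahead(pos, right)):
            heading, turns = right, turns - 1       # wall ends on the right: turn right
            pos = ahead(pos, heading)
        elif free(ahead(pos, heading)):
            pos = ahead(pos, heading)               # straight (free walk or along wall)
        else:
            heading, turns = (heading - 1) % 4, turns + 1  # blocked: turn left, no move
            continue
        path.append(pos)
    return path

grid = ["....",
        ".##.",
        "...."]
print(pledge(grid, start=(2, 1)))  # [(2, 1), (2, 0), (1, 0), (0, 0), (-1, 0)]
```

Because the counter equals the net rotation away from the preferred heading, a zero count always means the walker faces the preferred direction again.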

Incremental learning (e.g., non-stationary environments, online (live) learning, task adaptation, life-long learning) for robots with Q-learning

Y. Hu, D. Li, Y. He and J. Han, Incremental Learning Framework for Autonomous Robots Based on Q-Learning and the Adaptive Kernel Linear Model, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 1, pp. 64-74, March 2022 DOI: 10.1109/TCDS.2019.2962228.

The performance of autonomous robots in varying environments needs to be improved. For such incremental improvement, here we propose an incremental learning framework based on Q-learning and the adaptive kernel linear (AKL) model. The AKL model is used for storing behavioral policies that are learned by Q-learning. Both the structure and parameters of the AKL model can be trained using a novel L2-norm kernel recursive least squares (L2-KRLS) algorithm. The AKL model initially has no nodes and gradually accumulates content. The proposed framework can learn new behaviors without forgetting the previous ones. A novel local ε-greedy policy is proposed to speed up the convergence of Q-learning. It calculates the exploration probability of each state for generating and selecting more important training samples. The performance of our incremental learning framework was validated in two experiments. A curve-fitting example shows that the L2-KRLS-based AKL model is suitable for incremental learning. The second experiment is based on robot learning tasks. The results show that our framework can incrementally learn behaviors in varying environments. Q-learning based on the local ε-greedy policy is faster than the existing Q-learning algorithms.
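A state-dependent ("local") ε-greedy policy can be sketched in a few lines of Python. The schedule itself is an assumption: here ε decays exponentially with how often a state has been visited, whereas the paper computes each state's exploration probability with its own formula. `local_epsilon` and `local_eps_greedy` are hypothetical names.

```python
import numpy as np

def local_epsilon(visits, eps_max=1.0, eps_min=0.05, decay=0.1):
    """Hypothetical per-state exploration probability that shrinks with visits."""
    return eps_min + (eps_max - eps_min) * np.exp(-decay * visits)

def local_eps_greedy(Q, s, visits, rng):
    """Epsilon-greedy action choice with a state-dependent epsilon."""
    if rng.random() < local_epsilon(visits[s]):
        return int(rng.integers(Q.shape[1]))   # explore: uniform random action
    return int(np.argmax(Q[s]))                # exploit: greedy action

rng = np.random.default_rng(0)
Q = np.zeros((10, 4))
visits = np.zeros(10, dtype=int)
visits[7] = 200                                # state 7 is already well explored
a_new = local_eps_greedy(Q, 3, visits, rng)    # state 3: epsilon close to 1
a_old = local_eps_greedy(Q, 7, visits, rng)    # state 7: epsilon close to eps_min
```

Unlike a single global ε, this keeps exploring where the agent knows little and exploits where it already has experience. That is the intuition behind spending training samples on the more important states.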

Adaptation of model-free RL to variations in the task under continuous state and action spaces applied to robot grasping

Shahid, A.A., Piga, D., Braghin, F. et al. Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning, Auton Robot 46, 483–498 (2022) DOI: 10.1007/s10514-022-10034-z.

This paper presents a learning-based method that uses simulation data to learn an object manipulation task using two model-free reinforcement learning (RL) algorithms. The learning performance is compared across on-policy and off-policy algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). In order to accelerate the learning process, a fine-tuning procedure is proposed that demonstrates the continuous adaptation of on-policy RL to new environments, allowing the learned policy to adapt and execute the (partially) modified task. A dense reward function is designed for the task to enable efficient learning by the agent. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The learned control policy is demonstrated to be generalizable across multiple object geometries and initial robot/parts configurations. The approach is finally tested on a real Franka Emika Panda robot, showing the possibility to transfer the learned behavior from simulation. Experimental results show a 100% success rate on the grasping tasks, making the proposed approach applicable to real applications.
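A dense reward differs from a sparse one in that it gives the agent a learning signal on every step. This hypothetical Python sketch is for a reach-grasp-lift task. The paper's actual reward terms and weights are not given in the abstract, so every term, weight, and name below (`dense_grasp_reward`, `lift_target`, `w_reach`, `grasp_bonus`, `w_lift`) is an assumption. A sparse alternative would pay out only when the object is successfully lifted.

```python
import numpy as np

def dense_grasp_reward(gripper_pos, object_pos, object_height, grasped,
                       lift_target=0.1, w_reach=1.0, grasp_bonus=1.0, w_lift=5.0):
    """Hypothetical dense reward for a reach-grasp-lift task.

    - reach: negative gripper-object distance, so every step closer is rewarded
    - grasp: constant bonus while the object is held
    - lift: proportional to object height while held, capped at lift_target
    """
    reach = -w_reach * np.linalg.norm(np.subtract(gripper_pos, object_pos))
    if not grasped:
        return reach
    return reach + grasp_bonus + w_lift * min(object_height, lift_target)

far = dense_grasp_reward([0.0, 0.0, 0.5], [0.0, 0.0, 0.0], 0.0, grasped=False)
near = dense_grasp_reward([0.0, 0.0, 0.1], [0.0, 0.0, 0.0], 0.0, grasped=False)
held = dense_grasp_reward([0.0, 0.0, 0.05], [0.0, 0.0, 0.05], 0.05, grasped=True)
```

With these made-up weights, `far` is -0.5, `near` is -0.1, and `held` is 1.25. A policy-gradient learner such as PPO or SAC therefore sees its reward improve as soon as the gripper starts moving toward the object, long before the first successful grasp.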