Monthly Archives: June 2023

Learning the parameters of a robot navigator through Q-learning

Chang, L., Shan, L., Jiang, C. et al., Reinforcement based mobile robot path planning with improved dynamic window approach in unknown environment, Auton Robot 45, 51–76 (2021) DOI: 10.1007/s10514-020-09947-4.

Mobile robot path planning in an unknown environment is a fundamental and challenging problem in the field of robotics. The dynamic window approach (DWA) is an effective method of local path planning; however, some of its evaluation functions are inadequate and it lacks an algorithm for choosing the weights of these functions, which makes it highly dependent on the global reference and prone to fail in an unknown environment. In this paper, an improved DWA based on Q-learning is proposed. First, the original evaluation functions are modified and extended by adding two new evaluation functions to enhance the performance of global navigation. Then, considering the balance of effectiveness and speed, we define the state space, action space and reward function of the adopted Q-learning algorithm for the robot motion planning. After that, the parameters of the proposed DWA are adaptively learned by Q-learning and a trained agent is obtained to adapt to the unknown environment. Finally, in a series of comparative simulations, the proposed method shows higher navigation efficiency and success rate in complex unknown environments. The proposed method is also validated in experiments on the XQ-4 Pro robot to verify its navigation capability in both static and dynamic environments.
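As a rough illustration of the learning step the abstract refers to, here is a minimal sketch of one tabular Q-learning update. The state names, the action names (adjusting a DWA weight) and the values of alpha and gamma are hypothetical placeholders, not the state space, action space or reward function defined in the paper.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical actions: each one nudges a DWA evaluation-function weight.
ACTIONS = ("raise_heading_weight", "lower_heading_weight", "keep_weights")

q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_learning_update(
    q_table, state="near_obstacle", action="keep_weights",
    reward=-1.0, next_state="clear_path", actions=ACTIONS,
)
```

With an empty table, a reward of -1.0 and alpha = 0.5, the updated entry moves halfway from 0.0 towards the target of -1.0.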

Classification with decision trees based on POMDPs

Shlomi Maliah, Guy Shani, Using POMDPs for learning cost sensitive decision trees, Artificial Intelligence, Volume 292, 2021 DOI: 10.1016/j.artint.2020.103400.

In classification, an algorithm learns to classify a given instance based on a set of observed attribute values. In many real world cases testing the value of an attribute incurs a cost. Furthermore, there can also be a cost associated with the misclassification of an instance. Cost sensitive classification attempts to minimize the expected cost of classification, by deciding after each observed attribute value, which attribute to measure next. In this paper we suggest Partially Observable Markov Decision Processes (POMDPs) as a modeling tool for cost sensitive classification. POMDPs are typically solved through a policy over belief states. We show how a relatively small set of potentially important belief states can be identified, and define an MDP over these belief states. To identify these potentially important belief states, we construct standard decision trees over all attribute subsets, and the leaves of these trees become the state space of our tree-based MDP. At each phase we decide on the next attribute to measure, balancing the cost of the measurement and the classification accuracy. We compare our approach to a set of previous approaches, showing our approach to work better for a range of misclassification costs.
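The trade-off between measurement cost and misclassification cost can be sketched with a one-step lookahead. This is a myopic simplification for illustration, not the tree-based MDP of the paper; the function names, class labels, costs and posterior beliefs below are all made up.

```python
def expected_misclassification_cost(belief, cost):
    """Return (best_label, expected_cost) when classifying right now.

    belief: dict mapping true class -> probability
    cost:   cost[true_class][predicted_class]
    """
    labels = list(belief)

    def cost_of(pred):
        return sum(belief[true] * cost[true][pred] for true in labels)

    best = min(labels, key=cost_of)
    return best, cost_of(best)

def should_measure(belief, cost, test_cost, outcomes):
    """Decide whether one more attribute test is worth its price.

    outcomes: list of (probability, posterior_belief) pairs, one per test result.
    """
    _, cost_now = expected_misclassification_cost(belief, cost)
    cost_after = test_cost + sum(
        p * expected_misclassification_cost(posterior, cost)[1]
        for p, posterior in outcomes
    )
    return cost_after < cost_now

# Hypothetical two-class problem: missing a sick patient is the expensive error.
cost = {"sick": {"sick": 0.0, "healthy": 10.0},
        "healthy": {"sick": 2.0, "healthy": 0.0}}
prior = {"sick": 0.5, "healthy": 0.5}
outcomes = [(0.5, {"sick": 0.9, "healthy": 0.1}),
            (0.5, {"sick": 0.1, "healthy": 0.9})]

label_now, cost_now = expected_misclassification_cost(prior, cost)
cheap_test_pays_off = should_measure(prior, cost, test_cost=0.3, outcomes=outcomes)
pricey_test_pays_off = should_measure(prior, cost, test_cost=0.5, outcomes=outcomes)
```

In this toy setting the test reduces the expected misclassification cost from 1.0 to 0.6, so it is worth taking at a price of 0.3 but not at 0.5.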

Localizing robots within pipes through RF signals

Carlos Rizzo, Teresa Seco, Jesús Espelosín, Francisco Lera, José Luis Villarroel, An alternative approach for robot localization inside pipes using RF spatial fadings, Robotics and Autonomous Systems, Volume 136, 2021 DOI: 10.1016/j.robot.2020.103702.

Accurate robot localization represents a challenge inside pipes due to the particular conditions that characterize this type of environment. Outdoor techniques (GPS in particular) do not work at all inside metal pipes, while traditional indoor localization methods based on camera or laser sensors do not perform well mainly due to a lack of external illumination and distinctive features along pipes. Moreover, humidity and slippery surfaces make wheel odometry unreliable. In this paper, we estimate the localization of a robot along a pipe with an alternative Radio Frequency (RF) approach. We first analyze wireless propagation in metallic pipes and propose a series of setups that allow us to obtain periodic RF spatial fadings (a sort of standing wave periodic pattern), together with the influence of the antenna position and orientation over these fadings. Subsequently, we propose a discrete RF odometry-like method, by means of counting the fadings while traversing them. The transversal fading analysis (number of antennas and cross-section position) makes it possible to increase the resolution of this method. Lastly, the model of the signal is used in a continuous approach serving as an RF map. The proposed localization methods outperform our previous contributions in terms of resolution, accuracy, reliability and robustness. Experimental results demonstrate the effectiveness of the RF-based strategy without the need for a previously known map of the scenario or any substantial modification of the existing infrastructure.
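The discrete, odometry-like idea of counting fadings can be sketched as follows. The signal here is a noiseless synthetic cosine, and the fading period, threshold and sampling step are assumed values chosen for illustration; a real RF signal would need filtering and hysteresis, which this sketch omits.

```python
import math

def count_fadings(rssi, threshold):
    """Count how many times the signal drops below the threshold.

    Each downward crossing is taken as the start of one fading.
    """
    count = 0
    below = rssi[0] < threshold
    for value in rssi[1:]:
        now_below = value < threshold
        if now_below and not below:
            count += 1
        below = now_below
    return count

PERIOD_M = 2.0  # assumed spatial period of the fadings, in metres
STEP_M = 0.05   # assumed spacing between consecutive samples, in metres

# Synthetic received signal strength along 6.5 m of pipe.
positions = [i * STEP_M for i in range(131)]
rssi = [-60.0 + 10.0 * math.cos(2.0 * math.pi * x / PERIOD_M) for x in positions]

fadings = count_fadings(rssi, threshold=-65.0)
estimated_distance = fadings * PERIOD_M
```

Three fadings are traversed over the 6.5 m run, giving an estimate of 6.0 m. The resolution of this discrete estimate is limited to one fading period, which is the limitation the abstract's transversal fading analysis aims to reduce.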

Learning robot simulators

Grant W. Woodford, Mathys C. du Plessis, Bootstrapped Neuro-Simulation for complex robots, Robotics and Autonomous Systems, Volume 136, 2021 DOI: 10.1016/j.robot.2020.103708.

Robotic simulators are often used to speed up the Evolutionary Robotics (ER) process. Most simulation approaches are based on physics modelling. However, physics-based simulators can become complex to develop and require prior knowledge of the robotic system. Robotics simulators can be constructed using Machine Learning techniques, such as Artificial Neural Networks (ANNs). ANN-based simulator development usually requires a lengthy behavioural data collection period before the simulator can be trained and used to evaluate controllers during the ER process. The Bootstrapped Neuro-Simulation (BNS) approach can be used to simultaneously collect behavioural data, train an ANN-based simulator and evolve controllers for a particular robotic problem. This paper investigates proposed improvements to the BNS approach and demonstrates the viability of the approach by optimising gait controllers for a Hexapod and Snake robot platform.

Mixing Monte-Carlo Tree Search with Q-learning for robot learning

Francesco Riccio, Roberto Capobianco, Daniele Nardi, LoOP: Iterative learning for optimistic planning on robots, Robotics and Autonomous Systems, Volume 136, 2021 DOI: 10.1016/j.robot.2020.103693.

Efficient robotic behaviors require robustness and adaptation to dynamic changes of the environment, whose characteristics rapidly vary during robot operation. To generate effective robot action policies, planning and learning techniques have shown the most promising results. However, if considered individually, they present different limitations. Planning techniques lack generalization among similar states and require experts to define behavioral routines at different levels of abstraction. Conversely, learning methods usually require a considerable number of training samples and iterations of the algorithm. To overcome these issues, and to efficiently generate robot behaviors, we introduce LoOP, an iterative learning algorithm for optimistic planning that combines state-of-the-art planning and learning techniques to generate action policies. The main contribution of LoOP is the combination of Monte-Carlo Search Planning and Q-learning, which enables focused exploration during policy refinement in different robotic applications. We demonstrate the robustness and flexibility of LoOP in various domains and multiple robotic platforms, by validating the proposed approach with an extensive experimental evaluation.

Deep learning RL methods for robot navigation

Luong, M., Pham, C., Incremental Learning for Autonomous Navigation of Mobile Robots based on Deep Reinforcement Learning, J Intell Robot Syst 101, 1 (2021) DOI: 10.1007/s10846-020-01262-5.

This paper presents an incremental learning method and system for autonomous robot navigation. A laser range finder and online deep reinforcement learning are used to generate the navigation policy, which is effective both for avoiding obstacles along the robot’s trajectories and for reaching the destination. Empirical experiments are conducted in simulation and real-world settings. In simulation, the results show that the proposed method can generate a highly effective navigation policy (more than 90% accuracy) after only 150k training iterations. Moreover, our system slightly outperforms deep-Q and considerably surpasses Proximal Policy Optimization, two recent state-of-the-art robot navigation systems. Finally, two experiments are performed to demonstrate the feasibility and effectiveness of the proposed navigation system in real time under real-world settings.

Qualitative modelling of quadcopters that is claimed to be better than reinforcement learning

Šoberl, D., Bratko, I. & Žabkar, J., Learning to Control a Quadcopter Qualitatively, J Intell Robot Syst 100, 1097–1110 (2020) DOI: 10.1007/s10846-020-01228-7.

Qualitative modeling allows autonomous agents to learn comprehensible control models, formulated in a way that is close to human intuition. By abstracting away certain numerical information, qualitative models can provide better insights into operating principles of a dynamic system in comparison to traditional numerical models. We show that qualitative models, learned from numerical traces, contain enough information to allow motion planning and path following. We demonstrate our methods on the task of flying a quadcopter. A qualitative control model is learned through motor babbling. Training is significantly faster than training times reported in papers using reinforcement learning with similar quadcopter experiments. A qualitative collision-free trajectory is computed by means of qualitative simulation, and executed reactively while dynamically adapting to numerical characteristics of the system. Experiments have been conducted and assessed in the V-REP robotic simulator.

Using abstraction of dimensions in RRT motion planning

Xanthidis, M., Esposito, J.M., Rekleitis, I. et al., Motion Planning by Sampling in Subspaces of Progressively Increasing Dimension, J Intell Robot Syst 100, 777–789 (2020) DOI: 10.1007/s10846-020-01217-w.

This paper introduces an enhancement to traditional sampling-based planners, resulting in efficiency increases for high-dimensional holonomic systems such as hyper-redundant manipulators, snake-like robots, and humanoids. Despite the performance advantages of modern sampling-based motion planners, solving high dimensional planning problems in near real-time remains a considerable challenge. The proposed enhancement to popular sampling-based planning algorithms is aimed at circumventing the exponential dependence on dimensionality, by progressively exploring lower dimensional volumes of the configuration space. Extensive experiments comparing the enhanced and traditional version of RRT, RRT-Connect, and Bidirectional T-RRT on both a planar hyper-redundant manipulator and the Baxter humanoid robot show significant acceleration, up to two orders of magnitude, on computing a solution. We also explore important implementation issues in the sampling process and discuss the limitations of this method.
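The idea of exploring lower-dimensional volumes first can be sketched with a sampler that frees one extra coordinate at a time and pins the rest to the start configuration. This is an assumption made for illustration; the paper's actual subspace construction and its integration with RRT may differ, and the function names here are hypothetical.

```python
import random

def sample_in_subspace(start, free_dims, low, high, rng):
    """Sample a configuration that varies only in the first `free_dims` coordinates.

    The remaining coordinates stay pinned to the start configuration.
    """
    sample = list(start)
    for i in range(free_dims):
        sample[i] = rng.uniform(low[i], high[i])
    return sample

def progressive_samples(start, low, high, per_dim, rng):
    """Yield (dimension, sample) pairs from subspaces of dimension 1, 2, ..., n."""
    for free_dims in range(1, len(start) + 1):
        for _ in range(per_dim):
            yield free_dims, sample_in_subspace(start, free_dims, low, high, rng)

rng = random.Random(0)  # fixed seed so the run is repeatable
start = [0.0, 0.0, 0.0, 0.0]
low, high = [-1.0] * 4, [1.0] * 4
samples = list(progressive_samples(start, low, high, per_dim=3, rng=rng))
```

A planner built on such a sampler would only move to the next, larger subspace when the current one fails to yield a solution, so easy problems never pay the cost of the full-dimensional search.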

A new clustering algorithm based on swarm intelligence that is alleged to require no parameterization

Michael C. Thrun, Alfred Ultsch, Swarm intelligence for self-organized clustering, Artificial Intelligence, Volume 290, 2021, DOI: 10.1016/j.artint.2020.103237.

Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called Databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data characterized by distance and/or density-based structures in the data space. By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering. The swarm omits the usage of a global objective function and is parameter-free because it searches for the Nash equilibrium during its annealing process. To our knowledge, DBS is the first swarm combining these approaches. Its clustering can outperform common clustering methods such as K-means, PAM, single linkage, spectral clustering, model-based clustering, and Ward, if no prior knowledge about the data is available. A central problem in clustering is the correct estimation of the number of clusters. This is addressed by a DBS visualization called topographic map, which allows assessing the number of clusters. It is known that all clustering algorithms construct clusters, irrespective of whether the data set contains clusters or not. In contrast to most other clustering algorithms, the topographic map indicates that clustering of the data is meaningless if the data contains no (natural) clusters. The performance of DBS is demonstrated on a set of benchmark data, which are constructed to pose difficult clustering problems, and in two real-world applications.

Linear regression when not only Y is perturbed by noise, but also the very model is assumed to have noise

Sophie M. Fosson, Vito Cerone, Diego Regruto, Sparse linear regression from perturbed data, Automatica, Volume 122, 2020, DOI: 10.1016/j.automatica.2020.109284.

The problem of sparse linear regression is relevant in the context of linear system identification from large datasets. When data are collected from real-world experiments, measurements are always affected by perturbations or low-precision representations. However, the problem of sparse linear regression from fully-perturbed data is scarcely studied in the literature, due to its mathematical complexity. In this paper, we show that, by assuming bounded perturbations, this problem can be tackled by solving low-complexity ℓ2 and ℓ1 minimization problems. Both theoretical guarantees and numerical results are illustrated.
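To give a feel for why ℓ1 minimization yields sparse solutions, here is a minimal sketch of soft-thresholding, the proximal operator of the ℓ1 norm. It solves min over x of 0.5·||y − x||² + λ·||x||₁ in closed form, which corresponds to the special case of an identity regression matrix. This is a standard textbook building block, not the perturbed-data formulation of the paper, and the numbers are made up.

```python
def soft_threshold(values, lam):
    """Shrink each entry towards zero by lam; entries within [-lam, lam] become 0.0."""
    result = []
    for v in values:
        if v > lam:
            result.append(v - lam)
        elif v < -lam:
            result.append(v + lam)
        else:
            result.append(0.0)
    return result

# Hypothetical noisy observations of a sparse vector.
y = [3.0, -0.5, 1.2, 0.1, -2.5]
x_hat = soft_threshold(y, lam=1.0)
nonzeros = sum(1 for v in x_hat if v != 0.0)
```

Small entries, which are more likely to be noise, are set exactly to zero, while large entries survive with a bias of λ towards zero.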