Author Archives: Juan-Antonio Fernández-Madrigal

Survey of machine learning applied to robot navigation, including a brief survey of classic navigation

Xiao, X., Liu, B., Warnell, G. et al. Motion planning and control for mobile robot navigation using machine learning: a survey, Auton Robot 46, 569–597 (2022) DOI: 10.1007/s10514-022-10039-8.

Moving in complex environments is an essential capability of intelligent mobile robots. Decades of research and engineering have been dedicated to developing sophisticated navigation systems to move mobile robots from one point to another. Despite their overall success, a recently emerging research thrust is devoted to developing machine learning techniques to address the same problem, based in large part on the success of deep learning. However, to date, there has not been much direct comparison between the classical and emerging paradigms to this problem. In this article, we survey recent works that apply machine learning for motion planning and control in mobile robot navigation, within the context of classical navigation systems. The surveyed works are classified into different categories, which delineate the relationship of the learning approaches to classical methods. Based on this classification, we identify common challenges and promising future directions.

POMDP Planner that uses multiple levels of approximation to the system dynamics to reduce the number and complexity of forward simulations

Hoerger M, Kurniawati H, Elfes A., Multilevel Monte Carlo for solving POMDPs on-line, The International Journal of Robotics Research. 2023;42(4-5):196-213 DOI: 10.1177/02783649221093658.

Planning under partial observability is essential for autonomous robots. A principled way to address such planning problems is the Partially Observable Markov Decision Process (POMDP). Although solving POMDPs is computationally intractable, substantial advancements have been achieved in developing approximate POMDP solvers in the past two decades. However, computing robust solutions for systems with complex dynamics remains challenging. Most on-line solvers rely on a large number of forward simulations and standard Monte Carlo methods to compute the expected outcomes of actions the robot can perform. For systems with complex dynamics, for example, those with non-linear dynamics that admit no closed-form solution, even a single forward simulation can be prohibitively expensive. Of course, this issue is exacerbated for problems with long planning horizons. This paper aims to alleviate the above difficulty. To this end, we propose a new on-line POMDP solver, called Multilevel POMDP Planner (MLPP), that combines the commonly known Monte-Carlo-Tree-Search with the concept of Multilevel Monte Carlo to speed up our capability in generating approximately optimal solutions for POMDPs with complex dynamics. Experiments on four different problems involving torque control, navigation and grasping indicate that MLPP substantially outperforms state-of-the-art POMDP solvers.
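
As a reminder of what Multilevel Monte Carlo does on its own, here is a minimal sketch. It is not MLPP (which embeds the idea in a tree search); it only estimates the expected end state of a toy stochastic system. The dynamics, the parameter values and the names `simulate_pair` and `mlmc_estimate` are all assumptions made for illustration. Cheap, coarse simulations carry most of the samples, and a few expensive fine simulations only correct the remaining bias.

```python
import math
import random


def simulate_pair(level, rng, x0=1.0, mu=0.05, sigma=0.2, horizon=1.0):
    """Return (fine, coarse) end states of one coupled trajectory pair.

    The toy dynamics dx = mu*x dt + sigma*x dW are integrated with
    Euler-Maruyama using 2**level steps (fine) and 2**(level-1) steps
    (coarse). Both paths share the same Brownian increments, so their
    difference has low variance. At level 0 there is no coarse path.
    """
    n_fine = 2 ** level
    dt = horizon / n_fine
    x_fine, x_coarse, dw_sum = x0, x0, 0.0
    for step in range(n_fine):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x_fine += mu * x_fine * dt + sigma * x_fine * dw
        dw_sum += dw
        if level > 0 and step % 2 == 1:
            # One coarse step covers two fine steps and reuses their noise.
            x_coarse += mu * x_coarse * 2 * dt + sigma * x_coarse * dw_sum
            dw_sum = 0.0
    return (x_fine, x_coarse) if level > 0 else (x_fine, 0.0)


def mlmc_estimate(samples_per_level, seed=0):
    """Telescoping sum E[P_0] + sum_l E[P_l - P_(l-1)], one mean per level."""
    rng = random.Random(seed)
    estimate = 0.0
    for level, n_samples in enumerate(samples_per_level):
        total = 0.0
        for _ in range(n_samples):
            fine, coarse = simulate_pair(level, rng)
            total += fine - coarse
        estimate += total / n_samples
    return estimate


if __name__ == "__main__":
    # Many cheap coarse samples, few expensive fine ones.
    print(mlmc_estimate([4000, 1000, 500, 250, 125]))
```

For this toy system the exact expectation is exp(mu * horizon), so the printed value can be compared against `math.exp(0.05)`.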

Semi-Markov HMMs for modelling time series in milling machines

Kai Li, Chaochao Qiu, Xinzhao Zhou, Mingsong Chen, Yongcheng Lin, Xianshi Jia, Bin Li, Modeling and tagging of time sequence signals in the milling process based on an improved hidden semi-Markov model, Expert Systems with Applications, Volume 205, 2022 DOI: 10.1016/j.eswa.2022.117758.

Vibration signals are widely used in the field of tool wear, tool residual life prediction and health monitoring of mechanical equipment. However, the current data-driven research methods mostly rely on high-value and high-density labeled data to establish relevant models and algorithms. Therefore, it is of great significance to solve the problem of automatic tagging of data, realize automatic signal interception, and enhance the value density of manufacturing process data. The Hidden semi-Markov model (HSMM) can describe the real spatial statistical characteristics of random models through observable data. As HSMM does not need the real labels of the signal, it can reduce tagging work to improve the marking efficiency. In this paper, an improved HSMM was proposed to model and tag the spindle vibration signals in the milling process. First, the Mel frequency cepstral coefficients (MFCCs) were extracted as observation sequences from the collected spindle vibration signals, and the dimension of the original features was reduced by linear discriminant analysis (LDA). Subsequently, a signal automatic tagging model based on HSMM was developed, in which the state duration can be explicitly modeled. Finally, the evaluation of the proposed methodology was carried out in the laboratory and real industry machining. The experimental results confirmed the effectiveness and robustness of the proposed model.
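
To make the "explicit state duration" point concrete, here is a minimal generative sketch of a hidden semi-Markov model. It is not the improved HSMM of the paper: the two states ("idle", "cutting"), the uniform duration ranges, the Gaussian emissions and all function names are hypothetical. The difference from a plain HMM is that each state draws a dwell time from its own duration distribution and never transitions to itself.

```python
import random

# Hypothetical two-state model of a milling signal.
TRANSITIONS = {"idle": {"cutting": 1.0}, "cutting": {"idle": 1.0}}
DURATIONS = {"idle": (3, 6), "cutting": (8, 15)}       # min/max dwell, in samples
EMISSIONS = {"idle": (0.0, 0.1), "cutting": (1.0, 0.3)}  # Gaussian mean, std


def sample_hsmm(n_steps, transitions, durations, emissions, start_state, seed=0):
    """Sample a state sequence and observations with explicit dwell times."""
    rng = random.Random(seed)
    states, observations = [], []
    state = start_state
    while len(states) < n_steps:
        low, high = durations[state]
        dwell = rng.randint(low, high)  # explicit duration of this segment
        mean, std = emissions[state]
        for _ in range(dwell):
            states.append(state)
            observations.append(rng.gauss(mean, std))
        candidates = list(transitions[state])
        weights = [transitions[state][s] for s in candidates]
        state = rng.choices(candidates, weights=weights)[0]
    return states[:n_steps], observations[:n_steps]


def run_lengths(states):
    """Collapse a label sequence into (label, length) segments."""
    runs = []
    for label in states:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return [(label, length) for label, length in runs]


if __name__ == "__main__":
    labels, signal = sample_hsmm(60, TRANSITIONS, DURATIONS, EMISSIONS, "idle")
    print(run_lengths(labels))
```

Tagging, as described in the abstract, is the inverse problem: recovering segments like those printed by `run_lengths` from the observations alone.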

A survey on visual SLAM in robotics

Iman Abaspur Kazerouni, Luke Fitzgerald, Gerard Dooly, Daniel Toal, A survey of state-of-the-art on visual SLAM, Expert Systems with Applications, Volume 205, 2022 DOI: 10.1016/j.eswa.2022.117734.

This paper is an overview of Visual Simultaneous Localization and Mapping (V-SLAM). We discuss the basic definitions in the SLAM and vision system fields and provide a review of the state-of-the-art methods utilized for mobile robot's vision and SLAM. This paper covers topics from the basic SLAM methods, vision sensors, machine vision algorithms for feature extraction and matching, Deep Learning (DL) methods and datasets for Visual Odometry (VO) and Loop Closure (LC) in V-SLAM applications. Several feature extraction and matching algorithms are simulated to show a better vision of feature-based techniques.

See also:

Jun Cheng, Liyan Zhang, Qihong Chen, Xinrong Hu, Jingcao Cai, “A review of visual SLAM methods for autonomous driving vehicles,” Engineering Applications of Artificial Intelligence, Volume 114, 2022, 104992, ISSN 0952-1976, https://doi.org/10.1016/j.engappai.2022.104992.

Tianyao Zhang, Xiaoguang Hu, Jin Xiao, Guofeng Zhang, “A survey of visual navigation: From geometry to embodied AI,” Engineering Applications of Artificial Intelligence, Volume 114, 2022, 105036, ISSN 0952-1976, https://doi.org/10.1016/j.engappai.2022.105036.

Communication delays modelled as Gamma distributions for space-earth applications

H. Chen, Z. Liu, P. Huang and Z. Kuang, Time-Delay Modeling and Simulation for Relay Communication-Based Space Telerobot System, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 7, pp. 4211-4222, July 2022 DOI: 10.1109/TSMC.2021.3090806.

In a space telerobot system (STS), the effectiveness of the control method in eliminating the influence of the time delay should be verified under real conditions. However, it is difficult and costly for many scholars to obtain confidential information that would allow them to establish an STS. It may be feasible, using some existing results, to model the time delay as close to reality as possible, and to then program a simulation system to generate the simulated time delay, thus verifying validity. In this article, time-delay modeling and simulation problems for relay communication-based STS are first studied. The time delay in relay communication-based STS consists of both processing and communication time delays; the latter is divided into ground and ground-space parts. By extending the available results, processing and ground communication time delays are modeled with the probability distribution function modeling approach. An optimal communication link identification and minimum time-delay realization (OCLIMTDR) method is proposed to model the ground-space communication time delay. In this method, the novel point–vector–sphere (PVS) algorithm serves to judge link connectivity. The PVS algorithm is based on geometric theory, which gives the OCLIMTDR method good extensibility and renders it suitable for any relay communication network in theory. All three parts of the time-delay models are integrated to form the loop time-delay model of the STS. Subsequently, a time-delay simulation system is created by programming the loop time-delay model. Finally, the correctness of the simulation system is further authenticated based on simulations and some prior knowledge.
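
A rough sketch of how such a loop delay could be simulated, assuming Gamma-distributed processing and ground delays plus a deterministic propagation term. The shape and scale values, the link lengths and the function names are invented for illustration, and the fixed relay path stands in for the link selection that OCLIMTDR performs in the paper.

```python
import random

SPEED_OF_LIGHT_KM_S = 299_792.458


def propagation_delay(link_lengths_km):
    """Deterministic one-way delay over a fixed chain of relay links."""
    return sum(link_lengths_km) / SPEED_OF_LIGHT_KM_S


def sample_one_way_delay(rng, link_lengths_km,
                         processing=(4.0, 0.005), ground=(2.0, 0.010)):
    """One-way delay in seconds: processing + ground + ground-space parts.

    `processing` and `ground` are hypothetical (shape, scale) pairs for
    random.gammavariate, whose mean is shape * scale.
    """
    processing_delay = rng.gammavariate(*processing)
    ground_delay = rng.gammavariate(*ground)
    return processing_delay + ground_delay + propagation_delay(link_lengths_km)


def sample_loop_delay(rng, link_lengths_km):
    """Loop delay: independent forward and return one-way delays."""
    return (sample_one_way_delay(rng, link_lengths_km)
            + sample_one_way_delay(rng, link_lengths_km))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical path: ground station -> relay satellite -> spacecraft.
    links = [36_000.0, 36_000.0]
    delays = [sample_loop_delay(rng, links) for _ in range(5)]
    print([round(d, 3) for d in delays])
```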

Kalman filter with time delays

H. Zhu, J. Mi, Y. Li, K.-V. Yuen and H. Leung, VB-Kalman Based Localization for Connected Vehicles With Delayed and Lost Measurements: Theory and Experiments, IEEE/ASME Transactions on Mechatronics, vol. 27, no. 3, pp. 1370-1378, June 2022 DOI: 10.1109/TMECH.2021.3095096.

Traditionally, connected vehicles (CVs) share their own sensor data that relies on the satellite with their surrounding vehicles by vehicle-to-vehicle (V2V) communication. However, the satellite-based signal sometimes may be lost due to environmental factors. Time-delays and packet dropouts may occur randomly by V2V communication. To ensure the reliability and accuracy of localization for CVs, a novel variational Bayesian (VB)-Kalman method is developed for unknown and time varying probabilities of delayed and lost measurements. In this VB-Kalman localization method, two random variables are introduced to indicate whether a measurement is delayed and available, respectively. A hierarchical model is then formulated and its parameters and state are simultaneously estimated by the VB technique. Experimental results validate the proposed method for the localization of CVs in practice.
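
For contrast with the VB-Kalman method, here is the simplest baseline: a scalar Kalman filter that knows when a measurement is lost and then runs only the prediction step. The random-walk model, the noise values and the name `kalman_step` are assumptions. The paper addresses the harder case in which the delay and loss probabilities are unknown and time varying, which this sketch does not attempt.

```python
def kalman_step(x, p, z, q=0.01, r=0.25):
    """One predict/update cycle for a scalar random-walk state.

    x, p: previous state estimate and its variance.
    z:    new measurement, or None if the packet was lost.
    q, r: process and measurement noise variances (hypothetical values).
    """
    # Predict: the state is modelled as a random walk, so only p grows.
    x_pred = x
    p_pred = p + q
    if z is None:
        # Lost measurement: keep the prediction, uncertainty stays inflated.
        return x_pred, p_pred
    # Update: blend prediction and measurement with the Kalman gain.
    gain = p_pred / (p_pred + r)
    return x_pred + gain * (z - x_pred), (1.0 - gain) * p_pred


if __name__ == "__main__":
    x, p = 0.0, 1.0
    for z in [1.1, None, None, 0.9, 1.0]:
        x, p = kalman_step(x, p, z)
        print(round(x, 3), round(p, 3))
```

The variance `p` shrinks after each received measurement and grows by `q` for each lost one.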

NOTE: See also S. Guo, Y. Liu, Y. Zheng and T. Ersal, “A Delay Compensation Framework for Connected Testbeds,” in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 7, pp. 4163-4176, July 2022, doi: 10.1109/TSMC.2021.3091974.

A tutorial on the integration of ROS with some interesting software, such as Jupyter

T. Fischer, W. Vollprecht, S. Traversaro, S. Yen, C. Herrero and M. Milford, A RoboStack Tutorial: Using the Robot Operating System Alongside the Conda and Jupyter Data Science Ecosystems, IEEE Robotics & Automation Magazine, vol. 29, no. 2, pp. 65-74, June 2022 DOI: 10.1109/MRA.2021.3128367.

The Robot Operating System (ROS) has become the de facto standard middleware in the robotics community [1]. ROS bundles everything, from low-level drivers to tools that transform among coordinate systems, to state-of-the-art perception and control algorithms. One of ROS's key merits is the rich ecosystem of standardized tools to build and distribute ROS-based software.

Improving the quality of memory replay in RL through a biologically inspired evolutionary algorithm

M. Ramicic and A. Bonarini, Augmented Memory Replay in Reinforcement Learning With Continuous Control, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 485-496, June 2022 DOI: 10.1109/TCDS.2021.3050723.

Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into higher order value functions. This expansion of the information collected from the environment increases the agent's state space enabling it to scale up to more complex problems but also increases the risk of forgetting by learning on redundant or conflicting data. To improve the approximation of a large amount of data, a random mini-batch of the past experiences that are stored in the replay memory buffer is often replayed at each learning step. The proposed work takes inspiration from a biological mechanism which acts as a protective layer of higher cognitive functions found in the mammalian brain: active memory consolidation mitigates the effect of forgetting previous memories by dynamically processing the new ones. Similar dynamics are implemented by the proposed augmented memory replay or AMR algorithm. The architecture of AMR, based on a simple artificial neural network, is able to provide an augmentation policy which modifies each of the agent's experiences by augmenting their relevance prior to storing them in the replay memory. The function approximator of AMR is evolved using a genetic algorithm in order to obtain the specific augmentation policy function that yields the best performance of a learning agent in a specific environment given by its received cumulative reward. Experimental results show that an evolved AMR augmentation function capable of increasing the significance of the specific memories is able to further increase the stability and convergence speed of the learning algorithms dealing with the complexity of continuous action domains.
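
The mechanics of "augment, then store, then replay a random mini-batch" can be sketched as follows. This is only a skeleton under stated assumptions: the class name, the choice of modifying the reward field, and the hand-written `boost_large_rewards` function are hypothetical. In the paper the augmentation policy is a small neural network evolved with a genetic algorithm, which is not reproduced here.

```python
import random
from collections import deque


class AugmentedReplayBuffer:
    """Replay memory that passes each experience through an augmentation
    function before storing it (assumed here to rescale the reward)."""

    def __init__(self, capacity, augment, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop out
        self.augment = augment
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        reward = self.augment(state, action, reward, next_state)
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniform random mini-batch, as in standard experience replay."""
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)


def boost_large_rewards(state, action, reward, next_state, factor=2.0):
    """Hypothetical augmentation: amplify rewards with large magnitude."""
    return reward * factor if abs(reward) > 0.5 else reward


if __name__ == "__main__":
    memory = AugmentedReplayBuffer(capacity=1000, augment=boost_large_rewards)
    memory.store(state=0, action=1, reward=1.0, next_state=1)
    memory.store(state=1, action=0, reward=0.1, next_state=2)
    print(memory.sample(2))
```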

Modelling the perception of time in the human brain through RL with eligibility traces

I. Lourenço, R. Mattila, R. Ventura and B. Wahlberg, A Biologically Inspired Computational Model of Time Perception, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 258-268, June 2022 DOI: 10.1109/TCDS.2021.3120301.

Time perception, that is, how humans and animals perceive the passage of time, forms the basis for important cognitive skills, such as decision making, planning, and communication. In this work, we propose a framework for examining the mechanisms responsible for time perception. We first model neural time perception as a combination of two known timing sources: internal neuronal mechanisms and external (environmental) stimuli, and design a decision-making framework to replicate them. We then implement this framework in a simulated robot. We measure the robot's success on a temporal discrimination task originally performed by mice to evaluate their capacity to exploit temporal knowledge. We conclude that the robot is able to perceive time similarly to animals when it comes to their intrinsic mechanisms of interpreting time and performing time-aware actions. Next, by analyzing the behavior of agents equipped with the framework, we propose an estimator to infer characteristics of the timing mechanisms intrinsic to the agents. In particular, we show that from their empirical action probability distribution, we are able to estimate parameters used for perceiving time. Overall, our work shows promising results when it comes to drawing conclusions regarding some of the characteristics present in biological timing mechanisms.
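
Since the abstract does not spell out the learning rule, here is a generic tabular TD(lambda) sketch with accumulating eligibility traces, the mechanism named in the title of this entry. It is not the paper's framework: the deterministic chain task, the parameter values and the name `td_lambda_chain` are assumptions. The trace of a state decays by gamma * lambda per step, so a reward arriving later still updates states visited earlier, weighted by how long ago they were seen.

```python
def td_lambda_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, lam=0.8):
    """Learn state values on a chain 0 -> 1 -> ... -> terminal.

    The agent always moves right and receives reward 1 on the final
    transition, so the true value of state s is gamma ** (n_states - 1 - s).
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state (0)
    for _ in range(episodes):
        traces = [0.0] * (n_states + 1)
        for state in range(n_states):
            next_state = state + 1
            reward = 1.0 if next_state == n_states else 0.0
            td_error = reward + gamma * values[next_state] - values[state]
            traces[state] += 1.0  # accumulating trace for the visited state
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam  # older visits fade with time
    return values[:n_states]


if __name__ == "__main__":
    print([round(v, 3) for v in td_lambda_chain()])
```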

NOTE: See also H. Basgol, I. Ayhan and E. Ugur, “Time Perception: A Review on Psychological, Computational, and Robotic Models,” in IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 301-315, June 2022, doi: 10.1109/TCDS.2021.3059045.

Dealing with exploration in deep RL, with a nice introduction to the problem

Jiayi Lu, Shuai Han, Shuai Lü, Meng Kang, Junwei Zhang, Sampling diversity driven exploration with state difference guidance, Expert Systems with Applications, Volume 203, 2022 DOI: 10.1016/j.eswa.2022.117418.

Exploration is one of the key issues of deep reinforcement learning, especially in the environments with sparse or deceptive rewards. Exploration based on intrinsic rewards can handle these environments. However, these methods cannot take both global interaction dynamics and local environment changes into account simultaneously. In this paper, we propose a novel intrinsic reward for off-policy learning, which not only encourages the agent to take actions not fully learned from a global perspective, but also instructs the agent to trigger remarkable changes in the environment from a local perspective. Meanwhile, we propose the double-actors–double-critics framework to combine intrinsic rewards with extrinsic rewards to avoid the inappropriate combination of intrinsic and extrinsic rewards in previous methods. This framework can be applied to off-policy learning algorithms based on the actor–critic method. We provide a comprehensive evaluation of our approach on the MuJoCo benchmark environments. The results demonstrate that our method can perform effective exploration in the environments with dense, deceptive and sparse rewards. Besides, we conduct sufficient ablation and quantitative analyses of intrinsic rewards. Furthermore, we also verify the superiority and rationality of our double-actors–double-critics framework through comparative experiments.