Author Archives: Juan-Antonio Fernández-Madrigal

Estimating velocity from event and inertial data by dealing with noise and outliers

W. Xu, X. Peng and L. Kneip, Tight Fusion of Events and Inertial Measurements for Direct Velocity Estimation, IEEE Transactions on Robotics, vol. 40, pp. 240-256, 2024 DOI: 10.1109/TRO.2023.3333108.

Traditional visual-inertial state estimation targets absolute camera poses and spatial landmark locations while first-order kinematics are typically resolved as an implicitly estimated substate. However, this poses a risk in velocity-based control scenarios, as the quality of the estimation of kinematics depends on the stability of absolute camera and landmark coordinates estimation. To address this issue, we propose a novel solution to tight visual–inertial fusion directly at the level of first-order kinematics by employing a dynamic vision sensor instead of a normal camera. More specifically, we leverage trifocal tensor geometry to establish an incidence relation that directly depends on events and camera velocity, and demonstrate how velocity estimates in highly dynamic situations can be obtained over short-time intervals. Noise and outliers are dealt with using a nested two-layer random sample consensus (RANSAC) scheme. In addition, smooth velocity signals are obtained from a tight fusion with preintegrated inertial signals using a sliding window optimizer. Experiments on both simulated and real data demonstrate that the proposed tight event-inertial fusion leads to continuous and reliable velocity estimation in highly dynamic scenarios independently of absolute coordinates. Furthermore, in extreme cases, it achieves more stable and more accurate estimation of kinematics than traditional, point-position-based visual-inertial odometry.
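
For intuition about the outlier-handling step, here is a minimal sketch of a nested two-layer RANSAC applied to a linear velocity-estimation problem. The constraint matrix, thresholds, and iteration counts are toy assumptions of mine; the paper's actual incidence relation comes from trifocal tensor geometry on events.

```python
import numpy as np

def fit_velocity(A, b):
    """Least-squares fit of a 3-D velocity v from linear constraints A v = b."""
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

def ransac(A, b, n_min, n_iter, thresh, rng):
    """Single RANSAC layer: sample minimal sets, keep the largest consensus set."""
    best_inliers = np.zeros(len(b), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(b), size=n_min, replace=False)
        v = fit_velocity(A[idx], b[idx])
        inliers = np.abs(A @ v - b) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

def nested_ransac_velocity(A, b, rng=None):
    """Two-layer scheme: a coarse outer pass prunes gross outliers, a tighter
    inner pass refines the consensus set before the final velocity estimate."""
    rng = rng or np.random.default_rng(0)
    outer = ransac(A, b, n_min=3, n_iter=200, thresh=0.5, rng=rng)
    inner = ransac(A[outer], b[outer], n_min=3, n_iter=100, thresh=0.1, rng=rng)
    keep = np.where(outer)[0][inner]
    return fit_velocity(A[keep], b[keep])

# Toy usage: 200 constraints from a true velocity, 20% corrupted by outliers.
rng = np.random.default_rng(1)
v_true = np.array([0.4, -0.1, 0.9])
A = rng.normal(size=(200, 3))
b = A @ v_true + 0.01 * rng.normal(size=200)
b[:40] += rng.normal(scale=5.0, size=40)       # gross outliers
print(nested_ransac_velocity(A, b, rng))        # close to v_true
```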

Particle grid maps

G. Chen, W. Dong, P. Peng, J. Alonso-Mora and X. Zhu, Continuous Occupancy Mapping in Dynamic Environments Using Particles, IEEE Transactions on Robotics, vol. 40, pp. 64-84, 2024 DOI: 10.1109/TRO.2023.3323841.

Particle-based dynamic occupancy maps were proposed in recent years to model the obstacles in dynamic environments. Current particle-based maps describe the occupancy status in discrete grid form and suffer from the grid size problem, wherein a large grid size is unfavorable for motion planning while a small grid size lowers efficiency and causes gaps and inconsistencies. To tackle this problem, this article generalizes the particle-based map into continuous space and builds an efficient 3-D egocentric local map. A dual-structure subspace division paradigm, composed of a voxel subspace division and a novel pyramid-like subspace division, is proposed to propagate particles and update the map efficiently with the consideration of occlusions. The occupancy status at an arbitrary point in the map space can then be estimated with the weights of the particles. To reduce the noise in modeling static and dynamic obstacles simultaneously, an initial velocity estimation approach and a mixture model are utilized. Experimental results show that our map can effectively and efficiently model both dynamic obstacles and static obstacles. Compared to the state-of-the-art grid-form particle-based map, our map enables continuous occupancy estimation and substantially improves the mapping performance at different resolutions.
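
The key property is that occupancy can be queried at an arbitrary point from the particle weights rather than read from a fixed grid. Below is a toy sketch of that query, assuming a simple Gaussian kernel over particle positions; the paper's dual-structure (voxel plus pyramid-like) subspace division and particle propagation are not represented.

```python
import numpy as np

class ParticleOccupancyMap:
    """Toy continuous occupancy query from weighted particles (illustrative only)."""

    def __init__(self, positions, weights, bandwidth=0.2):
        self.positions = np.asarray(positions)   # (N, 3) particle positions
        self.weights = np.asarray(weights)       # (N,) particle weights
        self.bandwidth = bandwidth               # kernel radius in metres

    def occupancy(self, point):
        """Occupancy at an arbitrary 3-D point: kernel-weighted sum of nearby
        particle weights, squashed into [0, 1]."""
        d2 = np.sum((self.positions - point) ** 2, axis=1)
        kernel = np.exp(-0.5 * d2 / self.bandwidth ** 2)
        mass = np.sum(self.weights * kernel)
        return 1.0 - np.exp(-mass)               # saturating map to a probability

# Usage: particles clustered around an obstacle at (1, 0, 0).
rng = np.random.default_rng(0)
pts = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.05, size=(500, 3))
pmap = ParticleOccupancyMap(pts, np.full(500, 0.01))
print(pmap.occupancy(np.array([1.0, 0.0, 0.0])))   # high occupancy
print(pmap.occupancy(np.array([3.0, 0.0, 0.0])))   # near zero
```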

Offline RL in robotics

L. Yao, B. Zhao, X. Xu, Z. Wang, P. K. Wong and Y. Hu, Efficient Incremental Offline Reinforcement Learning With Sparse Broad Critic Approximation, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 1, pp. 156-169, Jan. 2024 DOI: 10.1109/TSMC.2023.3305498.

Offline reinforcement learning (ORL) has been getting increasing attention in robot learning, benefiting from its ability to avoid hazardous exploration and learn policies directly from precollected samples. Approximate policy iteration (API) is one of the most commonly investigated ORL approaches in robotics, due to its linear representation of policies, which makes it fairly transparent in both theoretical and engineering analysis. One open problem of API is how to design efficient and effective basis functions. The broad learning system (BLS) has been extensively studied in supervised and unsupervised learning in various applications. However, few investigations have been conducted on ORL. In this article, a novel incremental ORL approach with sparse broad critic approximation (BORL) is proposed with the advantages of BLS, which approximates the critic function in a linear manner with randomly projected sparse and compact features and dynamically expands its broad structure. The BORL is the first extension of API with BLS in the field of robotics and ORL. The approximation ability and convergence performance of BORL are also analyzed. Comprehensive simulation studies are then conducted on two benchmarks, and the results demonstrate that the proposed BORL can obtain comparable or better performance than conventional API methods without laborious hyperparameter fine-tuning work. To further demonstrate the effectiveness of BORL in practical robotic applications, a variable force tracking problem in robotic ultrasound scanning (RUSS) is investigated, and a learning-based adaptive impedance control (LAIC) algorithm is proposed based on BORL. The experimental results demonstrate the advantages of LAIC compared with conventional force tracking methods.

See also: X. Wang, D. Hou, L. Huang and Y. Cheng, “Offline–Online Actor–Critic,” IEEE Transactions on Artificial Intelligence, vol. 5, no. 1, pp. 61-69, Jan. 2024 DOI: 10.1109/TAI.2022.3225251.
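
As a rough illustration of what a "sparse broad critic" can look like, here is a sketch of a linear critic over randomly projected sparse features fitted by ridge regression to precollected targets. The feature construction, the fit, and all dimensions are my assumptions; the incremental expansion of the broad structure in BORL is omitted.

```python
import numpy as np

class BroadCritic:
    """Linear critic Q(s, a) ~ w . phi(s, a) with random sparse broad features."""

    def __init__(self, dim_in, n_features=256, sparsity=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(size=(dim_in, n_features))
        self.proj *= rng.random((dim_in, n_features)) < sparsity   # sparse random projection
        self.bias = rng.uniform(-1, 1, n_features)
        self.w = np.zeros(n_features)

    def features(self, sa):
        return np.tanh(sa @ self.proj + self.bias)

    def fit(self, sa, targets, reg=1e-3):
        """Ridge-regression fit of the critic weights to precomputed targets."""
        phi = self.features(sa)
        gram = phi.T @ phi + reg * np.eye(phi.shape[1])
        self.w = np.linalg.solve(gram, phi.T @ targets)

    def value(self, sa):
        return self.features(sa) @ self.w

# Offline usage: fit to precollected (state, action) pairs and their targets.
rng = np.random.default_rng(1)
sa_batch = rng.normal(size=(1000, 6))           # e.g. 4-D state + 2-D action
targets = np.sin(sa_batch[:, 0]) + 0.1 * rng.normal(size=1000)
critic = BroadCritic(dim_in=6)
critic.fit(sa_batch, targets)
print(critic.value(sa_batch[:5]))
```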

Hierarchical Deep-RL for continuous and large state spaces

A. P. Pope et al., Hierarchical Reinforcement Learning for Air Combat at DARPA’s AlphaDogfight Trials, IEEE Transactions on Artificial Intelligence, vol. 4, no. 6, pp. 1371-1385, Dec. 2023 DOI: 10.1109/TAI.2022.3222143.

Autonomous control in high-dimensional, continuous state spaces is a persistent and important challenge in the fields of robotics and artificial intelligence. Because of high risk and complexity, the adoption of AI for autonomous combat systems has been a long-standing difficulty. In order to address these issues, DARPA’s AlphaDogfight Trials (ADT) program sought to vet the feasibility of and increase trust in AI for autonomously piloting an F-16 in simulated air-to-air combat. Our submission to ADT solves the high-dimensional, continuous control problem using a novel hierarchical deep reinforcement learning approach consisting of a high-level policy selector and a set of separately trained low-level policies specialized for excelling in specific regions of the state space. Both levels of the hierarchy are trained using off-policy, maximum entropy methods with expert knowledge integrated through reward shaping. Our approach outperformed human expert pilots and achieved a second-place rank in the ADT championship event.
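
The architecture itself is simple to state: a high-level selector chooses which separately trained low-level policy acts at each step. The sketch below shows only that interface, with placeholder policies and a placeholder selector rule; it is not the ADT agent.

```python
import numpy as np

class HierarchicalAgent:
    """High-level policy selector over a set of specialised low-level policies."""

    def __init__(self, selector, low_level_policies):
        self.selector = selector              # maps state -> index of a low-level policy
        self.low_level = low_level_policies   # list of state -> action callables

    def act(self, state):
        k = self.selector(state)
        return self.low_level[k](state)

# Placeholder policies specialised for different regions of the state space.
aggressive = lambda s: np.array([1.0, np.clip(s[0], -1, 1)])
defensive  = lambda s: np.array([0.3, -np.clip(s[0], -1, 1)])

# Placeholder selector: switch on a single state feature (e.g. range to opponent).
selector = lambda s: 0 if s[1] < 0.5 else 1

agent = HierarchicalAgent(selector, [aggressive, defensive])
print(agent.act(np.array([0.2, 0.3])))   # aggressive policy acts
print(agent.act(np.array([0.2, 0.9])))   # defensive policy acts
```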

Visibility graphs for robot path planning are still in use!

Junlin Ou, Seong Hyeon Hong, Ge Song, Yi Wang, Hybrid path planning based on adaptive visibility graph initialization and edge computing for mobile robots, Engineering Applications of Artificial Intelligence, Volume 126, Part D, 2023 DOI: 10.1016/j.engappai.2023.107110.

This paper presents a new initialization method that combines adaptive visibility graphs and the A* algorithm to improve the exploration, accuracy, and computing efficiency of hybrid path planning for mobile robots. First, segments/links in the full visibility graphs are removed randomly in an iterative and adaptive manner, yielding adaptive visibility graphs. Then the A* algorithm is applied to find the shortest paths in these adaptive visibility graphs. Next, high-quality paths featuring low fitness values are chosen to initialize the subsequent heuristic optimization in hybrid path planning. Specifically, in the present study, the genetic algorithm (GA) is implemented on a CPU/GPU edge computing device (Jetson AGX Xavier) to exploit its massively parallel processing threads, and the strategy for judicious CPU/GPU resource utilization is also developed. Numerical experiments are conducted to determine proper hyperparameters and configure the GA with balanced performance. Various optimal paths with differential consideration of practical factors for robot path planning are obtained by the proposed method. Compared to the other benchmark methods, ours significantly improves the diversity of initial paths and exploration, optimization accuracy, and computing speed (within 5 s, with most runs taking less than 2 s). Furthermore, real-time experiments are carried out to demonstrate the effectiveness and application of the proposed algorithm on mobile robots.
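
The initialization idea can be condensed to: thin the full visibility graph at random, run A* on each thinned copy, and keep the resulting paths as diverse seeds for the GA. A small sketch using networkx follows; the graph construction, drop fraction, and fitness-based filtering are placeholder assumptions of mine.

```python
import itertools
import random
import networkx as nx

def diverse_initial_paths(visibility_graph, start, goal, n_paths=10, drop_frac=0.2, seed=0):
    """Generate diverse seed paths by randomly removing edges from the full
    visibility graph and running A* on each thinned copy (illustrative sketch)."""
    rng = random.Random(seed)
    pos = nx.get_node_attributes(visibility_graph, "pos")
    heuristic = lambda u, v: ((pos[u][0] - pos[v][0]) ** 2 + (pos[u][1] - pos[v][1]) ** 2) ** 0.5
    paths = []
    for _ in range(n_paths):
        g = visibility_graph.copy()
        removable = list(g.edges)
        rng.shuffle(removable)
        g.remove_edges_from(removable[: int(drop_frac * len(removable))])
        try:
            paths.append(nx.astar_path(g, start, goal, heuristic=heuristic, weight="weight"))
        except nx.NetworkXNoPath:
            continue   # this thinned copy disconnected start from goal
    return paths

# Toy visibility graph: nodes with 2-D positions, edges weighted by Euclidean length.
G = nx.Graph()
coords = {0: (0, 0), 1: (1, 2), 2: (2, -1), 3: (3, 1), 4: (4, 0)}
for n, p in coords.items():
    G.add_node(n, pos=p)
for u, v in itertools.combinations(coords, 2):
    d = ((coords[u][0] - coords[v][0]) ** 2 + (coords[u][1] - coords[v][1]) ** 2) ** 0.5
    G.add_edge(u, v, weight=d)
print(diverse_initial_paths(G, 0, 4))
```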

Review of NNs for solving manipulator inverse kinematics

Daniel Cagigas-Muñiz, Artificial Neural Networks for inverse kinematics problem in articulated robots, Engineering Applications of Artificial Intelligence, Volume 126, Part D, 2023 DOI: 10.1016/j.engappai.2023.107175.

The inverse kinematics problem in articulated robots consists of obtaining the joint rotation angles from the position and orientation of the robot's end-effector tool. Unlike the direct kinematics problem, there are no systematic methods for solving inverse kinematics. Moreover, solving the inverse kinematics problem is particularly complicated for certain morphologies of articulated robots. Machine learning techniques and, more specifically, artificial neural networks (ANNs) have been proposed in the scientific literature to solve this problem. However, there are some limitations in the performance of ANNs. In this study, different techniques that involve ANNs are proposed and analyzed. The results show that the proposed original bootstrap sampling and hybrid methods can substantially improve the performance of approaches that use only one ANN. Although these improvements do not completely solve the inverse kinematics problem in articulated robots, they do lay the foundations for the design and development of future, more effective and efficient controllers. Therefore, the source code and documentation of this research are also made publicly available to practitioners interested in adapting and improving these methods for any industrial or articulated robot.
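
For reference, the baseline the paper improves on is a single ANN regressing joint angles from end-effector poses, trained on data generated with forward kinematics. A toy sketch for a planar two-link arm is below (elbow-up branch only, so the mapping stays single-valued); the bootstrap-sampling and hybrid refinements studied in the paper are not shown.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

LINK1, LINK2 = 1.0, 0.8   # link lengths of a toy planar 2-link arm

def forward_kinematics(q):
    """End-effector (x, y) for joint angles q = (q1, q2), one row per sample."""
    x = LINK1 * np.cos(q[:, 0]) + LINK2 * np.cos(q[:, 0] + q[:, 1])
    y = LINK1 * np.sin(q[:, 0]) + LINK2 * np.sin(q[:, 0] + q[:, 1])
    return np.stack([x, y], axis=1)

# Generate training data by sampling joint space and applying forward kinematics.
rng = np.random.default_rng(0)
q_train = rng.uniform([0.0, 0.1], [np.pi, np.pi - 0.1], size=(20000, 2))
x_train = forward_kinematics(q_train)

# One plain ANN mapping end-effector position -> joint angles (the baseline approach).
ik_net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)
ik_net.fit(x_train, q_train)

# Check: the predicted joints should approximately reproduce the requested pose.
target = np.array([[1.2, 0.7]])
q_pred = ik_net.predict(target)
print(forward_kinematics(q_pred), "vs target", target)
```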

Hierarchical RL with continuous options

Zhigang Huang, Quan Liu, Fei Zhu, Hierarchical reinforcement learning with adaptive scheduling for robot control, Engineering Applications of Artificial Intelligence, Volume 126, Part D, 2023 DOI: 10.1016/j.engappai.2023.107130.

Conventional hierarchical reinforcement learning (HRL) relies on discrete options to represent explicitly distinguishable knowledge, which may lead to severe performance bottlenecks. It is possible to represent richer knowledge through continuous options, but reliable scheduling methods are lacking. To provide such a scheduling method for continuous options, this paper proposes the hierarchical reinforcement learning with adaptive scheduling (HAS) algorithm. Its low-level controller learns diverse options, while the high-level controller schedules options to learn solutions. It achieves an adaptive balance between exploration and exploitation during the frequent scheduling of continuous options, maximizing the representation potential of continuous options. It builds on multi-step static scheduling and makes switching decisions according to the relative advantages of the previous and the estimated continuous options, enabling the agent to focus on different behaviors at different phases of the task. The expected t-step distance is applied to demonstrate the superiority of adaptive scheduling in terms of exploration. Furthermore, an interruption incentive based on annealing is proposed to alleviate excessive exploration during the early training phase, accelerating the convergence rate. Finally, we apply HAS to robot control with sparse rewards in continuous spaces, and develop a comprehensive experimental analysis scheme. The experimental results not only demonstrate the high performance and robustness of HAS, but also provide evidence that the adaptive scheduling method has a positive effect on both the representation and the option policies.
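
The switching rule can be pictured as follows: keep executing the current continuous option, but interrupt it when a freshly proposed option's estimated advantage exceeds the current one's by a margin, or when the option has run for its maximum number of steps. The sketch below implements only that decision on a toy environment; the option and advantage learners and the annealed interruption incentive are placeholders, not the HAS algorithm itself.

```python
import numpy as np

class ToyEnv:
    """1-D point mass that must reach the origin (placeholder environment)."""
    def reset(self):
        self.x = 5.0
        return self.x
    def step(self, action):
        self.x += float(np.clip(action, -1, 1))
        return self.x, -abs(self.x), abs(self.x) < 0.1

def adaptive_schedule(env, high_level, low_level, advantage,
                      max_option_steps=8, switch_margin=0.05, horizon=200):
    """Run one episode, interrupting the current continuous option when a freshly
    proposed option's estimated advantage beats it by a margin (illustrative)."""
    state = env.reset()
    option = high_level(state)
    steps_in_option, total_reward = 0, 0.0
    for _ in range(horizon):
        state, reward, done = env.step(low_level(state, option))
        total_reward += reward
        steps_in_option += 1
        candidate = high_level(state)
        stale = steps_in_option >= max_option_steps
        better = advantage(state, candidate) > advantage(state, option) + switch_margin
        if stale or better:                       # adaptive switching decision
            option, steps_in_option = candidate, 0
        if done:
            break
    return total_reward

# Placeholder option machinery: the option is a target step towards the goal.
high_level = lambda s: -np.sign(s) * min(1.0, abs(s))   # continuous option vector
low_level  = lambda s, opt: opt                          # option directly sets the action
advantage  = lambda s, opt: -abs(s + opt)                # closer to the origin is better
print(adaptive_schedule(ToyEnv(), high_level, low_level, advantage))
```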

RL to learn not only manipulator skills but also safety skills

A. C. Ak, E. E. Aksoy and S. Sariel, Learning Failure Prevention Skills for Safe Robot Manipulation, IEEE Robotics and Automation Letters, vol. 8, no. 12, pp. 7994-8001, Dec. 2023 DOI: 10.1109/LRA.2023.3324587.

Robots are now more capable than ever of performing manipulation tasks for everyday activities. However, the safety of the manipulation skills that robots employ is still an open problem. Considering all possible failures during skill learning increases the complexity of the process and hinders learning an optimal policy. Nonetheless, safety-focused modularity in the acquisition of skills has not been adequately addressed in previous works. For that purpose, we reformulate skills as base and failure prevention skills, where base skills aim at completing tasks and failure prevention skills aim at reducing the risk of failures occurring. Then, we propose a modular and hierarchical method for safe robot manipulation that augments base skills by learning failure prevention skills with reinforcement learning and forms a skill library to address different safety risks. Furthermore, a skill selection policy that considers estimated risks is used for the robot to select the best control policy for safe manipulation. Our experiments show that the proposed method achieves the given goal while ensuring safety by preventing failures. We also show that, with the proposed method, skill learning is feasible and our safe manipulation tools can be transferred to the real environment.
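
The selection layer on top of the skill library can be sketched very compactly: among a base skill and its failure-prevention variants, pick the one minimising a task cost plus a weighted estimated risk. Everything below (skills, risk estimates, costs) is a placeholder illustration, not the learned policies from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Skill:
    name: str
    policy: Callable[[dict], str]            # maps an observation to a command
    task_cost: float                         # nominal cost (e.g. execution time)
    estimate_risk: Callable[[dict], float]   # estimated probability of failure

def select_skill(skills: List[Skill], observation: dict, risk_weight: float = 5.0) -> Skill:
    """Choose the skill minimising task cost plus weighted estimated risk."""
    return min(skills, key=lambda s: s.task_cost + risk_weight * s.estimate_risk(observation))

# Placeholder library: a base pouring skill plus a failure-prevention variant.
base_pour = Skill("pour", lambda obs: "tilt_fast", 1.0,
                  lambda obs: 0.6 if obs["cup_near_edge"] else 0.05)
safe_pour = Skill("pour_safely", lambda obs: "recenter_then_tilt_slow", 2.0,
                  lambda obs: 0.05)

obs = {"cup_near_edge": True}
chosen = select_skill([base_pour, safe_pour], obs)
print(chosen.name, "->", chosen.policy(obs))   # the failure-prevention variant is picked
```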

Improving sample efficiency in actor-critic RL (A2C with NNs) through a multimodal advantage function

Jonghyeok Park, Soohee Han, Reinforcement learning with multimodal advantage function for accurate advantage estimation in robot learning, Engineering Applications of Artificial Intelligence, Volume 126, Part C, 2023 DOI: 10.1016/j.engappai.2023.107019.

In this paper, we propose a reinforcement learning (RL) framework that uses a multimodal advantage function (MAF) to come close to the true advantage function, thereby achieving high returns. The MAF, which is constructed as a logarithm of a mixture of Gaussians policy (MoG-P) and trained by globally collected past experiences, directly assesses the complex true advantage function with its multi-modality and is expected to enhance the sample-efficiency of RL. To realize the expected enhanced learning performance with the proposed RL framework, two practical techniques are developed that include mode selection and rounding off of actions during the policy update process. Mode selection is conducted to sample the action around the most influential or weighted mode for efficient environment exploration. For fast policy updates, past actions are rounded off to discretized action values when calculating the multimodal advantage function. The proposed RL framework was validated using simulation environments and a real inverted pendulum system. The findings showed that the proposed framework can achieve a more sample-efficient performance or higher returns than other advantage-based RL benchmarks.
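
The central object is an advantage surrogate defined as the logarithm of a mixture-of-Gaussians policy, with exploration sampled around the most heavily weighted mode. A toy, state-independent version is sketched below just to show the multimodal shape; the training loop and the action rounding are omitted, and all parameters are made up.

```python
import numpy as np

class MoGAdvantage:
    """Multimodal advantage surrogate: log-density of a mixture-of-Gaussians
    policy over a 1-D action (toy, state-independent parameters)."""

    def __init__(self, weights, means, stds):
        self.weights = np.asarray(weights)   # (K,) mixture weights, sum to 1
        self.means = np.asarray(means)       # (K,) component means
        self.stds = np.asarray(stds)         # (K,) component standard deviations

    def advantage(self, action):
        densities = np.exp(-0.5 * ((action - self.means) / self.stds) ** 2) \
                    / (self.stds * np.sqrt(2 * np.pi))
        return np.log(np.sum(self.weights * densities) + 1e-12)

    def sample_dominant_mode(self, rng):
        """Mode selection: sample around the most heavily weighted component."""
        k = int(np.argmax(self.weights))
        return rng.normal(self.means[k], self.stds[k])

maf = MoGAdvantage(weights=[0.7, 0.3], means=[-0.5, 0.8], stds=[0.1, 0.2])
rng = np.random.default_rng(0)
print(maf.advantage(-0.5), maf.advantage(0.15))   # high near a mode, low between modes
print(maf.sample_dominant_mode(rng))              # exploration around the dominant mode
```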

Learning options in RL and using rewards adequately in that context

Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White, Reward-respecting subtasks for model-based reinforcement learning, Artificial Intelligence, Volume 324, 2023, DOI: 10.1016/j.artint.2023.104001.

To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress with state abstraction, but temporal abstraction has rarely been used, despite extensively developed theory based on the options framework. One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks, such as reaching a bottleneck state or maximizing the cumulative sum of a sensory signal other than reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. In most previous work, the subtasks ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option terminates. We show that option models obtained from such reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how values, policies, options, and models can all be learned online and off-policy using standard algorithms and general value functions.
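
The reward-respecting construction itself is small: the subtask keeps the original per-step rewards and adds, at option termination, a bonus based on a chosen feature of the terminal state. A minimal numerical sketch follows; the feature, bonus weight, and rewards are all made-up placeholders.

```python
def reward_respecting_return(rewards, gamma, terminal_feature, bonus_weight):
    """Discounted return for a subtask that keeps the original rewards and adds a
    bonus based on a state feature at option termination (illustrative)."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    g += (gamma ** len(rewards)) * bonus_weight * terminal_feature
    return g

# Original per-step rewards collected while the option was executing, plus the
# value of some chosen state feature (e.g. a stopping value) at termination.
rewards_during_option = [0.0, 0.0, 1.0, 0.0]
print(reward_respecting_return(rewards_during_option, gamma=0.95,
                               terminal_feature=2.0, bonus_weight=0.5))
```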