Author Archives: Juan-antonio Fernández-madrigal

Action selection strategy for model-free RL based on neurophysiology

D. Wang, S. Chen, Y. Hu, L. Liu and H. Wang, Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 1, pp. 219-233, March 2022 DOI: 10.1109/TCDS.2020.3035778.

Online model-free reinforcement learning (RL) approaches play a crucial role in coping with the real-world applications, such as the behavioral decision making in robotics. How to balance the exploration and exploitation processes is a central problem in RL. A balanced ratio of exploration/exploitation has a great influence on the total learning time and the quality of the learned strategy. Therefore, various action selection policies have been presented to obtain a balance between the exploration and exploitation procedures. However, these approaches are rarely, automatically, and dynamically regulated to the environment variations. One of the most amazing self-adaptation mechanisms in animals is their capacity to dynamically switch between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model which simulates the role of medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavior decision. The sensory input is transmitted to the MPFC, then the ventral tegmental area (VTA) receives a reward and calculates a dopaminergic reinforcement signal, and the feedback categorization neurons in anterior cingulate cortex (ACC) calculate the vigilance according to the dopaminergic reinforcement signal. Then the vigilance is transformed to LPFC to regulate the exploration rate, finally the exploration rate is transmitted to thalamus to calculate the corresponding action probability. This action selection mechanism is introduced to the actor\u2013critic model of the basal ganglia, combining with the cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model to select the action of the agent. Both the simulation comparison with other four traditional action selection policies and the physical experiment results demonstrate the potential of the proposed neuromodulatory model in action selection.

Incremental learning (i.e., non-stationary environments, online -live- learning, task adaptation, life-long learning,…) for robots with Q-learning

Y. Hu, D. Li, Y. He and J. Han, Incremental Learning Framework for Autonomous Robots Based on Q-Learning and the Adaptive Kernel Linear Model IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 1, pp. 64-74, March 2022 DOI: 10.1109/TCDS.2019.2962228.

The performance of autonomous robots in varying environments needs to be improved. For such incremental improvement, here we propose an incremental learning framework based on Q -learning and the adaptive kernel linear (AKL) model. The AKL model is used for storing behavioral policies that are learned by Q -learning. Both the structure and parameters of the AKL model can be trained using a novel L2-norm kernel recursive least squares (L2-KRLS) algorithm. The AKL model initially without nodes and gradually accumulates content. The proposed framework allows to learn new behaviors without forgetting the previous ones. A novel local \u03b5 -greedy policy is proposed to speed the convergence rate of Q -learning. It calculates the exploration probability of each state for generating and selecting more important training samples. The performance of our incremental learning framework was validated in two experiments. A curve-fitting example shows that the L2-KRLS-based AKL model is suitable for incremental learning. The second experiment is based on robot learning tasks. The results show that our framework can incrementally learn behaviors in varying environments. Local \u03b5 -greedy policy-based Q -learning is faster than the existing Q -learning algorithms.

The brain as a communication network

John D. Mollon, Chie Takahashi, Marina V. Danilova, What kind of network is the brain? Trends in Cognitive Sciences, Volume 26, Issue 4, 2022, Pages 312-324 DOI: 10.1016/j.tics.2022.01.007.

The different areas of the cerebral cortex are linked by a network of white matter, comprising the myelinated axons of pyramidal cells. Is this network a neural net, in the sense that representations of the world are embodied in the structure of the net, its pattern of nodes, and connections? Or is it a communications network, where the same physical substrate carries different information from moment to moment? This question is part of the larger question of whether the brain is better modeled by connectionism or by symbolic artificial intelligence (AI), but we review it in the specific context of the psychophysics of stimulus comparison and the format and protocol of information transmission over the long-range tracts of the brain.

Adaptation of model-free RL to variations in the task under continuous state and action spaces applied to robot grasping

Shahid, A.A., Piga, D., Braghin, F. et al. Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning, Auton Robot 46, 483\u2013498 (2022) DOI: 10.1007/s10514-022-10034-z.

This paper presents a learning-based method that uses simulation data to learn an object manipulation task using two model-free reinforcement learning (RL) algorithms. The learning performance is compared across on-policy and off-policy algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). In order to accelerate the learning process, the fine-tuning procedure is proposed that demonstrates the continuous adaptation of on-policy RL to new environments, allowing the learned policy to adapt and execute the (partially) modified task. A dense reward function is designed for the task to enable an efficient learning of the agent. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The learned control policy is demonstrated to be generalizable across multiple object geometries and initial robot/parts configurations. The approach is finally tested on a real Franka Emika Panda robot, showing the possibility to transfer the learned behavior from simulation. Experimental results show 100% of successful grasping tasks, making the proposed approach applicable to real applications.

New algorithms for outlier detection with applications in robotics

P. Antonante, V. Tzoumas, H. Yang and L. Carlone, Outlier-Robust Estimation: Hardness, Minimally Tuned Algorithms, and Applications, IEEE Transactions on Robotics, vol. 38, no. 1, pp. 281-301, Feb. 2022 DOI: 10.1109/TRO.2021.3094984.

Nonlinear estimation in robotics and vision is typically plagued with outliers due to wrong data association or incorrect detections from signal processing and machine learning methods. This article introduces two unifying formulations for outlier-robust estimation, generalized maximum consensus ( $\text{G}$ – $\text{MC}$ ) and generalized truncated least squares ( $\text{G-TLS}$ ), and investigates fundamental limits, practical algorithms, and applications. Our first contribution is a proof that outlier-robust estimation is inapproximable: In the worst case, it is impossible to (even approximately) find the set of outliers, even with slower-than-polynomial-time algorithms (particularly, algorithms running in quasi-polynomial time). As a second contribution, we review and extend two general-purpose algorithms. The first, adaptive trimming ( $\text{ADAPT}$ ), is combinatorial and is suitable for $\text{G}$ – $\text{MC}$ ; the second, graduated nonconvexity ( $\text{GNC}$ ), is based on homotopy methods and is suitable for $\text{G-TLS}$ . We extend $\text{ADAPT}$ and $\text{GNC}$ to the case where the user does not have prior knowledge of the inlier-noise statistics (or the statistics may vary over time) and is unable to guess a reasonable threshold to separate inliers from outliers (as the one commonly used in RANdom SAmple Consensus $(\text{RANSAC})$ . We propose the first minimally tuned algorithms for outlier rejection, which dynamically decide how to separate inliers from outliers. Our third contribution is an evaluation of the proposed algorithms on robot perception problems: mesh registration, image-based object detection ( shape alignment ), and pose graph optimization. $\text{ADAPT}$ and $\text{GNC}$ execute in real time, are deterministic, outperform $\text{RANSAC}$ , and are robust up to 80\u201390% outliers. Their minimally tuned versions also compare favorably with the state of the art, even though they do not rely on a noise bound for the inliers.

Leveraging embodiment: finding an optimal viewpoint in the robot environment for improving scene description

Tan, Sinan, Guo, Di, Liu, Huaping, Zhang, Xinyu, Sun, Fuchun Embodied scene description, Autonomous Robots 46(1) DOI: 10.1007/s10514-021-10014-9.

Embodiment is an important characteristic for all intelligent agents, while existing scene description tasks mainly focus on analyzing images passively and the semantic understanding of the scenario is separated from the interaction between the agent and the environment. In this work, we propose the Embodied Scene Description, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks. A learning framework with the paradigms of imitation learning and reinforcement learning is established to teach the intelligent agent to generate corresponding sensorimotor activities. The proposed framework is tested on both the AI2Thor dataset and a real-world robotic platform for different scene description tasks, demonstrating the effectiveness and scalability of the developed method. Also, a mobile application is developed, which can be used to assist visually-impaired people to better understand their surroundings.

Using physical human-robot interaction to deduce the goals of the human during learning

Losey DP, Bajcsy A, O’Malley MK, Dragan AD, Physical interaction as communication: Learning robot objectives online from human corrections, The International Journal of Robotics Research. 2022;41(1):20-44 DOI: 10.1177/02783649211050958.

When a robot performs a task next to a human, physical interaction is inevitable: the human might push, pull, twist, or guide the robot. The state of the art treats these interactions as disturbances that the robot should reject or avoid. At best, these robots respond safely while the human interacts; but after the human lets go, these robots simply return to their original behavior. We recognize that physical human\u2013robot interaction (pHRI) is often intentional: the human intervenes on purpose because the robot is not doing the task correctly. In this article, we argue that when pHRI is intentional it is also informative: the robot can leverage interactions to learn how it should complete the rest of its current task even after the person lets go. We formalize pHRI as a dynamical system, where the human has in mind an objective function they want the robot to optimize, but the robot does not get direct access to the parameters of this objective: they are internal to the human. Within our proposed framework human interactions become observations about the true objective. We introduce approximations to learn from and respond to pHRI in real-time. We recognize that not all human corrections are perfect: often users interact with the robot noisily, and so we improve the efficiency of robot learning from pHRI by reducing unintended learning. Finally, we conduct simulations and user studies on a robotic manipulator to compare our proposed approach with the state of the art. Our results indicate that learning from pHRI leads to better task performance and improved human satisfaction.

An hypothesis that human perception can only be done in real-time if prediction mechanisms go ahead and save the gap caused by the processing of inputs, which actually cannot be done in real-time (plus further post-processing and adjustment of past perceptions)

Hinze Hogendoorn, Perception in real-time: predicting the present, reconstructing the past, Trends in Cognitive Sciences, Volume 26, Issue 2, 2022 DOI: 10.1016/j.tics.2021.11.003.

We feel that we perceive events in the environment as they unfold in real-time. However, this intuitive view of perception is impossible to implement in the nervous system due to biological constraints such as neural transmission delays. I propose a new way of thinking about real-time perception: at any given moment, instead of representing a single timepoint, perceptual mechanisms represent an entire timeline. On this timeline, predictive mechanisms predict ahead to compensate for delays in incoming sensory input, and reconstruction mechanisms retroactively revise perception when those predictions do not come true. This proposal integrates and extends previous work to address a crucial gap in our understanding of a fundamental aspect of our everyday life: the experience of perceiving the present.

A really nice comparison of different outlier detection methods

Hamzeh Alimohammadi, Shengnan Nancy Chen, Performance evaluation of outlier detection techniques in production timeseries: A systematic review and meta-analysis, Expert Systems with Applications, Volume 191, 2022 DOI: 10.1016/j.eswa.2021.116371.

Time-series data have been extensively collected and analyzed in many disciplines, such as stock market, medical diagnosis, meteorology, and oil and gas industry. Numerous data in these disciplines are sequence of observations measured as functions of time, which can be further used for different applications via analytical or data analytics techniques (e.g., to forecast future price, climate change, etc.). However, presence of outliers can cause significant uncertainties to interpretation results; hence, it is essential to remove the outliers accurately and efficiently before conducting any further analysis. A total of 17 techniques that belong to statistical, regression-based, and machine learning (ML) based categories for outlier detection in timeseries are applied to the oil and gas production data analysis. 15 of these methods are utilized for production data analysis for the first time. Two state-of-the-art and high-performance techniques are then selected for data cleaning which require minimum control and time complexity. Moreover, performances of these techniques are evaluated based on several metrics including the accuracy, precision, recall, F1 score, and Cohen\u2019s Kappa to rank the techniques. Results show that eight unsupervised algorithms outperform the rest of the methods based on the synthetic case study with known outliers. For example, accuracies of the eight shortlisted methods are in the range of 0.83\u20130.99 with a precision between 0.83 and 0.98, compared to 0.65\u20130.82 and 0.07\u20130.77 for the others. In addition, ML-based techniques perform better than statistical techniques. Our experimental results on real field data further indicate that the k-nearest neighbor (KNN) and Fulford-Blasingame methods are superior to other outlier detection frameworks for outlier detection in production data, followed by four others including density-based spatial clustering of applications with noise (DBSCAN), and angle-based outlier detection (ABOD). Even though the techniques are examined with oil and gas production data, but the same data cleaning workflow can be used to detect timeseries\u2019 outliers in other disciplines.

Defining and measuring mathematically the level of knowledge, ignorance and uncertainty

Fujun Hou, Evangelos Triantaphyllou, Juri Yanase, Knowledge, ignorance, and uncertainty: An investigation from the perspective of some differential equations, Expert Systems with Applications, Volume 191, 2022 DOI: 10.1016/j.eswa.2021.116325.

People use knowledge on several cognitive tasks such as when they recognize objects, rank entities such as the alternatives in multi-criteria decision making, or for classification tasks of decision making / expert / intelligent systems. When people have sufficient relevant knowledge, they can make well-distinctive assessments among entities. Otherwise, they may exhibit some uncertainty. This paper establishes two differential equations, of which one is for the interaction between the knowledge level and the uncertainty level, and the other is for the interaction between the ignorance level and the uncertainty level. By solving these two differential equations under certain boundary conditions, one can derive that the proposed knowledge level indicator is equivalent to Wierman’s knowledge granularity measure up to a constant (exactly, ln2). Moreover, the knowledge level indicator and the ignorance level indicator are found to be in a complementary relationship with each other. That is, more knowledge implies less ignorance, and vice-versa. The results of this study bridge a critical gap that exists in the understanding of the concepts of knowledge and ignorance.