Monthly Archives: October 2023

Shorter exploration stage in RL through the use of an expert (a PID) that sets the expectation of the explored action

J. Enrique Sierra-Garcia, Matilde Santos, Ravi Pandit, Wind turbine pitch reinforcement learning control improved by PID regulator and learning observer, Engineering Applications of Artificial Intelligence, Volume 111, 2022 DOI: 10.1016/j.engappai.2022.104769.

Wind turbine (WT) pitch control is a challenging issue due to the non-linearities of the wind device and its complex dynamics, the coupling of the variables and the uncertainty of the environment. Reinforcement learning (RL) based control arises as a promising technique to address these problems. However, its applicability is still limited due to the slowness of the learning process. To help alleviate this drawback, in this work we present a hybrid RL-based control that combines a RL-based controller with a proportional–integral–derivative (PID) regulator, and a learning observer. The PID is beneficial during the first training episodes as the RL based control does not have any experience to learn from. The learning observer oversees the learning process by adjusting the exploration rate and the exploration window in order to reduce the oscillations during the training and improve convergence. Simulation experiments on a small real WT show how the learning significantly improves with this control architecture, speeding up the learning convergence up to 37%, and increasing the efficiency of the intelligent control strategy. The best hybrid controller reduces the error of the output power by around 41% regarding a PID regulator. Moreover, the proposed intelligent hybrid control configuration has proved more efficient than a fuzzy controller and a neuro-control strategy.
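To make the idea concrete, here is a minimal sketch (not the authors' implementation) of what expert-guided exploration can look like: early in training the exploration noise is centred on the PID command, and the weight shifts towards the RL policy as episodes accumulate. The PID gains, blend schedule and noise width below are illustrative assumptions.

```python
import numpy as np

def pid_action(error, integral, derivative, kp=1.0, ki=0.1, kd=0.05):
    """Hypothetical PID expert: maps tracking-error terms to a pitch command."""
    return kp * error + ki * integral + kd * derivative

def explore_action(rl_action, expert_action, episode, blend_episodes=50, sigma=0.1):
    """Centre the exploration noise on the PID expert early on, then hand over to RL.

    Illustrative sketch: the linear blend and the Gaussian noise are assumptions.
    """
    w = max(0.0, 1.0 - episode / blend_episodes)   # expert weight decays over training
    mean = w * expert_action + (1.0 - w) * rl_action
    return mean + np.random.normal(0.0, sigma)     # explore around the blended action

# Example usage at episode 5
a = explore_action(rl_action=0.2, expert_action=pid_action(0.1, 0.5, -0.02), episode=5)
```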

Live-RL enhancement / reduction of unsafe situations by reducing the transition possibility of unsafe actions

Interesting summary of photovoltaic modelling

Serhat Duman, Hamdi Tolga Kahraman, Yusuf Sonmez, Ugur Guvenc, Mehmet Kati, Sefa Aras, A powerful meta-heuristic search algorithm for solving global optimization and real-world solar photovoltaic parameter estimation problems, Engineering Applications of Artificial Intelligence, Volume 111, 2022 DOI: 10.1016/j.engappai.2022.104763.

The teaching-learning-based artificial bee colony (TLABC) is a new hybrid swarm-based metaheuristic search algorithm. It combines the exploitation of the teaching learning-based optimization (TLBO) with the exploration of the artificial bee colony (ABC). With the hybridization of these two nature-inspired swarm intelligence algorithms, a robust method has been proposed to solve global optimization problems. However, as with swarm-based algorithms, with the TLABC method, it is a great challenge to effectively simulate the selection process. Fitness-distance balance (FDB) is a powerful recently developed method to effectively imitate the selection process in nature. In this study, the three search phases of the TLABC algorithm were redesigned using the FDB method. In this way, the FDB-TLABC algorithm, which imitates nature more effectively and has a robust search performance, was developed. To investigate the exploitation, exploration, and balanced search capabilities of the proposed algorithm, it was tested on standard and complex benchmark suites (Classic, IEEE CEC 2014, IEEE CEC 2017, and IEEE CEC 2020). In order to verify the performance of the proposed FDB-TLABC for global optimization problems and in the photovoltaic parameter estimation problem (a constrained real-world engineering problem) a very comprehensive and qualified experimental study was carried out according to IEEE CEC standards. Statistical analysis results confirmed that the proposed FDB-TLABC provided the best optimum solution and yielded a superior performance compared to other optimization methods.
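For reference, the fitness-distance balance idea can be sketched in a few lines: each candidate is scored by a mix of its normalised fitness and its normalised distance to the current best, so selection favours solutions that are both good and diverse. This is a generic illustration, not the FDB-TLABC code; the equal weighting and the toy objective are assumptions.

```python
import numpy as np

def fdb_scores(population, fitness, w=0.5):
    """Fitness-distance balance: score each candidate by a mix of normalised
    fitness (minimisation) and normalised distance to the current best, so that
    selection keeps both quality and diversity. The weight w=0.5 is an assumption."""
    best = population[np.argmin(fitness)]
    dist = np.linalg.norm(population - best, axis=1)
    norm_f = (fitness.max() - fitness) / (np.ptp(fitness) + 1e-12)   # 1 = best fitness
    norm_d = dist / (dist.max() + 1e-12)                             # 1 = farthest from best
    return w * norm_f + (1.0 - w) * norm_d

# Toy usage: pick a guide solution on a sphere objective
pop = np.random.rand(20, 3)
fit = np.sum(pop ** 2, axis=1)
guide = pop[np.argmax(fdb_scores(pop, fit))]
```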

Variation of the Newton-Raphson algorithm that copes with noise, with some illustrative applications such as robotics

D. Fu et al. Modified Newton Integration Algorithm With Noise Tolerance Applied to Robotics, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 4, pp. 2134-2144 DOI: 10.1109/TSMC.2021.3049386.

Currently, the Newton–Raphson iterative algorithm has been extensively employed in the fields of basic research and engineering. However, when noise components exist in a system, its performance is largely affected. To remedy shortcomings that the conventional computing methods have encountered in a noisy workspace, a novel modified Newton integration (MNI) algorithm is proposed in this article. In addition, the steady-state error of the proposed MNI algorithm is smaller than that of the Newton–Raphson algorithm under a noise-free or noisy workspace. To lay the foundations for the corresponding theoretical analyses, the proposed MNI algorithm is first converted into a homogeneous linear equation with a residual term. Then, the related theoretical analyses are carried out, which indicate that the MNI algorithm possesses noise-tolerance ability under various noisy environments. Finally, multiple computer simulations and physical experiments on robot control applications are performed to verify the feasibility and advantage of the proposed MNI algorithm.
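A common way to give Newton-type iterations tolerance to a constant disturbance in the update is to add an integral (accumulated-residual) term; the sketch below shows a plain Newton-Raphson root finder with such a correction. This is only meant to convey the flavour: the paper's exact MNI update, gains and convergence analysis differ.

```python
import numpy as np

def newton_with_integral(f, jac, x0, gamma=0.5, lam=0.1, steps=100, noise=0.0):
    """Newton-Raphson root finding with an accumulated-residual (integral) term.

    Sketch only: the integral-style correction is a generic way to reject a
    constant disturbance injected into the update; it is not the exact MNI rule.
    """
    x = np.asarray(x0, dtype=float)
    acc = np.zeros_like(x)                         # running sum of residuals
    for _ in range(steps):
        r = f(x)
        acc += r
        x = x - np.linalg.solve(jac(x), gamma * r + lam * acc) + noise
    return x

# Toy example: solve x**2 - 2 = 0 with a constant bias added to every update
f = lambda x: np.array([x[0] ** 2 - 2.0])
jac = lambda x: np.array([[2.0 * x[0]]])
print(newton_with_integral(f, jac, [1.0], noise=0.01))   # close to sqrt(2) despite the bias
```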

Action selection strategy for model-free RL based on neurophysiology

D. Wang, S. Chen, Y. Hu, L. Liu and H. Wang, Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 1, pp. 219-233, March 2022 DOI: 10.1109/TCDS.2020.3035778.

Online model-free reinforcement learning (RL) approaches play a crucial role in coping with the real-world applications, such as the behavioral decision making in robotics. How to balance the exploration and exploitation processes is a central problem in RL. A balanced ratio of exploration/exploitation has a great influence on the total learning time and the quality of the learned strategy. Therefore, various action selection policies have been presented to obtain a balance between the exploration and exploitation procedures. However, these approaches are rarely, automatically, and dynamically regulated to the environment variations. One of the most amazing self-adaptation mechanisms in animals is their capacity to dynamically switch between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model which simulates the role of medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavior decision. The sensory input is transmitted to the MPFC, then the ventral tegmental area (VTA) receives a reward and calculates a dopaminergic reinforcement signal, and the feedback categorization neurons in anterior cingulate cortex (ACC) calculate the vigilance according to the dopaminergic reinforcement signal. Then the vigilance is transformed to LPFC to regulate the exploration rate, finally the exploration rate is transmitted to thalamus to calculate the corresponding action probability. This action selection mechanism is introduced to the actor–critic model of the basal ganglia, combining with the cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model to select the action of the agent. Both the simulation comparison with other four traditional action selection policies and the physical experiment results demonstrate the potential of the proposed neuromodulatory model in action selection.
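Stripped of the neurophysiology, the mechanism amounts to an action-selection rule whose exploration rate is modulated online by how surprising recent rewards are. A crude stand-in (not the paper's model) could look like the following; the gain and bounds are made up for illustration.

```python
import numpy as np

def softmax_policy(q_values, beta):
    """Boltzmann action probabilities; beta is the inverse temperature."""
    z = beta * (q_values - np.max(q_values))
    p = np.exp(z)
    return p / p.sum()

def update_exploration(beta, rpe, gain=0.5, beta_min=0.5, beta_max=10.0):
    """Crude stand-in for the vigilance signal: large reward-prediction errors
    (surprise) lower beta (more exploration), small ones raise it. The gain and
    bounds are made-up values, not taken from the paper."""
    beta *= np.exp(gain * (0.5 - abs(rpe)))
    return float(np.clip(beta, beta_min, beta_max))

# Example: after a surprising reward the policy becomes more exploratory
q = np.array([0.2, 0.5, 0.1])
beta = update_exploration(2.0, rpe=1.3)
probs = softmax_policy(q, beta)
```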

Incremental learning (i.e., non-stationary environments, online/live learning, task adaptation, life-long learning, …) for robots with Q-learning

Y. Hu, D. Li, Y. He and J. Han, Incremental Learning Framework for Autonomous Robots Based on Q-Learning and the Adaptive Kernel Linear Model, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 1, pp. 64-74, March 2022 DOI: 10.1109/TCDS.2019.2962228.

The performance of autonomous robots in varying environments needs to be improved. For such incremental improvement, here we propose an incremental learning framework based on Q-learning and the adaptive kernel linear (AKL) model. The AKL model is used for storing behavioral policies that are learned by Q-learning. Both the structure and parameters of the AKL model can be trained using a novel L2-norm kernel recursive least squares (L2-KRLS) algorithm. The AKL model initially has no nodes and gradually accumulates content. The proposed framework allows learning new behaviors without forgetting the previous ones. A novel local ε-greedy policy is proposed to speed up the convergence rate of Q-learning. It calculates the exploration probability of each state for generating and selecting more important training samples. The performance of our incremental learning framework was validated in two experiments. A curve-fitting example shows that the L2-KRLS-based AKL model is suitable for incremental learning. The second experiment is based on robot learning tasks. The results show that our framework can incrementally learn behaviors in varying environments. Local ε-greedy policy-based Q-learning is faster than existing Q-learning algorithms.
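The local ε-greedy idea (a per-state exploration rate instead of a single global ε) is easy to sketch in tabular form. The 1/(1 + visits) schedule below is an assumption for illustration, not the paper's exact formula, and the kernel (AKL/L2-KRLS) part is omitted.

```python
import numpy as np
from collections import defaultdict

class LocalEpsilonGreedyQ:
    """Tabular Q-learning with a per-state exploration rate.

    Sketch of the 'local epsilon-greedy' idea: rarely visited states keep a
    higher exploration probability. The 1/(1 + visits) schedule is an
    illustrative assumption.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.q = defaultdict(lambda: np.zeros(n_actions))
        self.visits = defaultdict(int)
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma

    def act(self, state):
        eps = 1.0 / (1.0 + self.visits[state])          # state-dependent epsilon
        if np.random.rand() < eps:
            return np.random.randint(self.n_actions)    # explore
        return int(np.argmax(self.q[state]))            # exploit

    def update(self, s, a, r, s_next):
        self.visits[s] += 1
        target = r + self.gamma * np.max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```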

The brain as a communication network

John D. Mollon, Chie Takahashi, Marina V. Danilova, What kind of network is the brain? Trends in Cognitive Sciences, Volume 26, Issue 4, 2022, Pages 312-324 DOI: 10.1016/j.tics.2022.01.007.

The different areas of the cerebral cortex are linked by a network of white matter, comprising the myelinated axons of pyramidal cells. Is this network a neural net, in the sense that representations of the world are embodied in the structure of the net, its pattern of nodes, and connections? Or is it a communications network, where the same physical substrate carries different information from moment to moment? This question is part of the larger question of whether the brain is better modeled by connectionism or by symbolic artificial intelligence (AI), but we review it in the specific context of the psychophysics of stimulus comparison and the format and protocol of information transmission over the long-range tracts of the brain.

Adaptation of model-free RL to variations in the task under continuous state and action spaces applied to robot grasping

Shahid, A.A., Piga, D., Braghin, F. et al. Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning, Auton Robot 46, 483–498 (2022) DOI: 10.1007/s10514-022-10034-z.

This paper presents a learning-based method that uses simulation data to learn an object manipulation task using two model-free reinforcement learning (RL) algorithms. The learning performance is compared across on-policy and off-policy algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). In order to accelerate the learning process, the fine-tuning procedure is proposed that demonstrates the continuous adaptation of on-policy RL to new environments, allowing the learned policy to adapt and execute the (partially) modified task. A dense reward function is designed for the task to enable an efficient learning of the agent. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The learned control policy is demonstrated to be generalizable across multiple object geometries and initial robot/parts configurations. The approach is finally tested on a real Franka Emika Panda robot, showing the possibility to transfer the learned behavior from simulation. Experimental results show 100% of successful grasping tasks, making the proposed approach applicable to real applications.
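As an illustration of the kind of dense reward shaping the paper relies on, a reach-grasp-lift reward might combine a distance penalty with grasp and lift bonuses, roughly as below. The terms and weights are assumptions, not the ones used by the authors.

```python
import numpy as np

def dense_grasp_reward(ee_pos, obj_pos, gripper_closed, lifted,
                       w_dist=1.0, w_grasp=5.0, w_lift=10.0):
    """Shaped reward for a reach-grasp-lift task: distance penalty plus bonuses
    for closing the gripper on the object and for lifting it.

    Purely illustrative terms and weights; not the reward used in the paper."""
    dist = float(np.linalg.norm(np.asarray(ee_pos) - np.asarray(obj_pos)))
    reward = -w_dist * dist
    if gripper_closed and dist < 0.02:   # within ~2 cm: treat as a grasp
        reward += w_grasp
    if lifted:
        reward += w_lift
    return reward

# Example step: gripper closed about 1 cm from the object, not yet lifted
r = dense_grasp_reward([0.40, 0.00, 0.11], [0.40, 0.00, 0.10], True, False)
```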

New algorithms for outlier detection with applications in robotics

P. Antonante, V. Tzoumas, H. Yang and L. Carlone, Outlier-Robust Estimation: Hardness, Minimally Tuned Algorithms, and Applications, IEEE Transactions on Robotics, vol. 38, no. 1, pp. 281-301, Feb. 2022 DOI: 10.1109/TRO.2021.3094984.

Nonlinear estimation in robotics and vision is typically plagued with outliers due to wrong data association or incorrect detections from signal processing and machine learning methods. This article introduces two unifying formulations for outlier-robust estimation, generalized maximum consensus (G-MC) and generalized truncated least squares (G-TLS), and investigates fundamental limits, practical algorithms, and applications. Our first contribution is a proof that outlier-robust estimation is inapproximable: In the worst case, it is impossible to (even approximately) find the set of outliers, even with slower-than-polynomial-time algorithms (particularly, algorithms running in quasi-polynomial time). As a second contribution, we review and extend two general-purpose algorithms. The first, adaptive trimming (ADAPT), is combinatorial and is suitable for G-MC; the second, graduated nonconvexity (GNC), is based on homotopy methods and is suitable for G-TLS. We extend ADAPT and GNC to the case where the user does not have prior knowledge of the inlier-noise statistics (or the statistics may vary over time) and is unable to guess a reasonable threshold to separate inliers from outliers (as the one commonly used in RANdom SAmple Consensus (RANSAC)). We propose the first minimally tuned algorithms for outlier rejection, which dynamically decide how to separate inliers from outliers. Our third contribution is an evaluation of the proposed algorithms on robot perception problems: mesh registration, image-based object detection (shape alignment), and pose graph optimization. ADAPT and GNC execute in real time, are deterministic, outperform RANSAC, and are robust up to 80–90% outliers. Their minimally tuned versions also compare favorably with the state of the art, even though they do not rely on a noise bound for the inliers.
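The general flavour of adaptive trimming (fit, rank residuals, refit on the points below an adaptive threshold) can be sketched on a toy line-fitting problem as follows. This is not the ADAPT algorithm from the paper, whose thresholding rule and guarantees are more involved.

```python
import numpy as np

def trimmed_line_fit(x, y, iters=10, keep_quantile=0.9):
    """Outlier-robust line fit by iterative trimming: fit, rank residuals, and
    refit on the points below a residual quantile.

    Generic sketch of the trimming idea, not the ADAPT algorithm itself."""
    mask = np.ones(len(x), dtype=bool)
    coef = np.zeros(2)
    for _ in range(iters):
        A = np.vstack([x[mask], np.ones(mask.sum())]).T
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        residuals = np.abs(np.polyval(coef, x) - y)          # residuals on all points
        mask = residuals <= np.quantile(residuals[mask], keep_quantile)
    return coef, mask

# Toy data: a line with 30% gross outliers
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(100)
y[rng.choice(100, 30, replace=False)] += rng.uniform(2, 5, 30)
coef, inliers = trimmed_line_fit(x, y)
```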