A novel RL setting for non-Markovian systems

Ronen I. Brafman, Giuseppe De Giacomo, Regular decision processes, Artificial Intelligence, Volume 331, 2024 DOI: 10.1016/j.artint.2024.104113.

We introduce and study Regular Decision Processes (RDPs), a new, compact model for domains with non-Markovian dynamics and rewards, in which the dependence on the past is regular, in the language theoretic sense. RDPs are an intermediate model between MDPs and POMDPs. They generalize k-order MDPs and can be viewed as a POMDP in which the hidden state is a regular function of the entire history. In factored RDPs, transition and reward functions are specified using formulas in linear temporal logics over finite traces, or using regular expressions. This allows specifying complex dependence on the past using intuitive and compact formulas, and building models of partially observable domains without specifying an underlying state space.
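
To make "regular dependence on the past" concrete, here is a minimal sketch and not the paper's formalism: a toy reward that depends on the whole observation history, but only through the state of a small finite automaton. The observation names (`key`, `door`, `empty`), the automaton, and the functions `run_dfa` and `reward` are all hypothetical and chosen purely for illustration.

```python
# Hypothetical toy domain: the agent is rewarded for reaching a door,
# but only if a key was observed at some earlier point in the history.
# "A key has been seen so far" is a regular property of the history,
# so a two-state DFA is enough to track it.
DFA_TRANSITIONS = {
    ("no_key", "key"): "has_key",
    ("no_key", "door"): "no_key",
    ("no_key", "empty"): "no_key",
    ("has_key", "key"): "has_key",
    ("has_key", "door"): "has_key",
    ("has_key", "empty"): "has_key",
}


def run_dfa(history, start="no_key"):
    """Return the DFA state reached after reading the observation history."""
    state = start
    for obs in history:
        state = DFA_TRANSITIONS[(state, obs)]
    return state


def reward(history):
    """Non-Markovian reward: 1.0 if the latest observation is 'door'
    and the earlier history contains a 'key', otherwise 0.0."""
    if not history:
        return 0.0
    state_before = run_dfa(history[:-1])
    return 1.0 if history[-1] == "door" and state_before == "has_key" else 0.0


print(reward(["empty", "door"]))         # no key seen before the door
print(reward(["key", "empty", "door"]))  # key seen earlier in the history
```

In this sketch the automaton state plays the role of the hidden state mentioned in the abstract: it is never observed directly, yet it is fully determined by the history.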

The problem of incorporating novelties into the knowledge of an AI agent

Shivam Goel, Panagiotis Lymperopoulos, Ravenna Thielstrom, Evan Krause, Patrick Feeney, Pierrick Lorang, Sarah Schneider, Yichen Wei, Eric Kildebeck, Stephen Goss, Michael C. Hughes, Liping Liu, Jivko Sinapov, Matthias Scheutz, A neurosymbolic cognitive architecture framework for handling novelties in open worlds, Artificial Intelligence, Volume 331, 2024 DOI: 10.1016/j.artint.2024.104111.

“Open world” environments are those in which novel objects, agents, events, and more can appear and contradict previous understandings of the environment. This runs counter to the “closed world” assumption used in most AI research, where the environment is assumed to be fully understood and unchanging. The types of environments AI agents can be deployed in are limited by the inability to handle the novelties that occur in open world environments. This paper presents a novel cognitive architecture framework to handle open-world novelties. This framework combines symbolic planning, counterfactual reasoning, reinforcement learning, and deep computer vision to detect and accommodate novelties. We introduce general algorithms for exploring open worlds using inference and machine learning methodologies to facilitate novelty accommodation. The ability to detect and accommodate novelties allows agents built on this framework to successfully complete tasks despite a variety of novel changes to the world. Both the framework components and the entire system are evaluated in Minecraft-like simulated environments. Our results indicate that agents are able to efficiently complete tasks while accommodating “concealed novelties” not shared with the architecture development team.

Imitating physiological processes for achieving robot-human social interaction

Marcos Maroto-Gómez, Martín Bueno-Adrada, María Malfaz, Álvaro Castro-González, Miguel Ángel Salichs, Human–robot pair-bonding from a neuroendocrine perspective: Modeling the effect of oxytocin, arginine vasopressin, and dopamine on the social behavior of an autonomous robot, Robotics and Autonomous Systems, Volume 176, 2024 DOI: 10.1016/j.robot.2024.104687.

Robots and humans coexist in various social environments. In these contexts, robots predominantly serve as assistants, necessitating communication and understanding capabilities. This paper introduces a biologically inspired model grounded on neuroendocrine substances that facilitate the development of social bonds between robots and individuals. The model simulates the effects of oxytocin, arginine vasopressin, and dopamine on social behavior, acting as modulators for bonding in the interaction between the social robot Mini and its users. Neuroendocrine levels vary in response to circadian rhythms and social stimuli perceived by the robot. If users express care for the robot, a positive bond is established, enhancing human–robot interaction by prompting the robot to engage in cooperative actions such as playing or communicating more frequently. Conversely, mistreating the robot leads to a deterioration of the relationship, causing user rejection. An experimenter-robot interaction scenario illustrates the model’s adaptive mechanisms involving three types of profiles: Friendly, Aversive, and Naive. Besides, a user study with 22 participants was conducted to analyze the differences in Attachment, Social Presence, perceived Anthropomorphism, Likability, and User Experience between a robot randomly selecting its behavior and a robot behaving using the bioinspired pair-bonded method proposed in this contribution. The results show how the pair-bonding with the user regulates the robot’s social behavior in response to user actions. The user study reveals statistical differences favoring the robot using the pair-bonding regulation in Attachment and Social Presence. A qualitative study using an interview-like form suggests the positive effects of creating bonds with bioinspired robots.

Interesting test bed (physical platform plus simulators) for quadrotors

Júnio Santos Bulhões, Cristiane Lopes Martins, Cristian Hansen, Márcio Rodrigues da Cunha Reis, Alana da Silva Magalhães, Antonio Paulo Coimbra, Wesley Pacheco Calixto, Platform and simulator with three degrees of freedom for testing quadcopters, Robotics and Autonomous Systems, Volume 176, 2024 DOI: 10.1016/j.robot.2024.104682.

This study aims to design a test platform for quadcopters, which allows the execution of all rotational movements and prevents translational movements without affecting the dynamics of the system. The methodological approach involves both simulation and the construction of the test platform. Two simulators are developed: (i) a linear simulator, used to assist in determining control parameters, and (ii) a nonlinear simulator, used to model the nonlinearity inherent to the rotational behavior of aircraft. In addition, the control system for the quadcopter is implemented, utilizing proportional, integral, and derivative control principles. By conducting seven experiments on the test platform and in the nonlinear simulator, the obtained results are compared in order to validate the proposed methodology. The mean discrepancy observed between the mean absolute difference obtained by the test platform and by the nonlinear simulator for the angle ϕ was 0.85°, for the angle θ was 2.77°, and for the angle ψ was 4.66°. When analyzed separately, the mean absolute errors for the angles, using the nonlinear simulator and the test platform, showed differences below 2% in almost all evaluated experiments. The developed test platform preserves the rotational dynamics of the quadcopter as desired, closely approaching the results obtained by the nonlinear simulator. Consequently, this platform can be used to carry out practical tests in a controlled environment.
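
As a reminder of what proportional, integral, and derivative control looks like in discrete time, here is a minimal textbook-style sketch. It is not the controller from the paper: the class name `PID`, the gains, and the sampling period are assumptions chosen for illustration only.

```python
class PID:
    """Minimal discrete-time PID controller (illustrative, not the paper's)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0       # running integral of the error
        self.prev_error = None    # previous error, used for the derivative

    def update(self, setpoint, measurement, dt):
        """Return the control command for one sampling step of length dt."""
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the very first call (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and a 0.1 s sampling period for a single attitude angle.
pid = PID(kp=2.0, ki=0.5, kd=0.1)
u1 = pid.update(setpoint=1.0, measurement=0.0, dt=0.1)
u2 = pid.update(setpoint=1.0, measurement=0.5, dt=0.1)
print(u1, u2)
```

For a quadcopter, one such loop per rotational angle (ϕ, θ, ψ) would be a common arrangement, although the abstract does not say how the loops are structured.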

Interesting improvements in MC localization

Alireza Mohseni, Vincent Duchaine, Tony Wong, Improvement in Monte Carlo localization using information theory and statistical approaches, Engineering Applications of Artificial Intelligence, Volume 131, 2024 DOI: 10.1016/j.engappai.2024.107897.

Monte Carlo localization methods deploy a particle filter to resolve a hidden Markov process based on recursive Bayesian estimation, which approximates the internal states of a dynamic system given observation data. When the observed data are corrupted by outliers, the particle filter’s performance may deteriorate, preventing the algorithm from accurately computing dynamic system states such as a robot’s position, which in turn reduces the accuracy of the localization and navigation. In this paper, the notion of information entropy is used to identify outliers. Then, a probability-based approach is used to remove the discovered outliers. In addition, a new mutation process is added to the localization algorithm to exploit the posterior probability density function in order to actively detect the high-likelihood region. The goal of incorporating the mutation operator into this method is to solve the problem of algorithm impoverishment which is due to insufficient representation of the complete probability density function. Simulation experiments are used to confirm the effectiveness of the proposed techniques. They also are employed to predict the remaining viability of a lithium-ion battery. Furthermore, in an experimental study, the modified Monte Carlo localization algorithm was applied to a mobile robot to demonstrate the local planner’s improved accuracy. The test results indicate that developed techniques are capable of effectively capturing the dynamic behavior of a system and accurately tracking its characteristics.
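
The abstract does not spell out how entropy is used to identify outliers, so the following is only a generic illustration of information entropy in a particle filter, not the authors' method. It computes the Shannon entropy of the normalized particle weights, which is maximal when all particles are equally weighted and drops toward zero when a single particle dominates (a typical symptom of impoverishment). The function names are hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative particle weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def weight_entropy(weights):
    """Shannon entropy (in nats) of the normalized particle weights."""
    probs = normalize(weights)
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uniform = [1.0, 1.0, 1.0, 1.0]      # well-spread particle set
collapsed = [1.0, 0.0, 0.0, 0.0]    # all weight on one particle
print(weight_entropy(uniform), math.log(4))
print(weight_entropy(collapsed))
```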

What attention is (from a cognitive science point of view)

Wayne Wu, We know what attention is!, Trends in Cognitive Sciences, Volume 28, Issue 4, 2024 DOI: 10.1016/j.tics.2023.11.007.

Attention is one of the most thoroughly investigated psychological phenomena, yet skepticism about attention is widespread: we do not know what it is, it is too many things, there is no such thing. The deficiencies highlighted are not about experimental work but the adequacy of the scientific theory of attention. Combining common scientific claims about attention into a single theory leads to internal inconsistency. This paper demonstrates that a specific functional conception of attention is incorporated into the tasks used in standard experimental paradigms. In accepting these paradigms as valid probes of attention, we commit to this common conception. The conception unifies work at multiple levels of analysis into a coherent scientific explanation of attention. Thus, we all know what attention is.

On how the human brain tells mental imagery apart from reality

Rebecca Keogh, Reality check: how do we know what’s real?, Trends in Cognitive Sciences, Volume 28, Issue 4, 2024 DOI: 10.1016/j.tics.2023.06.001.

How do we know what is real and what is merely a figment of our imagination? Dijkstra and Fleming tackle this question in a recent study. In contrast to the classic Perky effect, they found that once imagery crosses a ‘reality threshold’, it becomes difficult to distinguish from reality.

Learning how to reset the episode in RL

S.-H. Lee and S.-W. Seo, Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning Without Task-Specific Knowledge, IEEE Robotics and Automation Letters, vol. 9, no. 5, pp. 4043-4050, May 2024 DOI: 10.1109/LRA.2024.3375714.

A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent’s learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent’s learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.

Networked differential-drive telerobot controlled remotely in spite of disturbances and delays

Luca Nanu, Luigi Colangelo, Carlo Novara, Carlos Perez Montenegro, Embedded model control of networked control systems: An experimental robotic application, Mechatronics, Volume 99, 2024 DOI: 10.1016/j.mechatronics.2024.103160.

In Networked Control System (NCS), the absence of physical communication links in the loop leads to relevant issues, such as measurement delays and asynchronous execution of the control commands. In general, these issues may significantly compromise the performance of the NCS, possibly causing unstable behaviours. This paper presents an original approach to the design of a complete digital control unit for a system characterized by a varying sampling time and asynchronous command execution. The approach is based on the Embedded Model Control (EMC) methodology, whose key feature is the estimation of the disturbances, errors and nonlinearities affecting the plant to control and their online cancellation. In this way, measurement delays and execution asynchronicity are treated as errors and rejected up to a given frequency by the EMC unit. The effectiveness of the proposed approach is demonstrated in a real-world case-study, where the NCS consists of a differential-drive mobile robot (the plant) and a control unit, and the two subsystems communicate through the web without physical connection links. After a preliminary verification using a high-fidelity numerical simulator, the designed controller is validated in several experimental tests, carried out on a real-time embedded system incorporated in the robotic platform.

Improving on the EKF and UKF through adaptive covariances when sensors of differing precision are used for localization

Giseo Park, Optimal vehicle position estimation using adaptive unscented Kalman filter based on sensor fusion, Mechatronics, Volume 99, 2024 DOI: 10.1016/j.mechatronics.2024.103144.

Precise position recognition systems are actively used in various automotive technology fields such as autonomous vehicles, intelligent transportation systems, and vehicle driving safety systems. In line with this demand, this paper proposes a new vehicle position estimation algorithm based on sensor fusion between low-cost standalone global positioning system (GPS) and inertial measurement unit (IMU) sensors. In order to estimate accurate vehicle position information using two complementary sensor types, adaptive unscented Kalman filter (AUKF), an optimal state estimation algorithm, is applied to the vehicle kinematic model. Since this AUKF includes an adaptive covariance matrix whose value changes under GPS outage conditions, it has high estimation robustness even if the accuracy of the GPS measurement signal is low. Through comparison of estimation errors with both extended Kalman filter (EKF) and UKF, which are widely used state estimation algorithms, it can be confirmed how improved the estimation performance of the proposed AUKF algorithm in real-vehicle experiments is. The given test course includes roads of various shapes as well as GPS outage sections, so it is suitable for evaluating vehicle position estimation performance.
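
The idea of an adaptive covariance can be illustrated with a much simpler filter than the one in the paper. The sketch below uses a scalar linear Kalman measurement update instead of an unscented filter, and inflates the measurement noise variance when the GPS is flagged as degraded. The function names, the `gps_degraded` flag, and the inflation factor are assumptions made for illustration; the abstract does not describe the actual adaptation rule.

```python
def adaptive_r(r_nominal, gps_degraded, inflation=100.0):
    """Return the measurement noise variance, inflated when GPS is degraded.

    The inflation factor is a hypothetical tuning value.
    """
    return r_nominal * inflation if gps_degraded else r_nominal


def kalman_update(x_prior, p_prior, z, r):
    """Scalar Kalman measurement update for a direct position measurement z."""
    k = p_prior / (p_prior + r)           # Kalman gain
    x_post = x_prior + k * (z - x_prior)  # corrected state estimate
    p_post = (1.0 - k) * p_prior          # corrected estimate variance
    return x_post, p_post


# Same prior and same GPS reading, with and without the degraded flag.
x_good, p_good = kalman_update(0.0, 1.0, 10.0, adaptive_r(1.0, False))
x_bad, p_bad = kalman_update(0.0, 1.0, 10.0, adaptive_r(1.0, True))
print(x_good, p_good)
print(x_bad, p_bad)
```

With the nominal variance the estimate moves halfway toward the GPS reading, whereas with the inflated variance it stays close to the prior, so the prediction (here standing in for the IMU-driven kinematic model) dominates while the GPS is unreliable.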