Modelling the perception of time in the human brain through RL with eligibility traces

I. Lourenço, R. Mattila, R. Ventura and B. Wahlberg, A Biologically Inspired Computational Model of Time Perception, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 258-268, June 2022 DOI: 10.1109/TCDS.2021.3120301.

Time perception, how humans and animals perceive the passage of time, forms the basis for important cognitive skills, such as decision making, planning, and communication. In this work, we propose a framework for examining the mechanisms responsible for time perception. We first model neural time perception as a combination of two known timing sources: internal neuronal mechanisms and external (environmental) stimuli, and design a decision-making framework to replicate them. We then implement this framework in a simulated robot. We measure the robot’s success on a temporal discrimination task originally performed by mice to evaluate their capacity to exploit temporal knowledge. We conclude that the robot is able to perceive time similarly to animals when it comes to their intrinsic mechanisms of interpreting time and performing time-aware actions. Next, by analyzing the behavior of agents equipped with the framework, we propose an estimator to infer characteristics of the timing mechanisms intrinsic to the agents. In particular, we show that from their empirical action probability distribution, we are able to estimate parameters used for perceiving time. Overall, our work shows promising results when it comes to drawing conclusions regarding some of the characteristics present in biological timing mechanisms.
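
The note heading mentions RL with eligibility traces, but the abstract does not spell out the update rule. As a generic illustration only (not the authors' model), a tabular TD(lambda) value update with accumulating traces could be sketched as follows. The function name td_lambda_episode, the 3-state chain, and all parameter values are assumptions made for this example.

```python
def td_lambda_episode(values, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one episode of tabular TD(lambda) with accumulating traces.

    values: list of state-value estimates, updated in place.
    transitions: list of (state, reward, next_state, done) tuples.
    """
    traces = [0.0] * len(values)
    for state, reward, next_state, done in transitions:
        # One-step TD target; terminal transitions do not bootstrap.
        target = reward + (0.0 if done else gamma * values[next_state])
        td_error = target - values[state]
        traces[state] += 1.0  # mark the visited state as eligible
        for s in range(len(values)):
            # Every state is updated in proportion to its trace,
            # then the trace decays by gamma * lambda.
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam
    return values


# Hypothetical 3-state chain: 0 -> 1 -> 2 (terminal), reward only on the last step.
values = td_lambda_episode(
    [0.0, 0.0, 0.0],
    [(0, 0.0, 1, False), (1, 1.0, 2, True)],
)
print(values)
```

In this toy run the decaying trace lets state 0 receive part of the credit for the reward obtained one step later, which is the temporal credit-assignment effect that makes traces relevant to timing models.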

NOTE: See also H. Basgol, I. Ayhan and E. Ugur, “Time Perception: A Review on Psychological, Computational, and Robotic Models,” in IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 301-315, June 2022, doi: 10.1109/TCDS.2021.3059045.

Dealing with exploration, with a nice introduction to the problem

Jiayi Lu, Shuai Han, Shuai Lü, Meng Kang, Junwei Zhang, Sampling diversity driven exploration with state difference guidance, Expert Systems with Applications, Volume 203, 2022 DOI: 10.1016/j.eswa.2022.117418.

Exploration is one of the key issues of deep reinforcement learning, especially in environments with sparse or deceptive rewards. Exploration based on intrinsic rewards can handle these environments. However, these methods cannot take both global interaction dynamics and local environment changes into account simultaneously. In this paper, we propose a novel intrinsic reward for off-policy learning, which not only encourages the agent to take actions not fully learned from a global perspective, but also instructs the agent to trigger remarkable changes in the environment from a local perspective. Meanwhile, we propose the double-actors–double-critics framework to combine intrinsic rewards with extrinsic rewards to avoid the inappropriate combination of intrinsic and extrinsic rewards in previous methods. This framework can be applied to off-policy learning algorithms based on the actor–critic method. We provide a comprehensive evaluation of our approach on the MuJoCo benchmark environments. The results demonstrate that our method can perform effective exploration in environments with dense, deceptive and sparse rewards. Besides, we conduct sufficient ablation and quantitative analyses of intrinsic rewards. Furthermore, we also verify the superiority and rationality of our double-actors–double-critics framework through comparative experiments.

Reconstructing indoor map layouts from geometrical data

Matteo Luperto, Francesco Amigoni, Reconstruction and prediction of the layout of indoor environments from two-dimensional metric maps, Engineering Applications of Artificial Intelligence, Volume 113, 2022 DOI: 10.1016/j.engappai.2022.104910.

Metric maps, like occupancy grids, are one of the most common ways to represent indoor environments in autonomous mobile robotics. Although they are effective for navigation and localization, metric maps contain little knowledge about the structure of the buildings they represent. In this paper, we propose a method that identifies the structure of indoor environments from 2D metric maps by retrieving their layout, namely an abstract geometrical representation that models walls as line segments and rooms as polygons. The method works by finding regularities within a building, abstracting from the possibly noisy information of the metric map, and uses such knowledge to reconstruct the layout of the observed part and to predict a possible layout of the partially observed portion of the building. Thus, unlike other state-of-the-art methods, our method can be applied both to fully observed environments and, most significantly, to partially observed ones. Experimental results show that our approach performs effectively and robustly on different types of input metric maps and that the predicted layout becomes increasingly accurate as the input metric map becomes more complete. The layout returned by our method can be exploited in several tasks, such as semantic mapping, place categorization, path planning, human–robot communication, and task allocation.

Increasing exploration when the agent performs worse and decreasing it when the agent performs better, in the context of DQN for distributing computation among cloud and edge servers; also deals with hybridizing RL with fuzzy logic

Do Bao Son, Ta Huu Binh, Hiep Khac Vo, Binh Minh Nguyen, Huynh Thi Thanh Binh, Shui Yu, Value-based reinforcement learning approaches for task offloading in Delay Constrained Vehicular Edge Computing, Engineering Applications of Artificial Intelligence, Volume 113, 2022 DOI: 10.1016/j.engappai.2022.104898.

In the age of booming information technology, humanity has witnessed the need for new paradigms with both high computational capability and low latency. A potential solution is Vehicular Edge Computing (VEC). Previous work proposed a Fuzzy Deep Q-Network in Offloading scheme (FDQO) that combines Fuzzy rules and Deep Q-Network (DQN) to improve DQN’s early performance by using a Fuzzy Controller (FC). However, we notice that frequent usage of the FC can hinder the future performance growth of the model. One way to overcome this issue is to remove the Fuzzy Controller entirely. We introduce an algorithm called baseline DQN (b-DQN), represented by its two variants Static baseline DQN (Sb-DQN) and Dynamic baseline DQN (Db-DQN), to modify the exploration rate based on the average rewards of the closest observations. Our findings confirm that these baseline DQN algorithms surpass traditional DQN models in terms of average Quality of Experience (QoE) over 100 time slots by about 6%, but still suffer from poor early performance (such as in the first 5 time slots). Here, we introduce baseline FDQO (b-FDQO). This algorithm modifies the Fuzzy Logic usage instead of removing it entirely, while still observing the rewards to modify the exploration rate. It brings a higher average QoE in the first 5 time slots compared to other non-fuzzy-logic algorithms by at least 55.12%, prevents the model from producing very poor results over all time slots, and achieves late performance as good as that of b-DQN.
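
The abstract says the exploration rate is modified according to the average reward of recent observations, without giving the rule. A minimal sketch of that general idea, assuming a simple additive step and a sliding window, is shown below. The class AdaptiveEpsilon, its parameters, and the comparison against the previous window average are hypothetical choices, not the paper's b-DQN definition.

```python
from collections import deque


class AdaptiveEpsilon:
    """Hypothetical exploration-rate schedule driven by recent average reward."""

    def __init__(self, epsilon=0.5, step=0.05, window=5, eps_min=0.01, eps_max=1.0):
        self.epsilon = epsilon
        self.step = step
        self.eps_min = eps_min
        self.eps_max = eps_max
        self.recent = deque(maxlen=window)  # sliding window of rewards
        self.baseline = None  # window average at the previous update

    def update(self, reward):
        """Record a reward and return the adjusted exploration rate."""
        self.recent.append(reward)
        average = sum(self.recent) / len(self.recent)
        if self.baseline is not None:
            if average < self.baseline:
                # Performing worse than before: explore more.
                self.epsilon = min(self.eps_max, self.epsilon + self.step)
            elif average > self.baseline:
                # Performing better than before: explore less.
                self.epsilon = max(self.eps_min, self.epsilon - self.step)
        self.baseline = average
        return self.epsilon


schedule = AdaptiveEpsilon(epsilon=0.5, step=0.1, window=2)
history = [schedule.update(r) for r in (1.0, 2.0, 0.0)]
print(history)
```

The returned value would be used as the epsilon of an epsilon-greedy DQN policy; a static versus dynamic baseline, as named in the abstract, would change how the reference average is maintained.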

Abstraction of continuous control problems considered as MDPs

H. G. Tanner and A. Stager, Data-Driven Abstractions for Robots With Stochastic Dynamics, IEEE Transactions on Robotics, vol. 38, no. 3, pp. 1686-1702, June 2022 DOI: 10.1109/TRO.2021.3119209.

This article describes the construction of stochastic, data-based discrete abstractions for uncertain random processes continuous in time and space. Motivated by the fact that modeling processes often introduce errors which interfere with the implementation of control strategies, here the abstraction process proceeds in reverse: the methodology does not abstract models; rather it models abstractions. Specifically, it first formalizes a template for a family of stochastic abstractions, and then fits the parameters of that template to match the dynamics of the underlying process and ground the abstraction. The article also shows how the parameter-fitting approach can be implemented based on a probabilistic model validation approach which draws from randomized algorithms, and results in a discrete abstract model which is approximately simulated by the actual process physics, at a desired confidence level. In this way, the models afford the implementation of symbolic control plans with probabilistic guarantees at a desired level of fidelity.

Continuous POMDPs through belief state sparsification, applied to active SLAM

Elimelech K, Indelman V. Simplified decision making in the belief space using belief sparsification. The International Journal of Robotics Research. 2022;41(5):470-496 DOI: 10.1177/02783649221076381.

In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space, over a possibly high-dimensional state space. Typically, to solve a decision problem, one should identify the optimal action from a set of candidates, according to some objective. We claim that one can often generate and solve an analogous yet simplified decision problem, which can be solved more efficiently. A wise simplification method can lead to the same action selection, or one for which the maximal loss in optimality can be guaranteed. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, as the selected action would finally be applied on the original state. First, we present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a realistic active-SLAM problem and manage to significantly reduce computation time, with no loss in the quality of solution. This work is both fundamental and practical and holds numerous possible extensions.

Hybridizing model-free and model-based in continuous RL, and a nice review of current research and benchmarks in robotics

Pinosky A, Abraham I, Broad A, Argall B, Murphey TD. Hybrid control for combining model-based and model-free reinforcement learning. The International Journal of Robotics Research. 2023;42(6):337-355 DOI: 10.1177/02783649221083331.

We develop an approach to improve the learning capabilities of robotic systems by combining learned predictive models with experience-based state-action policy mappings. Predictive models provide an understanding of the task and the dynamics, while experience-based (model-free) policy mappings encode favorable actions that override planned actions. We refer to our approach of systematically combining model-based and model-free learning methods as hybrid learning. Our approach efficiently learns motor skills and improves the performance of predictive models and experience-based policies. Moreover, our approach enables policies (both model-based and model-free) to be updated using any off-policy reinforcement learning method. We derive a deterministic method of hybrid learning by optimally switching between learning modalities. We adapt our method to a stochastic variation that relaxes some of the key assumptions in the original derivation. Our deterministic and stochastic variations are tested on a variety of robot control benchmark tasks in simulation as well as a hardware manipulation task. We extend our approach for use with imitation learning methods, where experience is provided through demonstrations, and we test the expanded capability with a real-world pick-and-place task. The results show that our method is capable of improving the performance and sample efficiency of learning motor skills in a variety of experimental domains.

How plans influence sensors

McFassel G, Shell DA. Reactivity and statefulness: Action-based sensors, plans, and necessary state. The International Journal of Robotics Research. 2023;42(6):385-411 DOI: 10.1177/02783649221078874.

Typically to a roboticist, a plan is the outcome of other work, a synthesized object that realizes ends defined by some problem; plans qua plans are seldom treated as first-class objects of study. Plans designate functionality: a plan can be viewed as defining a robot’s behavior throughout its execution. This informs and reveals many other aspects of the robot’s design, including: necessary sensors and action choices, history, state, task structure, and how to define progress. Interrogating sets of plans helps in comprehending the ways in which differing executions influence the interrelationships between these various aspects. Revisiting Erdmann’s theory of action-based sensors, a classical approach for characterizing fundamental information requirements, we show how plans (in their role of designating behavior) influence sensing requirements. Using an algorithm for enumerating plans, we examine how some plans for which no action-based sensor exists can be transformed into sets of sensors through the identification and handling of features that preclude the existence of action-based sensors. We are not aware of those obstructing features having been previously identified. Action-based sensors may be treated as standalone reactive plans; we relate them to the set of all possible plans through a lattice structure. This lattice reveals a boundary between plans with action-based sensors and those without. Some plans, specifically those that are not reactive plans and require some notion of internal state, can never have associated action-based sensors. Even so, action-based sensors can serve as a framework to explore and interpret how such plans make use of state.

POMDPs in robotics: QMDP-Net as a counterpart for the Partially Observable Markov Decision Process (POMDP) whose transition, observation, and reward functions are initially unknown

Collins N, Kurniawati H. Locally connected interrelated network: A forward propagation primitive. The International Journal of Robotics Research. 2023;42(6):371-384 DOI: 10.1177/02783649221093092.

End-to-end learning for planning is a promising approach for finding good robot strategies in situations where the state transition, observation, and reward functions are initially unknown. Many neural network architectures for this approach have shown positive results. Across these networks, seemingly small components have been used repeatedly in different architectures, which means improving the efficiency of these components has great potential to improve the overall performance of the network. This paper aims to improve one such component: the forward propagation module. In particular, we propose Locally Connected Interrelated Network (LCI-Net) – a novel type of locally connected layer with unshared but interrelated weights – to improve the efficiency of learning stochastic transition models for planning and propagating information via the learned transition models. LCI-Net is a small differentiable neural network module that can be plugged into various existing architectures. For evaluation purposes, we apply LCI-Net to VIN and QMDP-Net. VIN is an end-to-end neural network for solving Markov Decision Processes (MDPs) whose transition and reward functions are initially unknown, while QMDP-Net is its counterpart for the Partially Observable Markov Decision Process (POMDP) whose transition, observation, and reward functions are initially unknown. Simulation tests on benchmark problems involving 2D and 3D navigation and grasping indicate promising results: replacing only the forward propagation module with LCI-Net improves VIN’s and QMDP-Net’s generalisation capability by more than 3× and 10×, respectively.
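
QMDP-Net takes its name from the classic QMDP approximation for POMDPs, which scores each action by the belief-weighted Q-values of the underlying fully observable MDP. The abstract does not describe this step, so the following is only a generic sketch of QMDP action selection, not of LCI-Net or QMDP-Net themselves. The function name qmdp_action and the two-state example values are assumptions.

```python
def qmdp_action(belief, q_values):
    """Pick the action with the highest belief-weighted MDP Q-value.

    belief: list of state probabilities summing to 1.
    q_values: q_values[s][a] is the Q-value of action a in state s,
              computed for the fully observable MDP.
    """
    num_actions = len(q_values[0])
    scores = [
        sum(b * q_values[s][a] for s, b in enumerate(belief))
        for a in range(num_actions)
    ]
    best = max(range(num_actions), key=lambda a: scores[a])
    return best, scores


# Hypothetical two-state, two-action problem.
belief = [0.7, 0.3]
q_values = [[1.0, 0.0],
            [0.0, 2.0]]
action, scores = qmdp_action(belief, q_values)
print(action, scores)
```

In an end-to-end architecture the Q-values and the belief would themselves be produced by learned, differentiable modules, which is where a forward propagation primitive of the kind discussed in the abstract comes in.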

RL in manufacturing control

Vladimir Samsonov, Karim Ben Hicham, Tobias Meisen, Reinforcement Learning in Manufacturing Control: Baselines, challenges and ways forward, Engineering Applications of Artificial Intelligence, Volume 112, 2022 DOI: 10.1016/j.engappai.2022.104868.

The field of Neural Combinatorial Optimization (NCO) offers multiple learning-based approaches to solve well-known combinatorial optimization tasks such as Traveling Salesman or Knapsack problem capable of competing with classical optimization approaches in terms of both solution quality and speed. This brought the attention of the research community to the tasks of Manufacturing Control (MC) with combinatorial nature. In this paper we outline the main components of MC tasks, select the most promising application fields and analyze dedicated learning-based solutions available in the literature. We draw multiple parallels to the current state of the art in the NCO field and allocate the main research gaps and directions on the perception, cognition and interaction levels. Using a set of practical examples we implement and benchmark common design patterns for single-agent Reinforcement Learning (RL) solutions. Along with testing existing solutions, we build on the ranked reward idea (Laterre et al., 2018) and offer a novel Multi-Instance Ranked Reward (m-R2) approach tailored to MC optimization tasks. It minimizes the reward shaping effort and defines a suitable training curriculum for more stable learning by separately tracking the agent’s performance on every scheduling task and rewarding only policies contributing towards better scheduling solutions. We implement all solution design patterns as a set of interchangeable modules with a shared API, unified in a benchmarking framework with the focus on standardization of training and evaluation processes, reproducibility and simplified experiment lifecycle management. In addition to the framework, we make available our discrete-event simulation of a job shop production.
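
The abstract describes tracking the agent's performance separately for every scheduling task and rewarding only results that improve on it. One way this could look, assuming a binary reward against a per-instance percentile threshold of recent makespans, is sketched below. The class MultiInstanceRankedReward, the makespan metric, the buffer size, the percentile, and the tie and first-result handling are all assumptions for illustration, not the paper's m-R2 definition.

```python
from collections import defaultdict, deque


class MultiInstanceRankedReward:
    """Hypothetical per-instance ranked reward: +1 if a result is at least as
    good as the instance's own recent threshold, otherwise -1."""

    def __init__(self, buffer_size=100, percentile=0.75):
        self.percentile = percentile
        # One bounded history of past makespans per scheduling instance.
        self.history = defaultdict(lambda: deque(maxlen=buffer_size))

    def reward(self, instance_id, makespan):
        # Sort past results from worst (largest makespan) to best.
        past = sorted(self.history[instance_id], reverse=True)
        self.history[instance_id].append(makespan)
        if not past:
            return 1.0  # assumption: the first result on an instance is rewarded
        index = min(len(past) - 1, int(self.percentile * len(past)))
        threshold = past[index]
        # Lower makespan is better; ties count as success in this sketch.
        return 1.0 if makespan <= threshold else -1.0


ranker = MultiInstanceRankedReward(buffer_size=10, percentile=0.5)
rewards = [
    ranker.reward("job_a", 20),
    ranker.reward("job_a", 25),
    ranker.reward("job_a", 18),
    ranker.reward("job_b", 100),
]
print(rewards)
```

Because each instance is compared only against its own history, a hard instance with a large makespan is not penalized relative to an easy one, which is the motivation for tracking instances separately.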

Also:

Zhihao Liu, Quan Liu, Wenjun Xu, Lihui Wang, Zude Zhou, Robot learning towards smart robotic manufacturing: A review, Robotics and Computer-Integrated Manufacturing, Volume 77, 2022, 102360, ISSN 0736-5845 DOI: 10.1016/j.rcim.2022.102360.