Bayesian estimation of the model in model-based RL for robots

Kei Senda, Toru Hishinuma, Yurika Tani, Approximate Bayesian reinforcement learning based on estimation of plant, Autonomous Robots 44(5), DOI: 10.1007/s10514-020-09901-4.

This study proposes an approximate parametric model-based Bayesian reinforcement learning approach for robots, based on online Bayesian estimation and online planning for an estimated model. The proposed approach is designed to learn a robotic task with a few real-world samples and to be robust against model uncertainty, within feasible computational resources. The proposed approach employs two-stage modeling, which is composed of (1) a parametric differential equation model with a few parameters based on prior knowledge such as equations of motion, and (2) a parametric model that interpolates a finite number of transition probability models for online estimation and planning. The proposed approach modifies the online Bayesian estimation to be robust against approximation errors of the parametric model to a real plant. The policy planned for the interpolating model is proven to have a form of theoretical robustness. Numerical simulation and hardware experiments of a planar peg-in-hole task demonstrate the effectiveness of the proposed approach.
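The core loop described above can be sketched as follows: keep a Bayesian posterior over a finite set of candidate transition models, and plan on a model interpolated from them. The sketch rests on our own assumptions. The models are discrete next-state distributions, the interpolation is a posterior-weighted mixture, and the optional `floor` (uniform mass re-added after each update) only stands in for the paper's robustness modification, whose exact form is not given in the abstract. All function and model names are hypothetical.

```python
def posterior_update(prior, likelihoods, floor=0.0):
    """One Bayes step over a finite set of candidate transition models.

    prior[i]:       current belief P(model i)
    likelihoods[i]: p(observed transition | model i)
    floor:          probability mass spread uniformly after the update
                    (an assumed robustness tweak, not the paper's modification)
    """
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero likelihood under every model")
    n = len(unnorm)
    return [(1.0 - floor) * u / z + floor / n for u in unnorm]


def interpolated_transition(posterior, models, s, a):
    """Posterior-weighted mixture of the candidates' next-state distributions.

    models[i](s, a) must return a dict {next_state: probability}.
    """
    mix = {}
    for w, model in zip(posterior, models):
        for s_next, p in model(s, a).items():
            mix[s_next] = mix.get(s_next, 0.0) + w * p
    return mix


# Two hypothetical 1-D plants that differ in how reliably an action moves the state.
def reliable(s, a):
    return {s + a: 0.9, s: 0.1}


def sloppy(s, a):
    return {s + a: 0.5, s: 0.5}


belief = posterior_update([0.5, 0.5], [0.9, 0.5])  # observed s -> s + a
mix = interpolated_transition(belief, [reliable, sloppy], 0, 1)
```

A small `floor` keeps the posterior from collapsing onto a single candidate when none of them matches the real plant exactly, which is the kind of approximation error the abstract is concerned with.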

Adapting the resolution of depth sensors and the location of the high-resolution area (fovea) as a possible attention mechanism in robots

Tasneem Z, Adhivarahan C, Wang D, Xie H, Dantu K, Koppal SJ., Adaptive fovea for scanning depth sensors, The International Journal of Robotics Research. 2020;39(7):837-855, DOI: 10.1177/0278364920920931.

Depth sensors have been used extensively for perception in robotics. Typically these sensors have a fixed angular resolution and field of view (FOV). This is in contrast to human perception, which involves foveating: scanning with the eyes’ highest angular resolution over regions of interest (ROIs). We build a scanning depth sensor that can control its angular resolution over the FOV. This opens up new directions for robotics research, because many algorithms in localization, mapping, exploration, and manipulation make implicit assumptions about the fixed resolution of a depth sensor, impacting latency, energy efficiency, and accuracy. Our algorithms increase resolution in ROIs either through deconvolutions or intelligent sample distribution across the FOV. The areas of high resolution in the sensor FOV act as artificial fovea and we adaptively vary the fovea locations to maximize a well-known information theoretic measure. We demonstrate novel applications such as adaptive time-of-flight (TOF) sensing, LiDAR zoom, gradient-based LiDAR sensing, and energy-efficient LiDAR scanning. As a proof of concept, we mount the sensor on a ground robot platform, showing how to reduce robot motion to obtain a desired scanning resolution. We also present a ROS wrapper for active simulation for our novel sensor in Gazebo. Finally, we provide extensive empirical analysis of all our algorithms, demonstrating trade-offs between time, resolution and stand-off distance.
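The abstract says only that fovea locations are chosen to maximize "a well-known information theoretic measure". As an assumption, the sketch below uses the Shannon entropy of a 1-D row of occupancy probabilities across the FOV. It slides a fixed-width fovea to the window with the highest total entropy, which is the most uncertain region. The occupancy values and function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of a cell that is occupied with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def best_fovea(occupancy, width):
    """Slide a fovea of `width` angular cells over the FOV and return
    (start index, total entropy) of the most uncertain window."""
    if not 0 < width <= len(occupancy):
        raise ValueError("width must be between 1 and the number of cells")
    h = [binary_entropy(p) for p in occupancy]
    window = sum(h[:width])
    best_start, best_val = 0, window
    for start in range(1, len(h) - width + 1):
        window += h[start + width - 1] - h[start - 1]  # O(1) sliding-window update
        if window > best_val:
            best_start, best_val = start, window
    return best_start, best_val


# Hypothetical occupancy estimates for 8 angular cells across the FOV.
occupancy = [0.0, 0.01, 0.5, 0.5, 0.5, 0.99, 1.0, 0.0]
start, total = best_fovea(occupancy, 3)
```

Here the fovea lands on cells 2 to 4, where occupancy is most uncertain. In a scanning sensor, the sample budget would then be concentrated there and the rest of the FOV sampled coarsely.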

Interesting review of psychological motivation and the role of RL in studying it

Randall C. O’Reilly, Unraveling the Mysteries of Motivation, Trends in Cognitive Sciences, Volume 24, Issue 6, 2020, Pages 425-434, DOI: 10.1016/j.tics.2020.03.001.

Motivation plays a central role in human behavior and cognition but is not well captured by widely used artificial intelligence (AI) and computational modeling frameworks. This Opinion article addresses two central questions regarding the nature of motivation: what are the nature and dynamics of the internal goals that drive our motivational system and how can this system be sufficiently flexible to support our ability to rapidly adapt to novel situations, tasks, etc.? In reviewing existing systems and neuroscience research and theorizing on these questions, a wealth of insights to constrain the development of computational models of motivation can be found.

Path planning by merging random sampling (RRT) with informed heuristics (A*)

Jonathan D Gammell, Timothy D Barfoot, Siddhartha S Srinivasa, Batch Informed Trees (BIT*): Informed asymptotically optimal anytime search, The International Journal of Robotics Research. 2020;39(5):543-567, DOI: 10.1177/0278364919890396.

Path planning in robotics often requires finding high-quality solutions to continuously valued and/or high-dimensional problems. These problems are challenging and most planning algorithms instead solve simplified approximations. Popular approximations include graphs and random samples, as used by informed graph-based searches and anytime sampling-based planners, respectively.

Informed graph-based searches, such as A*, traditionally use heuristics to search a priori graphs in order of potential solution quality. This makes their search efficient, but leaves their performance dependent on the chosen approximation. If the resolution of the chosen approximation is too low, then they may not find a (suitable) solution, but if it is too high, then they may take a prohibitively long time to do so.

Anytime sampling-based planners, such as RRT*, traditionally use random sampling to approximate the problem domain incrementally. This allows them to increase resolution until a suitable solution is found, but makes their search dependent on the order of approximation. Arbitrary sequences of random samples approximate the problem domain in every direction simultaneously, but may be prohibitively inefficient at containing a solution.

This article unifies and extends these two approaches to develop Batch Informed Trees (BIT*), an informed, anytime sampling-based planner. BIT* solves continuous path planning problems efficiently by using sampling and heuristics to alternately approximate and search the problem domain. Its search is ordered by potential solution quality, as in A*, and its approximation improves indefinitely with additional computational time, as in RRT*. It is shown analytically to be almost-surely asymptotically optimal and experimentally to outperform existing sampling-based planners, especially on high-dimensional planning problems.
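Two ingredients that BIT* combines can be shown in isolation. The first restricts each batch of samples to the informed set, the ellipse of points that could still improve the current best cost `c_best`. The second orders candidate edges A*-style by the estimated cost of a solution through them. This is only a sketch for a 2-D Euclidean problem without obstacles, and the function names are our own. Actual BIT* samples the ellipse directly, checks collisions lazily, prunes the tree, and processes its vertex and edge queues incrementally.

```python
import heapq
import math
import random


def informed_batch(start, goal, c_best, n, bounds, rng):
    """Draw n samples from the informed set {x : |x-start| + |x-goal| < c_best}
    by rejection from an axis-aligned box (simple but wasteful).
    Use c_best = math.inf before the first solution is found."""
    if c_best <= math.dist(start, goal):
        raise ValueError("c_best must exceed the straight-line distance")
    (xmin, xmax), (ymin, ymax) = bounds
    batch = []
    while len(batch) < n:
        x = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if math.dist(start, x) + math.dist(x, goal) < c_best:
            batch.append(x)
    return batch


def edge_queue(tree_cost, samples, goal, radius):
    """Candidate edges v -> x ordered by g(v) + c_hat(v, x) + h_hat(x),
    using Euclidean distance as the admissible edge-cost and cost-to-go heuristic."""
    queue = []
    for v, g in tree_cost.items():
        for x in samples:
            c_hat = math.dist(v, x)
            if c_hat <= radius:
                heapq.heappush(queue, (g + c_hat + math.dist(x, goal), v, x))
    return queue


start, goal = (0.0, 0.0), (10.0, 0.0)
rng = random.Random(0)
batch = informed_batch(start, goal, 12.0, 50, ((-1.0, 11.0), (-4.0, 4.0)), rng)
queue = edge_queue({start: 0.0}, batch, goal, radius=20.0)
order = [heapq.heappop(queue)[0] for _ in range(len(queue))]
```

Because the estimates never overestimate, the edges are popped in nondecreasing order of potential solution cost. This is the A*-like ordering that the abstract attributes to BIT*.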

Consciousness as a learning framework

Axel Cleeremans, Dalila Achoui, Arnaud Beauny, Lars Keuninckx, Jean-Remy Martin, Santiago Muñoz-Moldes, Laurène Vuillaume, Adélaïde de Heering, Learning to Be Conscious, Trends in Cognitive Sciences, Volume 24, Issue 2, 2020, Pages 112-123, DOI: 10.1016/j.tics.2019.11.011.

Consciousness remains a formidable challenge. Different theories of consciousness have proposed vastly different mechanisms to account for phenomenal experience. Here, appealing to aspects of global workspace theory, higher-order theories, social theories, and predictive processing, we introduce a novel framework: the self-organizing metarepresentational account (SOMA), in which consciousness is viewed as something that the brain learns to do. By this account, the brain continuously and unconsciously learns to redescribe its own activity to itself, so developing systems of metarepresentations that qualify target first-order representations. Thus, experiences only occur in experiencers that have learned to know they possess certain first-order states and that have learned to care more about certain states than about others. In this sense, consciousness is the brain’s (unconscious, embodied, enactive, nonconceptual) theory about itself.

Including the models into the state of a POMDP for learning them (using POMCPs in a robotic application)

Akinobu Hayashi, Dirk Ruiken, Tadaaki Hasegawa, Christian Goerick, Reasoning about uncertain parameters and agent behaviors through encoded experiences and belief planning, Artificial Intelligence, Volume 280, 2020, DOI: 10.1016/j.artint.2019.103228.

Robots are expected to handle increasingly complex tasks. Such tasks often include interaction with objects or collaboration with other agents. One of the key challenges for reasoning in such situations is the lack of accurate models that hinders the effectiveness of planners. We present a system for online model adaptation that continuously validates and improves models while solving tasks with a belief space planner. We employ the well known online belief planner POMCP. Particles are used to represent hypotheses about the current state and about models of the world. They are sufficient to configure a simulator to provide transition and observation models. We propose an enhanced particle reinvigoration process that leverages prior experiences encoded in a recurrent neural network (RNN). The network is trained through interaction with a large variety of object and agent parametrizations. The RNN is combined with a mixture density network (MDN) to process the current history of observations in order to propose suitable particles and models parametrizations. The proposed method also ensures that newly generated particles are consistent with the current history. These enhancements to the particle reinvigoration process help alleviate problems arising from poor sampling quality in large state spaces and enable handling of dynamics with discontinuities. The proposed approach can be applied to a variety of domains depending on what uncertainty the decision maker needs to reason about. We evaluate the approach with experiments in several domains and compare against other state-of-the-art methods. Experiments are done in a collaborative multi-agent and a single agent object manipulation domain. The experiments are performed both in simulation and on a real robot. The framework handles reasoning with uncertain agent behaviors and with unknown object and environment parametrizations well. The results show good performance and indicate that the proposed approach can improve existing state-of-the-art methods.
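The abstract lets particles carry both a state hypothesis and a model hypothesis, so the filter that tracks the state also learns the model. A toy 1-D pushing simulator illustrates the idea. The simulator, the parameter `k`, the rounded-position observation, and the Gaussian jitter used for reinvigoration are all hypothetical. The paper instead proposes new particles with an RNN plus mixture density network conditioned on the observation history and checks that they are consistent with that history. This naive jitter does neither.

```python
import random


def simulate(state, action, k, rng):
    """Toy simulator configured by a particle's model parameter k (hypothetical):
    a pushed object moves k * action plus noise; we observe its rounded position."""
    nxt = state + k * action + rng.gauss(0.0, 0.05)
    return nxt, round(nxt)


def update_belief(particles, action, obs, rng, n_target=200, jitter=0.02):
    """POMCP-style rejection update over joint (state, model-parameter) particles,
    followed by a naive reinvigoration step."""
    survivors = []
    for state, k in particles:
        nxt, o = simulate(state, action, k, rng)
        if o == obs:  # keep only hypotheses that reproduce the real observation
            survivors.append((nxt, k))
    if not survivors:
        raise RuntimeError("particle deprivation: no particle explains the observation")
    if len(survivors) > n_target:
        return rng.sample(survivors, n_target)
    new = list(survivors)
    while len(new) < n_target:  # reinvigorate: jitter the parameter of a survivor
        state, k = rng.choice(survivors)
        new.append((state, min(1.0, max(0.01, k + rng.gauss(0.0, jitter)))))
    return new


rng = random.Random(42)   # planner's random stream
world = random.Random(7)  # stands in for the real system
particles = [(0.0, rng.uniform(0.01, 1.0)) for _ in range(200)]
true_state = 0.0
for _ in range(5):
    true_state, obs = simulate(true_state, 10.0, 0.3, world)  # true k = 0.3
    particles = update_belief(particles, 10.0, obs, rng)
k_mean = sum(k for _, k in particles) / len(particles)
```

After a few pushes the parameter hypotheses concentrate near the true value. Inside POMCP, each surviving particle would also configure the simulator used for the rollouts.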

A possibly interesting paper on adapting EKF-SLAM to the actual system and noise models by estimating them online; I have been unable to read it due to its painful syntax

Yingzhong Tian, Heru Suwoyo, Wenbin Wang, Dziki Mbemba, Long Li, An AEKF-SLAM Algorithm with Recursive Noise Statistic Based on MLE and EM, Journal of Intelligent & Robotic Systems (2020) 97:339–355, DOI: 10.1007/s10846-019-01044-8.

Extended Kalman Filter (EKF) has been popularly utilized for solving Simultaneous Localization and Mapping (SLAM) problem. Essentially, it requires the accurate system model and known noise statistic. Nevertheless, this condition can be satisfied in simulation case. Hence, EKF has to be enhanced when it is applied in the real-application. Mainly, this improvement is known as adaptive-based approach. In many different cases, it is indicated by some manners of estimating for either part or full noise statistic. This paper present a proposed method based on the adaptive-based solution used for improving classical EKF namely An Adaptive Extended Kalman Filter. Initially, the classical EKF was improved based on Maximum Likelihood Estimation (MLE) and Expectation-Maximization (EM) Creation. It aims to equips the conventional EKF with ability of approximating noise statistic and its covariance matrices recursively. Moreover, EKF was modified and improved to tune the estimated values given by MLE and EM creation. Besides that, the recursive noise statistic estimators were also estimated based on the unbiased estimation. Although it results high quality solution but it is followed with some risks of non-positive definite matrices of the process and measurement noise statistic covariances. Thus, an addition of Innovation Covariance Estimation (ICE) was also utilized to depress this possibilities. The proposed method is applied for solving SLAM problem of autonomous wheeled mobile robot. Henceforth, it is termed as AEKF-SLAM Algorithm. In order to validate the effectiveness of proposed method, some different SLAM-Based algorithm were compared and analyzed. The different simulation has been showing that the proposed method has better stability and accuracy compared to the conventional filter in term of Root Mean Square Error (RMSE) of Estimated Map Coordinate (EMC) and Estimated Path Coordinate (EPC).
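The paper's MLE/EM-based estimators are not reproduced here. The sketch illustrates only the general adaptive-filter idea: estimate noise statistics recursively from the innovations while keeping the covariance positive. It uses a scalar random-walk Kalman filter that updates its measurement-noise variance `R` from the identity E[nu^2] = P_pred + R. This is a Sage-Husa-style estimator swapped in for the paper's method. A simple floor stands in for the role the abstract assigns to Innovation Covariance Estimation. The function name and the simulated data are hypothetical.

```python
import random


def adaptive_kf(zs, q, r0, x0=0.0, p0=1.0, r_min=1e-6):
    """Scalar random-walk Kalman filter that re-estimates the measurement-noise
    variance R from innovations: E[nu^2] = P_pred + R, so R ~ mean(nu^2 - P_pred)."""
    x, p, r = x0, p0, r0
    xs = []
    for k, z in enumerate(zs, start=1):
        p_pred = p + q                           # predict (state transition = identity)
        nu = z - x                               # innovation
        r += ((nu * nu - p_pred) - r) / (k + 1)  # running mean; r0 counts as one sample
        r = max(r, r_min)                        # keep the variance positive
        g = p_pred / (p_pred + r)                # Kalman gain
        x += g * nu
        p = (1.0 - g) * p_pred
        xs.append(x)
    return xs, r


rng = random.Random(1)
truth, zs, xs_true = 0.0, [], []
for _ in range(5000):
    truth += rng.gauss(0.0, 0.1)              # process noise, Q = 0.01
    xs_true.append(truth)
    zs.append(truth + rng.gauss(0.0, 0.5))    # measurement noise, R = 0.25
est, r_hat = adaptive_kf(zs, q=0.01, r0=2.0)  # start from a badly wrong R guess
```

Starting from a guess eight times too large, the estimate of `R` settles near the true 0.25. In a SLAM setting the same bookkeeping would run on the innovation covariance matrices instead of scalars.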

Application of Deep RL to person following by a robot, reducing the training effort of the network by reusing simple state situations in many artificially generated states

Pang, L., Zhang, Y., Coleman, S. et al., Efficient Hybrid-Supervised Deep Reinforcement Learning for Person Following Robot, J Intell Robot Syst 97, 299–312 (2020), DOI: 10.1007/s10846-019-01030-0.

Traditional person following robots usually need hand-crafted features and a well-designed controller to follow the assigned person. Normally it is difficult to be applied in outdoor situations due to variability and complexity of the environment. In this paper, we propose an approach in which an agent is trained by hybrid-supervised deep reinforcement learning (DRL) to perform a person following task in end-to-end manner. The approach enables the robot to learn features autonomously from monocular images and to enhance performance via robot-environment interaction. Experiments show that the proposed approach is adaptive to complex situations with significant illumination variation, object occlusion, target disappearance, pose change, and pedestrian interference. In order to speed up the training process to ensure easy application of DRL to real-world robotic follower controls, we apply an integration method through which the agent receives prior knowledge from a supervised learning (SL) policy network and reinforces its performance with a value-based or policy-based (including actor-critic method) DRL model. We also utilize an efficient data collection approach for supervised learning in the context of person following. Experimental results not only verify the robustness of the proposed DRL-based person following robot system, but also indicate how easily the robot can learn from mistakes and improve performance.
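A deliberately tiny stand-in illustrates the "supervised prior, then reinforcement" integration. A tabular softmax policy over five hypothetical bearing bins and three steering actions replaces the networks. A hand-coded `teacher`, wrong in one bin, replaces the labelled person-following data. Plain REINFORCE on a one-step task replaces the paper's value- or policy-based DRL models. Every name and number here is an assumption for illustration.

```python
import math
import random

N_STATES, N_ACTIONS = 5, 3  # bearing bins (far left .. far right); left / straight / right


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def optimal(s):  # hypothetical ground truth, used only to define the reward
    return 0 if s < 2 else (1 if s == 2 else 2)


def teacher(s):  # imperfect demonstrator: says "straight" in bin 1
    return 1 if s == 1 else optimal(s)


def pretrain(theta, epochs=50, lr=0.5):
    """Supervised stage: cross-entropy gradient steps toward the teacher's labels."""
    for _ in range(epochs):
        for s in range(N_STATES):
            pi = softmax(theta[s])
            for a in range(N_ACTIONS):
                theta[s][a] += lr * (float(a == teacher(s)) - pi[a])


def reinforce(theta, rng, iters=20000, lr=0.5):
    """RL stage: REINFORCE on a one-step task, reward 1 for the correcting action."""
    for _ in range(iters):
        s = rng.randrange(N_STATES)
        pi = softmax(theta[s])
        a = rng.choices(range(N_ACTIONS), weights=pi)[0]
        r = 1.0 if a == optimal(s) else 0.0
        for b in range(N_ACTIONS):
            theta[s][b] += lr * r * (float(b == a) - pi[b])


def greedy(theta):
    return [max(range(N_ACTIONS), key=lambda a: theta[s][a]) for s in range(N_STATES)]


theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
pretrain(theta)
after_sl = greedy(theta)
reinforce(theta, random.Random(0))
after_rl = greedy(theta)
```

The supervised stage makes the greedy policy copy the teacher, including its mistake in bin 1. The RL stage then corrects that bin from reward alone, roughly the sense in which prior knowledge speeds up learning without capping it.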

A particular application of quick detection of changes in a signal: detecting changes of voltage regimes in the electric distribution network

D. Macii and D. Petri, Rapid Voltage Change Detection: Limits of the IEC Standard Approach and Possible Solutions, IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 2, pp. 382-392, Feb. 2020, DOI: 10.1109/TIM.2019.2903617.

Rapid voltage changes (RVCs) are power quality (PQ) events characterized by small and fast transitions between two steady-state root-mean-square (rms) voltage levels. RVCs occur quite often at the distribution level and are expected to be even more frequent in the future due to the increasing penetration of dynamic loads and renewable-based generators in the smart grid. Unlike other PQ events, RVCs are less critical, but also more difficult to detect than dips/sags and swells, due to their smaller voltage variations. Nevertheless, they can be harmful to generators’ control systems and electronic equipment in general. Moreover, they strongly affect flicker. The IEC Standard 61000-4-30:2015 clearly describes an algorithm for RVC detection. However, this approach is poorly characterized in the scientific literature. In fact, it suffers from some drawbacks. In this paper, some of them (e.g., rate-dependent detection limits and detection delays) are analyzed in depth. In addition, an alternative approach based on the estimation of the rate of change of rms voltage is proposed. Multiple simulation results show that the approach considered is more sensitive to noise, but also faster, especially when not so fast RVCs occur. Moreover, it allows measuring the rate of change of rms voltage, which is currently disregarded in the IEC Standard.
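A rough sketch of the two ideas contrasted in the abstract follows. One is a mean-based detector that compares each rms value with the average of the preceding ones. The other estimates the rate of change of rms voltage. The parameters (one-cycle rms refreshed every half cycle, a 100-value mean, a 3 % threshold, a 5-point least-squares slope) are illustrative assumptions loosely inspired by the IEC approach, not a conformant implementation of the standard. The function names are hypothetical.

```python
import math


def rms_values(v, spc):
    """One-cycle rms values refreshed every half cycle (spc = samples per cycle)."""
    half = spc // 2
    return [math.sqrt(sum(x * x for x in v[i:i + spc]) / spc)
            for i in range(0, len(v) - spc + 1, half)]


def detect_rvc(urms, n_mean, threshold):
    """Index of the first rms value that deviates from the arithmetic mean of the
    previous n_mean values by more than threshold (volts), or None."""
    for i in range(n_mean, len(urms)):
        mean = sum(urms[i - n_mean:i]) / n_mean
        if abs(urms[i] - mean) > threshold:
            return i
    return None


def rms_rate(urms, i, width, dt):
    """Least-squares slope (V/s) of the `width` rms values ending at index i."""
    ys = urms[i - width + 1:i + 1]
    ts = [k * dt for k in range(width)]
    t_mean, y_mean = sum(ts) / width, sum(ys) / width
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


F, SPC = 50.0, 128  # 50 Hz mains, 128 samples per cycle
FS = F * SPC        # 6400 Hz sampling rate


def envelope(t):    # 230 V rms with a +5 % linear ramp between 1.5 s and 1.6 s
    return 230.0 * (1.0 + 0.05 * min(1.0, max(0.0, (t - 1.5) / 0.1)))


v = [math.sqrt(2) * envelope(n / FS) * math.sin(2 * math.pi * F * n / FS)
     for n in range(int(2.5 * FS))]
urms = rms_values(v, SPC)
idx = detect_rvc(urms, n_mean=100, threshold=0.03 * 230.0)
rate = rms_rate(urms, idx, width=5, dt=1 / (2 * F))
```

The mean-based detector fires only once the ramp has accumulated most of the threshold, whereas the slope estimate (about 115 V/s for this ramp) is available while the change is still in progress. This hints at the detection-delay trade-off the paper analyzes.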

Estimating parameters of periodic signals that are sampled with just two levels (0/1) in magnitude

P. Carbone, J. Schoukens and A. Moschitta, Quick Estimation of Periodic Signal Parameters From 1-Bit Measurements, IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 2, pp. 339-353, Feb. 2020, DOI: 10.1109/TIM.2019.2902023.

Estimation of periodic signals, based on quantized data, is a topic of general interest in the area of instrumentation and measurement. Although several methods are available, new applications require low-power, low-complexity, and adequate estimation accuracy. In this paper, we consider the simplest possible quantization, that is, binary quantization, and describe a technique to estimate the parameters of a sampled periodic signal, using a fast algorithm. By neglecting the possibility that the sampling process is triggered by some signal-derived event, sampling is assumed to be asynchronous, that is, the ratio between the signal and the sampling periods is defined to be an irrational number. To preserve enough information at the quantizer output, additive Gaussian input noise is assumed as the information encoding mechanism. With respect to the published techniques addressing the same problem, the proposed approach does not rely on the numerical estimation of the maximum likelihood function but provides solutions that are very close to this estimate. At the same time, since the main estimator is based on matrix inversion, it proves to be less time-consuming than the numerical maximization of the likelihood function, especially when solving problems with a large number of parameters. The estimation procedure is described in detail and validated using both simulation and experimental results. The estimator performance limitations are also highlighted.
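The key mechanism is that Gaussian noise ahead of a 1-bit quantizer turns the bit probability into a smooth function of the signal, P(y = 1) = Φ(s/σ). A simple estimator shows this. It is not the paper's matrix-inversion estimator. It assumes the noise σ and the signal frequency, and hence each sample's phase, are known. It bins samples by phase, inverts the Gaussian CDF on each bin's fraction of ones, and fits amplitude, phase, and offset with a discrete Fourier sum. The irrational frequency ratio mimics the asynchronous sampling assumed in the abstract. The function name and signal values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def one_bit_sine_fit(bits, phases, sigma, n_bins=32):
    """Estimate (A, phi, C) of s = A sin(theta + phi) + C from bits
    y = 1[s + noise > 0], noise ~ N(0, sigma^2), given each sample's phase theta."""
    ones, counts = [0] * n_bins, [0] * n_bins
    for y, th in zip(bits, phases):
        k = int(th / (2 * math.pi) * n_bins) % n_bins
        ones[k] += y
        counts[k] += 1
    if min(counts) == 0:
        raise ValueError("every phase bin needs samples")
    inv = NormalDist().inv_cdf
    a = b = c = 0.0
    for k in range(n_bins):
        p = min(max(ones[k] / counts[k], 1e-6), 1 - 1e-6)  # keep inv_cdf finite
        z = sigma * inv(p)                                  # estimate of s at bin centre
        th = (k + 0.5) * 2 * math.pi / n_bins
        a += z * math.cos(th)
        b += z * math.sin(th)
        c += z
    a, b, c = 2 * a / n_bins, 2 * b / n_bins, c / n_bins    # a = A sin(phi), b = A cos(phi)
    return math.hypot(a, b), math.atan2(a, b), c


rng = random.Random(3)
ratio = math.sqrt(2) / 10  # irrational signal/sampling frequency ratio
A, PHI, C, SIGMA = 0.8, 0.5, 0.2, 1.0
phases, bits = [], []
for n in range(100_000):
    th = 2 * math.pi * ((n * ratio) % 1.0)
    s = A * math.sin(th + PHI) + C
    phases.append(th)
    bits.append(1 if s + rng.gauss(0.0, SIGMA) > 0 else 0)
amp, phi, off = one_bit_sine_fit(bits, phases, SIGMA)
```

Without the input noise every bin would read all zeros or all ones, and amplitude information would be lost. This is why the abstract treats the noise as the information-encoding mechanism.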