On rewards and values when the RL theory is applied to human brain

Keno Juechems, Christopher Summerfield, Where Does Value Come From?. Trends in Cognitive Sciences, Volume 23, Issue 10, 2019, Pages 836-850, DOI: 10.1016/j.tics.2019.07.012.

The computational framework of reinforcement learning (RL) has allowed us to both understand biological brains and build successful artificial agents. However, in this opinion, we highlight open challenges for RL as a model of animal behaviour in natural environments. We ask how the external reward function is designed for biological systems, and how we can account for the context sensitivity of valuation. We summarise both old and new theories proposing that animals track current and desired internal states and seek to minimise the distance to a goal across multiple value dimensions. We suggest that this framework readily accounts for canonical phenomena observed in the fields of psychology, behavioural ecology, and economics, and recent findings from brain-imaging studies of value-guided decision-making.

On the integer numbers in the brain

Susan Carey, David Barner, Ontogenetic Origins of Human Integer Representations. Trends in Cognitive Sciences, Volume 23, Issue 10, 2019, Pages 823-835, DOI: 10.1016/j.tics.2019.07.004.

Do children learn number words by associating them with perceptual magnitudes? Recent studies argue that approximate numerical magnitudes play a foundational role in the development of integer concepts. Against this, we argue that approximate number representations fail both empirically and in principle to provide the content required of integer concepts. Instead, we suggest that children\u2019s understanding of integer concepts proceeds in two phases. In the first phase, children learn small exact number word meanings by associating words with small sets. In the second phase, children learn the meanings of larger number words by mastering the logic of exact counting algorithms, which implement the successor function and Hume\u2019s principle (that one-to-one correspondence guarantees exact equality). In neither phase do approximate number representations play a foundational role.

On the role and limitations of motor internal simulation as a way of predicting the effects of a future action in the brain

Myrthel Dogge, Ruud Custers, Henk Aarts, Moving Forward: On the Limits of Motor-Based Forward Models. Trends in Cognitive Sciences, Volume 23, Issue 9, 2019, Pages 743-753, DOI: 10.1016/j.tics.2019.06.008.

The human ability to anticipate the consequences that result from action is an essential building block for cognitive, emotional, and social functioning. A dominant view is that this faculty is based on motor predictions, in which a forward model uses a copy of the motor command to predict imminent sensory action-consequences. Although this account was originally conceived to explain the processing of action-outcomes that are tightly coupled to bodily movements, it has been increasingly extrapolated to effects beyond the body. Here, we critically evaluate this generalization and argue that, although there is ample evidence for the role of predictions in the processing of environment-related action-outcomes, there is hitherto little reason to assume that these predictions result from motor-based forward models.

Numerosity in animals (insects)

Martin Giurfa, An Insect\u2019s Sense of Number. Trends in Cognitive Sciences, Volume 23, Issue 9, 2019, Pages 720-722, DOI: 10.1016/j.tics.2019.06.010.

Recent studies revealed numerosity judgments in bees, which include the concept of zero, subtraction and addition, and matching symbols to numbers. Despite their distant origins, bees and vertebrates share similarities in their numeric competences, thus suggesting that numerosity is evolutionary conserved and can be implemented in miniature brains without neocortex.

Interesting summary of SLAM and its computational cost approaches

Joan Vallvé, Joan Solà, Juan Andrade-Cetto, Pose-graph SLAM sparsification using factor descent. Robotics and Autonomous Systems, Volume 119, 2019, Pages 108-118, DOI: 10.1016/j.robot.2019.06.004.

Since state of the art simultaneous localization and mapping (SLAM) algorithms are not constant time, it is often necessary to reduce the problem size while keeping as much of the original graph\u2019s information content. In graph SLAM, the problem is reduced by removing nodes and rearranging factors. This is normally faced locally: after selecting a node to be removed, its Markov blanket sub-graph is isolated, the node is marginalized and its dense result is sparsified. The aim of sparsification is to compute an approximation of the dense and non-relinearizable result of node marginalization with a new set of factors. Sparsification consists on two processes: building the topology of new factors, and finding the optimal parameters that best approximate the original dense distribution. This best approximation can be obtained through minimization of the Kullback\u2013Liebler divergence between the two distributions. Using simple topologies such as Chow\u2013Liu trees, there is a closed form for the optimal solution. However, a tree is oftentimes too sparse and produces bad distribution approximations. On the contrary, more populated topologies require nonlinear iterative optimization. In the present paper, the particularities of pose-graph SLAM are exploited for designing new informative topologies and for applying the novel factor descent iterative optimization method for sparsification. Several experiments are provided comparing the proposed topology methods and factor descent optimization with state-of-the-art methods in synthetic and real datasets with regards to approximation accuracy and computational cost.

Synthesizing a supervisor (a Finite State Machine) instead of finding a standard policy in MDPs, applied to multi-agent systems

B. Wu, X. Zhang and H. Lin Permissive Supervisor Synthesis for Markov Decision Processes Through Learning. IEEE Transactions on Automatic Control, vol. 64, no. 8, pp. 3332-3338, Aug. 2019. DOI: 10.1109/TAC.2018.2879505.

This paper considers the permissive supervisor synthesis for probabilistic systems modeled as Markov Decision Processes (MDP). Such systems are prevalent in power grids, transportation networks, communication networks, and robotics. We propose a novel supervisor synthesis framework using automata learning and compositional model checking to generate the permissive local supervisors in a distributed manner. With the recent advances in assume-guarantee reasoning verification for MDPs, constructing the composed system can be avoided to alleviate the state space explosion. Our framework learns the supervisors iteratively using counterexamples from the verification and is guaranteed to terminate in finite steps and to be correct.

Interesting account of the “computation/communication” binom in distributed computing, particularly in distributed optimization

A. S. Berahas, R. Bollapragada, N. S. Keskar and E. Wei, Balancing Communication and Computation in Distributed Optimization. IEEE Transactions on Automatic Control, vol. 64, no. 8, pp. 3141-3155, Aug. 2019 DOI: 10.1109/TAC.2018.2880407.

Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains including machine learning, robotics, and sensor networks. A distributed optimization method typically consists of two key components: communication and computation. More specifically, at every iteration (or every several iterations) of a distributed algorithm, each node in the network requires some form of information exchange with its neighboring nodes (communication) and the computation step related to a (sub)-gradient (computation). The standard way of judging an algorithm via only the number of iterations overlooks the complexity associated with each iteration. Moreover, various applications deploying distributed methods may prefer a different composition of communication and computation. Motivated by this discrepancy, in this paper, we propose an adaptive cost framework that adjusts the cost measure depending on the features of various applications. We present a flexible algorithmic framework, where communication and computation steps are explicitly decomposed to enable algorithm customization for various applications. We apply this framework to the well-known distributed gradient descent (DGD) method, and show that the resulting customized algorithms, which we call DGDt, NEAR-DGDt, and NEAR-DGD+, compare favorably to their base algorithms, both theoretically and empirically. The proposed NEAR-DGD+ algorithm is an exact first-order method where the communication and computation steps are nested, and when the number of communication steps is adaptively increased, the method converges to the optimal solution. We test the performance and illustrate the flexibility of the methods, as well as practical variants, on quadratic functions and classification problems that arise in machine learning, in terms of iterations, gradient evaluations, communications, and the proposed cost framework.

Interesting account of robots that have non-rich sensors but have to do mapping and other modern stuff

Ma, F., Carlone, L., Ayaz, U., & Karaman, S. Sparse depth sensing for resource-constrained robots. The International Journal of Robotics Research, 38(8), 935 DOI: 10.1177/0278364919850296.

We consider the case in which a robot has to navigate in an unknown environment, but does not have enough on-board power or payload to carry a traditional depth sensor (e.g., a 3D lidar) and thus can only acquire a few (point-wise) depth measurements. We address the following question: is it possible to reconstruct the geometry of an unknown environment using sparse and incomplete depth measurements? Reconstruction from incomplete data is not possible in general, but when the robot operates in man-made environments, the depth exhibits some regularity (e.g., many planar surfaces with only a few edges); we leverage this regularity to infer depth from a small number of measurements. Our first contribution is a formulation of the depth reconstruction problem that bridges robot perception with the compressive sensing literature in signal processing. The second contribution includes a set of formal results that ascertain the exactness and stability of the depth reconstruction in 2D and 3D problems, and completely characterize the geometry of the profiles that we can reconstruct. Our third contribution is a set of practical algorithms for depth reconstruction: our formulation directly translates into algorithms for depth estimation based on convex programming. In real-world problems, these convex programs are very large and general-purpose solvers are relatively slow. For this reason, we discuss ad-hoc solvers that enable fast depth reconstruction in real problems. The last contribution is an extensive experimental evaluation in 2D and 3D problems, including Monte Carlo runs on simulated instances and testing on multiple real datasets. Empirical results confirm that the proposed approach ensures accurate depth reconstruction, outperforms interpolation-based strategies, and performs well even when the assumption of a structured environment is violated.

Enforcing safe behaviour on critical systems that use machine learning through robust control and bayesian inference

J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula and C. J. Tomlin, A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems, IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737-2752 DOI: 10.1109/TAC.2018.2876389.

The proven efficacy of learning-based control schemes strongly motivates their application to robotic systems operating in the physical world. However, guaranteeing correct operation during the learning process is currently an unresolved issue, which is of vital importance in safety-critical systems. We propose a general safety framework based on Hamilton–Jacobi reachability methods that can work in conjunction with an arbitrary learning algorithm. The method exploits approximate knowledge of the system dynamics to guarantee constraint satisfaction while minimally interfering with the learning process. We further introduce a Bayesian mechanism that refines the safety analysis as the system acquires new evidence, reducing initial conservativeness when appropriate while strengthening guarantees through real-time validation. The result is a least-restrictive, safety-preserving control law that intervenes only when the computed safety guarantees require it, or confidence in the computed guarantees decays in light of new observations. We prove theoretical safety guarantees combining probabilistic and worst-case analysis and demonstrate the proposed framework experimentally on a quadrotor vehicle. Even though safety analysis is based on a simple point-mass model, the quadrotor successfully arrives at a suitable controller by policy-gradient reinforcement learning without ever crashing, and safely retracts away from a strong external disturbance introduced during flight.

Use of Markov Decision Processes to select tasks in a service mobile robots

Lacerda, B., Faruq, F., Parker, D., & Hawes, N., Probabilistic planning with formal performance guarantees for mobile service robots, The International Journal of Robotics Research, DOI: 10.1177/0278364919856695.

We present a framework for mobile service robot task planning and execution, based on the use of probabilistic verification techniques for the generation of optimal policies with attached formal performance guarantees. Our approach is based on a Markov decision process model of the robot in its environment, encompassing a topological map where nodes represent relevant locations in the environment, and a range of tasks that can be executed in different locations. The navigation in the topological map is modeled stochastically for a specific time of day. This is done by using spatio-temporal models that provide, for a given time of day, the probability of successfully navigating between two topological nodes, and the expected time to do so. We then present a methodology to generate cost optimal policies for tasks specified in co-safe linear temporal logic. Our key contribution is to address scenarios in which the task may not be achievable with probability one. We introduce a task progression function and present an approach to generate policies that are formally guaranteed to, in decreasing order of priority: maximize the probability of finishing the task; maximize progress towards completion, if this is not possible; and minimize the expected time or cost required. We illustrate and evaluate our approach with a scalability evaluation in a simulated scenario, and report on its implementation in a robot performing service tasks in an office environment for long periods of time.