Monthly Archives: July 2023

You are browsing the site archives by month.

Safety in MDPs by measuring the probability of reaching dangerous states

Rafal Wisniewski, Luminita-Manuela Bujorianu, Safety of stochastic systems: An analytic and computational approach, . Automatica, Volume 133, 2021 DOI: 10.1016/j.automatica.2021.109839.

We refine the concept of stochastic reach avoidance for a general class of Markov processes introducing a threshold of p for the reaching probability. This new problem is called p-safety, and it aims to ensure that the given process reaches a forbidden set before leaving its ‘working’ state space with a probability of less than p. In the situation when an initial probability measure characterizes the initial states, a variant of p-safety is put forward. We call this form of safety weak p-safety. In this work, we characterize both p-safety and weak p-safety and show how to compute them. We employ semi-definite programming to compute p-safety and linear programming to compute weak p-safety. To get to this point, we use certificates of positivity of polynomials translated into the sum of squares and the Bernstein forms.

Solving the “self-recognition on a mirror” problem for robots

Arianna Pipitone, Antonio Chella, Robot passes the mirror test by inner speech, . Robotics and Autonomous Systems, Volume 144, 2021 DOI: 10.1016/j.robot.2021.103838.

The mirror test is a well-known task in Robotics. The existing strategies are based on kinesthetic-visual matching techniques and manipulate perceptual and motion data. The proposed work attempts to demonstrate that it is possible to implement a robust robotic self-recognition method by the inner speech, i.e. the self-dialogue that enables reasoning on symbolic information. The robot self-talks and conceptually reasons on the symbolic forms of signals, and infers if the robot it sees in the mirror is itself or not. The idea is supported by the existing literature in psychology, where the importance of inner speech in self-reflection and self-concept emergence for solving the mirror test was empirically demonstrated.

Learning rewards from diverse human sources

Bıyık E, Losey DP, Palan M, Landolfi NC, Shevchuk G, Sadigh D., Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences, . The International Journal of Robotics Research. 2022;41(1):45-67 DOI: 10.1177/02783649211041652.

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero-in on their true reward. This algorithm not only enables us combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human’s ability to provide data: yielding user-friendly preference queries which are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework..

Trying to reach general AI through just decision-making (rewards) instead of using a diversity of paradigms

avid Silver, Satinder Singh, Doina Precup, Richard S. Sutton, Reward is enough, . Artificial Intelligence, Volume 299, 2021 DOI: 10.1016/j.artint.2021.103535.

In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.


  • The computational and physical limitations of the agent to cope with a too complex world is the main reason to use learning instead of pre-built knowledge (evolution): it allows the agent to focus on acquiring skills for its own circumstances first, that are the most important for it.
  • Argument why classification (supervised learning) is less powerful and efficient than RL.
  • Same with multi-agent settings vs. one agent confronted with a single complex environment (containing other agents).

A MPC-based (non-POMDP) approach to sequential decision planning with partial observability in continuous time and space

Nishimura H, Schwager M., SACBP: Belief space planning for continuous-time dynamical systems via stochastic sequential action control, . The International Journal of Robotics Research. 2021;40(10-11):1167-1195 DOI: 10.1177/02783649211037697.

We propose a novel belief space planning technique for continuous dynamics by viewing the belief system as a hybrid dynamical system with time-driven switching. Our approach is based on the perturbation theory of differential equations and extends sequential action control to stochastic dynamics. The resulting algorithm, which we name SACBP, does not require discretization of spaces or time and synthesizes control signals in near real-time. SACBP is an anytime algorithm that can handle general parametric Bayesian filters under certain assumptions. We demonstrate the effectiveness of our approach in an active sensing scenario and a model-based Bayesian reinforcement learning problem. In these challenging problems, we show that the algorithm significantly outperforms other existing solution techniques including approximate dynamic programming and local trajectory optimization.

A nice survey on active learning, in particular for robotics

Annalisa T. Taylor, Thomas A. Berrueta, Todd D. Murphey, Active learning in robotics: A review of control principles, . Mechatronics, Volume 77, 2021 DOI: 10.1016/j.mechatronics.2021.102576.

Active learning is a decision-making process. In both abstract and physical settings, active learning demands
both analysis and action. This is a review of active learning in robotics, focusing on methods amenable to
the demands of embodied learning systems. Robots must be able to learn efficiently and flexibly through
continuous online deployment. This poses a distinct set of control-oriented challenges??one must choose
suitable measures as objectives, synthesize real-time control, and produce analyses that guarantee performance
and safety with limited knowledge of the environment or robot itself. In this work, we survey the fundamental
components of robotic active learning systems. We discuss classes of learning tasks that robots typically
encounter, measures with which they gauge the information content of observations, and algorithms for
generating action plans. Moreover, we provide a variety of examples ?? from environmental mapping to
nonparametric shape estimation ?? that highlight the qualitative differences between learning tasks, information
measures, and control techniques. We conclude with a discussion of control-oriented open challenges, including
safety-constrained learning and distributed learning.


  • RL can be considered one of the areas within computational learning theory, that usually ignore physical embodiment aspects of the learning agent. However, that is only so when RL explores through decision-making, not when it explores randomly, without much purpose of enhancing learning itself through its actions.
  • RL caveats (particularly Deep RL): their large data requirements, lack of generalizability between tasks, as well as their inability to learn incrementally and guarantee
  • Bayesian filters can be seen as learner systems: they learn parameters of objects (pose) or environments (maps) aided by some models. However, they are more active learners when they use the robot actions to improve that parameter learning.
  • Gaussian processes can be effective in learning those models when no parameterical form is available or much first-principle knowledge, for instance, when the robot has to learn the model only observing a small part of the environment (local).
  • Entropy/information, Fisher’s information (conditional information) and ergodicity are the main ways of measuring information gain in active learning.

Example of non-NN approach that produces better results in classification tasks than NNs

Jiang, Zhiying and Yang, Matthew and Tsirlin, Mikhail and Tang, Raphael and Dai, Yiqin and Lin, Jimmy, Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors, . Findings of the Association for Computational Linguistics: ACL 2023 URL.

Deep neural networks (DNNs) are often used for text classification due to their high accuracy. However, DNNs can be computationally intensive, requiring millions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that??s easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training parameters, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets.It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also excels in the few-shot setting, where labeled data are too scarce to train DNNs effectively.

Dropping laser scans for SLAM when they contribute with no relevant information

Kirill Krinkin, Anton Filatov, Correlation filter of 2D laser scans for indoor environment, . Robotics and Autonomous Systems, Volume 142, 2021 DOI: 10.1016/j.robot.2021.103809.

Modern laser SLAM (simultaneous localization and mapping) and structure from motion algorithms face the problem of processing redundant data. Even if a sensor does not move, it still continues to capture scans that should be processed. This paper presents the novel filter that allows dropping 2D scans that bring no new information to the system. Experiments on MIT and TUM datasets show that it is possible to drop more than half of the scans. Moreover the paper describes the formulas that enable filter adaptation to a particular robot with known speed and characteristics of lidar. In addition, the indoor corridor detector is introduced that also can be applied to any specific shape of a corridor and sensor.

The Evolutionary History of Brains for Numbers

Andreas Nieder, The Evolutionary History of Brains for Numbers, . Trends in Cognitive Sciences, Volume 25, Issue 7, 2021, Pages 608-621 DOI: 10.1016/j.tics.2021.03.012.

Humans and other animals share a number sense’, an intuitive understanding of countable quantities. Having evolved independent from one another for hundreds of millions of years, the brains of these diverse species, including monkeys, crows, zebrafishes, bees, and squids, differ radically. However, in all vertebrates investigated, the pallium of the telencephalon has been implicated in number processing. This suggests that properties of the telencephalon make it ideally suited to host number representations that evolved by convergent evolution as a result of common selection pressures. In addition, promising candidate regions in the brains of invertebrates, such as insects, spiders, and cephalopods, can be identified, opening the possibility of even deeper commonalities for number sense.

Building POMDPs under logical constraints

Bo Wu, Xiaobin Zhang, Hai Lin, Supervisor synthesis of POMDP via automata learning, . Automatica, Volume 129, 2021 DOI: 10.1016/j.automatica.2021.109654.

Partially observable Markov decision process (POMDP) is a comprehensive modeling framework that captures uncertainties from sensing noises, actuation errors, and environments. Traditional POMDP planning finds an optimal policy for reward maximization. However, for safety-critical applications, it is often necessary to guarantee system performance described by high-level temporal logic specifications. Hence, we are motivated to develop a supervisor synthesis framework for POMDP with respect to given formal specifications. We propose an iterative learning-based algorithm, which can learn a permissive policy in the form of a deterministic finite automaton. A human–robot collaboration case study validates the proposed algorithm.