Juan-Antonio Fernández-Madrigal | kipr

Building explanations for AI plans by modifying user’s models to make those plans optimal within them

September 15, 2023 10:11 , Juan-Antonio Fernández-Madrigal

Sarath Sreedharan, Tathagata Chakraborti, Subbarao Kambhampati, Foundations of explanations as model reconciliation, Artificial Intelligence, Volume 301, 2021, DOI: 10.1016/j.artint.2021.103558.

Past work on plan explanations primarily involved the AI system explaining the correctness of its plan and the rationale for its decision in terms of its own model. Such soliloquy is wholly inadequate in most realistic scenarios where users have domain and task models that differ from that used by the AI system. We posit that the explanations are best studied in light of these differing models. In particular, we show how explanation can be seen as a “model reconciliation problem” (MRP), where the AI system in effect suggests changes to the user’s mental model so as to make its plan optimal with respect to that changed user model. We will study the properties of such explanations, present algorithms for automatically computing them, discuss relevant extensions to the basic framework, and evaluate the performance of the proposed algorithms both empirically and through controlled user studies.
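To make the model reconciliation idea concrete, here is a minimal sketch under strong assumptions that are mine, not the paper's: a "model" is just a dictionary of directed edge costs, a "plan" is a path, and an explanation is the smallest set of edges whose costs must be corrected in the user's model so that the AI's plan becomes optimal there. The names `shortest_cost`, `plan_cost` and `minimal_explanation` are hypothetical, and the brute-force search over subsets is only practical for toy examples.

```python
import heapq
from itertools import combinations

INF = float("inf")

def shortest_cost(edges, start, goal):
    """Dijkstra over a model given as {(u, v): cost}; returns the optimal cost."""
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, []).append((v, c))
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == goal:
            return d
        if d > dist.get(u, INF):
            continue  # stale queue entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return INF

def plan_cost(edges, plan):
    """Cost of a plan (list of nodes); infinite if the model lacks one of its edges."""
    return sum(edges.get(step, INF) for step in zip(plan, plan[1:]))

def minimal_explanation(ai_model, user_model, plan):
    """Smallest set of model differences that makes `plan` optimal for the user."""
    start, goal = plan[0], plan[-1]
    diffs = sorted(e for e in set(ai_model) | set(user_model)
                   if ai_model.get(e) != user_model.get(e))
    for size in range(len(diffs) + 1):          # try the smallest explanations first
        for subset in combinations(diffs, size):
            updated = dict(user_model)
            for e in subset:                    # reconcile the selected differences
                if e in ai_model:
                    updated[e] = ai_model[e]
                else:
                    updated.pop(e, None)
            cost = plan_cost(updated, plan)
            if cost < INF and cost == shortest_cost(updated, start, goal):
                return list(subset)
    return None

# The user wrongly believes the direct edge A->C is cheap; B->D is an irrelevant difference.
ai_model = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5, ("B", "D"): 2}
user_model = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1, ("B", "D"): 3}
explanation = minimal_explanation(ai_model, user_model, ["A", "B", "C"])
```

In this toy case the explanation contains only the cost of the A->C edge: the difference on B->D is left out because it does not affect the optimality of the plan.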

Posted in: Artificial Intelligence , Tagged: Neural Networks Explanation, Task planning

Using a physical simulator for sampled rollouts in stochastic optimal control

September 15, 2023 09:47 , Juan-Antonio Fernández-Madrigal

Carius J, Ranftl R, Farshidian F, Hutter M. Constrained stochastic optimal control with learned importance sampling: A path integral approach, The International Journal of Robotics Research. 2022;41(2):189-209, DOI: 10.1177/02783649211047890.

Modern robotic systems are expected to operate robustly in partially unknown environments. This article proposes an algorithm capable of controlling a wide range of high-dimensional robotic systems in such challenging scenarios. Our method is based on the path integral formulation of stochastic optimal control, which we extend with constraint-handling capabilities. Under our control law, the optimal input is inferred from a set of stochastic rollouts of the system dynamics. These rollouts are simulated by a physics engine, placing minimal restrictions on the types of systems and environments that can be modeled. Although sampling-based algorithms are typically not suitable for online control, we demonstrate in this work how importance sampling and constraints can be used to effectively curb the sampling complexity and enable real-time control applications. Furthermore, the path integral framework provides a natural way of incorporating existing control architectures as ancillary controllers for shaping the sampling distribution. Our results reveal that even in cases where the ancillary controller would fail, our stochastic control algorithm provides an additional safety and robustness layer. Moreover, in the absence of an existing ancillary controller, our method can be used to train a parametrized importance sampling policy using data from the stochastic rollouts. The algorithm may thereby bootstrap itself by learning an importance sampling policy offline and then refining it to unseen environments during online control. We validate our results on three robotic systems, including hardware experiments on a quadrupedal robot.
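The core of a path integral update can be sketched in a few lines. This is a generic, unconstrained version and not the algorithm of the paper: the physics engine is replaced by a hypothetical one-dimensional integrator (`step`), the nominal control sequence plays the role of the ancillary controller that shapes the sampling distribution, and the constraint handling and the learned importance sampler are left out. All names and parameter values are assumptions for illustration.

```python
import math
import random

def step(x, u, dt=0.1):
    """Toy stand-in for the physics engine: a 1-D single integrator."""
    return x + u * dt

def rollout_cost(x0, controls, target):
    """Accumulated squared distance to the target along one simulated rollout."""
    x, cost = x0, 0.0
    for u in controls:
        x = step(x, u)
        cost += (x - target) ** 2
    return cost

def path_integral_update(x0, nominal, target, rng,
                         n_samples=200, sigma=1.0, lam=1.0):
    """One update: sample noisy rollouts around `nominal`, reweight them by cost."""
    samples, costs = [], []
    for _ in range(n_samples):
        controls = [u + rng.gauss(0.0, sigma) for u in nominal]
        samples.append(controls)
        costs.append(rollout_cost(x0, controls, target))
    best = min(costs)  # subtracting the minimum keeps the exponentials well scaled
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    # The new control sequence is the cost-weighted average of the sampled ones.
    return [sum(w * s[t] for w, s in zip(weights, samples)) / total
            for t in range(len(nominal))]

rng = random.Random(0)
nominal = [0.0] * 10  # a "do nothing" ancillary controller
updated = path_integral_update(0.0, nominal, 1.0, rng)
```

Low-cost rollouts dominate the weighted average, so the updated sequence pushes the state towards the target even though the nominal controller never moves. In receding-horizon use one would apply the first control, shift the sequence and repeat.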

Posted in: Applications of reinforcement learning to control engineering , Tagged: Reinforcement learning, Simulation, Stochastic optimal control

On how physical movements shape the perception of time

September 15, 2023 09:28 , Juan-Antonio Fernández-Madrigal

Rose De Kock, Keri Anne Gladhill, Minaz Numa Ali, Wilsaan Mychal Joiner, Martin Wiener, How movements shape the perception of time, Trends in Cognitive Sciences, Volume 25, Issue 11, 2021, Pages 950-963 DOI: 10.1016/j.tics.2021.08.002.

In order to keep up with a changing environment, mobile organisms must be capable of deciding both where and when to move. This precision necessitates a strong sense of time, as otherwise we would fail in many of our movement goals. Yet, despite this intrinsic link, only recently have researchers begun to understand how these two features interact. Primarily, two effects have been observed: movements can bias time estimates, but they can also make them more precise. Here we review this literature and propose that both effects can be explained by a Bayesian cue combination framework, in which movement itself affords the most precise representation of time, which can influence perception in either feedforward or active sensing modes.
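The two effects mentioned in the abstract (bias and increased precision) both fall out of standard Bayesian cue combination with Gaussian cues. The sketch below assumes two independent Gaussian estimates of the same duration, one sensory and one from movement; the function name and the numbers are only illustrative.

```python
def combine_cues(mu_a, var_a, mu_b, var_b):
    """Precision-weighted fusion of two independent Gaussian estimates."""
    precision_a, precision_b = 1.0 / var_a, 1.0 / var_b
    var = 1.0 / (precision_a + precision_b)              # never larger than either input
    mu = var * (precision_a * mu_a + precision_b * mu_b)  # pulled towards the more precise cue
    return mu, var

# Sensory estimate: 1.0 s (variance 0.04); movement-based estimate: 0.8 s (variance 0.01).
mu, var = combine_cues(1.0, 0.04, 0.8, 0.01)
```

Here the combined estimate is about 0.84 s with variance about 0.008: it is biased towards the movement cue because that cue is more precise, and it is more precise than either cue alone.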

Posted in: Psycho-physiological bases of engineering , Tagged: Psychological time, Time in the brain

More efficient pose-graph optimization by using the cycles (loop closures) in the graph as a basis, and a nice summary of conventional pose-graph optimization

September 15, 2023 09:21 , Juan-Antonio Fernández-Madrigal

F. Bai, T. Vidal-Calleja and G. Grisetti, Sparse Pose Graph Optimization in Cycle Space, IEEE Transactions on Robotics, vol. 37, no. 5, pp. 1381-1400, Oct 2021, DOI: 10.1109/TRO.2021.3050328.

The state-of-the-art modern pose-graph optimization (PGO) systems are vertex based. In this context, the number of variables might be high, albeit the number of cycles in the graph (loop closures) is relatively low. For sparse problems particularly, the cycle space has a significantly smaller dimension than the number of vertices. By exploiting this observation, in this article, we propose an alternative solution to PGO that directly exploits the cycle space. We characterize the topology of the graph as a cycle matrix, and reparameterize the problem using relative poses, which are further constrained by a cycle basis of the graph. We show that by using a minimum cycle basis, the cycle-based approach has superior convergence properties against its vertex-based counterpart, in terms of convergence speed and convergence to the global minimum. For sparse graphs, our cycle-based approach is also more time efficient than the vertex-based. As an additional contribution of this work, we present an effective algorithm to compute the minimum cycle basis. Albeit known in computer science, we believe that this algorithm is not familiar to the robotics community. All the claims are validated by experiments on both standard benchmarks and simulated datasets. To foster the reproduction of the results, we provide a complete open-source C++ implementation of our approach.
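The claim about the size of the cycle space is easy to check: for a graph with V vertices, E edges and C connected components, the cycle space has dimension E - V + C, which for a pose graph is essentially the number of loop closures. The sketch below builds a fundamental cycle basis from a BFS spanning tree. Note that this is a simpler substitute: the paper uses a minimum cycle basis, which a fundamental basis generally is not. The function name is hypothetical.

```python
from collections import deque

def fundamental_cycles(num_vertices, edges):
    """Return one cycle (as a list of vertices) per non-tree edge of a BFS forest."""
    adj = {v: [] for v in range(num_vertices)}
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    parent, tree_edges = {}, set()
    for root in range(num_vertices):        # one BFS tree per connected component
        if root in parent:
            continue
        parent[root] = None
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, idx in adj[u]:
                if v not in parent:
                    parent[v] = u
                    tree_edges.add(idx)
                    queue.append(v)

    def path_to_root(v):
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    cycles = []
    for idx, (u, v) in enumerate(edges):
        if idx in tree_edges:
            continue
        pu, pv = path_to_root(u), path_to_root(v)
        # Trim the shared ancestors so both paths end at the lowest common ancestor.
        while len(pu) > 1 and len(pv) > 1 and pu[-2] == pv[-2]:
            pu.pop()
            pv.pop()
        cycles.append(pu + pv[-2::-1])      # u -> ancestor -> v, closed by edge (v, u)
    return cycles

# Odometry chain 0-1-2-3-4 plus a single loop closure between poses 4 and 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]
cycles = fundamental_cycles(5, edges)
```

With five poses and one loop closure there is a single independent cycle, so a cycle-based formulation has one cycle constraint where a vertex-based one has five pose variables.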

Posted in: Mobile robot SLAM , Tagged: Graph-based SLAM, Pose-graph optimization

Safety in MDPs by measuring the probability of reaching dangerous states

July 21, 2023 09:54 , Juan-Antonio Fernández-Madrigal

Rafal Wisniewski, Luminita-Manuela Bujorianu, Safety of stochastic systems: An analytic and computational approach, Automatica, Volume 133, 2021, DOI: 10.1016/j.automatica.2021.109839.

We refine the concept of stochastic reach avoidance for a general class of Markov processes introducing a threshold of p for the reaching probability. This new problem is called p-safety, and it aims to ensure that the given process reaches a forbidden set before leaving its ‘working’ state space with a probability of less than p. In the situation when an initial probability measure characterizes the initial states, a variant of p-safety is put forward. We call this form of safety weak p-safety. In this work, we characterize both p-safety and weak p-safety and show how to compute them. We employ semi-definite programming to compute p-safety and linear programming to compute weak p-safety. To get to this point, we use certificates of positivity of polynomials translated into the sum of squares and the Bernstein forms.
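For intuition only, here is the finite-state analogue of p-safety. The paper works with general Markov processes and computes certificates through semi-definite and linear programming; the sketch below instead assumes a small discrete-time Markov chain and obtains the probability of hitting the forbidden set before leaving the working set by fixed-point iteration. The function names and the example chain are my assumptions.

```python
def reach_probability(transitions, forbidden, exit_states, tol=1e-12, max_iter=100000):
    """h[s] = probability of reaching `forbidden` before `exit_states`, starting at s.

    `transitions` maps every state to a dict {next_state: probability}.
    """
    h = {s: (1.0 if s in forbidden else 0.0) for s in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for s, row in transitions.items():
            if s in forbidden or s in exit_states:
                continue                      # boundary values stay fixed at 1 and 0
            new = sum(p * h[t] for t, p in row.items())
            delta = max(delta, abs(new - h[s]))
            h[s] = new
        if delta < tol:
            break
    return h

def is_p_safe(h, state, p):
    """p-safety at `state`: the forbidden set is reached with probability below p."""
    return h[state] < p

# Symmetric random walk on 0..4: state 4 is forbidden, state 0 is outside the working set.
walk = {0: {0: 1.0},
        1: {0: 0.5, 2: 0.5},
        2: {1: 0.5, 3: 0.5},
        3: {2: 0.5, 4: 0.5},
        4: {4: 1.0}}
h = reach_probability(walk, forbidden={4}, exit_states={0})
```

For this walk the reach probability from state i is i/4, so state 1 is 0.3-safe while state 3 is not. The weak variant in the abstract would average `h` over an initial probability distribution instead of evaluating it at a single state.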

Posted in: Artificial Intelligence , Tagged: MDPs, Safety

Solving the “self-recognition on a mirror” problem for robots

July 21, 2023 09:49 , Juan-Antonio Fernández-Madrigal

Arianna Pipitone, Antonio Chella, Robot passes the mirror test by inner speech, Robotics and Autonomous Systems, Volume 144, 2021, DOI: 10.1016/j.robot.2021.103838.

The mirror test is a well-known task in Robotics. The existing strategies are based on kinesthetic-visual matching techniques and manipulate perceptual and motion data. The proposed work attempts to demonstrate that it is possible to implement a robust robotic self-recognition method by the inner speech, i.e. the self-dialogue that enables reasoning on symbolic information. The robot self-talks and conceptually reasons on the symbolic forms of signals, and infers if the robot it sees in the mirror is itself or not. The idea is supported by the existing literature in psychology, where the importance of inner speech in self-reflection and self-concept emergence for solving the mirror test was empirically demonstrated.

Posted in: Psycho-physiological bases of engineering , Tagged: Mirror test

Learning rewards from diverse human sources

July 21, 2023 09:46 , Juan-Antonio Fernández-Madrigal

Bıyık E, Losey DP, Palan M, Landolfi NC, Shevchuk G, Sadigh D., Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences, The International Journal of Robotics Research. 2022;41(1):45-67, DOI: 10.1177/02783649211041652.

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human’s ability to provide data: yielding user-friendly preference queries which are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework.
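The "demonstrations first, preferences afterwards" idea can be sketched as two Bayesian updates over a belief on reward weights. Everything below is an assumption made for illustration and not the paper's algorithm: the reward is linear in trajectory features, the belief is a discrete set of candidate weight vectors, the human is modelled as Boltzmann-rational, and the active selection of the most informative query is omitted.

```python
import math

def dot(w, phi):
    return sum(a * b for a, b in zip(w, phi))

def normalize(belief):
    total = sum(belief.values())
    return {w: p / total for w, p in belief.items()}

def update_with_demo(belief, demo_phi, alternatives, beta=1.0):
    """Demonstration update: the demo is a noisily optimal choice among `alternatives`."""
    posterior = {}
    for w, p in belief.items():
        z = sum(math.exp(beta * dot(w, phi)) for phi in alternatives)
        posterior[w] = p * math.exp(beta * dot(w, demo_phi)) / z
    return normalize(posterior)

def preference_prob(w, phi_a, phi_b):
    """Probability that a user with reward weights w prefers trajectory a over b."""
    return 1.0 / (1.0 + math.exp(dot(w, phi_b) - dot(w, phi_a)))

def update_with_preference(belief, phi_preferred, phi_other):
    """Preference update: the user answered that `phi_preferred` beats `phi_other`."""
    return normalize({w: p * preference_prob(w, phi_preferred, phi_other)
                      for w, p in belief.items()})

# Three candidate reward weight vectors with a uniform prior.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
prior = {w: 1.0 / len(candidates) for w in candidates}

# Step 1: a demonstration initializes the belief.
after_demo = update_with_demo(prior, (1.0, 0.2), [(1.0, 0.2), (0.2, 1.0)])
# Step 2: one answered preference query refines it.
after_query = update_with_preference(after_demo, (0.9, 0.1), (0.5, 0.6))
best = max(after_query, key=after_query.get)
```

In this example the demonstration already favours the first candidate, and the preference answer sharpens that belief further. An active version would pick, among candidate queries, the one expected to reduce the uncertainty of the belief the most.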

Posted in: Applications of reinforcement learning to robots , Tagged: Reinforcement learning, Reward learning

Trying to reach general AI through just decision-making (rewards) instead of using a diversity of paradigms

July 21, 2023 08:37 , Juan-Antonio Fernández-Madrigal

David Silver, Satinder Singh, Doina Precup, Richard S. Sutton, Reward is enough, Artificial Intelligence, Volume 299, 2021, DOI: 10.1016/j.artint.2021.103535.

In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.

NOTES:

The main reason to use learning instead of pre-built knowledge (evolution) is the agent's computational and physical limitations when coping with a world that is too complex: learning lets the agent focus first on acquiring the skills that matter most in its own circumstances.
The paper argues why classification (supervised learning) is less powerful and efficient than RL.
It argues the same for multi-agent settings vs. a single agent confronted with one complex environment (that contains other agents).

Posted in: Psycho-physiological bases of engineering , Tagged: General AI, Mind models, Reinforcement learning

A MPC-based (non-POMDP) approach to sequential decision planning with partial observability in continuous time and space

July 17, 2023 10:07 , Juan-Antonio Fernández-Madrigal

Nishimura H, Schwager M., SACBP: Belief space planning for continuous-time dynamical systems via stochastic sequential action control, The International Journal of Robotics Research. 2021;40(10-11):1167-1195, DOI: 10.1177/02783649211037697.

We propose a novel belief space planning technique for continuous dynamics by viewing the belief system as a hybrid dynamical system with time-driven switching. Our approach is based on the perturbation theory of differential equations and extends sequential action control to stochastic dynamics. The resulting algorithm, which we name SACBP, does not require discretization of spaces or time and synthesizes control signals in near real-time. SACBP is an anytime algorithm that can handle general parametric Bayesian filters under certain assumptions. We demonstrate the effectiveness of our approach in an active sensing scenario and a model-based Bayesian reinforcement learning problem. In these challenging problems, we show that the algorithm significantly outperforms other existing solution techniques including approximate dynamic programming and local trajectory optimization.

Posted in: Robot motion planning , Tagged: Model Predictive Control, POMDPs

A nice survey on active learning, in particular for robotics

July 17, 2023 09:10 , Juan-Antonio Fernández-Madrigal

Annalisa T. Taylor, Thomas A. Berrueta, Todd D. Murphey, Active learning in robotics: A review of control principles, Mechatronics, Volume 77, 2021, DOI: 10.1016/j.mechatronics.2021.102576.

Active learning is a decision-making process. In both abstract and physical settings, active learning demands both analysis and action. This is a review of active learning in robotics, focusing on methods amenable to the demands of embodied learning systems. Robots must be able to learn efficiently and flexibly through continuous online deployment. This poses a distinct set of control-oriented challenges: one must choose suitable measures as objectives, synthesize real-time control, and produce analyses that guarantee performance and safety with limited knowledge of the environment or robot itself. In this work, we survey the fundamental components of robotic active learning systems. We discuss classes of learning tasks that robots typically encounter, measures with which they gauge the information content of observations, and algorithms for generating action plans. Moreover, we provide a variety of examples (from environmental mapping to nonparametric shape estimation) that highlight the qualitative differences between learning tasks, information measures, and control techniques. We conclude with a discussion of control-oriented open challenges, including safety-constrained learning and distributed learning.

NOTES:

RL can be considered one of the areas within computational learning theory, which usually ignore the physical embodiment of the learning agent. However, that is only so when RL explores through decision-making, not when it explores randomly, without much purpose of enhancing learning itself through its actions.
RL caveats (particularly Deep RL): large data requirements, lack of generalizability between tasks, and inability to learn incrementally and to guarantee safety.
Bayesian filters can be seen as learner systems: they learn parameters of objects (pose) or environments (maps) aided by some models. However, they are more active learners when they use the robot actions to improve that parameter learning.
Gaussian processes can be effective in learning those models when neither a parametric form nor much first-principles knowledge is available, for instance when the robot has to learn the model while observing only a small (local) part of the environment.
Entropy/information, Fisher’s information (conditional information) and ergodicity are the main ways of measuring information gain in active learning.
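As a small illustration of the first of those measures, the sketch below computes Shannon entropy and the expected information gain of an observation for a discrete belief. The setting (a finite set of hypotheses and a known observation likelihood per action) and the function names are my assumptions; Fisher information and ergodic measures are not covered.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_information_gain(prior, likelihoods):
    """Expected entropy reduction; likelihoods[o][h] = P(observation o | hypothesis h)."""
    gain = entropy(prior)
    for lik in likelihoods:
        p_obs = sum(l * p for l, p in zip(lik, prior))
        if p_obs == 0:
            continue                              # impossible observation
        posterior = [l * p / p_obs for l, p in zip(lik, prior)]
        gain -= p_obs * entropy(posterior)
    return gain

prior = [0.5, 0.5]
informative = [[1.0, 0.0], [0.0, 1.0]]    # the sensor reveals the true hypothesis
uninformative = [[0.5, 0.5], [0.5, 0.5]]  # the sensor output is independent of it

gain_good = expected_information_gain(prior, informative)
gain_bad = expected_information_gain(prior, uninformative)
```

An active learner in this setting would choose the action with the largest expected gain, here the one leading to the informative observation (1 bit against 0 bits).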

Posted in: Psycho-physiological bases of engineering , Tagged: Active learning, Reinforcement learning


