Category Archives: Artificial Intelligence

Finding the policy that generalizes the best in a sample of possible real scenarios by leveraging PAC-Bayes

Majumdar A, Farid A, Sonar A., PAC-Bayes control: learning policies that provably generalize to novel environments. The International Journal of Robotics Research. 2021;40(2-3):574-593 DOI: 10.1177/0278364920959444.

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the probably approximately correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (relative entropy programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural network-based grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural network-based policies.

Extracting video summaries from RL processes to explain and understand them

Pedro Sequeira, Melinda Gervasio, Interestingness elements for explainable reinforcement learning: Understanding agents’ capabilities and limitations. Artificial Intelligence, Volume 288, 2020 DOI: 10.1016/j.artint.2020.103367.

We propose an explainable reinforcement learning (XRL) framework that analyzes an agent’s history of interaction with the environment to extract interestingness elements that help explain its behavior. The framework relies on data readily available from standard RL algorithms, augmented with data that can easily be collected by the agent while learning. We describe how to create visual summaries of an agent’s behavior in the form of short video-clips highlighting key interaction moments, based on the proposed elements. We also report on a user study where we evaluated the ability of humans to correctly perceive the aptitude of agents with different characteristics, including their capabilities and limitations, given visual summaries automatically generated by our framework. The results show that the diversity of aspects captured by the different interestingness elements is crucial to help humans correctly understand an agent’s strengths and limitations in performing a task, and determine when it might need adjustments to improve its performance.

Expressing POMDPs policies through Knowledge-Based programs

Bruno Zanuttini, Jérôme Lang, Abdallah Saffidine, François Schwarzentruber Knowledge-based programs as succinct policies for partially observable domains. Artificial Intelligence, Volume 288, 2020 DOI: 10.1016/j.artint.2020.103365.

We suggest to express policies for contingent planning by knowledge-based programs (KBPs). KBPs, introduced by Fagin et al. (1995) [32], are high-level protocols describing the actions that the agent should perform as a function of their current knowledge: branching conditions are epistemic formulas that are interpretable by the agent. The main aim of our paper is to show that KBPs can be seen as a succinct language for expressing policies in single-agent contingent planning. KBP are conceptually very close to languages used for expressing policies in the partially observable planning literature: like them, they have conditional and looping structures, with actions as atomic programs and Boolean formulas on beliefs for choosing the execution path. Now, the specificity of KBPs is that branching conditions refer to the belief state and not to the observations. Because of their structural proximity, KBPs and standard languages for representing policies have the same power of expressivity: every standard policy can be expressed as a KBP, and every KBP can be “unfolded” into a standard policy. However, KBPs are more succinct, more readable, and more explainable than standard policies. On the other hand, they require more online computation time, but we show that this is an unavoidable tradeoff. We study knowledge-based programs along four criteria: expressivity, succinctness, complexity of online execution, and complexity of verification.

A new theory: we are curious about tasks that increase our ability to solve as many future tasks as possible

Franziska Brändle, Charley M. Wu, Eric Schulz, What Are We Curious about?, . Trends in Cognitive Sciences, Volume 24, Issue 9, 2020 DOI: 10.1016/j.tics.2020.05.010.

(no abstract).

Synthesizing a supervisor (a Finite State Machine) instead of finding a standard policy in MDPs, applied to multi-agent systems

B. Wu, X. Zhang and H. Lin Permissive Supervisor Synthesis for Markov Decision Processes Through Learning. IEEE Transactions on Automatic Control, vol. 64, no. 8, pp. 3332-3338, Aug. 2019. DOI: 10.1109/TAC.2018.2879505.

This paper considers the permissive supervisor synthesis for probabilistic systems modeled as Markov Decision Processes (MDP). Such systems are prevalent in power grids, transportation networks, communication networks, and robotics. We propose a novel supervisor synthesis framework using automata learning and compositional model checking to generate the permissive local supervisors in a distributed manner. With the recent advances in assume-guarantee reasoning verification for MDPs, constructing the composed system can be avoided to alleviate the state space explosion. Our framework learns the supervisors iteratively using counterexamples from the verification and is guaranteed to terminate in finite steps and to be correct.

A Survey of Knowledge Representation in Service Robotics

avid Paulius, Yu Sun, A Survey of Knowledge Representation in Service Robotics,Robotics and Autonomous Systems, Volume 118, 2019, Pages 13-30 DOI: 10.1016/j.robot.2019.03.005.

Within the realm of service robotics, researchers have placed a great amount of effort into learning, understanding, and representing motions as manipulations for task execution by robots. The task of robot learning and problem-solving is very broad, as it integrates a variety of tasks such as object detection, activity recognition, task/motion planning, localization, knowledge representation and retrieval, and the intertwining of perception/vision and machine learning techniques. In this paper, we solely focus on knowledge representations and notably how knowledge is typically gathered, represented, and reproduced to solve problems as done by researchers in the past decades. In accordance with the definition of knowledge representations, we discuss the key distinction between such representations and useful learning models that have extensively been introduced and studied in recent years, such as machine learning, deep learning, probabilistic modeling, and semantic graphical structures. Along with an overview of such tools, we discuss the problems which have existed in robot learning and how they have been built and used as solutions, technologies or developments (if any) which have contributed to solving them. Finally, we discuss key principles that should be considered when designing an effective knowledge representation.

A survey on HTN planning

Ilche Georgievski, Marco Aiello, HTN planning: Overview, comparison, and beyond, Artificial Intelligence, Volume 222, 2015, Pages 124-156 DOI: 10.1016/j.artint.2015.02.002.

Hierarchies are one of the most common structures used to understand and conceptualise the world. Within the field of Artificial Intelligence (AI) planning, which deals with the automation of world-relevant problems, Hierarchical Task Network (HTN) planning is the branch that represents and handles hierarchies. In particular, the requirement for rich domain knowledge to characterise the world enables HTN planning to be very useful, and also to perform well. However, the history of almost 40 years obfuscates the current understanding of HTN planning in terms of accomplishments, planning models, similarities and differences among hierarchical planners, and its current and objective image. On top of these issues, the ability of hierarchical planning to truly cope with the requirements of real-world applications has been often questioned. As a remedy, we propose a framework-based approach where we first provide a basis for defining different formal models of hierarchical planning, and define two models that comprise a large portion of HTN planners. Second, we provide a set of concepts that helps in interpreting HTN planners from the aspect of their search space. Then, we analyse and compare the planners based on a variety of properties organised in five segments, namely domain authoring, expressiveness, competence, computation and applicability. Furthermore, we select Web service composition as a real-world and current application, and classify and compare the approaches that employ HTN planning to solve the problem of service composition. Finally, we conclude with our findings and present directions for future work. In summary, we provide a novel and comprehensive viewpoint on a core AI planning technique.

Weighting relations between concepts to form (hierarchically) further concepts

T. Nakamura and T. Nagai, Ensemble-of-Concept Models for Unsupervised Formation of Multiple Categories, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 4, pp. 1043-1057, DOI: 10.1109/TCDS.2017.2745502.

Recent studies have shown that robots can form concepts and understand the meanings of words through inference. The key idea underlying these studies is the “multimodal categorization” of a robot’s experiences. Despite the success in the formation of concepts by robots, a major drawback of previous studies stems from the fact that they have been mainly focused on object concepts. Obviously, human concepts are limited not only to object concepts but also to other kinds such as those connected to the tactile sense and color. In this paper, we propose a novel model called the ensemble-of-concept models (EoCMs) to form various kinds of concepts. In EoCMs, we introduce weights that represent the strength connecting modalities and concepts. By changing these weights, many concepts that are connected to particular modalities can be formed; however, meaningless concepts for humans are included in these concepts. To communicate with humans, robots are required to form meaningful concepts for us. Therefore, we utilize utterances taught by human users as the robot observes objects. The robot connects words included in the teaching utterances with formed concepts and selects meaningful concepts to communicate with users. The experimental results show that the robot can form not only object concepts but also others such as color-related concepts and haptic concepts. Furthermore, using word2vec, we compare the meanings of the words acquired by the robot in connecting them to the concepts formed.

A new variant of A* that is more computationally efficient

Adam Niewola, Leszek Podsedkowski, L* Algorithm—A Linear Computational Complexity Graph Searching Algorithm for Path Planning, Journal of Intelligent & Robotic Systems, September 2018, Volume 91, Issue 3–4, pp 425–444, DOI: 10.1007/s10846-017-0748-6.

The state-of-the-art graph searching algorithm applied to the optimal global path planning problem for mobile robots is the A* algorithm with the heap structured open list. In this paper, we present a novel algorithm, called the L* algorithm, which can be applied to global path planning and is faster than the A* algorithm. The structure of the open list with the use of bidirectional sublists (buckets) ensures the linear computational complexity of the L* algorithm because the nodes in the current bucket can be processed in any sequence and it is not necessary to sort the bucket. Our approach can maintain the optimality and linear computational complexity with the use of the cost expressed by floating-point numbers. The paper presents the requirements of the L* algorithm use and the proof of the admissibility of this algorithm. The experiments confirmed that the L* algorithm is faster than the A* algorithm in various path planning scenarios. We also introduced a method of estimating the execution time of the A* and the L* algorithm. The method was compared with the experimental results.

A survey on decision making for multiagent systems, including multirobot systems

Y. Rizk, M. Awad and E. W. Tunstel, Decision Making in Multiagent Systems: A Survey, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 3, pp. 514-529, DOI: 10.1109/TCDS.2018.2840971.

Intelligent transport systems, efficient electric grids, and sensor networks for data collection and analysis are some examples of the multiagent systems (MAS) that cooperate to achieve common goals. Decision making is an integral part of intelligent agents and MAS that will allow such systems to accomplish increasingly complex tasks. In this survey, we investigate state-of-the-art work within the past five years on cooperative MAS decision making models, including Markov decision processes, game theory, swarm intelligence, and graph theoretic models. We survey algorithms that result in optimal and suboptimal policies such as reinforcement learning, dynamic programming, evolutionary computing, and neural networks. We also discuss the application of these models to robotics, wireless sensor networks, cognitive radio networks, intelligent transport systems, and smart electric grids. In addition, we define key terms in the area and discuss remaining challenges that include incorporating big data advancements to decision making, developing autonomous, scalable and computationally efficient algorithms, tackling more complex tasks, and developing standardized evaluation metrics. While recent surveys have been published on this topic, we present a broader discussion of related models and applications.Note to Practitioners:Future smart cities will rely on cooperative MAS that make decisions about what actions to perform that will lead to the completion of their tasks. Decision making models and algorithms have been developed and reported in the literature to generate such sequences of actions. These models are based on a wide variety of principles including human decision making and social animal behavior. In this paper, we survey existing decision making models and algorithms that generate optimal and suboptimal sequences of actions. We also discuss some of the remaining challenges faced by the research community before more effective MAS deployment can be achieved in this age of Internet of Things, robotics, and mobile devices. These challenges include developing more scalable and efficient algorithms, utilizing the abundant sensory data available, tackling more complex tasks, and developing evaluation standards for decision making.