A survey on HTN planning

Ilche Georgievski, Marco Aiello, HTN planning: Overview, comparison, and beyond, Artificial Intelligence, Volume 222, 2015, Pages 124-156, DOI: 10.1016/j.artint.2015.02.002.

Hierarchies are one of the most common structures used to understand and conceptualise the world. Within the field of Artificial Intelligence (AI) planning, which deals with the automation of world-relevant problems, Hierarchical Task Network (HTN) planning is the branch that represents and handles hierarchies. In particular, the requirement for rich domain knowledge to characterise the world enables HTN planning to be very useful, and also to perform well. However, the history of almost 40 years obfuscates the current understanding of HTN planning in terms of accomplishments, planning models, similarities and differences among hierarchical planners, and its current and objective image. On top of these issues, the ability of hierarchical planning to truly cope with the requirements of real-world applications has been often questioned. As a remedy, we propose a framework-based approach where we first provide a basis for defining different formal models of hierarchical planning, and define two models that comprise a large portion of HTN planners. Second, we provide a set of concepts that helps in interpreting HTN planners from the aspect of their search space. Then, we analyse and compare the planners based on a variety of properties organised in five segments, namely domain authoring, expressiveness, competence, computation and applicability. Furthermore, we select Web service composition as a real-world and current application, and classify and compare the approaches that employ HTN planning to solve the problem of service composition. Finally, we conclude with our findings and present directions for future work. In summary, we provide a novel and comprehensive viewpoint on a core AI planning technique.
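
To make the core decomposition idea concrete, here is a minimal, purely illustrative sketch (the domain, task names and single-method-per-task restriction are invented; the formal models surveyed in the paper additionally cover preconditions, ordering constraints and alternative methods):

```python
# Minimal HTN-style decomposition sketch (illustrative only; the survey's
# formal models are far richer). Compound tasks are refined by methods
# into subtasks until only primitive tasks remain.

methods = {
    # compound task -> list of subtasks (a single method each, for brevity)
    "deliver(pkg)": ["drive(depot, client)", "unload(pkg)"],
    "fetch(pkg)":   ["drive(client, depot)", "load(pkg)"],
    "mission(pkg)": ["fetch(pkg)", "deliver(pkg)"],
}

def decompose(task):
    """Recursively refine a task into a sequence of primitive tasks."""
    if task not in methods:          # primitive task: executable as-is
        return [task]
    plan = []
    for subtask in methods[task]:    # apply the (only) method for this task
        plan.extend(decompose(subtask))
    return plan

print(decompose("mission(pkg)"))
# ['drive(client, depot)', 'load(pkg)', 'drive(depot, client)', 'unload(pkg)']
```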

Designing robotic architectures by coordinating different modules in a data-flow graphical paradigm

Sebastian Buck, Andreas Zell, CS::APEX: A Framework for Algorithm Prototyping and Experimentation with Robotic Systems. Modeling Perception and High Level Robot Control with Activity Flow Graphs, Journal of Intelligent & Robotic Systems (2019) 94:371–387, DOI: 10.1007/s10846-018-0831-7.

Robotic systems differ drastically in their sensory capabilities, their computational power and their designated tasks. For efficient algorithm development, however, we need to have a common modeling framework that enables us to generalize and re-use existing solutions. A modular approach, which is coherent across different platforms, also allows faster prototyping of new systems, given that existing functionality can be reused from already implemented modules. In this paper we develop a modeling framework based on data flow graphs that achieves the following goal: We first merge synchronous data flow and reactive programming into hybrid flow graphs, where we explicitly model synchronous and asynchronous data flow. Then we transfer concepts from finite-state machines to achieve a coherent framework which we call Activity Flow Graphs. The flow of activity enables us to model high level states directly in the data flow graph. The result is a single computation graph that can express both perception and high level control aspects of any robotic system. This theoretical foundation is the core of our open-source software framework CS::APEX, which allows the creation, manipulation and evaluation of Activity Flow Graphs and enables rapid prototyping and experimentation and can be used with any robot supporting the Robot Operating System (ROS). We then demonstrate the framework with two high level models for a fetch-and-delivery robot and a person following robot.
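
As a rough illustration of the data-flow idea, and emphatically not the CS::APEX API, the sketch below wires a few processing nodes into a graph and uses an "activity" selector to pick which branch currently drives the robot; all node names and functions are made up:

```python
# Illustrative sketch of the hybrid data-flow idea (not the CS::APEX API):
# nodes compute outputs from inputs, and an "activity" token selects which
# branch of the graph is currently driving the robot, akin to an FSM state.

class Node:
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, inputs

    def process(self, values):
        return self.fn(*[values[i] for i in self.inputs])

def run_graph(nodes, sources, active):
    """One synchronous tick: propagate data, then read the active branch."""
    values = dict(sources)
    for node in nodes:                      # nodes assumed topologically ordered
        values[node.name] = node.process(values)
    return values[active]                   # the activity token selects the output

nodes = [
    Node("person", lambda img: {"x": 1.0, "y": 0.5}, ("camera",)),
    Node("follow", lambda p: ("follow", p["x"], p["y"]), ("person",)),
    Node("fetch",  lambda goal: ("goto", goal), ("goal",)),
]
print(run_graph(nodes, {"camera": "frame", "goal": (3, 4)}, active="follow"))
```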

Learning multi-factor metrics for measuring the similarity between objects

H. Ye, D. Zhan, Y. Jiang and Z. Zhou, What Makes Objects Similar: A Unified Multi-Metric Learning Approach, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 5, pp. 1257-1270, 2019, DOI: 10.1109/TPAMI.2018.2829192.

Linkages are essentially determined by similarity measures that may be derived from multiple perspectives. For example, spatial linkages are usually generated based on localities of heterogeneous data. Semantic linkages, however, can come from even more properties, such as different physical meanings behind social relations. Many existing metric learning models focus on spatial linkages but leave the rich semantic factors unconsidered. We propose a Unified Multi-Metric Learning (UM²L) framework to exploit multiple types of metrics with respect to overdetermined similarities between linkages. In UM²L, types of combination operators are introduced for distance characterization from multiple perspectives, and thus can introduce flexibilities for representing and utilizing both spatial and semantic linkages. Besides, we propose a uniform solver for UM²L, and the theoretical analysis reflects the generalization ability of UM²L as well. Extensive experiments on diverse applications exhibit the superior classification performance and comprehensibility of UM²L. Visualization results also validate its ability to discover physical meanings.
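
A toy sketch of the underlying idea of combining several metrics through an operator (this is not the UM²L optimisation problem; the matrices and the sum/max operators below are illustrative assumptions):

```python
# Toy sketch of the "multiple metrics + combination operator" idea
# (not the UM²L learning algorithm itself): each metric is a Mahalanobis
# distance d_k(x, y) = (x - y)^T M_k (x - y) with M_k positive semidefinite,
# and a combination operator (here: sum or max) merges the per-metric views.

import numpy as np

def mahalanobis_sq(x, y, M):
    d = x - y
    return float(d @ M @ d)

def combined_distance(x, y, metrics, op="sum"):
    dists = [mahalanobis_sq(x, y, M) for M in metrics]
    return sum(dists) if op == "sum" else max(dists)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
M_spatial  = np.eye(2)                       # one "perspective" on similarity
M_semantic = np.diag([4.0, 0.25])            # another "perspective"
print(combined_distance(x, y, [M_spatial, M_semantic], op="sum"))  # 6.25
print(combined_distance(x, y, [M_spatial, M_semantic], op="max"))  # 4.25
```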

Taking into account how a recommender changes the behaviour of the agent that uses it

Jonathan P. Epperlein, Sergiy Zhuk, Robert Shorten, Recovering Markov models from closed-loop data, Automatica, Volume 103, 2019, Pages 116-125, DOI: 10.1016/j.automatica.2019.01.022.

Situations in which recommender systems are used to augment decision making are becoming prevalent in many application domains. Almost always, these prediction tools (recommenders) are created with a view to affecting behavioural change. Clearly, successful applications actuating behavioural change affect the original model underpinning the predictor, leading to an inconsistency. This feedback loop is often not considered in standard machine learning techniques, which rely upon machine learning/statistical learning machinery. The objective of this paper is to develop tools that recover unbiased user models in the presence of recommenders. More specifically, we assume that we observe a time series which is a trajectory of a Markov chain R modulated by another Markov chain S, i.e. the transition matrix of R is unknown and depends on the current state of S. The transition matrix of the latter is also unknown. In other words, at each time instant, S selects a transition matrix for R within a given set which consists of known and unknown matrices. The state of S, in turn, depends on the current state of R thus introducing a feedback loop. We propose an Expectation–Maximisation (EM) type algorithm, which estimates the transition matrices of S and R. Experimental results are given to demonstrate the efficacy of the approach.
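
The generative model is easy to picture with a small simulation; the sketch below only generates data from two coupled chains as described (the transition matrices are invented, and the paper's actual contribution, the EM estimator, is not shown):

```python
# Sketch of the generative model described above (simulation only): chain S
# picks which transition matrix drives chain R, and S itself transitions
# according to the current state of R, closing the feedback loop.

import numpy as np
rng = np.random.default_rng(0)

# Two candidate transition matrices for R, indexed by the state of S.
P_R = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # S = 0: e.g. "no recommendation"
       np.array([[0.5, 0.5], [0.5, 0.5]])]   # S = 1: e.g. "recommender active"

# Transition matrices of S, indexed by the current state of R.
P_S = [np.array([[0.7, 0.3], [0.4, 0.6]]),
       np.array([[0.2, 0.8], [0.1, 0.9]])]

def simulate(T, r=0, s=0):
    traj = []
    for _ in range(T):
        r = rng.choice(2, p=P_R[s][r])   # R moves under the matrix chosen by S
        s = rng.choice(2, p=P_S[r][s])   # S moves depending on where R now is
        traj.append((r, s))
    return traj

print(simulate(10))
```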

Predicting the structure of indoor environments for mobile robots

Matteo Luperto, Francesco Amigoni, Predicting the global structure of indoor environments: A constructive machine learning approach, Autonomous Robots, April 2019, Volume 43, Issue 4, pp 813–835, DOI: 10.1007/s10514-018-9732-7.

Consider a mobile robot exploring an initially unknown school building and assume that it has already discovered some corridors, classrooms, offices, and bathrooms. What can the robot infer about the presence and the locations of other classrooms and offices and, more generally, about the structure of the rest of the building? This paper presents a system that makes a step towards providing an answer to the above question. The proposed system is based on a generative model that is able to represent the topological structures and the semantic labeling schemas of buildings and to generate plausible hypotheses for unvisited portions of these environments. We represent the buildings as undirected graphs, whose nodes are rooms and edges are physical connections between them. Given an initial knowledge base of graphs, our approach, relying on constructive machine learning techniques, segments each graph for finding significant subgraphs and clusters them according to their similarity, which is measured using graph kernels. A graph representing a new building or an unvisited part of a building is eventually generated by sampling subgraphs from clusters and connecting them.
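
A toy sketch of the representation, not of the authors' generative model: a partially known building as an undirected graph of labelled rooms, extended by attaching a "typical" subgraph (the motifs below stand in for the clustered subgraphs and are invented for illustration):

```python
# Toy illustration of the graph representation used above: rooms are nodes,
# physical connections are edges, and an unvisited part of the building is
# hypothesised by attaching a sampled subgraph motif to the known graph.

import random
random.seed(0)

# Known part of the building: adjacency sets with room labels in the names.
building = {"corridor1": {"class1", "class2", "office1"},
            "class1": {"corridor1"}, "class2": {"corridor1"},
            "office1": {"corridor1"}}

# Motifs playing the role of the clustered subgraphs in the paper.
motifs = [("corridor", ["classroom", "classroom", "bathroom"]),
          ("corridor", ["office", "office"])]

def extend(graph, attach_to="corridor1"):
    hub_label, rooms = random.choice(motifs)
    hub = f"{hub_label}{len(graph)}"
    graph[hub] = {attach_to} | {f"{r}{len(graph) + i + 1}" for i, r in enumerate(rooms)}
    for neighbour in graph[hub] - {attach_to}:
        graph[neighbour] = {hub}
    graph[attach_to].add(hub)
    return graph

print(extend(building))
```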

On the definition of “action” in robotics and other fields

Philipp Zech, Erwan Renaudo, Simon Haller, Xiang Zhang, Justus Piater, Action representations in robotics: A taxonomy and systematic classification, The International Journal of Robotics Research, 2019, DOI: 10.1177/0278364919835020.

Understanding and defining the meaning of “action” is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for action execution. Unfortunately, to this day we still lack both a clear understanding of the concept of an action and a set of established criteria that ultimately characterize an action. In this survey, we thus first review existing ideas and theories on the notion and meaning of action. Subsequently, we discuss the role of action in robotics and attempt to give a seminal definition of action in accordance with its use in robotics research. Given this definition we then introduce a taxonomy for categorizing action representations in robotics along various dimensions. Finally, we provide a meticulous literature survey on action representations in robotics where we categorize relevant literature along our taxonomy. After discussing the current state of the art we conclude with an outlook towards promising research directions.

An interesting review of criticisms of deep learning in cognitive science

Radoslaw M. Cichy, Daniel Kaiser, Deep Neural Networks as Scientific Models, Trends in Cognitive Sciences, Volume 23, Issue 4, 2019, Pages 305-317, DOI: 10.1016/j.tics.2019.01.009.

Artificial deep neural networks (DNNs) initially inspired by the brain enable computers to solve cognitive tasks at which humans excel. In the absence of explanations for such cognitive phenomena, in turn cognitive scientists have started using DNNs as models to investigate biological cognition and its neural basis, creating heated debate. Here, we reflect on the case from the perspective of philosophy of science. After putting DNNs as scientific models into context, we discuss how DNNs can fruitfully contribute to cognitive science. We claim that beyond their power to provide predictions and explanations of cognitive phenomena, DNNs have the potential to contribute to an often overlooked but ubiquitous and fundamental use of scientific models: exploration.

On how attention, modelled by Bayesian inference (for category learning), can structure the way reinforcement learning works

Angela Radulescu, Yael Niv, Ian Ballard, Holistic Reinforcement Learning: The Role of Structure and Attention, Trends in Cognitive Sciences, Volume 23, Issue 4, 2019, Pages 278-292, DOI: 10.1016/j.tics.2019.01.010.

Compact representations of the environment allow humans to behave efficiently in a complex world. Reinforcement learning models capture many behavioral and neural effects but do not explain recent findings showing that structure in the environment influences learning. In parallel, Bayesian cognitive models predict how humans learn structured knowledge but do not have a clear neurobiological implementation. We propose an integration of these two model classes in which structured knowledge learned via approximate Bayesian inference acts as a source of selective attention. In turn, selective attention biases reinforcement learning towards relevant dimensions of the environment. An understanding of structure learning will help to resolve the fundamental challenge in decision science: explaining why people make the decisions they do.
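
One way to picture the proposed integration (my reading, not the authors' computational model) is a feature-based value learner whose update is gated by per-dimension attention weights, which in the paper would come from approximate Bayesian structure learning:

```python
# Toy sketch of attention-biased reinforcement learning: attention weights
# (assumed here to be posterior probabilities that a stimulus dimension is
# relevant) scale both the value estimate and the credit assignment.

import numpy as np

def attention_weighted_update(w, features, reward, attention, alpha=0.1):
    """One learning step: value and update both use the attended features."""
    phi = attention * features          # attended stimulus representation
    value = w @ phi
    delta = reward - value              # reward prediction error
    return w + alpha * delta * phi      # mostly attended dimensions are updated

w = np.zeros(3)
attention = np.array([0.8, 0.15, 0.05])   # e.g. P(dimension is relevant)
for _ in range(50):
    features = np.array([1.0, 1.0, 1.0])  # all dimensions present in the stimulus
    reward = 1.0
    w = attention_weighted_update(w, features, reward, attention)
print(w)   # learned weight mass concentrates on the attended dimension
```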

On how the value of actions (in the RL sense) can be coded in the brain

Rory J. Bufacchi, Gian Domenico Iannetti, The Value of Actions, in Time and Space, Trends in Cognitive Sciences, Volume 23, Issue 4, 2019, Pages 270-271, DOI: 10.1016/j.tics.2019.01.011.

This value-output function can be a neural network, in which case the assumptions about the future are stored in the precise network configuration. The values that such a network outputs, or at least the intermediate steps necessary for calculating the final values, are the ‘action relevances’ we mention in our original paper (in the case of the brain, the inputs to such a value-calculating network should be state estimators, which likely include activity coming from the ventral stream, frontal areas, and limbic regions [3]). Our claim was thus that PPS-related measures reflect the instantaneous value of particular types of actions, and not that PPS measures explicitly reflect the value of any possible action at any given time (i.e., for any possible state): PPS measures reflect the instantaneous output of a function rather than the infinite array of values that the output of this function could take. We might have contributed to this misunderstanding when claiming that a field is ‘a quantity that has a magnitude for each point in space and time’. We should have clarified that the magnitude of a PPS measure can be seen as a specific sample from a field in the here and now rather than as a database containing all possible field values.

Improving Q-learning by initialising the Q matrix, with a nice review of related work on that approach

Ee Soong Low, Pauline Ong, Kah Chun Cheah, Solving the optimal path planning of a mobile robot using improved Q-learning, Robotics and Autonomous Systems, Volume 115, 2019, Pages 143-161, DOI: 10.1016/j.robot.2019.02.013.

Q-learning, a type of reinforcement learning, has gained increasing popularity in autonomous mobile robot path planning recently, due to its self-learning ability without requiring an a priori model of the environment. Yet, despite such advantage, Q-learning exhibits slow convergence to the optimal solution. In order to address this limitation, the concept of partially guided Q-learning is introduced wherein the flower pollination algorithm (FPA) is utilized to improve the initialization of Q-learning. Experimental evaluation of the proposed improved Q-learning in challenging environments with different layouts of obstacles shows that the convergence of Q-learning can be accelerated when Q-values are initialized appropriately using the FPA. Additionally, the effectiveness of the proposed algorithm is validated in a real-world experiment using a three-wheeled mobile robot.
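
The core idea, starting Q-learning from informed rather than zero initial values, can be sketched as follows; here a simple distance-to-goal heuristic stands in for the flower pollination algorithm used in the paper, and the grid world is invented:

```python
# Sketch of heuristically initialised Q-learning on a toy grid world
# (illustrative only; the paper uses the flower pollination algorithm
# to compute the initial Q-values).

import random
random.seed(0)

N, GOAL = 5, (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def heuristic_init():
    # Actions whose successor state is closer to the goal start with higher Q.
    Q = {}
    for r in range(N):
        for c in range(N):
            for a in ACTIONS:
                nr = min(max(r + a[0], 0), N - 1)
                nc = min(max(c + a[1], 0), N - 1)
                Q[(r, c), a] = -(abs(GOAL[0] - nr) + abs(GOAL[1] - nc))
    return Q

def step(state, action):
    r = min(max(state[0] + action[0], 0), N - 1)
    c = min(max(state[1] + action[1], 0), N - 1)
    nxt = (r, c)
    return nxt, (10.0 if nxt == GOAL else -1.0)

Q = heuristic_init()
alpha, gamma, eps = 0.5, 0.9, 0.1
for _ in range(200):                       # episodes
    s = (0, 0)
    while s != GOAL:
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda a: Q[s, a]))
        s2, rwd = step(s, a)
        Q[s, a] += alpha * (rwd + gamma * max(Q[s2, b] for b in ACTIONS) - Q[s, a])
        s = s2
print(max(ACTIONS, key=lambda a: Q[(0, 0), a]))  # greedy first move from the start
```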