Tag Archives: Task Planning

Building explanations for AI plans by modifying the user’s model so that those plans become optimal within it

Sarath Sreedharan, Tathagata Chakraborti, Subbarao Kambhampati, Foundations of explanations as model reconciliation, Artificial Intelligence, Volume 301, 2021, DOI: 10.1016/j.artint.2021.103558.

Past work on plan explanations primarily involved the AI system explaining the correctness of its plan and the rationale for its decision in terms of its own model. Such soliloquy is wholly inadequate in most realistic scenarios where users have domain and task models that differ from that used by the AI system. We posit that the explanations are best studied in light of these differing models. In particular, we show how explanation can be seen as a “model reconciliation problem” (MRP), where the AI system in effect suggests changes to the user’s mental model so as to make its plan be optimal with respect to that changed user model. We will study the properties of such explanations, present algorithms for automatically computing them, discuss relevant extensions to the basic framework, and evaluate the performance of the proposed algorithms both empirically and through controlled user studies.
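To make the model reconciliation idea concrete, here is a minimal sketch under strong simplifying assumptions: a "model" is just a dictionary of directed edges with costs, a plan is a path, and an explanation is the smallest set of edge corrections (taken from the robot's model) after which the robot's plan is optimal in the user's model. The names `explain`, `plan_cost` and `shortest_cost`, and the brute-force search over subsets, are hypothetical illustrations and not the algorithms of the paper.

```python
from itertools import combinations
import heapq

def shortest_cost(model, start, goal):
    """Dijkstra over a model given as {(u, v): cost}; inf if goal is unreachable."""
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == goal:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for (a, b), c in model.items():
            if a == u and d + c < dist.get(b, float("inf")):
                dist[b] = d + c
                heapq.heappush(queue, (d + c, b))
    return float("inf")

def plan_cost(model, plan):
    """Cost of a node sequence in a model; inf if some step is missing from it."""
    total = 0
    for step in zip(plan, plan[1:]):
        if step not in model:
            return float("inf")
        total += model[step]
    return total

def explain(robot_model, human_model, plan):
    """Smallest set of edges to correct in the human model (assumed toy setting)."""
    start, goal = plan[0], plan[-1]
    diffs = sorted(e for e in set(robot_model) | set(human_model)
                   if robot_model.get(e) != human_model.get(e))
    for size in range(len(diffs) + 1):          # try smaller explanations first
        for subset in combinations(diffs, size):
            updated = dict(human_model)
            for e in subset:
                if e in robot_model:
                    updated[e] = robot_model[e]  # adopt the robot's cost
                else:
                    updated.pop(e, None)         # edge does not exist for the robot
            cost = plan_cost(updated, plan)
            if cost < float("inf") and cost == shortest_cost(updated, start, goal):
                return list(subset)
    return None

# Hypothetical example: the user believes the direct edge A->C is cheap.
robot_model = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5, ("C", "D"): 2}
human_model = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1, ("C", "D"): 3}
explanation = explain(robot_model, human_model, ["A", "B", "C"])
```

In this example only the cost of the edge from A to C needs to be communicated; the disagreement about the edge from C to D is irrelevant to the plan and is left out of the explanation.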

Mixing logical planning with NNs for decision making

Zuo, G., Pan, T., Zhang, T. et al., SOAR Improved Artificial Neural Network for Multistep Decision-making Tasks, Cogn Comput 13, 612–625 (2021) DOI: 10.1007/s12559-020-09716-6.

Recently, artificial neural networks (ANNs) have been applied to various robot-related research areas due to their powerful spatial feature abstraction and temporal information prediction abilities. Decision-making has also played a fundamental role in the research area of robotics. How to improve ANNs with the characteristics of decision-making is a challenging research issue. ANNs are connectionist models, which means they are naturally weak in long-term planning, logical reasoning, and multistep decision-making. Considering that a small refinement of the inner network structures of ANNs will usually lead to exponentially growing data costs, an additional planning module seems necessary for the further improvement of ANNs, especially for small data learning. In this paper, we propose a state operator and result (SOAR) improved ANN (SANN) model, which takes advantage of both the long-term cognitive planning ability of SOAR and the powerful feature detection ability of ANNs. It mimics the cognitive mechanism of the human brain to improve the traditional ANN with an additional logical planning module. In addition, a data fusion module is constructed to combine the probability vector obtained by SOAR planning and the original data feature array. A data fusion module is constructed to convert the information from the logical sequences in SOAR to the probabilistic vector in ANNs. The proposed architecture is validated in two types of robot multistep decision-making experiments for a grasping task: a multiblock simulated experiment and a multicup experiment in a real scenario. The experimental results show the efficiency and high accuracy of our proposed architecture. The integration of SOAR and ANN is a good compromise between logical planning with small data and probabilistic classification with big data. It also has strong potential for more complicated tasks that require robust classification, long-term planning, and fast learning. 
Some potential applications include recognition of grasping order in multiobject environment and cooperative grasping of multiagents.

Improving POMDP solving efficiency by eliminating variables in the state structure

Eric A. Hansen, An integrated approach to solving influence diagrams and finite-horizon partially observable decision processes, Artificial Intelligence, Volume 294, 2021 DOI: 10.1016/j.artint.2020.103431.

We show how to integrate a variable elimination approach to solving influence diagrams with a value iteration approach to solving finite-horizon partially observable Markov decision processes (POMDPs). The integration of these approaches creates a variable elimination algorithm for influence diagrams that has much more relaxed constraints on elimination order, which allows improved scalability in many cases. The new algorithm can also be viewed as a generalization of the value iteration algorithm for POMDPs that solves non-Markovian as well as Markovian problems, in addition to leveraging a factored representation for improved efficiency. The development of a single algorithm that integrates and generalizes both of these classic algorithms, one for influence diagrams and the other for POMDPs, unifies these two approaches to solving Bayesian decision problems in a way that combines their complementary advantages.

A survey on HTN planning

Ilche Georgievski, Marco Aiello, HTN planning: Overview, comparison, and beyond, Artificial Intelligence, Volume 222, 2015, Pages 124-156 DOI: 10.1016/j.artint.2015.02.002.

Hierarchies are one of the most common structures used to understand and conceptualise the world. Within the field of Artificial Intelligence (AI) planning, which deals with the automation of world-relevant problems, Hierarchical Task Network (HTN) planning is the branch that represents and handles hierarchies. In particular, the requirement for rich domain knowledge to characterise the world enables HTN planning to be very useful, and also to perform well. However, the history of almost 40 years obfuscates the current understanding of HTN planning in terms of accomplishments, planning models, similarities and differences among hierarchical planners, and its current and objective image. On top of these issues, the ability of hierarchical planning to truly cope with the requirements of real-world applications has been often questioned. As a remedy, we propose a framework-based approach where we first provide a basis for defining different formal models of hierarchical planning, and define two models that comprise a large portion of HTN planners. Second, we provide a set of concepts that helps in interpreting HTN planners from the aspect of their search space. Then, we analyse and compare the planners based on a variety of properties organised in five segments, namely domain authoring, expressiveness, competence, computation and applicability. Furthermore, we select Web service composition as a real-world and current application, and classify and compare the approaches that employ HTN planning to solve the problem of service composition. Finally, we conclude with our findings and present directions for future work. In summary, we provide a novel and comprehensive viewpoint on a core AI planning technique.
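As a rough illustration of what task decomposition means in HTN planning, the following sketch implements a tiny depth-first, totally ordered decomposition in the spirit of SHOP-like planners. Everything here is an assumption made for the example: the state is a set of facts, `operators` maps primitive task names to functions that return a new state (or `None` when not applicable), and `methods` maps compound task names to lists of (applicability test, subtasks) pairs. It is not one of the formal models defined in the survey.

```python
def htn_plan(state, tasks, operators, methods):
    """Return a list of primitive tasks that accomplishes `tasks` in order, or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:                        # primitive task: apply it
        new_state = operators[head](state)
        if new_state is None:
            return None
        tail = htn_plan(new_state, rest, operators, methods)
        return None if tail is None else [head] + tail
    for applicable, subtasks in methods.get(head, []):   # compound task: decompose
        if applicable(state):
            plan = htn_plan(state, subtasks + rest, operators, methods)
            if plan is not None:
                return plan
    return None

# Hypothetical travel domain.
operators = {
    "walk": lambda s: s | {"at_dest"} if "near" in s else None,
    "call_taxi": lambda s: s | {"taxi_here"},
    "ride_taxi": lambda s: s | {"at_dest"} if "taxi_here" in s else None,
    "pay": lambda s: s - {"cash"} if "cash" in s else None,
}
methods = {
    "travel": [
        (lambda s: "near" in s, ["walk"]),
        (lambda s: "cash" in s, ["call_taxi", "ride_taxi", "pay"]),
    ],
}

taxi_plan = htn_plan(frozenset({"cash"}), ["travel"], operators, methods)
walk_plan = htn_plan(frozenset({"near"}), ["travel"], operators, methods)
no_plan = htn_plan(frozenset(), ["travel"], operators, methods)
```

The domain knowledge sits in the methods: the planner never searches over arbitrary action sequences, only over the decompositions the domain author has provided.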

Extending STRIPS-like symbolic planners with metrical/physical constraints for the domain of robotic manipulation

Caelan Reed Garrett, Tomás Lozano-Pérez, and Leslie Pack Kaelbling, FFRob: Leveraging symbolic planning for efficient task and motion planning, The International Journal of Robotics Research Vol 37, Issue 1, pp. 104 – 136, DOI: 10.1177/0278364917739114.

Mobile manipulation problems involving many objects are challenging to solve due to the high dimensionality and multi-modality of their hybrid configuration spaces. Planners that perform a purely geometric search are prohibitively slow for solving these problems because they are unable to factor the configuration space. Symbolic task planners can efficiently construct plans involving many variables but cannot represent the geometric and kinematic constraints required in manipulation. We present the FFRob algorithm for solving task and motion planning problems. First, we introduce extended action specification (EAS) as a general purpose planning representation that supports arbitrary predicates as conditions. We adapt existing heuristic search ideas for solving strips planning problems, particularly delete-relaxations, to solve EAS problem instances. We then apply the EAS representation and planners to manipulation problems resulting in FFRob. FFRob iteratively discretizes task and motion planning problems using batch sampling of manipulation primitives and a multi-query roadmap structure that can be conditionalized to evaluate reachability under different placements of movable objects. This structure enables the EAS planner to efficiently compute heuristics that incorporate geometric and kinematic planning constraints to give a tight estimate of the distance to the goal. Additionally, we show FFRob is probabilistically complete and has a finite expected runtime. Finally, we empirically demonstrate FFRob’s effectiveness on complex and diverse task and motion planning tasks including rearrangement planning and navigation among movable objects.
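The delete-relaxation idea mentioned in the abstract can be sketched on a plain propositional problem: ignore all delete effects and count how many layers of relaxed reachability are needed before every goal fact appears. This is a generic textbook-style estimate, not FFRob's heuristic (which also has to evaluate geometric reachability conditions); the action format `(name, preconditions, add_effects)` and the small pick-and-place domain are assumptions for illustration.

```python
def relaxed_layers(init, goal, actions):
    """Number of relaxed (delete-free) layers until all goal facts are reached."""
    reached = set(init)
    layers = 0
    while not goal <= reached:
        new = set()
        for _name, pre, add in actions:
            if pre <= reached:                # applicable when deletes are ignored
                new |= add - reached
        if not new:                           # fixpoint without reaching the goal
            return float("inf")
        reached |= new
        layers += 1
    return layers

# Hypothetical pick-and-place domain.
actions = [
    ("pick", {"hand_empty", "on_table"}, {"holding"}),
    ("move", {"holding"}, {"at_shelf"}),
    ("place", {"holding", "at_shelf"}, {"on_shelf"}),
]
init = {"hand_empty", "on_table"}
estimate = relaxed_layers(init, {"on_shelf"}, actions)
unreachable = relaxed_layers(init, {"flying"}, actions)
```

Because facts are never deleted, the relaxed problem can be explored in polynomial time, and an infinite estimate proves that the goal is unreachable in the original problem as well.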

Using MDPs when the transition probability matrix is only partially specified, thereby moving closer to a model-free approach

Karina V. Delgado, Leliane N. de Barros, Daniel B. Dias, Scott Sanner, Real-time dynamic programming for Markov decision processes with imprecise probabilities, Artificial Intelligence, Volume 230, January 2016, Pages 192-223, ISSN 0004-3702, DOI: 10.1016/j.artint.2015.09.005.

Markov Decision Processes have become the standard model for probabilistic planning. However, when applied to many practical problems, the estimates of transition probabilities are inaccurate. This may be due to conflicting elicitations from experts or insufficient state transition information. The Markov Decision Process with Imprecise Transition Probabilities (MDP-IPs) was introduced to obtain a robust policy where there is uncertainty in the transition. Although it has been proposed a symbolic dynamic programming algorithm for MDP-IPs (called SPUDD-IP) that can solve problems up to 22 state variables, in practice, solving MDP-IP problems is time-consuming. In this paper we propose efficient algorithms for a more general class of MDP-IPs, called Stochastic Shortest Path MDP-IPs (SSP MDP-IPs) that use initial state information to solve complex problems by focusing on reachable states. The (L)RTDP-IP algorithm, a (Labeled) Real Time Dynamic Programming algorithm for SSP MDP-IPs, is proposed together with three different methods for sampling the next state. It is shown here that the convergence of (L)RTDP-IP can be obtained by using any of these three methods, although the Bellman backups for this class of problems prescribe a minimax optimization. As far as we are aware, this is the first asynchronous algorithm for SSP MDP-IPs given in terms of a general set of probability constraints that requires non-linear optimization over imprecise probabilities in the Bellman backup. Our results show up to three orders of magnitude speedup for (L)RTDP-IP when compared with the SPUDD-IP algorithm.

See also:

  • Karina Valdivia Delgado, Scott Sanner, Leliane Nunes de Barros, Efficient solutions to factored MDPs with imprecise transition probabilities, Artif. Intell. 175 (9–10) (2011) 1498–1527.
  • Satia, J. K., and Lave Jr., R. E. 1973. MDPs with uncertain transition probabilities. Operations Research 21:728–740
  • White III, C. C., and El-Deib, H. K. 1994. MDPs with Imprecise Transition Probabilities. Operations Research 42(4):739–749
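The minimax Bellman backup mentioned in the abstract above can be sketched for the simplest kind of imprecision, where each transition probability is only known to lie in an interval. In that special case the adversarial choice of probabilities has a greedy solution: start from the lower bounds and give the remaining mass to the most costly successors first. The paper handles general probability constraints that require non-linear optimization, so this is only an assumed, simplified instance; the function names and the data layout are hypothetical.

```python
def worst_case_distribution(values, lower, upper):
    """Distribution within [lower, upper] that maximises the expected cost."""
    probs = list(lower)
    slack = 1.0 - sum(lower)
    # Hand the remaining probability mass to the costliest successors first.
    for i in sorted(range(len(values)), key=lambda i: -values[i]):
        extra = min(upper[i] - lower[i], slack)
        probs[i] += extra
        slack -= extra
    return probs

def robust_backup(state, value, actions):
    """Min over actions of (cost + worst-case expected value of the successors).

    actions[state] = {action: (cost, successors, lower_bounds, upper_bounds)}
    """
    best = float("inf")
    for cost, succ, lower, upper in actions[state].values():
        succ_values = [value[s] for s in succ]
        p = worst_case_distribution(succ_values, lower, upper)
        q = cost + sum(pi * v for pi, v in zip(p, succ_values))
        best = min(best, q)
    return best

# Hypothetical stochastic shortest path fragment: "goal" costs 0, "bad" costs 10.
value = {"goal": 0.0, "bad": 10.0}
actions = {
    "s0": {
        "gamble": (1.0, ["goal", "bad"], [0.5, 0.25], [0.75, 0.5]),
        "detour": (4.0, ["goal"], [1.0], [1.0]),
    }
}
backed_up = robust_backup("s0", value, actions)
```

With the most favourable probabilities the gamble would cost 1 + 0.25 × 10 = 3.5 and beat the detour, but under the worst case it costs 6, so the robust backup returns the detour's cost of 4.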

Planning tasks in mobile robots with MDPs that maximize the probability of satisfying user’s requirements specified through temporal logics, with estimation of transition probabilities through simulation only when needed

Jing Wang, Xuchu Ding, Morteza Lahijanian, Ioannis Ch. Paschalidis, and Calin A. Belta, Temporal logic motion control using actor–critic methods, The International Journal of Robotics Research September 2015 34: 1329-1344, first published on May 26, 2015. DOI: 10.1177/0278364915581505.

This paper considers the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov decision process (MDP). The robot control problem becomes finding the control policy which maximizes the probability of satisfying the temporal logic task on the MDP. For a large environment, obtaining transition probabilities for each state–action pair, as well as solving the necessary optimization problem for the optimal policy, are computationally intensive. To address these issues, we propose an approximate dynamic programming framework based on a least-squares temporal difference learning method of the actor–critic type. This framework operates on sample paths of the robot and optimizes a randomized control policy with respect to a small set of parameters. The transition probabilities are obtained only when needed. Simulations confirm that convergence of the parameters translates to an approximately optimal policy.
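The objective described in the abstract, maximizing the probability of satisfying the task, is commonly reduced to maximizing the probability of reaching a set of target states. The sketch below computes that quantity with plain value iteration on a tiny, fully known MDP. This is exactly the exact-model computation that the paper avoids for large environments by using an actor-critic method on sample paths, so it only illustrates the objective, not the authors' algorithm; the function name and the data layout are assumptions.

```python
def max_reach_probability(transitions, targets, sweeps=200):
    """Maximal probability of eventually reaching a target state.

    transitions[s][a] = list of (probability, next_state); states without
    actions are absorbing.
    """
    value = {s: (1.0 if s in targets else 0.0) for s in transitions}
    for _ in range(sweeps):
        for s in transitions:
            if s in targets or not transitions[s]:
                continue
            value[s] = max(sum(p * value[t] for p, t in outcomes)
                           for outcomes in transitions[s].values())
    return value

# Hypothetical example: "safe" may stall but can be retried, "risky" may hit a trap.
transitions = {
    "start": {
        "safe": [(0.5, "goal"), (0.5, "start")],
        "risky": [(0.75, "goal"), (0.25, "trap")],
    },
    "goal": {},
    "trap": {},
}
reach = max_reach_probability(transitions, {"goal"})
```

Here the safe action eventually succeeds with probability 1, so it dominates the risky one even though the latter has the better one-step success probability.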

Nice related work on efficient POMDPs and two novel approaches to reduce their computational cost

Grady, D.K.; Moll, M.; Kavraki, L.E., Extending the Applicability of POMDP Solutions to Robotic Tasks, IEEE Transactions on Robotics, vol. 31, no. 4, pp. 948-961, Aug. 2015 DOI: 10.1109/TRO.2015.2441511

Partially observable Markov decision processes (POMDPs) are used in many robotic task classes from soccer to household chores. Determining an approximately optimal action policy for POMDPs is PSPACE-complete, and the exponential growth of computation time prohibits solving large tasks. This paper describes two techniques to extend the range of robotic tasks that can be solved using a POMDP. Our first technique reduces the motion constraints of a robot and, then, uses state-of-the-art robotic motion planning techniques to respect the true motion constraints at runtime. We then propose a novel task decomposition that can be applied to some indoor robotic tasks. This decomposition transforms a long time horizon task into a set of shorter tasks. We empirically demonstrate the performance gain provided by these two techniques through simulated execution in a variety of environments. Comparing a direct formulation of a POMDP to solving our proposed reductions, we conclude that the techniques proposed in this paper can provide significant enhancement to current POMDP solution techniques, extending the POMDP instances that can be solved to include large continuous-state robotic tasks.

Semantic and syntactic bootstrapped learning for robots, inspired by similar processes in humans, that uses language as a scaffolding mechanism to improve learning in unknown situations

Worgotter, F.; Geib, C.; Tamosiunaite, M.; Aksoy, E.E.; Piater, J.; Hanchen Xiong; Ude, A.; Nemec, B.; Kraft, D.; Kruger, N.; Wachter, M.; Asfour, T., Structural Bootstrapping: A Novel, Generative Mechanism for Faster and More Efficient Acquisition of Action-Knowledge, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 140-154, June 2015, DOI: 10.1109/TAMD.2015.2427233.

Humans, but also robots, learn to improve their behavior. Without existing knowledge, learning either needs to be explorative and, thus, slow or, to be more efficient, it needs to rely on supervision, which may not always be available. However, once some knowledge base exists an agent can make use of it to improve learning efficiency and speed. This happens for our children at the age of around three when they very quickly begin to assimilate new information by making guided guesses how this fits to their prior knowledge. This is a very efficient generative learning mechanism in the sense that the existing knowledge is generalized into as-yet unexplored, novel domains. So far generative learning has not been employed for robots and robot learning remains to be a slow and tedious process. The goal of the current study is to devise for the first time a general framework for a generative process that will improve learning and which can be applied at all different levels of the robot’s cognitive architecture. To this end, we introduce the concept of structural bootstrapping, borrowed and modified from child language acquisition, to define a probabilistic process that uses existing knowledge together with new observations to supplement our robot’s database with missing information about planning-, object-, as well as, action-relevant entities. In a kitchen scenario, we use the example of making batter by pouring and mixing two components and show that the agent can efficiently acquire new knowledge about planning operators, objects as well as required motor pattern for stirring by structural bootstrapping. Some benchmarks are shown, too, that demonstrate how structural bootstrapping improves performance.

Developmental approach for a robot manipulator that learns in several bootstrapped stages, strongly inspired by infant development

Ugur, E.; Nagai, Y.; Sahin, E.; Oztop, E., Staged Development of Robot Skills: Behavior Formation, Affordance Learning and Imitation with Motionese, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 119-139, June 2015, DOI: 10.1109/TAMD.2015.2426192.

Inspired by infant development, we propose a three-staged developmental framework for an anthropomorphic robot manipulator. In the first stage, the robot is initialized with a basic reach-and-enclose-on-contact movement capability, and discovers a set of behavior primitives by exploring its movement parameter space. In the next stage, the robot exercises the discovered behaviors on different objects, and learns the caused effects; effectively building a library of affordances and associated predictors. Finally, in the third stage, the learned structures and predictors are used to bootstrap complex imitation and action learning with the help of a cooperative tutor. The main contribution of this paper is the realization of an integrated developmental system where the structures emerging from the sensorimotor experience of an interacting real robot are used as the sole building blocks of the subsequent stages that generate increasingly more complex cognitive capabilities. The proposed framework includes a number of common features with infant sensorimotor development. Furthermore, the findings obtained from the self-exploration and motionese guided human-robot interaction experiments allow us to reason about the underlying mechanisms of simple-to-complex sensorimotor skill progression in human infants.