Category Archives: Robotics

Multi-agent reinforcement learning for working with high-dimensional spaces

David L. Leottau, Javier Ruiz-del-Solar, Robert Babuška, Decentralized Reinforcement Learning of Robot Behaviors, Artificial Intelligence, Volume 256, 2018, Pages 130-159, DOI: 10.1016/j.artint.2017.12.001.

A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In addition to proposing this methodology, three specific multi agent DRL approaches are considered: DRL-Independent, DRL Cooperative-Adaptive (CA), and DRL-Lenient. These approaches are validated and analyzed with an extensive empirical study using four different problems: 3D Mountain Car, SCARA Real-Time Trajectory Generation, Ball-Dribbling in humanoid soccer robotics, and Ball-Pushing using differential drive robots. The experimental validation provides evidence that DRL implementations show better performances and faster learning times than their centralized counterparts, while using less computational resources. DRL-Lenient and DRL-CA algorithms achieve the best final performances for the four tested problems, outperforming their DRL-Independent counterparts. Furthermore, the benefits of the DRL-Lenient and DRL-CA are more noticeable when the problem complexity increases and the centralized scheme becomes intractable given the available computational resources and training time.
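The decentralized idea in the abstract above can be sketched as follows. This is a minimal illustration in the spirit of the DRL-Independent scheme, not the authors' implementation: each agent runs ordinary tabular Q-learning over a single action dimension, and all agents receive the same global reward. The grid task, the `IndependentQAgent` class, and every parameter value are assumptions made for this example.

```python
import random
from collections import defaultdict


class IndependentQAgent:
    """Tabular Q-learner that controls one action dimension only."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # keyed by (state, own action)
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy over this agent's own action component.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # Standard Q-learning update; the reward is the shared global reward.
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in self.actions
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def step(state, dx, dy, size=5):
    """Hypothetical toy task: reach the far corner of a size x size grid."""
    x = min(max(state[0] + dx, 0), size - 1)
    y = min(max(state[1] + dy, 0), size - 1)
    next_state = (x, y)
    done = next_state == (size - 1, size - 1)
    return next_state, (0.0 if done else -1.0), done


def train(episodes=200, max_steps=100):
    # Agent 0 chooses the x-move, agent 1 chooses the y-move.
    agents = [IndependentQAgent((-1, 0, 1), seed=i) for i in range(2)]
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            dx, dy = agents[0].act(state), agents[1].act(state)
            next_state, reward, done = step(state, dx, dy)
            for agent, action in zip(agents, (dx, dy)):
                agent.update(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agents
```

In this toy setting each agent stores 3 values per state, 6 in total, whereas a centralized learner over the joint action would store 9. The gap widens as action dimensions are added, which is the kind of saving the abstract refers to.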

Using interactive reinforcement learning with the advisor being another reinforcement learning agent

Francisco Cruz, Sven Magg, Yukie Nagai & Stefan Wermter, Improving interactive reinforcement learning: What makes a good teacher?, Connection Science, DOI: 10.1080/09540091.2018.1443318.

Interactive reinforcement learning (IRL) has become an important apprenticeship approach to speed up convergence in classic reinforcement learning (RL) problems. In this regard, a variant of IRL is policy shaping, which uses a parent-like trainer to propose the next action to be performed and by doing so reduces the search space by advice. On some occasions, the trainer may be another artificial agent which in turn was trained using RL methods to afterward become an advisor for other learner-agents. In this work, we analyse internal representations and characteristics of artificial agents to determine which agent may outperform others to become a better trainer-agent. Using a polymath agent, as compared to a specialist agent, as an advisor leads to a larger reward and faster convergence of the reward signal and also to a more stable behaviour in terms of the state visit frequency of the learner-agents. Moreover, we analyse system interaction parameters in order to determine how influential they are in the apprenticeship process, where the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
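One way to picture how advice, consistency of feedback, and learner obedience might interact during action selection is the sketch below. It is an assumption-laden illustration, not the procedure from the paper: the function `select_action`, its parameters `feedback_prob`, `consistency`, and `obedience`, and their default values are all hypothetical.

```python
import random


def select_action(q_values, advice, rng,
                  feedback_prob=0.3, consistency=0.9, obedience=1.0):
    """Pick an action for the learner, possibly following a trainer's advice.

    q_values: dict mapping each action to the learner's current value estimate.
    advice:   the action the trainer-agent would propose in this state.
    """
    actions = sorted(q_values)
    own_choice = max(actions, key=lambda a: q_values[a])

    # The trainer only intervenes in a fraction of the steps.
    if rng.random() >= feedback_prob:
        return own_choice

    # Inconsistent feedback: the proposed action is swapped for another one.
    if rng.random() >= consistency and len(actions) > 1:
        advice = rng.choice([a for a in actions if a != advice])

    # A disobedient learner ignores the advice and acts on its own estimate.
    return advice if rng.random() < obedience else own_choice
```

For example, with `feedback_prob=1.0`, `consistency=1.0`, and `obedience=1.0` the learner always executes the advised action, while `feedback_prob=0.0` reduces the function to plain greedy selection.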

Using fuzzy Petri nets for mobile robot navigation

Seung-yun Kim, Yilin Yang, A self-navigating robot using Fuzzy Petri nets, Robotics and Autonomous Systems, Volume 101, 2018, Pages 153-165, DOI: 10.1016/j.robot.2017.11.008.

Petri nets (PNs) are capable of modeling nearly any conceivable system and can provide a better understanding of the idealized action sequence in which to most effectively describe or execute said system through their powerful analytical capabilities. However, because real world instances are rarely as consistent and ideal as simulated models, basic PN modeling and simulation properties may be insufficient in practical application. We remedy this through specialization in Fuzzy Petri nets (FPNs). Fuzzy logic is incorporated to better model a self-navigating robot algorithm, thanks to its versatile multi-valued logic reasoning. By using FPNs, it is possible to simulate, assess, and communicate the process and reasoning of the navigational algorithm and apply it to real world programming. In this paper, we propose a series of specific fuzzy algorithms intended to be implemented in concert on a mobile robot platform in order to optimize the sequence of actions needed for a given task, primarily the navigation of an unknown maze. A set of varied maze configurations were developed and simulated as PN and FPN models, providing a testing environment to examine the efficiency of several methodologies. Five methods, including an original proposal in this paper, were compared across 30,000 simulations, evaluating in particular performance in processing cost in time. Our experiments concluded with results suggesting a very competitive task completion time at a considerable fraction in processing cost compared to the closest performing alternatives.

Hybridizing RRT with deliberative path planning to improve performance

Dong, Y., Camci, E. & Kayacan, E., Faster RRT-based Nonholonomic Path Planning in 2D Building Environments Using Skeleton-constrained Path Biasing, J Intell Robot Syst (2018) 89: 387, DOI: 10.1007/s10846-017-0567-9.

This paper presents a faster RRT-based path planning approach for regular 2-dimensional (2D) building environments. To minimize the planning time, we adopt the idea of biasing the RRT tree-growth in more focused ways. We propose to calculate the skeleton of the 2D environment first, then connect a geometrical path on the skeleton, and grow the RRT tree via the seeds generated locally along this path. We conduct batched simulations to find the universal parameters in manipulating the seeds generation. We show that the proposed skeleton-biased locally-seeded RRT (skilled-RRT) is faster than the other baseline planners (RRT, RRT*, A*-RRT, Theta*-RRT, and MARRT) through experimental tests using different vehicles in different 2D building environments. Given mild assumptions of the 2D environments, we prove that the proposed approach is probabilistically complete. We also present an application of the skilled-RRT for unmanned ground vehicle. Compared to the other baseline algorithms (Theta*-RRT and MARRT), we show the applicability and fast planning of the skilled-RRT in real environment.
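The general idea of biasing RRT growth toward a precomputed geometric path can be sketched with a sampling routine like the one below. This is a generic path-biased sampler, not the seed-generation scheme of skilled-RRT: the function name `sample_biased`, the Gaussian perturbation, and the `bias` and `sigma` values are assumptions, and segments are picked uniformly rather than weighted by length.

```python
import random


def sample_biased(path, bounds, rng, bias=0.8, sigma=0.5):
    """Draw a 2D sample for tree extension.

    path:   list of (x, y) waypoints, e.g. a path found on the skeleton.
    bounds: ((xmin, xmax), (ymin, ymax)) of the workspace.
    With probability `bias` the sample is drawn near the path; otherwise it
    is uniform over the workspace, which keeps the whole space reachable.
    """
    (xmin, xmax), (ymin, ymax) = bounds
    if len(path) < 2 or rng.random() >= bias:
        return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))

    # Pick a random point on a random segment of the path.
    i = rng.randrange(len(path) - 1)
    (x0, y0), (x1, y1) = path[i], path[i + 1]
    t = rng.random()

    # Perturb it so samples spread locally around the path.
    x = x0 + t * (x1 - x0) + rng.gauss(0.0, sigma)
    y = y0 + t * (y1 - y0) + rng.gauss(0.0, sigma)

    # Clamp to the workspace.
    return (min(max(x, xmin), xmax), min(max(y, ymin), ymax))
```

Keeping a nonzero share of uniform samples is a common design choice in biased RRT variants, since purely local sampling can stall when the guiding path is infeasible for the vehicle.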

Using active perception for object classification

Patten, T., Martens, W. & Fitch, R., Monte Carlo planning for active object classification, Auton Robot (2018) 42: 391, DOI: 10.1007/s10514-017-9626-0.

Classifying objects in complex unknown environments is a challenging problem in robotics and is fundamental in many applications. Modern sensors and sophisticated perception algorithms extract rich 3D textured information, but are limited to the data that are collected from a given location or path. We are interested in closing the loop around perception and planning, in particular to plan paths for better perceptual data, and focus on the problem of planning scanning sequences to improve object classification from range data. We formulate a novel time-constrained active classification problem and propose solution algorithms that employ a variation of Monte Carlo tree search to plan non-myopically. Our algorithms use a particle filter combined with Gaussian process regression to estimate joint distributions of object class and pose. This estimator is used in planning to generate a probabilistic belief about the state of objects in a scene, and also to generate beliefs for predicted sensor observations from future viewpoints. These predictions consider occlusions arising from predicted object positions and shapes. We evaluate our algorithms in simulation, in comparison to passive and greedy strategies. We also describe similar experiments where the algorithms are implemented online, using a mobile ground robot in a farm environment. Results indicate that our non-myopic approach outperforms both passive and myopic strategies, and clearly show the benefit of active perception for outdoor object classification.

Using two different environment representations: a detailed one for SLAM, a coarse one for selecting actions for active perception

Nelson, E., Corah, M. & Michael, N., Environment model adaptation for mobile robot exploration, Auton Robot (2018) 42: 257, DOI: 10.1007/s10514-017-9669-2.

In this work, we propose a methodology to adapt a mobile robot’s environment model during exploration as a means of decreasing the computational complexity associated with information metric evaluation and consequently increasing the speed at which the system is able to plan actions and travel through an unknown region given finite computational resources. Recent advances in exploration compute control actions by optimizing information-theoretic metrics on the robot’s map. These metrics are generally computationally expensive to evaluate, limiting the speed at which a robot is able to explore. To reduce computational cost, we propose keeping two representations of the environment: one full resolution representation for planning and collision checking, and another with a coarse resolution for rapidly evaluating the informativeness of planned actions. To generate the coarse representation, we employ the Principle of Relevant Information from rate distortion theory to compress a robot’s occupancy grid map. We then propose a method for selecting a coarse representation that sacrifices a minimal amount of information about expected future sensor measurements using the Information Bottleneck Method. We outline an adaptive strategy that changes the robot’s environment representation in response to its surroundings to maximize the computational efficiency of exploration. On computationally constrained systems, this reduction in complexity enables planning over longer predictive horizons, leading to faster navigation. We simulate and experimentally evaluate mutual information based exploration through cluttered indoor environments with exploration rates that adapt based on environment complexity leading to an order-of-magnitude increase in the maximum rate of exploration in contrast to non-adaptive techniques given the same finite computational resources.

A novel approach to use POMDP in practical active perception, where rewards are needed to penalize uncertainty and therefore remove the piecewise-linear and convex property of the value function

Satsangi, Y., Whiteson, S., Oliehoek, F.A. et al., Exploiting submodular value functions for scaling up active perception, Auton Robot (2018) 42: 209, DOI: 10.1007/s10514-017-9666-5.

In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. For example, a mobile robot takes sensory actions to efficiently navigate in a new environment. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent’s belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address a twofold challenge of modeling and planning for active perception tasks. We analyze ρ POMDP and POMDP-IR, two frameworks for modeling active perception tasks, that restore the PWLC property of the value function. We show the mathematical equivalence of these two frameworks by showing that given a ρ POMDP along with a policy, they can be reduced to a POMDP-IR and an equivalent policy (and vice-versa). We prove that the value function for the given ρ POMDP (and the given policy) and the reduced POMDP-IR (and the reduced policy) is the same. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and ρ POMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. 
Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost leading to better scalability for solving active perception tasks.
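The greedy maximization step mentioned in the abstract above can be illustrated in isolation. The sketch below selects k sensors by repeatedly adding the one with the largest marginal gain. It shows only generic greedy subset selection on a coverage function, which is a standard example of a submodular function, and does not implement greedy PBVI itself. The camera names and the `COVERAGE` table are made up for the example.

```python
def greedy_select(candidates, k, value):
    """Greedily pick up to k items, maximizing value(set) one item at a time."""
    chosen = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda s: value(frozenset(chosen + [s])))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical data: which cells of a scene each camera can observe.
COVERAGE = {
    "cam_a": {1, 2, 3},
    "cam_b": {3, 4},
    "cam_c": {4, 5, 6, 7},
    "cam_d": {1, 7},
}


def coverage_value(sensors):
    """Number of distinct cells observed by the selected cameras."""
    covered = set()
    for s in sensors:
        covered |= COVERAGE[s]
    return len(covered)


selected = greedy_select(COVERAGE, 2, coverage_value)
print(selected, coverage_value(selected))  # ['cam_c', 'cam_a'] 7
```

Greedy selection evaluates on the order of n times k candidate sets instead of every size-k subset, which is where the scalability in the number of sensors comes from. For monotone submodular functions such as coverage, the greedy result is known to be within a constant factor of the optimum.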

A nice review on the topic of active perception

Ruzena Bajcsy, Yiannis Aloimonos, John K. Tsotsos, Revisiting active perception, Auton Robot (2018) 42: 177, DOI: 10.1007/s10514-017-9615-3.

Despite the recent successes in robotics, artificial intelligence and computer vision, a complete artificial agent necessarily must include active perception. A multitude of ideas and methods for how to accomplish this have already appeared in the past, their broader utility perhaps impeded by insufficient computational power or costly hardware. The history of these ideas, perhaps selective due to our perspectives, is presented with the goal of organizing the past literature and highlighting the seminal contributions. We argue that those contributions are as relevant today as they were decades ago and, with the state of modern computational tools, are poised to find new life in the robotic perception systems of the next decade.

Achieving smooth motion in robotic manipulators on-line through their controller, and a nice state-of-the-art of the problem of smooth motion

Yu-Sheng Lu, Yi-Yi Lin, Smooth motion control of rigid robotic manipulators with constraints on high-order kinematic variables, Mechatronics, Volume 49, 2018, Pages 11-25, DOI: 10.1016/j.mechatronics.2017.11.003.

This paper presents a design for a jerk-constrained, time-optimal controller (JCTOC) that allows the smooth control of rigid robotic manipulators, in which time-optimal output responses are attained with confined jerk. A snap-constrained, time-optimal control (SCTOC) scheme is also proposed to produce even smoother output responses that are time-optimal, with a constraint on the maximum admissible snap. In contrast to conventional path-planning approaches that involve a bounded jerk/snap, the proposed JCTOC and SCTOC practically limit the corresponding high-order kinematic variables in real time. Using the structure of the computed torque control, the PD control, the JCTOC and the SCTOC are experimentally compared in terms of specific performance indices, including a chatter index, which is used to measure the unevenness of a signal.

Extending STRIPS-like symbolic planners with metrical/physical constraints for the domain of robotic manipulation

Caelan Reed Garrett, Tomás Lozano-Pérez, and Leslie Pack Kaelbling, FFRob: Leveraging symbolic planning for efficient task and motion planning, The International Journal of Robotics Research Vol 37, Issue 1, pp. 104-136, DOI: 10.1177/0278364917739114.

Mobile manipulation problems involving many objects are challenging to solve due to the high dimensionality and multi-modality of their hybrid configuration spaces. Planners that perform a purely geometric search are prohibitively slow for solving these problems because they are unable to factor the configuration space. Symbolic task planners can efficiently construct plans involving many variables but cannot represent the geometric and kinematic constraints required in manipulation. We present the FFRob algorithm for solving task and motion planning problems. First, we introduce extended action specification (EAS) as a general purpose planning representation that supports arbitrary predicates as conditions. We adapt existing heuristic search ideas for solving STRIPS planning problems, particularly delete-relaxations, to solve EAS problem instances. We then apply the EAS representation and planners to manipulation problems resulting in FFRob. FFRob iteratively discretizes task and motion planning problems using batch sampling of manipulation primitives and a multi-query roadmap structure that can be conditionalized to evaluate reachability under different placements of movable objects. This structure enables the EAS planner to efficiently compute heuristics that incorporate geometric and kinematic planning constraints to give a tight estimate of the distance to the goal. Additionally, we show FFRob is probabilistically complete and has a finite expected runtime. Finally, we empirically demonstrate FFRob’s effectiveness on complex and diverse task and motion planning tasks including rearrangement planning and navigation among movable objects.