Category Archives: Robotics

A novel method for mathematically compressing the value function so that value iteration / policy iteration run in time polynomial in the state dimension

Alex Gorodetsky, Sertac Karaman, and Youssef Marzouk, High-dimensional stochastic optimal control using continuous tensor decompositions, The International Journal of Robotics Research, Vol 37, Issue 2-3, pp. 340-377, DOI: 10.1177/0278364917753994.

Motion planning and control problems are embedded and essential in almost all robotics applications. These problems are often formulated as stochastic optimal control problems and solved using dynamic programming algorithms. Unfortunately, most existing algorithms that guarantee convergence to optimal solutions suffer from the curse of dimensionality: the run time of the algorithm grows exponentially with the dimension of the state space of the system. We propose novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure. The proposed algorithms are based on continuous tensor decompositions recently developed by the authors. Essentially, the algorithms represent high-dimensional functions (e.g. the value function) in a compressed format, and directly perform dynamic programming computations (e.g. value iteration, policy iteration) in this format. Under certain technical assumptions, the new algorithms guarantee convergence towards optimal solutions with arbitrary precision. Furthermore, the run times of the new algorithms scale polynomially with the state dimension and polynomially with the ranks of the value function. This approach realizes substantial computational savings in “compressible” problem instances, where value functions admit low-rank approximations. We demonstrate the new algorithms in a wide range of problems, including a simulated six-dimensional agile quadcopter maneuvering example and a seven-dimensional aircraft perching example. In some of these examples, we estimate computational savings of up to 10 orders of magnitude over standard value iteration algorithms. We further demonstrate the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.
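
To make the compression idea concrete, here is a minimal sketch of the discrete analogue of the paper's continuous decomposition: a value function stored as a tensor-train (a chain of small cores) and evaluated by chaining small matrix products instead of indexing a full d-dimensional table. The cores and ranks below are random placeholders purely for illustration; the paper works with functional (continuous) tensor-trains and adaptively chosen ranks.

```python
import numpy as np

def tt_value(cores, idx):
    """Evaluate a tensor-train-compressed table V[idx] by chaining
    small matrix products; cores[k] has shape (r_{k-1}, n_k, r_k)
    with boundary ranks equal to 1."""
    v = np.ones((1, 1))
    for core, i in zip(cores, idx):
        v = v @ core[:, i, :]          # (1, r_{k-1}) @ (r_{k-1}, r_k)
    return float(v[0, 0])

# Toy numbers: a rank-2 train over a 6-dimensional grid with 64
# points per axis stores at most 6 * 64 * 2 * 2 = 1536 numbers,
# versus 64**6 ~ 6.9e10 entries for the full table.
d, n, r = 6, 64, 2
ranks = [1] + [r] * (d - 1) + [1]
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1]))
         for k in range(d)]
print(tt_value(cores, (3, 17, 42, 0, 9, 63)))
```

A Bellman backup in this format never instantiates the full table; value iteration repeatedly updates and re-compresses the cores, which is where the polynomial-in-dimension, polynomial-in-rank run time comes from.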

A standard format for robotic maps

F. Amigoni et al., A Standard for Map Data Representation: IEEE 1873-2015 Facilitates Interoperability Between Robots, IEEE Robotics & Automation Magazine, vol. 25, no. 1, pp. 65-76, DOI: 10.1109/MRA.2017.2746179.

The availability of environment maps for autonomous robots enables them to complete several tasks. A new IEEE standard, IEEE 1873-2015, Robot Map Data Representation for Navigation (MDR), sponsored by the IEEE Robotics and Automation Society (RAS) and approved by the IEEE Standards Association Standards Board in September 2015, defines a common representation for two-dimensional (2-D) robot maps and is intended to facilitate interoperability among navigating robots. The standard defines an extensible markup language (XML) data format for exchanging maps between different systems. This article illustrates how metric maps, topological maps, and their combinations can be represented according to the standard.
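
The standard's central artifact is an XML exchange format for maps. As a rough illustration of what building such a document looks like in code, here is a sketch using Python's xml.etree.ElementTree. Note that the element and attribute names below are invented placeholders, not the normative IEEE 1873-2015 schema, which should be taken from the standard document itself.

```python
import xml.etree.ElementTree as ET

# NOTE: element/attribute names are hypothetical stand-ins, not the
# normative MDR schema defined in IEEE 1873-2015.
root = ET.Element("map", attrib={"type": "metric", "units": "m"})
grid = ET.SubElement(root, "grid", attrib={
    "resolution": "0.05", "width": "4", "height": "2"})
cells = ET.SubElement(grid, "cells")
cells.text = "0 0 1 0  0 1 1 0"          # row-major occupancy values
ET.SubElement(root, "origin", attrib={"x": "0.0", "y": "0.0"})

print(ET.tostring(root, encoding="unicode"))
```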

A study of how teleoperation factors influence the remote driving of robots

Storms, J. & Tilbury, D., A New Difficulty Index for Teleoperated Robots Driving through Obstacles, J Intell Robot Syst (2018) 90: 147, DOI: 10.1007/s10846-017-0651-1.

Teleoperation allows humans to reach environments that would otherwise be too difficult or dangerous. The distance between the human operator and remote robot introduces a number of issues that can negatively impact system performance including degraded and delayed information exchange between the robot and human. Some operation scenarios and environments can tolerate these degraded conditions, while others cannot. However, little work has been done to investigate how factors such as communication delay, automation, and environment characteristics interact to affect teleoperation system performance. This paper presents results from a user study analyzing the effects of teleoperation factors including communication delay, autonomous assistance, and environment layout on user performance. A mobile robot driving task is considered in which subjects drive a robot to a goal location around obstacles as quickly (minimize time) and safely (avoid collisions) as possible. An environment difficulty index (ID) is defined in the paper and is shown to be able to predict the average time it takes for the human to drive the robot to a goal location with different obstacle configurations. The ID is also shown to predict the path chosen by the human better than travel time along that path.
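
The classic ancestor of such difficulty indices is Fitts's law, which predicts movement time from a logarithmic ratio of distance to tolerance; the paper's environment ID is its own formulation for obstacle layouts, but the computation pattern is similar. A sketch of the Fitts-style version, with placeholder regression coefficients:

```python
import math

def fitts_id(distance, width):
    """Classic Fitts index of difficulty (in bits): higher when the
    target is farther away or the tolerance is tighter."""
    return math.log2(2.0 * distance / width)

def predicted_time(distance, width, a=1.0, b=0.5):
    """Linear model MT = a + b * ID; a and b are regression
    coefficients fit to user-study data (placeholders here)."""
    return a + b * fitts_id(distance, width)

# A narrow gap far away is 'harder' than a wide gap nearby.
print(predicted_time(distance=10.0, width=0.8))   # high ID
print(predicted_time(distance=2.0, width=2.0))    # low ID
```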

Multi-agent reinforcement learning for handling high-dimensional action spaces

David L. Leottau, Javier Ruiz-del-Solar, Robert Babuška, Decentralized Reinforcement Learning of Robot Behaviors, Artificial Intelligence, Volume 256, 2018, Pages 130-159, DOI: 10.1016/j.artint.2017.12.001.

A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In addition to proposing this methodology, three specific multi-agent DRL approaches are considered: DRL-Independent, DRL Cooperative-Adaptive (CA), and DRL-Lenient. These approaches are validated and analyzed with an extensive empirical study using four different problems: 3D Mountain Car, SCARA Real-Time Trajectory Generation, Ball-Dribbling in humanoid soccer robotics, and Ball-Pushing using differential drive robots. The experimental validation provides evidence that DRL implementations achieve better performance and faster learning times than their centralized counterparts, while using less computational resources. The DRL-Lenient and DRL-CA algorithms achieve the best final performances for the four tested problems, outperforming their DRL-Independent counterparts. Furthermore, the benefits of DRL-Lenient and DRL-CA are more noticeable when the problem complexity increases and the centralized scheme becomes intractable given the available computational resources and training time.
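
The DRL-Independent scheme is the easiest of the three to sketch: each action dimension gets its own tabular learner that observes the shared state and reward but selects only its own action component, so the joint action space never has to be enumerated. A toy version under that reading, with the environment step and reward as stand-ins:

```python
import random
from collections import defaultdict

class IndependentLearner:
    """One tabular Q-learner per action dimension: it sees the shared
    state and reward but chooses only its own action component."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps

    def act(self, s):
        if random.random() < self.eps:
            return random.randrange(self.n)
        return max(range(self.n), key=lambda a: self.q[s][a])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

# A joint action is one component per learner; the single-state
# 'environment' below is a stand-in for the real task.
learners = [IndependentLearner(3), IndependentLearner(3)]
s = 0
for _ in range(500):
    joint = tuple(ag.act(s) for ag in learners)
    r = 1.0 if joint == (2, 1) else 0.0   # toy common reward
    for ag, a in zip(learners, joint):
        ag.update(s, a, r, s)
print([ag.q[0] for ag in learners])
```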

Using interactive reinforcement learning where the advisor is itself another reinforcement learning agent

Francisco Cruz, Sven Magg, Yukie Nagai & Stefan Wermter, Improving interactive reinforcement learning: What makes a good teacher?, Connection Science, DOI: 10.1080/09540091.2018.1443318.

Interactive reinforcement learning (IRL) has become an important apprenticeship approach to speed up convergence in classic reinforcement learning (RL) problems. In this regard, a variant of IRL is policy shaping, which uses a parent-like trainer to propose the next action to be performed, thereby reducing the search space through advice. On some occasions, the trainer may be another artificial agent which in turn was trained using RL methods to afterwards become an advisor for other learner-agents. In this work, we analyse internal representations and characteristics of artificial agents to determine which agent may outperform others to become a better trainer-agent. Using a polymath agent as an advisor, as compared to a specialist agent, leads to a larger reward and faster convergence of the reward signal, and also to more stable behaviour in terms of the state visit frequency of the learner-agents. Moreover, we analyse system interaction parameters in order to determine how influential they are in the apprenticeship process, where the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
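
Stripped to its core, the interaction studied here has two knobs: the trainer's consistency (how often its advice is its true best action) and the learner's obedience (how often it follows the advice it receives). A minimal sketch of one advised action selection, with both agents' policies stubbed out:

```python
import random

def advised_action(learner_greedy, trainer_greedy,
                   obedience=0.8, consistency=0.9, n_actions=4):
    """One advised action selection.
    consistency: P(trainer advises its true best action, not noise)
    obedience:   P(learner follows whatever advice it receives)"""
    advice = (trainer_greedy if random.random() < consistency
              else random.randrange(n_actions))
    return advice if random.random() < obedience else learner_greedy

# Stand-ins for the two agents' greedy choices in the current state.
print(advised_action(learner_greedy=0, trainer_greedy=2))
```

Per the abstract's finding, consistency is the knob to worry about first: its effect dominates across different obedience settings.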

Using fuzzy Petri nets for mobile robot navigation

Seung-yun Kim, Yilin Yang, A self-navigating robot using Fuzzy Petri nets, Robotics and Autonomous Systems, Volume 101, 2018, Pages 153-165, DOI: 10.1016/j.robot.2017.11.008.

Petri nets (PNs) are capable of modeling nearly any conceivable system, and their powerful analytical capabilities provide a better understanding of the idealized action sequence with which to most effectively describe or execute such a system. However, because real-world instances are rarely as consistent and ideal as simulated models, basic PN modeling and simulation properties may be insufficient in practical application. We remedy this through specialization in Fuzzy Petri nets (FPNs). Fuzzy logic is incorporated to better model a self-navigating robot algorithm, thanks to its versatile multi-valued logic reasoning. By using FPNs, it is possible to simulate, assess, and communicate the process and reasoning of the navigational algorithm and apply it to real-world programming. In this paper, we propose a series of specific fuzzy algorithms intended to be implemented in concert on a mobile robot platform in order to optimize the sequence of actions needed for a given task, primarily the navigation of an unknown maze. A set of varied maze configurations was developed and simulated as PN and FPN models, providing a testing environment to examine the efficiency of several methodologies. Five methods, including an original proposal in this paper, were compared across 30,000 simulations, evaluating in particular processing-time cost. Our experiments produced results suggesting a very competitive task completion time at a fraction of the processing cost of the closest-performing alternatives.
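
In a fuzzy Petri net, binary tokens become truth degrees in [0, 1], and a transition fires by combining its input degrees and scaling by a rule certainty factor. Here is a minimal sketch of one inference step using the common min-composition rule; the place names and certainty values are invented for illustration, and the paper's nets and algorithms are far more elaborate.

```python
def fire(places, transition):
    """One FPN inference step: the output truth degree is
    min(input degrees) * certainty factor (a common, not the only,
    fuzzy composition rule)."""
    inputs, output, cf = transition
    degree = min(places[p] for p in inputs) * cf
    places[output] = max(places.get(output, 0.0), degree)
    return places

# Rule: 'obstacle close' AND 'right side clear' -> 'turn right'.
places = {"obstacle_close": 0.8, "right_clear": 0.7}
fire(places, (["obstacle_close", "right_clear"], "turn_right", 0.9))
print(places["turn_right"])   # min(0.8, 0.7) * 0.9 = 0.63
```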

Hybridizing RRT with deliberative path planning to improve performance

Dong, Y., Camci, E. & Kayacan, E., Faster RRT-based Nonholonomic Path Planning in 2D Building Environments Using Skeleton-constrained Path Biasing, J Intell Robot Syst (2018) 89: 387, DOI: 10.1007/s10846-017-0567-9.

This paper presents a faster RRT-based path planning approach for regular 2-dimensional (2D) building environments. To minimize the planning time, we adopt the idea of biasing the RRT tree-growth in more focused ways. We propose to calculate the skeleton of the 2D environment first, then connect a geometrical path on the skeleton, and grow the RRT tree via the seeds generated locally along this path. We conduct batched simulations to find the universal parameters in manipulating the seeds generation. We show that the proposed skeleton-biased locally-seeded RRT (skilled-RRT) is faster than the other baseline planners (RRT, RRT*, A*-RRT, Theta*-RRT, and MARRT) through experimental tests using different vehicles in different 2D building environments. Given mild assumptions on the 2D environments, we prove that the proposed approach is probabilistically complete. We also present an application of the skilled-RRT for an unmanned ground vehicle. Compared to the other baseline algorithms (Theta*-RRT and MARRT), we show the applicability and fast planning of the skilled-RRT in a real environment.
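
The heart of the speed-up is where the samples come from: instead of sampling the whole free space uniformly, seeds are drawn near the geometric path connected on the environment's skeleton. A minimal sketch of such a biased sampler, assuming the skeleton path has already been extracted upstream (e.g. by morphological thinning of the occupancy grid) and is given as waypoints; all parameters are placeholders:

```python
import random

def skeleton_biased_sample(skeleton_path, sigma=0.5, p_uniform=0.1,
                           bounds=((0.0, 20.0), (0.0, 20.0))):
    """Bias RRT sampling toward a skeleton path: usually perturb a
    random waypoint; occasionally fall back to uniform sampling,
    a standard way to keep the planner probabilistically complete."""
    if random.random() < p_uniform:
        return tuple(random.uniform(lo, hi) for lo, hi in bounds)
    x, y = random.choice(skeleton_path)
    return (random.gauss(x, sigma), random.gauss(y, sigma))

# Waypoints along the (precomputed) skeleton of a building corridor.
path = [(1.0, 1.0), (5.0, 1.5), (9.0, 6.0), (14.0, 6.5)]
print(skeleton_biased_sample(path))
```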

Using active perception for object classification

Patten, T., Martens, W. & Fitch, R., Monte Carlo planning for active object classification, Auton Robot (2018) 42: 391, DOI: 10.1007/s10514-017-9626-0.

Classifying objects in complex unknown environments is a challenging problem in robotics and is fundamental in many applications. Modern sensors and sophisticated perception algorithms extract rich 3D textured information, but are limited to the data that are collected from a given location or path. We are interested in closing the loop around perception and planning, in particular to plan paths for better perceptual data, and focus on the problem of planning scanning sequences to improve object classification from range data. We formulate a novel time-constrained active classification problem and propose solution algorithms that employ a variation of Monte Carlo tree search to plan non-myopically. Our algorithms use a particle filter combined with Gaussian process regression to estimate joint distributions of object class and pose. This estimator is used in planning to generate a probabilistic belief about the state of objects in a scene, and also to generate beliefs for predicted sensor observations from future viewpoints. These predictions consider occlusions arising from predicted object positions and shapes. We evaluate our algorithms in simulation, in comparison to passive and greedy strategies. We also describe similar experiments where the algorithms are implemented online, using a mobile ground robot in a farm environment. Results indicate that our non-myopic approach outperforms both passive and myopic strategies, and clearly show the benefit of active perception for outdoor object classification.
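
The planning backbone is Monte Carlo tree search over candidate viewpoints, scored by expected reduction in classification uncertainty. The full method plans action sequences non-myopically; the sketch below strips it to a one-step UCB1 bandit over viewpoints, with the particle-filter/GP observation model replaced by a noisy stub, just to show the select-simulate-backup loop:

```python
import math
import random

def ucb_select(stats, c=1.4):
    """UCB1: balance mean observed gain against visit count."""
    total = sum(n for n, _ in stats.values())
    def score(v):
        n, value = stats[v]
        if n == 0:
            return float("inf")       # try every viewpoint once
        return value / n + c * math.sqrt(math.log(total) / n)
    return max(stats, key=score)

def simulate_info_gain(view):
    """Stub for the paper's particle-filter + GP observation model:
    a noisy estimate of the entropy reduction from this viewpoint."""
    return max(0.0, random.gauss(view["gain"], 0.2))

views = [{"id": i, "gain": g} for i, g in enumerate([0.2, 0.8, 0.5])]
stats = {v["id"]: [0, 0.0] for v in views}
for _ in range(200):                  # select -> simulate -> backup
    vid = ucb_select(stats)
    reward = simulate_info_gain(views[vid])
    stats[vid][0] += 1
    stats[vid][1] += reward
best = max(stats, key=lambda v: stats[v][1] / max(stats[v][0], 1))
print("best viewpoint:", best)
```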

Using two different environment representations: a detailed one for planning and collision checking, and a coarse one for selecting actions for active perception

Nelson, E., Corah, M. & Michael, N., Environment model adaptation for mobile robot exploration, Auton Robot (2018) 42: 257, DOI: 10.1007/s10514-017-9669-2.

In this work, we propose a methodology to adapt a mobile robot’s environment model during exploration as a means of decreasing the computational complexity associated with information metric evaluation and consequently increasing the speed at which the system is able to plan actions and travel through an unknown region given finite computational resources. Recent advances in exploration compute control actions by optimizing information-theoretic metrics on the robot’s map. These metrics are generally computationally expensive to evaluate, limiting the speed at which a robot is able to explore. To reduce computational cost, we propose keeping two representations of the environment: one full-resolution representation for planning and collision checking, and another with a coarse resolution for rapidly evaluating the informativeness of planned actions. To generate the coarse representation, we employ the Principle of Relevant Information from rate distortion theory to compress a robot’s occupancy grid map. We then propose a method for selecting a coarse representation that sacrifices a minimal amount of information about expected future sensor measurements, using the Information Bottleneck method. We outline an adaptive strategy that changes the robot’s environment representation in response to its surroundings to maximize the computational efficiency of exploration. On computationally constrained systems, this reduction in complexity enables planning over longer predictive horizons, leading to faster navigation. We simulate and experimentally evaluate mutual-information-based exploration through cluttered indoor environments, with exploration rates that adapt based on environment complexity, leading to an order-of-magnitude increase in the maximum rate of exploration in contrast to non-adaptive techniques given the same finite computational resources.
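
The two-representation interface is easy to mimic even without the paper's machinery: keep the full grid for planning and collision checking, and a pooled grid for cheap informativeness estimates. The sketch below uses naive block-averaging in place of the Principle of Relevant Information / Information Bottleneck compression, and per-cell entropy as a common informativeness proxy, so it conveys the interface rather than the method:

```python
import numpy as np

def coarsen(grid, k=2):
    """Average k x k blocks of an occupancy grid: a cheap, lossy
    stand-in for the paper's PRI/IB compression."""
    h, w = grid.shape
    g = grid[:h - h % k, :w - w % k]
    return g.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def cell_entropy(p):
    """Per-cell Shannon entropy of occupancy: cells near p = 0.5
    (unknown) are the most informative to observe."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

fine = np.random.default_rng(1).uniform(0, 1, (64, 64))   # toy map
coarse = coarsen(fine, k=4)     # evaluate informativeness on this
print(fine.size, coarse.size, float(cell_entropy(coarse).sum()))
```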

A novel approach to using POMDPs for practical active perception, where rewards are needed to penalize uncertainty and therefore remove the piecewise-linear and convex property of the value function

Satsangi, Y., Whiteson, S., Oliehoek, F.A. et al., Exploiting submodular value functions for scaling up active perception, Auton Robot (2018) 42: 209, DOI: 10.1007/s10514-017-9666-5.

In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. For example, a mobile robot takes sensory actions to efficiently navigate in a new environment. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent’s belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address a twofold challenge of modeling and planning for active perception tasks. We analyze ρPOMDP and POMDP-IR, two frameworks for modeling active perception tasks, that restore the PWLC property of the value function. We show the mathematical equivalence of these two frameworks by showing that, given a ρPOMDP along with a policy, they can be reduced to a POMDP-IR and an equivalent policy (and vice versa). We prove that the value function for the given ρPOMDP (and the given policy) and the reduced POMDP-IR (and the reduced policy) is the same. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and ρPOMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost, leading to better scalability for solving active perception tasks.
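
The scalability argument rests on greedy maximization of a submodular value function: each added sensor helps less the more sensors are already selected, so the greedy choice comes with the classic (1 - 1/e) near-optimality bound. A toy coverage instance showing the greedy loop; the coverage sets below are invented, and the paper's actual function is the belief-based value of the active perception POMDP:

```python
def greedy_max(candidates, coverage, budget):
    """Greedy maximization of a (submodular) coverage function:
    repeatedly add the sensor with the largest marginal gain."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(candidates, key=lambda s: len(coverage[s] - covered))
        if not coverage[best] - covered:
            break                     # no sensor adds anything new
        chosen.append(best)
        covered |= coverage[best]
        candidates = [s for s in candidates if s != best]
    return chosen, covered

# Toy: pick 2 cameras covering the most tracked regions of the mall.
coverage = {"cam1": {1, 2, 3}, "cam2": {3, 4}, "cam3": {4, 5, 6}}
print(greedy_max(list(coverage), coverage, budget=2))
```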