Value iteration applied to the control of continuous-time LTI systems

Tao Bian, Zhong-Ping Jiang, Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design, Automatica, Volume 71, September 2016, Pages 348-360, ISSN 0005-1098, DOI: 10.1016/j.automatica.2016.05.003.

This paper presents a novel non-model-based, data-driven adaptive optimal controller design for linear continuous-time systems with completely unknown dynamics. Inspired by the stochastic approximation theory, a continuous-time version of the traditional value iteration (VI) algorithm is presented with rigorous convergence analysis. This VI method is crucial for developing new adaptive dynamic programming methods to solve the adaptive optimal control problem and the stochastic robust optimal control problem for linear continuous-time systems. Fundamentally different from existing results, the a priori knowledge of an initial admissible control policy is no longer required. The efficacy of the proposed methodology is illustrated by two examples and a brief comparative study between VI and earlier policy-iteration methods.
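To make the "no initial admissible policy" point concrete, here is a minimal model-based sketch of a continuous-time value iteration for a scalar LQR problem. It is not the data-driven algorithm of the paper: the model (`a`, `b`) is assumed known, the step size is held constant instead of following a stochastic-approximation schedule, and all names and numbers are illustrative assumptions. The iteration starts from `p = 0` even though the open-loop system (`a = 1`) is unstable.

```python
import math


def value_iteration(a, b, q, r, step=0.01, iterations=5000):
    """Euler-style value iteration for the scalar system dx/dt = a*x + b*u
    with cost integral of q*x^2 + r*u^2 (assumed setup, not from the paper).

    Each step moves p along the Riccati residual, starting from p = 0,
    so no stabilizing initial policy is required.
    """
    p = 0.0
    for _ in range(iterations):
        riccati_residual = 2.0 * a * p + q - (b * p) ** 2 / r
        p += step * riccati_residual
    return p


def riccati_solution(a, b, q, r):
    """Closed-form positive root of 2*a*p + q - b^2 * p^2 / r = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Hypothetical unstable plant: a = 1, b = 1, unit weights.
p_vi = value_iteration(1.0, 1.0, 1.0, 1.0)
p_exact = riccati_solution(1.0, 1.0, 1.0, 1.0)
gain = 1.0 * p_vi / 1.0          # optimal feedback u = -gain * x
closed_loop_pole = 1.0 - 1.0 * gain
print(p_vi, p_exact, closed_loop_pole)
```

The fixed point of the iteration is the stabilizing solution of the algebraic Riccati equation, so the resulting closed-loop pole is negative although the iteration never needed a stabilizing gain to start.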

A variant of particle filters that uses feedback to model how particles move towards the real posterior

T. Yang, P. G. Mehta, S. P. Meyn, Feedback particle filter, IEEE Transactions on Automatic Control, 58 (10) (2013), pp. 2465–2480, DOI: 10.1109/TAC.2013.2258825.

The feedback particle filter introduced in this paper is a new approach to approximate nonlinear filtering, motivated by techniques from mean-field game theory. The filter is defined by an ensemble of controlled stochastic systems (the particles). Each particle evolves under feedback control based on its own state, and features of the empirical distribution of the ensemble. The feedback control law is obtained as the solution to an optimal control problem, in which the optimization criterion is the Kullback-Leibler divergence between the actual posterior, and the common posterior of any particle. The following conclusions are obtained for diffusions with continuous observations: 1) The optimal control solution is exact: The two posteriors match exactly, provided they are initialized with identical priors. 2) The optimal filter admits an innovation error-based gain feedback structure. 3) The optimal feedback gain is obtained via a solution of an Euler-Lagrange boundary value problem; the feedback gain equals the Kalman gain in the linear Gaussian case. Numerical algorithms are introduced and implemented in two general examples, and a neuroscience application involving coupled oscillators. In some cases it is found that the filter exhibits significantly lower variance when compared to the bootstrap particle filter.
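Conclusion 3 (the gain equals the Kalman gain in the linear Gaussian case) can be illustrated with a rough scalar sketch. Everything below is an assumption made for illustration: the model parameters, the Euler-Maruyama discretization, and the use of the empirical ensemble variance to compute the gain. Each particle is driven by the innovation `dz - 0.5 * h * (x_i + mean) * dt`, and the ensemble mean and variance are compared with a Kalman-Bucy filter fed with the same observations.

```python
import random


def simulate(n_particles=500, steps=200, dt=0.01, seed=0):
    """Scalar linear Gaussian example (hypothetical parameters):
    dX = a*X dt + sigma_b dB,  dZ = h*X dt + sigma_w dW."""
    rng = random.Random(seed)
    a, h = -0.5, 1.0
    sigma_b, sigma_w = 0.3, 0.5
    sqrt_dt = dt ** 0.5

    x_true = rng.gauss(1.0, 1.0)
    particles = [rng.gauss(1.0, 1.0) for _ in range(n_particles)]
    kf_mean, kf_var = 1.0, 1.0  # same prior as the particles

    for _ in range(steps):
        # Observation increment generated from the current true state.
        dz = h * x_true * dt + sigma_w * sqrt_dt * rng.gauss(0.0, 1.0)
        x_true += a * x_true * dt + sigma_b * sqrt_dt * rng.gauss(0.0, 1.0)

        # Feedback particle filter: gain from the empirical variance.
        mean = sum(particles) / n_particles
        var = sum((x - mean) ** 2 for x in particles) / (n_particles - 1)
        gain = var * h / sigma_w ** 2
        particles = [
            x
            + a * x * dt
            + sigma_b * sqrt_dt * rng.gauss(0.0, 1.0)
            + gain * (dz - 0.5 * h * (x + mean) * dt)
            for x in particles
        ]

        # Kalman-Bucy filter (Euler discretization) on the same dz.
        kf_gain = kf_var * h / sigma_w ** 2
        kf_mean += a * kf_mean * dt + kf_gain * (dz - h * kf_mean * dt)
        kf_var += (2.0 * a * kf_var + sigma_b ** 2
                   - (h * kf_var) ** 2 / sigma_w ** 2) * dt

    fpf_mean = sum(particles) / n_particles
    fpf_var = sum((x - fpf_mean) ** 2 for x in particles) / (n_particles - 1)
    return fpf_mean, fpf_var, kf_mean, kf_var


fpf_mean, fpf_var, kf_mean, kf_var = simulate()
print(fpf_mean, kf_mean, fpf_var, kf_var)
```

With a finite ensemble the two estimates agree only approximately; the gap should shrink as `n_particles` grows.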

Integration of the ICP algorithm with a Kalman filter to improve relative localization, with a good review of the state of the art in ICP algorithms

F. Aghili and C. Y. Su, “Robust Relative Navigation by Integration of ICP and Adaptive Kalman Filter Using Laser Scanner and IMU,” in IEEE/ASME Transactions on Mechatronics, vol. 21, no. 4, pp. 2015-2026, Aug. 2016. DOI: 10.1109/TMECH.2016.2547905.

This paper presents a robust six-degree-of-freedom relative navigation by combining the iterative closest point (ICP) registration algorithm and a noise-adaptive Kalman filter in a closed-loop configuration together with measurements from a laser scanner and an inertial measurement unit (IMU). In this approach, the fine-alignment phase of the registration is integrated with the filter innovation step for estimation correction, while the filter estimate propagation provides the coarse alignment needed to find the corresponding points at the beginning of the ICP iteration cycle. The convergence of the ICP point matching is monitored by a fault-detection logic, and the covariance associated with the ICP alignment error is estimated by a recursive algorithm. This ICP enhancement has proven to improve robustness and accuracy of the pose-tracking performance and to automatically recover correct alignment whenever the tracking is lost. The Kalman filter estimator is designed so as to identify the required parameters such as IMU biases and location of the spacecraft center of mass. The robustness and accuracy of the relative navigation algorithm are demonstrated through a hardware-in-the-loop simulation setting, in which actual vision data for the relative navigation are generated by a laser range finder scanning a spacecraft mockup attached to a robotic motion simulator.
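The role of the coarse alignment can be sketched with a toy planar ICP loop. This is a 2-D simplification under stated assumptions, not the six-degree-of-freedom method of the paper: the Kalman filter, the fault-detection logic and the covariance estimation are omitted, and `initial_pose` merely stands in for the pose that the filter prediction would supply. All function names and data are hypothetical.

```python
import math


def transform(theta, tx, ty, point):
    """Rotate a 2-D point by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)


def fit_rigid(src, dst):
    """Closed-form least-squares rigid transform mapping src onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)


def icp(model, scan, initial_pose=(0.0, 0.0, 0.0), iterations=10):
    """Alternate nearest-neighbour matching and rigid fitting.

    initial_pose is the coarse alignment (here a plain argument; in the
    paper it comes from the filter's propagated estimate)."""
    theta, tx, ty = initial_pose
    for _ in range(iterations):
        moved = [transform(theta, tx, ty, p) for p in scan]
        matches = [
            min(model, key=lambda m: (m[0] - q[0]) ** 2 + (m[1] - q[1]) ** 2)
            for q in moved
        ]
        theta, tx, ty = fit_rigid(scan, matches)
    return theta, tx, ty


def make_scan(model, pose):
    """Generate scan points such that transform(*pose, scan_i) == model_i."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * (x - tx) + s * (y - ty), -s * (x - tx) + c * (y - ty))
            for x, y in model]


# Hypothetical model points and a small true pose offset.
MODEL = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0),
         (0.0, 2.0), (2.0, 1.0), (-1.0, -1.0)]
TRUE_POSE = (0.05, 0.10, -0.05)
SCAN = make_scan(MODEL, TRUE_POSE)
ESTIMATE = icp(MODEL, SCAN)
print(ESTIMATE)
```

The example only works because the initial misalignment is small compared with the point spacing, so the nearest-neighbour matches are correct. That is precisely why a good coarse alignment matters before the ICP cycle starts.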

Massive parallelization of POMDPs with a very good state-of-the-art review

Taekhee Lee, Young J. Kim (2015), Massively parallel motion planning algorithms under uncertainty using POMDP, The International Journal of Robotics Research, Vol 35, Issue 8, pp. 928 – 942, DOI: 10.1177/0278364915594856.

We present new parallel algorithms that solve continuous-state partially observable Markov decision process (POMDP) problems using the GPU (gPOMDP) and a hybrid of the GPU and CPU (hPOMDP). We choose the Monte Carlo value iteration (MCVI) method as our base algorithm and parallelize this algorithm using the multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to utilize the massive data parallelism available on modern GPUs. Our GPU-based method uses two workload distribution techniques, compute/data interleaving and workload balancing, in order to obtain the maximum parallel performance at the highest level. Here we also present a CPU–GPU hybrid method that takes advantage of both CPU and GPU parallelism in order to solve highly complex POMDP planning problems. The CPU is responsible for data preparation, while the GPU performs Monte Carlo simulations; these operations are performed concurrently using the compute/data overlap technique between the CPU and GPU. To the best of the authors’ knowledge, our algorithms are the first parallel algorithms that efficiently execute POMDP in a massively parallel fashion utilizing the GPU or a hybrid of the GPU and CPU. Our algorithms outperform the existing CPU-based algorithm by a factor of 75–99 based on the chosen benchmark.

Combining efficiently symbolic planning with geometric planning

Fabien Lagriffoul, Benjamin Andres (2016), Combining task and motion planning: A culprit detection problem, The International Journal of Robotics Research, Vol 35, Issue 8, pp. 890 – 927, DOI: 10.1177/0278364915619022.

Solving problems combining task and motion planning requires searching across a symbolic search space and a geometric search space. Because of the semantic gap between symbolic and geometric representations, symbolic sequences of actions are not guaranteed to be geometrically feasible. This compels us to search in the combined search space, in which frequent backtracks between symbolic and geometric levels make the search inefficient. We address this problem by guiding symbolic search with rich information extracted from the geometric level through culprit detection mechanisms.

A novel clock synchronization architecture for networked systems based on forcing the synchronization, with a nice summary of uses of clock synchronization and of existing synchronization architectures

S. Bolognani, R. Carli, E. Lovisari and S. Zampieri, “A Randomized Linear Algorithm for Clock Synchronization in Multi-Agent Systems,” in IEEE Transactions on Automatic Control, vol. 61, no. 7, pp. 1711-1726, July 2016. DOI: 10.1109/TAC.2015.2479136.

A broad family of randomized clock synchronization protocols based on a second order consensus algorithm is proposed. Under mild conditions on the graph connectivity, it is proved that the parameters of the algorithm can always be tuned in such a way that the clock synchronization is achieved in the probabilistic mean-square sense. This family of algorithms contains, as particular cases, several known approaches which range from distributed asynchronous to hierarchical synchronous protocols. This is illustrated by specializing the algorithm for the well-known broadcast and gossip scenarios in wireless communications, and for the standard hierarchical protocol used in the context of wired communications in data networks. In these cases, we show how the feasible range for the algorithm parameters can be explicitly computed. Finally, the performance of this strategy is validated by actual implementation in a real testbed and by numerical simulations.
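The "second order" part can be sketched as follows: each node corrects both its clock reading and its clock rate from the disagreement with its neighbours. This is a deterministic, synchronous simplification on a ring, so the randomization that the paper is actually about is left out. The topology, the gains `kp` and `ki`, the update period and the initial values are all illustrative assumptions.

```python
def synchronize(readings, rates, rounds=200, kp=0.5, ki=0.2, period=1.0):
    """Second-order consensus on a ring (assumed topology).

    Every round, each clock first advances at its own rate, then compares
    itself with the average of its two ring neighbours and corrects both
    its reading (gain kp) and its rate (gain ki)."""
    n = len(readings)
    x = list(readings)
    s = list(rates)
    for _ in range(rounds):
        # Free-running phase: clocks drift apart according to their rates.
        x = [xi + period * si for xi, si in zip(x, s)]
        # Disagreement with the average of the two ring neighbours.
        err = [0.5 * (x[(i - 1) % n] + x[(i + 1) % n]) - x[i]
               for i in range(n)]
        # Proportional correction of the reading, integral-like correction
        # of the rate.
        x = [xi + kp * e for xi, e in zip(x, err)]
        s = [si + ki * e for si, e in zip(s, err)]
    return x, s


# Hypothetical initial offsets and slightly mismatched rates.
initial_readings = [0.0, 0.3, -0.2, 0.5, 0.1]
initial_rates = [1.0, 1.001, 0.999, 1.002, 0.998]
final_readings, final_rates = synchronize(initial_readings, initial_rates)
reading_spread = max(final_readings) - min(final_readings)
rate_spread = max(final_rates) - min(final_rates)
print(reading_spread, rate_spread)
```

Correcting only the readings would leave a residual error that reappears after every free-running phase; correcting the rates as well is what lets both spreads go to zero. In the randomized protocols of the paper the admissible range of the gains has to be derived per scenario, which this sketch does not attempt.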

Interesting approach for communicating robots: through the passive recognition of others' patterns of motion

Barnali Das, Micael S. Couceiro, Patricia A. Vargas, MRoCS: A new multi-robot communication system based on passive action recognition, Robotics and Autonomous Systems, Volume 82, August 2016, Pages 46-60, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.04.002.

Multi-robot search-and-rescue missions often face major challenges in adverse environments due to the limitations of traditional implicit and explicit communication. This paper proposes a novel multi-robot communication system (MRoCS), which uses a passive action recognition technique that overcomes the shortcomings of traditional models. The proposed MRoCS relies on individual motion, by mimicking the waggle dance of honey bees and thus forming and recognising different patterns accordingly. The system was successfully designed and implemented in simulation and with real robots. Experimental results show that the pattern recognition process successfully reported high sensitivity with good precision in all cases for three different patterns, thus corroborating our hypothesis.

A formal study of the guarantees that deep neural networks offer for classification

R. Giryes, G. Sapiro and A. M. Bronstein, “Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?,” in IEEE Transactions on Signal Processing, vol. 64, no. 13, pp. 3444-3457, July 1, 2016. DOI: 10.1109/TSP.2016.2546221.

Three important properties of a classification machinery are i) the system preserves the core information of the input data; ii) the training examples convey information about unseen data; and iii) the system is able to treat differently points from different classes. In this paper, we show that these fundamental properties are satisfied by the architecture of deep neural networks. We formally prove that these networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data. Similar points at the input of the network are likely to have a similar output. The theoretical analysis of deep networks here presented exploits tools used in the compressed sensing and dictionary learning literature, thereby making a formal connection between these important topics. The derived results allow drawing conclusions on the metric learning properties of the network and their relation to its structure, as well as providing bounds on the required size of the training set such that the training examples would represent faithfully the unseen data. The results are validated with state-of-the-art trained networks.
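A quick numerical sketch of the distance-preservation idea: a single layer with i.i.d. Gaussian weights (scaled by the square root of the output width) roughly preserves the Euclidean distance between two inputs, and adding a ReLU can only shrink it. This is an illustration under assumed dimensions and scaling, not a reproduction of the bounds proved in the paper; the function names are hypothetical.

```python
import math
import random


def gaussian_layer(n_out, n_in, rng):
    """Weight matrix with i.i.d. N(0, 1/n_out) entries (assumed scaling)."""
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)]
            for _ in range(n_out)]


def apply_layer(weights, x, relu=False):
    """Compute W x, optionally followed by an element-wise ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [max(0.0, v) for v in out] if relu else out


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_ratios(n_in=50, n_out=2000, seed=0):
    """Return (linear, rectified) output/input distance ratios for two
    random input points passed through the same random layer."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
    weights = gaussian_layer(n_out, n_in, rng)
    d_in = distance(x, y)
    linear = distance(apply_layer(weights, x), apply_layer(weights, y)) / d_in
    rectified = distance(apply_layer(weights, x, relu=True),
                         apply_layer(weights, y, relu=True)) / d_in
    return linear, rectified


linear_ratio, relu_ratio = distance_ratios()
print(linear_ratio, relu_ratio)
```

The linear ratio concentrates around 1 as the layer gets wider. How much the ReLU shrinks the distance depends on the angle between the inputs, which is the kind of effect the paper analyses.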

A new theoretical framework for modeling concepts that lets them be combined in a way that reflects how humans combine them, with a good review of related work on other concept frameworks in AI

Martha Lewis, Jonathan Lawry, Hierarchical conceptual spaces for concept combination, Artificial Intelligence, Volume 237, August 2016, Pages 204-227, ISSN 0004-3702, DOI: 10.1016/j.artint.2016.04.008.

We introduce a hierarchical framework for conjunctive concept combination based on conceptual spaces and random set theory. The model has the flexibility to account for composition of concepts at various levels of complexity. We show that the conjunctive model includes linear combination as a special case, and that the more general model can account for non-compositional behaviours such as overextension, non-commutativity, preservation of necessity and impossibility of attributes and to some extent, attribute loss or emergence. We investigate two further aspects of human concept use, the conjunction fallacy and the “guppy effect”.

Interesting hypothesis about how cognitive abilities can be modelled with closed control loops that run in parallel (using hierarchies of abstraction and prediction), which have traditionally been used only for low-level behaviours

Giovanni Pezzulo, Paul Cisek, Navigating the Affordance Landscape: Feedback Control as a Process Model of Behavior and Cognition, Trends in Cognitive Sciences, Volume 20, Issue 6, June 2016, Pages 414-424, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.03.013.

We discuss how cybernetic principles of feedback control, used to explain sensorimotor behavior, can be extended to provide a foundation for understanding cognition. In particular, we describe behavior as parallel processes of competition and selection among potential action opportunities (‘affordances’) expressed at multiple levels of abstraction. Adaptive selection among currently available affordances is biased not only by predictions of their immediate outcomes and payoffs but also by predictions of what new affordances they will make available. This allows animals to purposively create new affordances that they can later exploit to achieve high-level goals, resulting in intentional action that links across multiple levels of control. Finally, we discuss how such a ‘hierarchical affordance competition’ process can be mapped to brain structure.