Using MDPs when the transition probability matrix is only partially specified, thereby moving closer to a model-free approach

Karina V. Delgado, Leliane N. de Barros, Daniel B. Dias, Scott Sanner, Real-time dynamic programming for Markov decision processes with imprecise probabilities, Artificial Intelligence, Volume 230, January 2016, Pages 192-223, ISSN 0004-3702, DOI: 10.1016/j.artint.2015.09.005.

Markov Decision Processes have become the standard model for probabilistic planning. However, when applied to many practical problems, the estimates of transition probabilities are inaccurate. This may be due to conflicting elicitations from experts or insufficient state transition information. The Markov Decision Process with Imprecise Transition Probabilities (MDP-IPs) was introduced to obtain a robust policy where there is uncertainty in the transition. Although a symbolic dynamic programming algorithm for MDP-IPs (called SPUDD-IP) has been proposed that can solve problems with up to 22 state variables, in practice, solving MDP-IP problems is time-consuming. In this paper we propose efficient algorithms for a more general class of MDP-IPs, called Stochastic Shortest Path MDP-IPs (SSP MDP-IPs), that use initial state information to solve complex problems by focusing on reachable states. The (L)RTDP-IP algorithm, a (Labeled) Real Time Dynamic Programming algorithm for SSP MDP-IPs, is proposed together with three different methods for sampling the next state. It is shown here that the convergence of (L)RTDP-IP can be obtained by using any of these three methods, although the Bellman backups for this class of problems prescribe a minimax optimization. As far as we are aware, this is the first asynchronous algorithm for SSP MDP-IPs given in terms of a general set of probability constraints that requires non-linear optimization over imprecise probabilities in the Bellman backup. Our results show up to three orders of magnitude speedup for (L)RTDP-IP when compared with the SPUDD-IP algorithm.

See also:

  • Karina Valdivia Delgado, Scott Sanner, Leliane Nunes de Barros, Efficient solutions to factored MDPs with imprecise transition probabilities, Artif. Intell. 175 (9–10) (2011) 1498–1527.
  • Satia, J. K., and Lave Jr., R. E. 1973. MDPs with uncertain transition probabilities. Operations Research 21(3):728–740
  • White III, C. C., and El-Deib, H. K. 1994. MDPs with Imprecise Transition Probabilities. Operations Research 42(4):739–749
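
To make the minimax Bellman backup mentioned in the abstract more concrete, here is a minimal sketch under a strong simplifying assumption: each transition probability is only known to lie in an interval. The paper handles general constraint sets that need non-linear optimization; with plain intervals the adversarial choice of probabilities reduces to a greedy assignment. The function names, the toy states and all numbers are hypothetical and are not taken from the paper.

```python
def worst_case_expectation(values, lower, upper):
    """Maximise sum(p[i] * values[i]) subject to lower <= p <= upper, sum(p) = 1."""
    slack = 1.0 - sum(lower)
    if slack < -1e-12 or sum(upper) < 1.0 - 1e-12:
        raise ValueError("interval bounds admit no probability distribution")
    p = list(lower)
    # An adversarial "nature" pushes the spare mass toward the costliest successors.
    for i in sorted(range(len(values)), key=lambda k: values[k], reverse=True):
        extra = min(upper[i] - lower[i], slack)
        p[i] += extra
        slack -= extra
    return sum(pi * vi for pi, vi in zip(p, values)), p


def robust_backup(state, V, actions, cost, bounds):
    """One minimax backup: min over actions of (cost + worst-case expected cost-to-go)."""
    best = None
    for a in actions:
        successors, lower, upper = bounds[(state, a)]
        expected, _ = worst_case_expectation([V[s2] for s2 in successors], lower, upper)
        q = cost[(state, a)] + expected
        if best is None or q < best[0]:
            best = (q, a)
    return best


# Hypothetical two-action example: cost-to-go estimates for the successors.
V = {"goal": 0.0, "s1": 10.0}
cost = {("s0", "safe"): 2.0, ("s0", "risky"): 1.0}
bounds = {
    # (successor states, lower bounds, upper bounds) on P(s' | s0, a)
    ("s0", "safe"): (["goal", "s1"], [0.6, 0.1], [0.9, 0.4]),
    ("s0", "risky"): (["goal", "s1"], [0.3, 0.2], [0.8, 0.7]),
}
q_value, action = robust_backup("s0", V, ["safe", "risky"], cost, bounds)
print(action, round(q_value, 6))
```

In an RTDP-style solver a backup like this would only be applied to states visited along sampled trajectories from the initial state, which is where the focus on reachable states comes from.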

Nice summary of reinforcement learning in control (Adaptive Dynamic Programming) and the use of Q-learning plus NN approximators for solving a control problem under a game theory framework

Kyriakos G. Vamvoudakis, Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems, Automatica, Volume 61, November 2015, Pages 274-281, ISSN 0005-1098, DOI: 10.1016/j.automatica.2015.08.017.

This work proposes a novel Q-learning algorithm to solve the problem of non-zero sum Nash games of linear time invariant systems with N-players (control inputs) and centralized uncertain/unknown dynamics. We first formulate the Q-function of each player as a parametrization of the state and all the other control inputs or players. An integral reinforcement learning approach is used to develop a model-free structure of N-actors/N-critics to estimate the parameters of the N-coupled Q-functions online while also guaranteeing closed-loop stability and convergence of the control policies to a Nash equilibrium. A 4th order simulation example with five players is presented to show the efficacy of the proposed approach.

Electronic circuit for harvesting energy autonomously in a multi-sensor device

Dias, P.C.; Morais, F.J.O.; de Morais Franca, M.B.; Ferreira, E.C.; Cabot, A.; Siqueira Dias, J.A., Autonomous Multisensor System Powered by a Solar Thermoelectric Energy Harvester With Ultralow-Power Management Circuit, in Instrumentation and Measurement, IEEE Transactions on, vol.64, no.11, pp.2918-2925, Nov. 2015, DOI: 10.1109/TIM.2015.2444253.

An autonomous multisensor system powered by an energy harvester fabricated with a flat-panel solar thermoelectric generator with an ultralow-power management circuit is presented. The multisensor system was tested in an agricultural application, where every 15 min the values of the temperature, air humidity, and solar radiation have to be measured and stored in a mass memory device (a Secure Digital card), with their respective time stamp. The energy-harvesting switching dc-dc converter is based on a low-input-voltage commercial integrated circuit (LTC3108), which charges a 1.65-F supercapacitor up to 5.0 V. A novel ultralow-power management circuit was developed to replace the internal power management circuitry of the LTC3108, and using this circuit, the operation of the system when no energy can be harvested from the environment is extended from 136 h to more than 266 h. The solar thermoelectric generator used for the energy harvesting is composed of a bismuth telluride thermoelectric generator with a 110-mV/°C Seebeck coefficient sandwiched between a 40 cm × 40 cm anodized aluminum flat panel and an aluminum heatsink. On a sunny winter day in the southern hemisphere (12 August 2014, at Campinas, SP—Brazil, Latitude: 22° 54’), the energy supplied by the harvesting system to the supercapacitor was 7 J.
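
As a rough sanity check on the figures quoted in the abstract, the following back-of-the-envelope sketch computes the energy stored in the supercapacitor and the average power budget implied by the two autonomy figures. It assumes, unrealistically, that the whole stored energy is usable down to 0 V and ignores leakage and converter losses, so the power values are only upper bounds for illustration and are not numbers reported in the paper.

```python
CAPACITANCE_F = 1.65   # supercapacitor value quoted in the abstract
V_FULL = 5.0           # full-charge voltage quoted in the abstract
HARVESTED_J = 7.0      # energy harvested on the reported winter day

# Energy stored in a capacitor: E = 1/2 * C * V^2
stored_j = 0.5 * CAPACITANCE_F * V_FULL ** 2


def mean_power_uw(energy_j, hours):
    """Average power (in microwatts) that drains energy_j over the given hours."""
    return energy_j / (hours * 3600.0) * 1e6


# Assumption: all stored energy is usable, so these are upper bounds.
budget_original_uw = mean_power_uw(stored_j, 136.0)
budget_improved_uw = mean_power_uw(stored_j, 266.0)
harvest_fraction = HARVESTED_J / stored_j

print(f"stored energy: {stored_j:.3f} J")
print(f"mean draw bound, 136 h autonomy: {budget_original_uw:.1f} uW")
print(f"mean draw bound, 266 h autonomy: {budget_improved_uw:.1f} uW")
print(f"one day's harvest as a fraction of a full charge: {harvest_fraction:.2f}")
```

Under these assumptions the system has to average a few tens of microwatts, which gives a sense of why an ultralow-power management circuit is the key contribution.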

Robust Estimation of Unbalanced Mixture Models on Samples with Outliers

Galimzianova, A.; Pernus, F.; Likar, B.; Spiclin, Z., Robust Estimation of Unbalanced Mixture Models on Samples with Outliers, in Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol.37, no.11, pp.2273-2285, Nov. 1 2015, DOI: 10.1109/TPAMI.2015.2404835.

Mixture models are often used to compactly represent samples from heterogeneous sources. However, in the real world, the samples generally contain an unknown fraction of outliers and the sources generate different or unbalanced numbers of observations. Such unbalanced and contaminated samples may, for instance, be obtained by high density data sensors such as imaging devices. Estimation of unbalanced mixture models from samples with outliers requires robust estimation methods. In this paper, we propose a novel robust mixture estimator incorporating trimming of the outliers based on component-wise confidence level ordering of observations. The proposed method is validated and compared to the state-of-the-art FAST-TLE method on two data sets, one consisting of synthetic samples with a varying fraction of outliers and a varying balance between mixture weights, while the other data set contained structural magnetic resonance images of the brain with tumors of varying volumes. The results on both data sets clearly indicate that the proposed method is capable of robustly estimating unbalanced mixtures over a broad range of outlier fractions. As such, it is applicable to real-world samples, in which the outlier fraction cannot be estimated in advance.

Comparison of EKF and UKF for robot localization and a method of selection of a subset of the available sonar sensors

Luigi D’Alfonso, Walter Lucia, Pietro Muraca, Paolo Pugliese, Mobile robot localization via EKF and UKF: A comparison based on real data, Robotics and Autonomous Systems, Volume 74, Part A, December 2015, Pages 122-127, ISSN 0921-8890, DOI: 10.1016/j.robot.2015.07.007.

In this work we compare the performance of two well known filters for nonlinear models, the Extended Kalman Filter and the Unscented Kalman Filter, in estimating the position and orientation of a mobile robot. The two filters fuse the measurements taken by ultrasonic sensors located onboard the robot. The experimental results on real data show a substantial equivalence of the two filters, although in principle the approximating properties of the UKF are much better. A switching sensors activation policy is also devised, which makes it possible to obtain an accurate estimate of the robot state using only a fraction of the available sensors, with a significant saving of battery power.
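
The remark that the UKF approximates better "in principle" can be illustrated on a scalar toy problem. The sketch below propagates a Gaussian through f(x) = x² with an EKF-style first-order linearization and with the classical symmetric unscented transform, and compares both with the closed-form moments. The function, the numbers and the choice kappa = 2 (the common n + kappa = 3 heuristic for n = 1) are assumptions made for illustration; this is not the robot model or the filter tuning used in the paper.

```python
import math


def linearized_moments(f, df, mean, var):
    """EKF-style propagation: evaluate f at the mean, scale the variance by f'(mean)^2."""
    return f(mean), df(mean) ** 2 * var


def unscented_moments(f, mean, var, kappa=2.0):
    """Scalar unscented transform with the classical symmetric sigma set (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def square(x):
    return x * x


def d_square(x):
    return 2.0 * x


mu, var = 1.0, 0.25
# Closed-form moments of x^2 for Gaussian x: mean mu^2 + var, variance 4 mu^2 var + 2 var^2.
exact = (mu ** 2 + var, 4.0 * mu ** 2 * var + 2.0 * var ** 2)
ekf = linearized_moments(square, d_square, mu, var)
ut = unscented_moments(square, mu, var)

print("exact:", exact)
print("linearized:", ekf)
print("unscented:", ut)
```

For this particular quadratic the unscented transform reproduces both moments while the linearization misses the variance term in the mean. The gap shrinks as the prior variance gets small relative to the curvature, which is one plausible reason the two filters can look equivalent on real data.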

One of the first thorough studies of Monte Carlo Localization with line-segment maps

Biswajit Sarkar, Surojit Saha, Prabir K. Pal, A novel method for computation of importance weights in Monte Carlo localization on line segment-based maps, Robotics and Autonomous Systems, Volume 74, Part A, December 2015, Pages 51-65, ISSN 0921-8890, DOI: 10.1016/j.robot.2015.07.001.

Monte Carlo localization is a powerful and popular approach in mobile robot localization. Line segment-based maps provide a compact and scalable representation of indoor environments for mobile robot navigation. But Monte Carlo localization has seldom been studied in the context of line segment-based maps. A key step of the approach–and one that can endow it with or rob it of the attributes of accuracy, robustness and efficiency–is the computation of the so-called importance weight associated with each particle. In this paper, we propose a new method for the computation of importance weights on maps represented with line segments, and extensively study its performance in pose tracking. We also compare our method with three other methods reported in the literature and present the results and insights thus gathered. The comparative study, conducted using both simulated and real data, on maps built from real data available in the public domain clearly establishes that the proposed method is more accurate, robust and efficient than the other methods.
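
For readers unfamiliar with the importance-weight step, here is a minimal sketch of a generic baseline on a line-segment map: each particle ray-casts its expected range readings against the segments and is weighted by a Gaussian likelihood of the measured ranges. This is a textbook beam-style weighting, not the new method proposed in the paper; the function names, noise level and toy map are hypothetical.

```python
import math


def ray_range(pose, bearing, segments, max_range=10.0):
    """Distance from pose along (heading + bearing) to the nearest map segment."""
    x, y, theta = pose
    dx, dy = math.cos(theta + bearing), math.sin(theta + bearing)
    best = max_range
    for (ax, ay), (bx, by) in segments:
        ex, ey = bx - ax, by - ay
        denom = dx * ey - dy * ex          # cross(direction, segment)
        if abs(denom) < 1e-12:
            continue                       # ray parallel to this segment
        wx, wy = ax - x, ay - y
        t = (wx * ey - wy * ex) / denom    # distance along the ray
        u = (wx * dy - wy * dx) / denom    # position along the segment, in [0, 1]
        if t >= 0.0 and 0.0 <= u <= 1.0:
            best = min(best, t)
    return best


def importance_weights(particles, bearings, measured, segments, sigma=0.1):
    """Normalised weights from independent Gaussian range likelihoods."""
    log_w = []
    for pose in particles:
        ll = 0.0
        for b, z in zip(bearings, measured):
            err = z - ray_range(pose, b, segments)
            ll += -0.5 * (err / sigma) ** 2
        log_w.append(ll)
    top = max(log_w)                       # subtract the max for numerical stability
    w = [math.exp(l - top) for l in log_w]
    total = sum(w)
    return [wi / total for wi in w]


# Toy map: a single wall at x = 2, with three particles facing it.
wall = [((2.0, -5.0), (2.0, 5.0))]
particles = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
weights = importance_weights(particles, [0.0], [1.5], wall)
print([round(w, 6) for w in weights])
```

The particle whose predicted range matches the 1.5 m reading receives almost all of the weight. In a full filter these weights would then drive the resampling step.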

Multi-agent Q-learning applied to the defense against DDoS attacks with some provisions for scaling

Kleanthis Malialis, Sam Devlin & Daniel Kudenko, Distributed reinforcement learning for adaptive and robust network intrusion response, Connection Science, Volume 27, Issue 3, 2015, DOI: 10.1080/09540091.2015.1031082.

Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.
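
Difference rewards, the reward-shaping scheme named in the abstract, give each agent the global reward minus the global reward that would have been obtained had that agent taken a default action. The sketch below shows the idea on a toy throttling problem. The global reward function, the loads and the "no throttling" default are hypothetical choices made for illustration and are not the reward design used in the paper.

```python
def global_reward(throttles, loads, capacity):
    """Hypothetical team reward: penalise victim overload and, mildly, dropped traffic."""
    passed = sum(load * (1.0 - t) for load, t in zip(loads, throttles))
    dropped = sum(loads) - passed
    return -max(0.0, passed - capacity) - 0.1 * dropped


def difference_rewards(throttles, loads, capacity, default=0.0):
    """D_i = G(z) - G(z with agent i's action replaced by a default action)."""
    g = global_reward(throttles, loads, capacity)
    rewards = []
    for i in range(len(throttles)):
        counterfactual = list(throttles)
        counterfactual[i] = default        # assumed default: agent i does not throttle
        rewards.append(g - global_reward(counterfactual, loads, capacity))
    return rewards


# Three routers: the first carries most of the traffic towards the victim.
loads = [10.0, 2.0, 2.0]
throttles = [0.6, 0.0, 0.5]                # fraction of traffic each router drops
d = difference_rewards(throttles, loads, capacity=8.0)
print([round(x, 6) for x in d])
```

Here the heavily loaded router is credited for preventing the overload, the idle router gets zero, and the router that dropped traffic unnecessarily gets a small negative signal. That per-agent credit is what makes the signal easier to learn from than a shared team reward.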

Modelling emotions in adaptive agents through the action selection part of reinforcement learning, plus some references on the neurophysiological bases of RL and a good review of literature on emotions

Joost Broekens, Elmer Jacobs, Catholijn M. Jonker, A reinforcement learning model of joy, distress, hope and fear, Connection Science, Vol. 27, Iss. 3, 2015, DOI: 10.1080/09540091.2015.1031081.

In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V(s), models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework – coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human–robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
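
The mapping described in the abstract can be sketched with a tabular temporal-difference learner: the sign of the learned state value V(s) is read as hope or fear, and the sign of the error signal as joy or distress. Using the plain TD(0) error for the latter is an assumption made here for illustration, since the abstract only says the signal is "similar to the error signal". The state names, rewards and learning parameters are hypothetical.

```python
def td_step(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update of V[s]; returns the temporal-difference error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta


def emotion_labels(V, s, delta):
    """Read anticipation from the sign of V(s) and outcome from the sign of the error."""
    anticipation = "hope" if V[s] > 0 else "fear" if V[s] < 0 else "neutral"
    outcome = "joy" if delta > 0 else "distress" if delta < 0 else "neutral"
    return anticipation, outcome


V = {"cue": 0.0, "food": 0.0, "shock": 0.0}

# An unexpected reward: positive error (joy), and the cue starts to signal hope.
delta_food = td_step(V, "cue", 1.0, "food")
labels_food = emotion_labels(V, "cue", delta_food)

# Repeated punishment after the same cue: distress, and hope turns into fear.
for _ in range(5):
    delta_shock = td_step(V, "cue", -2.0, "shock")
labels_shock = emotion_labels(V, "cue", delta_shock)

print(labels_food, labels_shock)
```

As learning converges the error shrinks toward zero, which in this reading corresponds to the habituation effect the paper connects to.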

A clarification and systematization of UKF

Menegaz, H.M.T.; Ishihara, J.Y.; Borges, G.A.; Vargas, A.N., A Systematization of the Unscented Kalman Filter Theory, in Automatic Control, IEEE Transactions on, vol.60, no.10, pp.2583-2598, Oct. 2015, DOI: 10.1109/TAC.2015.2404511.

In this paper, we propose a systematization of the (discrete-time) Unscented Kalman Filter (UKF) theory. We gather all available UKF variants in the literature, present corrections to theoretical inconsistencies, and provide a tool for the construction of new UKF’s in a consistent way. This systematization is done, mainly, by revisiting the concepts of Sigma-Representation, Unscented Transformation (UT), Scaled Unscented Transformation (SUT), UKF, and Square-Root Unscented Kalman Filter (SRUKF). Inconsistencies are related to 1) matching the order of the transformed covariance and cross-covariance matrices of both the UT and the SUT; 2) multiple UKF definitions; 3) issue with some reduced sets of sigma points described in the literature; 4) the conservativeness of the SUT; 5) the scaling effect of the SUT on both its transformed covariance and cross-covariance matrices; and 6) possibly ill-conditioned results in SRUKF’s. With the proposed systematization, the symmetric sets of sigma points in the literature are formally justified, and we are able to provide new consistent variations for UKF’s, such as the Scaled SRUKF’s and the UKF’s composed by the minimum number of sigma points. Furthermore, our proposed SRUKF has improved computational properties when compared to state-of-the-art methods.

Survey on Model-Driven Software Engineering for real-time embedded systems and robotics

Brugali, D., Model-Driven Software Engineering in Robotics: Models Are Designed to Use the Relevant Things, Thereby Reducing the Complexity and Cost in the Field of Robotics, in Robotics & Automation Magazine, IEEE, vol.22, no.3, pp.155-166, Sept. 2015, DOI: 10.1109/MRA.2015.2452201.

A model is an abstract representation of a real system or phenomenon [1]. The idea of a model is to capture important properties of reality and to neglect irrelevant details. The properties that are relevant and that can be neglected depend on the purpose of creating a model. A model can make a particular system or phenomenon easier to understand, quantify, visualize, simulate, or predict.