Category Archives: Probability And Statistics

A universal approximator for the value function in continuous-state VI

William B. Haskell; Rahul Jain; Hiteshi Sharma; Pengqian Yu, TA Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs, IEEE Transactions on Automatic Control ( Volume: 65, Issue: 1, Jan. 2020), DOI: 10.1109/TAC.2019.2907414.

We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The “empirical” nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman operator a random operator. A parametric and a nonparametric method for function approximation using a parametric function space and a reproducing kernel Hilbert space respectively are then combined with EVL. Both function spaces have the universal function approximation property. Basis functions are picked randomly. Convergence analysis is performed using a random operator framework with techniques from the theory of stochastic dominance. Finite time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and computational tractability of this approach.

Nice related work on change-point detection and a novel algorithm for off-line detection of abrupt changes in multivariate signals

Charles Truong; Laurent Oudre; Nicolas Vayatis, Greedy Kernel Change-Point Detection, IEEE Transactions on Signal Processing ( Volume: 67, Issue: 24, Dec.15, 15 2019), DOI: 10.1109/TSP.2019.2953670.

We consider the problem of detecting abrupt changes in the underlying stochastic structure of multivariate signals. A novel non-parametric and model-free off-line change-point detection method based on a kernel mapping is presented. This approach is sequential and alternates between two steps: a greedy detection to estimate a new breakpoint and a projection to remove its contribution to the signal. The resulting algorithm is able to segment time series for which no accurate model is available: it is computationally more efficient than exact kernel change-point detection and more precise than window-based approximations. The proposed method also offers some theoretical consistency properties. For the special case of a linear kernel, an even faster implementation is provided. The proposed strategy is compared to standard parametric and non-parametric procedures on a real-world data set composed of 262 accelerometer recordings.

Estimating aging of integrated circuits with machine learning

Ke Huang; Xinqiao Zhang; Naghmeh Karimi, Real-Time Prediction for IC Aging Based on Machine Learning, IEEE Transactions on Instrumentation and Measurement, Volume: 68, Issue: 12, Dec. 2019, DOI: 10.1109/TIM.2019.2899477.

Estimating the aging-related degradation and failure of nanoscale integrated circuits (ICs), before they actually occur, is crucial for developing aging prevention/mitigation actions and in turn avoiding unexpected in-field circuit failures. Real-time monitoring of IC operating conditions can be efficiently used for predicting aging degradation and in turn timing failures caused by device aging. The existing approaches only take some specific operating conditions (e.g., workload or temperature) into account. In this paper, we propose a novel method for real-time IC aging prediction by extending the prediction schemes to a comprehensive model which takes into account any time-variant dynamic operating conditions relevant to aging prediction. Using a machine learning prediction model and the notion of equivalent aging time, we show that our approach outperforms the existing methods in terms of aging-prediction accuracy under different scenarios of time-variant operating conditions.

On the effects of large variances in the transition function for Q-learning

D. Lee and W. B. Powell, Bias-Corrected Q-Learning With Multistate Extension. IEEE Transactions on Automatic Control, vol. 64, no. 10, pp. 4011-4023, DOI: 10.1109/TAC.2019.2912443.

Q-learning is a sample-based model-free algorithm that solves Markov decision problems asymptotically, but in finite time, it can perform poorly when random rewards and transitions result in large variance of value estimates. We pinpoint its cause to be the estimation bias due to the maximum operator in Q-learning algorithm, and present the evidence of max-operator bias in its Q value estimates. We then present an asymptotically optimal bias-correction strategy and construct an extension to bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those from Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension in two model problems: A multiarmed bandit version of Roulette and an electricity storage control simulation. The bias-corrected Q-learning algorithm with multistate extension is shown to control max-operator bias effectively, where the bias-resistance can be tuned predictably by adjusting a correction parameter.

A nice (short) survey of deep RL

Matthew Botvinick, Sam Ritter, Jane X. Wang, Zeb Kurth-Nelson, Charles Blundell, Demis Hassabis, Reinforcement Learning, Fast and Slow, Trends in Cognitive Sciences, Volume 23, Issue 5, 2019, Pages 408-422 DOI: 10.1016/j.tics.2019.02.006.

Deep reinforcement learning (RL) methods have driven impressive advances in artificial intelligence in recent years, exceeding human performance in domains ranging from Atari to Go to no-limit poker. This progress has drawn the attention of cognitive scientists interested in understanding human learning. However, the concern has been raised that deep RL may be too sample-inefficient – that is, it may simply be too slow – to provide a plausible model of how humans learn. In the present review, we counter this critique by describing recently developed techniques that allow deep RL to operate more nimbly, solving problems much more quickly than previous methods. Although these techniques were developed in an AI context, we propose that they may have rich implications for psychology and neuroscience. A key insight, arising from these AI methods, concerns the fundamental connection between fast RL and slower, more incremental forms of learning.

An interesting review of criticisms of deep learning in cognitive science

Radoslaw M. Cichy, Daniel Kaiser, Deep Neural Networks as Scientific Models, Trends in Cognitive Sciences, Volume 23, Issue 4, 2019, Pages 305-317, DOI: 10.1016/j.tics.2019.01.009.

Artificial deep neural networks (DNNs) initially inspired by the brain enable computers to solve cognitive tasks at which humans excel. In the absence of explanations for such cognitive phenomena, in turn cognitive scientists have started using DNNs as models to investigate biological cognition and its neural basis, creating heated debate. Here, we reflect on the case from the perspective of philosophy of science. After putting DNNs as scientific models into context, we discuss how DNNs can fruitfully contribute to cognitive science. We claim that beyond their power to provide predictions and explanations of cognitive phenomena, DNNs have the potential to contribute to an often overlooked but ubiquitous and fundamental use of scientific models: exploration.

A new framework for fitting jump models

Alberto Bemporad, Valentina Breschi, Dario Piga, Stephen P. Boyd, Fitting jump models, Automatica, Volume 96, 2018, Pages 11-21, DOI: 10.1016/j.automatica.2018.06.022.

We describe a new framework for fitting jump models to a sequence of data. The key idea is to alternate between minimizing a loss function to fit multiple model parameters, and minimizing a discrete loss function to determine which set of model parameters is active at each data point. The framework is quite general and encompasses popular classes of models, such as hidden Markov models and piecewise affine models. The shape of the chosen loss functions to minimize determines the shape of the resulting jump model.

Regression to help in finding the optimal policy in MDPs based on duality theory

H. Zhu, F. Ye and E. Zhou, Solving the Dual Problems of Dynamic Programs via Regression, IEEE Transactions on Automatic Control, vol. 63, no. 5, pp. 1340-1355, DOI: 10.1109/TAC.2017.2747405.

In recent years, information relaxation and duality in dynamic programs have been studied extensively, and the resulted primal-dual approach has become a powerful procedure in solving dynamic programs by providing lower-upper bounds on the optimal value function. Theoretically, with the so-called value-based optimal dual penalty, the optimal value function could be recovered exactly via strong duality. However, in practice, obtaining tight dual bounds usually requires good approximations of the optimal dual penalty, which could be time consuming if analytical computation is not possible and nested simulation has to be used to estimate the conditional expectations inside the dual penalty. In this paper, we will develop a framework of a regression approach to approximating the optimal dual penalty in a nonnested manner, by exploring the structure of the function space consisting of all feasible dual penalties. The resulted approximations maintain to be feasible dual penalties, and thus yielding valid dual bounds on the optimal value function. We show that the proposed framework is computationally efficient, and the resulted dual penalties lead to numerically tractable dual problems. Finally, we apply the framework to a high-dimensional dynamic trading problem to demonstrate its effectiveness in solving the dual problems of complex dynamic programs.

POMDPs aware of the data association problem

Shashank Pathak, Antony Thomas, and Vadim Indelman, A unified framework for data association aware robust belief space planning and perception, The International Journal of Robotics Research Vol 37, Issue 2-3, pp. 287 – 315, DOI: 10.1177/0278364918759606.

We develop a belief space planning approach that advances the state of the art by incorporating reasoning about data association within planning, while considering additional sources of uncertainty. Existing belief space planning approaches typically assume that data association is given and perfect, an assumption that can be harder to justify during operation in the presence of localization uncertainty, or in ambiguous and perceptually aliased environments. By contrast, our data association aware belief space planning (DA-BSP) approach explicitly reasons about data association within belief evolution owing to candidate actions, and as such can better accommodate these challenging real-world scenarios. In particular, we show that, owing to perceptual aliasing, a posterior belief can become a mixture of probability distribution functions and design cost functions, which measure the expected level of ambiguity and posterior uncertainty given candidate action. Furthermore, we also investigate more challenging situations, such as when prior belief is multimodal and when data association aware planning is performed over several look-ahead steps. Our framework models the belief as a Gaussian mixture model. Another unique aspect of this approach is that the number of components of this Gaussian mixture model can increase as well as decrease, thereby reflecting reality more accurately. Using these and standard costs (e.g. control penalty, distance to goal) within the objective function yields a general framework that reliably represents action impact and, in particular, is capable of active disambiguation. Our approach is thus applicable to both robust perception in a passive setting with data given a priori and in an active setting, such as in autonomous navigation in perceptually aliased environments. We demonstrate key aspects of DA-BSP in a theoretical example, in a Gazebo-based realistic simulation, and also on the real robotic platform using a Pioneer robot in an office environment.

Using EKF estimation in a PI controller for improving its performance under noise

Y. Zhou, Q. Zhang, H. Wang, P. Zhou and T. Chai, EKF-Based Enhanced Performance Controller Design for Nonlinear Stochastic Systems, IEEE Transactions on Automatic Control, vol. 63, no. 4, pp. 1155-1162, DOI: 10.1109/TAC.2017.2742661.

In this paper, a novel control algorithm is presented to enhance the performance of the tracking property for a class of nonlinear and dynamic stochastic systems subjected to non-Gaussian noises. Although the existing standard PI controller can be used to obtain the basic tracking of the systems, the desired tracking performance of the stochastic systems is difficult to achieve due to the random noises. To improve the tracking performance, an enhanced performance loop is constructed using the EKF-based state estimates without changing the existing closed loop with a PI controller. Meanwhile, the gain of the enhanced performance loop can be obtained based upon the entropy optimization of the tracking error. In addition, the stability of the closed loop system is analyzed in the mean-square sense. The simulation results are given to illustrate the effectiveness of the proposed control algorithm.