Tag Archives: Probability Distribution Estimation

A new framework for fitting jump models

Alberto Bemporad, Valentina Breschi, Dario Piga, Stephen P. Boyd, Fitting jump models, Automatica, Volume 96, 2018, Pages 11-21, DOI: 10.1016/j.automatica.2018.06.022.

We describe a new framework for fitting jump models to a sequence of data. The key idea is to alternate between minimizing a loss function to fit multiple model parameters, and minimizing a discrete loss function to determine which set of model parameters is active at each data point. The framework is quite general and encompasses popular classes of models, such as hidden Markov models and piecewise affine models. The shape of the chosen loss functions to minimize determines the shape of the resulting jump model.

Optimal routing in communication networks with probabilistic models of delays that are acquired on-line

M. S. Talebi, Z. Zou, R. Combes, A. Proutiere and M. Johansson, Stochastic Online Shortest Path Routing: The Value of Feedback, IEEE Transactions on Automatic Control, vol. 63, no. 4, pp. 915-930, DOI: 10.1109/TAC.2017.2747409.

This paper studies online shortest path routing over multihop networks. Link costs or delays are time varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret (the cumulative difference of expected delay) between the path chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. Three algorithms, with a tradeoff between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms. We also assess numerically the performance of the proposed algorithms and compare it to that of existing algorithms.

Improving the estimation of the offset parameter of heavy-tailed distributions through the injection of noise

Y. Pan, F. Duan, F. Chapeau-Blondeau and D. Abbott, Noise Enhancement in Robust Estimation of Location, IEEE Transactions on Signal Processing, vol. 66, no. 8, pp. 1953-1966, DOI: 10.1109/TSP.2018.2802463.

In this paper, we investigate the noise benefits to maximum likelihood type estimators (M-estimator) for the robust estimation of a location parameter. Two distinct noise benefits are shown to be accessible under these conditions. With symmetric heavy-tailed noise distributions, the asymptotic efficiency of the estimation can be enhanced by injecting extra noise into the M-estimators. With an asymmetric contaminated noise model having a convex cumulative distribution function, we demonstrate that addition of noise can reduce the maximum bias of the median estimator. These findings extend the analysis of stochastic resonance effects for noise-enhanced signal and information processing.

How very simple digital signal processing techniques, such as numerical filtering and linear interpolation, may provide PDF estimates with improved statistical properties over the histogram and close to, or better than, what can be obtained using Kernel based estimators

P. Carbone, D. Petri and K. Barbé, “Nonparametric Probability Density Estimation via Interpolation Filtering,” in IEEE Transactions on Instrumentation and Measurement, vol. 66, no. 4, pp. 681-690, April 2017.DOI: 10.1109/TIM.2017.2657398.

In this paper, we discuss nonparametric estimation of the probability density function (PDF) of a univariate random variable. This problem has been the subject of a vast amount of scientific literature in many domains, while statisticians are mainly interested in the analysis of the properties of proposed estimators, and engineers treat the histogram as a ready-to-use tool for a data set analysis. By considering histogram data as a numerical sequence, a simple approach for PDF estimation is presented in this paper. It is based on basic notions related to the reconstruction of a continuous-time signal from a sequence of samples. When estimating continuous PDFs, it is shown that the proposed approach is as accurate as kernel-based estimators, widely adopted in the statistical literature. Conversely, it can provide better accuracy when the PDF to be estimated exhibits a discontinuous behavior. The main statistical properties of the proposed estimators are derived and then verified by simulations related to the common cases of normal and uniform density functions. The obtained results are also used to derive optimal, i.e., minimum integral of the mean square error, estimators.

Calculating (experimental) probability distributions of the execution of sequential software

Laurent David, Isabelle Puaut, Static Determination of Probabilistic Execution Times, Proceedings of the 12th 16th Euromicro Conference on Real-Time Systems (ECRTS’04). Link.

Most previous research done in probabilistic schedulability analysis assumes a known distribution of execution times for each task of a real-time application. This is however not trivial to determine it with a high level of confidence. Methods based on measurements are often biased since not in general exhaustive on all the possible execution paths, whereas methods based on static analysis are mostly Worst-Case Execution Time – WCET – oriented. Using static analysis, this work proposes a method to obtain probabilistic distributions of execution times. It assumes that the given real time application is divided into multiple tasks, whose source code is known. Ignoring in this paper hardware considerations and based only on the source code of the tasks, the proposed technique allows designers to associate to any execution path an execution time and a probability to go through this path. A source code example is presented to illustrate the method.

Robust Estimation of Unbalanced Mixture Models on Samples with Outliers

Galimzianova, A.; Pernus, F.; Likar, B.; Spiclin, Z., Robust Estimation of Unbalanced Mixture Models on Samples with Outliers, in Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.37, no.11, pp.2273-2285, Nov. 1 2015, DOI: 10.1109/TPAMI.2015.2404835.

Mixture models are often used to compactly represent samples from heterogeneous sources. However, in real world, the samples generally contain an unknown fraction of outliers and the sources generate different or unbalanced numbers of observations. Such unbalanced and contaminated samples may, for instance, be obtained by high density data sensors such as imaging devices. Estimation of unbalanced mixture models from samples with outliers requires robust estimation methods. In this paper, we propose a novel robust mixture estimator incorporating trimming of the outliers based on component-wise confidence level ordering of observations. The proposed method is validated and compared to the state-of-the-art FAST-TLE method on two data sets, one consisting of synthetic samples with a varying fraction of outliers and a varying balance between mixture weights, while the other data set contained structural magnetic resonance images of the brain with tumors of varying volumes. The results on both data sets clearly indicate that the proposed method is capable to robustly estimate unbalanced mixtures over a broad range of outlier fractions. As such, it is applicable to real-world samples, in which the outlier fraction cannot be estimated in advance.

On the not-so-domain-generic nature of statistical learning in the human brain

Ram Frost, Blair C. Armstrong, Noam Siegelman, Morten H. Christiansen, 2015, Domain generality versus modality specificity: the paradox of statistical learning, Trends in Cognitive Sciences, Volume 19, Issue 3, March 2015, Pages 117-125, DOI: 10.1016/j.tics.2014.12.010.

Statistical learning (SL) is typically considered to be a domain-general mechanism by which cognitive systems discover the underlying distributional properties of the input. However, recent studies examining whether there are commonalities in the learning of distributional information across different domains or modalities consistently reveal modality and stimulus specificity. Therefore, important questions are how and why a hypothesized domain-general learning mechanism systematically produces such effects. Here, we offer a theoretical framework according to which SL is not a unitary mechanism, but a set of domain-general computational principles that operate in different modalities and, therefore, are subject to the specific constraints characteristic of their respective brain regions. This framework offers testable predictions and we discuss its computational and neurobiological plausibility.

Estimating an empirical distribution from a number of estimates distributed among several agents, minimizing the information exchange between the agents

Sarwate, A.D.; Javidi, T., Distributed Learning of Distributions via Social Sampling, Automatic Control, IEEE Transactions on , vol.60, no.1, pp.34,45, Jan. 2015, DOI: 10.1109/TAC.2014.2329611

A protocol for distributed estimation of discrete distributions is proposed. Each agent begins with a single sample from the distribution, and the goal is to learn the empirical distribution of the samples. The protocol is based on a simple message-passing model motivated by communication in social networks. Agents sample a message randomly from their current estimates of the distribution, resulting in a protocol with quantized messages. Using tools from stochastic approximation, the algorithm is shown to converge almost surely. Examples illustrate three regimes with different consensus phenomena. Simulations demonstrate this convergence and give some insight into the effect of network topology.