Category Archives: Machine Learning

A new data clustering method with many advantages over others

A. Sharma, K. A. Boroevich, D. Shigemizu, Y. Kamatani, M. Kubo and T. Tsunoda, “Hierarchical Maximum Likelihood Clustering Approach,” in IEEE Transactions on Biomedical Engineering, vol. 64, no. 1, pp. 112-122, Jan. 2017. DOI: 10.1109/TBME.2016.2542212.

Objective: In this paper, we focused on developing a clustering approach for biological data. In many biological analyses, such as multiomics data analysis and genome-wide association study analysis, it is crucial to find groups of data belonging to subtypes of diseases or tumors. Methods: Conventionally, the k-means clustering algorithm is overwhelmingly applied in many areas including biological sciences. There are, however, several alternative clustering algorithms that can be applied, including support vector clustering. In this paper, taking into consideration the nature of biological data, we propose a maximum likelihood clustering scheme based on a hierarchical framework. Results: This method can perform clustering even when the data belonging to different groups overlap. It can also perform clustering when the number of samples is lower than the data dimensionality. Conclusion: The proposed scheme does not require selecting initial settings to begin the search process. In addition, it does not require the computation of the first and second derivatives of likelihood functions, as is required by many other maximum likelihood-based methods. Significance: This algorithm uses distribution and centroid information to cluster a sample and was applied to biological data. A MATLAB implementation of this method can be downloaded from the web link http://www.riken.jp/en/research/labs/ims/med_sci_math/.
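To make the phrase "uses distribution and centroid information" concrete, here is a minimal Python sketch that assigns a sample to the cluster under which it has the highest Gaussian log-likelihood, and compares that with a centroid-only (k-means style) assignment. This is not the authors' hierarchical algorithm or their MATLAB code. The Gaussian cluster model, the `ridge` regularizer, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_loglik(x, mean, cov, ridge=1e-3):
    """Log-density of x under N(mean, cov + ridge * I).

    The ridge term is an assumption of this sketch: it keeps the
    covariance invertible when a cluster has few samples.
    """
    d = mean.size
    cov_r = cov + ridge * np.eye(d)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov_r)
    maha = diff @ np.linalg.solve(cov_r, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def assign_by_likelihood(x, clusters):
    """Pick the cluster whose fitted Gaussian gives x the highest likelihood."""
    scores = [
        gaussian_loglik(x, c.mean(axis=0), np.cov(c, rowvar=False))
        for c in clusters
    ]
    return int(np.argmax(scores))

def assign_by_centroid(x, clusters):
    """Pick the cluster with the nearest centroid (ignores cluster shape)."""
    dists = [np.linalg.norm(x - c.mean(axis=0)) for c in clusters]
    return int(np.argmin(dists))

# Hypothetical toy data: one wide, flat cluster and one small, tight cluster.
rng = np.random.default_rng(0)
wide = rng.normal(size=(200, 2)) * np.array([3.0, 0.1])
tight = rng.normal(size=(200, 2)) * 0.2 + np.array([3.0, 1.0])

sample = np.array([4.0, 0.0])
ml_label = assign_by_likelihood(sample, [wide, tight])
centroid_label = assign_by_centroid(sample, [wide, tight])
```

On this toy data the sample lies closer to the centroid of the tight cluster, so the centroid rule picks index 1, while the likelihood rule picks the wide cluster (index 0) because the sample falls well inside its spread. The two rules disagree only when cluster shapes differ, which is the situation where distribution information adds something over centroids alone.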

Robust Estimation of Unbalanced Mixture Models on Samples with Outliers

Galimzianova, A.; Pernus, F.; Likar, B.; Spiclin, Z., Robust Estimation of Unbalanced Mixture Models on Samples with Outliers, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 11, pp. 2273-2285, Nov. 1, 2015, DOI: 10.1109/TPAMI.2015.2404835.

Mixture models are often used to compactly represent samples from heterogeneous sources. However, in the real world, the samples generally contain an unknown fraction of outliers and the sources generate different or unbalanced numbers of observations. Such unbalanced and contaminated samples may, for instance, be obtained by high density data sensors such as imaging devices. Estimation of unbalanced mixture models from samples with outliers requires robust estimation methods. In this paper, we propose a novel robust mixture estimator incorporating trimming of the outliers based on component-wise confidence level ordering of observations. The proposed method is validated and compared to the state-of-the-art FAST-TLE method on two data sets, one consisting of synthetic samples with a varying fraction of outliers and a varying balance between mixture weights, and the other consisting of structural magnetic resonance images of the brain with tumors of varying volumes. The results on both data sets clearly indicate that the proposed method is capable of robustly estimating unbalanced mixtures over a broad range of outlier fractions. As such, it is applicable to real-world samples, in which the outlier fraction cannot be estimated in advance.
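The general idea of trimming during mixture estimation can be sketched as an EM loop that, at every iteration, drops the samples with the lowest density under the current mixture before updating the parameters. The Python sketch below is a generic trimmed-likelihood EM for a 1-D Gaussian mixture, written only for illustration. It ranks samples by global mixture density, which is a simplification and is not the component-wise confidence level ordering the abstract describes, nor a reimplementation of FAST-TLE. The function name, the quantile-based initialization, the fixed `trim_frac`, and the synthetic data are assumptions.

```python
import numpy as np

def trimmed_gmm_em(x, k=2, trim_frac=0.1, n_iter=50):
    """EM for a 1-D Gaussian mixture that ignores the least likely samples."""
    x = np.asarray(x, dtype=float)
    n_keep = x.size - int(trim_frac * x.size)
    # Assumed initialization: means at evenly spaced quantiles,
    # a shared variance, and equal weights.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var() + 1e-9)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Weighted component densities for every sample, shape (n, k).
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
        dens = dens / np.sqrt(2.0 * np.pi * var)
        total = dens.sum(axis=1)
        # Trimming step: keep the n_keep samples with the highest density.
        keep = np.argsort(total)[-n_keep:]
        # E-step on the retained samples only.
        resp = dens[keep] / np.maximum(total[keep, None], 1e-300)
        # M-step on the retained samples only.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[keep, None]).sum(axis=0) / nk
        var = (resp * (x[keep, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return w, mu, var, keep

# Hypothetical unbalanced sample: a large and a small component plus outliers.
rng = np.random.default_rng(0)
x = np.concatenate([
    rng.normal(0.0, 1.0, 900),    # large component
    rng.normal(6.0, 1.0, 100),    # small component
    rng.uniform(30.0, 50.0, 50),  # outliers, far from both components
])
w, mu, var, keep = trimmed_gmm_em(x, k=2, trim_frac=0.1)
```

Two limitations of this sketch connect back to the abstract. The trimming fraction must be fixed in advance, whereas the abstract stresses real-world samples where the outlier fraction cannot be estimated beforehand. Ranking by global density also tends to discard points from the smaller component first when the mixture is unbalanced.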

On the not-so-domain-generic nature of statistical learning in the human brain

Ram Frost, Blair C. Armstrong, Noam Siegelman, Morten H. Christiansen, 2015, Domain generality versus modality specificity: the paradox of statistical learning, Trends in Cognitive Sciences, Volume 19, Issue 3, March 2015, Pages 117-125, DOI: 10.1016/j.tics.2014.12.010.

Statistical learning (SL) is typically considered to be a domain-general mechanism by which cognitive systems discover the underlying distributional properties of the input. However, recent studies examining whether there are commonalities in the learning of distributional information across different domains or modalities consistently reveal modality and stimulus specificity. Therefore, important questions are how and why a hypothesized domain-general learning mechanism systematically produces such effects. Here, we offer a theoretical framework according to which SL is not a unitary mechanism, but a set of domain-general computational principles that operate in different modalities and, therefore, are subject to the specific constraints characteristic of their respective brain regions. This framework offers testable predictions and we discuss its computational and neurobiological plausibility.