Abstract:
We study unsupervised representation learning on biomedical data, namely the Growth weight data and the Wisconsin Diagnostic Breast Cancer dataset, and obtain good clustering performance. We propose an adaptation of unsupervised topological learning to biomedical datasets, based on a new approximation strategy for visualizing high-dimensional data. When the data lie on a high-dimensional manifold, the level of discrepancy changes with the intrinsic dimension of that manifold, so the appropriate strength of the repelling force depends on the dataset. The proposed approach builds on the t-SNE (t-distributed Stochastic Neighbor Embedding) dimensionality reduction method, with a different, inhomogeneous approximation strategy for the t-distribution. To avoid the exponential computation, we propose an inhomogeneous approximation of the t-distribution with a precision on the order of 10^-3. This approximation makes it possible to optimize the t-distribution approximately with respect to its number of degrees of freedom and also reduces the computational time. We illustrate the approach on two real biomedical datasets, where the results outperform the classical SNE and t-SNE methods.
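
To make the role of the degrees of freedom concrete, the following minimal sketch computes low-dimensional similarities with the standard heavy-tailed Student-t kernel, in which standard t-SNE corresponds to one degree of freedom. It is an illustration only: it evaluates the kernel exactly and does not reproduce the inhomogeneous approximation proposed here, whose form is not given in this abstract. The names `student_t_kernel`, `low_dim_similarities`, and `alpha` are hypothetical and chosen for this example.

```python
import math

def student_t_kernel(sq_dist, alpha=1.0):
    """Unnormalized Student-t similarity for a squared distance.

    alpha is the number of degrees of freedom (an assumed parameter name);
    alpha = 1 gives the kernel 1 / (1 + d^2) used by standard t-SNE.
    The non-integer power is the costly operation an approximation would target.
    """
    return (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)

def low_dim_similarities(points, alpha=1.0):
    """Normalized pairwise similarities q_ij for a list of embedded points."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            weights[i][j] = weights[j][i] = student_t_kernel(sq, alpha)
    total = sum(sum(row) for row in weights)
    # Diagonal entries stay at zero; all off-diagonal entries sum to one.
    return [[w / total for w in row] for row in weights]

# Toy embedding with three 2-D points (made-up coordinates).
embedding = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
q = low_dim_similarities(embedding, alpha=2.0)
```

A smaller `alpha` gives heavier tails, so distant pairs keep a larger similarity, which changes how strongly distant points repel each other in the embedding. This is why the suitable number of degrees of freedom can depend on the dataset.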