Universiteit Leiden

Dissertation

Improved Strategies for Distance Based Clustering of Objects on Subsets of Attributes in High-Dimensional Data

Author
Kampert, M.M.D.
Date
3 July 2019
Links
Thesis in Leiden Repository

This monograph focuses on clustering objects in high-dimensional data under the restriction that the objects do not cluster on all attributes, nor even on a single subset of attributes, but often on different subsets of attributes. To reveal such a clustering structure, Friedman and Meulman (2004) proposed a framework and a specific algorithm, called COSA. In this monograph we propose three improvements to the original COSA algorithm. The first targets the optimization strategy for the tuning parameters in COSA. The second is a reformulation of the COSA criterion that reduces the number of tuning parameters from two to one, enables the incorporation of pre-specified initial weights for the attribute distances, and allows for solutions that contain zero-valued attribute weights. The third is a new definition of the COSA distances that yields a better separation between objects from different clusters. We compare the 'old' and the improved COSA with other state-of-the-art methods, using simulated and real omics data sets.
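To make the idea of attribute weights concrete, the following is a minimal illustrative sketch. It is not the COSA algorithm or its criterion: the function `weighted_distance`, the example objects and the weight vectors are all hypothetical and chosen only for illustration. It shows how a distance built from per-attribute differences changes when the weights concentrate on a subset of attributes, including a zero-valued weight.

```python
def weighted_distance(x, y, weights):
    """Weighted sum of per-attribute absolute differences (illustrative only)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w * abs(a - b) for a, b, w in zip(x, y, weights))


# Two hypothetical objects: similar on attributes 0 and 1, far apart on attribute 2.
obj_a = [1.0, 2.0, 10.0]
obj_b = [1.5, 2.5, -10.0]

# Equal weights let the dissimilar attribute dominate the distance.
equal_weights = [1 / 3, 1 / 3, 1 / 3]

# Weights concentrated on the subset {0, 1}; attribute 2 gets a zero weight.
subset_weights = [0.5, 0.5, 0.0]

d_equal = weighted_distance(obj_a, obj_b, equal_weights)
d_subset = weighted_distance(obj_a, obj_b, subset_weights)

print(d_equal, d_subset)
```

With the subset weights the two objects appear close, whereas with equal weights the single dissimilar attribute keeps them apart. In this sketch the weights are fixed by hand; how weights are actually estimated is the subject of the monograph and is not reproduced here.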
