Universiteit Leiden


Dissertation

Improved Strategies for Distance Based Clustering of Objects on Subsets of Attributes in High-Dimensional Data

This monograph focuses on clustering objects in high-dimensional data in settings where the objects do not cluster on all attributes, nor on a single subset of attributes, but often on different subsets of attributes.

Author
Kampert, M.M.D.
Date
03 July 2019
Links
Thesis in Leiden Repository

This monograph focuses on clustering objects in high-dimensional data in settings where the objects do not cluster on all attributes, nor on a single subset of attributes, but often on different subsets of attributes. To reveal such a clustering structure, Friedman and Meulman (2004) proposed a framework and a specific algorithm, called COSA. In this monograph we propose three improvements to the original COSA algorithm. The first targets the optimization strategy for the tuning parameters in COSA. The second is a reformulation of the COSA criterion that reduces the number of tuning parameters from two to one, enables the incorporation of pre-specified initial weights for the attribute distances, and allows for solutions with zero-valued attribute weights. The third is a new definition of the COSA distances that yields a better separation between objects from different clusters. We compared the original and the improved COSA with other state-of-the-art methods on simulated and real omics data sets.
