| Issue | EAS Publications Series, Volume 77, 2016, Statistics for Astrophysics: Clustering and Classification |
|---|---|
| Page(s) | 121 - 169 |
| DOI | https://doi.org/10.1051/eas/1677007 |
| Published online | 26 May 2016 |
D. Fraix-Burnet and S. Girard (eds)
EAS Publications Series, 77 (2016) 121-169
Clustering of Variables for Mixed Data
1 Bordeaux Institute of Technology (Bordeaux INP) & Inria Bordeaux Sud Ouest, CQFD team & Bordeaux Institute of Mathematics (IMB, UMR 5251 CNRS), Bordeaux, France
2 University of Bordeaux & Inria Bordeaux Sud Ouest, CQFD team & Bordeaux Institute of Mathematics (IMB, UMR 5251 CNRS), Bordeaux, France
Abstract
This chapter presents clustering of variables, whose aim is to group together strongly related variables. The proposed approach works on a mixed data set, i.e. a data set that contains both numerical and categorical variables. Two algorithms for clustering variables are described: a hierarchical clustering and a k-means type clustering. A brief description of the PCAmix method (a principal component analysis for mixed data) is provided, since the computation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1); a standard clustering method is then applied (step 2) to obtain groups of individuals.
© EAS, EDP Sciences, 2016
