| Issue | EAS Publications Series, Volume 77, 2016: Statistics for Astrophysics: Clustering and Classification |
|---|---|
| Page(s) | 121 - 169 |
| DOI | https://doi.org/10.1051/eas/1677007 |
| Published online | 26 May 2016 |
D. Fraix-Burnet and S. Girard (eds)
EAS Publications Series, 77 (2016) 121-169
Clustering of Variables for Mixed Data
1 Bordeaux Institute of Technology (Bordeaux INP) & Inria Bordeaux Sud Ouest, CQFD team & Bordeaux Institute of Mathematics (IMB, UMR 5251 CNRS), Bordeaux, France
2 University of Bordeaux & Inria Bordeaux Sud Ouest, CQFD team & Bordeaux Institute of Mathematics (IMB, UMR 5251 CNRS), Bordeaux, France
This chapter presents clustering of variables, whose aim is to group together strongly related variables. The proposed approach works on mixed data sets, i.e. data sets that contain both numerical and categorical variables. Two algorithms for clustering variables are described: a hierarchical clustering and a k-means type clustering. A brief description of the PCAmix method (a principal component analysis for mixed data) is provided, since the computation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1), before a standard clustering method is applied in step 2 to obtain groups of individuals.
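To make the idea of a k-means type clustering of variables concrete, the following is a minimal sketch restricted to numerical variables. It assumes that the synthetic variable of a cluster is the first principal component of its standardized variables, and that each variable is reallocated to the cluster whose synthetic variable it is most strongly (squared-)correlated with. The function names (`synthetic_variable`, `homogeneity`, `kmeans_variables`) and the toy data are hypothetical illustrations, not the ClustOfVar API; categorical variables, which the chapter handles through PCAmix, are left out, and empty clusters are not handled.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def standardize(col):
    """Center a variable and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]

def synthetic_variable(columns, iters=200):
    """First principal component scores of the standardized variables.

    Assumption: numerical variables only; the loadings are obtained by
    power iteration on the correlation matrix of the cluster.
    """
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    r = [[pearson(z[i], z[j]) for j in range(p)] for i in range(p)]
    w = [1.0 / (j + 1) for j in range(p)]  # non-uniform starting vector
    for _ in range(iters):
        w = [sum(r[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return [sum(w[j] * z[j][i] for j in range(p)) for i in range(n)]

def homogeneity(columns):
    """Sum of squared correlations between each variable and the synthetic variable."""
    y = synthetic_variable(columns)
    return sum(pearson(c, y) ** 2 for c in columns)

def kmeans_variables(columns, init_labels, k, max_iter=20):
    """Alternate between computing synthetic variables and reallocating variables."""
    labels = list(init_labels)
    for _ in range(max_iter):
        synth = [
            synthetic_variable([c for c, g in zip(columns, labels) if g == q])
            for q in range(k)
        ]
        new_labels = [
            max(range(k), key=lambda q: pearson(c, synth[q]) ** 2)
            for c in columns
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Toy data (hypothetical): a and b move together, c and d move together.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [2, 1, 4, 3, 6, 5, 8, 7]
c = [1, -1, 1, -1, 1, -1, 1, -1]
d = [2, -1, 1, -2, 2, -1, 1, -2]

labels = kmeans_variables([a, b, c, d], init_labels=[0, 1, 1, 1], k=2)
print(labels)
```

Starting from a deliberately poor partition, the reallocation step moves `b` into the cluster of `a`, leaving `c` and `d` together. In this numerical-only setting the homogeneity of a cluster equals the largest eigenvalue of its correlation matrix, which is why a two-variable cluster with correlation r has homogeneity 1 + |r|.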
© EAS, EDP Sciences, 2016