Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: Some theoretical aspects

Authors: UPV


Cross-validation has become one of the principal methods for adjusting the meta-parameters of predictive models. Extensions of the cross-validation idea have been proposed to select the number of components in principal component analysis (PCA). The element-wise k-fold (ekf) cross-validation is among the most widely used algorithms for PCA cross-validation. It is the method implemented in the PLS_Toolbox, and a numerical experiment has reported that it outperforms other methods under most circumstances. The ekf algorithm is based on missing data imputation, and it can be implemented with any method for this purpose. In this paper, the ekf algorithm with the simplest missing data imputation method, trimmed score imputation, is analyzed. A theoretical study is conducted to identify in which situations the application of ekf is adequate and, more importantly, in which situations it is not. The results show that the ekf method may be unable to assess the extent to which a model represents a test set and may lead to discarding principal components that carry important information. In a second paper of this series, other imputation methods are studied within the ekf algorithm.
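To make the procedure concrete, the following is a minimal sketch of ekf cross-validation with trimmed score imputation, written under several assumptions that are not taken from the text: the function name `ekf_press_tsi` and its parameters are hypothetical, rows are split into contiguous folds without shuffling, the data are only mean-centered with the training mean, and the left-out elements are set to zero after centering before the scores are computed. It is an illustration of the general idea and not the PLS_Toolbox implementation.

```python
import numpy as np


def ekf_press_tsi(X, n_components, row_folds=5, col_folds=None):
    """Hypothetical sketch: PRESS from element-wise k-fold CV with trimmed scores.

    Assumptions (not from the source): contiguous folds, mean-centering only,
    and left-out elements replaced by zero after centering.
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    if col_folds is None:
        col_folds = n_cols  # leave one variable out at a time
    row_groups = np.array_split(np.arange(n_rows), row_folds)
    col_groups = np.array_split(np.arange(n_cols), col_folds)

    press = 0.0
    for test_rows in row_groups:
        # Fit the PCA model on the training rows only.
        train = np.delete(X, test_rows, axis=0)
        mean = train.mean(axis=0)
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        loadings = vt[:n_components].T  # shape: (n_cols, n_components)

        test = X[test_rows] - mean
        for cols in col_groups:
            # Treat the selected elements as missing: trimmed score imputation
            # computes the scores with zeros in their place.
            trimmed = test.copy()
            trimmed[:, cols] = 0.0
            scores = trimmed @ loadings
            reconstruction = scores @ loadings.T
            # Accumulate the squared error on the left-out elements only.
            press += np.sum((test[:, cols] - reconstruction[:, cols]) ** 2)
    return press
```

Calling `ekf_press_tsi(X, a)` for a = 0, 1, 2, ... and comparing the resulting PRESS values is the usual way such a curve would be used to choose the number of components. In this sketch, a model with as many components as variables gives the same PRESS as a model with no components, because the trimmed reconstruction of a left-out element is then exactly zero.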