Abstract
Cross-validation has become one of the principal methods to adjust the meta-parameters in predictive models.
Extensions of the cross-validation idea have been proposed to select the number of components in principal
components analysis (PCA). Element-wise k-fold (ekf) cross-validation is among the most widely used algorithms for
PCA cross-validation. It is the method implemented in the PLS_Toolbox, and a numerical experiment has
reported it to outperform other methods under most circumstances. The ekf algorithm is
based on missing data imputation, and it can be programmed using any method for this purpose. In this paper,
the ekf algorithm with the simplest missing data imputation method, trimmed score imputation, is analyzed. A
theoretical study is conducted to identify in which situations the application of ekf is adequate and, more importantly,
in which situations it is not. The results show that the ekf method may be unable to assess the extent to
which a model represents a test set and may lead to discarding principal components that carry important information. In a
second paper of this series, other imputation methods are studied within the ekf algorithm.
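
To make the procedure concrete, the following is a minimal sketch of ekf cross-validation with trimmed score imputation, written in Python with NumPy. It is an illustration under stated assumptions, not the PLS_Toolbox implementation: the function name `ekf_tri_press`, its parameters, the contiguous fold splits, and the mean-centering of each split are choices made here for the example. For each row fold, a PCA model is fitted on the remaining rows; for each test row and each group of variables, the held-out variables are set to zero (their training mean after centering), scores are computed from the trimmed row, and the squared reconstruction error of the held-out variables is accumulated into a PRESS value per number of components.

```python
import numpy as np


def ekf_tri_press(X, max_components, row_folds=5, col_folds=None):
    """Hypothetical helper: PRESS for 0..max_components PCs via ekf with
    trimmed score imputation (held-out elements replaced by zero)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if col_folds is None:
        col_folds = m  # assumption: leave one variable out at a time
    press = np.zeros(max_components + 1)
    row_groups = np.array_split(np.arange(n), row_folds)
    col_groups = np.array_split(np.arange(m), col_folds)

    for test_rows in row_groups:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_rows] = False
        # Center both splits with the training mean (assumed preprocessing).
        mean = X[train_mask].mean(axis=0)
        X_train = X[train_mask] - mean
        X_test = X[test_rows] - mean
        # Rows of Vt are the PCA loadings of the training rows.
        _, _, Vt = np.linalg.svd(X_train, full_matrices=False)

        for a in range(max_components + 1):
            P = Vt[:a].T  # loadings, shape (m, a)
            for cols in col_groups:
                X_trim = X_test.copy()
                X_trim[:, cols] = 0.0      # trimmed scores: "missing" -> 0
                T = X_trim @ P             # scores from the trimmed rows
                X_hat = T @ P.T            # reconstruction
                resid = X_test[:, cols] - X_hat[:, cols]
                press[a] += np.sum(resid ** 2)
    return press


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 4))  # synthetic data, for illustration only
    print(ekf_tri_press(X, max_components=3))
```

In this sketch, the number of components would be chosen by inspecting the returned PRESS curve; the selection rule itself is left out because it is not specified here.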