Variable importance: Comparison of selectivity ratio and significance multivariate correlation for interpretation of latent‐variable regression models
Journal article, Peer reviewed
Original version: Journal of Chemometrics. 2020, 34(4), e3211. DOI: 10.1002/cem.3211
This work examines the performance of significance multivariate correlation (sMC) and selectivity ratio (SR) for ranking variables according to their importance in latent-variable regression (LVR) models. Both indices are based on target projection (TP) of a validated LVR model obtained by partial least squares (PLS). The matrix of explanatory x-variables is projected onto the normalized regression vector to obtain a score vector that is proportional to the vector of predicted values for the response variable y. For each x-variable, sMC is calculated as the ratio of the sum of squares explained by the decomposition obtained from these two vectors to the residual sum of squares. SR is calculated in a similar way, except that the normalized regression vector is replaced by the loading vector obtained by projecting the data matrix of x-variables onto the TP score vector. The two indices for variable importance are compared for three different applications with data representing instrumental profiles from liquid chromatography, infrared spectroscopy, and proton nuclear magnetic resonance spectroscopy. Results show that SR outperforms sMC for interpretation and biomarker selection. The main drawback of sMC appears to be the mixing of predictive and orthogonal variation resulting from the direct use of the normalized regression vector in the calculation. SR, by contrast, uses a loading vector that is proportional to the covariances between the x-variables and the predicted response variable.
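The computation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. X is column-centered. The regression vector b is taken as given; in practice it would come from a validated PLS model, which is not fitted here. Both indices are computed as plain ratios of explained to residual sums of squares for the rank-one model t l^T, with l = p_TP for SR and l = b/||b|| for the sMC-style ratio. A constant degrees-of-freedom factor, as in an F-type statistic, would rescale the sMC-style values without changing the variable ranking. All function names and the toy data are hypothetical.

```python
"""Sketch of target projection (TP), selectivity ratio (SR) and an sMC-style
ratio for a given regression vector b (pure Python, no external libraries).

Assumptions, not taken from the article: X is a list of rows (n samples by
p variables), b comes from an already validated PLS model, and both indices
are plain explained-to-residual sum-of-squares ratios of a rank-one model.
"""
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def center_columns(X: Matrix) -> Matrix:
    """Subtract each column mean so that covariances reduce to dot products."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def target_projection(X: Matrix, b: Vector) -> Tuple[Vector, Vector, Vector]:
    """Return (t_tp, w_tp, p_tp) for column-centered X and regression vector b."""
    norm_b = sum(v * v for v in b) ** 0.5
    w_tp = [v / norm_b for v in b]  # normalized regression vector b / ||b||
    # TP scores t = X w, proportional to the predicted values y-hat = X b
    t_tp = [sum(x * w for x, w in zip(row, w_tp)) for row in X]
    tt = sum(v * v for v in t_tp)
    # TP loadings p = X^T t / (t^T t), proportional to cov(x_j, y-hat)
    p_tp = [sum(row[j] * ti for row, ti in zip(X, t_tp)) / tt for j in range(len(b))]
    return t_tp, w_tp, p_tp


def explained_to_residual(X: Matrix, t: Vector, loading: Vector) -> Vector:
    """Per variable j: ||t * l_j||^2 / ||x_j - t * l_j||^2 for the model t l^T."""
    ratios = []
    for j, l_j in enumerate(loading):
        explained = sum((ti * l_j) ** 2 for ti in t)
        residual = sum((row[j] - ti * l_j) ** 2 for row, ti in zip(X, t))
        ratios.append(explained / residual if residual > 0 else float("inf"))
    return ratios


def selectivity_ratio(X: Matrix, b: Vector) -> Vector:
    t_tp, _, p_tp = target_projection(X, b)
    return explained_to_residual(X, t_tp, p_tp)  # SR: loading is p_tp


def smc_ratio(X: Matrix, b: Vector) -> Vector:
    t_tp, w_tp, _ = target_projection(X, b)
    return explained_to_residual(X, t_tp, w_tp)  # sMC-style: loading is b / ||b||


if __name__ == "__main__":
    # Hypothetical toy data: x1 = y + z, x2 = z, where the interference z is
    # uncorrelated with y = [-1, -1, 1, 1]; b = [1, -1] recovers y exactly.
    X = center_columns([[-2.0, -1.0], [0.0, 1.0], [0.0, -1.0], [2.0, 1.0]])
    b = [1.0, -1.0]
    print("SR: ", [round(v, 3) for v in selectivity_ratio(X, b)])  # [1.0, 0.0]
    print("sMC:", [round(v, 3) for v in smc_ratio(X, b)])  # [0.2, 0.2]
```

In this constructed example, x2 contains only interference that is uncorrelated with the predicted response. It still receives a large regression coefficient because the model uses it to cancel the interference in x1. The sMC-style ratio therefore scores x2 as highly as x1. SR scores x2 zero because its covariance with the predicted response, and hence its TP loading, is zero. On toy data only, this mirrors the mixing of predictive and orthogonal variation that the abstract identifies as the main drawback of sMC.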