SIMCA [1-2] is a well-established class-modeling method that builds a separate (disjoint) principal component analysis model for each investigated class. Its classification rule is defined on the basis of the distance of each sample from (Orthogonal Distance) and within (Scores Distance) the model space of the class concerned. However, the way these two distance measures are combined, and the distributional assumptions underlying the resulting classification rule, have led to different implementations of the methodology.

Although several works over the years (one of the most recent being ) have surveyed the properties of these implementations, far less is known about how they are affected by the optimization approach used to tune the SIMCA model parameters, i.e., the dimensionality (complexity) of the class subspace and the significance level defining the distance boundary. For this reason, the main aim of this work is to assess the interplay between four SIMCA versions, namely two variants of the so-called alternative SIMCA (alt-SIMCA ), combined index-based SIMCA (CI-SIMCA ) and Data Driven SIMCA (DD-SIMCA ), and three model optimization strategies: i) significance level fixed at 5% (i.e., a 95% confidence boundary) and class model complexity optimized in cross-validation according to a “rigorous” criterion, i.e., by minimizing the difference between the cross-validated and the nominal (95%) sensitivity; ii) significance level fixed at 5% and class model complexity optimized in cross-validation according to a “compliant” criterion, i.e., by maximizing the classification efficiency; and iii) simultaneous tuning of significance level and model complexity through the Receiver Operating Characteristic (ROC) curve-based procedure proposed in .
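To make the difference between strategies i) and ii) concrete, the following Python sketch selects the class-model complexity from cross-validated sensitivity and specificity values. It assumes that classification efficiency is the geometric mean of sensitivity and specificity, as is common in the class-modeling literature; the function names, the tie-breaking rule and all numbers are hypothetical illustrations, not code or results from this work. The `select_roc_corner` function is likewise only a stand-in for strategy iii): it picks the (complexity, significance level) pair whose ROC point lies closest to the ideal corner, a generic heuristic that is not necessarily the ROC-based procedure referred to in the text.

```python
import math
from typing import Dict, Tuple

# Cross-validated figures of merit for one class model:
# complexity k -> (sensitivity, specificity). Values used below are hypothetical.
Merits = Dict[int, Tuple[float, float]]


def efficiency(sens: float, spec: float) -> float:
    """Classification efficiency, assumed here to be sqrt(sensitivity * specificity)."""
    return math.sqrt(sens * spec)


def select_rigorous(merits: Merits, nominal_sens: float = 0.95) -> int:
    """Strategy i): complexity whose CV sensitivity is closest to the nominal one."""
    # Ties are broken in favour of the simpler (lower-k) model.
    return min(merits, key=lambda k: (abs(merits[k][0] - nominal_sens), k))


def select_compliant(merits: Merits) -> int:
    """Strategy ii): complexity that maximizes classification efficiency."""
    # Negating efficiency lets min() return the maximum; ties go to the lower k.
    return min(merits, key=lambda k: (-efficiency(*merits[k]), k))


def select_roc_corner(grid: Dict[Tuple[int, float], Tuple[float, float]]) -> Tuple[int, float]:
    """Hypothetical stand-in for iii): jointly tune (k, alpha) by choosing the
    ROC point closest to the ideal corner (FPR = 0, TPR = 1)."""
    def corner_distance(key: Tuple[int, float]) -> float:
        sens, spec = grid[key]
        # Euclidean distance in (FPR, 1 - TPR) space.
        return math.hypot(1.0 - spec, 1.0 - sens)
    return min(grid, key=lambda key: (corner_distance(key), key))


if __name__ == "__main__":
    merits = {1: (0.99, 0.60), 2: (0.96, 0.85), 3: (0.93, 0.95), 4: (0.90, 0.97)}
    print(select_rigorous(merits))   # 2
    print(select_compliant(merits))  # 3
    grid = {(2, 0.05): (0.96, 0.85), (3, 0.05): (0.93, 0.95), (3, 0.01): (0.98, 0.94)}
    print(select_roc_corner(grid))   # (3, 0.01)
```

With these placeholder values the two fixed-significance criteria disagree: the rigorous criterion retains two components because their sensitivity is closest to 95%, whereas the compliant criterion prefers three components, trading a little sensitivity for a much higher specificity.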
A flowchart of the comparative assessment is shown below.