
Interplay of decision rules and parameter optimization strategies in SIMCA

Raffaele Vitale¹, Valeria Carboni², Caterina Durante², Marina Cocchi²

  1. Université de Lille, LASIR - Laboratoire de Spectrochimie Infrarouge et Raman, Lille (FR)
  2. Università di Modena e Reggio Emilia, Dipartimento di Scienze Chimiche e Geologiche, Modena (IT)

e-mail: marina.cocchi@unimore.it

SIMCA [1-2] is a well-established class-modeling method that builds a separate (disjoint) principal component analysis (PCA) model for each investigated class. Its classification rule relies on the distance of each sample from the model subspace of the class under study (Orthogonal Distance) and within it (Score Distance). However, the way these two distances are combined, and the distributional assumptions underlying the resulting decision rule, differ from one implementation of the method to another. Several works over the years (one of the most recent being [3]) have surveyed the properties of these implementations. Much less attention has been paid to how they are affected by the strategy used to optimize the SIMCA model parameters, i.e., the dimensionality (complexity) of the class subspace and the significance level that defines the distance boundary. The main aim of this work is therefore to assess the interplay between four SIMCA versions, namely two variants of the so-called alternative SIMCA (alt-SIMCA) [2], combined index-based SIMCA (CI-SIMCA) [4] and Data Driven SIMCA (DD-SIMCA) [3], and three model optimization strategies:

  i) confidence level fixed at 95% (significance level α = 0.05) and class model complexity optimized in cross-validation according to a “rigorous” criterion, i.e., by minimizing the difference between the cross-validated sensitivity and the nominal sensitivity (95%);
  ii) confidence level fixed at 95% and class model complexity optimized in cross-validation according to a “compliant” criterion, i.e., by maximizing the classification efficiency;
  iii) simultaneous tuning of the significance level and the model complexity through the Receiver Operating Characteristic (ROC) curve-based procedure proposed in [5].
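As a minimal sketch of how the tuning strategies differ, the Python snippet below assumes that, for each candidate number of principal components k, cross-validated sensitivities and specificities (or per-sample distances) have already been computed. The PCA fitting and the cross-validation loop are omitted, and all function names are hypothetical. Three further assumptions are made for illustration only and may not match the exact rules of the SIMCA implementations or of the procedure in [5]:
  1. Efficiency is taken as the geometric mean of sensitivity and specificity.
  2. The combined distance sums score and orthogonal distances, each divided by its own limit, in the spirit of Qin's combined index [4].
  3. The ROC operating point is chosen as the one closest to the top-left corner of ROC space.

```python
import math


def combined_index(sd, od, sd_lim, od_lim):
    """Sum of score and orthogonal distances, each divided by its own limit.

    Assumption: this mirrors the structure of Qin's combined index; the
    acceptance limit for the resulting index has to be derived separately.
    """
    return sd / sd_lim + od / od_lim


def sens_spec(target_d, other_d, threshold):
    """Sensitivity (target samples accepted) and specificity (others rejected)."""
    sens = sum(d <= threshold for d in target_d) / len(target_d)
    spec = sum(d > threshold for d in other_d) / len(other_d)
    return sens, spec


def efficiency(sens, spec):
    # Assumed definition: geometric mean of sensitivity and specificity.
    return math.sqrt(sens * spec)


def rigorous_k(sens_by_k, alpha=0.05):
    """'Rigorous' criterion: CV sensitivity closest to the nominal 1 - alpha."""
    nominal = 1.0 - alpha
    # Ties resolve to the first key, i.e. the simplest model if keys are ascending.
    return min(sens_by_k, key=lambda k: abs(sens_by_k[k] - nominal))


def compliant_k(sens_by_k, spec_by_k):
    """'Compliant' criterion: complexity that maximizes CV efficiency."""
    return max(sens_by_k, key=lambda k: efficiency(sens_by_k[k], spec_by_k[k]))


def roc_threshold(target_d, other_d):
    """Sweep the distance threshold along one ROC curve (fixed complexity).

    Assumed selection rule: the operating point closest to (FPR=0, TPR=1).
    """
    candidates = sorted(set(target_d) | set(other_d))

    def gap_to_corner(t):
        sens, spec = sens_spec(target_d, other_d, t)
        return math.hypot(1.0 - sens, 1.0 - spec)

    # Ties resolve to the smallest threshold, since candidates are sorted.
    return min(candidates, key=gap_to_corner)
```

With CV sensitivities {1: 0.99, 2: 0.96, 3: 0.90} and specificities {1: 0.60, 2: 0.80, 3: 0.92}, `rigorous_k` returns 2 while `compliant_k` returns 3. This shows how the two criteria can select different complexities. Repeating `roc_threshold` for every candidate k and keeping the best overall operating point could serve as a crude stand-in for joint tuning of complexity and boundary. It is not a reproduction of the procedure in [5].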

A flowchart of the comparative assessment is shown below.

References

  1. Wold, S. Pattern recognition by means of disjoint principal components models. Pattern Recogn. 1976, 8, 127-136.
  2. Eigenvector Research, Inc. SIMCA Model Builder GUI. http://wiki.eigenvector.com/index.php?title=SIMCA_Model_Builder_GUI
  3. Pomerantsev, A.L., Rodionova, O.Y. Popular decision rules in SIMCA: critical review. J. Chemometr. 2020, 34, e3250.
  4. Qin, S.J. Statistical process monitoring: basics and beyond. J. Chemometr. 2003, 17, 480-502.
  5. Vitale, R., Marini, F., Ruckebusch, C. Anal. Chem. 2018, 90, 10738-10747.