Comparison of PLS, PCR, and Direct Spectral Fitting
In our certification of PeakLab's modeling algorithms, we used the data from the following white papers:
Modeling Spectra - Part I - UV-VIS Data
Modeling Spectra - Part II - FTNIR Data
Modeling Spectra - Part III - Field Site NIR Data
UV-VIS Liquid Transmittance Spectral Modeling
In this first series of evaluations, the model data matrix used in the Modeling Spectra - Part I - UV-VIS Data white paper was fitted with PeakLab's direct spectral modeling and compared with Systat's PLS (Partial Least Squares) procedure (NIPALS) and Unscrambler's PLS and PCR (Principal Component Regression) procedures. The Leave One Out prediction error criterion was used in all cases, and the prediction error is reported in ppm (parts per million) of the normalized statistical error. This corresponds to 1e6 * (1 - r² of prediction). The higher the number, the greater the prediction error. We show six columns of results. The first three (in blue) show the best r² of prediction, irrespective of any overfitting which may be occurring. The second three (in green) show the Unscrambler autoselection of factor or principal component count and PeakLab's BIC (Bayesian Information Criterion) optimization of predictor count. For the PeakLab FPGLM models, the prediction errors are based on an average of the ten best retained models of that specific wavelength count (the best model will have a slightly better error than the average shown in the tables). For the Sparse PLS and Smart Stepwise models, the prediction error for the best performing model is shown.
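The prediction error metric described above can be written out explicitly. The following is a sketch that assumes the r² of prediction is the usual PRESS-based Leave One Out statistic; the symbols PRESS, SS_tot, and ŷ_(i) are notation introduced here for illustration and are not taken from the PeakLab documentation.

```latex
\begin{align*}
r^2_{\mathrm{pred}} &= 1 - \frac{\mathrm{PRESS}}{SS_{\mathrm{tot}}},
\qquad \mathrm{PRESS} = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2,
\qquad SS_{\mathrm{tot}} = \sum_{i=1}^{n} \bigl(y_i - \bar{y}\bigr)^2 \\
\mathrm{PredErr}_{\mathrm{ppm}} &= 10^{6}\,\bigl(1 - r^2_{\mathrm{pred}}\bigr)
\quad\Longleftrightarrow\quad
r^2_{\mathrm{pred}} = 1 - 10^{-6}\,\mathrm{PredErr}_{\mathrm{ppm}}
\end{align*}
```

Here ŷ_(i) is the prediction for sample i from a model fitted with that sample left out. As a worked example, a PredErr of 6083 corresponds to an r² of prediction of 1 - 0.006083 = 0.993917.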
Data | Spectra | Variable | Software | Algorithm | Range | Sampling | Type | p/Fac/PC | PredErr | Type | p/Fac/PC | PredErr |
UVVIS | Liquid Trans | pTotal | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 4 factors | 6083 | AutoSelect | 1 factor | 7897 |
UVVIS | Liquid Trans | pTotal | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 3 factors | 6257 | | | |
UVVIS | Liquid Trans | pTotal | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 5 PC | 6104 | AutoSelect | 1 PC | 7907 |
UVVIS | Liquid Trans | pTotal | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 4082 | BIC | 6 WL | 4408 |
UVVIS | Liquid Trans | pTotal | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 15 WL (15,43,8) | 5006 | | | |
UVVIS | Liquid Trans | pTotal | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 15 WL (15,18,8) | 4164 | | | |
UVVIS | Liquid Trans | pTotal | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,8) | 3857 | | | |

UVVIS | Liquid Trans | pBDMC | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 factors | 46529 | AutoSelect | 3 factors | 46529 |
UVVIS | Liquid Trans | pBDMC | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 4 factors | 53843 | | | |
UVVIS | Liquid Trans | pBDMC | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 PC | 46481 | AutoSelect | 3 PC | 46481 |
UVVIS | Liquid Trans | pBDMC | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 7 WL | 31698 | BIC | 4 WL | 34381 |
UVVIS | Liquid Trans | pBDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 14 WL (14,43,4) | 43485 | | | |
UVVIS | Liquid Trans | pBDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 11 WL (11,4,7) | 30908 | | | |
UVVIS | Liquid Trans | pBDMC | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,6) | 25906 | | | |

UVVIS | Liquid Trans | pC | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 factors | 9083 | AutoSelect | 3 factors | 9083 |
UVVIS | Liquid Trans | pC | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 5 factors | 8302 | | | |
UVVIS | Liquid Trans | pC | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 5 PC | 7911 | AutoSelect | 3 PC | 9225 |
UVVIS | Liquid Trans | pC | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 5401 | BIC | 3 WL | 6617 |
UVVIS | Liquid Trans | pC | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 12 WL (12,24,8) | 6622 | | | |
UVVIS | Liquid Trans | pC | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,10,8) | 5549 | | | |
UVVIS | Liquid Trans | pC | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,8) | 5003 | | | |

UVVIS | Liquid Trans | pDMC | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 factors | 35936 | AutoSelect | 1 factor | 37154 |
UVVIS | Liquid Trans | pDMC | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 3 factors | 35819 | | | |
UVVIS | Liquid Trans | pDMC | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 PC | 35965 | AutoSelect | 1 PC | 37155 |
UVVIS | Liquid Trans | pDMC | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 25959 | BIC | 3 WL | 29967 |
UVVIS | Liquid Trans | pDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 9 WL (9,32,3) | 32997 | | | |
UVVIS | Liquid Trans | pDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,8,6) | 26906 | | | |
UVVIS | Liquid Trans | pDMC | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 23629 | | | |
As described in the Modeling Spectra - Part I - UV-VIS Data white paper, we fit four different entities: a total curcuminoid percentage (pTotal), a bis-demethoxycurcumin percentage (pBDMC), a curcumin percentage (pC), and a demethoxycurcumin percentage (pDMC). The Systat PLS fits were done with a 5 nm sampling interval, and can be compared with the 5 nm Sparse PLS entries from PeakLab (listed as PartialPLS in the table). The Unscrambler PLS and PCR models were run with a 1 nm interval, and can be compared with the 5 nm to 1 nm PeakLab models. All fitting used the same upper UV and lower visible wavelength band. For these UV-VIS liquid transmittance spectra, the PeakLab FPGLM (full-permutation GLM), Sparse PLS, and Smart Stepwise models all outperformed the PLS and PCR models. Note that the prediction estimates in blue are useful for comparison, since they show the best prediction realized in the modeling, though not necessarily the one you would use for your production models. The green columns represent the Unscrambler autoselect of factors and PeakLab's BIC optimization of predictor count.
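The BIC optimization of predictor count mentioned above can be illustrated with a small sketch. It assumes the common Gaussian least-squares form BIC = n ln(RSS/n) + k ln(n); PeakLab's exact formulation is not given in this section, and the function names and RSS values below are hypothetical.

```python
import math

def bic(n_samples, rss, n_params):
    """Gaussian least-squares BIC: n*ln(RSS/n) + k*ln(n) (assumed form)."""
    return n_samples * math.log(rss / n_samples) + n_params * math.log(n_samples)

def best_predictor_count(n_samples, rss_by_count):
    """Return the predictor count whose model has the lowest BIC.

    rss_by_count maps predictor count p -> residual sum of squares.
    Each model is assumed to have p slopes plus one intercept.
    """
    return min(rss_by_count, key=lambda p: bic(n_samples, rss_by_count[p], p + 1))

# Hypothetical RSS values: the fit keeps improving, but by less each time.
rss_by_count = {1: 40.0, 2: 22.0, 3: 12.0, 4: 11.5, 5: 11.3}
chosen = best_predictor_count(60, rss_by_count)
print(chosen)
```

The penalty term k ln(n) is what stops the count from growing once extra predictors no longer reduce the residual error enough, which is why a BIC-selected model can have fewer predictors than the model with the best r² of prediction.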
FTNIR Powder Reflection Spectral Modeling
In this second series of evaluations, the model data matrix used in the Modeling Spectra - Part II - FTNIR Data white paper was fitted with PeakLab's direct spectral modeling and similarly compared with Systat's PLS (Partial Least Squares) procedure (NIPALS) and Unscrambler's PLS and PCR (Principal Component Regression) procedures. Again, the Leave One Out prediction error criterion was used in all cases, and the prediction error is reported in ppm (parts per million) of the normalized statistical error. In this case, we are looking at powder reflectance spectra, and the prediction errors for the total curcuminoids are appreciably higher for the reasons we cover in the white paper.
Data | Spectra | Variable | Software | Algorithm | Range | Sampling | Type | p/Fac/PC | PredErr | Type | p/Fac/PC | PredErr |
FTNIR | Powder Refl | pTotal | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 factors | 70606 | AutoSelect | 6 factors | 75189 |
FTNIR | Powder Refl | pTotal | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 7 factors | 71767 | | | |
FTNIR | Powder Refl | pTotal | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 71946 | AutoSelect | 5 PC | 94001 |
FTNIR | Powder Refl | pTotal | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 57486 | BIC | 8 WL | 57486 |
FTNIR | Powder Refl | pTotal | PeakLab | PartialPLS | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 15 WL (15,18,8) | 57953 | | | |
FTNIR | Powder Refl | pTotal | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,7) | 54855 | | | |

FTNIR | Powder Refl | pBDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 9 factors | 116321 | AutoSelect | 5 factors | 154974 |
FTNIR | Powder Refl | pBDMC | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 9 factors | 118473 | | | |
FTNIR | Powder Refl | pBDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 PC | 114036 | AutoSelect | 7 PC | 114036 |
FTNIR | Powder Refl | pBDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 91558 | BIC | 6 WL | 98931 |
FTNIR | Powder Refl | pBDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 87438 | | | |

FTNIR | Powder Refl | pC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 9 factors | 91980 | AutoSelect | 6 factors | 114036 |
FTNIR | Powder Refl | pC | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 9 factors | 89904 | | | |
FTNIR | Powder Refl | pC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 116734 | AutoSelect | 5 PC | 135127 |
FTNIR | Powder Refl | pC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 66941 | BIC | 6 WL | 68935 |
FTNIR | Powder Refl | pC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 65160 | | | |

FTNIR | Powder Refl | pDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 factors | 94929 | AutoSelect | 5 factors | 104226 |
FTNIR | Powder Refl | pDMC | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 7 factors | 96364 | | | |
FTNIR | Powder Refl | pDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 PC | 97352 | AutoSelect | 6 PC | 99629 |
FTNIR | Powder Refl | pDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 80180 | BIC | 7 WL | 81403 |
FTNIR | Powder Refl | pDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 78141 | | | |
These are especially challenging fits, which is why you see much higher optimized factor, principal component, or predictor wavelength (WL) counts. Note that PeakLab's PartialPLS models require matched wavelengths for the averaging, and weaker fits will not isolate the WLs as accurately. This is why no PartialPLS models are shown for the three individual component models (pBDMC, pC, and pDMC). As with the UV-VIS liquid spectra modeling, the FTNIR powder modeling again shows that PeakLab's full-permutation GLM and smart stepwise models outperform the best and optimized PLS and PCR selections.
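To put the comparison above in relative terms, the best-r² prediction errors from the FTNIR table can be compared directly. The following sketch copies those PredErr values from the table; the dictionary layout and the helper name are illustrative choices, not part of PeakLab.

```python
# Best-r2 PredErr (ppm) values copied from the FTNIR table above.
ftnir_best = {
    "pTotal": {"Unscrambler PLS": 70606, "Systat PLS": 71767,
               "Unscrambler PCR": 71946, "PeakLab FPGLM": 57486,
               "PeakLab PartialPLS": 57953, "PeakLab SmartStep": 54855},
    "pBDMC": {"Unscrambler PLS": 116321, "Systat PLS": 118473,
              "Unscrambler PCR": 114036, "PeakLab FPGLM": 91558,
              "PeakLab SmartStep": 87438},
    "pC": {"Unscrambler PLS": 91980, "Systat PLS": 89904,
           "Unscrambler PCR": 116734, "PeakLab FPGLM": 66941,
           "PeakLab SmartStep": 65160},
    "pDMC": {"Unscrambler PLS": 94929, "Systat PLS": 96364,
             "Unscrambler PCR": 97352, "PeakLab FPGLM": 80180,
             "PeakLab SmartStep": 78141},
}

def reduction_vs_best_competitor(errors):
    """Percent reduction of each PeakLab error relative to the lowest PLS/PCR error."""
    peaklab = {k: v for k, v in errors.items() if k.startswith("PeakLab")}
    others = [v for k, v in errors.items() if not k.startswith("PeakLab")]
    best_other = min(others)
    return {name: 100.0 * (best_other - err) / best_other
            for name, err in peaklab.items()}

for variable, errors in ftnir_best.items():
    for name, pct in reduction_vs_best_competitor(errors).items():
        print(f"{variable:7s} {name:19s} {pct:5.1f}% lower than best PLS/PCR")
```

A positive percentage means the PeakLab model has a lower Leave One Out prediction error than the best of the PLS and PCR fits for that variable.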
Handheld NIR Powder Reflection Spectral Modeling
In this third series of evaluations, the model data matrix used in the Modeling Spectra - Part III - Field Site NIR Data white paper was fitted with PeakLab's direct spectral modeling and similarly compared with Systat's PLS (Partial Least Squares) procedure (NIPALS) and Unscrambler's PLS and PCR (Principal Component Regression) procedures. Once again, the Leave One Out prediction error criterion was used. In this fitting, we use near-infrared spectra that have lower resolution and a lower signal-to-noise ratio than the FTNIR data. To this data, we have added moisture modeling in the 1550-1950 nm wavelength band.
Data | Spectra | Variable | Software | Algorithm | Range | Sampling | Type | p/Fac/PC | PredErr | Type | p/Fac/PC | PredErr |
NIR | Powder Refl | pTotal | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 11 factors | 47176 | AutoSelect | 5 factors | 65539 |
NIR | Powder Refl | pTotal | Systat | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 11 factors | 48317 | | | |
NIR | Powder Refl | pTotal | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 62552 | AutoSelect | 5 PC | 65539 |
NIR | Powder Refl | pTotal | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 51441 | BIC | 12 WL | 46334 |
NIR | Powder Refl | pTotal | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,7) | 46334 | | | |

NIR | Powder Refl | pBDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 factors | 92485 | AutoSelect | 5 factors | 104880 |
NIR | Powder Refl | pBDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 102818 | AutoSelect | 5 PC | 105630 |
NIR | Powder Refl | pBDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 93580 | BIC | 7 WL | 95368 |
NIR | Powder Refl | pBDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,7) | 88434 | | | |

NIR | Powder Refl | pC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 factors | 72957 | AutoSelect | 5 factors | 98934 |
NIR | Powder Refl | pC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 9 PC | 92531 | AutoSelect | 5 PC | 101740 |
NIR | Powder Refl | pC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 77726 | BIC | 12 WL | 69778 |
NIR | Powder Refl | pC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,5) | 69778 | | | |

NIR | Powder Refl | pDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 factors | 57407 | AutoSelect | 5 factors | 74667 |
NIR | Powder Refl | pDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 PC | 72143 | AutoSelect | 5 PC | 76142 |
NIR | Powder Refl | pDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 60847 | BIC | 8 WL | 60847 |
NIR | Powder Refl | pDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,8) | 55898 | | | |

NIR | Powder Refl | Moisture | Unscrambler | PLS-NIPALS | 1550-1950 nm | 1 nm | Bestr2pred | 11 factors | 25942 | AutoSelect | 4 factors | 48248 |
NIR | Powder Refl | Moisture | Unscrambler | PCR-NIPALS | 1550-1950 nm | 1 nm | Bestr2pred | 9 PC | 31952 | AutoSelect | 4 PC | 49867 |
NIR | Powder Refl | Moisture | PeakLab | FPGLM | 1550-1950 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 25783 | BIC | 6 WL | 26960 |
NIR | Powder Refl | Moisture | PeakLab | SmartStep | 1550-1950 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,7) | 24547 | | | |
In this instance, in the green columns, where the Unscrambler and PeakLab BIC optimizations are shown, the PeakLab full-permutation GLM and smart stepwise models again outperform the best and optimized PLS and PCR selections. For the best r² of prediction comparisons in blue, the handheld NIR spectra's weaker resolution and S/N are reflected in very high factor counts in the PLS algorithms and higher predictor counts in the PeakLab fits. We can assume that the PLS and PCR models, as well as the direct spectral models, are likely fitting some measure of noise at these high factor and predictor counts. In general, however, a direct spectral fit is not as prone to overfitting, as evidenced by the BIC's optimizing to 12 predictors in two of the PeakLab model fits. In a direct spectral fit, every WL in the model must be statistically significant, and is thus less likely to represent the fitting of noise.
Confirmation
If you wish to confirm these results in your software, bear in mind that PLS and PCR are iterative, and the data fitted on each iteration will change, though not necessarily identically across applications, as shown by the Systat and Unscrambler results in the first two lines of this last table. Also, certain correlation and principal component algorithms deal with edge (data boundary) effects differently. You should not expect to see identical PLS or PCR model fits across different software.
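To illustrate why iterative procedures can differ slightly between packages, here is a minimal sketch of a NIPALS-style iteration for the first principal component, followed by the deflation step that changes the data seen by the next component. It is not the Systat or Unscrambler implementation; the starting vector, tolerance, and function names are assumptions, and these are the kinds of choices that may vary between applications.

```python
import math

def nipals_first_component(X, tol=1e-12, max_iter=500):
    """Return (scores t, loadings p) for the first PC of column-centered X.

    Assumes the first column of X is not all zeros (it seeds the scores).
    """
    n, m = len(X), len(X[0])
    t = [row[0] for row in X]  # assumed start: first column of X
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: project the columns of X onto the current scores.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project the rows of X onto the normalized loadings.
        t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol * sum(v * v for v in t):
            break
    return t, p

def deflate(X, t, p):
    """Remove the fitted component; the next component is fitted to this residual."""
    return [[X[i][j] - t[i] * p[j] for j in range(len(p))] for i in range(len(X))]

# Tiny column-centered example with all variation along the direction (2, 1).
X = [[-2.0, -1.0], [0.0, 0.0], [2.0, 1.0]]
t, p = nipals_first_component(X)
residual = deflate(X, t, p)
print(p, residual)
```

Because each later component is fitted to the deflated residual, small differences in convergence tolerance or starting vector in one component can carry forward into the components that follow.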
The fits that you generate with the FPGLM and SmartStep models in PeakLab can be replicated in any multivariate (GLM) software using the identified WLs in the selected model as the x-predictors. An example is shown in the last section of the Modeling Spectra - Part II - FTNIR Data white paper. To see the exact results shown in these tables, you will need to specify the Leave One Out estimate of prediction, and you will need to convert the r² of prediction to a ppm prediction error (1e6 * (1 - r² of prediction)).
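The replication described above can be sketched as follows: fit an ordinary least-squares model on the values at the selected WLs, refit with each sample left out to obtain the Leave One Out predictions, and convert the resulting r² of prediction to ppm. This is a minimal pure-Python sketch, not PeakLab's code; the function names, the PRESS-based definition of the r² of prediction (using the overall mean of y), and the small synthetic data set are assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(XtX, Xty)

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def loo_pred_err_ppm(X, y):
    """Leave One Out prediction error as 1e6 * (1 - r2 of prediction)."""
    press = 0.0
    for i in range(len(y)):
        # Refit without sample i, then predict the held-out sample.
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    r2_pred = 1.0 - press / ss_tot
    return 1e6 * (1.0 - r2_pred)

# Synthetic stand-in data: rows are samples, columns are values at two
# hypothetical selected WLs; y is a reference value with small added error.
X = [[0.10, 0.80], [0.25, 0.60], [0.40, 0.75], [0.55, 0.30],
     [0.70, 0.50], [0.85, 0.20], [0.30, 0.40], [0.60, 0.90]]
noise = [0.02, -0.01, 0.015, -0.02, 0.01, -0.015, 0.005, -0.005]
y = [1.0 + 3.0 * a - 2.0 * b + e for (a, b), e in zip(X, noise)]
print(round(loo_pred_err_ppm(X, y)))
```

With real data, X would hold the spectral values at the WLs identified in the selected PeakLab model and y the reference measurements. Other software may define the r² of prediction slightly differently, so small discrepancies from the tabulated values are possible.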
The data used for these white papers and the above certifications are available in the PeakLab Data folder.