PeakLab v1 Documentation Contents            AIST Software Home            AIST Software Support

Comparison of PLS, PCR, and Direct Spectral Fitting


Comparison of PLS, PCR Fitting

In our certification of PeakLab's modeling algorithms, we used the data from the following white papers:

Modeling Spectra - Part I - UV-VIS Data

Modeling Spectra - Part II - FTNIR Data

Modeling Spectra - Part III - Field Site NIR Data

 

UV-VIS Liquid Transmittance Spectral Modeling

In this first series of evaluations, the model data matrix used in the Modeling Spectra - Part I - UV-VIS Data white paper was fitted with PeakLab's direct spectral modeling. The results were compared with Systat's PLS (Partial Least Squares, NIPALS) procedure and with Unscrambler's PLS and PCR (Principal Component Regression) procedures. The Leave One Out prediction error criterion was used in all cases. The prediction error is reported in ppm (parts per million) of the normalized statistical error, computed as 1e6 * (1 - r² of prediction); the higher the number, the greater the prediction error. The results occupy the last six columns of each table. The first three (Type = Bestr2pred, in blue) show the best r² of prediction, irrespective of any overfitting that may be occurring. The second three (in green) show Unscrambler's AutoSelect choice of factor or principal component count and PeakLab's BIC (Bayesian Information Criterion) optimization of predictor count. For the PeakLab FPGLM models, the prediction errors are an average over the ten best retained models of that specific wavelength count; the single best model will have a slightly lower error than the average shown in the tables. For the Sparse PLS (PartialPLS) and Smart Stepwise (SmartStep) models, the prediction error of the best performing model is shown.
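The ppm error used throughout the tables is a direct transform of the Leave One Out r² of prediction. The following minimal Python sketch shows the conversion in both directions and checks it against the 6083 ppm Unscrambler pTotal entry. The function names are illustrative only and are not part of PeakLab.

```python
def ppm_prediction_error(r2_pred: float) -> float:
    """Convert an r-squared of prediction to the ppm error used in the tables."""
    return 1e6 * (1.0 - r2_pred)


def r2_pred_from_ppm(ppm: float) -> float:
    """Invert the conversion: recover the r-squared of prediction from a ppm error."""
    return 1.0 - ppm / 1e6


# Unscrambler PLS, pTotal, best r2 of prediction: 6083 ppm
r2 = r2_pred_from_ppm(6083)
print(round(r2, 6))                       # 0.993917
print(round(ppm_prediction_error(r2)))    # 6083
```

On this scale, 10,000 ppm corresponds to an r² of prediction of 0.99, so differences of a few thousand ppm are substantial.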

| Data | Spectra | Variable | Software | Algorithm | Range | Sampling | Type | p/Fac/PC | PredErr (ppm) | Type | p/Fac/PC | PredErr (ppm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UVVIS | Liquid Trans | pTotal | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 4 factors | 6083 | AutoSelect | 1 factor | 7897 |
| UVVIS | Liquid Trans | pTotal | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 3 factors | 6257 | | | |
| UVVIS | Liquid Trans | pTotal | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 5 PC | 6104 | AutoSelect | 1 PC | 7907 |
| UVVIS | Liquid Trans | pTotal | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 4082 | BIC | 6 WL | 4408 |
| UVVIS | Liquid Trans | pTotal | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 15 WL (15,43,8) | 5006 | | | |
| UVVIS | Liquid Trans | pTotal | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 15 WL (15,18,8) | 4164 | | | |
| UVVIS | Liquid Trans | pTotal | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,8) | 3857 | | | |
| UVVIS | Liquid Trans | pBDMC | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 factors | 46529 | AutoSelect | 3 factors | 46529 |
| UVVIS | Liquid Trans | pBDMC | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 4 factors | 53843 | | | |
| UVVIS | Liquid Trans | pBDMC | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 PC | 46481 | AutoSelect | 3 PC | 46481 |
| UVVIS | Liquid Trans | pBDMC | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 7 WL | 31698 | BIC | 4 WL | 34381 |
| UVVIS | Liquid Trans | pBDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 14 WL (14,43,4) | 43485 | | | |
| UVVIS | Liquid Trans | pBDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 11 WL (11,4,7) | 30908 | | | |
| UVVIS | Liquid Trans | pBDMC | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,6) | 25906 | | | |
| UVVIS | Liquid Trans | pC | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 factors | 9083 | AutoSelect | 3 factors | 9083 |
| UVVIS | Liquid Trans | pC | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 5 factors | 8302 | | | |
| UVVIS | Liquid Trans | pC | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 5 PC | 7911 | AutoSelect | 3 PC | 9225 |
| UVVIS | Liquid Trans | pC | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 5401 | BIC | 3 WL | 6617 |
| UVVIS | Liquid Trans | pC | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 12 WL (12,24,8) | 6622 | | | |
| UVVIS | Liquid Trans | pC | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,10,8) | 5549 | | | |
| UVVIS | Liquid Trans | pC | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,8) | 5003 | | | |
| UVVIS | Liquid Trans | pDMC | Unscrambler | PLS-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 factors | 35936 | AutoSelect | 1 factor | 37154 |
| UVVIS | Liquid Trans | pDMC | Systat | PLS-NIPALS | 320-480 nm | 5 nm | Bestr2pred | 3 factors | 35819 | | | |
| UVVIS | Liquid Trans | pDMC | Unscrambler | PCR-NIPALS | 320-480 nm | 1 nm | Bestr2pred | 3 PC | 35965 | AutoSelect | 1 PC | 37155 |
| UVVIS | Liquid Trans | pDMC | PeakLab | FPGLM | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 25959 | BIC | 3 WL | 29967 |
| UVVIS | Liquid Trans | pDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm | Bestr2pred | 9 WL (9,32,3) | 32997 | | | |
| UVVIS | Liquid Trans | pDMC | PeakLab | PartialPLS | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,8,6) | 26906 | | | |
| UVVIS | Liquid Trans | pDMC | PeakLab | SmartStep | 320-480 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 23629 | | | |

As described in the Modeling Spectra - Part I - UV-VIS Data white paper, we fit four different entities: a total curcuminoid percentage (pTotal), a bisdemethoxycurcumin percentage (pBDMC), a curcumin percentage (pC), and a demethoxycurcumin percentage (pDMC). The Systat PLS fits were done with a 5 nm sampling interval and can be compared with the 5 nm Sparse PLS (PartialPLS) entries from PeakLab. The Unscrambler PLS and PCR models were run with a 1 nm interval and can be compared with the 5 nm to 1 nm PeakLab models. All fitting used the same upper UV and lower visible wavelength band (320-480 nm). For these UV-VIS liquid transmittance spectra, the PeakLab FPGLM (full permutation GLM), Sparse PLS, and Smart Stepwise models all outperformed the PLS and PCR models. The prediction estimates in blue (Bestr2pred) are useful for comparison because they show the best prediction realized in the modeling. That best result is not necessarily the model you would use in production. The green columns show the Unscrambler AutoSelect choice of factors and PeakLab's BIC optimization of predictor count.
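To make the full permutation idea concrete, the following Python sketch is a brute-force illustration. It scores every wavelength subset of a fixed size and keeps the ten best models. It then averages their ppm errors, in the way the FPGLM figures above are described.

The sketch rests on several assumptions:
- Each subset is fitted by OLS with an intercept.
- Each subset is scored by the Leave One Out PRESS, using the hat-matrix shortcut.
- The r² of prediction is taken as 1 - PRESS/SST.

PeakLab's actual FPGLM search, its candidate grid, and its "5 nm to 1 nm" refinement are not reproduced. The function names and the synthetic data are hypothetical.

```python
import heapq
from itertools import combinations

import numpy as np


def loo_press(X, y):
    """Leave-one-out PRESS for an OLS fit with intercept, via the hat-matrix shortcut."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                  # hat matrix A (A'A)^-1 A'
    resid = y - H @ y                          # ordinary fitted residuals
    loo_resid = resid / (1.0 - np.diag(H))     # residual when sample i is left out
    return float(np.sum(loo_resid ** 2))


def full_permutation_search(spectra, y, n_wl, keep=10):
    """Score every n_wl-wavelength subset by LOO ppm error; return the `keep` best."""
    sst = float(np.sum((y - y.mean()) ** 2))
    heap = []                                  # holds (-ppm, cols); root = worst retained model
    for cols in combinations(range(spectra.shape[1]), n_wl):
        r2_pred = 1.0 - loo_press(spectra[:, list(cols)], y) / sst
        item = (-1e6 * (1.0 - r2_pred), cols)
        if len(heap) < keep:
            heapq.heappush(heap, item)
        elif item > heap[0]:                   # better than the worst retained model
            heapq.heapreplace(heap, item)
    return sorted((-neg_ppm, cols) for neg_ppm, cols in heap)


# Synthetic stand-in: 30 samples x 12 "wavelengths", response driven by columns 3 and 7.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(30, 12))
y = 2.0 * spectra[:, 3] - 1.5 * spectra[:, 7] + rng.normal(scale=0.1, size=30)

top = full_permutation_search(spectra, y, n_wl=2)
best_ppm, best_cols = top[0]                          # best subset here is columns (3, 7)
avg_ppm = float(np.mean([ppm for ppm, _ in top]))     # average over the 10 retained models
```

The number of subsets grows combinatorially with the candidate count, so this exhaustive form is practical only for small grids. That constraint is one plausible reason to start from a coarse sampling interval before refining.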

FTNIR Powder Reflection Spectral Modeling

In this second series of evaluations, the model data matrix used in the Modeling Spectra - Part II - FTNIR Data white paper was fitted with PeakLab's direct spectral modeling. It was compared, as before, with Systat's PLS (NIPALS) procedure and with Unscrambler's PLS and PCR procedures. Again, the Leave One Out prediction error criterion was used in all cases, and the prediction error is reported in ppm of the normalized statistical error. Here we are modeling powder reflectance spectra. The prediction errors for the total curcuminoids are appreciably higher, for the reasons covered in the white paper.

| Data | Spectra | Variable | Software | Algorithm | Range | Sampling | Type | p/Fac/PC | PredErr (ppm) | Type | p/Fac/PC | PredErr (ppm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FTNIR | Powder Refl | pTotal | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 factors | 70606 | AutoSelect | 6 factors | 75189 |
| FTNIR | Powder Refl | pTotal | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 7 factors | 71767 | | | |
| FTNIR | Powder Refl | pTotal | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 71946 | AutoSelect | 5 PC | 94001 |
| FTNIR | Powder Refl | pTotal | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 57486 | BIC | 8 WL | 57486 |
| FTNIR | Powder Refl | pTotal | PeakLab | PartialPLS | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 15 WL (15,18,8) | 57953 | | | |
| FTNIR | Powder Refl | pTotal | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,7) | 54855 | | | |
| FTNIR | Powder Refl | pBDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 9 factors | 116321 | AutoSelect | 5 factors | 154974 |
| FTNIR | Powder Refl | pBDMC | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 9 factors | 118473 | | | |
| FTNIR | Powder Refl | pBDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 PC | 114036 | AutoSelect | 7 PC | 114036 |
| FTNIR | Powder Refl | pBDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 91558 | BIC | 6 WL | 98931 |
| FTNIR | Powder Refl | pBDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 87438 | | | |
| FTNIR | Powder Refl | pC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 9 factors | 91980 | AutoSelect | 6 factors | 114036 |
| FTNIR | Powder Refl | pC | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 9 factors | 89904 | | | |
| FTNIR | Powder Refl | pC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 116734 | AutoSelect | 5 PC | 135127 |
| FTNIR | Powder Refl | pC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 66941 | BIC | 6 WL | 68935 |
| FTNIR | Powder Refl | pC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 65160 | | | |
| FTNIR | Powder Refl | pDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 factors | 94929 | AutoSelect | 5 factors | 104226 |
| FTNIR | Powder Refl | pDMC | Systat | PLS-NIPALS | 1650-1850 nm | 2 nm | Bestr2pred | 7 factors | 96364 | | | |
| FTNIR | Powder Refl | pDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 7 PC | 97352 | AutoSelect | 6 PC | 99629 |
| FTNIR | Powder Refl | pDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 80180 | BIC | 7 WL | 81403 |
| FTNIR | Powder Refl | pDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 9 WL (9,8) | 78141 | | | |

These are especially challenging fits, which is why the optimized factor, principal component, and predictor (WL) counts are much higher. Note that PeakLab's PartialPLS models require matched wavelengths for the averaging, and weaker fits will not isolate the WLs as accurately. This is why no PartialPLS models are shown for the three individual curcuminoid models (pBDMC, pC, and pDMC). As with the UV-VIS liquid spectra, the FTNIR powder modeling again shows that PeakLab's full-permutation GLM and smart stepwise models outperform both the best and the optimized PLS and PCR selections.

Handheld NIR Powder Reflection Spectral Modeling

In this third series of evaluations, the model data matrix used in the Modeling Spectra - Part III - Field Site NIR Data white paper was fitted with PeakLab's direct spectral modeling. It was compared, as before, with Systat's PLS (NIPALS) procedure and with Unscrambler's PLS and PCR procedures, and the Leave One Out prediction error criterion was once again used. These near infrared spectra have lower resolution and a lower signal-to-noise ratio than the FTNIR data. To this data set we have added moisture modeling in the 1550-1950 nm wavelength band.

| Data | Spectra | Variable | Software | Algorithm | Range | Sampling | Type | p/Fac/PC | PredErr (ppm) | Type | p/Fac/PC | PredErr (ppm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NIR | Powder Refl | pTotal | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 11 factors | 47176 | AutoSelect | 5 factors | 65539 |
| NIR | Powder Refl | pTotal | Systat | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 11 factors | 48317 | | | |
| NIR | Powder Refl | pTotal | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 62552 | AutoSelect | 5 PC | 65539 |
| NIR | Powder Refl | pTotal | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 51441 | BIC | 12 WL | 46334 |
| NIR | Powder Refl | pTotal | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,7) | 46334 | | | |
| NIR | Powder Refl | pBDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 factors | 92485 | AutoSelect | 5 factors | 104880 |
| NIR | Powder Refl | pBDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 8 PC | 102818 | AutoSelect | 5 PC | 105630 |
| NIR | Powder Refl | pBDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 93580 | BIC | 7 WL | 95368 |
| NIR | Powder Refl | pBDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,7) | 88434 | | | |
| NIR | Powder Refl | pC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 factors | 72957 | AutoSelect | 5 factors | 98934 |
| NIR | Powder Refl | pC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 9 PC | 92531 | AutoSelect | 5 PC | 101740 |
| NIR | Powder Refl | pC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 77726 | BIC | 12 WL | 69778 |
| NIR | Powder Refl | pC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,5) | 69778 | | | |
| NIR | Powder Refl | pDMC | Unscrambler | PLS-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 factors | 57407 | AutoSelect | 5 factors | 74667 |
| NIR | Powder Refl | pDMC | Unscrambler | PCR-NIPALS | 1650-1850 nm | 1 nm | Bestr2pred | 10 PC | 72143 | AutoSelect | 5 PC | 76142 |
| NIR | Powder Refl | pDMC | PeakLab | FPGLM | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 60847 | BIC | 8 WL | 60847 |
| NIR | Powder Refl | pDMC | PeakLab | SmartStep | 1650-1850 nm | 5 nm to 1 nm | Bestr2pred | 12 WL (12,8) | 55898 | | | |
| NIR | Powder Refl | Moisture | Unscrambler | PLS-NIPALS | 1550-1950 nm | 1 nm | Bestr2pred | 11 factors | 25942 | AutoSelect | 4 factors | 48248 |
| NIR | Powder Refl | Moisture | Unscrambler | PCR-NIPALS | 1550-1950 nm | 1 nm | Bestr2pred | 9 PC | 31952 | AutoSelect | 4 PC | 49867 |
| NIR | Powder Refl | Moisture | PeakLab | FPGLM | 1550-1950 nm | 5 nm to 1 nm | Bestr2pred | 8 WL | 25783 | BIC | 6 WL | 26960 |
| NIR | Powder Refl | Moisture | PeakLab | SmartStep | 1550-1950 nm | 5 nm to 1 nm | Bestr2pred | 10 WL (10,7) | 24547 | | | |

In this instance, the green columns, where the Unscrambler AutoSelect and PeakLab BIC optimizations are shown, again show the PeakLab full-permutation GLM and smart stepwise models outperforming the optimized PLS and PCR selections. For the best r² of prediction comparisons in blue, the handheld NIR spectra's weaker resolution and signal-to-noise ratio (S/N) show up in two ways. The PLS and PCR algorithms reach very high factor and principal component counts, and the PeakLab fits reach higher predictor counts. It is likely that the PLS and PCR models, as well as the direct spectral models, are fitting some measure of noise at these high factor and predictor counts. In general, however, a direct spectral fit is less prone to overfitting. Because the BIC penalizes every added predictor, its optimizing to 12 predictors in two of the PeakLab model fits (pTotal and pC) indicates that those additional WLs carry real information. In a direct spectral fit, every WL in the model must be statistically significant, and is thus less likely to represent the fitting of noise.
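Two ideas in the paragraph above can be shown together in a short Python sketch: BIC-based selection of predictor count, and the requirement that every WL be statistically significant. The sketch fits nested OLS models and computes the common Gaussian form of the BIC, n·ln(RSS/n) + k·ln(n). It also reports a t-statistic for each wavelength coefficient.

This is an assumption-laden illustration, not PeakLab's implementation. PeakLab's BIC variant may differ by constants. The candidate wavelength orderings and data are hypothetical. For p-values, `scipy.stats.t.sf` can be applied to the t-statistics.

```python
import numpy as np


def ols_bic_and_tstats(X, y):
    """OLS with intercept; return (BIC, t-statistics of the wavelength coefficients)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    k = A.shape[1]                                     # parameters, including intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = float(resid @ resid)
    bic = n * np.log(rss / n) + k * np.log(n)          # Gaussian BIC, additive constants dropped
    sigma2 = rss / (n - k)                             # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return bic, (beta / se)[1:]                        # skip the intercept's t-statistic


# Synthetic data: only columns 0 and 1 carry signal.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(40, 6))
y = 1.0 * spectra[:, 0] + 0.8 * spectra[:, 1] + rng.normal(scale=0.2, size=40)

# Nested candidate models of increasing predictor count (hypothetical WL orderings).
candidates = {1: [0], 2: [0, 1], 3: [0, 1, 2], 4: [0, 1, 2, 3]}
scores = {m: ols_bic_and_tstats(spectra[:, cols], y) for m, cols in candidates.items()}
best_count = min(scores, key=lambda m: scores[m][0])   # predictor count with the lowest BIC
```

Because each extra parameter costs ln(n) in the BIC, a predictor is kept only when it reduces the residual error by more than chance alone typically would.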

Confirmation

If you wish to confirm these results in your own software, bear in mind that PLS and PCR are iterative. The data fitted on each iteration change, and not necessarily identically across applications. The Systat and Unscrambler results in the first two rows of the last table show this: they use the same range, sampling, and factor count, yet differ in prediction error. In addition, some correlation and principal component algorithms handle edge (data boundary) effects differently. You should not expect identical PLS or PCR model fits across different software.
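To show what "iterative" means here, the following Python sketch is a textbook NIPALS PLS1 for a single response. After each component is extracted, the X and y data are deflated, so every iteration fits different residual data. This is a generic illustration and not the implementation used by Systat or Unscrambler. Those may differ in scaling, convergence handling, and edge treatment, which is exactly why their results diverge. The function names are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Textbook NIPALS PLS1 (one response). Returns coefficients B and the centering means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean        # working copies, deflated on every iteration
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)          # weight vector (unit length)
        t = Xk @ w                         # score vector
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflation: the next iteration fits the residual data
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)    # coefficients for the centered original X
    return B, x_mean, y_mean


def pls1_predict(X, B, x_mean, y_mean):
    """Predict responses from the centered X and the PLS coefficients."""
    return (X - x_mean) @ B + y_mean


rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.3, 2.0]) + rng.normal(scale=0.1, size=50)

B2, xm, ym = pls1_nipals(X, y, n_components=2)   # truncated fit with 2 factors
B5, _, _ = pls1_nipals(X, y, n_components=5)     # all factors: matches OLS on centered data
B_ols, *_ = np.linalg.lstsq(X - xm, y - ym, rcond=None)
```

With as many factors as predictors, PLS1 reproduces the ordinary least-squares solution; the factor count is what trades fit against overfitting.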

The fits you generate with the FPGLM and SmartStep models in PeakLab can be replicated in any multivariate (GLM) software by using the WLs identified in the selected model as the x-predictors. An example is shown in the last section of the Modeling Spectra - Part II - FTNIR Data white paper. To reproduce the exact results shown in these tables, you will need to specify the Leave One Out estimate of prediction. You will also need to convert the r² of prediction to a ppm prediction error, 1e6 * (1 - r² of prediction).
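The replication recipe above can be sketched in Python with NumPy alone. The sketch fits an OLS model on the selected WL columns, refits it with each sample left out in turn, and converts the resulting r² of prediction to ppm.

Several assumptions apply:
- The r² of prediction is taken as 1 - PRESS/SST; some packages instead report the squared correlation of LOO predictions with observed values.
- The wavelength grid, the "selected" WLs, and the data are hypothetical stand-ins.
- No PeakLab output is reproduced.

```python
import numpy as np


def loo_r2_pred(X, y):
    """Fit OLS with intercept and return the leave-one-out r-squared of prediction."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                          # drop sample i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ beta                            # predict the held-out sample
    press = np.sum((y - preds) ** 2)                      # prediction error sum of squares
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / sst


wavelengths = np.arange(1650, 1851)                       # 1 nm grid, 1650-1850 nm
rng = np.random.default_rng(2)
spectra = rng.normal(size=(25, wavelengths.size))         # synthetic stand-in for real spectra
selected_wl = [1672, 1710, 1795]                          # hypothetical WLs from a PeakLab model
cols = np.searchsorted(wavelengths, selected_wl)          # map WLs to column indices
y = spectra[:, cols] @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.05, size=25)

r2 = loo_r2_pred(spectra[:, cols], y)
ppm = 1e6 * (1.0 - r2)                                    # ppm prediction error, as in the tables
```

The explicit refit loop is slower than a hat-matrix shortcut, but it mirrors what a generic GLM package does when asked for a Leave One Out estimate.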

The data used for these white papers and the above certifications are available in the PeakLab Data folder.

 


