PeakLab v1 Documentation Contents            AIST Software Home            AIST Software Support

Whittaker Baselines

Whittaker-Based Baseline Estimation

Whittaker smoothing is the modern "gold standard" for baseline correction in high-resolution chromatography and spectroscopy (DAD, FID, Mass Spec). Unlike morphological filters (Rolling Ball, Convex Hull) which treat the signal as a physical shape, Whittaker methods treat the baseline as a penalized least squares problem.

The Mathematical Foundation

The Whittaker algorithm seeks to find a baseline that minimizes a cost function balancing two competing goals:

Fidelity

How closely the baseline follows the original data.

Smoothness

How much the baseline resists rapid changes.

This is expressed via second-order finite differences:

Eqn_Whittaker.png

Where lambda is the smoothing parameter. In PeakLab, we utilize reweighted Whittaker algorithms, which apply a weight vector w to the fidelity term. This allows the algorithm to "ignore" positive-going peaks while staying pinned to the noise floor. These are iterative fitting algorithms.
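Concretely, the minimized quantity is S = Σᵢ wᵢ(yᵢ − zᵢ)² + λ Σᵢ (Δ²zᵢ)², which leads to a banded linear system. As a rough sketch of the idea (illustrative Python with SciPy's sparse solvers, not PeakLab's internal code; the function name and defaults are assumptions):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_smooth(y, w, lam=1e5):
    """Minimize sum_i w_i*(y_i - z_i)^2 + lam * sum_i (second diff of z)^2.

    y   : signal values
    w   : fidelity weights (0 = ignore point, 1 = pin to data)
    lam : smoothing parameter (higher = stiffer baseline)
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Second-order finite-difference operator, shape (n-2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    # Normal equations (W + lam * D'D) z = W y form a pentadiagonal system
    A = sparse.diags(w) + lam * (D.T @ D)
    return spsolve(A.tocsc(), w * y)
```

With w set to all ones this is plain Whittaker smoothing; the reweighted variants iterate, shrinking w wherever the data rise above the current baseline estimate.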

Strengths and Applications

Noise Centering

Ideal for detectors where noise is centered around the baseline (e.g., UV-Vis, DAD).

Numerical Stability

Each iteration reduces to a pentadiagonal (banded) linear system, which can be solved quickly and stably even for very large datasets.

Flexibility

Different weighting schemes (arPLS, asPLS, drPLS) allow for varying levels of peak suppression and curvature adaptation.

Comparison: When to Use Whittaker vs. Morphological

Feature     | Whittaker (asPLS, arPLS, etc.)                 | Morphological (Rolling Ball, Convex Hull)
------------|------------------------------------------------|------------------------------------------
Noise Type  | Gaussian / centered noise                      | "Count"-type / Poisson / unipolar noise
Peak Shape  | Best for varying widths and overlapping peaks  | Best for isolated, well-defined peaks
Curvature   | Adapts to complex, shifting baselines          | Can "dip" into wide peaks if the ball radius is too small

Core References
 

     Whittaker, E. T. (1922). On a new method of graduation. Proceedings of the Edinburgh Mathematical Society. (The original foundation).

     Eilers, P. H. C. (2003). A Perfect Smoother. Analytical Chemistry. (The definitive modern adaptation for analytical chemistry).

     Baek, S.-J., et al. (2015). Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst. (The "arPLS" standard).

Whittaker Parameter Tuning (lambda and p)

Most Whittaker-based routines (asls, arpls, airpls, etc.) rely on two primary variables that define the "stiffness" and "directionality" of the baseline.

Smoothness (lambda)

     Physical Meaning: The "Tension" of the baseline.

     High lambda: The baseline behaves like a rigid steel rod; it remains a straight line and ignores all fluctuations.

     Low lambda: The baseline behaves like a flexible string; it follows every curve and noise wiggle.

Asymmetry (p)

     Physical Meaning: The "Directional Bias".

     Definition: In algorithms like asls, p is the weight assigned to positive residuals (peaks) and 1-p is assigned to negative residuals (noise/valleys).

     The Value: Usually set very small (e.g., 0.001 to 0.01).

     Mechanism: By setting p low, the algorithm "penalizes" the baseline for going into a peak but allows it to stay centered within the noise floor.

     Note on 'arpls/iarpls': These use a self-tuning weighting function instead of a fixed p, but they still use lambda to define the overall stiffness.
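The asymmetry mechanism above can be made concrete with a minimal sketch of the classic asls iteration (after Eilers and Boelens). This is illustrative Python, not PeakLab's implementation; the function name and defaults are assumptions:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def asls_baseline(y, lam=1e6, p=0.01, n_iter=10):
    """Asymmetric least squares: points above the current baseline
    (residual > 0, i.e. peaks) get weight p; points below it
    (noise and valleys) get weight 1 - p."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)           # stiffness term, fixed across iterations
    w = np.ones(n)
    for _ in range(n_iter):
        z = spsolve((sparse.diags(w) + penalty).tocsc(), w * y)
        w = np.where(y > z, p, 1 - p)   # directional bias: suppress peaks
    return z
```

With p = 0.01 the baseline is penalized 99 times harder for rising into a peak than for sitting above a noise trough, which is what keeps it centered in the noise floor.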

Whittaker Algorithms

arpls - Asymmetrically Reweighted Penalized Least Squares

Iteratively suppresses positive residuals (peaks) more than negative ones to estimate a smooth baseline without being biased by signal peaks.

lsrpls - Locally Symmetric Reweighted Penalized Least Squares

Applies symmetric weighting to penalized least squares, improving baseline fit in symmetric peak regions.

brpls - Bayesian Reweighted Penalized Least Squares

Uses Bayesian modeling to estimate peak proportions and reweight the baseline fit.

drpls - Doubly Reweighted Penalized Least Squares

Applies two layers of reweighting to better suppress peaks and noise during baseline estimation.

aspls - Adaptive Smoothness Penalized Least Squares

Dynamically adjusts the smoothing parameter across the signal to better handle variable baseline curvature.

iarpls - Improved Asymmetrically Reweighted Penalized Least Squares

Enhances arpls by refining the weighting scheme for better convergence and peak suppression.

psalsa - Peaked Signal's Asymmetric Least Squares Algorithm

Like asls but uses exponential decay weighting for values above the baseline, allowing better handling of noisy, peak-heavy data.

derpsalsa - Derivative Peak-Screening Asymmetric Least Squares Algorithm

Enhances psalsa by screening peaks using smoothed first and second derivatives before reweighting.

asls - Asymmetric Least Squares Smoothing

Classic baseline method that penalizes positive residuals more than negative ones to suppress peaks.

iasls - Improved Asymmetric Least Squares Smoothing

Extends asls by incorporating both first and second derivatives of residuals.

airpls - Adaptive Iteratively Reweighted Penalized Least Squares

Iteratively updates weights based on residuals, adaptively suppressing peaks and improving convergence.
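The self-tuning weighting that distinguishes arpls from asls can be sketched as follows: instead of a fixed p, the weights come from a logistic function of the residuals, scaled by the mean and standard deviation of the negative residuals (following Baek et al., 2015). Again, this is illustrative Python, not PeakLab's internal code:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def arpls_baseline(y, lam=1e6, n_iter=20, tol=1e-3):
    """arpls sketch: weights adapt each iteration from the statistics
    of the negative residuals, so no fixed asymmetry p is required."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        z = spsolve((sparse.diags(w) + penalty).tocsc(), w * y)
        d = y - z
        neg = d[d < 0]
        if neg.size < 2 or neg.std() == 0.0:
            break
        m, s = neg.mean(), neg.std()
        # Logistic weighting: residuals far above the noise level -> weight ~ 0
        w_new = 1.0 / (1.0 + np.exp(np.clip(2.0 * (d - (2.0 * s - m)) / s, -500, 500)))
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            w = w_new
            break
        w = w_new
    return z
```

Because the weight curve is recomputed from the residual statistics each pass, the same code adapts to different noise levels without the user choosing p.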

 

Whittaker Algorithm Optimization to Match Human Designed Baselines

PyBaselineOptimization.png

The best way to determine which algorithm to use is to enlist the human element. In the Baseline procedure, you will see an Optimize button that opens the above dialog. We recommend the SD Variation option for the Baseline Detection and the Non-Parm Linear option for the Model.

To save time, adjust the automatic settings to get the baseline reasonably close, and then use the mouse to highlight and unhighlight the baseline and peak regions until, to your eye, you have what you perceive as the perfect human-designed baseline. You will probably want to set the non-parametric points to the minimum of 3 if you have only very small baseline-resolved zones to work with.

BaselineOptirmize1.png

Once you have created the optimum human-designed baseline as in the example above, it is recommended that you save your manual baseline for future use in similar data sets, or as a starting point for future optimizations.

Do not make any changes in the dialog settings. If you do, the automated algorithms will re-estimate or re-fit the baseline points. Because of the effort required to create a human baseline, it is recommended that you save the baseline after creating it. Right-click the graph of the completed human baseline for these save options:

Save this Baseline/Non-Baseline State

Use this option to save an ASCII CSV containing this baseline state information. It can subsequently be imported to recreate the zones in time that you have specified as baseline in any data set, irrespective of its x range or sampling rate.

Import this Data Set's Baseline/Non-Baseline State
Import All Data Sets' Baseline/Non-Baseline State

These options import the saved baseline state information for the current data, or for all data sets currently loaded.

Save this Baseline as an XY File

You can also save the actual baseline curve as an XY file. This will save the actual fitted baseline (the white line in the sample) at the x values in the data set.

Import this Data Set's Baseline from XY File

This option will import the XY fitted baseline and apply it to the current data set.
 
After optionally saving the human target on which the optimizations are to train, you are ready to optimize. You must leave the algorithm set to the non-parametric model, or to whatever model was used to construct the human baseline.

Once you have the best human-engineered baseline you can devise, click the Optimize button. Select the algorithms you wish to train to match your human baseline. Click OK to initiate a set of genetic-algorithm (differential evolution) optimizations in which the parameter(s) of each Whittaker algorithm are adjusted to match your human-designed baseline as closely as possible. PeakLab launches an embedded Python procedure to perform these training optimizations.

Note that you are training these algorithms to process just this specific type of separation or spectral analysis. Baselines with an entirely different rate of change, or data with significantly different S/N, sampling rate, or peak widths, will require separate optimizations.
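In simplified form, the training step can be pictured as a differential-evolution search over each algorithm's parameter(s). The sketch below (illustrative Python using SciPy, not PeakLab's actual procedure; all names are assumptions) tunes log10(lambda) of a weighted Whittaker fit, pinned to the zones the user marked as baseline, to minimize the RMS error against the human-designed baseline:

```python
import numpy as np
from scipy import sparse
from scipy.optimize import differential_evolution
from scipy.sparse.linalg import spsolve

def whittaker_fit(y, w, lam):
    # Weighted Whittaker fit: w = 1 in user-marked baseline zones, 0 in peak zones
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    return spsolve((sparse.diags(w) + lam * (D.T @ D)).tocsc(), w * y)

def train_lambda(y, human_baseline, baseline_mask, log_bounds=(1.0, 9.0)):
    """Search log10(lambda) so the fitted baseline best matches the human target."""
    w = baseline_mask.astype(float)
    def cost(v):
        z = whittaker_fit(y, w, 10.0 ** v[0])
        return np.sqrt(np.mean((z - human_baseline) ** 2))  # RMSE vs human target
    res = differential_evolution(cost, [log_bounds], seed=0, maxiter=25, tol=1e-6)
    return 10.0 ** res.x[0]
```

Searching in log10(lambda) rather than lambda itself is the usual design choice here, since useful lambda values span many orders of magnitude.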

When the optimizations are complete, you will see a summary similar to the following:

PyBaselineOptimizationSummary.png
 

In this example, the arpls, lsrpls, and drpls algorithms outperformed the others in matching the human-designed baseline. When you click OK, the parameter(s) for each algorithm will be automatically updated in the dialog so that you can view the optimizations.

Choose the Partial option each time it appears so that the human-designed baseline is never updated, if you wish to keep it displayed. The Whittaker baseline algorithms override your selected points, as each determines its own set of peak and baseline points in the data. Your own specified points, or those automatically determined by the selected method, are not used, even though they are shown on the plot as a reference.

BaselineOptirmize2.png

If the arpls algorithm is chosen as the Model after this optimization, the optimized arpls lambda will be used, generating the white baseline above. You can then select each of the different baselines if you wish to see which one best manages your specific data.

 

BEADS and XPS Baseline Algorithms

The BEADS and XPS baseline algorithms are non-Whittaker algorithms that are also offered in the Baseline option. Be wary of optimizing the BEADS algorithm alongside the Whittaker algorithms. BEADS is notoriously hard to tune manually, so the optimization may be of appreciable value, but it is a computationally expensive iterative algorithm and the GA must optimize five different parameters. For large data sets, it can be excruciatingly slow.