Model Discovery

PeakLab v1 Documentation Contents AIST Software Home AIST Software Support

Model Discovery

PeakLab Models - History

In PeakFit, PeakLab's precursor, there were just two principal chromatographic models that could fit a concentration series where tailed and fronted peaks became more so with increasing concentration. There were the Haarhoff-VanderLinde (HVL) diffusion model an the Wade-Thomas Nonlinear Chromatography (NLC) kinetic model. In PeakFit's long history, across three decades, we never found any other models to perform as well with real-world chromatographic peaks.

In those times. decades ago, PeakFit offered a few other models for chromatographic fitting, such as the EMG and an asymmetric generalized Gaussian better known as the 4-parameter Log-Normal. These models were referenced in the early literature for chromatographic modeling, and were primarily present for historical comparison. In general, the HVL and NLC were the models almost always used by serious researchers modeling chromatographic peaks.

Although the NLC typically performed slightly better in fitting most real-world chromatographic peaks, the HVL was usually the model of choice since it is much simpler than the NLC (having a closed-form solution), and its performance can be easily validated in most statistical and scientific modeling software. The HVL is also easier to understand since it reduces to a Gaussian at infinite dilution. The NLC model, on the other hand, contains modified Bessel functions and Bessel function integrals (there is no closed form solution), and validation is difficult or impossible in statistical software. Further, the NLC reduces to the Giddings kinetic model in the limit of infinite dilution which, unless one is modeling the kinetics of peak formation, is harder to understand (peaks with the same kinetic broadening value are actually wider with time, and a retention scale is required).

PeakLab Models - State of the Art Generalizations

In PeakLab, there are now over 200 chromatographic models available. Although many of these are experimental, it does mean that a model discovery process may be needed if you are doing complex chromatographic research.

Before we discuss the model discovery capabilities built into PeakLab, we will make a few important points that may limit your need to engage any form of model discovery.

First, nearly all of the advanced PeakLab chromatographic models are based on 'generalizing' the HVL or NLC models. The core HVL and NLC theory, which has worked for decades with a wide variety of analytic chromatographic shapes, remains intact. In selecting any one of the many different chromatographic models, you will likely be choosing an extension of either the HVL or NLC model.

In PeakLab, the core density of the HVL and NLC models can be 'once generalized', allowing third moment variations, or 'twice-generalized', allowing both third and fourth moment adjustments.

The HVL, with its core zero distortion density (ZDD), of a Gaussian, and the NLC, with its ZDD of a Giddings density, have fixed third (skewness) and fourth (kurtosis) moments.

In the real world, these Gaussian or Giddings theoretical infinite dilution densities will not be completely accurate. For example, the HVL's Gaussian core density may not be present as consequence of skewness arising from asymmetric particle/pore size distributions in media or from a less than ideal packing of the column. We also know injectors can alter this Gaussian assumption even when no column is present.

The Once-Generalized GenHVL

For PeakLab, the default analytic chromatography model is the GenHVL. Instead of a Gaussian or Giddings core density, the GenHVL uses a once-generalized density, the ZDD - Gen Default - Asymmetric Normal. This adds a third-moment adjustable asymmetry parameter to the core Gaussian density.

You can see the GenHVL as the 'go to' model for most analytic isocratic peaks. It may be the only peak model you will ever need. With the addition of this third moment adjustable parameter, the GenHVL becomes a universal model that can fit any NLC or HVL shape as well as a variety of many other real-world analytic chromatographic shapes.

In extending or generalizing the HVL and NLC models, the entire universe of continuous multivariate density functions in the statistical sciences are available, and perfectly valid for potentially modeling the nonideality in the peak shapes. PeakLab contains many different once-generalized densities.

ZDD - Gen Default

ZDD - [Z] The [Z] is identical to the default, except for fitting the mean/mode/median of the underlying Gaussian as the a₁ parameter.

ZDD - [Q] The [Q] density has only a 4th moment compression/dilation generalization using the Error density. There is no third moment adjustment, a locked symmetry.

ZDD - [E] The [E] produces a third moment adjustment by using an EMG for the core density. The additional parameter is an exponential time constant for the convolution.

ZDD - [G] The [G] produces a third moment adjustment by using a GMG (also the Skew Normal) for the core density. The additional parameter is the half-Gaussian SD for the convolution.

ZDD - [S] The [S] density also has only a 4th moment compression/dilation generalization using the Student's t (also the Pearson VII) density. There is no third moment adjustment, a locked symmetry.

ZDD - [V] The [V] is unique since it uses a generalized GMG, essentially offering two distinct third moment adjustments, essentially combining the [Z] and [G] skewness.

Apart from the defaults, the "[ZDD]" is used in the model nomenclature to specify a non-default density. In general, you will want to see the a₁ location of the center of mass of the IRF-removed peak. The [Z] density's a₁ is the center of the underlying Gaussian. The two 4^th moment only densities, [Q] and [S], the convolution densities [E] (Gauss Ä Exp) and [G] (Gauss Ä HalfGauss), and the unusual [V] combining of two different third moment adjustments, can be regarded as experimental.

In general, for analytic peaks, you can use the GenHVL for diffusion/statistical peak modeling, and the GenNLC for kinetic modeling. The two models fit identical shapes. The difference is only in the parameterization.

The Twice-Generalized Gen2HVL

For the non-isocratic gradient peak shapes, as well as preparative or overload shapes, as well as other fourth-moment non-idealities, such as saturating the signal at the detector, there is a default twice-generalized chromatography model, the Gen2HVL. This model uses a twice generalized density, the ZDD - Gen2 Default - Generalized Error. By the addition of both a third and fourth moment adjustable parameter, the Gen2HVL is a universal model that can fit virtually any chromatographic peak shape.

PeakLab contains a number of twice-generalized densities.

ZDD - Gen2 Default

ZDD - [Y] The [Y] is identical to the Gen2 default, except for fitting the mean/mode/median of the underlying Gaussian as the a₁ parameter.

ZDD - [Yp] The [Yp] is identical to the [Y] except for the starting parameters which estimate a preparative ('p') or overload state. (Overload Shapes Only)

ZDD - [YpE] The [YpE] is identical to the [Yp] except that the a₄ 4th moment power of decay is locked at 1.0 ('E', double sided exponential) for a full overload envelope (High Overload Shapes Only)

ZDD - [Yp2] The [Yp2] is identical to the [Yp] except that a separate left and right side width are fitted (Overload Shapes Only)

ZDD - [Yp2E] The [Yp2E] is identical to the [Yp2] except that the a₄ 4th moment power of decay is locked at 1.0 ('E', double sided exponential) for a full overload envelope (High Overload Shapes Only)

ZDD - [K] The [K] density is a ln generalized two-sided kinetic model formed from orders 0 and 1 (High Overload Shapes Only)

ZDD - [T] The [T] is an alternative to the twice generalized default. It uses a ln-generalized Student's t to offer both third and fourth moment adjustments.

ZDD - [W] The [W] is similar to the [Y], except the asymmetry uses a linear rather than a logarithmic transform.

The twice generalized default again assumes that you want to see the a₁ location as the center of mass of the IRF-removed peak. The [Y] density's a₁ is the center of the underlying Gaussian. The [Yp], [YpE], [Yp2], [Yp2E], and [K] densities are only used to fit the very difficult overloaded peak shapes. These are needed only for modeling preparative shapes. The [W] is an experimental density that remains in the program since it showed some promise with unusual gradient peaks.

The [T] is the only model that can be seen as an alternative to the twice generalized Gen2 default density for gauging 4th moment deviations in analytic peaks. The [T]'s fourth moment parameter is incapable of compression, and its dilation varies from that of a Gaussian (infinite) to a Lorentzian (1.0). Although this is probably the most widely referenced model you will find in the statistical sciences with both 3^rd and 4^th moment adjustments, this [T] generalized Student's t density should be deemed highly experimental in a chromatographic model. This generalized Student's t density (statistical functions) is likely much more useful as a separate model for fitting Voigt-like shapes in spectroscopy where there is an asymmetry in the spectral peaks.

For well designed gradient peaks where the gradient cancels most of the IRF, you can fit the Gen2HVL with no IRF. If you wish to directly fit the gradient with a deconvolution model, you should start with the GenHVL<*g*> model.

PeakLab Models - Adding an Instrument Response Function to a Peak

In PeakLab, instrumental distortions are not treated as a data item, but as a convolution distortion (Peak Ä IRF) to each individual peak. The allows a set of baseline resolved peaks to be fit with both a common instrument response function (IRF) which is identical for all peaks, or independently. In general, you will probably want to treat an IRF as constant across a chromatographic elution. This allows you to fit an estimate of the IRF that can be subsequently used in the IRF Deconvolution to remove the IRF prior to fitting using DSP methods.

A model follows a common nomenclature. "GenHVL[Z]<ge>" would describe a GenHVL[Z] peak (a generalized HVL using an alternative 'Z' core density) Ä 'ge' (IRF, area-weighted sum of g,e components). The item in the [], if such exists, is the density which is substituted in the common chromatographic distortion model. The item in the <>, if such exists, is the IRF which is convolved with the peak.

An IRF is thus an essential part of a convolution peak model. In PeakLab, for performance reasons, these convolution models are fit within the Fourier domain, and inverted when needed to display a time-domain plot of the fit in progress.

In our experience we have found only a few IRFs which very effectively model the instrumental distortions in chromatographic data. In most cases an IRF that sums a half-Gaussian probabilistic component and a first order kinetic exponential component, a "ge" IRF, as in the GenHVL<ge> model, will usually be a good starting point.

The other IRFs most likely to fit the instrumental distortions in chromatographic peaks are the <e2>, <g2>, and <e> convolutions, that of two exponentials, two half-Gaussians, and a single exponential. We have found a GenHVL<e2> at times more effective in short UHPLC columns, and a GenHVL<g2> useful for the Agilent DAD detectors. The single exponential IRF in the GenHVL<e> is mostly useful with low S/N data where a two component IRF fit is not possible.

If you wish to explore an experimental peak shape that is not in PeakLab's built-in set of models, it is usually impossible to be certain of its efficacy without also removing or fitting the IRF for that particular data. To make experimental modeling research as simple as possible, any user-defined peak you create will be automatically generated with the option of several of the more common IRF convolutions. Nothing additional is needed to fit a user-defined peak, for example, with a <ge> IRF.

If you are fitting gradient peaks, the added instrument response function can be a deconvolution instead of a convolution, or even a mixture of both. A GenHVL<*g*> ,model uses the 'g' half-Gaussian in a deconvolution (a Fourier filter is automatically applied to deal with the noise added by the deconvolution). In the PeakLab nomenclature <g> is a convolution and <*g*> is a deconvolution.

To estimate the strength of the gradient across a peak, a closed form Gen2HVL model can be fit, and the a₄ power of decay's compression (the power greater than 2.0) can serve as an estimate of gradient strength. Equally, the GenHVL<*g*> deconvolution model can be fit, and the g half-Gaussian SD will be a direct estimate of the gradient strength (how much unwinding of the gradient is needed to restore an isocratic shape). For gradient peaks, Gen2HVL and GenHVL<*g*> fits will probably get you a very sound estimate of the gradient present during a given peak's elution.

Model Discovery

If you are processing only isocratic analytic peaks, and you are fine with a statistical definition of broadening, you can probably fit just the GenHVL<ge>, GenHVL<e2>, the GenHVL<g2> and the GenHVL<e> models using a reference standard, ideally a single peak with a clean decay (no contaminants in the right-side tail). It is likely that one of these models will work quite well, and its IRF will be that which you can remove in the IRF Deconvolution procedure thereafter for all similar runs, and thereby enabling the fit of the closed form GenHVL model. It should not take a great deal of experience to learn the IRF and its parameters for a given instrument, prep, and analysis procedure. Once an IRF is removed from the data in this DSP step prior to fitting, the once-generalized GenHVL and the twice-generalized Gen2HVL closed form models should be all that is needed.

Primarily for serious peak research, PeakLab offers a way to explore all of the different built-in models. To use this feature, you will need to have an imported data set that contains just a single peak, or which has been sectioned to consist just one baseline resolved and baseline corrected peak. It is recommended that you use the Fit Local Maxima Peaks option since this Model Discovery option is only available for a single peak. For it to be effective, the peak should be a standard where you know there is nothing in the tail of the peak that would distort the IRF estimates, or the higher moment terms in the ZDD. You will need to be certain that just one peak is detected in order for this procedure to be available.

In any of the three peak fitting options, right click the data plot of the data where you wish to run a model discovery, and select the Search for Optimum Model, Single Fitted Peak... option from the popup menu.

The models are specified by families. In the above example, we fit all the basic chromatography models. all generalized HVL models, Gaussian and Generalized Gaussian convolutions, and statistical convolution models. If you have the Use IRF,ZDD Config option checked, be sure these estimates are applicable to your data. Alternatively, you can uncheck this box and the program will attempt to generate reasonable starting estimates for all of the parameters within the Model Discovery fitting. You will probably be happier if you input valid estimates for the IRF and ZDD parameters and check this Use IRF,ZDD Config option. This Model Discovery fitting does not contain the elaborate multistep fitting of the main program modeling, and fits can and will fail as a consequence of weak starting estimates. Simply click OK to begin the Model Discovery.

Gen2HVL<*g*,ge> 0.64 16702684 11.007954 8.3300154 0.0190666 -2.23e-05 2.0186899 0.0669175 0.0184749 0.0105103 0.0079351

Gen2HVL<*g*,e> 0.69 17741401 11.007416 8.3329299 0.0195108 -2.25e-05 2.0103501 0.0703667 0.0186912 0.0056848

Gen2HVL<*n*> 0.73 16898801 11.006735 8.3382231 0.0201963 -2.18e-05 2.0024786 0.0727323 0.0184294 2.0221317

Gen2HVL<*ge*> 0.73 14523078 11.006285 8.3327494 0.0195715 -2.29e-05 2.0086175 0.0720054 0.0158517 0.0011001 0.8231262

Gen2HVL<*g*> 0.74 19520680 11.006038 8.3380411 0.0201495 -2.2e-05 2.0027723 0.0725469 0.0181873

Gen2HVL<*k*> 0.76 16161598 11.007513 8.3354192 0.0196579 -2.2e-05 2.0092459 0.0687553 0.0193764 0.6927489

GenHVL<*n*> 0.82 17609663 11.007157 8.3385196 0.0202952 -2.2e-05 0.0749508 0.0188473 2.0351404

GenHVL<*g*> 0.87 20238473 11.005989 8.3382488 0.0202285 -2.24e-05 0.0750539 0.0185002

GenHVL<*g*,e> 0.87 16656951 11.005995 8.3382315 0.0202265 -2.24e-05 0.0750719 0.0184977 0.0005340

GenHVL<*l*> 0.87 16637441 11.006085 8.3382247 0.0202220 -2.25e-05 0.0750639 0.0435197 9.52e-08

GenHVL<*g*,ge> 0.87 14071215 11.005994 8.3381813 0.0202219 -2.25e-05 0.0751097 0.0184964 0.0013397 0.0002503

GenHVL<*s*> 0.97 14960895 11.004980 8.3384109 0.0202777 -2.22e-05 0.0755100 0.0186641 1e+06

GenHVL[V]<l> 1.67 7372959 11.013277 8.3021681 0.0118782 -2.15e-05 0.0243439 -0.069471 0.0003333 5.1916271

GenHVL<*k*> 1.74 8322576 11.008222 8.3358355 0.0198292 -2.38e-05 0.0763177 0.0204434 0.6886298

GenHVL[V]<e> 2.84 5097917 11.005765 8.2985490 0.0114388 -1.93e-05 0.0250160 -0.081034 0.0047951

GenHVL[V]<k> 3.38 3638999 11.000503 8.2997586 0.0117703 -2.27e-05 0.0233837 -0.060060 0.0098924 0.0107473

GenHVL[V]<g> 3.39 4279133 11.000496 8.2995096 0.0117279 -2.25e-05 0.0234460 -0.061231 0.0042574

GenHVL[V] 3.44 5123341 11.000751 8.3026564 0.0119599 -2.3e-05 0.0235213 -0.059914

GenHVL[Y]<n> 6.56 1873864 11.005370 8.3189100 0.0168998 -4.28e-06 2.1531649 -0.013140 0.0092920 3.0000000

Gen2HVL<n> 6.61 1859191 11.005439 8.3188527 0.0169106 -4.38e-06 2.1523171 -0.012727 0.0092025 3.0000000

GenHVL[Yp2]<g> 7.59 1618375 11.006999 8.3190216 0.0167589 -2.43e-06 1.0092095 2.1687166 -0.026370 0.0086518

GenHVL[Y]<g> 7.60 1909097 11.007033 8.3192134 0.0167527 -1.76e-06 2.1691607 -0.025074 0.0086645

Gen2HVL<g> 7.60 1909097 11.007033 8.3190037 0.0167527 -1.76e-06 2.1691527 -0.025070 0.0086640

GenHVL[Q]<k> 7.60 1907719 11.004610 8.3185178 0.0168738 -8.89e-06 2.1526278 0.0191596 0.0420214

Gen2HVL<g2> 7.61 1396019 11.007127 8.3188408 0.0167054 -1.25e-06 2.1737207 -0.027597 0.0089659 0.0089468 0.2049540

GenHVL[Yp2]<l> 7.96 1334689 11.006997 8.3209119 0.0169576 -7.29e-07 0.9645672 2.1494766 -0.010850 0.0169545 -1.03e-05

GenHVL[Y]<k> 8.21 1496316 11.001580 8.3162929 0.0163991 -8.19e-06 2.1925051 -0.009835 0.0266715 1e-12

GenHVL[Q]<n> 9.38 1545882 11.007511 8.3206545 0.0172058 -7.93e-06 2.1301986 0.0061820 3.0000000

GenHVL[Yp2]<n> 9.92 1071336 11.008559 8.3255589 0.0160026 2.492e-05 0.7021365 2.0384541 0.0179595 0.0116585 3.0000000

GenHVL[Q]<g> 9.96 1766364 11.007622 8.3208069 0.0171683 -8.27e-06 2.1332080 0.0054333

GenHVL[Yp2]<k> 10.24 1037249 11.008107 8.3269378 0.0155751 2.996e-05 0.6612708 1.9867062 0.0201608 0.0249290 0.0649862

GenHVL[Yp2]<e> 10.31 1191279 11.009374 8.3224416 0.0171880 -5.43e-06 1.0183485 2.1345748 -0.016125 0.0033041

GenHVL[Y]<e> 10.35 1401599 11.009435 8.3228691 0.0171833 -3.8e-06 2.1357120 -0.013926 0.0033210

Gen2HVL<e> 10.35 1401599 11.009433 8.3227514 0.0171837 -3.8e-06 2.1356835 -0.013913 0.0033189

Gen2HVL<*pe*> 10.41 1021080 11.001996 8.3264162 0.0178474 -1.18e-05 2.0824872 0.0192073 3.333e-05 0.0057250 0.0722296

GenHVL[Y]<pe> 10.49 1012997 11.016082 8.3238133 0.0173116 -4.32e-06 2.1269396 -0.010676 0.0018534 0.0023781 0.0100000

Gen2HVL<pe> 10.51 1010542 11.016471 8.3237634 0.0173150 -4.23e-06 2.1267754 -0.010869 0.0020597 0.0023498 0.0100000

Gen2HVL<*e2*> 11.34 936752 11.001240 8.3299805 0.0182887 -1.46e-05 2.0586257 0.0308684 0.0060547 0.0062905 0.9840073

Gen2HVL<*e*> 11.39 1272523 10.999616 8.3300844 0.0183371 -1.61e-05 2.0536544 0.0355429 0.0063358

GenHVL[Q]<e> 11.80 1491042 11.009103 8.3236348 0.0173505 -7.94e-06 2.1217223 0.0020967

GenHVL[Q]<e2> 11.85 1037076 11.009100 8.3233093 0.0173165 -8.11e-06 2.1239623 0.0023541 0.0023583 0.1609408

GenHVL[V]<n> 12.04 1020265 11.189335 8.3021008 0.0118248 -2.65e-05 0.0216837 -0.041568 0.0015501 2.4053901

GenHVL[Q]<pe> 12.04 1020172 11.009594 8.3243600 0.0174184 -7.69e-06 2.1175899 0.0001038 0.0015289 0.0100000

Gen2NLC 12.18 1445268 11.009592 8.3253259 1.837e-05 -5.73e-06 2.1137543 -2.218822

Gen2HVL 12.18 1445268 11.009592 8.3253259 0.0174903 -5.73e-06 2.1137543 -0.004661

The above is an excerpt from a Model Discovery of a baseline resolved peak within a well-designed gradient. We see fits that range from less than 1 ppm error (even with the noise of deconvolution fitting which was not fully filtered). When you do a model discovery, we recommend that you look for the following:

Failed Significance

The models that have been grayed had one or more parameters where a failed significance occurred (a test for a non-zero value). Such models may be valid, but the data may not have a strong enough S/N to support significance for all parameters. Note that there may be parameters, such as the a₃ chromatographic tailing and fronting, which fail significance because the value is very close to zero and that such may be a very desirable property.

Closed-Form Models

Look for the models which do not have an IRF convolution or deconvolution symbol <> appended. The Gen2HVL and Gen2NLC models are the last ones shown in this excerpt, identical goodness of fits, 12.18 ppm and F=1.44M, but different parameters. He we can look at a₄, and we see 2.114, a compaction significantly greater than the Gaussian's 2.0 power of decay, an estimate of gradient strength the peak experienced in its formation.

Interestingly, we see a strong fit, 3.44 ppm and F=5.12M, on the GenHVL[V] closed form model, the generalized HVL with the generalized GMG as the core density, the ZDD with two third moment terms. The F-statistic is a measure of how well a given model describes the data. The dramatic improvement with the [V] density might suggest an experiment to determine if the GenHVL[V]'s generalized GMG's half-Gaussian SD (a₄) and asymmetry (a₅) provide useful gradient information.

Convolution Models

For an isocratic peak, look for the models with an IRF that does not contain a deconvolution. This is normally what you would look for in most fits.

Deconvolution Models

For a gradient peak, look for the models with an IRF that does contain a deconvolution.

In this example, the best models are experimental mixed deconvolution/convolution models, such as the Gen2HVL<*g*,ge> and Gen2HVL<*g*,e>. Since in a gradient peak, we want the deconvolution to unwind the gradient to where the a₄ power of decay in a twice-generalized model is very close to the 2.0 decay of a Gaussian, we look at the a4 values of 2.018, and 2.010 as confirmation. We assume the gradient and IRF do not perfectly cancel under any circumstance, and that these models should produce the best fits. and here every parameter fit to significance. This is useful. We have, with every parameter significant, using the Gen2HVL<*g*,e> model (0.69 ppm statistical error, F=17.74M), estimated the gradient as a half-Gaussian with an SD of 0.0187 and the remnant IRF as an exponential with a time constant of 0.005, very small. The best F-statistic occurs with the GenHVL<*g*> pure deconvolution model (0.87 ppm, F=20.24M) and this estimates the gradient with a half-Gaussian SD (in the opposite direction of the IRF) as 0.0185.

Note how much we learned in this example from this automated model discovery step. We've confirmed the validity of the 4th moment power of decay in a closed form twice-generalized model for predicting the compaction of a peak arising from a gradient. We've also confirmed the value of fitting a once-generalized GenHVL<*g*> deconvolution model. We have even quantified the measure of remnant IRF in the gradient design. We also discovered that the generalized GMG, the [V] zero-distortion density or ZDD, appears to work well in modeling this gradient peak using just the simpler once-generalized closed form models.

If we were to repeat this model discovery example for a set of gradient standards, we would be able to take a median of the ppm and F-statistic across multiple peaks, and discover which of the models consistently fit such gradient peaks to a high level of accuracy.

Peak Modeling Research

If you are doing serious peak modeling research, you should use this model discovery liberally. You may see some surprises, and since any user-defined peaks you create will be automatically included in the discovery modeling, you will see how a custom model compares with the built-in models.