PeakLab v1 Documentation
White Paper: Part I - Generalized Chromatographic Models
In this first white paper, we describe our discovery of an equivalence in the concentration dependency of chromatographic peak shapes in the HVL (Haarhoff-Van der Linde) and Wade-Thomas NLC theoretical models. These are the two models we have found to be far and away the most successful in modeling the nonlinear shapes that occur with chromatographic peaks. We then use this equivalence to develop generalized models capable of fitting the higher moments in chromatographic peaks. These models allow high-accuracy fits of LC, GC, HPLC, and ultraHPLC peaks (with and without gradients), as well as preparative or high-overload peaks. In this initial white paper, we cover the generalization of the models that describe the actual chromatographic separation, as applied to analytical peaks where only a third moment generalization is required.
In Part II, we will address the additional nonidealities in the chromatographic flow and detection systems. By including an instrument response function in a convolution model fitted to chromatographic data, we will demonstrate analytic fits with less than 10 ppm least-squares error, and in certain instances, fit errors as low as 1 ppm. By using Fourier methods in the fitting, we will show that the performance is suitable for routine analysis of chromatographic peaks.
In Part III, we will specifically address gradient HPLC separations and the additional steps that must be taken to successfully estimate the gradient strength in the chromatographic modeling, on a peak-by-peak basis. In covering gradient HPLC fits, we will introduce twice-generalized chromatographic models, which also incorporate fourth moment adjustments.
In Part IV, we will address the additional challenges of modeling the overload shapes arising in preparative chromatography. This involves estimating the peak shapes that would have been generated had the column had infinite capacity and no overload taken place.
Peak Shapes in Chromatography
If a chromatographic peak is 'fronted', there is a progression in the strength of this fronting as concentration increases:
This plot of real-world data covers two orders of magnitude of concentration. The peaks are normalized to unit area. Note that the peaks at very low concentration show little apparent fronting, and even an unusual and appreciable tailing. The peaks show an increasingly right-triangular fronted shape as concentration increases.
Similarly, if a chromatographic peak is 'tailed', there is a progression in the strength of this tailing as concentration increases:
In the case of fronting, the higher concentration produces a later peak apex. In a tailed peak, the higher concentration produces an earlier apex.
It is this shape dependency on concentration that sets true chromatographic models apart from other density models. You can double the concentration, as in the progression from the green to blue peaks, and see appreciably different shapes.
The HVL Chromatographic Model
The Haarhoff-Van der Linde (“HVL”) gas chromatography model is defined as follows:
(1)
If we treat the HVL as a statistical model, apart from its theoretical derivations, and in a form where the area is an adjustable parameter, we have a four-parameter function. In this form, a0 is the peak area, a1 is the center or location value, a2 is the peak width or scale parameter, and a3 is the shape parameter. The shape parameter is positive for a right-skewed asymmetry and negative for a left-skewed one. We have labeled the parameters in the HVL model so that a0-a3 correspond to moments 0-3.
The HVL, originally seen as applicable to GC and derived using adsorption isotherm arguments, produces a theoretical diffusion width.
When the a3 distortion parameter is negative, the peaks are fronted; when positive, the peaks are tailed. In the HVL model, the a3 distortion is additionally scaled by a1/a2. In the plot above, the a3 values were adjusted for the two a1 locations to produce mirror-image shapes, with identical measures of fronting and tailing at the two different locations.
Note also the obvious issue with using an apex value for location when the concentration of a given solute can vary significantly. In this plot, the five fronted shapes all fit to an a1 location of 4.0, and all five tailed shapes fit to an a1 location of 6.0, even though their apexes differ. The same kind of issue applies to using a FWHM as a surrogate for the peak's second moment. In this case, all ten peaks in the plot fit to a single a2 width of 0.125. This is the SD of the underlying Gaussian when no distortion is present, the width at the limit of infinite dilution.
The NLC Chromatographic Model
The Wade-Thomas non-linear liquid chromatography model (“NLC”) is defined as follows:
(2)
Where TFn is a modified Bessel function integral:
(3)
When the area is an adjustable parameter, the NLC is also a four-parameter function. As with the HVL, a0 is the peak area, a1 is the center or location value, and a3 is the distortion parameter. The distortion parameter is positive for the right-skewed asymmetry of a tailed peak and negative for the left-skewed asymmetry of a fronted peak. For the a2 kinetic parameter, this NLC parameterization uses a time constant instead of the rate constant given in the original publication of the Wade-Thomas NLC model. As such, the NLC a2 is similar to the HVL a2: a scale parameter that increases with the peak width.
The NLC produces a kinetic time constant. It was derived for LC where slow adsorption and desorption kinetics are present, or where mass transfer can be modeled by first-order kinetics.
As with the HVL, when the NLC's a3 distortion parameter is negative, the peaks are fronted, and when positive, the peaks are tailed. In the NLC model, there is no additional scaling of the a3 distortion. The shapes produced by a3 values of identical magnitude are not mirror images. This comes from the asymmetry in the underlying Giddings kinetic model, the density the NLC generates at the infinite dilution (zero concentration) limit.
Here as well, using the apex and FWHM values as concentration-independent estimates of the location and broadening (the first and second moments) is fraught with error. All ten of these NLC shapes fit to the same a2 time constant. The five fronted NLC shapes fit to an a1 center value of 4.0, and the five tailed shapes each fit to an a1 value of 6.0. These a1 values are the means of the underlying (zero-distortion) Giddings density.
Note also that the 0.001 first-order time constant used in these plots represents exceptionally fast kinetics, and yet the shapes track the real-world data in the initial concentration plots.
The Generalized HVL Template
The HVL reduces to a Gaussian at infinite dilution. We will first generalize the HVL model, using the Gaussian or normal probability density function (PDF):
$$f(x) = \frac{1}{a_2\sqrt{2\pi}}\exp\left[-\frac{(x-a_1)^2}{2a_2^2}\right] \qquad (4)$$
We also use the Gaussian or normal cumulative distribution function (CDF):
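$$F(x) = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-a_1}{\sqrt{2}\,a_2}\right)\right] \qquad (5)$$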
We also take note of the complement of the CDF, the reverse cumulative of the normal density, even though it is not used in the HVL:
$$1-F(x) = \frac{1}{2}\operatorname{erfc}\left(\frac{x-a_1}{\sqrt{2}\,a_2}\right) \qquad (6)$$
We can now rewrite the HVL as a generalized template that accepts any zero-distortion density:
(7)
To regenerate the HVL, Density is replaced with (4), the normal PDF, and Cumulative with (5), the normal CDF. Note that any replacement is always done with a matched PDF-CDF (Density-Cumulative) pair.
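A matched Density-Cumulative pair means that the Cumulative is the integral of the Density, so differentiating the Cumulative returns the Density. The minimal Python sketch below assumes the unit-area Gaussian with center a1 and SD a2 from (4)-(6). It writes out the Gaussian Density, Cumulative, and reverse cumulative, then checks the pairing numerically with a central difference. The function names and the check itself are illustrative and are not part of PeakLab.

```python
import math


def normal_pdf(x, a1, a2):
    """Gaussian Density with center (mean) a1 and width (SD) a2."""
    z = (x - a1) / a2
    return math.exp(-0.5 * z * z) / (a2 * math.sqrt(2.0 * math.pi))


def normal_cdf(x, a1, a2):
    """Gaussian Cumulative, the term the HVL template pairs with the Density."""
    return 0.5 * (1.0 + math.erf((x - a1) / (math.sqrt(2.0) * a2)))


def normal_rev_cdf(x, a1, a2):
    """Gaussian reverse cumulative (1 - CDF); erfc avoids cancellation in the right tail."""
    return 0.5 * math.erfc((x - a1) / (math.sqrt(2.0) * a2))


def is_matched_pair(pdf, cdf, xs, h=1e-5, tol=1e-6):
    """True if a central-difference derivative of cdf reproduces pdf at every x in xs."""
    return all(abs((cdf(x + h) - cdf(x - h)) / (2.0 * h) - pdf(x)) < tol for x in xs)


a1, a2 = 5.0, 0.125  # illustrative center and width
xs = [4.6, 4.9, 5.0, 5.2, 5.4]
print(is_matched_pair(lambda x: normal_pdf(x, a1, a2),
                      lambda x: normal_cdf(x, a1, a2), xs))  # True
print(normal_cdf(5.3, a1, a2) + normal_rev_cdf(5.3, a1, a2))  # 1.0 up to rounding
```

The same check applies to any candidate zero-distortion density. If a proposed Cumulative fails it, the pair is not matched and should not be inserted into either template.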
The Generalized NLC Template
To create a generalized NLC model template, we use the Giddings density:
(8)
Here we take note of the Giddings cumulative, although it is not used in the NLC:
(9)
We also use the Giddings reverse cumulative:
(10)
The NLC generalized density template can then be expressed as follows:
(11)
Just as with the HVL template, we can create any number of NLC-based generalized models by inserting a matched density-cumulative pair other than the Giddings for the zero-distortion assumption.
To regenerate the NLC, Density is replaced with (8), the Giddings PDF, and RevCumulative with (10), the Giddings CDF complement.
The Common Chromatographic Distortion Model
Despite different derivations across decades which targeted different types of chromatography, the generalized templates of the two models produce identical shapes for a given density-cumulative pair.
One can substitute the Gaussian PDF and CDF complement into the NLC template and generate a shape that is exactly fitted by the HVL model. Similarly, the Giddings PDF and CDF can be inserted into the HVL template to produce a shape that is exactly fitted by the Wade-Thomas NLC model.
Note that the a1 associated with the first moment and the a2 associated with the second moment also appear in the templates. The a1 center values are comparable, since both represent the mean of the underlying zero-distortion density (ZDD). The a2 values, however, are very different representations of the peak broadening: one is a Gaussian diffusion width, the other a Giddings kinetic time constant associated with adsorption-desorption.
Apart from the distortion scaling in the HVL and the use of the CDF in the HVL and the CDF complement in the NLC, the only difference between the HVL and NLC models is their zero-distortion density assumption. The HVL assumes a diffusion-based Gaussian, the simplest possible probabilistic density assumption. The NLC assumes a first order Giddings density, the simplest kinetic density assumption possible.
If you have long used both the HVL and NLC models in fitting chromatographic peaks, you were probably struck by the similarities in the fits. Part of this can be attributed to the similarity between the Gaussian and Giddings zero-distortion densities:
The Giddings density, the amber curve, is a slightly right-asymmetric peak compared to the symmetric Gaussian, the blue curve. This difference in symmetry explains why the HVL produces mirror-image shapes about a1 for distortions of equal magnitude and opposite sign. The NLC, by contrast, produces different tailed and fronted shapes with the same magnitude of the a3 distortion parameter.
Extending the HVL and NLC Generalized Templates to Fit Higher Moments in Chromatographic Peaks
The major drawback of the basic HVL and NLC models is that their higher moments are fixed by the Gaussian and Giddings zero-distortion assumptions. Any non-ideality in the chromatographic separation, such as multiple-site adsorption in the kinetic model or asymmetry in the diffusion model, is not accommodated.
The HVL and NLC generalized templates allow any density-cumulative pair to be used. The zero distortion density (ZDD) need not have the fixed higher moments locked in by the Gaussian or Giddings assumptions. To create generalized HVL and NLC models, all that is needed is to assume the ZDD is neither Gaussian nor Giddings, but a more complex density whose third moment, the skewness, is broadly adjustable. This is what we refer to as a once-generalized model: the addition of third moment or skewness adjustments. A twice-generalized model is needed only when addressing gradient HPLC or overloaded preparative shapes. Such a model also allows for adjustments in the kurtosis (the fourth moment, the fatness of the tails).
By reducing the generalization problem to the zero concentration limiting density, we gain an immense simplification, one readily addressed by the statistical sciences. To create a once-generalized HVL or NLC, we can use any one of a number of generalized Gaussians in which the third and/or fourth moments are adjustable. The generalization problem thus reduces to the straightforward one of finding a ZDD that readily fits the HVL and NLC shapes as two families of curves. Each family is determined by a specific value of a third moment skewness parameter. Given the unlimited range of skewness values, such a generalization would also model any chromatographic shape in which a skewness is introduced into the infinite dilution density.
A major benefit of a once-generalized closed-form model is an immense simplification in computing the NLC shape. Computing the NLC is onerous because it requires either the modified Bessel approximation or the far more computationally demanding modified Bessel function integral. If the generalization can accurately replicate the Giddings shape, neither is needed. The NLC shapes simply become one of the infinite families of shapes the generalized models can produce, and the HVL shapes another.
Generalized Default ZDD (One Higher Moment)
We can adopt the widely used asymmetric generalized normal as the density in the templates. This density is not defined at all x, but it is computationally simple to evaluate:
a0 = Area
a1 = Center (as mean of asymmetric peak)
a2 = Width (SD of underlying Gaussian)
a3 = Asymmetry (fronted -1 < a3 < 1 tailed)
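As a sketch under a stated assumption, the code below uses the textbook (Hosking) form of the asymmetric generalized normal. It has location ξ (the median), scale α, and shape κ, with y = -ln(1 - κ(x - ξ)/α)/κ. The CDF is Φ(y) and the PDF is φ(y)/(α - κ(x - ξ)). PeakLab reparameterizes its ZDD so that a1 is the mean and a2 the SD of the underlying Gaussian, and its asymmetry sign convention may differ. In this textbook form, κ < 0 gives a tailed (right-skewed) peak. The sketch illustrates the two properties noted above. First, the density is zero beyond a one-sided bound, so it is not defined at all x. Second, its matched PDF-CDF pair needs only log and erf. A trapezoid integration checks the area and recovers the mean and third moment skewness.

```python
import math


def _gn_y(x, xi, alpha, kappa):
    """Map x to the underlying standard-normal variable y; None outside the support."""
    z = (x - xi) / alpha
    if kappa == 0.0:
        return z  # kappa = 0 reduces to the ordinary Gaussian
    arg = 1.0 - kappa * z
    if arg <= 0.0:
        return None  # beyond the one-sided bound x = xi + alpha / kappa
    return -math.log(arg) / kappa


def gn_pdf(x, xi, alpha, kappa):
    """Asymmetric generalized normal density (zero outside its support)."""
    y = _gn_y(x, xi, alpha, kappa)
    if y is None:
        return 0.0
    phi = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return phi / (alpha - kappa * (x - xi))


def gn_cdf(x, xi, alpha, kappa):
    """Matching cumulative Phi(y), clamped to 0 or 1 outside the support."""
    y = _gn_y(x, xi, alpha, kappa)
    if y is None:
        return 1.0 if kappa > 0.0 else 0.0
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def moments(kappa, lo, hi, n=40000):
    """Area, mean, and skewness (xi=0, alpha=1) by trapezoid integration on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    ws = [gn_pdf(x, 0.0, 1.0, kappa) for x in xs]

    def trap(vals):
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    area = trap(ws)
    mean = trap([x * w for x, w in zip(xs, ws)]) / area
    var = trap([(x - mean) ** 2 * w for x, w in zip(xs, ws)]) / area
    m3 = trap([(x - mean) ** 3 * w for x, w in zip(xs, ws)]) / area
    return area, mean, m3 / var ** 1.5


# kappa < 0: bounded below near x = -3.33, long right tail (tailed)
print(moments(-0.3, -3.3, 40.0))
# kappa > 0: the mirror image, bounded above, long left tail (fronted)
print(moments(0.3, -40.0, 3.3))
```

For κ = -0.3, this form is a shifted, scaled lognormal with σ = 0.3. The mean should therefore be near (e^0.045 - 1)/0.3 ≈ 0.153, with a skewness near 0.95; κ = +0.3 gives the mirror image.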
GenHVL - Default Generalized Normal ZDD
If we substitute this statistical PDF and its CDF into the HVL template for tailed shapes, and this same PDF and its CDF complement into the NLC template for fronted shapes, we can construct the Generalized HVL model for chromatography:
a0 = Area
a1 = Center (as mean of asymmetric peak)
a2 = Width (SD of underlying Gaussian ZDD)
a3 = HVL Chromatographic distortion ( -1 < a3 < 1 )
a4 = ZDD asymmetry ( -1 < a4 < 1 )
Note that the a4 value controlling the skew of the GenHVL peak appears as a3 in the ZDD nomenclature.
The once-generalized HVL and once-generalized NLC models produce identical shapes. Both reproduce the HVL to full precision and the NLC to 6-8 digits of precision. The GenHVL model reports a diffusion width for a2 and a statistical asymmetry for the a4 parameter. The GenNLC differs only in parameterization, reporting a first-order kinetic time constant for a2 and an asymmetry indexed to the Giddings/NLC for a4.
An Example of Fitting the GenHVL to Real-World Data
Even though we have a generalized model for the chromatographic separation that accounts for a third moment skewness in the infinite dilution density, we have not yet accounted for the real-world non-idealities in a chromatographic system. We will cover this in the next paper in this series. For this illustration, we will jump ahead somewhat and remove the IRF (instrument response function) before fitting the GenHVL to a real-world set of IC data standards containing a mix of appreciably fronted and tailed peaks. The instrument and system distortions are removed with a Fourier deconvolution procedure. This procedure uses values estimated in an IRF determination, which quantifies the non-idealities in the flow path and detection.
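Part II develops the IRF model itself. Purely to illustrate the Fourier deconvolution step named above, the sketch below makes two assumptions: a hypothetical first-order exponential IRF (unit area, time constant 5 samples) and a Gaussian peak on a 128-point grid. It applies the IRF by multiplying discrete Fourier spectra, then removes it by dividing them. A plain O(N²) DFT keeps the example dependency-free; a real implementation would use an FFT.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n_pts = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n_pts):
        acc = 0j
        for n, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * n / n_pts)
        out.append(acc / n_pts if inverse else acc)
    return out


def circular_convolve(signal, irf):
    """Apply the IRF by multiplying spectra (circular convolution)."""
    spec = [s * h for s, h in zip(dft(signal), dft(irf))]
    return [v.real for v in dft(spec, inverse=True)]


def fourier_deconvolve(measured, irf):
    """Remove the IRF by dividing spectra. No noise filter is applied here."""
    spec = [m / h for m, h in zip(dft(measured), dft(irf))]
    return [v.real for v in dft(spec, inverse=True)]


N = 128
# Hypothetical true peak: unit-area Gaussian, center 50 samples, SD 6 samples.
true_peak = [math.exp(-0.5 * ((i - 50) / 6.0) ** 2) / (6.0 * math.sqrt(2.0 * math.pi))
             for i in range(N)]
# Hypothetical IRF: first-order exponential decay (time constant 5 samples), unit area.
raw = [math.exp(-i / 5.0) for i in range(N)]
total = sum(raw)
irf = [v / total for v in raw]

measured = circular_convolve(true_peak, irf)
recovered = fourier_deconvolve(measured, irf)
print(max(abs(m - t) for m, t in zip(measured, true_peak)))   # visible IRF distortion
print(max(abs(r - t) for r, t in zip(recovered, true_peak)))  # round-off level
```

Without noise, the division recovers the peak to round-off. With real data, spectral division amplifies noise wherever the IRF spectrum is small. This is the source of the extra noise a deconvolution introduces, and a practical procedure would need some form of filtering or regularization.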
One of the largest tradeoffs in chromatographic modeling is in using a low enough concentration to see mostly Gaussian peaks and still having a high enough S/N to get effective fits on all components of interest. If one has a model which is capable of managing distorted shapes and reporting true theoretical location and broadening values, independent of concentration, then one can fit the more distorted shapes in a higher concentration sample and benefit from the improved S/N in the data.
Despite the additional noise introduced by the Fourier deconvolution, this high S/N sample, with the higher concentration fronting and tailing, fit to just 11 ppm least squares error. The following analytical fits are for three different concentrations of the above standard.
"Cation Std 5.0ppm (without PDCA)"
Fitted Parameters
r2 Coef Det  DF Adj r2    Fit Std Err  F-value      ppm uVar
0.99996468   0.99996464   0.00611751   23,348,983   35.3179206
Peak  Type    a0           a1           a2           a3           a4
1     GenHVL  2.39409195   4.86629842   0.04836896   -0.0028304   0.01010560
2     GenHVL  0.68483314   7.09399421   0.06635864   -0.0005339   0.01010560
3     GenHVL  0.79975294   8.27604890   0.07330294   -0.0003202   0.01010560
4     GenHVL  0.36554694   12.3963875   0.11414019   0.00043770   0.01010560
5     GenHVL  1.27705415   27.3145721   0.27360663   0.01257608   0.01010560
6     GenHVL  0.72539077   34.1882845   0.33736125   0.00969516   0.01010560
Measured Values
Peak  Type    Amplitude    Center       FWHM         Asym50       FW Base      Asym10
1     GenHVL  17.5378848   4.96975730   0.12879519   0.51112283   0.26013732   0.44708895
2     GenHVL  4.09186251   7.11573005   0.15730496   0.90616835   0.31466816   0.88988455
3     GenHVL  4.34018865   8.28935683   0.17315652   0.94978109   0.34642154   0.94378805
4     GenHVL  1.27903707   12.3757241   0.26840030   1.06764105   0.53777053   1.09299830
5     GenHVL  1.77534645   26.8452258   0.67323521   1.75253266   1.37786763   2.01359269
6     GenHVL  0.84270757   33.8047084   0.80653519   1.45452305   1.63450870   1.60664078
"Cation Std 10ppm (without PDCA)"
Fitted Parameters
r2 Coef Det  DF Adj r2    Fit Std Err  F-value      ppm uVar
0.99998410   0.99998408   0.00756954   51,856,486   15.9026113
Peak  Type    a0           a1           a2           a3           a4
1     GenHVL  4.76595940   4.84877877   0.04839938   -0.0054830   0.01425771
2     GenHVL  1.36140311   7.08591237   0.06632691   -0.0010553   0.01425771
3     GenHVL  1.59490007   8.26994645   0.07297309   -0.0006439   0.01425771
4     GenHVL  0.72807999   12.3913245   0.11433678   0.00099639   0.01425771
5     GenHVL  2.53401578   27.3046832   0.27963325   0.02528927   0.01425771
6     GenHVL  1.45127017   34.1888513   0.34552620   0.01984923   0.01425771
Measured Values
Peak  Type    Amplitude    Center       FWHM         Asym50       FW Base      Asym10
1     GenHVL  29.5298652   5.02552694   0.15434218   0.33255295   0.30954552   0.28134430
2     GenHVL  8.02653978   7.12919908   0.15955211   0.81889325   0.31899002   0.78692972
3     GenHVL  8.63873996   8.29746430   0.17358139   0.89449898   0.34700759   0.87811423
4     GenHVL  2.54584420   12.3460701   0.26838043   1.14773956   0.53899929   1.20028957
5     GenHVL  3.17348807   26.4873173   0.74615841   2.57612679   1.56417775   3.16318080
6     GenHVL  1.57429789   33.4774236   0.86111357   1.96515093   1.77985289   2.31704802
"Cation Std 25ppm (without PDCA)"
Fitted Parameters
r2 Coef Det  DF Adj r2    Fit Std Err  F-value      ppm uVar
0.99998833   0.99998832   0.01374504   70,683,285   11.6669314
Peak  Type    a0           a1           a2           a3           a4
1     GenHVL  11.8142696   4.81659065   0.05245682   -0.0132830   0.01604070
2     GenHVL  3.37933473   7.07533282   0.06774599   -0.0026710   0.01604070
3     GenHVL  3.95466006   8.26853142   0.07360840   -0.0017388   0.01604070
4     GenHVL  1.80133596   12.4132466   0.11490052   0.00268649   0.01604070
5     GenHVL  6.28678029   27.2937733   0.29227426   0.06326881   0.01604070
6     GenHVL  3.59687759   34.2322923   0.36116313   0.05079139   0.01604070
Measured Values
Peak  Type    Amplitude    Center       FWHM         Asym50       FW Base      Asym10
1     GenHVL  51.3969315   5.13548306   0.22524705   0.18947414   0.43971849   0.15866465
2     GenHVL  18.3887055   7.18040938   0.17331581   0.61719324   0.34737404   0.55997115
3     GenHVL  20.5719358   8.34339547   0.18101521   0.73016511   0.36202471   0.68469431
4     GenHVL  6.21391502   12.2981959   0.27141977   1.39865836   0.55027979   1.53839159
5     GenHVL  6.17339933   25.7952864   0.95570974   4.78387753   2.05567140   6.26432429
6     GenHVL  3.20405947   32.8180426   1.05023079   3.46401927   2.23464512   4.40839341
Additive-free cation standards were processed at 5, 10, and 25 ppm concentration. The analysis consists of strongly baseline-resolved peaks, with highly fronted and tailed peaks at the higher concentrations. This example shows why it is worth the extra effort to mathematically model chromatographic peaks. Suppose it is entirely expected that the level of a solute of interest will vary by a factor of five across samples. The apex values for the first peak change from 4.970 to 5.135 with concentration, and the last peak in the standard changes from 33.805 to 32.818. For the FWHM values, the first peak varies from .129 to .225 and the last peak from .806 to 1.05 as concentration increases.
If we look at the a1 fitted values, the center of the infinite dilution Gaussian, we see close to concentration independence. Across the 5x increase in concentration, the first peak varies from 4.866 to 4.817, and the last from 34.188 to 34.232. If we look at the a2 fitted values, the standard deviation of this infinite dilution Gaussian, the first peak varies from .0484 to .0525, and the last from .337 to .361. The coefficient of variation (CV) for the a1 fitted values averages .15%, and for the a2 widths, 2.26%. By contrast, the CV averages 1.06% for the apex values and 11.8% for the FWHM values.
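The CV comparison can be reproduced from the three tables. The sketch assumes each quoted figure is the average, over the six peaks, of the per-peak coefficient of variation across the three concentrations, computed with the sample (n - 1) standard deviation. That definition is our reading of the text, but it reproduces the quoted values.

```python
from statistics import mean, stdev

# One row per peak (1-6); columns are the 5, 10, and 25 ppm standards from the tables.
a1 = [[4.86629842, 4.84877877, 4.81659065], [7.09399421, 7.08591237, 7.07533282],
      [8.27604890, 8.26994645, 8.26853142], [12.3963875, 12.3913245, 12.4132466],
      [27.3145721, 27.3046832, 27.2937733], [34.1882845, 34.1888513, 34.2322923]]
a2 = [[0.04836896, 0.04839938, 0.05245682], [0.06635864, 0.06632691, 0.06774599],
      [0.07330294, 0.07297309, 0.07360840], [0.11414019, 0.11433678, 0.11490052],
      [0.27360663, 0.27963325, 0.29227426], [0.33736125, 0.34552620, 0.36116313]]
# "Center" column of the Measured Values tables (the apex location)
apex = [[4.96975730, 5.02552694, 5.13548306], [7.11573005, 7.12919908, 7.18040938],
        [8.28935683, 8.29746430, 8.34339547], [12.3757241, 12.3460701, 12.2981959],
        [26.8452258, 26.4873173, 25.7952864], [33.8047084, 33.4774236, 32.8180426]]
fwhm = [[0.12879519, 0.15434218, 0.22524705], [0.15730496, 0.15955211, 0.17331581],
        [0.17315652, 0.17358139, 0.18101521], [0.26840030, 0.26838043, 0.27141977],
        [0.67323521, 0.74615841, 0.95570974], [0.80653519, 0.86111357, 1.05023079]]


def mean_cv_percent(rows):
    """Average over peaks of each peak's CV (sample SD / mean) across concentrations, in %."""
    return mean(stdev(row) / mean(row) * 100.0 for row in rows)


for name, rows in (("a1", a1), ("a2", a2), ("apex", apex), ("FWHM", fwhm)):
    print(f"{name:<5} mean CV = {mean_cv_percent(rows):.2f}%")
```

Run as written, this gives about 0.15%, 2.26%, 1.06%, and 11.8%. Averaging per-peak CVs weights each peak equally, so peaks 1 and 5, whose FWHM values change most, dominate the FWHM figure.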
Fitting baseline-resolved peaks does more than just remove concentration effects from location and broadening estimates. The a3 parameter estimates the measure of fronting or tailing, and a3/a0 offers a concentration-independent estimate of the fronting or tailing distortion in any given peak. The higher moment a4 parameter, the skewness in the infinite dilution generalized Gaussian, increases with concentration in this example. This might be expected if the parameter were estimating the measure of additional site adsorptions or collision effects, and those effects were nonlinear with concentration. The various parametric estimates tell you much more about each peak. Further, when fits are this accurate, these five fitted parameters can completely reconstruct each peak, as shown in the lower portion of the fitted plot.
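The claim that a3/a0 is concentration independent can be checked against the fitted-parameter tables. The sketch below compares how much |a3| and |a3/a0| vary across the three standards for each peak, expressed as a max/min ratio. The helper name `spans` is illustrative.

```python
# One row per peak (1-6); columns are the 5, 10, and 25 ppm standards from the tables.
a0 = [[2.39409195, 4.76595940, 11.8142696], [0.68483314, 1.36140311, 3.37933473],
      [0.79975294, 1.59490007, 3.95466006], [0.36554694, 0.72807999, 1.80133596],
      [1.27705415, 2.53401578, 6.28678029], [0.72539077, 1.45127017, 3.59687759]]
a3 = [[-0.0028304, -0.0054830, -0.0132830], [-0.0005339, -0.0010553, -0.0026710],
      [-0.0003202, -0.0006439, -0.0017388], [0.00043770, 0.00099639, 0.00268649],
      [0.01257608, 0.02528927, 0.06326881], [0.00969516, 0.01984923, 0.05079139]]


def spans(a0_rows, a3_rows):
    """For each peak, the max/min ratio of |a3| and of |a3/a0| across concentrations."""
    out = []
    for areas, dists in zip(a0_rows, a3_rows):
        mags = [abs(d) for d in dists]
        ratios = [abs(d / a) for d, a in zip(dists, areas)]
        out.append((max(mags) / min(mags), max(ratios) / min(ratios)))
    return out


for peak, (a3_span, ratio_span) in enumerate(spans(a0, a3), start=1):
    print(f"peak {peak}: a3 varies {a3_span:.2f}x, a3/a0 varies {ratio_span:.2f}x")
```

Across the three standards, |a3| changes by a factor of roughly 4.7 to 6.1, in step with the area. Over the same range, a3/a0 stays within about 25% for peak 4, the smallest peak, and within 10% for the other five.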
White Paper: Part II - Instrument Response Convolution Models
In this white paper, we looked only at the enhancements necessary to fully model the actual chromatographic separation. We considered only analytical peaks, where a single higher-moment (third moment) adjustment suffices to capture virtually all of the variance in the fitting. We have described a generalized HVL that is capable of fitting all HVL and NLC shapes, as well as those arising from any other third moment asymmetry in the infinite dilution ZDD.
Fitting the third moment skewness in the infinite dilution ZDD is important. For most chromatographic fitting, however, the removal of instrumental effects will typically be of greater significance in producing near-zero-error fits. In the data above, we removed the instrumental/system distortions before fitting and realized fits with 35, 16, and 12 ppm error using this once-generalized HVL model. Suppose instead the IRF is not removed beforehand and a pure HVL is fit to this same data. The forced Gaussian ZDD assumption, together with the absence of any IRF modeling, then results in much higher errors of 2587, 2751, and 2414 ppm. To effectively fit chromatographic peaks, both the higher moment generalization and an accounting of the instrumental distortions are necessary.
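The ppm uVar column in the tables is consistent with the unexplained variance, 1 - r², expressed in parts per million. That reading is our inference from the tabulated numbers rather than a definition given in this paper. The sketch checks it against the three fits, using a hypothetical helper `ppm_uvar`.

```python
def ppm_uvar(r2):
    """Unexplained variance (1 - r^2) in parts per million (our reading of 'ppm uVar')."""
    return (1.0 - r2) * 1e6


# r2 Coef Det and the reported ppm uVar from the three fitted-parameter tables
fits = {"5.0 ppm": (0.99996468, 35.3179206),
        "10 ppm": (0.99998410, 15.9026113),
        "25 ppm": (0.99998833, 11.6669314)}
for label, (r2, reported) in fits.items():
    print(f"{label}: {ppm_uvar(r2):.2f} computed vs {reported:.2f} reported")
```

The computed values agree with the reported ones to the precision of the rounded r². On the same reading, the 2414 to 2751 ppm errors for the plain HVL fits correspond to r² values of roughly 0.9972 to 0.9976. This shows how much an r² close to 1 can hide.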
In Part II, we will describe the fitting of the real-world non-idealities in chromatographic data using convolution models.