PeakLab v1 Documentation Contents            AIST Software Home            AIST Software Support

Segment Fit


Segment-Based Placement

Each of the three peak fitting options offers the Use Baseline Segments option. It is available when a baseline has been subtracted in a preprocessing step. In a segment placement, the peaks resting between baseline segments are treated as separate sectioned data sets.

v5_SegmentFit1.png

Each segment that was identified by the baseline processing will be displayed with a different background color. There may be any number of peaks within a given segment, including just a single baseline-resolved peak. In the above example, the baseline processing identified six segments. The ten placed peaks are partitioned into these six segments, the last segment with three peaks.
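As a rough illustration of how peaks can be partitioned into baseline-resolved segments, the following Python sketch treats each contiguous run of non-baseline points as one segment and assigns each peak to the segment containing its apex. This is not PeakLab's actual algorithm. The function names, the boolean mask, and the apex indices are all hypothetical and are not taken from the figure.

```python
from itertools import groupby

def find_segments(is_baseline):
    """Return (start, stop) index pairs for each contiguous run of non-baseline points."""
    segments = []
    index = 0
    for on_baseline, run in groupby(is_baseline):
        length = len(list(run))
        if not on_baseline:
            segments.append((index, index + length))  # stop is exclusive
        index += length
    return segments

def assign_peaks(apex_indices, segments):
    """Map each 1-based peak number to the 1-based segment containing its apex."""
    assignment = {}
    for peak_number, apex in enumerate(apex_indices, start=1):
        for segment_number, (start, stop) in enumerate(segments, start=1):
            if start <= apex < stop:
                assignment[peak_number] = segment_number
                break
    return assignment

# Toy data (hypothetical): True marks a point that lies on the baseline.
mask = [True, True, False, False, False, True, True,
        False, False, False, False, False, True]
apexes = [3, 8, 10]  # hypothetical apex positions of three placed peaks

segments = find_segments(mask)
assignment = assign_peaks(apexes, segments)
print(segments)
print(assignment)
```

In this toy case the first peak is baseline resolved and sits alone in its segment, while the second and third peaks share a segment.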

When the Use Baseline Segments box is checked, the starting estimate algorithms will treat each of the segments as a separate data set, adjusting widths based on the areas of the peaks within each particular segment.

This segment-based placement doesn't mean that a segmented fit will automatically occur in any subsequent fit. This Use Baseline Segments setting affects only the placement of the peaks, but it also furnishes a visualization of the identified baseline segments which will be used in a segmented fit, should you choose to have this take place.

Segment-Based Fitting

To perform a fit where each segment is treated as a separate data set, you must check the Fit using Separate Segments (for mix of baseline resolved peaks and segments of overlapping peaks) box in the dialog that precedes the fitting:

v5_FitStrategySegmentDlg.png

One of the key concepts in a segment fit is that a segment with multiple peaks will contain peaks with some measure of overlap, peaks that are not baseline resolved. The assumption in a segment fit is that there may not be enough information in the overlapping peaks to fit independent a2 widths and a3 distortions with any accuracy. This is a perfectly valid assumption. The greater the measure of overlap, the less the information in the segment will support the fitting of separate a2 widths and a3 distortions for the overlapping peaks.

In a segment fit, each segment will be fitted as if it were a separate data set. This means the Vary width a2 and shape a3 boxes apply only intra-segment. If these boxes are unchecked, all of the peaks within that segment will share the same a2 width and a3 distortion. Each segment will be addressed separately. The next segment, if also containing multiple peaks, will likewise fit a shared a2 width and a3 distortion, but these values will bear no relation to those in any other segment. The a2 widths and a3 distortions will thus vary across segments, but not within them.

In general, you will uncheck the width a2 and shape a3 boxes in the Vary section of the placement dialog in order to have the overlapping peaks in each segment fitted with a shared a2 width and a3 distortion.
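The effect of the Vary boxes on the number of fitted width and shape parameters can be sketched in a few lines of Python. The function name is hypothetical, and the per-segment peak counts are an assumption chosen to match the ten-peak, six-segment placement example. The sketch only counts a2 and a3 parameters and says nothing about how PeakLab organizes the fit internally.

```python
def count_shape_parameters(peaks_per_segment, vary_a2, vary_a3):
    """Count fitted a2 and a3 parameters in a segment fit.

    When a Vary box is checked, every peak gets its own parameter.
    When it is unchecked, the peaks in a segment share one value.
    """
    total = 0
    for peak_count in peaks_per_segment:
        total += peak_count if vary_a2 else 1
        total += peak_count if vary_a3 else 1
    return total

# Assumed peak counts for six segments holding ten peaks in total.
peaks_per_segment = [1, 2, 2, 1, 1, 3]

shared = count_shape_parameters(peaks_per_segment, vary_a2=False, vary_a3=False)
varied = count_shape_parameters(peaks_per_segment, vary_a2=True, vary_a3=True)
print(shared, varied)
```

With both boxes unchecked there is one a2 and one a3 per segment. With both checked there is one of each per peak, which is what overlapping peaks may not carry enough information to support.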

Automatic Segment Determination

The segments are automatically determined by the baseline subtraction level nearest the current data level. It is possible to have more than one baseline subtraction. Such may be necessary if it is apparent that the shared a2 and a3 assumption within a segment does not hold across all of the segments. In this example, the three peaks in the last segment are hardly placed with the same accuracy as those within the second and third segments.

When the Use Baseline Segments box is checked, the starting estimates will honor the width a2 and shape a3 states as if one will proceed with a segment-based fit. These are the values shown in the fitting procedure's List option:

Peak      Type        a0              a1              a2              a3              a4
1         GenNLC      11.1290488      4.12464408      0.00025337      -0.0099189      1.18580000
2         GenNLC      1.63867012      5.80121665      0.00026275      0.00031198      1.18580000
3         GenNLC      1.44295428      5.95856407      0.00026275      0.00031198      1.18580000
4         GenNLC      1.83041795      6.68948747      0.00026595      0.00031577      1.18580000
5         GenNLC      1.74669467      6.88577555      0.00026595      0.00031577      1.18580000
6         GenNLC      0.80828824      9.17430946      0.00036095      0.00127690      1.18580000
7         GenNLC      0.80153386      9.93080502      0.00039563      0.00212049      1.18580000
8         GenNLC      1.50052900      15.7895325      0.00134629      0.00159850      1.18580000
9         GenNLC      2.33426611      16.1071172      0.00134629      0.00159850      1.18580000
10        GenNLC      4.54387311      17.2942595      0.00134629      0.00159850      1.18580000
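The segment membership can be read back from these starting estimates, since peaks placed in the same segment carry identical a2 and a3 values. The following Python sketch groups consecutive peaks on that basis. It is only an illustration and assumes that two adjacent segments never happen to receive exactly the same a2 and a3.

```python
from itertools import groupby

# (peak, a2, a3) copied from the starting estimates listed above.
estimates = [
    (1, 0.00025337, -0.0099189),
    (2, 0.00026275, 0.00031198),
    (3, 0.00026275, 0.00031198),
    (4, 0.00026595, 0.00031577),
    (5, 0.00026595, 0.00031577),
    (6, 0.00036095, 0.00127690),
    (7, 0.00039563, 0.00212049),
    (8, 0.00134629, 0.00159850),
    (9, 0.00134629, 0.00159850),
    (10, 0.00134629, 0.00159850),
]

# Consecutive peaks with the same (a2, a3) pair are taken to share a segment.
groups = [
    [peak for peak, _, _ in run]
    for _, run in groupby(estimates, key=lambda row: (row[1], row[2]))
]
print(groups)
```

The grouping yields six segments, the last containing peaks 8, 9, and 10.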

Visually inspecting the raw data, it is clear that the shared width and distortion assumption may be fine for the second and third segments, but not the last. The peaks are too sharply changing width and distortion at the far end of the chromatogram. There needs to be a segment between the two peaks with strong overlap, and the last peak which is almost fully baseline resolved, but which did not identify as such in the baseline processing.

A Second Baseline Preprocessing

We thus perform a second baseline subtraction preprocessing where we ensure a baseline exists between these peaks. For the sake of this example, we will manually set this new baseline so that the last peak, which was previously grouped with the eighth and ninth peaks, is placed in its own baseline segment. The other segments are left unchanged.

v5_SegmentFit1A.png

There are now seven baseline resolved segments from this second baseline step. The last peak now has its own segment. The only way to specify different segments is by a new baseline subtraction (this approach enforces baseline-resolved segments rather than allowing them to be arbitrarily specified).

The Non-Segment Fit

We can now fit the data with and without this segmented fit. We do a non-segmented fit by checking the width a2 and shape a3 boxes in the Vary section of the placement dialog and by unchecking the Fit using Separate Segments (for mix of baseline resolved peaks and segments of overlapping peaks) box in the fit strategy dialog:

v5_SegmentFit2.png

We have a very good fit. For IRF preprocessed data, 27 ppm unaccounted variance and an F-statistic of 10.28 million are excellent. All of the parameters are significant, and apart from only minor deviations, the a2 width increases with retention time. The a3 distortion is completely consistent, progressing from fronted to tailed. The a4 ZDD asymmetry, as is almost always the case, is shared across all peaks.

 

Fitted Parameters

r2 Coef Det        DF Adj r2          Fit Std Err        F-value            ppm uVar
0.99997272         0.99997262         0.01280603         10,286,030         27.2789721

 Peak        Type                a0                a1                a2                a3                a4
    1        GenNLC        4.36638259        4.16368695        0.00020382        -0.0038447        1.73454007
    2        GenNLC        0.61260761        5.71593860        0.00023245        -0.0007864        1.73454007
    3        GenNLC        0.61610261        5.89428834        0.00026065        -0.0006031        1.73454007
    4        GenNLC        0.71429836        6.62900942        0.00025207        -0.0004585        1.73454007
    5        GenNLC        0.71229305        6.84261002        0.00026548        -0.0002913        1.73454007
    6        GenNLC        0.32351443        9.18106091        0.00036863        0.00055316        1.73454007
    7        GenNLC        0.32246135        9.92698775        0.00040960        0.00079336        1.73454007
    8        GenNLC        0.52175590        16.2591094        0.00082308        0.00501364        1.73454007
    9        GenNLC        0.56906077        16.6779205        0.00076060        0.00615972        1.73454007
   10        GenNLC        2.23235252        18.0850837        0.00106453        0.01378402        1.73454007
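The ppm unaccounted variance in the statistics above appears to be the coefficient of determination restated as the unexplained fraction of the variance, scaled to parts per million. A minimal Python sketch, assuming that relation (the function name is hypothetical):

```python
def ppm_unaccounted_variance(r2):
    """Unexplained fraction of the variance, in parts per million."""
    return (1.0 - r2) * 1e6

# r2 as reported for the non-segmented fit, rounded to eight decimals.
ppm = ppm_unaccounted_variance(0.99997272)
print(round(ppm, 2))
```

Because the listed r2 is rounded, the result agrees with the reported 27.2789721 only to about two decimal places.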

  

The Segment Fit

In this segment fit, peaks 2 and 3, peaks 4 and 5, and peaks 8 and 9 will share the same a2 width and a3 distortion. For the segmented fit to represent a clear and unambiguous improvement, we need to see all parameters significant, a better F-statistic, and a consistent trend in a2 increasing with retention. We do the segmented fit by unchecking the width a2 and shape a3 boxes in the Vary section of the placement dialog and by checking the Fit using Separate Segments (for mix of baseline resolved peaks and segments of overlapping peaks) box in the fit strategy dialog:

v5_SegmentFit3.png


Fitted Parameters

r2 Coef Det        DF Adj r2          Fit Std Err        F-value            ppm uVar
0.99996717         0.99996702         0.01405429         6,971,411          32.8297967

 Peak        Type                a0                a1                a2                a3                a4
    1        GenNLC        4.36620829        4.16369126        0.00020348        -0.0038413        1.75723189
    2        GenNLC        0.62562905        5.72877712        0.00025142        -0.0004544        -0.0756189
    3        GenNLC        0.60098032        5.89993879        0.00025142        -0.0004544        -0.0756189
    4        GenNLC        0.71972091        6.63544978        0.00025881        -0.0002997        1.73199288
    5        GenNLC        0.70489204        6.84276015        0.00025881        -0.0002997        1.73199288
    6        GenNLC        0.32353187        9.18021406        0.00036932        0.00052512        2.06973298
    7        GenNLC        0.32251737        9.92452361        0.00041216        0.00071113        2.61738816
    8        GenNLC        0.51396194        16.2803260        0.00078062        0.00570914        1.05415262
    9        GenNLC        0.57449926        16.6618619        0.00078062        0.00570914        1.05415262
   10        GenNLC        2.23073609        18.0861882        0.00103473        0.01364446        1.27617574

Here we see an issue. Since each segment is treated as a separate data set, the a4 asymmetry (deviation from the NLC's ZDD) is fitted separately for each segment. In three of the segments, not surprisingly those with the overlapping peaks, the a4 is insignificant (grayed).

We will thus do one more segment fit. The ZDDs (zero-distortion densities) are often as consistent as the IRFs, at least with a healthy column. The average of the significant a4 values is 1.748. In the non-segmented fit, a4 fit to 1.735. We will use the ZDD option to lock the GenNLC a4 at 1.74, treating this ZDD GenNLC asymmetry as a known constant:

v5_SegmentFit4.png

Fitted Parameters

r2 Coef Det        DF Adj r2          Fit Std Err        F-value            ppm uVar
0.99996680         0.99996668         0.01412671         8,669,374          33.1985834

 Peak        Type                a0                a1                a2                a3                a4
    1        GenNLC        4.36634064        4.16368798        0.00020374        -0.0038438        1.74000000
    2        GenNLC        0.62602443        5.72581829        0.00024691        -0.0005552        1.74000000
    3        GenNLC        0.60018991        5.89692287        0.00024691        -0.0005552        1.74000000
    4        GenNLC        0.71972261        6.63543589        0.00025880        -0.0003001        1.74000000
    5        GenNLC        0.70488925        6.84274595        0.00025880        -0.0003001        1.74000000
    6        GenNLC        0.32351551        9.18105006        0.00036865        0.00055279        1.74000000
    7        GenNLC        0.32246178        9.92697245        0.00040961        0.00079286        1.74000000
    8        GenNLC        0.51397482        16.2777338        0.00079824        0.00567383        1.74000000
    9        GenNLC        0.57477575        16.6590372        0.00079824        0.00567383        1.74000000
   10        GenNLC        2.23238842        18.0850397        0.00106498        0.01378462        1.74000000

This segmented fit is a lovely one. The a2 increases consistently from segment to segment, as does the a3 from fronting to tailing. All parameters are significant. The F-statistic, however, is somewhat lower than in the non-segmented fit (8.67 million versus 10.29 million). There was a loss in overall modeling power from forcing the overlapping peaks to share widths and distortions. The expected trend in a2 and a3 is better honored, but the widths and distortions of these overlapping peaks, while close, are not actually identical.
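The a2 trend in the two fits can be checked directly from the fitted values. The following Python sketch lists the peaks whose a2 is lower than that of the preceding peak. The helper name is hypothetical, and the values are copied from the non-segmented fit and from the segmented fit with a4 locked at 1.74.

```python
def breaks_in_trend(values):
    """Return 1-based peak numbers whose value is lower than the preceding peak's."""
    return [
        index + 2
        for index, (previous, current) in enumerate(zip(values, values[1:]))
        if current < previous
    ]

# a2 widths, peaks 1 through 10, from the non-segmented fit.
non_segmented_a2 = [
    0.00020382, 0.00023245, 0.00026065, 0.00025207, 0.00026548,
    0.00036863, 0.00040960, 0.00082308, 0.00076060, 0.00106453,
]

# a2 widths, peaks 1 through 10, from the segmented fit with a4 locked.
segmented_a2 = [
    0.00020374, 0.00024691, 0.00024691, 0.00025880, 0.00025880,
    0.00036865, 0.00040961, 0.00079824, 0.00079824, 0.00106498,
]

print(breaks_in_trend(non_segmented_a2))
print(breaks_in_trend(segmented_a2))
```

The non-segmented fit shows small reversals at peaks 4 and 9, both inside segments of overlapping peaks, while the segmented fit shows none because shared values within a segment count as equal rather than decreasing.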

Which of the two fits, the non-segmented one or the segmented one, is the better one? Both are quite good for IRF preprocessed data. In the fitting sciences, it is not always about the very best goodness of fit. In a chromatographic data fit, a slightly better goodness of fit can result from small differences in the baseline subtraction or IRF preprocessing. In this example, the segmented fit is our preference since it honestly acknowledges that the data for the overlapping peaks are insufficient to warrant distinct width and distortion estimates.

 

 


