Source: http://www.chem.uoa.gr/applets/appletsmooth/appl_smooth2.html
Signal Smoothing Algorithms
Theory
The signal-to-noise ratio (SNR) of a signal can be enhanced by either hardware or software techniques. The wide use of personal computers in chemical instrumentation and their inherent programming flexibility make software signal smoothing (or filtering) techniques especially attractive. Some of the more common signal smoothing algorithms are described below.
Moving average algorithm
The simplest software technique for smoothing signals consisting of equidistant points is the moving average. An array of raw (noisy) data [y_1, y_2, …, y_N] can be converted to a new array of smoothed data. The “smoothed point” (y_k)_s is the average of an odd number, 2n+1 (n = 1, 2, 3, …), of consecutive raw data points y_{k-n}, y_{k-n+1}, …, y_{k-1}, y_k, y_{k+1}, …, y_{k+n-1}, y_{k+n}, i.e.
$$(y_k)_s = \frac{1}{2n+1} \sum_{i=-n}^{n} y_{k+i}$$
The odd number 2n+1 is usually called the filter width. The greater the filter width, the stronger the smoothing effect. This operation is depicted in the animated picture below.
In this example the filter width is 5. The first five raw data points (black squares) within the red rectangle (moving window) are averaged and their average value is plotted as smoothed data point 3 (green squares). The rectangle is then moved one point to the right, points 2 through 6 are averaged, and the average is plotted as smoothed data point 4, and so on. This procedure is called a 5-point unweighted smooth.
The signal-to-noise ratio may be further enhanced by increasing the filter width or by smoothing the data multiple times. Obviously after each filter pass the first n and the last n points are lost.
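The 5-point unweighted smooth, and the loss of n points at each end per pass, can be sketched in a few lines of Python. This is only a minimal illustration: the function name `moving_average` and the sample data are made up for the example and are not part of the applet.

```python
def moving_average(y, n):
    """Unweighted moving average with filter width 2n+1.

    Returns len(y) - 2n points: the first n and the last n points are lost.
    """
    width = 2 * n + 1
    if len(y) < width:
        raise ValueError("need at least 2n+1 points")
    # Average the window centred on each point k that has n neighbours on both sides.
    return [sum(y[k - n:k + n + 1]) / width for k in range(n, len(y) - n)]


raw = [2.0, 4.0, 3.0, 5.0, 6.0, 4.0, 7.0, 8.0, 6.0, 9.0]  # made-up noisy data
once = moving_average(raw, 2)    # 5-point smooth: 10 points -> 6 points
twice = moving_average(once, 2)  # second pass: 6 points -> 2 points

print(len(raw), len(once), len(twice))  # 10 6 2
print(once[0])                          # 4.0, the average of the first five raw points
```

Here `once[0]` corresponds to smoothed data point 3 in the description above, because the first two points have no complete window.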
The results of this technique are deceptively impressive because of excessive filtering. In fact, information is lost and/or distorted because too much statistical weight is given to points that are well removed from the central point. The moving average algorithm is particularly damaging when the filter passes through peaks that are narrow compared to the filter width.
Savitzky-Golay algorithm
A much better procedure than simply averaging points is to perform a least squares fit of a small set of consecutive data points to a polynomial and take the calculated central point of the fitted polynomial curve as the new smoothed data point.
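As a rough sketch of this idea, assume a quadratic polynomial and a symmetric window of 2n+1 equidistant points with positions x = -n, …, n. The odd power sums of x then vanish, the normal equations simplify, and the fitted value at the window centre becomes a fixed weighted sum of the data. The function names below are hypothetical, and exact fractions are used only to make the weights easy to read.

```python
from fractions import Fraction


def quadratic_center_weights(n):
    """Weights w_i (i = -n..n) such that sum(w_i * y_{k+i}) is the value at the
    window centre of the least-squares quadratic through the 2n+1 points."""
    xs = range(-n, n + 1)
    s0 = 2 * n + 1
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    # Fitting y = a0 + a1*x + a2*x^2 on symmetric x gives
    # a0 = (s4*sum(y) - s2*sum(x^2 * y)) / (s0*s4 - s2^2), and a0 is the centre value.
    det = s0 * s4 - s2 * s2
    return [Fraction(s4 - s2 * x * x, det) for x in xs]


def fitted_center(window):
    """Centre value of the least-squares quadratic through an odd-length window."""
    n = len(window) // 2
    return sum(w * y for w, y in zip(quadratic_center_weights(n), window))


print(quadratic_center_weights(2))       # weights -3/35, 12/35, 17/35, 12/35, -3/35
print(fitted_center([1, 4, 9, 16, 25]))  # 9: an exact quadratic is reproduced unchanged
```

Because the weights depend only on the window size and the polynomial degree, they need to be computed once, not once per data point.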
Savitzky and Golay (see A. Savitzky and M. J. E. Golay, Anal. Chem., 1964, 36, 1627) showed that a set of integers (A_{-n}, A_{-(n-1)}, …, A_{n-1}, A_n) could be derived and used as weighting coefficients to carry out the smoothing operation. The use of these weighting coefficients, known as convolution integers, turns out to be exactly equivalent to fitting the data to a polynomial, as just described, and it is computationally more efficient and much faster. Therefore, the smoothed data point (y_k)_s by the Savitzky-Golay algorithm is given by the following equation:
$$(y_k)_s = \frac{\sum_{i=-n}^{n} A_i \, y_{k+i}}{\sum_{i=-n}^{n} A_i}$$
Many sets of convolution integers can be used depending on the filter width and the polynomial degree. Typical sets of these integers for “quadratic smooth” are shown in the table below:
| i | 2n+1 = 11 | 2n+1 = 9 | 2n+1 = 7 | 2n+1 = 5 |
|----|-----|-----|----|----|
| -5 | -36 |     |    |    |
| -4 | 9   | -21 |    |    |
| -3 | 44  | 14  | -2 |    |
| -2 | 69  | 39  | 3  | -3 |
| -1 | 84  | 54  | 6  | 12 |
| 0  | 89  | 59  | 7  | 17 |
| 1  | 84  | 54  | 6  | 12 |
| 2  | 69  | 39  | 3  | -3 |
| 3  | 44  | 14  | -2 |    |
| 4  | 9   | -21 |    |    |
| 5  | -36 |     |    |    |
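A quadratic smooth with the tabulated convolution integers might look like the following sketch. The integers are copied from the table above; the dictionary and function names are illustrative, and the weighted sum is divided by the sum of the integers for the chosen filter width.

```python
# Convolution integers for the quadratic smooth, copied from the table above,
# keyed by filter width 2n+1.
CONVOLUTION_INTEGERS = {
    5: [-3, 12, 17, 12, -3],
    7: [-2, 3, 6, 7, 6, 3, -2],
    9: [-21, 14, 39, 54, 59, 54, 39, 14, -21],
    11: [-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36],
}


def savitzky_golay(y, width):
    """Quadratic Savitzky-Golay smooth; returns len(y) - 2n points."""
    coeffs = CONVOLUTION_INTEGERS[width]
    n = width // 2
    norm = sum(coeffs)  # 35, 21, 231 and 429 for widths 5, 7, 9 and 11
    return [
        sum(a * v for a, v in zip(coeffs, y[k - n:k + n + 1])) / norm
        for k in range(n, len(y) - n)
    ]


data = [x * x for x in range(10)]  # an exact parabola, used as a sanity check
print(savitzky_golay(data, 5))     # the interior points 4, 9, ..., 49 come back unchanged
```

A parabola passes through the filter unchanged because the fitted quadratic coincides with the data; a moving average of the same width would raise every one of these points.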
Sets of convolution integers can also be used to obtain directly, instead of the smoothed signal, its 1st, 2nd, …, mth order derivative. The Savitzky-Golay algorithm is therefore very useful for calculating the derivatives of noisy signals consisting of discrete, equidistant points.
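As an illustration, the first derivative from a quadratic fit can be sketched as follows. The derivative integers are not listed in this text; the sketch assumes the standard least-squares result that the slope of the fitted quadratic at the window centre is the sum of i·y_{k+i} divided by the sum of i², which for a 5-point window gives the integers -2, -1, 0, 1, 2 with normalising factor 10. The function name and the `spacing` parameter are hypothetical.

```python
def sg_first_derivative(y, n, spacing=1.0):
    """First derivative at each interior point, taken from a least-squares
    quadratic fitted to 2n+1 equidistant points (spacing = distance between points)."""
    offsets = range(-n, n + 1)
    norm = sum(i * i for i in offsets)  # 10 for a 5-point window
    return [
        sum(i * y[k + i] for i in offsets) / (norm * spacing)
        for k in range(n, len(y) - n)
    ]


y = [x * x for x in range(8)]     # y = x^2, so dy/dx = 2x
print(sg_first_derivative(y, 2))  # [4.0, 6.0, 8.0, 10.0] for x = 2, 3, 4, 5
```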
The smoothing effect of the Savitzky-Golay algorithm is less aggressive than that of the moving average, and the loss and/or distortion of vital information is comparatively limited. However, it should be stressed that both algorithms are “lossy”, i.e. part of the original information is lost or distorted. This type of smoothing has only cosmetic value.
Ensemble Average
In ensemble averaging, successive sets of data are collected and summed point by point. A prerequisite for applying this method is therefore the ability to reproduce the signal as many times as possible, always starting from the same data point. This is in contrast to the previous two algorithms, which operate exclusively on a single data set.
A typical application of ensemble averaging is found in NMR and FT-IR spectroscopy, where the final spectrum is the result of averaging thousands of individual spectra. This is the only way to obtain a meaningful signal when a single scan generates a practically unreadable signal heavily contaminated with random noise.
Repeated addition of noisy signals tends to emphasize their systematic characteristics and to cancel out any zero-mean random noise. If (SNR)_o is the original signal-to-noise ratio of the signal, the final (SNR)_f after N repetitions (scans) is given by the following equation:
$$(\mathrm{SNR})_f = (\mathrm{SNR})_o \sqrt{N}$$
Therefore, because the noise level falls as the square root of the number of scans, averaging 100 (or 10,000) data sets achieves a 10-fold (or a 100-fold) reduction of the noise level.
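The square-root behaviour can be checked with a small simulation. This is a sketch under stated assumptions: the Gaussian test peak, the noise level, the fixed random seed and all names are invented for the illustration and do not come from the applet.

```python
import math
import random


def ensemble_average(scans):
    """Point-by-point average of equally long scans."""
    count = len(scans)
    return [sum(points) / count for points in zip(*scans)]


def rms(values):
    """Root-mean-square value, used here as the noise level."""
    return math.sqrt(sum(v * v for v in values) / len(values))


random.seed(0)  # fixed seed, only so that the run is repeatable
n_points, n_scans, sigma = 1000, 100, 1.0
signal = [math.exp(-((i - 500) / 20.0) ** 2) for i in range(n_points)]  # hypothetical peak
scans = [[s + random.gauss(0.0, sigma) for s in signal] for _ in range(n_scans)]

averaged = ensemble_average(scans)
noise_single = rms([a - s for a, s in zip(scans[0], signal)])
noise_avg = rms([a - s for a, s in zip(averaged, signal)])
print(noise_single / noise_avg)  # should come out close to sqrt(100) = 10
```

The ratio is only approximately 10, since the noise levels are themselves estimated from a finite number of points.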
Applet
This applet is a demonstration of the aforementioned signal smoothing algorithms. The user has several options. Up to four different types (F1, F2, F3, F4) of signals consisting of 1000 data points can be selected. Up to three levels of zero-mean normally distributed random noise can be added to the normal (noiseless) signal. Three filter widths (5, 7, 9 points) can be selected for the moving average and Savitzky-Golay algorithms.
The smoothed signal appears in black, and the user has the option to display the normal (noiseless) signal and/or the original noisy signal in the plot area in faint blue and red, respectively, for comparison.
It is of interest to observe the difference between the moving average and Savitzky-Golay algorithms using the signal F3. This signal consists of seven Gaussian peaks of equal height with progressively decreasing width. With successive applications (passes) of the moving average, the height of the narrower peaks decreases, i.e. a crucial piece of information (the height) is distorted. In sharp contrast to the moving average, the Savitzky-Golay algorithm “respects” this information and the degradation of the signal is limited.
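A comparison of this kind can be imitated outside the applet with the sketch below. The signal is only a hypothetical stand-in for F3: the peak positions and widths are invented, no noise is added, and the 5-point quadratic integers are taken from the table in the Theory section.

```python
import math

MOVING_AVERAGE_5 = [1, 1, 1, 1, 1]
SAVITZKY_GOLAY_5 = [-3, 12, 17, 12, -3]  # quadratic smooth, filter width 5


def smooth(y, coeffs, passes=1):
    """Apply a normalised convolution filter `passes` times.

    Each pass drops n points at both ends, so the output is shorter than the input.
    """
    n = len(coeffs) // 2
    norm = sum(coeffs)
    for _ in range(passes):
        y = [sum(a * v for a, v in zip(coeffs, y[k - n:k + n + 1])) / norm
             for k in range(n, len(y) - n)]
    return y


# Hypothetical stand-in for F3: seven unit-height Gaussians of decreasing width.
centres = [100 + 140 * j for j in range(7)]
sigmas = [20.0, 14.0, 10.0, 7.0, 5.0, 3.0, 2.0]
signal = [sum(math.exp(-0.5 * ((i - c) / s) ** 2) for c, s in zip(centres, sigmas))
          for i in range(1000)]

passes, n = 3, 2
ma = smooth(signal, MOVING_AVERAGE_5, passes)
sg = smooth(signal, SAVITZKY_GOLAY_5, passes)
shift = passes * n  # index offset caused by the points lost at the left edge

for c, s in zip(centres, sigmas):
    print(f"sigma={s:4.1f}  moving average: {ma[c - shift]:.3f}  "
          f"Savitzky-Golay: {sg[c - shift]:.3f}")
```

The printed heights should stay near 1 for the broad peaks with both filters, while for the narrowest peaks the moving average falls well below the Savitzky-Golay result.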
The ensemble average procedure is best demonstrated with the signal F2, which simulates the very narrow peaks encountered in NMR spectra. When high-level noise is added, this signal becomes almost unreadable and only the higher peaks are discernible. By applying the ensemble average and increasing the number of scans, the signal gradually emerges and even the smaller peaks can be safely recognized.