Edinburgh Speech Tools  2.1-release
doc/estsigpr.md
1 Signal Processing {#estsigpr}
2 ========================
3 
4 The EST signal processing library provides a set of standard
5 signal processing tools designed specifically for speech
6 analysis. The library includes:
7 
8  - Windowing (creating frames from a continuous waveform)
9  - Linear prediction and associated operations
10  - Cepstral analysis, via both LPC and DFT
11  - Filterbank analysis
12  - Frequency warping including mel-scaling
13  - Pitch tracking
14  - Energy and power analysis
15  - Spectrogram generation
16  - Fourier transforms
17  - Pitchmarking (of laryngograph signals)
18 
19 # Overview {#estsigproverview}
20 
21 ## Design Issues {#estsigprdesign}
22 
23 The signal processing library is designed specifically for speech
24 applications and hence all functions are written with that end
25 goal in mind. The design of the library has centered on
26 building a set of commonly used, easy-to-configure analysis
27 routines.
28 
29  - **Speed**: We have tried to make the functions as fast as
30  possible. Signal processing is often time critical, and
31  code that implements a particular signal processing
32  algorithm in a single hand-written function loop will
33  generally run faster than code assembled from library
34  calls.
35 
36  However, the signal processing routines in the EST library
37  are in general very fast, and the fact that they use
38  classes such as EST_Track and EST_FVector does not make
39  them slower than they would be if raw `float *` arrays were used.
40 
41  - **Types**: The library makes heavy use of a small number of
42  classes, specifically EST_Wave, EST_Track and EST_FVector. These
43  classes are basically arrays and matrices, but take care of
44  issues such as memory management, error handling and file I/O. Using
45  these classes in the library helps facilitate clean and simple
46  algorithm writing and use. It is strongly recommended that
47  you gain familiarity with these classes before using this
48  part of the library.
49 
50  At present, the issue of complex numbers in signal
51  processing is somewhat fudged, in that a vector of complex
52  numbers is represented by a vector of real parts and a
53  vector of imaginary parts, rather than as a single vector
54  of complex numbers.
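
The split representation described above can be illustrated with a small stand-alone sketch. It deliberately avoids the EST classes so that it compiles on its own: `std::vector<float>` stands in for EST_FVector, and `naive_dft` and `magnitude` are hypothetical helper names, not library functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: naive O(N^2) DFT of a real signal.
// The complex spectrum is held as two parallel vectors, real[k] and
// imag[k], rather than as one vector of complex values.
void naive_dft(const std::vector<float> &signal,
               std::vector<float> &real,
               std::vector<float> &imag)
{
    const std::size_t n = signal.size();
    // The caller is expected to have sized the outputs already.
    assert(real.size() == n && imag.size() == n);

    const double two_pi = 2.0 * std::acos(-1.0);
    for (std::size_t k = 0; k < n; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = two_pi * double(k) * double(t) / double(n);
            re += signal[t] * std::cos(angle);
            im -= signal[t] * std::sin(angle);
        }
        real[k] = float(re);
        imag[k] = float(im);
    }
}

// Magnitude of bin k, computed from the two parallel vectors.
float magnitude(const std::vector<float> &real,
                const std::vector<float> &imag, std::size_t k)
{
    assert(k < real.size() && k < imag.size());
    return std::sqrt(real[k] * real[k] + imag[k] * imag[k]);
}
```

Any operation on the spectrum has to read both vectors at the same index, which is the cost of not having a single complex vector type.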
55 
56 ## Common Processing model {#estsigprcommonprocessing}
57 
58 In speech, a large number of algorithms follow the same basic
59 model, in which a waveform is analysed by an algorithm and a
60 Track containing a series of time-aligned vectors is
61 produced. Regardless of the type of signal processing, the
62 basic model is as follows:
63 
64  1. Start with a waveform and a series of analysis positions, which
65  can be a fixed distance apart or specified by some other means.
66  2. For each analysis position, define a small portion of the
67  waveform around that position. Multiply this by a
68  windowing function to produce a vector of speech samples.
69  3. Pass this vector to a frame based signal processing
70  routine, which outputs values in another vector.
71  4. Add this output vector to the position in an EST_Track
72  which corresponds to the analysis time position.
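
The four steps can be sketched in stand-alone C++. This is an illustration of the model only, not library code: `SimpleTrack`, `hamming_frame`, `frame_energy` and `analyse` are hypothetical names, `std::vector<float>` stands in for EST_Wave and EST_FVector, and the choice of a Hamming window, a fixed frame shift and mean energy as the frame routine are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for a track: one time stamp and one coefficient vector per frame.
struct SimpleTrack {
    std::vector<float> times;                // analysis positions in seconds
    std::vector<std::vector<float>> frames;  // one output vector per position
};

// Step 2: take `length` samples centred on `centre` and apply a Hamming
// window. Samples that fall outside the waveform are treated as zero.
std::vector<float> hamming_frame(const std::vector<float> &wave,
                                 long centre, std::size_t length)
{
    assert(length >= 2);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<float> frame(length, 0.0f);
    const long start = centre - long(length / 2);
    for (std::size_t i = 0; i < length; ++i) {
        const long pos = start + long(i);
        if (pos < 0 || pos >= long(wave.size()))
            continue;
        const double w =
            0.54 - 0.46 * std::cos(two_pi * double(i) / double(length - 1));
        frame[i] = float(wave[std::size_t(pos)] * w);
    }
    return frame;
}

// Step 3: an example frame based routine, here the mean energy of a frame.
void frame_energy(const std::vector<float> &frame, std::vector<float> &out)
{
    assert(out.size() == 1 && !frame.empty());
    double sum = 0.0;
    for (float s : frame)
        sum += double(s) * s;
    out[0] = float(sum / double(frame.size()));
}

// Steps 1 and 4: visit fixed-shift analysis positions and fill the track.
SimpleTrack analyse(const std::vector<float> &wave, int sample_rate,
                    std::size_t frame_length, std::size_t shift)
{
    assert(sample_rate > 0 && shift > 0);
    SimpleTrack track;
    for (std::size_t centre = 0; centre < wave.size(); centre += shift) {
        std::vector<float> frame = hamming_frame(wave, long(centre), frame_length);
        std::vector<float> coefs(1);
        frame_energy(frame, coefs);
        track.times.push_back(float(centre) / float(sample_rate));
        track.frames.push_back(coefs);
    }
    return track;
}
```

Swapping `frame_energy` for another frame routine changes the analysis without touching the loop, which is the point of the common model.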
73 
74 Given this model, the signal processing library breaks down into a
75 number of different types of function:
76 
77  - **Utterance based functions**: Functions which operate on an entire waveform or
78  track. These break down into:
79      - **Analysis Functions**: which take a waveform and produce a track
80      - **Synthesis Functions**: which take a track and produce a waveform
81      - **Filter Functions**: which take a waveform and produce a waveform
82      - **Conversion Functions**: which take a track and produce a track
83  - **Frame based functions**: Functions which operate on a single frame of speech or
84  a vector of coefficients.
85  - **Windowing functions**: which create a windowed frame of speech from a portion
86  of a waveform.
87 
88 Nearly all functions in the signal processing library belong to
89 one of the above listed types. Quite often functions are
90 provided at both the utterance and the frame level. For example,
91 there is a function called \ref sig2lpc which
92 takes a single frame of windowed speech and produces a set of
93 linear prediction coefficients. There is also a function called
94 \ref sig2coef which performs linear prediction
95 on a whole waveform, returning the answer in a
96 Track. \ref sig2coef uses the common processing
97 model, and calls \ref sig2lpc as the algorithm
98 in the loop.
99 
100 Partly for historical reasons, some functions,
101 e.g. \ref pda, are only available in the
102 utterance based form.
103 
104 When writing signal processing code for this library, it is
105 often the case that all that needs to be written is the frame
106 based algorithm, as existing library functions can do the frame
107 shifting and windowing operations.
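
A minimal sketch of this division of labour, assuming a generic driver that accepts any frame based routine. The names `FrameFunction`, `run_over_wave` and `frame_peak` are hypothetical and do not correspond to library functions. Windowing is left out (a rectangular window) to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// A frame based algorithm: reads one frame, writes into a pre-sized output.
using FrameFunction =
    std::function<void(const std::vector<float> &, std::vector<float> &)>;

// Hypothetical generic driver: performs the frame shifting once, for any
// frame based algorithm. Only complete frames are processed.
std::vector<std::vector<float>> run_over_wave(const std::vector<float> &wave,
                                              std::size_t frame_length,
                                              std::size_t shift,
                                              std::size_t num_outputs,
                                              const FrameFunction &algorithm)
{
    assert(frame_length > 0 && shift > 0);
    std::vector<std::vector<float>> track;
    for (std::size_t start = 0; start + frame_length <= wave.size();
         start += shift) {
        const auto first = wave.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(frame_length);
        std::vector<float> frame(first, last);
        std::vector<float> out(num_outputs, 0.0f);
        algorithm(frame, out);
        track.push_back(out);
    }
    return track;
}

// The only part a new analysis has to supply: peak absolute amplitude.
void frame_peak(const std::vector<float> &frame, std::vector<float> &out)
{
    assert(out.size() == 1);
    float peak = 0.0f;
    for (float s : frame)
        peak = std::max(peak, std::fabs(s));
    out[0] = peak;
}
```

A call such as `run_over_wave(wave, 4, 4, 1, frame_peak)` then yields one peak value per frame.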
108 
109 
110 ## Track Allocation, Frames, Channels and sub-tracks {#estsigprtrackalloc}
111 
112 The signal processing library makes extensive use of the
113 advanced features of the track class, specifically the ability
114 to access single frames and channels.
115 
116 Given a standard multi-channel track, it is possible to make
117 an EST_FVector point to any single frame or channel. This is done
118 by an internal pointer mechanism in EST_FVector. Furthermore,
119 a track can be made to point to a selected number of channels
120 or frames in a main track.
121 
122 For example, imagine we have a function that calculates the
123 covariance matrix for a multi-dimensional track of data, but
124 the data we actually have contains energy, cepstra and delta
125 cepstra. It is nonsensical to calculate covariance on
126 all of this; we just want the cepstra. To do this we use the
127 sub-track facility to set a temporary track to just the
128 cepstral coefficients and pass this into the covariance
129 function. The temporary track has smart pointers into the
130 original track and hence no data is copied.
131 
132 Without this facility, either you would have to do a copy
133 (expensive) or else tell the covariance function which part of
134 the track to use (hacky).
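
The idea of a sub-track that shares storage with its parent can be sketched without the EST classes. The types below are hypothetical stand-ins (they are not EST_Track or its sub-track interface), and a per-channel mean replaces the covariance calculation to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a track: frames x channels, stored frame by frame.
struct SimpleTrack {
    std::size_t frames;
    std::size_t channels;
    std::vector<float> data;

    SimpleTrack(std::size_t f, std::size_t c)
        : frames(f), channels(c), data(f * c, 0.0f) {}

    float &at(std::size_t f, std::size_t c) { return data[f * channels + c]; }
};

// Non-owning view onto a contiguous range of channels of a SimpleTrack.
// It holds a pointer and a stride, so creating it copies no samples.
struct SubTrack {
    float *base;          // first selected channel of the first frame
    std::size_t stride;   // distance between frames in the parent
    std::size_t frames;
    std::size_t channels;

    float &at(std::size_t f, std::size_t c) const
    {
        assert(f < frames && c < channels);
        return base[f * stride + c];
    }
};

SubTrack sub_track(SimpleTrack &parent, std::size_t first, std::size_t count)
{
    assert(first + count <= parent.channels);
    return SubTrack{parent.data.data() + first, parent.channels,
                    parent.frames, count};
}

// Example consumer: it sees only the channels it was given and needs no
// knowledge of where they sit in the parent track.
std::vector<float> channel_means(const SubTrack &t)
{
    assert(t.frames > 0);
    std::vector<float> means(t.channels, 0.0f);
    for (std::size_t f = 0; f < t.frames; ++f)
        for (std::size_t c = 0; c < t.channels; ++c)
            means[c] += t.at(f, c);
    for (float &m : means)
        m /= float(t.frames);
    return means;
}
```

Because the view aliases the parent's storage, a write through the view is visible in the parent, and the view must not outlive the track it points into.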
135 
136 Extensive documentation describing this process is found in \ref sigpr-example-frames,
137 \ref tr_example_access_multiple_frames and \ref tr_example_access_single_frames.
138 
139 # Functions {#estsigprfunctions}
140 
141 ## Functions for Generating Frames {#est-sigpr-generating-frames}
142 
143 The following set of functions perform either a signal
144 processing operation on a single frame of speech to produce a set of
145 coefficients, or a transformation on an existing set of coefficients
146 to produce a new set. In most cases, the first argument to the
147 function is the input, and the second is the output. It is assumed
148 that any input speech frame has already been windowed with an
149 appropriate windowing function (e.g. Hamming); see
150 \ref "Windowing mechanisms" for how to produce such a frame. See also
151 \ref sigpr-track-func.
152 
153 It is also assumed that the output vector is of the correct size. No
154 resizing is done in these functions as the incoming vectors may be
155 subvectors of whole tracks etc. In many cases (e.g. LPC analysis), an
156 **order** parameter is required. This is usually derived from the size
157 of the input or output vectors, and hence is not passed explicitly.
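
The convention of deriving the order from a pre-sized output vector can be sketched as follows. This is a stand-alone illustration using the autocorrelation method with the Levinson-Durbin recursion. It is not the signature or implementation of the library's LPC routine. `lpc_from_frame` is a hypothetical name, `std::vector<float>` stands in for EST_FVector, and storing a leading 1.0 in `coefs[0]` (so that the order is `coefs.size() - 1`) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame function. The prediction error filter is
// A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, with p taken from coefs.size().
// The output is never resized: the caller may be passing a sub-vector.
void lpc_from_frame(const std::vector<float> &frame, std::vector<float> &coefs)
{
    assert(!coefs.empty());
    const std::size_t order = coefs.size() - 1;  // derived, not passed
    assert(frame.size() > order);

    // Autocorrelation r[0..order] of the (already windowed) frame.
    std::vector<double> r(order + 1, 0.0);
    for (std::size_t lag = 0; lag <= order; ++lag)
        for (std::size_t n = lag; n < frame.size(); ++n)
            r[lag] += double(frame[n]) * frame[n - lag];

    // Levinson-Durbin recursion. If the error reaches zero (for example
    // on a silent frame) the remaining coefficients stay at zero.
    std::vector<double> a(order + 1, 0.0);
    std::vector<double> prev(order + 1, 0.0);
    a[0] = 1.0;
    double error = r[0];
    for (std::size_t i = 1; i <= order && error > 0.0; ++i) {
        double acc = r[i];
        for (std::size_t j = 1; j < i; ++j)
            acc += a[j] * r[i - j];
        const double k = -acc / error;  // reflection coefficient
        prev = a;
        for (std::size_t j = 1; j < i; ++j)
            a[j] = prev[j] + k * prev[i - j];
        a[i] = k;
        error *= (1.0 - k * k);
    }

    for (std::size_t j = 0; j <= order; ++j)
        coefs[j] = float(a[j]);
}
```

Calling it with a three-element `coefs` vector requests a second-order analysis; no separate order argument is needed.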
158 
159  - \ref LinearPredictionfunctions
160  - \ref Energyandpowerframefunctions
161  - \ref FastFourierTransformfunctions
162  - \ref Framebasedfilterbankandcepstralanalysis
163 
164 ## Functions for Generating Tracks {#sigpr-track-func}
165 
166 Functions which operate on a whole waveform and generate coefficients
167 for a track.
168 
169  - \ref Functionsforusewithframebasedprocessing
170  - \ref DeltaandAccelerationcoefficients
171  - \ref PitchF0DetectionAlgorithmfunctions
172  - \ref PitchmarkingFunctions
173  - \ref Spectrogramgeneration
174 
175 These functions cover the most common utterance level analyses.
176 
177 ## Functions for Windowing Frames of Waveforms {#est_sigpr_windowing}
178 
179  - \ref EST_Window
180 
181 
182 ## Filter functions {#sigpr-filter}
183 
184 A filter modifies a waveform by changing its frequency
185 characteristics. The following types of filter are currently
186 supported:
187 
188  - **FIR filters**: are general purpose finite impulse
189  response filters which are useful for band-pass, low-pass and
190  high-pass filtering.
191  - **Linear Prediction filters**: are used to produce LP residuals
192  from waveforms and vice versa.
193  - **Pre-emphasis filters**: are simple filters for changing the
194  spectral tilt of a signal.
195  - **Non-linear filters**: miscellaneous filters.
196 
197  - \subpage FIRfilters
198  - \subpage LinearPredictionfilters
199  - \subpage PrePostEmphasisfilters
200  - \subpage Miscellaneousfilters
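
As a rough illustration of the first and third filter types, the sketch below implements a direct-form FIR filter and expresses pre-emphasis as a two-tap FIR. The function names and signatures are hypothetical and are not those of the library's filter functions. Samples before the start of the input are assumed to be zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-form FIR: out[n] = sum over k of taps[k] * in[n - k].
std::vector<float> fir_filter(const std::vector<float> &in,
                              const std::vector<float> &taps)
{
    assert(!taps.empty());
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t n = 0; n < in.size(); ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
            acc += double(taps[k]) * in[n - k];
        out[n] = float(acc);
    }
    return out;
}

// Pre-emphasis: out[n] = in[n] - a * in[n - 1], i.e. an FIR with taps
// {1, -a}. A positive `a` attenuates low frequencies relative to high ones.
std::vector<float> pre_emphasis(const std::vector<float> &in, float a)
{
    return fir_filter(in, std::vector<float>{1.0f, -a});
}
```

With taps `{0.5, 0.5}` the FIR acts as a two-point moving average (a crude low-pass), while `pre_emphasis` with `a` close to 1 approximates a first difference.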
201 
202 ## Filter design {#sigpr-filter-design}
203 
204  - \subpage FilterDesign
205 
206 # Example
207 \subpage sigpr-example
208 
209 # Programs {#sigpr-programs}
210 
211 The following are executable programs which are used for signal
212 processing:
213 
214  - @ref sigfv_manual is used to produce a variety of feature vectors given a
215  waveform.
216  - @ref spectgen_manual is used to produce spectrograms from utterances.
217  - @ref sigfilter_manual performs filtering operations on waveforms.
218  - @ref pda_manual performs pitch detection on waveforms. While sig2fv can also perform pitch
219  detection, pda offers more control over the operation.
220  - @ref pitchmark_manual produces a set of pitchmarks,
221  specifying the instants of glottal closure from laryngograph waveforms.
222 
223 The following programs are also useful in signal processing:
224 
225  - @ref ch_wave_manual performs basic operations on waveforms, such as
226  adding headers, resampling, rescaling, multi to single channel
227  conversion etc.
228  - @ref ch_track_manual performs basic operations on coefficient tracks,
229  such as adding headers, resampling, rescaling, multi to single
230  channel conversion etc.
231 
232 
233