


Implementation issues

Initialization

Initialization plays an important role in iterative algorithms such as EM. Usually the choice of the initialization point strongly depends on the application considered. I use only two very basic methods for initializing the parameters:
For left-right models
hmm_mint initializes all the model parameters using a uniform ``hard'' segmentation of each data sequence: each sequence is split into N consecutive sections, where N is the number of states in the HMM, and the vectors thus associated with each state are used to obtain initial parameters for the state-conditional distributions.
For mixture models
svq implements a binary splitting vector quantization algorithm. This usually provides efficient initial estimates of the parameters of the Gaussian densities. Note that svq uses the unweighted Euclidean distance as performance criterion. If the components of the input vectors are strongly correlated and/or of very different magnitudes, it is preferable to apply svq to the vectors $ \boldsymbol{\Phi}\mathbf{x}_t$, where $ \boldsymbol{\Phi}$ is the Cholesky factor associated with $ {\rm Cov}^{-1}(\mathbf{x})$, i.e. $ \boldsymbol{\Phi}'\boldsymbol{\Phi} = {\rm Cov}^{-1}(\mathbf{x})$ (a small sketch of this whitening step is given after this list).
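
As a rough illustration, this whitening could be performed as follows before running the vector quantization (a sketch only: it assumes that X is a T-by-p matrix with one observation vector per row, which is not necessarily the calling convention expected by svq):

Phi = chol(inv(cov(X)));  % Cholesky factor such that Phi'*Phi = inv(Cov(x))
Xw = X * Phi';            % rows of Xw are the whitened vectors Phi*x_t
                          % svq can then be applied to Xw instead of X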


Modifications of the EM recursions

It is common practice to introduce some modifications of the EM algorithm in order to avoid known pitfalls. The fact that the likelihood becomes infinite for singular covariance matrices is probably the problem most often encountered in practice. Solutions include thresholding the individual variance coefficients in the diagonal case, or adding a constant diagonal matrix at each iteration. This is certainly useful, particularly when there is ``not enough'' training data (compared to the complexity of the model). There is however a risk that such modifications alter the properties of the EM algorithm; in particular, the likelihood may decrease at some iterations. A perhaps more elegant way of handling such problems consists in using priors for the HMM parameters in a Bayesian framework [6].

Most HMM packages, such as HTK (a popular development tool for speech processing applications), use such modifications in order to avoid these problems (the same is true for vector quantization, where heuristics can be introduced to avoid the appearance of singleton clusters). No such modification is used here, but they are easy to code using something like:

for i = 1:n_iter
  [A, logl(i), gamma] = hmm_mest(X, st, A, mu, Sigma);
  [mu, Sigma] = mix_par(X, gamma, DIAG_COV);
  Sigma = Sigma + SMALL_VALUE*ones(size(Sigma)); % EM Modif. (assuming
                                                 % diagonal covariances)
end;
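
The variance thresholding mentioned above is even simpler to add; a one-line sketch, still assuming diagonal covariances stored in Sigma and with VAR_FLOOR a user-chosen constant (not a variable defined by the toolbox):

Sigma = max(Sigma, VAR_FLOOR); % floor each individual variance coefficient
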
Another frequently used solution is to ``share'' some model parameters (i.e. to constrain them to be equal), such as the variances of different states. This usually requires only minor modifications of the EM re-estimation equations [7] - see the function ex_sprec (section 2.4) for an example of variance sharing; a rough sketch of the idea is also given below.
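
As an illustration only (ex_sprec is the reference implementation), a shared diagonal variance can be obtained by pooling the weighted squared deviations over all states; the sketch below assumes that X is T-by-p (one observation vector per row), gamma is T-by-N and mu and Sigma are N-by-p, which may differ from the conventions actually used in the toolbox:

[T, p] = size(X);
N = size(mu, 1);
S = zeros(1, p);
for i = 1:N
  % weighted squared deviations of all frames from the mean of state i
  S = S + gamma(:, i)' * (X - repmat(mu(i, :), T, 1)).^2;
end
Sigma = repmat(S / T, N, 1);  % the same variance vector is used for every state

The normalization by T uses the fact that the a posteriori state probabilities gamma(t,:) sum to one for each frame.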

Finally, the aforementioned HTK toolkit uses a modification of the HMM model which forces the forward-backward recursions to give zero probability to sequences that do not end in the final state (these modified equations are obtained simply by assuming that each observed sequence is terminated by an END_OF_SEQUENCE symbol associated with a terminal state located after the actual final state of the HMM). This does not modify the EM estimates much, except when very few training sequences are available (moreover, this is clearly limited to left-right HMMs).


Computation time and memory usage

Each function is implemented in a rather straightforward way and should be easy to read. In some cases (such as hmm_tran, for instance) the code may be less easy to read because of aggressive ``matlabization'' (vectorization), which helps save computing time. Note that one of the most time-consuming operations is the computation of the Gaussian density values (for all input vectors and all states of the model). For the case I use most frequently (Gaussian densities with diagonal covariance), I have included a mex-file (c_dgaus) which is used in priority if it is found on MATLAB's search path (expect a gain of a factor 5 to 10 on hmm_fb and mix_post). Some routines could easily be made more efficient (hmm_vit for instance) if someone has some time to do so.
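
To check whether the compiled version will actually be picked up, one can verify that the mex-file is visible on the search path, for instance with something like (this only uses the standard exist function):

if exist('c_dgaus') == 3   % exist returns 3 when a mex-file is found
  disp('c_dgaus mex-file found: the compiled density computation will be used');
else
  disp('c_dgaus not found: the plain M-file implementation will be used');
end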

Especially if you are using full covariance matrices, or if you can't compile the mex-file c_dgaus, these routines can be made much faster by using the MATLAB compiler mcc (if you paid for it): I apologize if this sounds like a plain commercial, but execution time gets approximately divided by 10 on functions such as hmm_fb when using mcc. For the four-component 2-D mixture model (last example in script ex_basic.m) with 50 000 (fifty thousand) observation vectors, the execution time per EM iteration (on an old-fashioned 1998 SUN SPARC workstation) was: 3 seconds when using diagonal covariances with the mex-file c_dgaus; 3 minutes when using full covariances; 10 seconds for full covariance matrices when the files mix_post and mix_par had been compiled with MATLAB's mcc. Only the ratios between these figures are really of interest, since everything runs faster on modern computers (on a Pentium III 1 GHz PC running Linux, the full covariance case, without compilation, boils down to 25 seconds).

Users of the MATLAB compiler should compile in priority the file gauseval (computation of the Gaussian densities), which represents the main computational load in many of the routines. Compiling the high-level functions like mix, hmm (and vq) fails because I used variable names ending with a trailing underscore to denote the updated parameters (sorry for that!). It wouldn't be very useful anyway, since only the compilation of the low-level functions significantly speeds up the computation. Note that functions compiled with mcc can't handle sparse matrices, which is a problem for left-right HMMs (for this reason, I don't recommend compiling a function like hmm_fb). Finally, in version 5.2 (the first MATLAB V5 version on which the compiler actually runs) it is possible to use compilation pragmas (or to specify them as arguments of mcc). Using the pragma #inbounds is an absolute requirement if you want to obtain any gain in execution time and to avoid memory overflows (#realonly also helps, but to a much lesser extent; moreover, mcc does not seem to handle it properly in the version I use, which is 5.2.0.3084). I have included these pragmas when possible in gauseval.m, gauslogv.m, mix_par.m and mix_post.m.
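
For reference, these pragmas are just specially formatted comments placed in the body of the M-file (the sketch below uses a hypothetical function, for illustration only; see gauseval.m for how they are actually used):

function y = my_dens_kernel(x)  % hypothetical low-level function
%#inbounds
%#realonly
% the two pragmas above tell the compiler that array indices stay within
% bounds and that all variables are real-valued
y = exp(-0.5 * sum(x.^2, 1));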

Memory space is also a factor to take into account: typically, using more than 50 000 training vectors of dimension 20 with HMMs of size 30 is likely to cause problems on most computers. Usually, the largest matrices are alpha and beta (forward and backward probabilities), gamma (a posteriori distribution of the states) and dens (values of the Gaussian densities). The solution would consist in reading the training data from disk files in blocks... but this is another story!
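
A back-of-the-envelope computation gives the orders of magnitude involved (a sketch only, assuming that the four matrices mentioned above are all T-by-N and stored in double precision):

T = 50000;                      % number of training vectors
N = 30;                         % number of HMM states
bytes_per_matrix = T * N * 8;   % alpha, beta, gamma and dens (8 bytes per entry)
total_MB = 4 * bytes_per_matrix / 2^20  % roughly 46 MB, not counting the data itself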



Olivier Cappé, Aug 24 2001