
FAQ

Here are the answers to some questions that I have been asked several times (I hope this helps):
Why does the log-likelihood take positive values?
For Gaussian state conditional densities, the likelihood is a value of a probability density function (pdf) and should thus not be interpreted as a probability: it can take any positive value, not only values smaller than 1. For such models, obtaining log-likelihood values greater than 0 is certainly not a bug. To see this, suppose you divide your observations by, say, 10, and modify the parameters of the model accordingly (divide the means by 10 and the variances by 10^2). The likelihood is then multiplied by 10^(T*p), i.e., the log-likelihood is increased by T*p*log(10), where T is the number of observations and p their dimension.
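The following lines (a minimal sketch in plain MATLAB/Octave, not part of H2M) illustrate this scaling argument for a zero-mean spherical Gaussian model:

  T = 100; p = 2;
  x = randn(T, p);                        % zero-mean data, unit variance
  v = 1;                                  % spherical model variance
  l1 = sum(-0.5*p*log(2*pi*v) - sum(x.^2, 2)/(2*v));
  v2 = v/10^2;                            % variance divided by 10^2...
  l2 = sum(-0.5*p*log(2*pi*v2) - sum((x/10).^2, 2)/(2*v2)); % ...data by 10
  disp(l2 - l1 - T*p*log(10))             % ~0 up to rounding errors

For this choice of T and p, l2 will typically be positive although both log-likelihoods describe exactly the same fit.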
Does H2M have limits on the dimension or on the scaling of the data?
The dynamic range of the Gaussian pdf increases steadily with the dimension, so that in high dimension, computing the value of the Gaussian pdf far from the mean vector (relative to the covariance matrix) is likely to return 0. This does not prevent the use of the H2M tools in high dimension (I have already used them in dimension 40 or more); the only requirement is that the Gaussian parameters should not be initialized too far from ``plausible values''. Remember that exp(-40^2/2) is already rounded to zero in double precision, so this type of problem does indeed occur. It usually shows up as the message ``Warning: Divide by zero'' caused by the following line of hmm_fb (or the equivalent line in mix_post):
  alpha(i,:) = alpha(i,:) / scale(i);
because all values of alpha for some time index i are null. If this occurs, check your initialization (are the mean vectors centered on the data?) and try increasing the variances (this is especially useful if you have outliers). The short example below illustrates the underflow.
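For reference, here is a small standalone illustration (plain MATLAB/Octave, independent of H2M's internals) of the underflow and of the usual log-domain workaround:

  % Mahalanobis distance of 40 from the mean, as mentioned above:
  disp(exp(-40^2/2))                    % prints 0: exp(-800) underflows
  % Working with log-densities instead remains perfectly stable:
  p = 40;                               % dimension (unit covariance assumed)
  logpdf = -0.5*p*log(2*pi) - 40^2/2;   % finite log-density value
  disp(logpdf)

When every state yields a pdf value of exactly zero at some time index, the corresponding row of alpha is null and the scaling division above fails.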
Does hmm_mint work well?
It is important to understand that hmm_mint does not correspond to an "optimal" initialization (if such a thing exists). It is just a heuristic commonly used in speech processing, which consists of chopping each parameter sequence into N segments of equal length, where N is the number of states (see the sketch below). It will clearly not work if you have many states, many allowed transitions (i.e., many entries of the transition matrix not initialized to zero), and few training sequences.
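As an illustration, the segmentation heuristic for the mean vectors boils down to something like the following (hypothetical code, not hmm_mint's actual implementation):

  X = randn(100, 3);                    % stands for a T x p observation sequence
  N = 4;                                % number of states
  [T, p] = size(X);
  edges = round(linspace(0, T, N+1));   % boundaries of N equal-length segments
  mu = zeros(N, p);
  for j = 1:N
    mu(j, :) = mean(X(edges(j)+1:edges(j+1), :), 1); % mean of segment j
  end

With many states and few (or heterogeneous) training sequences, these segment means have no reason to be close to the actual state conditional means.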
Is it possible to use quantized observations?
No, H2M implements continuous observations as described in section 6.6 of [3]. The so-called ``discrete HMM'', which requires prior quantization of the data using vector quantization (this is the model considered in section 6.4 of [3] as well as in the first part of the tutorial by Rabiner [4]), cannot be implemented using H2M (note that these models are rarely used nowadays).
Is it possible to use mixture conditional densities?
Not directly. If you think about it, though, you will realize that an HMM with N states and mixtures of K Gaussian densities as state conditional distributions is equivalent to an HMM with N*K states with some constraints on the transition matrix (see the sketch below). There are, however, two limitations to using H2M for that purpose: (1) you will have to modify the EM re-estimation formulas to take the constraints on the transition matrix into account (this should not be too difficult); (2) you will rapidly have to deal with huge transition matrices.
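To make the equivalence concrete, here is a minimal sketch (the names A, w, N, K are assumptions for this example, not H2M conventions): if A is the N x N transition matrix and w the N x K matrix of mixture weights, the equivalent N*K-state chain moves to the ``big'' state (j,k) with probability A(i,j)*w(j,k), whatever the current component:

  N = 3; K = 2;
  A = [0.9 0.1 0.0; 0.0 0.8 0.2; 0.1 0.0 0.9];  % N x N transition matrix
  w = [0.5 0.5; 0.3 0.7; 0.6 0.4];              % N x K mixture weights
  Abig = kron(A, ones(K)) .* repmat(reshape(w', 1, N*K), N*K, 1);
  disp(sum(Abig, 2))                            % each row sums to one

The constraint alluded to above is visible in Abig: the K rows corresponding to the same original state i are identical, and this tying is what the modified re-estimation formulas must preserve.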
Why do the covariance matrices become ill-conditioned?
Problems with covariance matrices nearly always stem from having too many HMM states compared to the available training data. In these conditions, outlier states associated with only one (or a few) data points may appear during training (a problem also experienced with vector quantization). Using fewer states and/or more training data usually solves the problem. You may also switch to diagonal covariance matrices if you are currently using full ones. In any case, you should read section 2.5.2 on heuristics that may prevent this problem from happening; one simple safeguard of this kind is sketched below.
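A common safeguard is a variance floor, sketched here for diagonal covariances (illustrative code only, not necessarily one of the heuristics of section 2.5.2, and the variable names are assumptions for this example):

  X = randn(200, 4);              % stands for the T x p training data
  Sigma = rand(5, 4);             % stands for N x p per-state diag. variances
  vfloor = 1e-3 * mean(var(X));   % floor tied to the overall data scale
  Sigma = max(Sigma, vfloor);     % clamp each variance from below

Clamping the variances after each EM iteration keeps the state conditional densities from collapsing onto isolated data points.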



Olivier Cappé, Aug 24 2001