This is an example from Zucchini &
MacDonald’s book on Hidden Markov Models for Time Series (exercise 1.3).

The data is annual counts of earthquakes of
magnitude 7 or greater, which exhibit both overdispersion relative to a Poisson
distribution (whose mean should equal its variance) and serial dependence.

The aim is to fit a mixture of m independent
Poisson distributions to this data, using the non-linear minimizer `nlm` to minimize the
negative log-likelihood of the Poisson mixture.

Sounds easy, right? I have not done much
optimisation work, so it took me longer than it probably should have. There’s
little better for the ego than getting stuck on things by page 10.

### Constrained to unconstrained

There are two sets of parameters: the
lambda parameters of the Poisson distributions and the delta mixing
distribution, the latter giving the proportion of each component in the mixture.

The Poisson parameters must be greater than
zero, and there are m of them. The mixing distribution values must all sum to
one, so the first mixing value is implied by the m − 1 subsequent values.
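To make that bookkeeping concrete, here is a toy illustration (in Python, not the book's R code) of how the free parameters count up for m = 3:

```python
# Illustration: for m = 3 Poisson components, the free parameters are
# 3 lambdas plus 2 deltas -- 2m - 1 = 5 values in total.
m = 3
lambdas = [10.0, 20.0, 30.0]        # Poisson means, each must be > 0
deltas_free = [0.25, 0.25]          # delta_2, ..., delta_m (the free ones)
delta_1 = 1.0 - sum(deltas_free)    # first mixing weight is implied
deltas = [delta_1] + deltas_free

assert len(lambdas) + len(deltas_free) == 2 * m - 1
assert abs(sum(deltas) - 1.0) < 1e-12
print(deltas)  # [0.5, 0.25, 0.25]
```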

These are the constraints (lambdas greater
than zero, deltas summing to one), and there are 2m − 1 free values (m lambdas, m − 1
deltas). We need to transform these to unconstrained values for use with `nlm`
(becoming eta and tau respectively).

The formulas for transformation are
reproduced from the book here
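In Python, those transforms might look like the sketch below. I am assuming the standard parameterisation here: a log transform for the lambdas and a multinomial-logit transform for the deltas.

```python
import math

def natural_to_working(lambdas, deltas):
    """Constrained ("natural") -> unconstrained ("working") parameters.
    eta_i = log(lambda_i); tau_j = log(delta_j / delta_1) for j = 2..m."""
    etas = [math.log(l) for l in lambdas]
    taus = [math.log(d / deltas[0]) for d in deltas[1:]]
    return etas + taus

def working_to_natural(params, m):
    """Inverse transform: exp undoes the log, and the deltas are
    recovered with a softmax-style normalisation so they sum to one."""
    etas, taus = params[:m], params[m:]
    lambdas = [math.exp(e) for e in etas]
    denom = 1.0 + sum(math.exp(t) for t in taus)
    deltas = [1.0 / denom] + [math.exp(t) / denom for t in taus]
    return lambdas, deltas
```

A quick sanity check is that the round trip `working_to_natural(natural_to_working(lambdas, deltas), m)` recovers the original values.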

These transformed values are combined into
one parameter vector and passed to `nlm`, along with a function that calculates
the negative log-likelihood.

In the function, we first convert the
parameters from the unconstrained “working” form back to the constrained “natural”
form using the inverse transform, and then use those values in the
likelihood calculation.
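The whole pipeline can be sketched in Python rather than R, with scipy's `minimize` standing in for `nlm`. This is my own illustration, not the book's code, and the data below is synthetic (an overdispersed mix of two Poissons) rather than the earthquake series:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def nll(working, x, m):
    """Negative log-likelihood of an m-component Poisson mixture,
    taking the unconstrained "working" parameter vector."""
    etas, taus = working[:m], working[m:]
    lambdas = np.exp(etas)                    # inverse of the log transform
    expt = np.exp(taus)
    deltas = np.concatenate(([1.0], expt)) / (1.0 + expt.sum())
    # mixture likelihood per observation: sum_j delta_j * Pois(x; lambda_j)
    probs = poisson.pmf(x[:, None], lambdas[None, :]) @ deltas
    return -np.sum(np.log(probs))

rng = np.random.default_rng(1)
# synthetic overdispersed counts: a 50/50 mix of Poisson(10) and Poisson(25)
x = np.where(rng.random(200) < 0.5, rng.poisson(10, 200), rng.poisson(25, 200))

m = 2
start = np.array([np.log(8.0), np.log(20.0), 0.0])  # eta_1, eta_2, tau_2
fit = minimize(nll, start, args=(x, m), method="Nelder-Mead")
print(np.exp(fit.x[:m]))  # fitted lambdas, should land near 10 and 25
```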

You can see the code here.

### Outro

Now that I look over the code and write
this out, it all seems fairly straightforward, but it
was new to me and took a while, so I figured other people might find the
example useful as well.

The code has three examples, for the cases m = 2, 3, and
4. Feel free to mess around with the initial parameter estimates to see what comes up.

It does seem to match the fitted models from the book, which I have copied
here:

For all the details, check out the book,
chapter 1!