The Canadian Journal of Statistics / La revue canadienne de statistique
Vol. 39, No. 3, 2011, Pages 421–437
Modal simulation and visualization in finite mixture models
Daeyoung KIM1* and Bruce G. LINDSAY2
1Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA 01002, USA
2Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
Key words and phrases: Confidence regions; likelihood; mixture model; visualization.
MSC 2010: Primary 62-09; secondary 62F25.
Abstract: Kim & Lindsay (2011a) proposed a new sampling-based visualization methodology, modal simulation, designed to describe the boundaries of the confidence regions generated by an inference function
such as the likelihood. Once the sample points on the boundaries of the targeted confidence sets are created
in a single simulation run, one can use those samples to describe every confidence set of interest without
further numerical optimization. However, this method assumes that one is simulating the samples for the
parameters from the likelihood region with a single mode. In this article we extend the modal simulation
method to be applicable to the likelihood regions in a finite mixture model where there exist multiple modes.
The Canadian Journal of Statistics 39: 421–437; 2011 © 2011 Statistical Society of Canada
Résumé: Kim and Lindsay (2011a) proposed a new sampling-assisted visualization methodology,
called modal simulation, whose aim is to describe the boundaries of the confidence regions
generated by an inference function such as the likelihood. Once the sampled points on the
boundaries of the selected confidence regions have been obtained from a single set of simulations,
we can use them to describe all confidence regions of interest without additional numerical
optimization. However, this method presupposes that the simulated parameters come from a
likelihood region with a single mode. In this article, we generalize modal simulation to handle
likelihood regions arising from a finite mixture model for which several modes exist.
La revue canadienne de statistique 39: 421–437; 2011 © 2011 Société statistique du Canada
1. INTRODUCTION
We often consider an inference function for statistical inference, which is a function of the param-
eter θ and the data x. For example, the likelihood, the score statistic (Rao, 1948), the quadratic
inference function (Lindsay & Qu, 2003) and the empirical likelihood (Owen, 1988; Qin & Law-
less, 1994) are all widely used throughout statistics. These inference functions can be used to
create confidence regions for the parameters.
However, the simple mathematical description of these confidence sets does not automati-
cally lead to easy numerical or pictorial descriptions that one can use in a practical situation.
One might contrast this with a modern Bayesian analysis, where simulation is used to turn the
pictorial representation of confidence sets into a form of data analysis. Kim & Lindsay (2011a)
developed simulation methods for non-Bayesian inference functions. The main achievement of
this article is the extension of those methods to the considerably more complicated setting of
mixture models, focusing on the likelihood as the inference function. In particular, we will show how
to describe the confidence sets in the face of both identifiability problems and multiple likelihood
modes.
*Author to whom correspondence may be addressed. E-mail: [email protected]
There are several alternative methods for confidence set construction. Let us take the likelihood
function for example. The method that provides the confidence region with the simplest description
for multivariate parameters is the Wald-type confidence set based on using an estimated covariance
matrix for the maximum likelihood estimators (MLE). This yields elliptically shaped regions of the
parameter space whose boundaries are easily calculated. However, it is known that the
likelihood-based set generally has a more precise coverage probability than the Wald set, especially when
the sample size is not large. The Wald sets also require estimation of the covariance matrix for the
estimator; the different estimation methods could produce quite different results. A final drawback
is that Wald sets are not invariant to parametrization (Cox & Hinkley, 1974; Meeker & Escobar,
1995; Agresti, 2002; Kalbfleisch & Prentice, 2002).
A completely different type of confidence set construction is the bootstrap (Efron &
Tibshirani, 1993; Davison & Hinkley, 1997). A number of authors have discussed the theoretical
drawbacks of the bootstrap method compared to using an inference function (Owen, 2001; Lang,
2008). There are also drawbacks from the computational viewpoint. First, the recomputation of
parameter estimates based on new data sets generated from the bootstrap can be quite expensive
in computing time. Second, one is generating the confidence set from a cloud of sampled data
points. It follows that the sets could have many possible boundaries unless one specifies the shape
and orientation of the region (Owen, 2001). Third, this kind of simulation is inherently sparse
near the possible boundaries of the confidence set, and so there is significant statistical error in
describing the boundary (Kim & Lindsay, 2011a).
Kim & Lindsay (2011a) showed that when one has a confidence set generated by an inference
function, one can sample intensively on its well-specified boundary, and so obtain a sharper
picture of its size and shape. In this article we will focus on the use of the likelihood function
in the mixture model. The simulation methodology of Kim & Lindsay (2011a) assumed that one
was simulating from the likelihood with a single mode. However, in data analysis, we often
face cases where the likelihood function has multiple modes and thus corresponding confidence
regions have complicated structure. In addition, Kim & Lindsay (2011b) showed that the likelihood
sets for the parameters in a finite mixture model have a complex structure due to the existence
of two types of nonidentifiability, labelling nonidentifiability (Redner & Walker, 1984) and
degenerate nonidentifiability (Crawford, 1994; Lindsay, 1995).
The rest of the article is organized as follows. Section 2 overviews the modal simulation method
proposed by Kim & Lindsay (2011a). In Section 3 we review the issues caused by the existence of
nonidentifiabilities in a finite mixture model. In Sections 4 and 5 we propose a new strategy of
using the modal simulation method applicable to a simple case where a unimodal partition exists
in the mixture likelihood region. Section 6 extends the modal simulation method developed in
Sections 4 and 5 to the multimodal case where there exists a secondary mode. In Section 7 we
apply our proposed methods to two real examples.
2. BACKGROUND ON THE LIKELIHOOD REGION SIMULATION AND VISUALIZATION
In this section we describe the topographical structure of the (profile) likelihood regions for a
parameter vector θ in a regular parametric model. We then review the simulation method (Kim &
Lindsay, 2011a), designed to visualize the likelihood regions with a single mode.
The Canadian Journal of Statistics / La revue canadienne de statistique DOI: 10.1002/cjs
2011 MODAL SIMULATION AND VISUALIZATION OF THE MIXTURE LIKELIHOOD 423
2.1. Likelihood Regions
For n i.i.d. observations $\{y_1, \ldots, y_n\}$ generated from the parametric family $\{p(y \mid \theta), \theta \in \Omega \subset \mathbb{R}^p\}$, one can construct the likelihood of a p-dimensional parameter vector $\theta$,
\[ L(\theta) = \prod_{i=1}^{n} p(y_i \mid \theta). \tag{1} \]
We will denote the true parameter by $\theta_\tau$, and the MLE mode for the parameter maximizing the
likelihood in Equation (1) by $\hat\theta$. We assume the model is regular and the likelihood is bounded.
One can construct the general form of the likelihood confidence region for $\theta$ by inverting a
test of the null hypothesis $H_0 : \theta = \theta_0$ based on the likelihood ratio statistic $T_1 = -2 \log L(\theta_0) + 2 \log L(\hat\theta)$,
\[ C^{LR}_c = \{\theta : L(\theta) \ge c\}, \tag{2} \]
where $c = L(\hat\theta)\, e^{-q(1-\alpha)/2}$ is an adjustable constant and $q(1-\alpha)$ is the $1-\alpha$ quantile of the
(asymptotic) distribution of $T_1$. We will generally assume that the limiting distribution of $T_1$ does
not depend on $\theta_0$ and asymptotic critical values are available for $T_1$, although we can adjust these
values for finite sample sizes.
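For computation it is often more convenient to work on the log scale. As a sketch, writing $\ell(\theta) = \log L(\theta)$ (a symbol introduced here only for illustration) and $\hat\theta$ for the MLE, the region in Equation (2) with $c = L(\hat\theta)e^{-q(1-\alpha)/2}$ can be restated as follows:

```latex
\[
\theta \in C^{LR}_c
\iff \ell(\theta) \ge \log c = \ell(\hat\theta) - \tfrac{1}{2}\, q(1-\alpha)
\iff 2\{\ell(\hat\theta) - \ell(\theta)\} \le q(1-\alpha).
\]
```

Under this reading, deciding whether a candidate $\theta$ lies in the region only requires comparing twice its drop in log-likelihood from the mode with the chosen quantile.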
We will think of the problem of describing this set as a topographic one, in which a
p-dimensional parameter vector $\theta$ describes coordinates of a flat surface and the likelihood $L(\theta)$
represents the elevations of the land mass on that surface. The boundary of the set $C^{LR}_c$, namely
$\{\theta : L(\theta) = c\}$, corresponds to a specified contour of the corresponding topographical map. The
shape of the confidence set is determined by this contour.
We call $C^{LR}_c$ in Equation (2) the elevation c likelihood confidence region. In this article
c is interpreted via the confidence level of the corresponding likelihood ratio statistic in the
p-dimensional parameter space, which we denote by $\mathrm{Conf}_p(c)$. For example, when $c = L(\hat\theta)$, $C^{LR}_c$
is equal to $\{\hat\theta\}$ and the corresponding confidence level is 0%. Note that, for illustrative convenience,
we will calibrate the confidence level for each elevation c using the limiting distribution of the
likelihood ratio statistic for an easy transition between the full confidence set and the profile
confidence set. That is, $\mathrm{Conf}_p(c)$ will be based on the chi-squared distribution with p degrees of
freedom. If one wishes to use elevations that provide more accurate confidence levels, one can
do a parametric bootstrap adjustment (Efron & Tibshirani, 1993; Davison & Hinkley, 1997; Kim
& Lindsay, 2011a).
2.2. Profile Likelihood Regions
When one is interested in a function of the parameters, one often constructs a profile likelihood and
profile confidence regions for inference. Let $\beta(\theta)$ be an r (≤ p)-dimensional vector of parameters
of interest. The profile likelihood for $\beta(\theta)$ is defined by
\[ L_{\mathrm{prof}}(\beta) = \sup_{\theta} \{L(\theta) : \beta(\theta) = \beta\}. \tag{3} \]
One can then construct the profile confidence region for $\beta(\theta)$ by inverting a test of $H_0 : \beta(\theta) = \beta_0$
based on $T_2 = -2 \log L_{\mathrm{prof}}(\beta_0) + 2 \log L_{\mathrm{prof}}(\hat\beta)$ where $\hat\beta = \beta(\hat\theta)$,
\[ C^{PLR}_c = \{\beta : L_{\mathrm{prof}}(\beta) \ge c\}, \tag{4} \]
where c is an adjustable constant. We here transform the constant c into a confidence level in the
r-dimensional parameter space, which we denote by $\mathrm{Conf}_r(c)$.
There is an important relationship between the profile confidence region and the full likelihood
region that we call the nesting property. If $\theta$ is a point in the likelihood confidence region $C^{LR}_c$
of Equation (2), by definition, $\beta(\theta)$ must also be in the profile likelihood confidence region $C^{PLR}_c$
of Equation (4) at the same elevation c. It is important to remember that the same elevation c
will have different confidence level interpretations in $C^{LR}_c$ and $C^{PLR}_c$, due to the different degrees
of freedom used to make the two confidence sets. A particular value of c generates a confidence
set with level $\mathrm{Conf}_p(c)$ in $C^{LR}_c$, but it generates a set with a larger level $\mathrm{Conf}_r(c)$ for $C^{PLR}_c$ of
dimension r.
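The conversion between the two confidence levels at a common elevation can be sketched numerically. The snippet below is only an illustration under the chi-squared calibration described in Section 2.1: it uses the closed form of the chi-squared distribution function for an even number of degrees of freedom, and the helper names `chi2_cdf_even_df` and `drop_for_conf2` are hypothetical, not part of the original method.

```python
import math

def chi2_cdf_even_df(x, df):
    """CDF of a chi-squared variable with an even number of degrees of freedom.

    For df = 2m the CDF has the closed form
    1 - exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    tail = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return 1.0 - tail

def drop_for_conf2(level):
    """Likelihood-ratio drop q = 2 * (log L(MLE) - log c) giving Conf_2(c) = level."""
    return -2.0 * math.log(1.0 - level)

q = drop_for_conf2(0.95)          # elevation calibrated in two dimensions
conf4 = chi2_cdf_even_df(q, 4)    # the same elevation read in four dimensions
print(round(q, 3), round(conf4, 3))
```

With these assumptions, an elevation calibrated at 0.95 for a two-dimensional profile corresponds to a level of about 0.80 in a four-dimensional parameter space, which agrees with the pairing of $\mathrm{Conf}_2(c)$ and $\mathrm{Conf}_4(c)$ quoted for the simulated example in Section 4.2.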
2.3. Modal Simulation and Visualization
Kim & Lindsay (2011a) created the modal simulation method for generating samples on the
boundaries of the confidence region generated by an inference function such as the likelihood.
The basic idea of their method is to generate a sample of points from the elevation c likelihood
contour set, $\{\theta : L(\theta) = c\}$, that contains the MLE $\hat\theta$ (i.e., the mode in the likelihood). Their strategy
for generating the sample on $\{\theta : L(\theta) = c\}$ can be described as follows.
For now we suppose that there is a single mode in the likelihood. We suppose that we have
set an elevation of interest c based on our desired confidence regions, as in Equation (2) with
$c = L(\hat\theta)\, e^{-q(1-\alpha)/2}$. Suppose one has found the MLE mode, $\hat\theta$, and formed a covariance estimator
$V_{\hat\theta}$. Suppose one also defines a ray generated by a vector $z \in \mathbb{R}^p$ to be $\theta(\varepsilon) = \hat\theta + \varepsilon V_{\hat\theta}^{1/2} z$ where
$\varepsilon \in \mathbb{R}$. Regarding the form of $V_{\hat\theta}$ in a ray, one can use the Fisher information or one of its asymptotic
equivalents. The following two steps will be carried out iteratively.
[Step 1] Generate z from the p-dimensional standard normal distribution.
[Step 2] Determine $\varepsilon$ satisfying $L(\theta(\varepsilon)) = c$. Denote the computed $\varepsilon$ as $\tilde\varepsilon = \varepsilon(c, z)$. We let $\tilde\theta = \theta(\tilde\varepsilon)$ be the simulated value generated by (c, z). We compute $\tilde\theta$ for both the positive and
negative $\varepsilon$ solutions.
Given a targeted elevation c and given z, the ray starts from $\hat\theta$ and heads in direction $z/\|z\|$.
When this ray reaches the boundary of the targeted $\{\theta : L(\theta) = c\}$, this point becomes the sampled
value $\tilde\theta$, depending on the random direction $z/\|z\|$. One can obtain solutions $\varepsilon$ satisfying
$L(\theta(\varepsilon)) = c$ using a one-dimensional root-finding algorithm. If one repeats the calculation of $\tilde\theta$
for many generated z, one can obtain a large set of simulated parameter values on the boundary
of $\{\theta : L(\theta) = c\}$.
There is one extremely useful feature of the modal simulation method. After generating a large
sample from a single simulation run, one can use the points in the sample to picture the profile
likelihood confidence sets for any desired functions of the parameters, all without further numerical
optimization. This feature arises from the nesting property mentioned in Section 2.2. If one has
a sample of B points $(\tilde\theta_1, \ldots, \tilde\theta_B)$ from $\{\theta : L(\theta) = c\}$ and $\beta(\theta)$ is a function of the parameters of
interest, then $(\beta(\tilde\theta_1), \ldots, \beta(\tilde\theta_B))$ are a set of points from the profile set $\{\beta : L_{\mathrm{prof}}(\beta) \ge c\}$ (Figure 1 in Section 4 gives an illustration of a simulated profile plot from a mixture model).
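The two steps and the projection onto a parameter of interest can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the toy function `log_lik`, its mode `MODE`, the matrix `SQRT_COV` standing in for the square root of the covariance estimator, and the march-then-bisect root finder are all assumptions made for the example.

```python
import math
import random

def log_lik(theta):
    """Toy log-likelihood with a single mode at (1, -2); a stand-in for log L."""
    a, b = theta[0] - 1.0, theta[1] + 2.0
    return -0.5 * (4.0 * a * a + b * b) - 0.05 * a ** 4

MODE = (1.0, -2.0)
SQRT_COV = ((0.5, 0.0), (0.0, 1.0))   # assumed V^{1/2}, from the curvature at the mode

def ray_point(eps, z):
    """theta(eps) = mode + eps * V^{1/2} z."""
    return tuple(MODE[i] + eps * sum(SQRT_COV[i][j] * z[j] for j in range(2))
                 for i in range(2))

def first_root(z, log_c, sign, step=0.25, tol=1e-10):
    """First eps of the given sign with log_lik(theta(eps)) = log_c."""
    lo, hi = 0.0, step
    while log_lik(ray_point(sign * hi, z)) > log_c:   # march out until below log_c
        lo, hi = hi, hi + step
    while hi - lo > tol:                               # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if log_lik(ray_point(sign * mid, z)) > log_c:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)

def modal_simulation(n_rays, q, seed=0):
    """Sample contour points at log c = log L(mode) - q / 2."""
    rng = random.Random(seed)
    log_c = log_lik(MODE) - q / 2.0
    sample = []
    for _ in range(n_rays):
        z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))   # Step 1
        for sign in (1.0, -1.0):                          # Step 2, both directions
            sample.append(ray_point(first_root(z, log_c, sign), z))
    return sample, log_c

sample, log_c = modal_simulation(n_rays=500, q=5.991)
# Projecting the contour sample onto the second coordinate pictures its profile set.
profile_range = (min(t[1] for t in sample), max(t[1] for t in sample))
```

Because this toy surface decreases along every ray from its mode, the marching step cannot skip a crossing. On a surface that is not star-shaped the same search would return only the first solution found at the resolution of `step`.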
As noted by Kim & Lindsay (2011a), if there is only one MLE mode in the likelihood and the
targeted set around $\hat\theta$ is star-shaped, the likelihood is monotonically decreasing along every ray
from $\hat\theta$, regardless of elevation. Thus, the modal simulation will have only a single solution on
$L(\theta) = c$ along the positive and negative ray. If the set is not star-shaped, however, a ray from
$\hat\theta$ can have multiple solutions on $L(\theta) = c$. The first solution on the ray starting from $\hat\theta$ is in the
star-shaped region for $\hat\theta$. Solutions further out on the same ray lie outside the star-shaped region for
$\hat\theta$. We call these extra solutions simulation outliers. Note that the simulation outliers are still in
the confidence region for $\theta$, but the existence of just one such outlier proves that the shape of the
likelihood region is quite different from that of the Wald elliptical region.
There are multiple complications in applying the aforementioned simulation theory to mixture
models. On the other hand, the effort is worthwhile because it provides a direct solution to the
famous mixture labelling problem.
3. BACKGROUND ON FINITE MIXTURE MODELS AND NONIDENTIFIABILITIES
In this section we review a parametric finite mixture model and the two types of nonidentifiability
inherent in the mixture parameters. Suppose that n observations $y = (y_1, \ldots, y_n)'$, $y_i \in \mathbb{R}^q$, are
randomly drawn from a K-component mixture density $p(y \mid \theta)$ indexed by a set of parameters
$\theta \in \Omega$:
\[ p(y \mid \theta) = \sum_{j=1}^{K} \pi_j f(y; \xi_j, \omega), \tag{5} \]
\[ \theta = \left[ \begin{pmatrix} \pi_1 \\ \xi_1 \\ \omega \end{pmatrix}, \ldots, \begin{pmatrix} \pi_j \\ \xi_j \\ \omega \end{pmatrix}, \ldots, \begin{pmatrix} \pi_K \\ \xi_K \\ \omega \end{pmatrix} \right], \tag{6} \]
where $\pi_j$ is the mixing weight of component j with the constraints $0 \le \pi_j \le 1$ and $\sum_{j=1}^{K} \pi_j = 1$,
$\xi_j$ is the jth component-specific parameter vector and $\omega$ is a structural parameter for the density
function f. Note that each column of $\theta$ corresponds to a component. We will suppose that the
parameter space $\Omega$ is the full product space ($\pi_j$'s in the simplex and $\xi_j$'s in the cross product
space). One can associate $\theta$ in Equation (6) with the mixing distribution, denoted by $Q\langle\theta\rangle$, which
is the discrete distribution function with mass $\pi_j$ at $\xi_j$: $p(y \mid \theta) = \int f(y; \xi, \omega)\, dQ\langle\theta\rangle(\xi)$. The
mixing distribution $Q\langle\theta\rangle$ is often identifiable (Teicher, 1960, 1963; Yakowitz & Spragins, 1968;
Lindsay, 1995). In this article we assume that the mixing distribution $Q\langle\theta\rangle$ associated with the
parameter $\theta$ in Equation (5) is identifiable.
There are two key nonidentifiabilities in Equation (5), the degenerate nonidentifiability (Crawford, 1994; Lindsay, 1995) and the labelling nonidentifiability (Redner & Walker, 1984). Degenerate
nonidentifiability means that for certain parameter values, the actual number of components is less
than K. For example, suppose that $\xi_j$ is unidimensional and there is no structural parameter in the
two-component parameter space. Then the following three subsets of the boundary of the parameter space,
\[ \theta = \left[ \begin{pmatrix} \pi_1 \\ \xi_0 \end{pmatrix}, \begin{pmatrix} 1 - \pi_1 \\ \xi_0 \end{pmatrix} \right] \text{ for any } \pi_1 \in [0, 1], \qquad \left[ \begin{pmatrix} 0 \\ \xi \end{pmatrix}, \begin{pmatrix} 1 \\ \xi_0 \end{pmatrix} \right] \text{ and } \left[ \begin{pmatrix} 1 \\ \xi_0 \end{pmatrix}, \begin{pmatrix} 0 \\ \xi \end{pmatrix} \right] \text{ for any } \xi \text{ in the } \xi \text{ parameter space}, \]
all generate a single density that has one component with
parameter $\xi_0$. In this article we assume that the number of components K is fixed and known, so
that the true density does not display this degeneracy. Notice that when $\xi_1 = \xi_2$, the parameter $\theta$
is degenerate but also in the interior of the cross product space, so that this is not the typical
"boundary of the parameter space" degeneracy.
By labelling nonidentifiability we mean that for fixed K, the parameters $\theta$ in Equation (6)
are only identifiable up to a column permutation of $\theta$: $p(y \mid \theta) = p(y \mid \theta^\sigma)$ for $\theta \ne \theta^\sigma$, where $\theta^\sigma$
is a copy of $\theta$ with columns permuted according to any permutation $\sigma$ of the identity permutation
$(1, \ldots, K)$. For example, when K = 2 and $\xi_j$ is unidimensional,
\[ \theta = \left[ \begin{pmatrix} \pi_1 \\ \xi_1 \end{pmatrix}, \begin{pmatrix} \pi_2 \\ \xi_2 \end{pmatrix} \right] = \left[ \begin{pmatrix} 0.4 \\ 1 \end{pmatrix}, \begin{pmatrix} 0.6 \\ 2 \end{pmatrix} \right] \]
has the same distribution as
\[ \theta^\sigma = \left[ \begin{pmatrix} 0.6 \\ 2 \end{pmatrix}, \begin{pmatrix} 0.4 \\ 1 \end{pmatrix} \right]. \]
Due to this nonidentifiability, labels on $\theta$ are not identifiable. In particular, there are K! true values corresponding to all
possible column permutations of the true value $\theta_\tau$. Further, the modes of the mixture likelihood in
Equation (1) come in a modal group of K! points because there are also (at least) K! modes with the
same likelihood. If there is only one modal group corresponding to all the permutations of a single
mode, we will say that the problem is unimodal. If not, we will say that there are secondary modes.
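Both kinds of nonidentifiability are easy to check numerically. The sketch below assumes, purely for illustration, normal component densities with unit variance and no structural parameter. The function names and the values `xi0 = 1.5` and `5.0` are hypothetical, while the weights and locations 0.4, 0.6, 1 and 2 come from the two-component example in the text.

```python
import math

def normal_pdf(y, mean, sd=1.0):
    """Normal density; the unit standard deviation is an assumption of this sketch."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, theta):
    """theta is a list of (weight, location) columns, one per component."""
    return sum(w * normal_pdf(y, xi) for w, xi in theta)

grid = [-3.0 + 0.5 * i for i in range(17)]

# Labelling nonidentifiability: swapping the columns leaves the density unchanged.
theta = [(0.4, 1.0), (0.6, 2.0)]
theta_sigma = [theta[1], theta[0]]
label_gap = max(abs(mixture_pdf(y, theta) - mixture_pdf(y, theta_sigma)) for y in grid)

# Degenerate nonidentifiability: different parameter values, one-component density.
xi0 = 1.5
equal_locations = [(0.3, xi0), (0.7, xi0)]    # same location in both columns
zero_weight = [(0.0, 5.0), (1.0, xi0)]        # first column carries no mass
degenerate_gap = max(abs(mixture_pdf(y, equal_locations) - mixture_pdf(y, zero_weight))
                     for y in grid)

print(label_gap, degenerate_gap)
```

Both gaps are zero up to floating-point rounding, so no amount of data can distinguish the parameter values within each pair.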
In this under-identified setting one needs to modify the standard definitions of confidence
regions and coverage probabilities. We will seek labelled confidence regions that are constructed
using identifiable subsets of the parameter space. In such a set one will find at most one of the K!
permutations of a parameter and no points having the degenerate nonidentifiability. Note that if B is
an identifiable subset of parameters, then the set $B^\sigma$, the elements of B with columns permuted by $\sigma$,
is also an identifiable subset, and would also be an equivalent candidate to be the confidence set. We
will say that a labelled confidence region methodology has labelled coverage probability $1 - \alpha$
if $1 - \alpha$ is the probability that the constructed region contains exactly one of the K! permutations
of the true value (since the regions are identifiable sets, they can contain at most one version of
the true value). In practice we will use the full likelihood region, which is not identifiable, and
then partition it into K! identifiable subsets that are permutation images of each other, any one of
which could be selected to use as the labelled confidence region.
The construction of labelled credible/confidence sets for mixture parameters has a long his-
tory (Jasra, Holmes, & Stephens, 2005; Yao & Lindsay, 2009). One family of solutions comes
from using parameter constraints. A second family is based on clustering the output of MCMC
algorithms, which in effect creates a connected identifiable subset. In this article we will let the
likelihood itself do the clustering.
4. MODAL SIMULATION FOR THE MIXTURE LIKELIHOOD WITH UNIMODAL STAR-SHAPED PARTITION
In this section we review the topological theory of the mixture likelihood regions. We will
focus in this section on the ideal case where the likelihood region can be decomposed into K!
identifiable subsets, and each subset is star-shaped and has only one MLE mode (Kim & Lindsay,
2011b). We then develop a strategy of modal simulation for such an ideal case and illustrate its
application to one simulated dataset.
4.1. Topology Theory
Although there are two types of nonidentifiability in the mixture parameter $\theta$ in Equation (6),
many statistical analyses have been carried out as if the parameters were identifiable. This is possible because
the point estimator itself is usually well-defined by the results of an algorithm maximizing the
mixture likelihood. Moreover, there is an asymptotic theory which guarantees the existence of
a consistent way to choose a permutation of the MLE and asymptotic normality for the chosen MLE
as the sample size grows large (Redner & Walker, 1984).
Kim & Lindsay (2011b) argued that the asymptotic theory can always be used to create a
meaningful mixture likelihood region using Equation (2) provided that one redefines the meaning
of confidence set to account for lack of labelling identifiability, and then chooses a sufficiently
small confidence level. At confidence level 0%, one can construct the likelihood region using the
constant $c = L(\hat\theta)$, yielding the global confidence region $C^{LR}_c = \{\theta : L(\theta) \ge L(\hat\theta)\}$, which is the
set of K! modes corresponding to the permutations of $\hat\theta$. Provided that c is not too much smaller
than $L(\hat\theta)$, the global confidence region will consist of K! disjoint regions, one around each mode,
and each region is a connected and identifiable subset. These regions are also permutation images
of each other. One can then select any one of those regions to be the labelled confidence set.
To be more precise, Kim & Lindsay (2011b) defined the modal region determined by the
elevation c and $\hat\theta$, denoted by $C_c(\hat\theta)$, to be the set of all $\theta$ that are connected to $\hat\theta$ by a continuous
path entirely in the mixture likelihood region at the elevation c of Equation (2). By this definition
one can obtain the modal region around another mode of the modal group by permuting all
elements in $C_c(\hat\theta)$: $C_c(\hat\theta^\sigma) = C_c^\sigma(\hat\theta)$ for any permutation $\sigma$. For the elevations c just below $L(\hat\theta)$,
the global likelihood region has a decomposition into K! disjoint and identifiable modal regions.
Asymptotically, this construction method will have the correct labelled coverage probability
based on our modified definition of coverage. When this situation holds at a constant c, we will
say that we have an identifiable confidence partition at that elevation.
However, if the value of c corresponding to the desired confidence level is too small, the
likelihood region no longer has an ideal decomposition into K! disjoint and identifiable modal
regions. A simple example of this occurs when $c = \psi_{dgnt}$, the highest likelihood in the degenerate
set, as then the full likelihood set contains a degenerate point, and so there is no way to partition it
into K! identifiable subsets. Note that one can easily obtain $\psi_{dgnt}$ by calculating the likelihood at
the MLE for the (K − 1) component mixture model. When there is not an identifiable confidence
partition, there is no longer a natural way to create an identifiable modal region from the likelihood
region to be called the labelled confidence region.
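Since likelihood regions grow as c decreases, this remark suggests a simple necessary check before attempting a partition. The display below is an illustrative rearrangement, not a result quoted from Kim & Lindsay (2011b); it uses only $L(\hat\theta)$ (the likelihood at the MLE), $\psi_{dgnt}$, and the calibration $c = L(\hat\theta)e^{-q(1-\alpha)/2}$ of Equation (2):

```latex
\[
c > \psi_{dgnt}
\iff q(1-\alpha) < 2\{\log L(\hat\theta) - \log \psi_{dgnt}\}.
\]
```

If the inequality fails, the region $\{\theta : L(\theta) \ge c\}$ contains the best-fitting degenerate point and cannot be split into K! identifiable subsets. The inequality should be read as necessary rather than sufficient, since other critical points of the likelihood may also join the modal regions.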
The existence of an identifiable partition at any particular elevation c depends on the topology
of the likelihood surface. This topology can be analyzed using Morse Theory (Matsumoto, 2002)
under suitable smoothness assumptions on the likelihood function. The ideal case occurs as
follows: Let $\psi_{mle}$ be the elevation of the MLE modes and let $\psi_{crit}$ be the elevation of the highest critical
value that is not the MLE. As shown by Kim & Lindsay (2011b), if $\psi_{crit} < c \le \psi_{mle}$, then the
confidence region has an identifiable partition for which each element of the partition contains a single
MLE mode, but no other critical values. In this case we will say we have a unimodal partition.
4.2. Modal Simulation Strategy and a Simulated Example
We now apply the modal simulation method of Section 2.3 in the context of a unimodal partition.
Recall from that section that we assumed that one was simulating from a likelihood with a single
mode. When a unimodal partition exists at the chosen elevation c, one can apply this method
by choosing any one of the MLE group, say $\hat\theta$, to simulate from, knowing that simulating from
other MLE modes would only permute the output. Recall that the simulated values correspond to
solutions to $L(\theta) = c$ found along randomly selected rays from the mode.
Since the modal region $C_c(\hat\theta)$ is an identifiable subset under existence of a unimodal partition,
the simulated points on its boundary will come automatically labelled by the choice of the particular
MLE mode $\hat\theta$. Note that this feature provides crucial information for addressing the labelling
problem in a finite mixture model, as points simulated from the full dimensional likelihood have
the same labels as $\hat\theta$. One can also use these labels when constructing the simulated profile region.
It is important to note that a conventional numerical profile approach would not provide labels.
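There are some technical difficulties with implementation of this strategy that we postpone
till the next subsection. We first illustrate how the method works in a two-component normal
mixture model with equal variances,
\[ \theta = \left[ \begin{pmatrix} \pi_1 \\ \xi_1 \\ \omega \end{pmatrix}, \begin{pmatrix} \pi_2 \\ \xi_2 \\ \omega \end{pmatrix} \right]. \]
We simulated n = 500 observations from
\[ \theta = \left[ \begin{pmatrix} 0.4 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0.6 \\ 1 \\ 1 \end{pmatrix} \right], \]
and obtained the MLE for $\theta$ using the expectation-maximization (EM) algorithm (Dempster, Laird & Rubin, 1977). The ML estimate had parameters
\[ \hat\theta = \left[ \begin{pmatrix} 0.59 \\ 0.95 \\ 0.89 \end{pmatrix}, \begin{pmatrix} 0.41 \\ -1.00 \\ 0.89 \end{pmatrix} \right]. \]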
Although the full parameter space is four-dimensional, one can view features of the full
likelihood structure through two-dimensional profiles. We here use a $(\pi_1, \xi_1)$ plot because it nicely
shows how the two mixture components differ from each other. In this example the number of
components is two. An important fact that is useful in examining such plots is that if one can see
the two separated regions in the profiles at the elevation suitable for a unimodal partition, then it
follows that the full four-dimensional likelihood region also separates into two identifiable sets
at the same elevation.
Figure 1a,b shows the numerical profile contour and the simulation-based profile plot for
$(\pi_1, \xi_1)$ at the elevation c with $\mathrm{Conf}_2(c) = 0.95$ (corresponding to $\mathrm{Conf}_4(c) = 0.80$). Note that a
numerical profile likelihood contour was obtained by maximizing $L(\theta)$ over $(\xi_2, \omega)$ for each fixed
$(\pi_1, \xi_1)$. In such a plot, "label switching" will occur, so that we see the image of the MLE mode
as well as its permuted version. In Figure 1a we can see that the two modal regions corresponding
to the MLE and its permutation have appeared in the profile plot: for the given MLE $\hat\theta$, one mode
had $(\pi_1, \xi_1)' = (0.59, 0.95)'$ and $\hat\theta^\sigma$ had $(\pi_1, \xi_1)' = (0.41, -1.00)'$.
For Figure 1b we used 3,000 rays. We also checked for violations of the star-shaped assumption
using methods we describe in the next section, and found no evidence. Since we simulated samples
forming the targeted likelihood region from just one MLE mode, setting the first component to be
$(0.59, 0.95)'$ in $\hat\theta$, there is just one connected 95% profile confidence region that is nearly elliptical,
and has unique labels. That is, the simulation method identifies the first component in $C_c(\hat\theta)$
uniquely, and it corresponds to the upper mode in the profile plot of Figure 1a. Note that the same
labelling system in $C_c(\hat\theta)$ can be used to visualize the second modal region in Figure 1a. One
simply plots the simulated parameter values from the second component instead of the first. We
can measure the success of the simulation by noting that the simulated samples successfully
described the boundary of the elliptical shaped likelihood contour.
5. MODAL SIMULATION FOR THE MIXTURE LIKELIHOOD WITH NONSTAR-SHAPED PARTITIONS
In this section we design a modal simulation strategy to be used when a unimodal partition exists
at the targeted elevation, but one wants to check whether the region is star-shaped. Our method
involves proceeding further out each ray, and checking for solutions to L(θ) = c beyond the first
one on the ray. We know that the first solution on the ray is path-connected to the chosen mode,
the path being the ray itself. The problem we now face is that the new solutions may no longer be
path connected to the original mode, but rather be path connected to one of the permuted modes.
Therefore, we need tools that will identify the path connections between the contour solutions
and the modes.
We start with some definitions. Let $S_c(\hat\theta)$ be the set of $\theta$ that lies in the star-shaped region
of elevation c generated by mode $\hat\theta$. That is, given a ray from $\hat\theta$, the points along the ray are in
$S_c(\hat\theta)$, up to and including the value of $\theta$ that is the first solution to $L(\theta) = c$. We note that $S_c(\hat\theta)$
is contained in the modal region of the same elevation for $\hat\theta$, which we have denoted by $C_c(\hat\theta)$,
because each element has a path to it (namely the ray itself) entirely in $C_c(\hat\theta)$. We know that the
first solution on the ray starting from $\hat\theta$ belongs in $S_c(\hat\theta)$ and hence the corresponding modal region
$C_c(\hat\theta)$. The solutions further out on the same ray, which lie outside $S_c(\hat\theta)$, are the simulation outliers.
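Proceeding further out a ray and collecting every solution can be sketched as a grid scan followed by bisection. This is an illustration only: the two-bump surface `log_lik` is a hypothetical stand-in for a likelihood with a mode and its permuted copy (it is not a mixture likelihood), the ray starts at the bump centre `M1`, which is only approximately the mode, and `eps_max` and `n_grid` are assumed tuning values.

```python
import math

M1, M2 = (3.0, 0.0), (-3.0, 0.0)    # toy bump centres (hypothetical)

def log_lik(theta):
    """Toy two-mode surface standing in for a log-likelihood."""
    d1 = (theta[0] - M1[0]) ** 2 + (theta[1] - M1[1]) ** 2
    d2 = (theta[0] - M2[0]) ** 2 + (theta[1] - M2[1]) ** 2
    return math.log(math.exp(-0.5 * d1) + math.exp(-0.5 * d2))

def ray_roots(start, direction, log_c, eps_max=10.0, n_grid=2000, tol=1e-10):
    """Every eps in (0, eps_max] where the ray start + eps * direction crosses log_c."""
    def g(eps):
        point = (start[0] + eps * direction[0], start[1] + eps * direction[1])
        return log_lik(point) - log_c

    roots = []
    step = eps_max / n_grid
    for i in range(n_grid):
        lo, hi = i * step, (i + 1) * step
        above_lo = g(lo) > 0.0
        if above_lo == (g(hi) > 0.0):
            continue                      # no crossing inside this grid cell
        while hi - lo > tol:              # bisection keeps the sign change bracketed
            mid = 0.5 * (lo + hi)
            if (g(mid) > 0.0) == above_lo:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots

log_c = log_lik(M1) - 5.991 / 2.0          # drop of q / 2 with q = 5.991
toward_other = ray_roots(M1, (-1.0, 0.0), log_c)
sideways = ray_roots(M1, (0.0, 1.0), log_c)
first_solution, outliers = toward_other[0], toward_other[1:]
```

In this toy setting the ray aimed at the second bump returns three solutions, the first belonging to the star-shaped region and the other two being simulation outliers, while the sideways ray returns a single solution. A finer grid lowers the risk of missing a pair of closely spaced crossings, at the cost of more likelihood evaluations.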
The interpretation of these outliers is rather complex, but they do provide useful information
about the structure of the confidence region. If $\tilde\theta$ is a simulation outlier, there are two possible
explanations for its existence. First, $C_c(\hat\theta)$ could fail to be star-shaped, and $\tilde\theta$ is a boundary point
for it that is outside $S_c(\hat\theta)$. We will call this a star-shaped outlier. Secondly, it could lie outside
$C_c(\hat\theta)$, but still exist, because the ray has entered the boundary of another element of the partition,
say $C_c(\hat\theta^\sigma)$. We will call this a wrong-mode outlier. Note that the existence of the star-shaped
outliers provides strong evidence that the likelihood regions are shaped rather differently from
Wald regions. However, the wrong-mode outliers are quite benign for the purpose of describing
the targeted modal region $C_c(\hat\theta)$.
Figure 1: (a) Shows a numerical profile likelihood contour and (b) shows a modal simulation-based profile plot for $(\pi_1, \xi_1)$ using n = 500 and $\mathrm{Conf}_2(c) = 0.95$. [Horizontal axis: $\pi_1$; vertical axis: $\xi_1$; legend: 95% set, MLE, True.]
We can use the EM algorithm to determine the type of simulation outlier we have, as the EM
algorithm starting from any $\theta$ will climb the likelihood monotonically until it reaches a critical
point. We start the EM algorithm at the simulation outlier. If the algorithm climbs back to $\hat\theta$, then
the outlier is a star-shaped one. If it climbs to a permutation of $\hat\theta$, it is a wrong-mode outlier.
The existence of just one star-shaped outlier proves that $C_c(\hat\theta)$ is not star-shaped. However,
the existence of wrong-mode outliers does not, in itself, provide any evidence on this point. We
will simply discard the wrong-mode outliers from the analysis.
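The EM-based classification can be sketched for the two-component normal mixture with a common variance. Everything specific in the code is an assumption made for illustration: the data are a well-separated toy sample rather than the article's data, $\omega$ is treated as the common variance, the parameter is stored as the tuple `(pi1, xi1, xi2, omega)`, and convergence is judged by a fixed iteration count and a tolerance.

```python
import math
import random

def em_step(y, theta):
    """One EM update; theta = (pi1, xi1, xi2, omega), omega the common variance."""
    pi1, xi1, xi2, omega = theta
    resp = []                                 # E-step: P(component 1 | y_i)
    for v in y:
        a = pi1 * math.exp(-0.5 * (v - xi1) ** 2 / omega)
        b = (1.0 - pi1) * math.exp(-0.5 * (v - xi2) ** 2 / omega)
        resp.append(a / (a + b))
    n, n1 = len(y), sum(resp)                 # M-step: weighted means and variance
    new_xi1 = sum(r * v for r, v in zip(resp, y)) / n1
    new_xi2 = sum((1.0 - r) * v for r, v in zip(resp, y)) / (n - n1)
    new_omega = sum(r * (v - new_xi1) ** 2 + (1.0 - r) * (v - new_xi2) ** 2
                    for r, v in zip(resp, y)) / n
    return (n1 / n, new_xi1, new_xi2, new_omega)

def run_em(y, theta, n_iter=300):
    """Climb the likelihood from theta for a fixed number of EM iterations."""
    for _ in range(n_iter):
        theta = em_step(y, theta)
    return theta

def permute(theta):
    """Swap the two component labels."""
    pi1, xi1, xi2, omega = theta
    return (1.0 - pi1, xi2, xi1, omega)

def classify_outlier(y, outlier, mle, tol=1e-4):
    """Label a simulation outlier by the mode that EM climbs to from it."""
    end = run_em(y, outlier)
    if max(abs(a - b) for a, b in zip(end, mle)) < tol:
        return "star-shaped"
    if max(abs(a - b) for a, b in zip(end, permute(mle))) < tol:
        return "wrong-mode"
    return "other critical point"

rng = random.Random(1)
y = [rng.gauss(-2.0, 1.0) if rng.random() < 0.4 else rng.gauss(2.0, 1.0)
     for _ in range(300)]
mle = run_em(y, (0.5, -1.0, 1.0, 1.0))
label_a = classify_outlier(y, (0.45, -1.5, 1.5, 1.2), mle)
label_b = classify_outlier(y, (0.55, 1.5, -1.5, 1.2), mle)
```

The third return value covers the case where EM stops at a critical point that is neither the chosen mode nor its permutation, which is the situation the multimodal setting has to address.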
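To see the performance of the modal simulation strategy described above, we construct the
likelihood regions for the parameters in a two-component normal mixture model with equal variances,
\[ \theta = \left[ \begin{pmatrix} \pi_1 \\ \xi_1 \\ \omega \end{pmatrix}, \begin{pmatrix} \pi_2 \\ \xi_2 \\ \omega \end{pmatrix} \right]. \]
We simulated 100 observations from
\[ \theta = \left[ \begin{pmatrix} 0.4 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0.6 \\ 1 \\ 1 \end{pmatrix} \right], \]
and obtained the MLE for $\theta$ using the EM algorithm. The ML estimate had parameters
\[ \hat\theta = \left[ \begin{pmatrix} 0.62 \\ 1.09 \\ 0.83 \end{pmatrix}, \begin{pmatrix} 0.38 \\ -0.88 \\ 0.83 \end{pmatrix} \right]. \]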
Figure 2a,b shows the numerical profile contour and the simulation-based profile plot for (π1,
ξ1) at the elevation with Conf2(c) = 0.7415 (corresponding to Conf4(c) = 0.3918). We observe
that the profile likelihood region has two identifiable subsets, one for each MLE mode. Notice
that these two modal regions are still star-shaped even though they seem to
bear less resemblance to two ellipses. For the modal simulation, 3,000 rays were used, and we
simulated samples forming the targeted likelihood region from just one MLE mode, setting the
first component to be (0.62, 1.09) in θ. We see that the sampled values for (π1, ξ1) successfully
captured the numerical boundary of the modal region for (0.62, 1.09), corresponding to the upper
mode in the profile plot of Figure 2a. Note that there were no star-shaped outliers and there were
two wrong-mode outliers (represented by xs in Figure 2b) belonging to the modal region for the
second component, (0.38, −0.88).
Figure 2: (a) Shows a numerical profile likelihood contour and (b) shows a modal simulation-based profile plot for (π1, ξ1) using n = 100 and Conf2(c) = 0.7415. [Axes: π1 and ξ1; legend: 74.148% set, MLE, True.]
In our experience it is difficult to find cases where there exists a unimodal partition and
the labelled confidence regions fail to be star-shaped. Thus, in some sense the checking of this
assumption provides extra computational labour without much compensation. However, we will
tackle multimodal problems later, and see that the outlier problem is more serious in them.
6. SOME FURTHER THEORETICAL RESULTS FOR UNIMODAL PARTITIONS
In Section 4.1 we introduced a sufficient condition for existence of a unimodal partition given by
Kim & Lindsay (2011b): the elevation c should be between ψcrit and ψmle, and be higher than ψdgnt.
If the elevation c of interest is between ψdgnt and ψcrit, the story of the existence of the labelled
confidence region becomes complicated. Unfortunately, it is not so easy to determine ψcrit. The
EM algorithm makes it relatively straightforward to determine the modes of a likelihood, but it
generally does not find the saddlepoints, one of which could yield ψcrit.
Kim & Lindsay (2011b) proved that if the component parameter ξj is univariate, there is a
simpler sufficient condition for the existence of a unimodal partition. If there exists a secondary
mode between ψmle and ψdgnt whose elevation is denoted by ψ2nd, the sufficient condition for
the elevation c generating the unimodal partition is ψ2nd < c ≤ ψmle. Hence the method can be
carried out using the results of systematic algorithmic searches for the modes of the likelihood,
a process that is usually necessary to verify that one has the MLE (Lindsay, 1995; McLachlan &
Peel, 2000). If there are no modes between ψdgnt and ψmle, we can define ψ2nd to be ψdgnt and
thus we have a very simple necessary and sufficient condition for a unimodal partition, which is
ψdgnt < c ≤ ψmle.
Referring back to the two simulated examples in Sections 4 and 5, the mixture model we used
had univariate component specific parameters and so this rule applied. When we used the EM
algorithm for finding the MLE in both examples, multiple starting values were employed. In
both examples we did not find any secondary modes (and so ψ2nd = ψdgnt). The confidence level
corresponding to ψdgnt in the four-dimensional space was Conf4(ψdgnt) = 0.986 at n = 500 and
Conf4(ψdgnt) = 0.505 at n = 100. In both examples, therefore, Conf4(ψdgnt) represents the exact
upper bound on the confidence levels for which a unimodal partition exists. This is in agreement
with what was observed in both examples, existence of two disjoint and identifiable subsets, one
for each MLE mode of the likelihood.
When the component-specific parameter is multivariate, however, Kim & Lindsay (2011b)
showed by example that a unimodal partition can fail to exist at the elevation c = ψcrit between
ψdgnt and ψmle, even when ψcrit is not equal to ψ2nd. In other words, it is technically feasible
that there exists a nondegenerate saddlepoint at which the two modal regions around two MLE
modes connect while the elevation of that saddlepoint is higher than ψdgnt. This implies that in the
multivariate case ψdgnt is just an inexact lower bound for a unimodal partition when there exist no
secondary modes. Kim & Lindsay (2011b) developed an algorithmic method designed to check
for the existence of such connecting saddlepoints higher than ψdgnt in a case of multivariate ξj .
One can use these results to find a safe lower bound for use with the simulation method of this
article.
7. MODAL SIMULATION FOR THE MIXTURE LIKELIHOOD WITH MULTIPLE MODAL GROUPS
In this section we extend the modal simulation method of the preceding sections to a complicated
case where there is more than one critical point in each modal region and the second critical
point corresponds to a secondary mode. Note that we here consider a finite mixture model with
univariate component-specific parameters.
7.1. Topology of the Likelihood with Multimodal Groups
When there is more than one modal group in the likelihood for the K component mixture model,
the number of distinct modal regions is not necessarily K!, depending on the elevation of interest.
We here describe a complex topology of the mixture likelihood with a secondary modal group by
using a simulated example where there is a significant secondary mode.
We generated 75 observations from a three-component normal mixture, 0.33 N(−3, 1)
+ 0.34 N(0, 1) + 0.33 N(3, 1). Then we fitted a three-component normal mixture with a
fixed variance, 0.25. The two modes were
$\theta_{mle} = [(0.257, -3.166)^\top, (0.350, -0.782)^\top, (0.392, 2.430)^\top]$ with $\ell(\theta_{mle}) = -211.7$
and $\theta_{2nd} = [(0.367, -2.711)^\top, (0.351, 0.1058)^\top, (0.282, 2.917)^\top]$ with $\ell(\theta_{2nd}) = -213.54$.
The maximum log likelihood in the degenerate class was −298.11. That is, the elevation of the
secondary mode ψ2nd is lower than that of the MLE mode ψmle, but higher than ψdgnt, that of the
MLE for a two-component mixture model. Note that Conf5(ψ2nd) = 0.40 and Conf5(ψdgnt) = 1.
Note also that there are 3! = 6 possible permutations of each mode.
Now the structure of the confidence sets is complicated by the distinct modal regions of the
two modes. Figure 3 shows the simulation-based profile plot for (π1, ξ1) at elevations c with
Conf2(c) = (0.84, 0.90, 0.95), corresponding to Conf5(c) = (0.40, 0.53, 0.69). There are three
square points corresponding to the MLE mode. Only three of the six MLE modes are visible because
we are using a (π1, ξ1) profile plot. Thus, for example, the two modes with (0.257, −3.166)
labelled as the first component appear as one. There are also three circle points corresponding to
the secondary mode. These plots were created using a new modal simulation strategy, which we
will explain in Section 7.2.
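The reason a (π1, ξ1) profile plot shows three points per modal group rather than six can be checked by enumerating the label permutations directly. The short sketch below uses the fitted values of θmle quoted above; the variable names are illustrative only.

```python
from itertools import permutations

# Fitted MLE mode from the text, as (pi_j, xi_j) pairs for components 1..3.
theta_mle = [(0.257, -3.166), (0.350, -0.782), (0.392, 2.430)]

# Every relabelling of the three components is also a mode: 3! = 6 of them.
labelled_modes = list(permutations(theta_mle))

# A (pi1, xi1) profile plot only sees whichever pair is labelled "component 1".
first_components = {mode[0] for mode in labelled_modes}

print(len(labelled_modes), len(first_components))  # prints: 6 3
```

Each value of the first component is shared by the two permutations that differ only in how components 2 and 3 are labelled, which is why pairs of modes coincide in the profile plot.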
Figure 3: Modal simulation-based profile plots for (π1, ξ1). (a) Conf2(c) = 0.84; (b) Conf2(c) = 0.90; (c) Conf2(c) = 0.95.
From Section 6, when the component-specific parameter is univariate and there exists a sec-
ondary mode above ψdgnt, we know that there exists a unimodal partition of the likelihood region
for any elevations c between ψ2nd and ψmle. That is, if one’s choice of confidence level results in
an elevation c between ψmle and ψ2nd, the mixture likelihood has one group of K! MLE modal
regions, one around each MLE mode, and each modal region is a connected and identifiable
subset. Figure 3a shows the profile plot for (π1, ξ1) at the elevation for Conf2(c) = 0.84 (i.e.,
Conf5(c) = 0.40), and we can see that a unimodal partition for θmle exists.
However, as the elevation c is decreased to just below ψ2nd, a new group of K! secondary
modal regions is formed, one around each secondary mode, and there are 2K! distinct modal
regions at this elevation. In this example Conf5(ψ2nd) = 0.40 and Conf2(ψ2nd) = 0.84, so the
second mode will come into play in the construction of the bivariate profile regions whenever
one uses the conventional confidence levels, Conf2(c) = (0.9, 0.95). As shown in Figure 3b, we
have the two modal groups, one for θmle and the other for θ2nd, but they are disconnected. To
construct a labelled region, we would need to match the ellipses in pairs corresponding to the
labels for component 1, 2 and 3.
As the elevation goes down further, the 2K! (= 12) distinct modal regions each grow until
individual modal regions connect to each other. The first points to make these connections are
necessarily saddlepoints of the likelihood. These connections must obey the permutation rules: if A
connects to B, then Aσ connects to Bσ. Suppose that at the given elevation there are K! connecting
paths that connect primary modes with secondary modes, in pairs. Let one such saddlepoint be
θsaddle, with elevation ψsaddle. If this merging of the two modal groups results in one group of K!
identifiable subsets at c = ψsaddle above ψdgnt, then we still have a way to construct a labelled
confidence set for the parameters: just select one of the K! identifiable paired subsets. In this case
each of the identifiable subsets has two modes, θmle and θ2nd.
In our example the estimated saddlepoint is
$\theta_{saddle} = [(0.328, -2.858)^\top, (0.348, -0.220)^\top, (0.324, 2.714)^\top]$ with $\ell(\theta_{saddle}) = -214.26$,
and the corresponding confidence levels for its elevation are Conf5(ψsaddle) = 0.598 and Conf2(ψsaddle) = 0.923 (the
estimated saddlepoint is represented by a triangle). Thus, the two modal groups are connected in
two-dimensional profile plots with Conf2(c) = 0.95 (see Figure 3c).
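The confidence levels quoted in this section appear to be consistent with the usual likelihood-ratio calibration, Conf_p(c) = P(χ²_p ≤ 2(ψmle − c)). That calibration is an assumption here, since Conf_p is defined outside this section; the sketch below merely checks it against the log likelihoods reported for the example. The helper names `chi2_cdf` and `conf` are hypothetical, and the chi-square distribution function uses the standard finite-sum formulas for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF for integer df via the classical finite-sum formulas."""
    if x <= 0:
        return 0.0
    if df % 2 == 0:
        # Even df: survival = exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        survival = math.exp(-x / 2.0) * total
    else:
        # Odd df: survival = erfc(sqrt(x/2))
        #                    + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
        chi = math.sqrt(x)
        phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
        term, total = chi, 0.0
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)
            total += term
        survival = math.erfc(chi / math.sqrt(2.0)) + 2.0 * phi * total
    return 1.0 - survival


def conf(p, psi_mle, c):
    """Assumed calibration: confidence level of elevation c in dimension p."""
    return chi2_cdf(2.0 * (psi_mle - c), p)


# Log likelihoods reported in the text for the simulated example.
psi_mle, psi_2nd, psi_saddle = -211.7, -213.54, -214.26

for name, psi in [("2nd", psi_2nd), ("saddle", psi_saddle)]:
    print(name, round(conf(5, psi_mle, psi), 3), round(conf(2, psi_mle, psi), 3))

# Elevation whose two-dimensional level is 0.95, and its five-dimensional level.
c95 = psi_mle + math.log(1.0 - 0.95)
print(round(c95, 2), round(conf(5, psi_mle, c95), 2))
```

Under this assumption the computed levels agree, to the reported precision, with Conf5(ψ2nd) = 0.40, Conf2(ψ2nd) = 0.84, Conf5(ψsaddle) = 0.598, and Conf2(ψsaddle) = 0.923, and the elevation for Conf2(c) = 0.95 lies below ψsaddle, in line with the connected regions seen in Figure 3c.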
7.2. Modal Simulation Strategy
If one wishes to display the topological structure of the mixture likelihood with more than one
modal group with K! elements, we would suggest that one first find the secondary modes in the
full likelihood surface. We then suggest locating the saddlepoints that connect the primary and
secondary modes to determine if we can construct an identifiable partition. We have developed
an approach to finding the saddlepoints that is based on the following characterizations of a
saddlepoint: (1) the likelihood gradient at the saddlepoint is zero and (2) the negative Hessian
matrix has one or more negative eigenvalues. We used the steepest descent algorithm for searching
for saddlepoints (see Section 7.3 for more detail).
In order to describe the mixture likelihood where there exist other modal groups than the
MLE group, we propose to simulate from one representative mode of each modal group. Given
two modes, θmle and θ2nd with ψmle > ψ2nd, for example, one can apply the modal simulation
method described in Section 2.3 to each of them and then obtain samples forming the targeted
modal regions for each mode.
When simulated values from the modal simulation are outliers (that is, are outside the
star-shaped region for a chosen mode) and the likelihood is multimodal, then the saddlepoint elevation
plays an important role in the classification of the outlier. To simplify our discussion, suppose
there exists a group of saddlepoints that connects the two modal regions with elevation ψsaddle
above ψdgnt. Suppose further that one uses the modal simulation method for θmle and generates a
simulation outlier θ having elevation c.
In order to classify this outlier, we need to consider three cases. In the first case, when the
elevation c is above ψ2nd, the classification of the outlier is the same as that of Section 5. This
is because there is only one modal group for θmle. For the elevations below ψ2nd, however, we
need to consider the connection between the two modal regions. As a second case, suppose the
elevation of interest c is higher than ψsaddle. Then the modal regions for θmle are separated from
those for θ2nd. That is, in this case we have the modal regions Cc(θmle) that are disjoint from the
Cc(θ2nd). The full likelihood region consists of the union of these two modal groups. However,
we can still use the EM algorithm to determine the proper assignment of any simulation outliers.
If an EM from θ converges to θmle, then θ is a star-shaped outlier in describing the modal region
for θmle, Cc(θmle). Otherwise, θ is a wrong-mode outlier.
As our third case, suppose the elevation c satisfies ψdgnt < c < ψsaddle. For such elevations the
modal regions for θmle are pairwise connected with those for θ2nd in which case there would again
be K! identifiable subsets, but two modes within each, which we might write as Cc(θmle, θ2nd).
In this case, the region Cc(θmle, θ2nd) is not likely to be star-shaped, although it could equal the
union of the two star-shaped regions generated by each mode. Since it is clear that such a region
is awkwardly shaped (see Figure 3c in Section 7.1), we recommend not worrying about defining
star-shaped outliers. More importantly, we can still take simulation outliers, and determine if they
are wrong mode outliers as follows. Start the EM at θ, and if it does not converge to θmle or θ2nd,
call it a wrong mode outlier and discard it.
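The three cases above amount to a small decision rule, sketched below for a simulation outlier generated from θmle. The function name, the string labels for the EM limit, and the treatment of the boundary c = ψsaddle as the merged case are assumptions made for illustration; the elevations are those reported for the example in Section 7.1.

```python
def classify_multimodal_outlier(c, em_limit, psi_mle, psi_2nd, psi_saddle, psi_dgnt):
    """Return (case, decision) for an outlier from the theta_mle simulation.

    em_limit labels the point EM converges to when started at the outlier:
    "mle", "2nd", or any other string (for example, a permuted mode).
    """
    if not (psi_dgnt < c <= psi_mle):
        raise ValueError("elevation outside the range discussed in the text")
    if c > psi_saddle:
        # Case 1 (above psi_2nd) or case 2 (between psi_saddle and psi_2nd):
        # the region for theta_mle is disjoint from all other modal regions.
        case = 1 if c > psi_2nd else 2
        return case, ("star-shaped" if em_limit == "mle" else "wrong-mode")
    # Case 3: the regions for theta_mle and theta_2nd have merged, so only
    # points climbing to neither of the two modes are discarded.
    return 3, ("retain" if em_limit in ("mle", "2nd") else "wrong-mode")


# Elevations from the simulated example (log likelihood scale).
PSI = dict(psi_mle=-211.7, psi_2nd=-213.54, psi_saddle=-214.26, psi_dgnt=-298.11)

print(classify_multimodal_outlier(-212.5, "mle", **PSI))
print(classify_multimodal_outlier(-214.0, "2nd", **PSI))
print(classify_multimodal_outlier(-214.7, "2nd", **PSI))
print(classify_multimodal_outlier(-214.7, "permuted", **PSI))
```

In the third case the sketch does not attempt to label star-shaped outliers at all, following the recommendation in the text.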
For application of the modal simulation strategy described above to the simulated example
introduced in Section 7.1, we used the six elevations corresponding to Conf5(c) =
(0.21, 0.40, 0.53, 0.69, 0.75, 0.90) and 5,000 rays at each elevation. Note that the corresponding
values of Conf2(c) were 0.7, 0.84, 0.9, 0.95, 0.964, and 0.99, respectively. There was no
outlier in the modal simulation for θmle using any of the six chosen elevations. In the modal
simulation for θ2nd with the lower four elevations, however, there were four wrong-mode out-
liers at Conf5(c) = 0.53 and there were eight star-shaped outliers at Conf5(c) = 0.69 (i.e.,
Conf2(c) = 0.95). That is, the secondary regions were not star-shaped at the latter confidence
level.
7.3. Search for Saddlepoints: More Details
Section 7.2 showed that when there is more than one mode, say the primary and secondary modes,
and one wishes to form an identifiable partition, the search for the saddlepoints connecting them
is important. We here describe our strategy for searching for saddlepoints.
First, given an initial value, one should find a local minimum of the norm of the likelihood
gradient by using the steepest descent algorithm. If the minimized norm is numerically zero (less
than a prespecified value), then we check if the minimizer point has the right Hessian eigenvalue
structure to be a saddlepoint rather than a maximum. Note that in this search one might find a
new mode in addition to the existing modes and saddlepoints. A general search could also find
saddlepoints that do not connect the two modal regions. For the use of the gradient algorithm it
is important to obtain starting values of high quality in searching for the critical points (including
the saddlepoints) of the likelihood. To do so, we use information from our simulation. We first
find the simulated point with the smallest likelihood gradient norm at a given elevation and then
use it as a starting value for the saddlepoint algorithm described above. We note that there exists
a wider literature on the construction of saddlepoint algorithms; see for example, Pang (2010).
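The search strategy can be illustrated on a toy surface. The sketch below is not the authors' implementation: it replaces the mixture log likelihood with a hypothetical two-parameter function ℓ(x, y) = −(x² − 1)² − y², which has modes at (±1, 0) and a connecting saddlepoint at (0, 0), and it uses steepest descent with a fixed step size on g = ½‖∇ℓ‖², whose gradient is H∇ℓ. The candidate starting points stand in for simulated points, and the sketch labels local minima separately from saddlepoints.

```python
import math


def grad(theta):
    """Gradient of the toy surface l(x, y) = -(x^2 - 1)^2 - y^2."""
    x, y = theta
    return (-4.0 * x * (x * x - 1.0), -2.0 * y)


def hess(theta):
    """Hessian of the toy surface."""
    x, _ = theta
    return ((-12.0 * x * x + 4.0, 0.0), (0.0, -2.0))


def grad_norm(theta):
    return math.hypot(*grad(theta))


def find_critical_point(start, step=0.02, tol=1e-8, max_iter=5000):
    """Fixed-step steepest descent on g = 0.5 * ||grad l||^2 (grad g = H grad l)."""
    x, y = start
    for _ in range(max_iter):
        gx, gy = grad((x, y))
        if math.hypot(gx, gy) < tol:  # gradient numerically zero
            break
        (hxx, hxy), (hyx, hyy) = hess((x, y))
        x -= step * (hxx * gx + hxy * gy)
        y -= step * (hyx * gx + hyy * gy)
    return (x, y)


def classify_critical_point(theta, tol=1e-6):
    """Use the eigenvalues of the negative Hessian to label a critical point."""
    if grad_norm(theta) >= tol:
        return "not a critical point"
    (a, b), (_, d) = hess(theta)
    centre = -(a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    eigenvalues = (centre - radius, centre + radius)
    if all(e > 0 for e in eigenvalues):
        return "mode"
    if all(e < 0 for e in eigenvalues):
        return "local minimum"
    if any(e < 0 for e in eigenvalues):
        return "saddlepoint"
    return "degenerate"


# Hypothetical "simulated points"; start from the smallest gradient norm.
candidates = [(0.8, 0.3), (0.2, 0.1), (-0.9, 0.5)]
start = min(candidates, key=grad_norm)
found = find_critical_point(start)
print(start, classify_critical_point(found))

# A different start can lead back to a mode rather than the saddlepoint.
back_to_mode = find_critical_point((0.8, 0.3))
print(classify_critical_point(back_to_mode))
```

On this toy surface the start with the smallest gradient norm leads to the saddlepoint, while the start near x = 0.8 returns to a mode, mirroring the mixed outcomes that can occur with several starting values. A fixed step size is used only for simplicity; a line search would be a natural refinement.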
We applied the saddlepoint algorithm described above to the simulated example in Section 7.1.
We started the algorithm from ten starting values, one selected from each of the six elevations for
θmle and one from each of the four elevations for θ2nd. The starting values were the simulated points
for θ with the smallest likelihood gradient norm at each elevation. Note that the numbers of starting
values leading the algorithm to θmle, θ2nd, and θsaddle were two, one, and seven, respectively. In
other words, this approach can lead the saddlepoint algorithm to locate the desired saddlepoint
that connects the two modal regions, but sometimes it led back to the original modes. With regard
to our convergence criteria, the norm of the gradient function at θsaddle was 5.84E−05, and the
negative Hessian matrix had one negative eigenvalue and four positive eigenvalues.
8. DATA ANALYSIS
In this section we present two real examples to illustrate one important feature of the modal simu-
lation described in Section 2.3: Suppose one wishes to display two-dimensional profile confidence
regions for a specific confidence level, such as 0.95. One can then determine the elevation c corre-
sponding to Conf2(c) = 0.95, and run the simulation algorithm one time. One can then visualize
every profile confidence set using that output. In the two examples we will construct different
types of confidence regions: for the first example, we will give a two-dimensional profile plot for
mixing proportion and component parameter of each component, and for the second example, the
profile will be for pairwise component parameters.
Figure 4: (a) A histogram and a fitted mixture density (solid red) for SLC data; (b) approximate 98.6% simulation-based profile plot for (π, ξ) (red squares represent the MLE for π and ξ). [Color figure can be
seen in the online version of this article, available at http://wileyonlinelibrary.com/journal/cjs]
Example 1. Figure 4a is the histogram of red blood cell sodium–lithium countertransport (SLC)
data analyzed in Roeder (1994). The data come from 190 individuals, each characterized by one of
three genotypes A1A1, A1A2, and A2A2. Roeder (1994) fitted a three-component normal mixture
model with equal variances to these data. The MLE for
$\theta = [(\pi_1, \xi_1, \omega)^\top, (\pi_2, \xi_2, \omega)^\top, (\pi_3, \xi_3, \omega)^\top]$ was
$\hat\theta = [(0.78, 0.22, 0.003)^\top, (0.20, 0.38, 0.003)^\top, (0.02, 0.58, 0.003)^\top]$.
Note that we used multiple starting values in the
EM algorithm, and found that there was only the MLE modal group.
We calculated the confidence level corresponding to the elevation of the MLE for a class of
degenerate parameters (i.e., a two-component normal mixture with unknown equal variance)
in the full parameter space of dimension 6, which was Conf6(ψdgnt) = 0.86 (corresponding to
Conf2(ψdgnt) = 0.998). Since ξj is univariate, we have a unimodal partition at any six-dimensional
confidence level below 86%. However, we can construct two-dimensional profiles up to 99.8%
confidence.
Figure 4b is a simulation-based profile plot for (π, ξ) at the elevation c with Conf1(c) = 0.986
(corresponding to Conf6(c) = 0.576), showing the relationship between πj and ξj for each
component in a single plot. Note that we employed a modal simulation method with 4,000 rays,
and found that there was no star-shaped outlier.
Example 2. The second example concerns presenting the relationships between the multiple
parameters in a Poisson mixture model. The data we use here are based on a cohort study in northeast
Thailand (Schelp et al., 1990) where the health status of 602 preschool children was checked every 2
weeks from June 1982 until September 1985. Each child was examined to see whether she/he showed
symptoms of fever, cough, running nose, or these symptoms together. The data were the frequencies of these
illness spells during the study period (see Figure 5a for a histogram). Bohning
et al. (1992) and Schlattmann (2005) fitted a four-component Poisson mixture model to these data:
$g(y; \theta) = \sum_{j=1}^{4} \pi_j \, e^{-\xi_j} \xi_j^{y} / y!$, where
$\theta = [(\pi_1, \xi_1)^\top, (\pi_2, \xi_2)^\top, (\pi_3, \xi_3)^\top, (\pi_4, \xi_4)^\top]$. The MLE for θ was
$\hat\theta = [(0.197, 0.14)^\top, (0.48, 2.82)^\top, (0.27, 8.16)^\top, (0.05, 16.16)^\top]$ (see Figure 5a for a fitted
mixture). The log likelihood of the MLE for the class of degenerate parameters (i.e., a three-
component Poisson mixture model) in the full parameter space of dimension 7 was −1568.28,
and thus Conf7(ψdgnt) was larger than 0.99. Since ξj was univariate and there was no secondary
mode, Conf7(ψdgnt) was the exact upper bound on the confidence levels for a unimodal partition
for these data.
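For readers who want to evaluate the fitted model, the four-component Poisson mixture density can be coded directly from its definition. The sketch below is illustrative only: the function names and the small vector of counts are hypothetical, and the mixing proportions are the rounded values quoted in the text, which sum to 0.997 rather than exactly 1.

```python
import math


def poisson_mixture_pmf(y, theta):
    """g(y; theta) = sum_j pi_j * exp(-xi_j) * xi_j**y / y!, computed on the log scale."""
    return sum(
        pi_j * math.exp(-xi_j + y * math.log(xi_j) - math.lgamma(y + 1))
        for pi_j, xi_j in theta
    )


def log_likelihood(counts, theta):
    """Log likelihood of independent counts under the mixture."""
    return sum(math.log(poisson_mixture_pmf(y, theta)) for y in counts)


# Rounded MLE quoted in the text, as (pi_j, xi_j) pairs.
theta_hat = [(0.197, 0.14), (0.48, 2.82), (0.27, 8.16), (0.05, 16.16)]

# The mass over y = 0, 1, ..., 199 recovers the sum of the mixing proportions.
total_mass = sum(poisson_mixture_pmf(y, theta_hat) for y in range(200))
print(round(total_mass, 3))  # prints: 0.997

# Hypothetical counts, used only to show the call; not the Thailand data.
example_counts = [0, 1, 3, 8, 15]
print(log_likelihood(example_counts, theta_hat))
```

Working on the log scale with `math.lgamma` avoids overflow in the factorial for large counts.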
For the simultaneous inference on the four component-specific parameters (ξ1, ξ2, ξ3, and
ξ4), we construct the profile confidence sets for every pair of these parameters (six pairs). For
application of the modal simulation method to these data we used 6,000 rays and there was no
star-shaped outlier. Figure 5b is a matrix scatterplot of sampling-based profiles for
every pair of (ξ1, ξ2, ξ3, and ξ4) at the elevation c with Conf2(c) = 0.95. Notice that the samples
from a single simulation can represent the correlative relationship between the parameters of
interest. We also observe that there is a boundary effect for ξ1 and that the likelihood provides proper
confidence sets at boundaries, which is an argument for using the likelihood-based confidence
sets instead of the Wald confidence sets (see the first row in the matrix scatterplot).
Figure 5: (a) A histogram and a fitted mixture density for the northeast Thailand morbidity data (the solid red line is an estimated mixture density and the broken blue line is a fitted component density); (b) a matrix scatterplot of 95% sampling-based profiles for every pair of (ξ1, ξ2, ξ3, and ξ4). [Color figure can be seen in the online
version of this article, available at http://wileyonlinelibrary.com/journal/cjs]
9. CONCLUSION
In this article we proposed a strategy of using the modal simulation to describe the labelled
confidence regions generated by the likelihood in a finite mixture model. We showed by examples
that the modal simulation method, plus accompanying data analysis, provides a wider and more
useful set of tools than standard numerical analysis in describing the mixture likelihood region
for the parameters of interest, even when there are multiple modes.
BIBLIOGRAPHY
Agresti, A. (2002). Categorical Data Analysis, 2nd ed., Wiley, New York.
Bohning, D., Schlattmann, P. & Lindsay, B. G. (1992). C.A.MAN-computer assisted analysis of mixtures: Statistical algorithms. Biometrics, 48, 283–303.
Cox, D. R. & Hinkley, D. V. (1974). Theoretical Statistics. Chapman & Hall, London.
Crawford, S. L. (1994). An application of the Laplace method to finite mixture distributions. Journal of the American Statistical Association, 89, 259–267.
Davison, A. C. & Hinkley, D. V. (1997). Bootstrap Methods and Their Application, Cambridge University Press, Cambridge.
Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall.
Jasra, A., Holmes, C. C. & Stephens, D. A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science, 20, 50–67.
Kalbfleisch, J. D. & Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd ed., Wiley, New York.
Kim, D. & Lindsay, B. G. (2011a). Using confidence distribution sampling to visualize confidence sets. Statistica Sinica, 21, 923–948.
Kim, D. & Lindsay, B. G. (2011b). Empirical Identifiability and the Topology of Mixture Likelihood Regions. Submitted for publication.
Lang, J. B. (2008). Score and profile likelihood confidence intervals for contingency table parameters. Statistics in Medicine Science, 27, 5975–5990.
Lindsay, B. G. (1995). Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 5, Institute of Mathematical Statistics: Hayward, CA.
Lindsay, B. G. & Qu, A. (2003). Inference functions and quadratic score tests. Statistical Science, 18, 394–410.
Matsumoto, Y. (2002). An Introduction to Morse Theory. Translations of Mathematical Monographs, Vol. 208, American Mathematical Society, Providence.
Meeker, W. Q. & Escobar, L. A. (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. The American Statistician, 49, 48–53.
Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.
Owen, A. (2001). Empirical Likelihood. Chapman & Hall, New York.
Pang, C. H. J. (2010). Level set methods for finding saddle points of general Morse index. Available from http://arxiv.org/pdf/1001.0925v1.
Qin, J. & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325.
Rao, C. R. (1948). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proceedings of the Cambridge Philosophical Society, 44, 50–57.
Redner, R. A. & Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195–239.
Roeder, K. (1994). A graphical technique for determining the number of components in a mixture of normals. Journal of the American Statistical Association, 89, 487–495.
Schelp, F. P., Vivatanasept, P., Sitaputra, P., Sornmani, S., Pongpaew, P., Vudhivai, N., Egormaiphol, S. & Bohning, D. (1990). Relationship of the morbidity of under-fives to anthropometric measurements and community health intervention. Tropical Medicine and Parasitology, 41, 121–126.
Schlattmann, P. (2005). On bootstrapping the number of components in finite mixtures of Poisson distributions. Statistics and Computing, 15, 179–188.
Teicher, H. (1960). On the mixture of distributions. The Annals of Mathematical Statistics, 31, 55–73.
Teicher, H. (1963). Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34, 1265–1269.
Yakowitz, S. J. & Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39, 209–214.
Yao, W. & Lindsay, B. G. (2009). Bayesian mixture labeling by highest posterior density. Journal of the American Statistical Association, 104, 758–767.
Received 21 May 2010
Accepted 21 May 2011