
The Canadian Journal of Statistics Vol. 39, No. 3, 2011, Pages 421–437

La revue canadienne de statistique

421

Modal simulation and visualization in finite mixture models

Daeyoung KIM¹* and Bruce G. LINDSAY²

¹Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA 01002, USA
²Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA

Key words and phrases: Confidence regions; likelihood; mixture model; visualization.

MSC 2010: Primary 62-09; secondary 62F25.

Abstract: Kim & Lindsay (2011a) proposed a new sampling-based visualization methodology, modal simulation, designed to describe the boundaries of the confidence regions generated by an inference function

such as the likelihood. Once the sample points on the boundaries of the targeted confidence sets are created

in a single simulation run, one can use those samples to describe every confidence set of interest without

further numerical optimization. However, this method assumes that one is simulating the samples for the

parameters from the likelihood region with a single mode. In this article we extend the modal simulation

method to be applicable to the likelihood regions in a finite mixture model where there exist multiple modes.

The Canadian Journal of Statistics 39: 421–437; 2011 © 2011 Statistical Society of Canada

Résumé: Kim and Lindsay (2011a) proposed a new sampling-assisted visualization methodology, called modal simulation, whose objective is to describe the boundaries of the confidence regions generated by an inference function such as the likelihood. Once the points sampled on the boundaries of the selected confidence regions have been obtained from a single set of simulations, we can use them to describe all confidence regions of interest without additional numerical optimization. However, this method presupposes that the parameter simulations come from a likelihood region with a single mode. In this article, we generalize modal simulation to handle likelihood regions arising from a finite mixture model in which several modes exist. La revue canadienne de statistique 39: 421–437; 2011 © 2011 Société statistique du Canada

1. INTRODUCTION

Statistical inference is often based on an inference function, that is, a function of the parameter θ and the data x. For example, the likelihood, the score statistic (Rao, 1948), the quadratic inference function (Lindsay & Qu, 2003) and the empirical likelihood (Owen, 1988; Qin & Lawless, 1994) are all widely used throughout statistics. These inference functions can be used to create confidence regions for the parameters.

However, the simple mathematical description of these confidence sets does not automatically lead to easy numerical or pictorial descriptions that one can use in a practical situation.

One might contrast this with a modern Bayesian analysis, where simulation is used to turn the

pictorial representation of confidence sets into a form of data analysis. Kim & Lindsay (2011a)

developed simulation methods for non-Bayesian inference functions. The main achievement of

*Author to whom correspondence may be addressed.

© 2011 Statistical Society of Canada / Société statistique du Canada

422 KIM AND LINDSAY Vol. 39, No. 3

this article is the extension of those methods to the considerably more complicated setting of mixture models, focusing on the likelihood as the inference function. In particular, we will show how to describe the confidence sets in the face of both identifiability problems and multiple likelihood modes.

There are several alternative methods for confidence set construction. Take the likelihood function as an example. The method that provides the confidence region with the simplest description for multivariate parameters is the Wald-type confidence set, based on an estimated covariance matrix for the maximum likelihood estimator (MLE). This yields elliptically shaped regions of the parameter space whose boundaries are easily calculated. However, it is known that the likelihood-based set generally has a more accurate coverage probability than the Wald set, especially when the sample size is not large. The Wald sets also require estimation of the covariance matrix of the estimator, and different estimation methods can produce quite different results. A final drawback is that Wald sets are not invariant to reparametrization (Cox & Hinkley, 1974; Meeker & Escobar, 1995; Agresti, 2002; Kalbfleisch & Prentice, 2002).

A completely different approach to confidence set construction is the bootstrap (Efron & Tibshirani, 1993; Davison & Hinkley, 1997). A number of authors have discussed the theoretical drawbacks of the bootstrap method compared to using an inference function (Owen, 2001; Lang, 2008). There are also computational drawbacks. First, recomputing the parameter estimates for each new data set generated by the bootstrap can be quite expensive in computing time. Second, the confidence set is generated from a cloud of sampled points, so the set could have many possible boundaries unless one specifies the shape and orientation of the region (Owen, 2001). Third, this kind of simulation is inherently sparse near the possible boundaries of the confidence set, and so there is significant statistical error in describing the boundary (Kim & Lindsay, 2011a).

Kim & Lindsay (2011a) showed that when one has a confidence set generated by an inference function, one can sample intensively on its well-specified boundary, and so obtain a sharper picture of its size and shape. In this article we focus on the use of the likelihood function in the mixture model. The simulation methodology of Kim & Lindsay (2011a) assumed that one was simulating from a likelihood with a single mode. In data analysis, however, we often face cases where the likelihood function has multiple modes, so that the corresponding confidence regions have a complicated structure. In addition, Kim & Lindsay (2011b) showed that the likelihood sets for the parameters in a finite mixture model have a complex structure due to the existence of two types of nonidentifiability: labelling nonidentifiability (Redner & Walker, 1984) and degenerate nonidentifiability (Crawford, 1994; Lindsay, 1995).

The rest of the article is organized as follows. Section 2 overviews the modal simulation method proposed by Kim & Lindsay (2011a). In Section 3 we review the issues caused by the existence of nonidentifiabilities in a finite mixture model. In Sections 4 and 5 we propose a new strategy for using the modal simulation method in the simple case where a unimodal partition exists in the mixture likelihood region. Section 6 extends the modal simulation method developed in Sections 4 and 5 to the multimodal case where there exists a secondary mode. In Section 7 we apply our proposed methods to two real examples.

2. BACKGROUND ON THE LIKELIHOOD REGION SIMULATION AND VISUALIZATION

In this section we describe the topographical structure of the (profile) likelihood regions for a

parameter vector θ in a regular parametric model. We then review the simulation method (Kim &

Lindsay, 2011a), designed to visualize the likelihood regions with a single mode.

The Canadian Journal of Statistics / La revue canadienne de statistique DOI: 10.1002/cjs

2011 MODAL SIMULATION AND VISUALIZATION OF THE MIXTURE LIKELIHOOD 423

2.1. Likelihood Regions

For n i.i.d. observations $\{y_1, \ldots, y_n\}$ generated from the parametric family $\{p(y \mid \theta),\ \theta \in \Theta \subset \mathbb{R}^p\}$, one can construct the likelihood of a p-dimensional parameter vector θ,

$$L(\theta) = \prod_{i=1}^{n} p(y_i \mid \theta). \tag{1}$$

We will denote the true parameter by $\theta_\tau$, and the MLE mode maximizing the likelihood in Equation (1) by $\hat\theta$. We assume the model is regular and the likelihood is bounded.

One can construct the general form of the likelihood confidence region for θ by inverting a test of the null hypothesis $H_0: \theta = \theta_0$ based on the likelihood ratio statistic $T_1 = -2\log L(\theta_0) + 2\log L(\hat\theta)$:

$$C^{LR}_c = \{\theta : L(\theta) \ge c\}, \tag{2}$$

where $c = L(\hat\theta)\, e^{-q(1-\alpha)/2}$ is an adjustable constant and $q(1-\alpha)$ is the $1-\alpha$ quantile of the (asymptotic) distribution of $T_1$. We will generally assume that the limiting distribution of $T_1$ does not depend on $\theta_0$ and that asymptotic critical values are available for $T_1$, although these values can be adjusted for finite sample sizes.
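Equation (2) is most conveniently applied on the log scale, $\log c = \log L(\hat\theta) - q(1-\alpha)/2$, because raw likelihoods underflow for moderate n. The Python sketch below is a minimal illustration assuming the chi-squared limit with p degrees of freedom that this article uses for calibration. The helper names (`chi2_cdf`, `chi2_quantile`, `log_elevation`, `in_likelihood_region`) and the maximized log-likelihood value are hypothetical, and the quantile is computed with the standard library rather than a statistics package.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """P(X <= x) for X ~ chi-squared(df), from the power series of the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= h / (a + n)
        total += term
        if term < 1e-17 * total:
            break
    return math.exp(a * math.log(h) - h - math.lgamma(a)) * total


def chi2_quantile(level: float, df: float) -> float:
    """Invert chi2_cdf: bracket by doubling, then bisect."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < level:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_elevation(max_loglik: float, level: float, p: int) -> float:
    """log c = log L(theta_hat) - q(level)/2, i.e. Equation (2) on the log scale."""
    return max_loglik - 0.5 * chi2_quantile(level, p)


def in_likelihood_region(loglik_value: float, log_c: float) -> bool:
    """Membership test for C_c^LR = {theta : L(theta) >= c}."""
    return loglik_value >= log_c


log_c = log_elevation(-250.0, 0.95, p=2)  # hypothetical maximized log-likelihood
```

For p = 2 the 95% quantile is exactly $-2\log 0.05 \approx 5.99$, so in this example $\log c = \log L(\hat\theta) + \log 0.05$.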

We will think of the problem of describing this set as a topographic one, in which a p-dimensional parameter vector θ describes coordinates of a flat surface and the likelihood L(θ) represents the elevations of the land mass on that surface. The boundary of the set $C^{LR}_c$, namely $\{\theta : L(\theta) = c\}$, corresponds to a specified contour of the corresponding topographical map. The shape of the confidence set is determined by this contour.

We call $C^{LR}_c$ in Equation (2) the elevation-c likelihood confidence region. In this article c is interpreted via the confidence level of the corresponding likelihood ratio statistic in the p-dimensional parameter space, which we denote by $\mathrm{Conf}_p(c)$. For example, when $c = L(\hat\theta)$, $C^{LR}_c$ is equal to $\{\hat\theta\}$ and the corresponding confidence level is 0%. Note that, for illustrative convenience, we will calibrate the confidence level for each elevation c using the limiting distribution of the likelihood ratio statistic, for an easy transition between the full confidence set and the profile confidence set. That is, $\mathrm{Conf}_p(c)$ will be based on the chi-squared distribution with p degrees of freedom: $\mathrm{Conf}_p(c) = F_{\chi^2_p}\{2\log(L(\hat\theta)/c)\}$, where $F_{\chi^2_p}$ is the chi-squared distribution function. If one wishes to use elevations that provide more accurate confidence levels, one can do a parametric bootstrap adjustment (Efron & Tibshirani, 1993; Davison & Hinkley, 1997; Kim & Lindsay, 2011a).

2.2. Profile Likelihood Regions

When one is interested in a function of the parameters, one often constructs a profile likelihood and profile confidence regions for inference. Let β(θ) be an r-dimensional (r ≤ p) vector of parameters of interest. The profile likelihood for β is defined by

$$L_{\mathrm{prof}}(\beta) = \sup_{\theta}\{L(\theta) : \beta(\theta) = \beta\}. \tag{3}$$

One can then construct the profile confidence region for β by inverting a test of $H_0: \beta(\theta) = \beta_0$ based on $T_2 = -2\log L_{\mathrm{prof}}(\beta_0) + 2\log L_{\mathrm{prof}}(\hat\beta)$, where $\hat\beta = \beta(\hat\theta)$:

$$C^{PLR}_c = \{\beta : L_{\mathrm{prof}}(\beta) \ge c\}, \tag{4}$$

where c is an adjustable constant. Here we transform the constant c into a confidence level in the r-dimensional parameter space, which we denote by $\mathrm{Conf}_r(c)$.


There is an important relationship between the profile confidence region and the full likelihood region that we call the nesting property. If θ is a point in the likelihood confidence region $C^{LR}_c$ of Equation (2), then β(θ) must also be in the profile likelihood confidence region $C^{PLR}_c$ of Equation (4) at the same elevation c, because by definition $L_{\mathrm{prof}}(\beta(\theta)) \ge L(\theta) \ge c$. It is important to remember that the same elevation c will have different confidence level interpretations in $C^{LR}_c$ and $C^{PLR}_c$, due to the different degrees of freedom used to make the two confidence sets. A particular value of c generates a confidence set with level $\mathrm{Conf}_p(c)$ in $C^{LR}_c$, but it generates a set with a larger level $\mathrm{Conf}_r(c)$ (for r < p) for $C^{PLR}_c$ of dimension r.
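The two levels attached to one elevation can be computed directly. Since $L_{\mathrm{prof}}(\hat\beta) = L(\hat\theta)$, both depend only on the drop $\log L(\hat\theta) - \log c$, through $\mathrm{Conf}_d(c) = F_{\chi^2_d}\{2(\log L(\hat\theta) - \log c)\}$ with d = p or d = r. The Python sketch below is illustrative; the function names are hypothetical and the chi-squared CDF is a standard-library series implementation.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Chi-squared CDF via the regularized lower incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= h / (a + n)
        total += term
        if term < 1e-17 * total:
            break
    return math.exp(a * math.log(h) - h - math.lgamma(a)) * total


def conf_level(log_drop: float, df: int) -> float:
    """Conf_df(c) for an elevation c with log_drop = log L(theta_hat) - log c."""
    return chi2_cdf(2.0 * log_drop, df)


# For df = 2 the 95% quantile is exactly -2 log(0.05), so this drop puts the
# profile set of a two-dimensional beta at the 95% level.
log_drop = -math.log(0.05)
conf_profile = conf_level(log_drop, 2)  # Conf_r(c), r = 2
conf_full = conf_level(log_drop, 4)     # Conf_p(c), p = 4
```

With r = 2 and p = 4 this reproduces the pairing used for Figure 1 in Section 4, where $\mathrm{Conf}_2(c) = 0.95$ corresponds to $\mathrm{Conf}_4(c) \approx 0.80$.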

2.3. Modal Simulation and Visualization

Kim & Lindsay (2011a) created the modal simulation method for generating samples on the boundaries of the confidence region generated by an inference function such as the likelihood. The basic idea of their method is to generate a sample of points from the elevation-c likelihood contour set $\{\theta : L(\theta) = c\}$ that surrounds the MLE $\hat\theta$ (i.e., the mode of the likelihood). Their strategy for generating the sample on $\{\theta : L(\theta) = c\}$ can be described as follows.

For now we suppose that there is a single mode in the likelihood, and that we have set an elevation of interest c based on the desired confidence region, as in Equation (2) with $c = L(\hat\theta)e^{-q(1-\alpha)/2}$. Suppose one has found the MLE mode $\hat\theta$ and formed a covariance estimator $V_{\hat\theta}$. Define the ray generated by a vector $z \in \mathbb{R}^p$ to be $\theta(\varepsilon) = \hat\theta + \varepsilon V_{\hat\theta}^{1/2} z$, where $\varepsilon \in \mathbb{R}$. For $V_{\hat\theta}$ one can use the inverse of the Fisher information or one of its asymptotic equivalents. The following two steps are carried out iteratively.

[Step 1] Generate z from the p-dimensional standard normal distribution.

[Step 2] Determine ε satisfying $L(\theta(\varepsilon)) = c$. Denote the computed solution by $\hat\varepsilon = \hat\varepsilon(c, z)$; the simulated value generated by (c, z) is then $\theta(\hat\varepsilon)$. We compute $\theta(\hat\varepsilon)$ for both the positive and the negative solutions $\hat\varepsilon$.
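Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Cholesky factor is used as $V_{\hat\theta}^{1/2}$ (any matrix square root would serve), the one-dimensional root finder is bracket doubling followed by bisection, and the log-likelihood is a hypothetical quadratic whose contours are exact ellipses. The bisection relies on the single-mode, star-shaped assumption, under which the log-likelihood decreases along each half-ray so that it crosses the contour exactly once.

```python
import math
import random


def cholesky(V):
    """Lower-triangular L with L L' = V, used here as the square root V^{1/2}."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def ray_point(mode, L, z, eps):
    """theta(eps) = mode + eps * L z."""
    p = len(mode)
    return [mode[i] + eps * sum(L[i][k] * z[k] for k in range(p)) for i in range(p)]


def solve_on_ray(loglik, mode, L, z, log_c, sign, eps_max=1e6):
    """Step 2: the eps > 0 with loglik(theta(sign * eps)) = log_c. Assumes the
    log-likelihood decreases along the half-ray (single mode, star-shaped)."""
    def g(eps):
        return loglik(ray_point(mode, L, z, sign * eps)) - log_c

    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:  # expand the bracket until the ray leaves the region
        lo, hi = hi, 2.0 * hi
        if hi > eps_max:
            raise ValueError("ray does not leave the region")
    for _ in range(100):  # bisection keeps g(lo) > 0 >= g(hi)
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)


def modal_simulation(loglik, mode, V, log_c, n_rays, rng):
    L = cholesky(V)
    samples = []
    for _ in range(n_rays):
        z = [rng.gauss(0.0, 1.0) for _ in mode]  # Step 1
        for sign in (1.0, -1.0):                 # Step 2, both solutions
            eps = solve_on_ray(loglik, mode, L, z, log_c, sign)
            samples.append(ray_point(mode, L, z, eps))
    return samples


# Hypothetical quadratic log-likelihood with maximum 0 at the origin.
V = [[1.0, 0.5], [0.5, 2.0]]
det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
Vinv = [[V[1][1] / det, -V[0][1] / det], [-V[1][0] / det, V[0][0] / det]]


def loglik(t):
    return -0.5 * sum(t[i] * Vinv[i][j] * t[j] for i in range(2) for j in range(2))


q = -2.0 * math.log(0.05)  # 95% chi-squared(2) quantile
log_c = 0.0 - 0.5 * q      # log L(theta_hat) - q/2
points = modal_simulation(loglik, [0.0, 0.0], V, log_c, 500, random.Random(0))
```

With a real model one would pass its log-likelihood, the MLE, and an estimated covariance (for example, the inverse observed information) in place of the toy `loglik`, `[0.0, 0.0]` and `V`.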

Given a targeted elevation c and given z, the ray starts from $\hat\theta$ and heads in the direction $V_{\hat\theta}^{1/2}z$ (the direction $z/\|z\|$ in the standardized coordinates $V_{\hat\theta}^{-1/2}(\theta - \hat\theta)$). When this ray reaches the boundary of the targeted set $\{\theta : L(\theta) = c\}$, that point becomes the sampled value of θ, which therefore depends on the random direction. One can obtain solutions ε satisfying $L(\theta(\varepsilon)) = c$ using a one-dimensional root-finding algorithm. By repeating the calculation of $\theta(\hat\varepsilon)$ for many generated z, one obtains a large set of simulated parameter values on the boundary $\{\theta : L(\theta) = c\}$.

There is one extremely useful feature of the modal simulation method. After generating a large sample from a single simulation run, one can use the points in the sample to picture the profile likelihood confidence sets for any desired functions of the parameters, all without further numerical optimization. This feature arises from the nesting property mentioned in Section 2.2. If one has a sample of B points $(\theta_1, \ldots, \theta_B)$ from $\{\theta : L(\theta) = c\}$ and β(θ) is a function of the parameters of interest, then $(\beta(\theta_1), \ldots, \beta(\theta_B))$ is a set of points from the profile set $\{\beta : L_{\mathrm{prof}}(\beta) \ge c\}$ (Figure 1 in Section 4 gives an illustration of a simulated profile plot from a mixture model).
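Because of the nesting property, profile sets come from a boundary sample with no further optimization: one simply evaluates β at each sampled point. The Python sketch below illustrates this under simplifying assumptions. It uses a hypothetical quadratic log-likelihood, for which the Step 2 root has the closed form $\hat\varepsilon = \sqrt{q}/\|z\|$, and it summarizes a scalar β by the minimum and maximum of its images, which is appropriate only when the profile region is a single interval. The helper names are illustrative.

```python
import math
import random


def ellipse_boundary_samples(mode, chol, q, n_rays, rng):
    """Modal simulation in closed form for the quadratic log-likelihood
    -0.5 (theta - mode)' V^{-1} (theta - mode) with V = chol chol':
    the ray solutions are theta = mode +/- sqrt(q) * chol z / ||z||."""
    p = len(mode)
    samples = []
    for _ in range(n_rays):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(v * v for v in z))
        step = [math.sqrt(q) * sum(chol[i][k] * z[k] for k in range(p)) / norm
                for i in range(p)]
        for sign in (1.0, -1.0):
            samples.append([mode[i] + sign * step[i] for i in range(p)])
    return samples


def profile_interval(samples, beta):
    """Nesting property: each beta(theta_b) lies in {beta : L_prof(beta) >= c}.
    For scalar beta and a single-interval profile region, the min and max of
    the images approximate the interval endpoints."""
    values = [beta(theta) for theta in samples]
    return min(values), max(values)


chol = [[1.0, 0.0], [0.5, math.sqrt(1.75)]]  # chol chol' = [[1, 0.5], [0.5, 2]]
q = -2.0 * math.log(0.05)                    # 95% chi-squared(2) quantile
samples = ellipse_boundary_samples([0.0, 0.0], chol, q, 3000, random.Random(1))
first = profile_interval(samples, lambda t: t[0])
total = profile_interval(samples, lambda t: t[0] + t[1])
```

For $\beta(\theta) = \theta_1$ the exact profile interval here is $\pm\sqrt{q V_{11}}$, and for $\beta(\theta) = \theta_1 + \theta_2$ it is $\pm\sqrt{q(V_{11} + 2V_{12} + V_{22})}$; the sampled extremes approach these endpoints as the number of rays grows.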

As noted by Kim & Lindsay (2011a), if there is only one MLE mode in the likelihood and the targeted set around $\hat\theta$ is star-shaped, the likelihood is monotonically decreasing along every ray from $\hat\theta$, regardless of elevation. Thus the modal simulation will find only a single solution of $L(\theta) = c$ along each of the positive and negative half-rays. If the set is not star-shaped, however, a ray from $\hat\theta$ can have multiple solutions of $L(\theta) = c$. The first solution on the ray starting from $\hat\theta$ is in the star-shaped region for $\hat\theta$. Solutions further out on the same ray lie outside the star-shaped region for $\hat\theta$. We call these extra solutions simulation outliers. Note that the simulation outliers are still in the confidence region for θ, but the existence of just one such outlier shows that the shape of the likelihood region is quite different from that of the elliptical Wald region.
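When the region is not star-shaped, one half-ray can cross $L(\theta) = c$ several times. One simple way to find every crossing, assumed here rather than taken from the paper, is to scan a grid of ε values for sign changes of $\log L(\theta(\varepsilon)) - \log c$ and refine each bracket by bisection; the first root marks the boundary of the star-shaped region and any later roots are simulation outliers. The log-likelihood along the ray below is hypothetical, with a main bump at ε = 0 and a secondary bump at ε = 4.

```python
import math


def roots_on_ray(g, eps_max, n_grid=2000, n_bisect=60):
    """All sign changes of g on (0, eps_max]: grid scan, then bisection.
    Two roots closer together than eps_max / n_grid can be missed."""
    roots = []
    step = eps_max / n_grid
    a, ga = 0.0, g(0.0)
    for k in range(1, n_grid + 1):
        b = k * step
        gb = g(b)
        if (ga > 0.0) != (gb > 0.0):
            lo, hi, glo = a, b, ga
            for _ in range(n_bisect):
                mid = 0.5 * (lo + hi)
                gm = g(mid)
                if (gm > 0.0) == (glo > 0.0):
                    lo, glo = mid, gm
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
        a, ga = b, gb
    return roots


def split_roots(roots):
    """First root: boundary of the star-shaped region; the rest: outliers."""
    return roots[:1], roots[1:]


def loglik_along_ray(eps):
    # Hypothetical log-likelihood restricted to one half-ray from the mode.
    return math.log(math.exp(-0.5 * eps ** 2) + 0.5 * math.exp(-0.5 * (eps - 4.0) ** 2))


log_c = -1.2
g = lambda e: loglik_along_ray(e) - log_c
roots = roots_on_ray(g, eps_max=10.0)
boundary, outliers = split_roots(roots)
```

Two crossings closer together than the grid spacing `eps_max / n_grid` would be missed, so in practice the grid should be fine relative to the features of the likelihood.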


There are multiple complications in applying the aforementioned simulation theory to mixture models. However, the effort is worthwhile because it provides a direct solution to the famous mixture labelling problem.

3. BACKGROUND ON FINITE MIXTURE MODELS AND NONIDENTIFIABILITIES

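In this section we review a parametric finite mixture model and the two types of nonidentifiability inherent in the mixture parameters. Suppose that n observations $y = (y_1, \ldots, y_n)'$, $y_i \in \mathbb{R}^q$, are randomly drawn from a K-component mixture density $p(y \mid \theta)$ indexed by a set of parameters $\theta \in \Theta$:

$$p(y \mid \theta) = \sum_{j=1}^{K} \pi_j f(y; \xi_j, \omega), \tag{5}$$

$$\theta = \left[\begin{pmatrix}\pi_1\\ \xi_1\\ \omega\end{pmatrix}, \ldots, \begin{pmatrix}\pi_j\\ \xi_j\\ \omega\end{pmatrix}, \ldots, \begin{pmatrix}\pi_K\\ \xi_K\\ \omega\end{pmatrix}\right], \tag{6}$$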

where $\pi_j$ is the mixing weight of component j, with the constraints $0 \le \pi_j \le 1$ and $\sum_{j=1}^{K}\pi_j = 1$; $\xi_j$ is the jth component-specific parameter vector; and ω is a structural parameter of the density function f shared by all components. Note that each column of θ corresponds to a component. We will suppose that the parameter space Θ is the full product space (the $\pi_j$'s in the simplex and the $\xi_j$'s in the cross-product space). One can associate θ in Equation (6) with the mixing distribution, denoted by $Q_{\langle\theta\rangle}$, which is the discrete distribution function with mass $\pi_j$ at $\xi_j$: $p(y \mid \theta) = \int f(y; \xi, \omega)\, dQ_{\langle\theta\rangle}(\xi)$. The mixing distribution $Q_{\langle\theta\rangle}$ is often identifiable (Teicher, 1960, 1963; Yakowitz & Spragins, 1968; Lindsay, 1995). In this article we assume that the mixing distribution $Q_{\langle\theta\rangle}$ associated with the parameter θ in Equation (5) is identifiable.

There are two key nonidentifiabilities in Equation (5): the degenerate nonidentifiability (Crawford, 1994; Lindsay, 1995) and the labelling nonidentifiability (Redner & Walker, 1984). Degenerate nonidentifiability means that for certain parameter values the actual number of components is less than K. For example, suppose that $\xi_j$ is unidimensional and there is no structural parameter in the two-component parameter space. Then the following three subsets of the parameter space,

$$\theta = \left[\begin{pmatrix}\pi_1\\ \xi_0\end{pmatrix}, \begin{pmatrix}1-\pi_1\\ \xi_0\end{pmatrix}\right] \text{ for any } \pi_1 \in [0,1], \qquad \left[\begin{pmatrix}0\\ \xi\end{pmatrix}, \begin{pmatrix}1\\ \xi_0\end{pmatrix}\right] \quad\text{and}\quad \left[\begin{pmatrix}1\\ \xi_0\end{pmatrix}, \begin{pmatrix}0\\ \xi\end{pmatrix}\right] \text{ for any } \xi,$$

where ξ ranges over the ξ parameter space, all generate a single density that has one component with parameter $\xi_0$. In this article we assume that the number of components K is fixed and known, so that the true density does not display this degeneracy. Notice that when $\xi_1 = \xi_2$ (and $0 < \pi_1 < 1$), the parameter θ is degenerate but also lies in the interior of the cross-product space, so this is not the typical “boundary of the parameter space” degeneracy.
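The three degenerate subsets can be checked numerically. The Python sketch below assumes unit-variance normal components, so that there is no structural parameter, and a hypothetical value $\xi_0 = 0.7$; each parametrization returns the one-component density $N(\xi_0, 1)$ regardless of the free value `s`.

```python
import math


def phi(y: float, mean: float) -> float:
    """N(mean, 1) density; the unit variance is an illustrative assumption."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def mix2(y: float, theta) -> float:
    """Two-component density for theta = [(pi_1, xi_1), (pi_2, xi_2)]."""
    return sum(pi * phi(y, xi) for pi, xi in theta)


xi0 = 0.7  # hypothetical single-component parameter
degenerate_sets = {
    "equal means": lambda s: [(s, xi0), (1.0 - s, xi0)],  # s = pi_1 in [0, 1]
    "pi_1 = 0": lambda s: [(0.0, s), (1.0, xi0)],          # s = any xi
    "pi_1 = 1": lambda s: [(1.0, xi0), (0.0, s)],          # s = any xi
}
```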

By labelling nonidentifiability we mean that for fixed K, the parameters θ in Equation (6) are only identifiable up to a column permutation of θ: $p(y \mid \theta) = p(y \mid \theta^\sigma)$ for $\theta \ne \theta^\sigma$, where $\theta^\sigma$ is a copy of θ with columns permuted according to any permutation σ of (1, . . . , K). For example, when K = 2 and $\xi_j$ is unidimensional,

$$\theta = \left[\begin{pmatrix}\pi_1\\ \xi_1\end{pmatrix}, \begin{pmatrix}\pi_2\\ \xi_2\end{pmatrix}\right] = \left[\begin{pmatrix}0.4\\ 1\end{pmatrix}, \begin{pmatrix}0.6\\ 2\end{pmatrix}\right] \quad\text{has the same distribution as}\quad \theta^\sigma = \left[\begin{pmatrix}0.6\\ 2\end{pmatrix}, \begin{pmatrix}0.4\\ 1\end{pmatrix}\right].$$

Due to this nonidentifiability, the labels on θ are not identifiable. In particular, there are K! true values corresponding to all possible column permutations of the true value $\theta_\tau$. Further, the modes of the mixture likelihood in Equation (1) come in a modal group of K! points with the same likelihood, because every column permutation of a mode is again a mode (so there are at least K! modes). If there is only one modal group, corresponding to all the permutations of a single mode, we will say that the problem is unimodal. If not, we will say that there are secondary modes.
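Labelling nonidentifiability can be illustrated by evaluating the mixture log-likelihood at every column permutation of a parameter. In the Python sketch below the components are normal, ω is treated as a common variance (an assumption for illustration), the data are hypothetical, and the parameter values reused from Section 4.2 serve only as a convenient point, not as the MLE for these data. The helper names are illustrative.

```python
import itertools
import math


def normal_pdf(y: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_loglik(ys, theta):
    """Log of Equation (1) for the mixture (5); theta is a list of columns
    (pi_j, xi_j, omega), with omega taken here as a common variance."""
    return sum(
        math.log(sum(pi * normal_pdf(y, xi, omega) for pi, xi, omega in theta))
        for y in ys
    )


def modal_group(theta):
    """All column permutations theta^sigma of one parameter (K! of them)."""
    return [list(perm) for perm in itertools.permutations(theta)]


ys = [-1.3, -0.2, 0.4, 1.1, 2.0]  # hypothetical data
theta_hat = [(0.59, 0.95, 0.89), (0.41, -1.00, 0.89)]
group = modal_group(theta_hat)
values = [mixture_loglik(ys, th) for th in group]  # identical up to rounding
```

If two columns coincide, some permutations repeat the same point, which is why the count of distinct modes in a modal group is K! only when the components are distinct.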

In this under-identified setting one needs to modify the standard definitions of confidence regions and coverage probabilities. We will seek labelled confidence regions that are constructed using identifiable subsets of the parameter space. Such a set contains at most one of the K! permutations of a parameter and no points having the degenerate nonidentifiability. Note that if B is an identifiable subset of parameters, then the set $B^\sigma$, the elements of B with columns permuted by σ, is also an identifiable subset and would be an equally valid candidate for the confidence set. We will say that a labelled confidence region methodology has labelled coverage probability 1 − α if 1 − α is the probability that the constructed region contains exactly one of the K! permutations of the true value (since the regions are identifiable sets, they can contain at most one version of the true value). In practice we will use the full likelihood region, which is not identifiable, and then partition it into K! identifiable subsets that are permutation images of each other, any one of which could be selected as the labelled confidence region.

The construction of labelled credible/confidence sets for mixture parameters has a long history (Jasra, Holmes, & Stephens, 2005; Yao & Lindsay, 2009). One family of solutions comes from using parameter constraints. A second family is based on clustering the output of MCMC algorithms, which in effect creates a connected identifiable subset. In this article we will let the likelihood itself do the clustering.

4. MODAL SIMULATION FOR THE MIXTURE LIKELIHOOD WITH UNIMODAL STAR-SHAPED PARTITION

In this section we review the topological theory of mixture likelihood regions. We focus on the ideal case where the likelihood region can be decomposed into K! identifiable subsets, each of which is star-shaped and has only one MLE mode (Kim & Lindsay, 2011b). We then develop a modal simulation strategy for this ideal case and illustrate its application to a simulated dataset.

4.1. Topology Theory

Although there are two types of nonidentifiability in the mixture parameter θ in Equation (6), many statistical analyses have been carried out as if the parameters were identifiable. This is possible because the point estimator itself is usually well defined by the output of an algorithm maximizing the mixture likelihood. Moreover, there is an asymptotic theory that guarantees the existence of a consistent way to choose a permutation of the MLE, and asymptotic normality of the chosen MLE, as the sample size grows large (Redner & Walker, 1984).

Kim & Lindsay (2011b) argued that the asymptotic theory can always be used to create a meaningful mixture likelihood region using Equation (2), provided that one redefines the meaning of confidence set to account for the lack of labelling identifiability and then chooses a sufficiently small confidence level. At confidence level 0%, one can construct the likelihood region using the constant $c = L(\hat\theta)$, yielding the global confidence region $C^{LR}_c = \{\theta : L(\theta) \ge L(\hat\theta)\}$, which is the set of K! modes corresponding to the permutations of $\hat\theta$. Provided that c is not too much smaller than $L(\hat\theta)$, the global confidence region will consist of K! disjoint regions, one around each mode, and each region is a connected and identifiable subset. These regions are also permutation images of each other. One can then select any one of those regions to be the labelled confidence set.

To be more precise, Kim & Lindsay (2011b) defined the modal region determined by the elevation c and $\hat\theta$, denoted by $C_c(\hat\theta)$, to be the set of all θ that are connected to $\hat\theta$ by a continuous path lying entirely in the mixture likelihood region at elevation c of Equation (2). By this definition, one can obtain the modal region around another mode of the modal group by permuting all elements in $C_c(\hat\theta)$: $C_c(\hat\theta^\sigma) = C_c^\sigma(\hat\theta)$ for any permutation σ. For elevations c just below $L(\hat\theta)$, the global likelihood region has a decomposition into K! disjoint and identifiable modal regions. Asymptotically, this construction method will have the correct labelled coverage probability under our modified definition of coverage. When this situation holds at a constant c, we will say that we have an identifiable confidence partition at that elevation.

However, if the value of c corresponding to the desired confidence level is too small, the likelihood region no longer has an ideal decomposition into K! disjoint and identifiable modal regions. A simple example occurs when $c = \psi_{\mathrm{dgnt}}$, the highest likelihood in the degenerate set: the full likelihood region then contains a degenerate point, so there is no way to partition it into K! identifiable subsets. Note that one can easily obtain $\psi_{\mathrm{dgnt}}$ by calculating the likelihood at the MLE for the (K − 1)-component mixture model. When there is no identifiable confidence partition, there is no longer a natural way to extract from the likelihood region an identifiable modal region to serve as the labelled confidence region.
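For the two-component normal mixture with equal variances used in Section 4.2, the degenerate set consists of single normal densities, so $\psi_{\mathrm{dgnt}}$ (on the log scale) is the maximized one-component normal log-likelihood, which has a closed form. The Python sketch below assumes that specific model; the function names and data are hypothetical. The final check encodes only the necessary condition described above: an identifiable partition at elevation c requires $\log c > \psi_{\mathrm{dgnt}}$, but this alone does not guarantee one.

```python
import math


def normal_loglik(ys, mean: float, var: float) -> float:
    return sum(-0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var
               for y in ys)


def psi_dgnt(ys) -> float:
    """Highest log-likelihood on the degenerate set of a two-component
    equal-variance normal mixture: the maximized one-component normal
    log-likelihood, -n/2 * (log(2 pi var_hat) + 1)."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n  # MLE uses divisor n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def may_have_identifiable_partition(log_c: float, psi: float) -> bool:
    """Necessary, not sufficient: the region must exclude the best degenerate point."""
    return log_c > psi


ys = [-1.3, -0.2, 0.4, 1.1, 2.0, 0.7]  # hypothetical data
m = sum(ys) / len(ys)
v = sum((y - m) ** 2 for y in ys) / len(ys)
```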

The existence of an identifiable partition at any particular elevation c depends on the topology of the likelihood surface. This topology can be analyzed using Morse theory (Matsumoto, 2002) under suitable smoothness assumptions on the likelihood function. The ideal case occurs as follows. Let $\psi_{\mathrm{mle}}$ be the elevation of the MLE modes and let $\psi_{\mathrm{crit}}$ be the elevation of the highest critical point that is not an MLE mode. As shown by Kim & Lindsay (2011b), if $\psi_{\mathrm{crit}} < c \le \psi_{\mathrm{mle}}$, then the confidence region has an identifiable partition for which each element of the partition contains a single MLE mode but no other critical points. In this case we will say we have a unimodal partition.

4.2. Modal Simulation Strategy and a Simulated Example

We now apply the modal simulation method of Section 2.3 in the context of a unimodal partition. Recall from that section that we assumed that one was simulating from a likelihood with a single mode. When a unimodal partition exists at the chosen elevation c, one can apply this method by choosing any one member of the MLE group, say $\hat\theta$, to simulate from, knowing that simulating from the other MLE modes would only permute the output. Recall that the simulated values correspond to solutions of $L(\theta) = c$ found along randomly selected rays from the mode.

Since the modal region $C_c(\hat\theta)$ is an identifiable subset when a unimodal partition exists, the simulated points on its boundary come automatically labelled by the choice of the particular MLE mode $\hat\theta$. This feature provides crucial information for addressing the labelling problem in a finite mixture model, as points simulated from the full-dimensional likelihood have the same labels as $\hat\theta$. One can also use these labels when constructing the simulated profile region. It is important to note that a conventional numerical profile approach would not provide labels.

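There are some technical difficulties with the implementation of this strategy that we postpone until the next section. We first illustrate how the method works in a two-component normal mixture model with equal variances,

$$\theta = \left[\begin{pmatrix}\pi_1\\ \xi_1\\ \omega\end{pmatrix}, \begin{pmatrix}\pi_2\\ \xi_2\\ \omega\end{pmatrix}\right].$$

We simulated n = 500 observations from

$$\theta = \left[\begin{pmatrix}0.4\\ -1\\ 1\end{pmatrix}, \begin{pmatrix}0.6\\ 1\\ 1\end{pmatrix}\right]$$

and obtained the MLE for θ using the expectation-maximization (EM) algorithm (Dempster, Laird & Rubin, 1977). The ML estimate was

$$\hat\theta = \left[\begin{pmatrix}0.59\\ 0.95\\ 0.89\end{pmatrix}, \begin{pmatrix}0.41\\ -1.00\\ 0.89\end{pmatrix}\right].$$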
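An EM fit of this kind can be sketched as follows for a two-component normal mixture with a common variance. This is a minimal illustration, not the authors' code: the starting values, convergence tolerance, random seed, and function names are assumptions, ω is treated as the common variance, and the simulated data will not reproduce the estimate $\hat\theta$ reported above.

```python
import math
import random


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def loglik(ys, pi1, mu1, mu2, var):
    return sum(math.log(pi1 * normal_pdf(y, mu1, var)
                        + (1.0 - pi1) * normal_pdf(y, mu2, var)) for y in ys)


def em_equal_variance(ys, pi1, mu1, mu2, var, max_iter=2000, tol=1e-9):
    """EM for a two-component normal mixture with a common variance."""
    n = len(ys)
    ll = loglik(ys, pi1, mu1, mu2, var)
    for _ in range(max_iter):
        # E-step: posterior probability that each y came from component 1.
        w = []
        for y in ys:
            a = pi1 * normal_pdf(y, mu1, var)
            b = (1.0 - pi1) * normal_pdf(y, mu2, var)
            w.append(a / (a + b))
        # M-step: weighted proportion, weighted means, pooled variance.
        s1 = sum(w)
        pi1 = s1 / n
        mu1 = sum(wi * y for wi, y in zip(w, ys)) / s1
        mu2 = sum((1.0 - wi) * y for wi, y in zip(w, ys)) / (n - s1)
        var = sum(wi * (y - mu1) ** 2 + (1.0 - wi) * (y - mu2) ** 2
                  for wi, y in zip(w, ys)) / n
        new_ll = loglik(ys, pi1, mu1, mu2, var)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return [(pi1, mu1, var), (1.0 - pi1, mu2, var)], ll


rng = random.Random(2011)
ys = [rng.gauss(-1.0, 1.0) if rng.random() < 0.4 else rng.gauss(1.0, 1.0)
      for _ in range(500)]
start_ll = loglik(ys, 0.5, -0.5, 0.5, 1.0)
theta_hat, max_ll = em_equal_variance(ys, 0.5, -0.5, 0.5, 1.0)
```

Because of labelling nonidentifiability, the order of the returned columns depends on the starting values; swapping the starting values of the two components would return the permuted estimate $\hat\theta^\sigma$.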


Although the full parameter space is four-dimensional, one can view features of the full likelihood structure through two-dimensional profiles. Here we use a $(\pi_1, \xi_1)$ plot because it clearly shows how the two mixture components differ from each other. In this example the number of components is two. A useful fact when examining such plots is that if one can see two separated regions in the profile at an elevation suitable for a unimodal partition, then the full four-dimensional likelihood region also separates into two identifiable sets at the same elevation. (When the supremum in Equation (3) is attained, the profile region is the image of the full region under the continuous map $\theta \mapsto (\pi_1, \xi_1)$, and a continuous image of a connected set is connected.)

Figure 1a,b shows the numerical profile contour and the simulation-based profile plot for $(\pi_1, \xi_1)$ at the elevation c with $\mathrm{Conf}_2(c) = 0.95$ (corresponding to $\mathrm{Conf}_4(c) = 0.80$). The numerical profile likelihood contour was obtained by maximizing L(θ) over $(\xi_2, \omega)$ for each fixed $(\pi_1, \xi_1)$. In such a plot “label switching” will occur, so that we see the image of the MLE mode as well as that of its permuted version. In Figure 1a we can see that the two modal regions corresponding to the MLE and its permutation both appear in the profile plot: for the given MLE $\hat\theta$, one mode had $(\pi_1, \xi_1) = (0.59, 0.95)$ and $\hat\theta^\sigma$ had $(\pi_1, \xi_1) = (0.41, -1.00)$.

For Figure 1b we used 3,000 rays. We also checked for violations of the star-shaped assumption using methods we describe in the next section, and found no evidence of any. Since we simulated samples on the targeted likelihood contour from just one MLE mode, whose first component is $(\pi_1, \xi_1) = (0.59, 0.95)$ in $\hat\theta$, there is just one connected 95% profile confidence region; it is nearly elliptical and has unique labels. That is, the simulation method identifies the first component in $C_c(\hat\theta)$ uniquely, and it corresponds to the upper mode in the profile plot of Figure 1a. Note that the same labelling system in $C_c(\hat\theta)$ can be used to visualize the second modal region in Figure 1a: one simply plots the simulated parameter values from the second component instead of the first. We can measure the success of the simulation by noting that the simulated samples accurately described the boundary of the elliptically shaped likelihood contour.

5. MODAL SIMULATION FOR THE MIXTURE LIKELIHOOD WITH NONSTAR-SHAPED PARTITIONS

In this section we design a modal simulation strategy to be used when a unimodal partition exists

at the targeted elevation, but one wants to check whether the region is star-shaped. Our method

involves proceeding further out each ray, and checking for solutions to L(θ) = c beyond the first

one on the ray. We know that the first solution on the ray is path-connected to the chosen mode,

the path being the ray itself. The problem we now face is that the new solutions may no longer be

path connected to the original mode, but rather be path connected to one of the permuted modes.

Therefore, we need tools that will identify the path connections between the contour solutions

and the modes.
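The sketch below finds every solution of ℓ(θ) = c along a single ray. It assumes only that ℓ can be evaluated at any point. A grid search along the ray brackets each sign change of ℓ − c, and bisection then refines it. The first root is the boundary point of Sc(θ̂) used by the basic method, and any later roots are the simulation outliers that must then be classified. The names `ray_crossings` and `toy_loglik`, the grid resolution, and the two-bump test surface are illustrative assumptions, not part of the original method.

```python
import math


def ray_crossings(loglik, mode, direction, c, t_max=10.0, steps=2000, tol=1e-10):
    """Return every t > 0 at which loglik(mode + t * unit_direction) crosses c.

    The ray is scanned on a grid of `steps` points in (0, t_max]; each sign
    change of loglik - c is refined by bisection. Crossings closer together
    than t_max / steps can be missed.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]

    def h(t):
        return loglik([m + t * u for m, u in zip(mode, unit)]) - c

    roots = []
    t_prev, h_prev = 0.0, h(0.0)
    for k in range(1, steps + 1):
        t = t_max * k / steps
        h_t = h(t)
        if (h_prev > 0) != (h_t > 0):
            lo, hi = t_prev, t
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if (h(mid) > 0) == (h_prev > 0):
                    lo = mid   # same sign as the left end: root is to the right
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
        t_prev, h_prev = t, h_t
    return roots


def toy_loglik(p):
    # hypothetical two-bump surface with peaks near (-2, 0) and (2, 0)
    a = math.exp(-((p[0] + 2) ** 2 + p[1] ** 2) / 2)
    b = math.exp(-((p[0] - 2) ** 2 + p[1] ** 2) / 2)
    return math.log(a + b)


roots = ray_crossings(toy_loglik, mode=[-2.0, 0.0], direction=[1.0, 0.0], c=-1.0)
# roots is approximately [1.50, 2.50, 5.41]: the first bounds S_c, and the
# other two lie on the boundary of the second bump (simulation outliers)
```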

We start with some definitions. Let Sc(θ̂) be the set of θ that lie in the star-shaped region of elevation c generated by the mode θ̂. That is, given a ray from θ̂, the points along the ray belong to Sc(θ̂) up to and including the first solution to L(θ) = c. Sc(θ̂) is contained in the modal region of the same elevation for θ̂, which we have denoted by Cc(θ̂), because each of its elements is joined to θ̂ by a path (namely the ray itself) lying entirely in Cc(θ̂). In particular, the first solution on a ray starting from θ̂ belongs to Sc(θ̂) and hence to the corresponding modal region Cc(θ̂). Solutions further out on the same ray lie outside Sc(θ̂); these are the simulation outliers.

The interpretation of these outliers is rather complex, but they do provide useful information about the structure of the confidence region. If θ is a simulation outlier, there are two possible explanations for its existence. First, Cc(θ̂) could fail to be star-shaped, with θ a boundary point of Cc(θ̂) that lies outside Sc(θ̂). We call this a star-shaped outlier. Second, θ could lie outside Cc(θ̂) altogether, because the ray has entered another element of the partition, say Cc(θ̂σ). We call this a wrong-mode outlier. The existence of star-shaped outliers provides strong evidence that the likelihood regions are shaped rather differently from Wald regions. The wrong-mode outliers, however, are quite benign for the purpose of describing the targeted modal region Cc(θ̂).

[Figure 1, panels (a) and (b): horizontal axis π1, vertical axis ξ1; legend: 95% set, MLE, True.]

Figure 1: (a) A numerical profile likelihood contour and (b) a modal simulation-based profile plot for (π1, ξ1) using n = 500 and Conf2(c) = 0.95.

We can use the EM algorithm to determine which type of simulation outlier we have, because the EM algorithm started from any θ climbs the likelihood monotonically until it reaches a critical point. We start the EM algorithm at the simulation outlier. If the algorithm climbs back to θ̂, the outlier is a star-shaped one; if it climbs to a permutation of θ̂, it is a wrong-mode outlier. The existence of even one star-shaped outlier proves that Cc(θ̂) is not star-shaped. The existence of wrong-mode outliers, however, provides no evidence on this point by itself, so we simply discard the wrong-mode outliers from the analysis.
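The EM-based classification can be sketched as follows for the equal-variance two-component normal mixture used in this section. This is a minimal illustration. It assumes θ = (π1, ξ1, ξ2, ω) with ω the common variance, and that EM converges to a local maximum. The helper names, tolerances, and toy data are hypothetical. With K = 2 there are only two label orderings, so the EM limit is compared with θ̂ and with its label swap θ̂σ.

```python
import math


def em_two_normal(data, theta, iters=5000, tol=1e-12):
    """EM for pi1*N(xi1, v) + (1 - pi1)*N(xi2, v), theta = (pi1, xi1, xi2, v)."""
    pi1, xi1, xi2, v = theta
    n = len(data)
    for _ in range(iters):
        # E-step; the common 1/sqrt(2*pi*v) factor cancels in the ratio
        w1 = []
        for x in data:
            a = pi1 * math.exp(-(x - xi1) ** 2 / (2 * v))
            b = (1 - pi1) * math.exp(-(x - xi2) ** 2 / (2 * v))
            w1.append(a / (a + b))
        s1 = sum(w1)
        new_xi1 = sum(w * x for w, x in zip(w1, data)) / s1
        new_xi2 = sum((1 - w) * x for w, x in zip(w1, data)) / (n - s1)
        new_v = sum(w * (x - new_xi1) ** 2 + (1 - w) * (x - new_xi2) ** 2
                    for w, x in zip(w1, data)) / n
        new = (s1 / n, new_xi1, new_xi2, new_v)
        change = max(abs(p - q) for p, q in zip(new, (pi1, xi1, xi2, v)))
        pi1, xi1, xi2, v = new
        if change < tol:
            break
    return (pi1, xi1, xi2, v)


def swap_labels(theta):
    """The label-permuted parameter theta^sigma for K = 2."""
    pi1, xi1, xi2, v = theta
    return (1 - pi1, xi2, xi1, v)


def max_abs_diff(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def classify_outlier(data, outlier, mode, tol=1e-4):
    """Run EM from a simulation outlier and compare its limit with the mode."""
    limit = em_two_normal(data, outlier)
    if max_abs_diff(limit, mode) < tol:
        return "star-shaped"
    if max_abs_diff(limit, swap_labels(mode)) < tol:
        return "wrong-mode"
    return "other critical point"
```

In practice the outlier θ passed to `classify_outlier` would be a later root found on a ray. Any starting point can play that role for testing.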

To assess the performance of the modal simulation strategy described above, we construct the likelihood regions for the parameters of a two-component normal mixture model with equal variances, $\theta = [(\pi_1, \xi_1, \omega), (\pi_2, \xi_2, \omega)]$. We simulated 100 observations from $\theta = [(0.4, -1, 1), (0.6, 1, 1)]$ and obtained the MLE of θ using the EM algorithm. The ML estimate was $\hat\theta = [(0.62, 1.09, 0.83), (0.38, -0.88, 0.83)]$.

Figure 2a,b shows the numerical profile contour and the simulation-based profile plot for (π1, ξ1) at the elevation with Conf2(c) = 0.7415 (corresponding to Conf4(c) = 0.3918). We observe that the profile likelihood region has two identifiable subsets, one for each MLE mode. Notice that these two modal regions are still star-shaped, even though they bear less resemblance to two ellipses. For the modal simulation we used 3,000 rays and simulated the samples forming the targeted likelihood region from just one MLE mode, with the first component of θ̂ set to $(\pi_1, \xi_1) = (0.62, 1.09)$. The sampled values for (π1, ξ1) successfully captured the numerical boundary of the modal region for (0.62, 1.09), which corresponds to the upper mode in the profile plot of Figure 2a. There were no star-shaped outliers. There were two wrong-mode outliers (represented by xs in Figure 2b), which belong to the modal region for the second component, (0.38, −0.88).

[Figure 2, panels (a) and (b): horizontal axis π1, vertical axis ξ1; legend: 74.148% set, MLE, True.]

Figure 2: (a) A numerical profile likelihood contour and (b) a modal simulation-based profile plot for (π1, ξ1) using n = 100 and Conf2(c) = 0.7415.

In our experience it is difficult to find cases where there exists a unimodal partition and

the labelled confidence regions fail to be star-shaped. Thus, in some sense the checking of this

assumption provides extra computational labour without much compensation. However, we will

tackle multimodal problems later, and see that the outlier problem is more serious in them.

6. SOME FURTHER THEORETICAL RESULTS FOR UNIMODAL PARTITIONS

In Section 4.1 we introduced a sufficient condition, given by Kim & Lindsay (2011b), for the existence of a unimodal partition: the elevation c should lie between ψcrit and ψmle and be higher than ψdgnt. If the elevation c of interest is between ψdgnt and ψcrit, the question of whether a labelled confidence region exists becomes complicated. Unfortunately, ψcrit is not easy to determine. The EM algorithm makes it relatively straightforward to find the modes of a likelihood, but it generally does not find the saddlepoints, one of which could yield ψcrit.

Kim & Lindsay (2011b) proved that if the component parameter ξj is univariate, there is a simpler sufficient condition for the existence of a unimodal partition. Suppose there exists a secondary mode between ψmle and ψdgnt, and denote its elevation by ψ2nd. Then a sufficient condition for the elevation c to generate a unimodal partition is ψ2nd < c ≤ ψmle. Hence the method can be carried out using the results of systematic algorithmic searches for the modes of the likelihood, a process that is usually necessary anyway to verify that one has the MLE (Lindsay, 1995; McLachlan & Peel, 2000). If there are no modes between ψdgnt and ψmle, we can define ψ2nd to be ψdgnt. We then have a very simple necessary and sufficient condition for a unimodal partition, namely

ψdgnt < c ≤ ψmle.
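The elevation rule above is simple enough to express as a function. This sketch assumes the univariate-ξj setting. When several secondary modes are found, it uses the highest of their elevations; that choice is our reading of the rule, not something stated explicitly in the text. The function name is hypothetical.

```python
def unimodal_partition_guaranteed(c, psi_mle, psi_dgnt, secondary_elevations=()):
    """Sufficient condition for a unimodal partition when each xi_j is univariate.

    psi_2nd is taken as the highest secondary-mode elevation lying between
    psi_dgnt and psi_mle (our assumption when several modes are found);
    with no such mode, psi_dgnt plays the role of psi_2nd.
    """
    between = [e for e in secondary_elevations if psi_dgnt < e < psi_mle]
    psi_2nd = max(between) if between else psi_dgnt
    return psi_2nd < c <= psi_mle


# Elevations reported for the simulated example of Section 7.1
print(unimodal_partition_guaranteed(-212.0, -211.7, -298.11, [-213.54]))  # True
print(unimodal_partition_guaranteed(-214.0, -211.7, -298.11, [-213.54]))  # False
```

A `False` result means only that the condition does not guarantee a unimodal partition at that elevation. It does not prove that no such partition exists.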

Referring back to the two simulated examples in Sections 4 and 5, the mixture model we used had univariate component-specific parameters, so this rule applies. When we used the EM algorithm to find the MLE in both examples, we employed multiple starting values, and in neither example did we find any secondary modes (so ψ2nd = ψdgnt). The confidence level corresponding to ψdgnt in the four-dimensional space was Conf4(ψdgnt) = 0.986 at n = 500 and Conf4(ψdgnt) = 0.505 at n = 100. In both examples, therefore, Conf4(ψdgnt) is the exact upper bound on the confidence levels for which a unimodal partition exists. This agrees with what we observed in both examples: two disjoint and identifiable subsets, one for each MLE mode of the likelihood.

When the component-specific parameter is multivariate, however, Kim & Lindsay (2011b) showed by example that a unimodal partition can fail to exist at an elevation c = ψcrit between ψdgnt and ψmle, even when ψcrit is not equal to ψ2nd. In other words, there can exist a nondegenerate saddlepoint at which the two modal regions around two MLE modes connect, with the elevation of that saddlepoint higher than ψdgnt. This implies that in the multivariate case ψdgnt is only an inexact lower bound for a unimodal partition, even when there are no secondary modes. Kim & Lindsay (2011b) developed an algorithmic method to check for the existence of such connecting saddlepoints above ψdgnt when ξj is multivariate. One can use these results to find a safe lower bound for use with the simulation method of this article.

7. MODAL SIMULATION FOR THE MIXTURE LIKELIHOOD WITH MULTIPLE MODAL GROUPS

In this section we extend the modal simulation method of the preceding sections to a complicated

case where there is more than one critical point in each modal region and the second critical

point corresponds to a secondary mode. Note that we here consider a finite mixture model with

univariate component-specific parameters.

7.1. Topology of the Likelihood with Multimodal Groups

When there is more than one modal group in the likelihood for the K-component mixture model, the number of distinct modal regions is not necessarily K!; it depends on the elevation of interest. Here we describe the complex topology of a mixture likelihood with a secondary modal group, using a simulated example in which there is a significant secondary mode.

We generated 75 observations from the three-component normal mixture 0.33 N(−3, 1) + 0.34 N(0, 1) + 0.33 N(3, 1), and then fitted a three-component normal mixture with a fixed variance of 0.25. Writing each component as (πj, ξj), the two modes were

$\theta_{\mathrm{mle}} = [(0.257, -3.166), (0.350, -0.782), (0.392, 2.430)]$ with $\ell(\theta_{\mathrm{mle}}) = -211.7$, and

$\theta_{\mathrm{2nd}} = [(0.367, -2.711), (0.351, 0.1058), (0.282, 2.917)]$ with $\ell(\theta_{\mathrm{2nd}}) = -213.54$.

The maximum log likelihood in the degenerate class was −298.11. Thus the elevation of the secondary mode, ψ2nd, is lower than that of the MLE mode, ψmle, but higher than ψdgnt, the elevation of the MLE for a two-component mixture model. Note that Conf5(ψ2nd) = 0.40 and Conf5(ψdgnt) = 1. Each mode has 3! = 6 possible label permutations.
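The paired confidence levels in this section can be reproduced by reading Conf_p(c) as the χ²_p probability of 2{ℓ(θmle) − c}. This is the calibration we assume here from Kim & Lindsay (2011a). The sketch below evaluates the χ² distribution function using the series for the regularized lower incomplete gamma function, with only the Python standard library. The function names are hypothetical.

```python
import math


def chi2_cdf(q, df):
    """P(chi^2_df <= q) from the series for the regularized lower incomplete gamma."""
    if q <= 0:
        return 0.0
    s, x = df / 2.0, q / 2.0
    term = 1.0 / s  # k = 0 term of sum_k x^k / (s (s+1) ... (s+k))
    total = term
    k = 0
    while term > total * 1e-16 and k < 10000:
        k += 1
        term *= x / (s + k)
        total += term
    log_prefix = s * math.log(x) - x - math.lgamma(s)
    return min(1.0, math.exp(log_prefix) * total)


def conf(p, loglik_max, c):
    """Assumed calibration: Conf_p(c) = P(chi^2_p <= 2 * (loglik_max - c))."""
    return chi2_cdf(2.0 * (loglik_max - c), p)


# Secondary-mode elevation from Section 7.1
print(round(conf(5, -211.7, -213.54), 2), round(conf(2, -211.7, -213.54), 2))  # 0.4 0.84
```

The same function gives about 0.60 and 0.92 at the saddlepoint elevation −214.26 reported below. It also returns essentially 1 at ψdgnt = −298.11.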

The structure of the confidence sets is now complicated by the distinct modal regions of the two modes. Figure 3 shows the simulation-based profile plot for (π1, ξ1) at elevations c with Conf2(c) = (0.84, 0.90, 0.95), corresponding to Conf5(c) = (0.40, 0.53, 0.69). The three square points correspond to the MLE mode. Only three of its six permutations are visible because we are using a (π1, ξ1) profile plot, in which permutations sharing the same first component coincide. For example, the two modes with (0.257, −3.166) labelled as the first component appear as one. Similarly, the three circle points correspond to the secondary mode. These plots were created using a new modal simulation strategy, which we explain in Section 7.2.
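To see why only three of the six MLE permutations appear in a (π1, ξ1) plot, one can enumerate the 3! relabellings of θmle and keep only the first component of each. A short sketch follows; the variable and function names are our own.

```python
from itertools import permutations

# MLE mode reported above, one (pi_j, xi_j) pair per component
theta_mle = [(0.257, -3.166), (0.350, -0.782), (0.392, 2.430)]


def label_permutations(theta):
    """All K! relabellings of a mixture parameter given as (pi_j, xi_j) pairs."""
    return [tuple(theta[j] for j in order) for order in permutations(range(len(theta)))]


perms = label_permutations(theta_mle)
# A (pi_1, xi_1) profile plot keeps only the first component of each relabelling
projected = {p[0] for p in perms}
print(len(perms), len(projected))  # 6 3
```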


Figure 3: Modal simulation-based profile plots for (π1, ξ1). (a) Conf2(c) = 0.84; (b) Conf2(c) = 0.90; (c) Conf2(c) = 0.95.

From Section 6, when the component-specific parameter is univariate and there exists a secondary mode above ψdgnt, we know that a unimodal partition of the likelihood region exists for any elevation c between ψ2nd and ψmle. That is, if one's choice of confidence level results in an elevation c between ψ2nd and ψmle, the mixture likelihood has one group of K! MLE modal regions, one around each MLE mode, and each modal region is a connected and identifiable subset. Figure 3a shows the profile plot for (π1, ξ1) at the elevation for Conf2(c) = 0.84 (i.e., Conf5(c) = 0.40), and we can see that a unimodal partition for θmle exists.

However, as the elevation c is decreased to just below ψ2nd, a new group of K! secondary modal regions forms, one around each secondary mode, so that there are 2K! distinct modal regions at this elevation. In this example Conf5(ψ2nd) = 0.40 and Conf2(ψ2nd) = 0.84. The secondary mode therefore comes into play in constructing the bivariate profile regions whenever one uses the conventional confidence levels Conf2(c) = 0.90 or 0.95. As shown in Figure 3b, we then have two modal groups, one for θmle and the other for θ2nd, and they are disconnected. To construct a labelled region, we would need to match the ellipses in pairs corresponding to the labels for components 1, 2, and 3.

As the elevation goes down further, each of the 2K! (= 12) distinct modal regions grows until individual modal regions connect to each other. The first points at which these connections occur are necessarily saddlepoints of the likelihood. The connections must obey the permutation rules: if A connects to B, then Aσ connects to Bσ. Suppose that at the given elevation there are K! connecting paths that join primary modes with secondary modes in pairs. Let one such saddlepoint be θsaddle, with elevation ψsaddle. If this merging of the two modal groups results in one group of K! identifiable subsets at c = ψsaddle above ψdgnt, we can still construct a labelled confidence set for the parameters by selecting one of the K! identifiable paired subsets. In this case each of the identifiable subsets contains two modes, θmle and θ2nd.

In our example the estimated saddlepoint is $\theta_{\mathrm{saddle}} = [(0.328, -2.858), (0.348, -0.220), (0.324, 2.714)]$ with $\ell(\theta_{\mathrm{saddle}}) = -214.26$. The corresponding confidence levels for its elevation are Conf5(ψsaddle) = 0.598 and Conf2(ψsaddle) = 0.923 (the estimated saddlepoint is represented by a triangle). Thus the two modal groups are connected in the two-dimensional profile plots with Conf2(c) = 0.95 (see Figure 3c).

7.2. Modal Simulation Strategy

If one wishes to display the topological structure of a mixture likelihood with more than one modal group of K! elements, we suggest first finding the secondary modes of the full likelihood surface. One should then locate the saddlepoints that connect the primary and secondary modes, to determine whether an identifiable partition can be constructed. We have developed an approach to finding saddlepoints based on the following characterization of a saddlepoint: (1) the likelihood gradient at the saddlepoint is zero, and (2) the negative Hessian matrix has one or more negative eigenvalues. We used the steepest descent algorithm to search for saddlepoints (see Section 7.3 for more detail).
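The two-part characterization of a saddlepoint can be checked numerically, as in the sketch below. It assumes a smooth log-likelihood that can be evaluated as a Python function. The gradient and Hessian are approximated by central finite differences. The number of negative eigenvalues of the negative Hessian is counted from the signs of the pivots in a symmetric (LDLᵀ) elimination, by Sylvester's law of inertia. This avoids an eigenvalue routine but assumes that no zero pivot arises. The toy surface and all names are hypothetical.

```python
def gradient(f, x, h=1e-5):
    """Central-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g


def hessian(f, x, h=1e-4):
    """Central-difference Hessian of f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H


def count_negative_eigenvalues(A):
    """Inertia of a symmetric matrix from the pivots of LDL^T (no pivoting)."""
    n = len(A)
    M = [row[:] for row in A]
    negatives = 0
    for k in range(n):
        pivot = M[k][k]
        if abs(pivot) < 1e-12:
            raise ValueError("zero pivot: inertia cannot be read off without pivoting")
        if pivot < 0:
            negatives += 1
        for i in range(k + 1, n):
            factor = M[i][k] / pivot
            for j in range(k + 1, n):
                M[i][j] -= factor * M[k][j]  # Schur complement update
    return negatives


def looks_like_saddle(loglik, x, grad_tol=1e-4):
    """(1) gradient numerically zero and (2) negative Hessian has a negative eigenvalue."""
    if max(abs(g) for g in gradient(loglik, x)) > grad_tol:
        return False
    neg_hessian = [[-v for v in row] for row in hessian(loglik, x)]
    return count_negative_eigenvalues(neg_hessian) >= 1


def toy_loglik(p):
    # hypothetical surface: maxima at (+-1, 0) joined through a saddlepoint at (0, 0)
    return -(p[0] ** 2 - 1) ** 2 - p[1] ** 2


print(looks_like_saddle(toy_loglik, [0.0, 0.0]))  # True
print(looks_like_saddle(toy_loglik, [1.0, 0.0]))  # False (a maximum)
```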

To describe a mixture likelihood that has modal groups other than the MLE group, we propose simulating from one representative mode of each modal group. Given two modes θmle and θ2nd with ψmle > ψ2nd, for example, one can apply the modal simulation method described in Section 2.3 to each of them and obtain samples forming the targeted modal regions for each mode.

When simulated values from the modal simulation are outliers (that is, they lie outside the star-shaped region for a chosen mode) and the likelihood is multimodal, the saddlepoint elevation plays an important role in classifying the outlier. To simplify the discussion, suppose there exists a group of saddlepoints, with elevation ψsaddle above ψdgnt, that connects the two modal regions. Suppose further that one uses the modal simulation method for θmle and generates a simulation outlier θ having elevation c.

To classify this outlier, we need to consider three cases. In the first case, when the elevation c is above ψ2nd, the outlier is classified as in Section 5, because there is only one modal group, that of θmle. For elevations below ψ2nd, however, we need to consider the connection between the two modal regions. In the second case, suppose the elevation of interest c lies between ψsaddle and ψ2nd. Then the modal regions for θmle are separated from those for θ2nd; that is, the modal regions Cc(θmle) are disjoint from Cc(θ2nd), and the full likelihood region is the union of these two modal groups. We can nevertheless use the EM algorithm to assign any simulation outlier. If EM started from θ converges to θmle, then θ is a star-shaped outlier in describing the modal region Cc(θmle) for θmle. Otherwise, θ is a wrong-mode outlier.

In the third case, suppose the elevation c satisfies ψdgnt < c < ψsaddle. For such elevations the modal regions for θmle are pairwise connected with those for θ2nd. There are then again K! identifiable subsets, but with two modes within each, which we might write as Cc(θmle, θ2nd). The region Cc(θmle, θ2nd) is unlikely to be star-shaped, although it could equal the union of the two star-shaped regions generated by each mode. Since such a region is clearly awkwardly shaped (see Figure 3c in Section 7.1), we recommend not worrying about defining star-shaped outliers. More importantly, we can still determine whether simulation outliers are wrong-mode outliers, as follows. Start the EM algorithm at θ; if it does not converge to θmle or θ2nd, call θ a wrong-mode outlier and discard it.
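Once the EM limit of an outlier is known, the three cases reduce to a small decision rule. In this sketch the caller supplies that limit as a label: `"mle"` for θmle itself, `"second"` for θ2nd itself, and anything else (such as a permutation of either mode) otherwise. The labels, the function name, and the treatment of the boundary c = ψsaddle are our own assumptions.

```python
def classify_simulation_outlier(c, em_limit, psi_saddle, psi_dgnt):
    """Classify an outlier from the modal simulation started at theta_mle."""
    if c <= psi_dgnt:
        raise ValueError("c must lie above psi_dgnt")
    if c > psi_saddle:
        # Cases 1 and 2: theta_mle's modal region is separate from theta_2nd's,
        # so only a return to theta_mle keeps the point in C_c(theta_mle)
        return "star-shaped outlier" if em_limit == "mle" else "wrong-mode outlier"
    # Case 3: C_c(theta_mle, theta_2nd) is not expected to be star-shaped,
    # so points whose EM limit is either mode are simply kept
    return "keep" if em_limit in ("mle", "second") else "wrong-mode outlier"


# Elevations reported in Section 7.1: psi_saddle = -214.26, psi_dgnt = -298.11
print(classify_simulation_outlier(-215.0, "second", -214.26, -298.11))  # keep
```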

To apply the modal simulation strategy described above to the simulated example introduced in Section 7.1, we used six elevations, corresponding to Conf5(c) = (0.21, 0.40, 0.53, 0.69, 0.75, 0.90), with 5,000 rays at each elevation. The corresponding values of Conf2(c) were 0.70, 0.84, 0.90, 0.95, 0.964, and 0.99, respectively. There were no outliers in the modal simulation for θmle at any of the six chosen elevations. In the modal simulation for θ2nd at the lower four elevations, however, there were four wrong-mode outliers at Conf5(c) = 0.53 and eight star-shaped outliers at Conf5(c) = 0.69 (i.e., Conf2(c) = 0.95). That is, the secondary regions were not star-shaped at the latter confidence level.

7.3. Search for Saddlepoints: More Details

Section 7.2 showed that when there is more than one mode, say a primary and a secondary mode, and one wishes to form an identifiable partition, the search for the saddlepoints connecting them is important. Here we describe our strategy for searching for saddlepoints.


First, given an initial value, one finds a local minimum of the norm of the likelihood gradient using the steepest descent algorithm. If the minimized norm is numerically zero (less than a prespecified value), one then checks whether the minimizing point has the right Hessian eigenvalue structure to be a saddlepoint rather than a maximum. This search might find a new mode in addition to the existing modes and saddlepoints. A general search could also find saddlepoints that do not connect the two modal regions. When using the gradient algorithm, it is important to obtain high-quality starting values in searching for the critical points (including the saddlepoints) of the likelihood. To do so, we use information from our simulation. We first find the simulated point with the smallest likelihood gradient norm at a given elevation, and then use it as a starting value for the saddlepoint algorithm described above. There is a wider literature on the construction of saddlepoint algorithms; see, for example, Pang (2010).
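The search just described can be sketched as follows. The sketch selects, as the start, the simulated point with the smallest gradient norm. It then runs steepest descent on the squared gradient norm ‖∇ℓ‖², with step halving to guarantee a decrease. Gradients are approximated by central finite differences. The step sizes, tolerances, toy surface, and stand-in "simulated" points are hypothetical. The limit would still need to be checked for the saddlepoint eigenvalue structure described above.

```python
def numeric_gradient(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g


def grad_norm_sq(loglik, x):
    """Squared Euclidean norm of the log-likelihood gradient at x."""
    return sum(g * g for g in numeric_gradient(loglik, x))


def minimize_grad_norm(loglik, start, step=0.01, iters=10000, tol=1e-14):
    """Steepest descent on ||grad loglik||^2 with step halving (backtracking)."""
    x = list(start)
    phi = grad_norm_sq(loglik, x)
    for _ in range(iters):
        if phi < tol:
            break
        d = numeric_gradient(lambda y: grad_norm_sq(loglik, y), x, h=1e-5)
        t = step
        trial = [xi - t * di for xi, di in zip(x, d)]
        phi_trial = grad_norm_sq(loglik, trial)
        while phi_trial >= phi and t > 1e-12:
            t *= 0.5
            trial = [xi - t * di for xi, di in zip(x, d)]
            phi_trial = grad_norm_sq(loglik, trial)
        if phi_trial >= phi:
            break  # no further decrease at this numerical resolution
        x, phi = trial, phi_trial
    return x, phi


def toy_loglik(p):
    # hypothetical surface: maxima at (+-1, 0) joined through a saddlepoint at (0, 0)
    return -(p[0] ** 2 - 1) ** 2 - p[1] ** 2


# hypothetical stand-ins for simulated boundary points at one elevation
simulated = [(0.9, 0.5), (0.2, 0.3), (-1.5, 1.0)]
start = min(simulated, key=lambda p: grad_norm_sq(toy_loglik, p))
x, phi = minimize_grad_norm(toy_loglik, start)  # x ends near the saddlepoint (0, 0)
```

As in the example reported below, a different start (here (0.9, 0.5)) could lead the descent to a mode instead. This is why several starting values are useful.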

We applied the saddlepoint algorithm described above to the simulated example of Section 7.1. We started the algorithm from ten starting values: one from each of the six elevations for θmle and one from each of the four elevations for θ2nd. The starting values were the simulated points θ with the smallest likelihood gradient norm at each elevation. The numbers of starting values leading the algorithm to θmle, θ2nd, and θsaddle were two, one, and seven, respectively. In other words, this approach can lead the saddlepoint algorithm to the desired saddlepoint connecting the two modal regions, although it sometimes leads back to the original modes. Regarding our convergence criteria, the norm of the gradient at θsaddle was 5.84 × 10^−5, and the negative Hessian matrix had one negative eigenvalue and four positive eigenvalues.

8. DATA ANALYSIS

In this section we present two real examples to illustrate one important feature of the modal simu-

lation described in Section 2.3: Suppose one wishes to display two-dimensional profile confidence

regions for a specific confidence level, such as 0.95. One can then determine the elevation c corre-

sponding to Conf2(c) = 0.95, and run the simulation algorithm one time. One can then visualize

every profile confidence set using that output. In the two examples we will construct different

types of confidence regions: for the first example, we will give a two-dimensional profile plot for

mixing proportion and component parameter of each component, and for the second example, the

profile will be for pairwise component parameters.
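The single-run workflow can be sketched as follows. First convert the target level into an elevation once; then project the simulated boundary points onto whichever coordinate pairs are of interest. The projection step works because the profile region at elevation c is the projection of the full region {θ : ℓ(θ) ≥ c}. The sketch assumes that Conf2(c) is the χ²₂ probability of 2{ℓ(θ̂) − c}, whose quantile has the closed form −2 log(1 − level). The function names and the sample format (one tuple of coordinates per point) are hypothetical.

```python
import math
from itertools import combinations


def elevation_for_conf2(loglik_max, level):
    """Elevation c with Conf_2(c) = level.

    Assumes Conf_2(c) = P(chi^2_2 <= 2 * (loglik_max - c)); the chi^2_2
    quantile is -2 * log(1 - level), so c = loglik_max + log(1 - level).
    """
    return loglik_max + math.log(1.0 - level)


def pairwise_profiles(samples, names):
    """Project boundary samples of {theta: loglik(theta) >= c} onto every pair
    of coordinates; each projected cloud outlines a profile region at c."""
    return {
        (names[i], names[j]): [(s[i], s[j]) for s in samples]
        for i, j in combinations(range(len(names)), 2)
    }


c = elevation_for_conf2(-100.0, 0.95)  # hypothetical maximum log-likelihood of -100
print(round(c, 4))  # -102.9957
```

With four component parameters, `pairwise_profiles` returns the six pairwise projections, all taken from the same simulated sample.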

Figure 4: (a) A histogram and a fitted mixture density (solid red) for the SLC data; (b) approximate 98.6% simulation-based profile plot for (π, ξ) (red squares represent the MLE for π and ξ). [Color figure can be seen in the online version of this article, available at http://wileyonlinelibrary.com/journal/cjs]


Example 1. Figure 4a is the histogram of the red blood cell sodium–lithium countertransport (SLC) data analyzed in Roeder (1994). The data, from 190 individuals, are characterized by three genotypes, A1A1, A1A2, and A2A2, with each individual having a single type. Roeder (1994) fitted a three-component normal mixture model with equal variances to these data. The MLE for $\theta = [(\pi_1, \xi_1, \omega), (\pi_2, \xi_2, \omega), (\pi_3, \xi_3, \omega)]$ was $\hat\theta = [(0.78, 0.22, 0.003), (0.20, 0.38, 0.003), (0.02, 0.58, 0.003)]$. We used multiple starting values in the EM algorithm and found only the MLE modal group.

We calculated the confidence level corresponding to the elevation of the MLE for the class of degenerate parameters (i.e., a two-component normal mixture with unknown equal variance) in the full parameter space of dimension 6. It was Conf6(ψdgnt) = 0.86 (corresponding to Conf2(ψdgnt) = 0.998). Since ξj is univariate, a unimodal partition exists at any six-dimensional confidence level below 86%. Two-dimensional profiles, however, can be constructed at confidence levels up to 99.8%.

Figure 4b is a simulation-based profile plot for (π, ξ) at the elevation c with Conf1(c) = 0.986 (corresponding to Conf6(c) = 0.576). It shows the relationship between πj and ξj for each component in a single plot. We employed the modal simulation method with 4,000 rays and found no star-shaped outliers.

Example 2. The second example concerns the relationships among the multiple parameters of a Poisson mixture model. The data come from a cohort study in northeast Thailand (Schelp et al., 1990) in which the health status of 602 preschool children was checked every 2 weeks from June 1982 until September 1985. Each child was examined to see whether he or she showed symptoms of fever, cough, running nose, or these symptoms together. The data are the frequencies of these illness spells during the study period (see Figure 5a for a histogram). Bohning et al. (1992) and Schlattmann (2005) fitted a four-component Poisson mixture model to these data:

$g(y; \theta) = \sum_{j=1}^{4} \pi_j \, e^{-\xi_j} \xi_j^{y} / y!$, where $\theta = [(\pi_1, \xi_1), (\pi_2, \xi_2), (\pi_3, \xi_3), (\pi_4, \xi_4)]$.

The MLE for θ is $\hat\theta = [(0.197, 0.14), (0.48, 2.82), (0.27, 8.16), (0.05, 16.16)]$ (see Figure 5a for the fitted mixture). The log likelihood of the MLE for the class of degenerate parameters (i.e., a three-component Poisson mixture model) in the full parameter space of dimension 7 was −1568.28, and Conf7(ψdgnt) was larger than 0.99. Since ξj is univariate and there was no secondary mode, Conf7(ψdgnt) is the exact upper bound on the confidence levels for a unimodal partition for these data.
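For reference, the four-component Poisson mixture density can be evaluated at the reported MLE as shown below. The pmf is computed on the log scale with `math.lgamma` to avoid overflow in y!. The frequency counts themselves are not reproduced here, so the frequency-table format used for the log-likelihood is an assumption. The function names are hypothetical.

```python
import math

# MLE reported above, as (pi_j, xi_j) pairs
theta_hat = [(0.197, 0.14), (0.48, 2.82), (0.27, 8.16), (0.05, 16.16)]


def poisson_mixture_pmf(y, theta):
    """g(y; theta) = sum_j pi_j * exp(-xi_j) * xi_j**y / y!, on the log scale."""
    return sum(pi * math.exp(y * math.log(xi) - xi - math.lgamma(y + 1))
               for pi, xi in theta)


def mixture_loglik(counts, theta):
    """Log-likelihood for a hypothetical frequency table {y: number of children}."""
    return sum(n * math.log(poisson_mixture_pmf(y, theta)) for y, n in counts.items())


print(round(sum(poisson_mixture_pmf(y, theta_hat) for y in range(200)), 6))  # 0.997
```

The reported mixing proportions are rounded and sum to 0.997, so the fitted pmf sums to about 0.997 rather than exactly one.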

For simultaneous inference on the four component-specific parameters (ξ1, ξ2, ξ3, and ξ4), we construct the profile confidence sets for every pair of these parameters (six pairs). In applying the modal simulation method to these data we used 6,000 rays, and there were no star-shaped outliers. Figure 5b is a matrix scatterplot of the sampling-based profiles for every pair of (ξ1, ξ2, ξ3, and ξ4) at the elevation c with Conf2(c) = 0.95. Notice that the samples from a single simulation represent the correlation structure among the parameters of interest. We also observe a boundary effect for ξ1, and the likelihood provides proper confidence sets at the boundary. This is an argument for using likelihood-based confidence sets instead of Wald confidence sets (see the first row of the matrix scatterplot).


Figure 5: (a) A histogram and a fitted mixture density for the northeast Thailand morbidity data (solid red: estimated mixture density; broken blue: fitted component densities); (b) a matrix scatterplot of 95% sampling-based profiles for every pair of (ξ1, ξ2, ξ3, and ξ4). [Color figure can be seen in the online version of this article, available at http://wileyonlinelibrary.com/journal/cjs]

9. CONCLUSION

In this article we proposed a strategy of using the modal simulation to describe the labelled

confidence regions generated by the likelihood in a finite mixture model. We showed by examples

that the modal simulation method, plus accompanying data analysis, provides a wider and more

useful set of tools than standard numerical analysis in describing the mixture likelihood region

for the parameters of interest, even when there are multiple modes.

BIBLIOGRAPHY

Agresti, A. (2002). Categorical Data Analysis, 2nd ed., Wiley, New York.

Bohning, D., Schlattmann, P. & Lindsay, B. G. (1992). C.A.MAN-computer assisted analysis of mixtures: Statistical algorithms. Biometrics, 48, 283–303.

Cox, D. R. & Hinkley, D. V. (1974). Theoretical Statistics. Chapman & Hall, London.

Crawford, S. L. (1994). An application of the Laplace method to finite mixture distributions. Journal of the American Statistical Association, 89, 259–267.

Davison, A. C. & Hinkley, D. V. (1997). Bootstrap Methods and Their Application, Cambridge University Press, Cambridge.

Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.

Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall.

Jasra, A., Holmes, C. C. & Stephens, D. A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science, 20, 50–67.

Kalbfleisch, J. D. & Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd ed., Wiley, New York.

Kim, D. & Lindsay, B. G. (2011a). Using confidence distribution sampling to visualize confidence sets. Statistica Sinica, 21, 923–948.

Kim, D. & Lindsay, B. G. (2011b). Empirical Identifiability and the Topology of Mixture Likelihood Regions. Submitted for publication.


Lang, J. B. (2008). Score and profile likelihood confidence intervals for contingency table parameters. Statistics in Medicine, 27, 5975–5990.

Lindsay, B. G. (1995). Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 5, Institute of Mathematical Statistics, Hayward, CA.

Lindsay, B. G. & Qu, A. (2003). Inference functions and quadratic score tests. Statistical Science, 18, 394–410.

Matsumoto, Y. (2002). An Introduction to Morse Theory. Translations of Mathematical Monographs, Vol. 208, American Mathematical Society, Providence.

Meeker, W. Q. & Escobar, L. A. (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. The American Statistician, 49, 48–53.

Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.

Owen, A. (2001). Empirical Likelihood. Chapman & Hall, New York.

Pang, C. H. J. (2010). Level set methods for finding saddle points of general Morse index. Available from http://arxiv.org/pdf/1001.0925v1.

Qin, J. & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325.

Rao, C. R. (1948). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proceedings of the Cambridge Philosophical Society, 44, 50–57.

Redner, R. A. & Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195–239.

Roeder, K. (1994). A graphical technique for determining the number of components in a mixture of normals. Journal of the American Statistical Association, 89, 487–495.

Schelp, F. P., Vivatanasept, P., Sitaputra, P., Sornmani, S., Pongpaew, P., Vudhivai, N., Egormaiphol, S. & Bohning, D. (1990). Relationship of the morbidity of under-fives to anthropometric measurements and community health intervention. Tropical Medicine and Parasitology, 41, 121–126.

Schlattmann, P. (2005). On bootstrapping the number of components in finite mixtures of Poisson distributions. Statistics and Computing, 15, 179–188.

Teicher, H. (1960). On the mixture of distributions. The Annals of Mathematical Statistics, 31, 55–73.

Teicher, H. (1963). Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34, 1265–1269.

Yakowitz, S. J. & Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39, 209–214.

Yao, W. & Lindsay, B. G. (2009). Bayesian mixture labeling by highest posterior density. Journal of the American Statistical Association, 104, 758–767.

Received 21 May 2010; Accepted 21 May 2011
