Modeling compositional data

Upload: jermaine-alford

Post on 31-Dec-2015



Page 1: Modeling   compositional data

Modeling compositional data

Page 2: Modeling   compositional data

Some collaborators

Deformations: Paul Sampson, Wendy Meiring, Doris Damian

Space-time: Tilmann Gneiting, Francesca Bruno

Deterministic models: Montserrat Fuentes, Peter Challenor

Markov random fields: Finn Lindström

Wavelets: Don Percival, Brandon Whitcher, Peter Craigmile, Debashis Mondal

Page 3: Modeling   compositional data

Background

NAPAP, 1980s

Workshop on biological monitoring, 1986

Dirichlet process: Gary Grunwald, 1987

Current framework: Dean Billheimer, 1995

Other co-workers: Adrian Raftery, Mariabeth Silkey, Eun-Sug Park

Page 4: Modeling   compositional data

Compositional data

Vector of proportions

Proportion of taxes in different categories

Composition of rock samples

Composition of biological populations

Composition of air pollution

$$z = (z_1, \ldots, z_k)^T, \quad z_i > 0, \quad \sum_{i=1}^{k} z_i = 1, \quad z \in \nabla^{k-1}$$
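
As a minimal sketch of how raw amounts become a point on the simplex, the helper below rescales positive parts to sum to one. The name `closure` and the input masses are illustrative assumptions, not taken from the slides.

```python
def closure(x):
    """Rescale a vector of strictly positive amounts so its parts sum to one."""
    if any(v <= 0 for v in x):
        raise ValueError("all parts must be strictly positive")
    total = sum(x)
    return [v / total for v in x]


# Hypothetical raw masses of three components (illustrative values only).
z = closure([11.0, 3.0, 6.0])
print(z)
```

Any positive rescaling of the input gives the same composition, which is why only relative amounts matter.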

Page 5: Modeling   compositional data

The triangle plot

[Figure: triangle plot with axes Proportion 1, Proportion 2 and Proportion 3, each running from 0 to 1, showing the point (0.55, 0.15, 0.30).]

Page 6: Modeling   compositional data

The spider plot

[Figure: spider plot of the composition (0.40, 0.20, 0.10, 0.05, 0.25), with radial axis ticks at 0.2, 0.4, 0.6, 0.8 and 1.0.]

Page 7: Modeling   compositional data

An algebra for compositions

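Perturbation: For $\xi, \alpha \in \nabla^{k-1}$ define

$$\xi \oplus \alpha = \left( \frac{\xi_1 \alpha_1}{\sum_{i=1}^{k} \xi_i \alpha_i}, \ldots, \frac{\xi_k \alpha_k}{\sum_{i=1}^{k} \xi_i \alpha_i} \right) \in \nabla^{k-1}$$

The composition $\iota = \left( \frac{1}{k}, \ldots, \frac{1}{k} \right)$ acts as a zero, so $\xi \oplus \iota = \xi$.

Set $\xi^{-1} = \left( \frac{1}{\xi_1}, \ldots, \frac{1}{\xi_k} \right)$, rescaled to sum to one, so $\xi \oplus \xi^{-1} = \iota$.

Finally define $\xi \ominus \eta = \xi \oplus \eta^{-1}$.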
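
The perturbation algebra on this slide can be sketched in a few lines. The function names (`perturb`, `identity`, `inverse`, `perturb_diff`) and the example compositions are assumptions made for illustration.

```python
def closure(x):
    """Rescale positive parts to sum to one."""
    total = sum(x)
    return [v / total for v in x]


def perturb(xi, alpha):
    """Perturbation: componentwise product, rescaled to sum to one."""
    return closure([a * b for a, b in zip(xi, alpha)])


def identity(k):
    """The composition (1/k, ..., 1/k), which acts as the zero element."""
    return [1.0 / k] * k


def inverse(xi):
    """Componentwise reciprocals, rescaled to sum to one."""
    return closure([1.0 / v for v in xi])


def perturb_diff(xi, eta):
    """Perturbation difference: xi perturbed by the inverse of eta."""
    return perturb(xi, inverse(eta))


# Illustrative compositions (assumed values).
xi = [0.55, 0.15, 0.30]
eta = [0.20, 0.50, 0.30]
print(perturb(xi, eta))
print(perturb(perturb_diff(xi, eta), eta))  # recovers xi up to rounding
```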

Page 8: Modeling   compositional data

The logistic normal

If

$$\mathrm{alr}(z) = \left( \log \frac{z_1}{z_k}, \ldots, \log \frac{z_{k-1}}{z_k} \right)^T \sim \mathrm{MVN}(\mu, \Sigma)$$

we say that z is logistic normal, in short $Z \sim \mathrm{LN}(\mu, \Sigma)$.

Other distributions on the simplex:

Dirichlet: ratios of independent gammas

“Danish”: ratios of independent inverse Gaussians

Both have very limited correlation structure.
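
A minimal sketch of the additive log-ratio transform, its inverse, and a logistic normal draw follows. To stay within the standard library it assumes a diagonal covariance (independent alr coordinates); the general case would need a full multivariate normal sampler. The function names and parameter values are illustrative.

```python
import math
import random


def alr(z):
    """Additive log-ratio transform: log(z_i / z_k) for i = 1, ..., k-1."""
    return [math.log(v / z[-1]) for v in z[:-1]]


def alr_inv(y):
    """Inverse alr: map a (k-1)-vector back to a k-part composition."""
    expy = [math.exp(v) for v in y] + [1.0]
    total = sum(expy)
    return [v / total for v in expy]


def sample_logistic_normal(mu, sd, rng):
    """Draw Z with alr(Z) normal, mean mu and ASSUMED diagonal covariance diag(sd^2)."""
    y = [rng.gauss(m, s) for m, s in zip(mu, sd)]
    return alr_inv(y)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
z = sample_logistic_normal([0.5, -0.3], [0.2, 0.4], rng)
print(z, sum(z))
```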

Page 9: Modeling   compositional data

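Scalar multiplication

Let a be a scalar. Define

$$\xi \otimes a = \left( \frac{\xi_1^a}{\sum_{i=1}^{k} \xi_i^a}, \ldots, \frac{\xi_k^a}{\sum_{i=1}^{k} \xi_i^a} \right)$$

$(\nabla^{k-1}, \oplus, \otimes)$ is a complete inner product space, with inner product given, e.g., by

$$\langle \xi, \eta \rangle = \mathrm{alr}(\xi)^T N^{-1} \mathrm{alr}(\eta)$$

N is the multinomial covariance $N = I + jj^T$, where j is a vector of k-1 ones.

$\|\xi\| = \sqrt{\langle \xi, \xi \rangle}$ is a norm on the simplex.

The inner product and norm are invariant to permutations of the components of the composition.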
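
The sketch below checks the permutation invariance numerically. It uses the Sherman-Morrison identity $(I + jj^T)^{-1} = I - jj^T/k$ (valid because j has k-1 ones), so no matrix inversion is needed. Function names and test values are illustrative assumptions.

```python
import math


def alr(z):
    """Additive log-ratio transform with the last part as reference."""
    return [math.log(v / z[-1]) for v in z[:-1]]


def power(xi, a):
    """Scalar multiplication on the simplex: raise parts to the power a, rescale."""
    powered = [v ** a for v in xi]
    total = sum(powered)
    return [v / total for v in powered]


def inner(xi, eta):
    """alr(xi)^T N^{-1} alr(eta) with N = I + j j^T, using N^{-1} = I - j j^T / k."""
    a, b = alr(xi), alr(eta)
    k = len(xi)
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / k


def norm(xi):
    """Norm induced by the inner product."""
    return math.sqrt(inner(xi, xi))


xi = [0.55, 0.15, 0.30]
permuted = [0.30, 0.55, 0.15]  # same parts in a different order
print(norm(xi), norm(permuted))         # agree up to rounding
print(norm(power(xi, 2.0)), 2.0 * norm(xi))  # norm scales with the scalar
```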

Page 10: Modeling   compositional data

Some models

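Measurement error:

$$z_j = \xi \oplus \varepsilon_j$$

where $\varepsilon_j \sim \mathrm{LN}(0, \Sigma)$.

Regression:

$$\xi_j = \xi \oplus \gamma \otimes u_j$$

where $\xi_j$ and $\xi$ are compositions and $u_j$ is a centered covariate.

Correspondence in Euclidean space:

$$\mu_j = \beta_0 + \beta_1 (x_j - \bar{x})$$

$$\mathrm{alr}^{-1}(\mu_j) = \mathrm{alr}^{-1}(\beta_0) \oplus \mathrm{alr}^{-1}(\beta_1) \otimes (x_j - \bar{x})$$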
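
The correspondence between linear regression in alr coordinates and the simplex operations can be verified numerically. This is a sketch with made-up coefficients; `beta0`, `beta1` and `u` (the centered covariate value) are assumptions for illustration.

```python
import math


def alr_inv(y):
    """Inverse alr: map a (k-1)-vector back to a k-part composition."""
    expy = [math.exp(v) for v in y] + [1.0]
    total = sum(expy)
    return [v / total for v in expy]


def perturb(xi, alpha):
    """Perturbation: componentwise product, rescaled to sum to one."""
    prod = [a * b for a, b in zip(xi, alpha)]
    total = sum(prod)
    return [v / total for v in prod]


def power(xi, a):
    """Scalar multiplication: raise parts to the power a, rescale."""
    powered = [v ** a for v in xi]
    total = sum(powered)
    return [v / total for v in powered]


beta0 = [0.4, -0.2]  # hypothetical intercept in alr coordinates
beta1 = [0.3, 0.1]   # hypothetical slope in alr coordinates
u = 1.5              # hypothetical centered covariate value x_j - x_bar

# Left side: regress in Euclidean space, then map to the simplex.
lhs = alr_inv([b0 + b1 * u for b0, b1 in zip(beta0, beta1)])
# Right side: the same line built from perturbation and powering.
rhs = perturb(alr_inv(beta0), power(alr_inv(beta1), u))
print(lhs)
print(rhs)
```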

Page 11: Modeling   compositional data

Some regression lines

Page 12: Modeling   compositional data

Time series (AR 1)

$$z_{k+1} = \phi \otimes z_k \oplus \varepsilon_k$$
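
One way to simulate such a series is to iterate in alr coordinates, where powering becomes scalar multiplication and perturbation becomes addition. The sketch assumes independent normal noise with a common standard deviation in each alr coordinate; that noise structure, the function name `simulate_ar1` and all parameter values are assumptions.

```python
import math
import random


def alr(z):
    """Additive log-ratio transform with the last part as reference."""
    return [math.log(v / z[-1]) for v in z[:-1]]


def alr_inv(y):
    """Inverse alr: map a (k-1)-vector back to a k-part composition."""
    expy = [math.exp(v) for v in y] + [1.0]
    total = sum(expy)
    return [v / total for v in expy]


def simulate_ar1(z0, phi, noise_sd, n_steps, rng):
    """Simulate a compositional AR(1) path of length n_steps + 1 starting at z0."""
    path = [list(z0)]
    y = alr(z0)
    for _ in range(n_steps):
        # Ordinary AR(1) step in alr coordinates (ASSUMED independent noise).
        y = [phi * v + rng.gauss(0.0, noise_sd) for v in y]
        path.append(alr_inv(y))
    return path


rng = random.Random(1)
path = simulate_ar1([0.7, 0.2, 0.1], phi=0.5, noise_sd=0.1, n_steps=10, rng=rng)
for z in path[:3]:
    print(z)
```

With `noise_sd=0` and `|phi| < 1` the path shrinks toward the composition (1/k, ..., 1/k), the zero element of the perturbation algebra.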

Page 13: Modeling   compositional data

A source receptor model

Observe relative concentration Yi of k species at a location over time.

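Consider p sources with chemical profiles $\theta_j$. Let $\alpha_i$ be the vector of mixing proportions of the different sources at the receptor on day i.

$$E\,Y_i = \sum_{j=1}^{p} \alpha_{ij} \theta_j = \Theta \alpha_i$$

$$Y_i = \Theta \alpha_i \oplus \varepsilon_i$$

$\theta_j \sim \mathrm{LN}$, $\alpha_i \sim$ indep LN, $\varepsilon_i \sim$ zero mean LN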
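
The mixing step can be sketched as a convex combination of source profiles followed by a perturbation error. The two profiles, the mixing proportions and the error composition below are invented for illustration, as are the function names `mix` and `observe`.

```python
def perturb(xi, alpha):
    """Perturbation: componentwise product, rescaled to sum to one."""
    prod = [a * b for a, b in zip(xi, alpha)]
    total = sum(prod)
    return [v / total for v in prod]


def mix(profiles, alpha):
    """Mixture of source profiles: sum over sources of alpha_j times profile j."""
    k = len(profiles[0])
    return [sum(a * theta[s] for a, theta in zip(alpha, profiles)) for s in range(k)]


def observe(profiles, alpha, eps):
    """Observed composition: the mixture perturbed by an error composition eps."""
    return perturb(mix(profiles, alpha), eps)


# Hypothetical profiles for p = 2 sources over k = 3 species.
profiles = [[0.6, 0.3, 0.1],
            [0.1, 0.2, 0.7]]
alpha = [0.25, 0.75]       # hypothetical mixing proportions on one day
eps = [0.35, 0.33, 0.32]   # hypothetical error composition near (1/3, 1/3, 1/3)

print(mix(profiles, alpha))
print(observe(profiles, alpha, eps))
```

Because each profile and the mixing vector sum to one, the mixture is itself a composition before the error is applied.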

Page 14: Modeling   compositional data

Juneau air quality

50 observations of relative mass of 5 chemical species. Goal: determine the contribution of wood smoke to local pollution load.

Prior specification:

Inference by MCMC.

f(,α i, i,α ,Γ, ) =

f(α i α ,Γ) f( i )f(α )f(Γ)f( )

Page 15: Modeling   compositional data

Wood smoke contribution

95% CL

50% CL

Page 16: Modeling   compositional data

Source profiles

(fluoranthene)

(pyrene)

(benzo(a))

(chrysene)

(benzo(b))

Page 17: Modeling   compositional data

State-space model

Space-time model of proportions

State-space model:

$z_j$ unobservable composition $\sim \mathrm{LN}(\mu_j, \Sigma_j)$

$y_j$ k-vector of counts $\sim \mathrm{Mult}\left( \sum_{i=1}^{k} [y_j]_i,\ z_j \right)$

Inference using MCMC again
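
The two layers of the state-space model can be simulated forward as follows. This is only the data-generating side, not the MCMC inference, and it assumes a diagonal covariance for the latent logistic normal; the function name `simulate_site` and all parameter values are illustrative.

```python
import math
import random


def alr_inv(y):
    """Inverse alr: map a (k-1)-vector back to a k-part composition."""
    expy = [math.exp(v) for v in y] + [1.0]
    total = sum(expy)
    return [v / total for v in expy]


def simulate_site(mu, sd, n_total, rng):
    """Draw a latent composition, then multinomial counts given that composition."""
    # Latent state: logistic normal with ASSUMED independent alr coordinates.
    z = alr_inv([rng.gauss(m, s) for m, s in zip(mu, sd)])
    # Observation: n_total individuals allocated to classes with probabilities z.
    draws = rng.choices(range(len(z)), weights=z, k=n_total)
    counts = [draws.count(i) for i in range(len(z))]
    return z, counts


rng = random.Random(42)
z, counts = simulate_site(mu=[0.2, -0.5], sd=[0.3, 0.3], n_total=200, rng=rng)
print(z)
print(counts)
```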

Page 18: Modeling   compositional data

Stability of arthropod food webs

Omnivory thought to destabilize ecological communities

Stability: Capacity to recover from shock (relative abundance in trophic classes)

Mount St. Helens experiment: 6 treatments in 2-way factorial design; 5 reps.

Predator manipulation (3 levels)

Vegetation disturbance (2 levels)

Count arthropods 6 wks after treatment. Divide into specialized herbivores, general herbivores, predators.

Page 19: Modeling   compositional data

Specification of structure

is generated from independent observations at each treatment

mean depends only on treatment

Page 22: Modeling   compositional data

Benthic invertebrates in estuary

EMAP estuaries monitoring program: Delaware Bay 1990. 25 locations, 3 grab samples of bottom sediment during summer

Invertebrates in samples classified into

– pollution tolerant

– pollution intolerant

– suspension feeders (control group; mainly palp worms)

Page 23: Modeling   compositional data

Site j, subsample t

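$\mu_j \sim$ CAR process

$z_{jt} \sim \mathrm{LN}(\mu_j + \beta x_j, \Psi)$

$$E(\mu_j \mid \mu_{-j}) = \mu + \frac{\lambda}{n_j} \sum_{k \in N(j)} (\mu_k - \mu)$$

$$\mathrm{Var}(\mu_j \mid \mu_{-j}) = \frac{\Gamma}{n_j}$$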
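
The conditional moments of a CAR process have a simple form: the overall mean plus a scaled average of neighbor deviations, with variance shrinking as the number of neighbors grows. The sketch below uses a scalar site effect for simplicity, whereas in the slide the site effect is a vector and the variance parameter a matrix. The neighbor structure, parameter values and function names are assumptions.

```python
def car_conditional_mean(j, site_values, neighbors, overall_mean, lam):
    """Conditional mean at site j given the values at its neighbors (scalar case)."""
    nbrs = neighbors[j]
    n_j = len(nbrs)
    deviation_sum = sum(site_values[k] - overall_mean for k in nbrs)
    return overall_mean + (lam / n_j) * deviation_sum


def car_conditional_var(j, neighbors, gamma):
    """Conditional variance at site j: gamma divided by the number of neighbors."""
    return gamma / len(neighbors[j])


# Hypothetical chain of four sites; each site neighbors the adjacent ones.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
site_values = [0.0, 1.0, 2.0, 3.0]

print(car_conditional_mean(1, site_values, neighbors, overall_mean=1.5, lam=0.8))
print(car_conditional_var(1, neighbors, gamma=2.0))
```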

Page 25: Modeling   compositional data

Effect of salinity
