

F071604: Bayesian Statistics

Luo Shan

Department of Mathematics

Shanghai Jiao Tong University

February 28th, 2014

Luo Shan (SJTU) F071604: Bayesian Statistics February 28th, 2014 1 / 36


Course Information

Consulting Time and Venue: Fridays (2:00-5:00 pm), Math Building 2304

Text Book: Berger, James O. Statistical Decision Theory and Bayesian Analysis. Springer, 1985.

Reference Books:

1: Ghosh, J. K., Delampady, M., & Samanta, T. An Introduction to Bayesian Analysis: Theory and Methods. Springer, 2006.

2: [Chinese-language textbook on Bayesian statistics; citation garbled in the source]. 2012.

3: Ming-Hui Chen, Dipak K. Dey, Peter Müller, Dong-Chu Sun, Keying Ye. Frontiers of Statistical Decision Making and Bayesian Analysis: In Honor of James O. Berger. Springer, 2010.

Evaluation:

1: Mid-term paper and presentation (30%, ReferencePapersList.pdf).

2: Final-term examination (70%, one A4-size help sheet is allowed).


Chapter 1: Basic Concepts

1 Introduction

2 Basic Elements

3 Expected Loss, Decision Rules, and Risk (Bayesian Expected Loss; Frequentist Risk)

4 Randomized Decision Rules

5 Decision Principles (The Conditional Bayes Decision Principle; Frequentist Decision Principles)

6 Sufficient Statistics

7 Convexity


Introduction


Statistical decision theory: to make decisions in the presence of statistical knowledge which sheds light on some of the uncertainties (denoted by θ)

Classical statistics: to use sample information to make inferences about θ

Decision theory: to combine the sample information with other relevant aspects of the problem in order to make the decision

Two typical types of relevant nonsample information:

A knowledge of the possible consequences of the decisions (the loss incurred for various values of θ: the loss function)

Information arising from sources other than the statistical investigation, generally from past experience (prior information)

Bayesian analysis: a statistical approach which formally seeks to utilize prior information


Basic Elements

Notations and Terminologies

θ: the unknown quantity, state of nature (parameter)

Θ: the set of all possible states of nature (parameter space)

a: actions (decisions)

A: the set of all possible actions

L(θ, a): the loss function when θ is the true state of nature and a is the action taken (assumed bounded below: L(θ, a) ≥ −K > −∞)


Basic Elements

Notations and Terminologies (cont.)

X = (X1, X2, · · · , Xn): the outcome; the Xi's are independent observations from a common distribution

x: a particular realization of X

H: the set of possible outcomes (the sample space, e.g., R^n)

Pθ(A), or Pθ(X ∈ A): the probability of the event A (A ⊂ H) when θ is the true state of nature.

For example, X will be assumed to be either a continuous or a discrete random variable with density f(x|θ). Let F_X(x|θ) be the cumulative distribution function (c.d.f.); then

Pθ(A) = ∫_A dF_X(x|θ),

and the expectation of a function h(X), Eθ[h(X)], is defined to be ∫_H h(x) dF_X(x|θ).


Basic Elements

Prior Information

It is useful and natural to state prior beliefs in terms of probabilities of various possible values of θ being true.

Let π(θ) represent a prior density of θ (continuous or discrete). If A ⊂ Θ,

P(θ ∈ A) = ∫_A dF^π(θ) = ∫_A π(θ) dθ (continuous case), or Σ_{θ ∈ A} π(θ) (discrete case).


Basic Elements

An Example:

Example 1

A drug company needs to decide whether or not to market the drug, how much to market, what price to charge, etc. Two main factors affecting its decision are the proportion of people for which the drug will prove effective (θ1), and the proportion of the market the drug will capture (θ2).

Assume it is desired to estimate θ2. Clearly, Θ = [0, 1] and A = [0, 1].


Basic Elements

An Example (cont.):

The loss function might be

L(θ2, a) = θ2 − a       if θ2 − a ≥ 0,
L(θ2, a) = 2(a − θ2)    if θ2 − a ≤ 0.

The sample information about θ2 is obtained from a sample survey. For example, assume n people are interviewed, and the number X who would buy the drug is observed. It might be reasonable to assume

f(x|θ2) = (n choose x) θ2^x (1 − θ2)^(n−x).

There could well be considerable prior information about θ2, arising from previous introductions of similar new drugs into the market. Assume the prior information can be modeled by giving θ2 a U(0.1, 0.2) prior density, i.e., letting

π(θ2) = 10I(0.1,0.2)(θ2)
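As a minimal sketch (names are illustrative, not from the slides), the three elements of this example — the asymmetric loss, the binomial sampling density, and the U(0.1, 0.2) prior — can be written out directly:

```python
import math

def loss(theta2, a):
    # Asymmetric linear loss from Example 1: overestimating theta2
    # is penalized twice as heavily as underestimating it.
    return theta2 - a if theta2 >= a else 2 * (a - theta2)

def likelihood(x, n, theta2):
    # Binomial sampling density f(x | theta2): x of n interviewees would buy.
    return math.comb(n, x) * theta2**x * (1 - theta2)**(n - x)

def prior(theta2):
    # U(0.1, 0.2) prior density: pi(theta2) = 10 * I_(0.1,0.2)(theta2).
    return 10.0 if 0.1 < theta2 < 0.2 else 0.0
```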


Basic Elements

It is important to note that:

The loss function and prior information may be very vague or even nonunique

The goal in statistical inference is to provide a "summary" of the statistical evidence which a wide variety of future "users" of this evidence can easily incorporate into their own decision-making processes


Basic Elements

The reasons for involvement of losses and prior information in inference:

Reports from statistical inferences should (ideally) be constructed so that they can be easily utilized in individual decision making

The investigator may very well possess such information; he will often be well informed about the uses to which his inferences are likely to be put, and may have considerable prior knowledge about the situation. It is then almost imperative that he present such information in his analysis

The choice of an inference can be viewed as a decision problem


Basic Elements

To summarize: decision theory can be useful even when the incorporation of losses and priors is proscribed. This is because many standard inference criteria can be formally reproduced as decision-theoretic criteria with respect to certain formal loss functions.


Expected Loss, Decision Rules, and Risk: Bayesian Expected Loss


Definition 2 (Bayesian Expected Loss)

If π*(θ) is the believed probability distribution of θ at the time of decision making, the Bayesian expected loss of an action a is

ρ(π*, a) = E^π*[L(θ, a)] = ∫_Θ L(θ, a) dF^π*(θ).


Example 3 (Example 1 cont.)

Assume no data is obtained, so that the believed distribution of θ2 is simply π(θ2) = 10 I_(0.1,0.2)(θ2). Then

ρ(π, a) = ∫_0^1 L(θ2, a) π(θ2) dθ2

= ∫_0^a 2(a − θ2) · 10 I_(0.1,0.2)(θ2) dθ2 + ∫_a^1 (θ2 − a) · 10 I_(0.1,0.2)(θ2) dθ2

= 0.15 − a            if a ≤ 0.1,
= 15a^2 − 4a + 0.3    if 0.1 ≤ a ≤ 0.2,
= 2a − 0.3            if a ≥ 0.2.
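The piecewise closed form can be checked by numerical integration; a midpoint-rule sketch (illustrative names, assuming the loss and prior of Example 1):

```python
def rho_closed(a):
    # Piecewise closed form of rho(pi, a) derived above.
    if a <= 0.1:
        return 0.15 - a
    if a <= 0.2:
        return 15 * a**2 - 4 * a + 0.3
    return 2 * a - 0.3

def rho_numeric(a, cells=100_000):
    # Midpoint-rule integration of L(theta2, a) * pi(theta2) over (0.1, 0.2),
    # the support of the U(0.1, 0.2) prior (density 10 there).
    h = 0.1 / cells
    total = 0.0
    for i in range(cells):
        t = 0.1 + (i + 0.5) * h
        l = t - a if t >= a else 2 * (a - t)
        total += l * 10.0 * h
    return total
```

Evaluating both at points in each branch confirms the algebra.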


Expected Loss, Decision Rules, and Risk: Frequentist Risk


Nonrandomized decision rule

Definition 4 (Nonrandomized Decision Rule)

A (nonrandomized) decision rule δ(x) is a function from H into A. If X = x is the observed value of the sample information, then δ(x) is the action that will be taken. (For a no-data problem, a decision rule is simply an action.) Two decision rules, δ1 and δ2, are considered equivalent if Pθ(δ1(X) = δ2(X)) = 1 for all θ.

Example 5 (Example 1 cont.)

For the situation of Example 1, δ(x) = x/n is the standard decision rule (estimator) for estimating θ2.


Definition 6 (Risk Function For Nonrandomized Decision Rules)

The risk function of a decision rule δ(x) is defined by

R(θ, δ) = E^X_θ[L(θ, δ(X))] = ∫_H L(θ, δ(x)) dF_X(x|θ).

For a no-data problem, R(θ, δ) ≡ L(θ, δ).

To a frequentist, it is desirable to use a decision rule δ which has small R(θ, δ). However, whereas the Bayesian expected loss of an action was a single number, the risk is a function on Θ, and since θ is unknown we have a problem in saying what "small" means.


Definition 7

A decision rule δ1 is R-better than a decision rule δ2 if R(θ, δ1) ≤ R(θ, δ2) for all θ ∈ Θ, with strict inequality for some θ. A rule δ1 is R-equivalent to δ2 if R(θ, δ1) = R(θ, δ2) for all θ.

Definition 8 (Admissibility)

A decision rule δ is admissible if there exists no R-better decision rule. A decision rule δ is inadmissible if there does exist an R-better decision rule.


Example 9

Assume X is N(θ, 1) and that it is desired to estimate θ under loss L(θ, a) = (θ − a)^2. (This loss is called squared-error loss.) Consider the decision rules δc(x) = cx. Clearly the risk function is

R(θ, δc) = E^X_θ[L(θ, δc(X))] = E^X_θ[(θ − cX)^2] = c^2 + (1 − c)^2 θ^2.

Hence δ1 is R-better than δc for c > 1, so the rules δc are inadmissible for c > 1.
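The risk formula can be verified by simulation; a Monte Carlo sketch (illustrative names):

```python
import random

def risk_exact(theta, c):
    # Closed form R(theta, delta_c) = c^2 + (1 - c)^2 * theta^2 from Example 9.
    return c**2 + (1 - c) ** 2 * theta**2

def risk_mc(theta, c, n=100_000, seed=0):
    # Monte Carlo estimate of E_theta (theta - c X)^2 with X ~ N(theta, 1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += (theta - c * x) ** 2
    return total / n
```

Note also that risk_exact(theta, 1.0) = 1 for every theta, while any c > 1 has strictly larger risk at every theta, matching the inadmissibility claim.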


Definition 10 (Bayes Risk)

The Bayes risk of a decision rule δ, with respect to a prior distribution π on Θ, is defined as r(π, δ) = E^π[R(θ, δ)].

Example 11 (Example 9 cont.)

In Example 9, suppose that π(θ) is a N(0, τ^2) density. Then, for the decision rule δc,

r(π, δc) = E^π[R(θ, δc)] = E^π[c^2 + (1 − c)^2 θ^2] = c^2 + (1 − c)^2 τ^2.


Randomized Decision Rules

Example 12 (Matching Pennies)

Persons A and B are to simultaneously uncover a penny (each can secretly turn the penny to heads or tails)

If the two coins match, A wins $1 from B; otherwise, B wins $1 from A

Two available actions for A: a1 (choose heads) or a2 (choose tails)

Two possible states of nature: θ1 (B's coin shows heads) and θ2 (B's coin shows tails)

Both a1 and a2 are admissible actions (Why?)

If the game is to be played a number of times, a way of preventing ultimate defeat is to choose a1 and a2 with probabilities p and 1 − p, respectively.
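The equalizing effect of p = 1/2 can be made concrete in a small sketch (illustrative names; A's wins are encoded as loss −1, A's payouts as loss +1):

```python
def expected_loss_A(p, B_plays_heads):
    # A's expected loss when A chooses heads with probability p
    # against a fixed state of B's coin.
    if B_plays_heads:
        return p * (-1.0) + (1 - p) * (+1.0)   # match iff A picks heads
    return p * (+1.0) + (1 - p) * (-1.0)       # match iff A picks tails
```

With p = 1/2 the expected loss is 0 whichever state holds, so A cannot be systematically beaten over repeated play; any deterministic choice (p = 0 or 1) loses $1 per round against the unfavorable state.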


Definition 13 (Randomized Decision Rule)

A randomized decision rule δ*(x, ·) is, for each x, a probability distribution on A, with the interpretation that if x is observed, δ*(x, A) is the probability that an action in A (a subset of A) will be chosen. In no-data problems, a randomized decision rule, also called a randomized action and denoted δ*(·), is a probability distribution on A.

Remark: Nonrandomized decision rules will be considered a special case of randomized rules, in that they correspond to the randomized rules which, for each x, choose a specific action with probability 1. If δ(x) is a nonrandomized decision rule, let <δ> denote the equivalent randomized rule, given by

<δ>(x, A) = I_A(δ(x)) = 1 if δ(x) ∈ A, and 0 if δ(x) ∉ A.


Example 14

In Example 12, the randomized action is defined by δ*(a1) = p, δ*(a2) = 1 − p. Equivalently, it can be denoted δ* = p<a1> + (1 − p)<a2>.

Definition 15 (Risk Function for Randomized Decision Rules)

The loss function L(θ, δ*(x, ·)) of the randomized rule δ* is defined to be

L(θ, δ*(x, ·)) = E^δ*(x,·)[L(θ, a)],

where the expectation is taken over a (a random variable with distribution δ*(x, ·)). The risk function of δ* is

R(θ, δ*) = E^X_θ[L(θ, δ*(X, ·))].

Definition 16

Let D* be the set of all randomized decision rules δ* for which R(θ, δ*) < +∞ for all θ. A decision rule will be said to be admissible if there exists no R-better randomized decision rule in D*.


Decision Principles: The Conditional Bayes Decision Principle


Theorem 17 (The Conditional Bayes Principle)

Choose an action a ∈ A which minimize ρ(π?, a) (def 2) (assume theminimum is attained). Such an action will be called a Bayes action andwill be denoted by aπ

?.

Example 18 (Example 3 cont.)

In Example 3, ρ(π, a) attains its minimum 1/30 at a = 2/15. Hence the Bayes estimate of θ2 is 2/15, assuming no data is available.
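On [0.1, 0.2] the relevant branch of ρ(π, a) is the quadratic 15a^2 − 4a + 0.3, so the minimizer follows from elementary calculus; a quick check (illustrative names):

```python
def rho_mid(a):
    # Middle branch of rho(pi, a) from Example 3, valid for 0.1 <= a <= 0.2.
    return 15 * a**2 - 4 * a + 0.3

# Setting the derivative 30a - 4 to zero gives the Bayes action a = 2/15,
# with minimum value rho = 1/30.
a_bayes = 4 / 30
rho_min = rho_mid(a_bayes)
```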


Decision Principles: Frequentist Decision Principles


Theorem 19 (The Bayes Risk Principle)

A decision rule δ1 is preferred to a rule δ2 if r(π, δ1) < r(π, δ2). A decision rule which minimizes r(π, δ) is optimal; it is called a Bayes rule and will be denoted δ^π. The quantity r(π) = r(π, δ^π) is then called the Bayes risk for π.

Example 20 (Example 11 cont.)

In Example 11, r(π, δc) is minimized at c = c0 := τ^2/(1 + τ^2); hence δ_c0 is the Bayes rule (or Bayes estimator).

Remark: In a no-data problem, the Bayes risk principle gives the same answer as the conditional Bayes decision principle.
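A quick grid check of the claimed minimizer (illustrative names; the value tau2 = 2.0 is an arbitrary choice for τ^2):

```python
def bayes_risk(c, tau2):
    # r(pi, delta_c) = c^2 + (1 - c)^2 * tau^2 from Example 11.
    return c**2 + (1 - c) ** 2 * tau2

tau2 = 2.0
c0 = tau2 / (1 + tau2)   # claimed Bayes-rule constant c0 = tau^2 / (1 + tau^2)
# Scan a grid of c values in [0, 1] to confirm none does better than c0.
grid_min = min(bayes_risk(i / 1000, tau2) for i in range(1001))
```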


Theorem 21 (The Minimax Principle)

A decision rule δ*1 is preferred to a rule δ*2 if

sup_θ R(θ, δ*1) < sup_θ R(θ, δ*2).

A rule δ*M is a minimax decision rule if it minimizes sup_θ R(θ, δ*) among all randomized rules in D*, i.e.,

sup_θ R(θ, δ*M) = inf_{δ* ∈ D*} sup_θ R(θ, δ*) (the minimax value).

Remark: analogous notions are the minimax action (the minimax decision rule in a no-data problem), the minimax nonrandomized rule, and the minimax nonrandomized action.

Example 22 (Example 9 cont.)

In Example 9, for the decision rules δc,

sup_θ R(θ, δc) = 1 if c = 1, and ∞ if c ≠ 1.

The minimax rule is therefore δ1.


Theorem 23 (The Invariance Principle)

If two decision problems have the same formal structure (e.g., X and Y = g(X)), then the same decision rule should be used in each problem; i.e., if δ and δ* are the decision rules used in the X and Y problems, then g(δ(x)) = δ*(y). If a decision problem is invariant under a group φ of transformations, a (nonrandomized) decision rule δ(x) is invariant under φ if δ(g(x)) = g(δ(x)) for all x ∈ H and g ∈ φ.

Remark: For comparisons of the above principles, please refer to Sec. 1.6.


Sufficient Statistics

Definition 24 (Sufficient Statistics)

Let X be a random variable whose distribution depends on the unknown parameter θ but is otherwise known. A function T of X is said to be a sufficient statistic for θ if the conditional distribution of X, given T(X) = t, is independent of θ (with probability one).

Definition 25

If T(X) is a statistic with range F (i.e., F = {T(x) : x ∈ H}), the partition of H induced by T is the collection of all sets of the form H_t = {x ∈ H : T(x) = t} for t ∈ F.

Definition 26

A sufficient partition of H is a partition induced by a sufficient statistic T .
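Sufficiency can be verified by brute-force enumeration in a small Bernoulli example (a sketch with illustrative names; T = ΣXi is the sufficient statistic):

```python
from itertools import product

def conditional_dist(n, theta):
    # P(X = x | T = t) for X = (X1, ..., Xn) iid Bernoulli(theta), T = sum(X).
    # Returns {t: {x: probability}}.
    joint = {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        joint.setdefault(t, {})[x] = theta**t * (1 - theta) ** (n - t)
    return {t: {x: p / sum(d.values()) for x, p in d.items()}
            for t, d in joint.items()}
```

The conditional distribution is uniform over the sequences with t ones, hence free of theta — which is exactly the definition of sufficiency.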


Theorem 27

Assume that T is a sufficient statistic for θ, and let δ*0(x, ·) be any randomized rule in D*. Then (subject to measurability conditions) there exists a randomized rule δ*1(t, ·), depending only on T(x), which is R-equivalent to δ*0.


Convexity

Definition 28

A set Ω ⊂ R^m is convex if for any two points x and y in Ω, the point αx + (1 − α)y is in Ω for 0 ≤ α ≤ 1 (i.e., the line segment joining x and y is a subset of Ω).

Definition 29

If x1, x2, · · · is a sequence of points in R^m, and 0 ≤ αi ≤ 1 are numbers such that Σ_{i=1}^∞ αi = 1, then Σ_{i=1}^∞ αi xi (provided it is finite) is called a convex combination of the xi. The convex hull of a set Ω is the set of all points which are convex combinations of points in Ω.

Definition 30

A real-valued function g(x) defined on a convex set Ω is convex if g(αx + (1 − α)y) ≤ αg(x) + (1 − α)g(y) for all x, y ∈ Ω and 0 < α < 1. If the inequality is strict for x ≠ y, then g is strictly convex. If g(αx + (1 − α)y) ≥ αg(x) + (1 − α)g(y), then g is concave; if that inequality is strict for x ≠ y, then g is strictly concave.


Lemma 31

Let g(x) be a function defined on an open convex subset Ω of R^m for which all second-order partial derivatives g^(i,j)(x) = ∂^2 g(x)/∂x_i ∂x_j exist and are finite. Then g is convex if and only if the matrix G = (g^(i,j)(x)) is nonnegative definite for all x ∈ Ω. Likewise, g is concave if −G is nonnegative definite. If G is positive (negative) definite, then g is strictly convex (strictly concave).

Lemma 32

Let X be an m-variate random vector such that E(|X|) < ∞ and P(X ∈ Ω) = 1, where Ω is a convex subset of R^m. Then E(X) ∈ Ω.


Theorem 33 (Jensen’s Inequality)

Let g(x) be a convex real-valued function defined on a convex subset Ω of R^m, and let X be an m-variate random vector for which E(|X|) < ∞. Suppose also that P(X ∈ Ω) = 1. Then g(E(X)) ≤ E[g(X)], with strict inequality if g is strictly convex and X is not concentrated at a point.
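A numeric illustration of the inequality for the strictly convex g(x) = x^2 (a sketch with illustrative names; for this g, the empirical gap E[g(X)] − g(E[X]) is exactly the sample variance, hence nonnegative):

```python
import random

def jensen_gap(g, xs):
    # Empirical E[g(X)] - g(E[X]) over a sample; nonnegative when g is convex.
    mean = sum(xs) / len(xs)
    return sum(g(x) for x in xs) / len(xs) - g(mean)

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
gap = jensen_gap(lambda x: x * x, sample)   # strictly convex g: gap > 0
```

For a linear (convex but not strictly convex) g, the gap is identically zero, matching the equality case.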

Theorem 34

Assume that A is a convex subset of R^m, and that for each θ ∈ Θ the loss function L(θ, a) is a convex function of a. Let δ* be a randomized decision rule in D* for which E^δ*(x,·)[|a|] < ∞ for all x ∈ H. Then (subject to measurability conditions) the nonrandomized rule δ(x) = E^δ*(x,·)[a] satisfies L(θ, δ(x)) ≤ L(θ, δ*(x, ·)) for all x and θ.


Theorem 35 (Rao-Blackwell Theorem)

Assume that A is a convex subset of R^m and that L(θ, a) is a convex function of a for each θ ∈ Θ. Suppose also that T is a sufficient statistic for θ, and that δ0(x) is a nonrandomized decision rule in D. Then the decision rule based on T(x) = t defined by δ1(t) = E^X|t[δ0(X)] is R-equivalent to or R-better than δ0, provided the expectation exists.

Proof.

By the definition of a sufficient statistic, the expectation above does not depend on θ, so that δ1 is an obtainable decision rule. By Jensen's inequality,

L(θ, δ1(t)) = L(θ, E^X|t[δ0(X)]) ≤ E^X|t[L(θ, δ0(X))].

Hence

R(θ, δ1) = E^T_θ[L(θ, δ1(T))] ≤ E^T_θ[E^X|T[L(θ, δ0(X))]] = E^X_θ[L(θ, δ0(X))] = R(θ, δ0).
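The theorem can be illustrated with Bernoulli sampling (a Monte Carlo sketch, not from the slides; names are illustrative): take δ0(x) = x1, which is unbiased but wasteful, and condition on the sufficient statistic T = X1 + · · · + Xn, giving δ1(t) = E[X1 | T = t] = t/n, under squared-error loss.

```python
import random

def mc_risks(theta, n=5, reps=100_000, seed=2):
    # Monte Carlo risks, under squared-error loss, of delta0(x) = x1 and of
    # its Rao-Blackwellization delta1(t) = E[X1 | T = t] = t/n,
    # where X1, ..., Xn are iid Bernoulli(theta) and T = X1 + ... + Xn.
    rng = random.Random(seed)
    r0 = r1 = 0.0
    for _ in range(reps):
        x = [1 if rng.random() < theta else 0 for _ in range(n)]
        r0 += (x[0] - theta) ** 2          # risk contribution of delta0
        r1 += (sum(x) / n - theta) ** 2    # risk contribution of delta1
    return r0 / reps, r1 / reps
```

The exact risks are θ(1 − θ) for δ0 and θ(1 − θ)/n for δ1, so conditioning on T reduces the risk by a factor of n, as the theorem guarantees it cannot increase it.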
