
Uncertain Inference and Artificial Intelligence

Chuanhai Liu¹

March 3, 2011

¹Prepared for a Purdue Machine Learning Seminar


Acknowledgement

◮ Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory.

◮ Jianchun Zhang, Ryan Martin, Duncan Ermini Leaf, Zouyi Zhang, Huiping Xu, Jing-Shiang Hwang, Jun Xie, and Hyokun Yun for collaborations on a variety of IM research projects.

◮ NSF support for a joint project with Jun Xie on large-scale multinomial inference and its applications in genome-wide association studies.


References

Martin, R. and Liu, C. (2011, Inferential Models) and the references therein.

A possible textbook (Liu and Martin, 2012+, Inferential Models: Reasoning with Uncertainty) having the following features:

◮ A prior-free and valid probabilistic inference system, which is promising for serious applications of statistics.

◮ Fully developed valid probabilistic inferential methods for textbook problems.

◮ A large collection of applications to modern, challenging, and large-scale statistical problems.

◮ Deeper understanding of existing schools of thought and their strengths and weaknesses.

◮ Satisfactory solutions to well-known benchmark problems, including Stein's paradox and the Behrens-Fisher problem.

◮ A direct attack on the source of uncertainty, which makes learning and teaching easier and more enjoyable.


Abstract

It is difficult, perhaps, to believe that artificial intelligence can be made intelligent enough without a valid probabilistic inferential system as a critical module. After a brief review of existing schools of thought on uncertain inference, we introduce a valid probabilistic inferential framework termed inferential models (IMs). With several simple and benchmark examples, we discuss potential applications of IMs in artificial intelligence in general and machine learning in particular.


What is it? An answer from the web

Artificial Intelligence (AI) is the area of computer science focusing on creating machines that can engage in behaviors that humans consider intelligent. The ability to create intelligent machines has intrigued humans since ancient times, and today, with the advent of the computer and 50 years of research into AI programming techniques, the dream of smart machines is becoming a reality. Researchers are creating systems which can mimic human thought, understand speech, beat the best human chess player, and countless other feats never before possible.


Is the answer precise?

If not, blame Google's machine learning algorithms.


What is it? An answer from the web

Machine learning has been central to AI research from the beginning. Unsupervised learning is the ability to find patterns in a stream of input. Supervised learning includes both classification and numerical regression. Classification is used to determine what category something belongs in, after seeing a number of examples of things from several categories. Regression takes a set of numerical input/output examples and attempts to discover a continuous function that would generate the outputs from the inputs. In reinforcement learning the agent is rewarded for good responses and punished for bad ones. These can be analyzed in terms of decision theory, using concepts like utility. The mathematical analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory.


The inference problem

◮ Input:

1. Data x — observed values of the observable quantities X ∈ X.
2. Assertion A — statements on θ ∈ Θ, the unknown quantities.
3. Association between X and θ. For example, x is a sample from the population characterized by the cdf Fθ(·).

◮ Output:

1. Probabilistic uncertainty assessments on the truth or the falsity of A given X = x.
2. Plausible regions for θ and its functions.
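Purely as an illustration (not from the slides), the input/output contract above can be phrased as a small interface; all names and types here are mine:

    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class InferenceProblem:
        """Input side: data, an assertion on theta, and an association."""
        x: Any                                   # observed data, x in X
        assertion: Callable[[Any], bool]         # A: returns True iff theta in A
        association: Callable[[Any, Any], Any]   # links X and theta, e.g., via F_theta

    @dataclass
    class InferenceOutput:
        """Output side: probabilistic assessments of A given X = x."""
        evidence_for: float       # e_x(A)
        evidence_against: float   # e_x(A^c)

        @property
        def plausibility(self) -> float:
            return 1.0 - self.evidence_against   # plausibility of A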


Uncertain inference is critical to AI — No?


One (simple) kind of uncertain inference

Probability models: A probability model has a meaningful/valid probability distribution assumed to be adequate for everything. In particular, θ has a valid marginal distribution that can be operated on via the usual probability calculus to derive, e.g., valid marginal and conditional posterior distributions.

Subjective Bayesian: Philosophically, every Bayesian is subjective.

◮ Bayes was not Bayesian.
◮ What's wrong? Nothing is wrong — you make the decision and (you or your clients) should take the consequences.


Statistical models

Statistical models: In what follows, we consider the cases where you don't have valid distributions for everything, which we refer to as statistical models. Here, θ is taken to be unknown.


“Objective” Bayesian — a personal view

The idea can be viewed as using magic priors to approximate (ideal) frequentist results.

Remarks:

◮ Assertion-specific priors: Certain priors can work for certain assertions on θ.

◮ Large-sample theory: It really concerns the case when uncertainty goes away; think about both normality and vanishing variances in very-high-dimensional problems.

◮ Robust Bayesian: The "worst case scenario" thinking ultimately leads the Bayesian to a non-Bayesian school.


Existing schools of thought

◮ Bayes: for it to work, it really requires valid priors.

◮ Fiducial: it is very interesting. It is wrong (but better than Bayes[?]).

◮ Dempster-Shafer: as an extension of both Bayes and fiducial, it requires valid independent individual components that are probabilistically meaningful.

For example, individual components are specified with fiducial probabilities.

◮ Frequentist: starting with specified rules and criteria, it invites the "guess and check" approach to uncertain inference. If so, is it very appealing?

For example, 24+ methods for 2 × 2 tables and penalty-based methods.


Remarks

◮ These existing methods are useful.

◮ All these schools of thought fail for many "benchmark" examples, such as the many-normal-means, Behrens-Fisher, and constrained-parameter problems.

◮ Thinking outside the box may be necessary for new generations.


The likelihood insufficiency principle

◮ Likelihood alone is not sufficient for probabilistic inference.

◮ An unobserved but predictable quantity, called the auxiliary (a-)variable, must be introduced for predictive/probabilistic inference.

Remark: Bayes makes θ predictable. Is it credible/valid?


The “No Validity, No Probability” principle?

◮ Notation: denote by Px(A) the probability for the truth of A given the observed data x.

◮ Definition (validity). An inferential framework is said to be valid if, ∀A ⊂ Θ, PX(A), as a function of X, satisfies

PX(A) ≤ Unif(0, 1)  (stochastically)

under the falsity of A, i.e., under the truth of Ac, the negation of A.
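As a sanity check, here is a minimal Monte Carlo sketch (Python with numpy/scipy) of this definition, using the X ∼ N(θ, 1) IM developed later in this section; the choice of assertion and the closed-form evidence are my own illustration, derived from that example's C-step:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    theta_truth = 0.0
    X = rng.normal(theta_truth, 1.0, size=200_000)

    # A = {theta != theta_truth} is FALSE at the truth.  Its evidence is
    # e_X(A) = P(Theta_X(S) excludes theta_truth) = 2*Phi(|X - theta_truth|) - 1.
    e = 2 * norm.cdf(np.abs(X - theta_truth)) - 1

    # Validity: P(e_X(A) <= a) >= a for every a, i.e., e is stochastically
    # no larger than Unif(0, 1).  Here equality holds: e is exactly uniform.
    for a in (0.01, 0.05, 0.10, 0.50):
        print(f"P(e <= {a:.2f}) = {(e <= a).mean():.3f}")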


The Inferential Model (IM) framework

IM is valid and consists of three steps:

Association-step: Associate X and θ with an a-variable z to obtain the mapping

ΘX(z) ⊆ Θ  (z ∼ πz)

consisting of candidate values of θ given X and z.

Prediction-step: Predict z with a credible predictive random set (PRS) Sθ, i.e.,

P(Sθ ∌ z) ≤ Unif(0, 1)  (stochastically), where z ∼ πz.

Combination-step: Combine x and Sθ to obtain Θx(Sθ) = ∪z∈Sθ Θx(z) and compute the evidence

ex(A) = P(Θx(Sθ) ⊆ A)  and  ex(Ac) = P(Θx(Sθ) ⊆ Ac),

with ēx(A) = 1 − ex(Ac) called the plausibility.


X ∼ N(θ, 1)

A-step. X = θ + z, where z ∼ N(0, 1).
P-step. Sθ = [−|Z|, |Z|], where Z ∼ N(0, 1).
C-step. ex(A) and ex(Ac) with Θx(Sθ) = [x − |Z|, x + |Z|].

Example

[Figure: Plausibility ēx(θ0) of the assertion A = {θ : θ = θ0}, plotted against θ0 ∈ [−2, 6], given x = 1.96. Note ex(θ0) = 0.]
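The curve in the figure is available in closed form: θ0 ∈ [x − |Z|, x + |Z|] iff |Z| ≥ |x − θ0|, so ēx(θ0) = 2(1 − Φ(|x − θ0|)). A minimal sketch (Python with scipy; the derivation is mine, from the C-step above):

    import numpy as np
    from scipy.stats import norm

    def plausibility(theta0, x=1.96):
        # Plausibility of A = {theta = theta0}: the random interval
        # [x - |Z|, x + |Z|] covers theta0 iff |Z| >= |x - theta0|.
        return 2 * norm.sf(np.abs(x - theta0))

    theta0 = np.linspace(-2, 6, 201)
    pl = plausibility(theta0)       # peaks at 1 when theta0 = x = 1.96
    print(plausibility(0.0))        # 0.05: theta0 = 0 is barely plausible at x = 1.96
    # Evidence e_x(theta0) is 0: the nondegenerate random interval is
    # contained in the singleton {theta0} with probability 0.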


X ∼ Binomial(n, θ)

This is a homework problem for Stat 598D.


Efficiency

See the Stat 598D lecture notes on Statistical Inference. Let b(z) be a continuous function and define

S = {z : b(z) ≤ b(Z)}  (Z ∼ πz).

Then

P(S ∌ z) ∼ Unif(0, 1)  (z ∼ πz).

We can use this result to construct credible PRSs.
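A quick simulation illustrating the construction (my choice b(z) = |z| with πz = N(0, 1), which recovers the default PRS used above):

    import numpy as np
    from scipy.stats import norm, kstest

    rng = np.random.default_rng(1)
    z = rng.normal(size=100_000)                # z ~ pi_z

    # For b(z) = |z|: P(S does not contain z) = P(|Z| < |z|) = 2*Phi(|z|) - 1.
    non_coverage = 2 * norm.cdf(np.abs(z)) - 1

    # Credibility: as a function of z ~ pi_z this should be ~ Unif(0, 1).
    print(kstest(non_coverage, "uniform"))      # large p-value, as claimed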


Combining information: Conditional IMs

Example (a textbook example). Consider the association model

Xi = θ + zi  (zi iid∼ N(0, 1), i = 1, ..., n).

Write

X̄ = θ + z̄  and  Xi − X̄ = zi − z̄  (i = 1, ..., n).

Predict z̄ conditional on the observed a-quantities {zi − z̄}, i = 1, ..., n. This leads to the simplified conditional IM:

A-step. X̄ = θ + u/√n, where u ∼ N(0, 1).
P-step. S = [−|U|, |U|], where U ∼ N(0, 1).
C-step. Θx(S) = [X̄ − |U|/√n, X̄ + |U|/√n].
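A short sketch of the resulting plausibility for a point assertion on θ (Python; my reading of the C-step: θ0 is retained iff |U| ≥ √n |x̄ − θ0|):

    import numpy as np
    from scipy.stats import norm

    def pl_theta(x, theta0):
        # Conditional IM plausibility of {theta = theta0} from n observations.
        n, xbar = len(x), np.mean(x)
        return 2 * norm.sf(np.sqrt(n) * np.abs(xbar - theta0))

    rng = np.random.default_rng(2)
    x = rng.normal(3.0, 1.0, size=25)
    print(pl_theta(x, 3.0))   # large near the truth
    # The 95% plausibility interval {theta0 : pl >= 0.05} is xbar +- 1.96/sqrt(n).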


Efficient inference: Marginal IMs

Example (another textbook example). Consider the association model

Xi = η + σzi  (zi iid∼ N(0, 1), i = 1, ..., n).

Let θ = (η, σ²) ∈ Θ = R × R+ and write

X̄ = η + σz̄,  s²x = σ²s²z,  and  (X − X̄1)/sx = (z − z̄1)/sz.

Predict z̄ and s²z conditional on the observed a-quantities (z − z̄1)/sz. This leads to the simplified conditional IM:

A-step. X̄ = η + sx u/√n and s²x = σ²s²z, where u ∼ tn−1 ⊥ s²z ∼ χ²n−1.
P-step. S = [−|U|, |U|] × [0, ∞), where U ∼ tn−1.
C-step. Θx(S) = [X̄ − |U|sx/√n, X̄ + |U|sx/√n] × [0, ∞).
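Analogously, the marginal plausibility for η has a closed form; a sketch under my reading of the C-step (η0 is retained iff |U| ≥ √n |x̄ − η0|/sx):

    import numpy as np
    from scipy.stats import t

    def pl_eta(x, eta0):
        # Marginal IM plausibility of {eta = eta0}; sigma^2 is marginalized out.
        n, xbar, sx = len(x), np.mean(x), np.std(x, ddof=1)
        return 2 * t.sf(np.sqrt(n) * np.abs(xbar - eta0) / sx, df=n - 1)

    rng = np.random.default_rng(3)
    x = rng.normal(3.0, 2.0, size=15)
    print(pl_eta(x, 3.0))
    # {eta0 : pl >= 0.05} is the classical t-interval xbar +- t_{.975,n-1} sx/sqrt(n).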


Model selection via AI (or by AS — Artificial Statistician)?

Consider choosing a model from a collection of models, including, e.g., normal for simplicity (and efficiency) and non-parametric for robustness.

See Jianchun Zhang’s PhD thesis for an IM-based method.


2 × 2 tables

Example (kidney stone treatment, Steven et al. (1994))

Table 1. Small stones
Treatment  Success  Failure
A          81       6
B          234      26

Table 2. Large stones
Treatment  Success  Failure
A          192      71
B          55       25

For making an intelligent decision, there are (at least) two things to consider.

Prediction: condition on the stone type.
Estimation: combine data if possible.

Thus, check the homogeneity of each of the two tables:

Table 3. Treatment A
Stone type  Success  Failure
Small       81       6
Large       192      71

Table 4. Treatment B
Stone type  Success  Failure
Small       234      26
Large       55       25
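The numbers above already exhibit Simpson's paradox, which is worth verifying directly (plain Python; data copied from Tables 1 and 2):

    # Success/failure counts from Tables 1 and 2.
    small = {"A": (81, 6), "B": (234, 26)}
    large = {"A": (192, 71), "B": (55, 25)}

    def rate(s, f):
        return s / (s + f)

    for trt in ("A", "B"):
        ss, sf = small[trt]
        ls, lf = large[trt]
        print(trt,
              f"small={rate(ss, sf):.2f}",
              f"large={rate(ls, lf):.2f}",
              f"pooled={rate(ss + ls, sf + lf):.2f}")
    # A: small=0.93 large=0.73 pooled=0.78
    # B: small=0.90 large=0.69 pooled=0.85
    # A wins within each stone type, yet B wins after pooling: condition on
    # stone type for prediction, and pool only when homogeneity holds.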


Evidence for and against homogeneity of treatments

For each of Table 3 and Table 4, compute

1. e(homogeneous),

2. ē(homogeneous), and

3. the 95% plausibility interval for the odds ratio.

Remarks.

1. Simpson's paradox is related more to wrong statistical analysis, i.e., modeling, than to the inferential method(?). How can this be done in AI?

2. Some relevant statistical thoughts:
◮ Increase precision of prediction via conditioning, and
◮ Increase precision of estimation via pooling.

Can some basics like these be integrated into AI?


Numerical results

[Figure: Plausibilities for the log odds ratios of Tables 3 and 4, which show that pooling makes no sense in this example.]


Comparing two normal means with unknown variances

This is a common textbook, controversial, and practically useful example (Bayes and fiducial do not work well); see Martin, Hwang, and Liu (2010b).


Many-normal-means

The association model:

Xi = µi + zi  (zi iid∼ N(0, 1), i = 1, ..., n).

The problem of interest is to infer ‖µ‖.

A very important example for understanding inference (Bayes and fiducial do not work); see Martin, Hwang, and Liu (2010b).


Many-normal-means

The usual model for the observables X1, ..., Xn:

µi iid∼ N(θ, σ²)  (i = 1, ..., n)

and

Xi | µ ind∼ N(µi, s²i)  (i = 1, ..., n),

with known positive s²1, ..., s²n, where µ = (µ1, ..., µn) and (θ, σ²) ∈ R × R+ are unknown. Here, we are interested in inference about σ².

Since there is rarely meaningful prior knowledge in practice, there has been tremendous interest in choosing Bayesian priors.


Many-normal-means

The sampling model for the observable quantities is

Xi ind∼ N(θ, σ² + s²i)  (i = 1, ..., n).

For simplicity, to motivate ideas, consider the case with known θ = 0, that is,

Xi ind∼ N(0, σ² + s²i)  (i = 1, ..., n).

An association model is given by

∑i=1..n X²i/(σ² + s²i) = V

and

[∑i=1..n X²i/(σ² + s²i)]^(−1/2) (X1/√(σ² + s²1), ..., Xn/√(σ² + s²n)) = U,

where V ∼ χ²n ⊥ U ∼ Unif(On).


Many-normal-means

Specify the predictive random set, which predicts v alone:

S = {(v, u) : |Fn(v) − 0.5| ≤ |Fn(V) − 0.5|},

where Fn is the χ²n cdf.

This is a constrained-parameter inference problem.

Remark. Validity is not a problem, but efficient inference is not straightforward: it requires considering generalized conditional IMs — a challenging topic under investigation!
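For what it is worth, this PRS yields a closed-form plausibility for σ²; a minimal sketch (Python with scipy; my derivation, using the generic b(·) construction from the Efficiency slide with b(v) = |Fn(v) − 0.5|):

    import numpy as np
    from scipy.stats import chi2

    def pl_sigma2(x, s2, sigma2_0):
        # Plausibility of {sigma^2 = sigma2_0}: with v0 = sum x_i^2/(sigma2_0 + s_i^2)
        # and F_n the chi^2_n cdf, pl = P(|F_n(V) - .5| >= |F_n(v0) - .5|)
        #                             = 1 - |2 F_n(v0) - 1|.
        v0 = np.sum(x**2 / (sigma2_0 + s2))
        return 1 - np.abs(2 * chi2.cdf(v0, df=len(x)) - 1)

    rng = np.random.default_rng(4)
    s2 = rng.uniform(0.5, 2.0, size=50)       # known s_i^2
    x = rng.normal(0.0, np.sqrt(1.0 + s2))    # truth: theta = 0, sigma^2 = 1
    for sig2 in (0.0, 0.5, 1.0, 2.0):
        print(sig2, round(pl_sigma2(x, s2, sig2), 3))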
