Universal Off-Policy Evaluation
Xuhui Liu
Nanjing University
June 18, 2021
Yash Chandak et al. "Universal Off-Policy Evaluation." CoRR, 2021.
Table of Contents
Introduction
Universal Off-Policy Evaluation
High-Confidence Bounds
Experiment Results
Analysis
Off-Policy Evaluation
Definition: The problem of evaluating a new strategy for behavior, or policy, using only observations collected during the execution of another policy.
Motivation
We want to evaluate a new method/policy without incurring the risk and cost of actually implementing it.
Existing logs contain huge amounts of historical data collected under existing policies.
It makes economic sense, if possible, to use these logs.
It makes economic sense, if possible, not to risk the losses of testing out a potentially bad new policy.
Online ad placement is a good example.
Importance Sampling
Naive importance sampling:
$$J(\pi) = \mathbb{E}_{H \sim p_\beta}\left[\frac{p_\pi(H)}{p_\beta(H)} \sum_{t=0}^{T} \gamma^t R_t\right] = \mathbb{E}_{H \sim p_\beta}\left[\left(\prod_{t=0}^{T} \frac{\pi(A_t \mid S_t)}{\beta(A_t \mid S_t)}\right) \sum_{t=0}^{T} \gamma^t R_t\right] \approx \sum_{i=1}^{n} w_T^i \sum_{t=0}^{T} \gamma^t R_t^i,$$
where
$$w_t^i = \frac{1}{n} \prod_{t'=0}^{t} \frac{\pi(A_{t'}^i \mid S_{t'}^i)}{\beta(A_{t'}^i \mid S_{t'}^i)}.$$
Weighted importance sampling replaces these weights with their self-normalized version:
$$w_t^i = \frac{\prod_{t'=0}^{t} \pi(A_{t'}^i \mid S_{t'}^i)/\beta(A_{t'}^i \mid S_{t'}^i)}{\sum_{j=1}^{n} \prod_{t'=0}^{t} \pi(A_{t'}^j \mid S_{t'}^j)/\beta(A_{t'}^j \mid S_{t'}^j)}.$$
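As a concrete sketch, both estimators fit in a few lines of NumPy. The trajectory format and the `pi`/`beta` callables below are illustrative assumptions, not an interface from the paper:

```python
import numpy as np

def is_estimate(trajectories, pi, beta, gamma=1.0, weighted=False):
    """Ordinary / weighted importance-sampling estimate of J(pi).

    `trajectories` is a list of [(s, a, r), ...] tuples; `pi(a, s)` and
    `beta(a, s)` return action probabilities (illustrative interface).
    """
    returns, weights = [], []
    for traj in trajectories:
        # full-trajectory importance ratio: prod_t pi(a_t|s_t) / beta(a_t|s_t)
        rho = np.prod([pi(a, s) / beta(a, s) for s, a, _ in traj])
        # discounted return of the trajectory
        g = sum(gamma**t * r for t, (_, _, r) in enumerate(traj))
        weights.append(rho)
        returns.append(g)
    weights, returns = np.asarray(weights), np.asarray(returns)
    if weighted:  # weighted IS: normalize by the sum of the ratios
        return float(np.sum(weights * returns) / np.sum(weights))
    return float(np.mean(weights * returns))
```

On a one-step example where the evaluation policy always picks the rewarding action, both variants recover the true value from off-policy data.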
Statistical Quantities
Consider a random variable $X$ whose cumulative distribution function and probability density function are $F(x)$ and $f(x)$, respectively.
- Mean: $\mathbb{E}[X]$.
- Quantile: $\text{quantile}_\alpha(X) = F^{-1}(\alpha)$.
- Value at Risk (VaR): $\text{VaR}_\alpha(X) = \text{quantile}_\alpha(X)$.
- Conditional Value at Risk (CVaR): $\text{CVaR}_\alpha(X) = \mathbb{E}[X \mid X \le \text{quantile}_\alpha(X)]$.
- Variance: $\mathbb{E}[(X - \mathbb{E}[X])^2]$.
- Entropy: $H(X) = -\int f(x) \log f(x)\, dx$.
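A minimal sample-based sketch of the quantile, VaR, and CVaR definitions above (function names are ours; the quantile follows the simple inverse-empirical-CDF convention):

```python
import numpy as np

def quantile(x, alpha):
    """alpha-quantile via the inverse empirical CDF, F^{-1}(alpha)."""
    xs = np.sort(x)
    k = int(np.ceil(alpha * len(xs))) - 1  # smallest index with F(x) >= alpha
    return xs[max(k, 0)]

def var_alpha(x, alpha):
    """Value at Risk: by definition, just the alpha-quantile."""
    return quantile(x, alpha)

def cvar_alpha(x, alpha):
    """Conditional Value at Risk: E[X | X <= quantile_alpha(X)]."""
    q = quantile(x, alpha)
    tail = np.asarray(x)[np.asarray(x) <= q]
    return float(tail.mean())
```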
Limitations
- Safety-critical applications, e.g., automated healthcare, call for risk-sensitive metrics: value at risk (VaR) and conditional value at risk (CVaR).
- Applications like online recommendations are subject to noisy data and call for robust metrics: the median and other quantiles.
- Applications involving direct human-machine interaction, such as robotics and autonomous driving, call for uncertainty metrics: variance and entropy.
How do we develop a universal off-policy method, one that can estimate any desired performance metric and can also provide high-confidence bounds that hold simultaneously, with high probability, for those metrics?
Notations
- Partially Observable Markov Decision Process (POMDP): $(\mathcal{S}, \mathcal{O}, \mathcal{A}, \mathcal{P}, \Omega, \mathcal{R}, \gamma, d_0)$.
- $D$ is the data set $(H_i)_{i=1}^{n}$ collected using behavior policies $(\beta_i)_{i=1}^{n}$, where $H_i$ is the observed history $(O_0, A_0, \beta(A_0 \mid O_0), R_0, O_1, \dots)$.
- $G_i := \sum_{t=0}^{T} \gamma^t R_t$ is the return for $H_i$.
- $T$ is the horizon length.
- $G_\pi$ and $H_\pi$ are the random variables for returns and complete trajectories under any policy $\pi$, respectively.
- $g(h)$ is the return of trajectory $h$.
- $\mathcal{H}_\pi$ is the set of all possible trajectories under policy $\pi$.
Assumptions
Assumption 1. The set $D$ contains independent (not necessarily identically distributed) histories generated using $(\beta_i)_{i=1}^{n}$, along with the probabilities of the actions chosen, such that for some $\epsilon > 0$, $(\beta_i(a \mid o) < \epsilon) \implies (\pi(a \mid o) = 0)$, for all $o \in \mathcal{O}$, $a \in \mathcal{A}$, and $i \in \{1, \dots, n\}$.
Method
- First, estimate the cumulative distribution $F_\pi$ of the return under $\pi$.
- Then use it to estimate any parameter $\psi(F_\pi)$ of interest.
Method
Represent $F_\pi$ in terms of the return $G_\pi$:
$$F_\pi(\nu) = \Pr(G_\pi \le \nu) = \sum_{x \in \mathcal{X},\, x \le \nu} \Pr(G_\pi = x) = \sum_{x \in \mathcal{X},\, x \le \nu} \left( \sum_{h \in \mathcal{H}_\pi} \Pr(H_\pi = h)\, \mathbb{1}_{\{g(h) = x\}} \right)$$
Exchange the order of summation:
$$F_\pi(\nu) = \sum_{h \in \mathcal{H}_\pi} \Pr(H_\pi = h) \sum_{x \in \mathcal{X},\, x \le \nu} \mathbb{1}_{\{g(h) = x\}} = \sum_{h \in \mathcal{H}_\pi} \Pr(H_\pi = h)\, \mathbb{1}_{\{g(h) \le \nu\}}$$
Method
From Assumption 1, $\mathcal{H}_\pi \subseteq \mathcal{H}_\beta$, so for all $h \in \mathcal{H}_\beta$,
$$F_\pi(\nu) = \sum_{h \in \mathcal{H}_\beta} \Pr(H_\pi = h)\, \mathbb{1}_{\{g(h) \le \nu\}} = \sum_{h \in \mathcal{H}_\beta} \Pr(H_\beta = h)\, \frac{\Pr(H_\pi = h)}{\Pr(H_\beta = h)}\, \mathbb{1}_{\{g(h) \le \nu\}}$$
Let $\rho_i := \prod_{t=0}^{T} \frac{\pi(A_t \mid O_t)}{\beta_i(A_t \mid O_t)}$, which is equal to $\Pr(H_\pi = h)/\Pr(H_\beta = h)$. This yields the estimator
$$\forall \nu \in \mathbb{R}, \quad \hat{F}_n(\nu) := \frac{1}{n} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}}$$
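The estimator $\hat{F}_n$ is essentially one line of NumPy; in this sketch, `returns` holds the $G_i$ and `rhos` the precomputed importance ratios $\rho_i$ (illustrative names):

```python
import numpy as np

def uno_cdf(returns, rhos, nu):
    """Off-policy CDF estimate: F_hat_n(nu) = (1/n) * sum_i rho_i * 1{G_i <= nu}."""
    returns = np.asarray(returns, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    return float(np.mean(rhos * (returns <= nu)))
```

With all ratios equal to one (on-policy data), this reduces to the ordinary empirical CDF.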
Partially Observable Setting
- $\mathcal{O}$ and $\tilde{\mathcal{O}}$: observation sets for the behavior policy and the evaluation policy.
- If $\mathcal{O} = \tilde{\mathcal{O}} = \mathcal{S}$, this reduces to the MDP setting.
- If $\mathcal{O} = \tilde{\mathcal{O}}$, then since $\beta(a \mid o) = \beta(a \mid \tilde{o})$, one can use density estimation on the available data $D$ to construct an estimator $\hat{\beta}(a \mid \tilde{o})$ of $\Pr(a \mid \tilde{o}) = \beta(a \mid \tilde{o})$.
- If $\mathcal{O} \ne \tilde{\mathcal{O}}$, it is only possible to estimate $\Pr(a \mid \tilde{o}) = \sum_{x \in \mathcal{O}} \beta(a \mid x) \Pr(x \mid \tilde{o})$ through density estimation using the data $D$.
Probability Distribution and Inverse CDF
Let $(G_{(i)})_{i=1}^{n}$ be the order statistics of the samples $(G_i)_{i=1}^{n}$, and let $G_{(0)} := G_{\min}$.
Inverse CDF:
$$\hat{F}_n^{-1}(\alpha) := \min \left\{ g \in (G_{(i)})_{i=1}^{n} \;\middle|\; \hat{F}_n(g) \ge \alpha \right\}$$
Probability distribution:
$$d\hat{F}_n(G_{(i)}) := \hat{F}_n(G_{(i)}) - \hat{F}_n(G_{(i-1)})$$
Estimator
$$\hat{\mu}(\hat{F}_n) := \sum_{i=1}^{n} d\hat{F}_n(G_{(i)})\, G_{(i)},$$
$$\hat{\sigma}^2(\hat{F}_n) := \sum_{i=1}^{n} d\hat{F}_n(G_{(i)}) \left( G_{(i)} - \hat{\mu}(\hat{F}_n) \right)^2,$$
$$\hat{Q}^{\alpha}(\hat{F}_n) := \hat{F}_n^{-1}(\alpha),$$
$$\widehat{\text{CVaR}}^{\alpha}(\hat{F}_n) := \frac{1}{\alpha} \sum_{i=1}^{n} d\hat{F}_n(G_{(i)})\, G_{(i)}\, \mathbb{1}_{\{G_{(i)} \le \hat{Q}^{\alpha}(\hat{F}_n)\}}.$$
Recall the definition of CVaR:
$$\text{CVaR}^{\alpha}(F_\pi) = \mathbb{E}\left[ G_\pi \mid G_\pi \le F_\pi^{-1}(\alpha) \right]$$
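Putting the order statistics, $d\hat{F}_n$, and the plug-in functionals together, a sketch (the helper name and interface are ours):

```python
import numpy as np

def uno_functionals(returns, rhos, alpha=0.25):
    """Plug-in estimates of mean, variance, VaR, and CVaR from the
    importance-weighted CDF F_hat_n (sketch of the slide's estimators)."""
    order = np.argsort(returns)
    g = np.asarray(returns, dtype=float)[order]   # order statistics G_(i)
    rho = np.asarray(rhos, dtype=float)[order]
    n = len(g)
    F = np.cumsum(rho) / n                        # F_hat_n evaluated at each G_(i)
    dF = np.diff(np.concatenate(([0.0], F)))      # dF_hat_n(G_(i))
    mean = float(np.sum(dF * g))
    var = float(np.sum(dF * (g - mean) ** 2))
    # inverse CDF: smallest order statistic with F_hat_n >= alpha
    idx = int(np.argmax(F >= alpha)) if np.any(F >= alpha) else n - 1
    value_at_risk = float(g[idx])
    cvar = float(np.sum(dF * g * (g <= value_at_risk)) / alpha)
    return mean, var, value_at_risk, cvar
```

On on-policy data (all $\rho_i = 1$) these reduce to the usual empirical mean, variance, quantile, and tail mean.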
High-Confidence Bounds
1. It is easy to obtain a confidence bound at a single point.
2. It is hard to obtain bounds over an entire interval.
3. The CDF is monotonically non-decreasing.

Let $(\kappa_j)_{j=1}^{K}$ be any $K$ "key points" at which we obtain confidence intervals for $(F_\pi(\kappa_j))_{j=1}^{K}$, then generalize to the whole interval based on these key points.
Let $\text{CI}^{-}(\kappa_j, \delta_j)$ and $\text{CI}^{+}(\kappa_j, \delta_j)$ be the lower and the upper confidence bounds on $F_\pi(\kappa_j)$, respectively, such that
$$\forall j \in \{1, \dots, K\}, \quad \Pr\left( \text{CI}^{-}(\kappa_j, \delta_j) \le F_\pi(\kappa_j) \le \text{CI}^{+}(\kappa_j, \delta_j) \right) \ge 1 - \delta_j$$
Based on this construction, we formulate a lower function $F^{-}$ and an upper function $F^{+}$:
$$F^{-}(\nu) := \begin{cases} 1 & \text{if } \nu > G_{\max} \\ \max_{\kappa_j \le \nu} \text{CI}^{-}(\kappa_j, \delta_j) & \text{otherwise} \end{cases}$$
$$F^{+}(\nu) := \begin{cases} 0 & \text{if } \nu < G_{\min} \\ \min_{\kappa_j \ge \nu} \text{CI}^{+}(\kappa_j, \delta_j) & \text{otherwise} \end{cases}$$
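A sketch of how $F^{-}$ and $F^{+}$ extend key-point CIs to all of $\mathbb{R}$ using monotonicity of the CDF (function and argument names are ours; the trivial bounds 0 and 1 are returned when no key point applies):

```python
import numpy as np

def band_from_keypoints(kappas, ci_lo, ci_hi, nu, g_min, g_max):
    """Evaluate the lower/upper CDF envelope (F_minus, F_plus) at nu,
    built from per-key-point confidence bounds."""
    kappas = np.asarray(kappas, dtype=float)
    # F_minus: largest lower CI among key points at or below nu
    lo = 1.0 if nu > g_max else max(
        [l for k, l in zip(kappas, ci_lo) if k <= nu], default=0.0)
    # F_plus: smallest upper CI among key points at or above nu
    hi = 0.0 if nu < g_min else min(
        [h for k, h in zip(kappas, ci_hi) if k >= nu], default=1.0)
    return lo, hi
```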
Bootstrap Bounds [1]
[1] B. Efron and R. J. Tibshirani. "An Introduction to the Bootstrap." CRC Press, 1994.
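A generic percentile-bootstrap confidence interval, the flavor of bound this slide refers to; its coverage is approximate rather than guaranteed (a sketch, names are ours):

```python
import numpy as np

def bootstrap_ci(samples, stat, n_boot=2000, delta=0.05, seed=0):
    """Percentile-bootstrap (1 - delta) CI for any statistic `stat`,
    e.g. a CDF functional such as the mean or CVaR."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    # resample with replacement and recompute the statistic each time
    stats = np.array([stat(rng.choice(samples, size=len(samples), replace=True))
                      for _ in range(n_boot)])
    return (float(np.quantile(stats, delta / 2)),
            float(np.quantile(stats, 1 - delta / 2)))
```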
Non-stationary
Assumption 3. For any $\nu$, there exists $w_\nu \in \mathbb{R}^{d}$ such that, for all $i \in [1, L + \delta]$, $F_\pi^{(i)}(\nu) = \phi(i)^{\top} w_\nu$.
Estimating $F_\pi^{(i)}$ can now be seen as a time-series forecasting problem.
The wild bootstrap [2] provides approximate CIs.
[2] E. Mammen. "Bootstrap and wild bootstrap for high dimensional linear models." The Annals of Statistics, pages 255-285, 1993.
Setting
- A simulated stationary and a non-stationary recommender system domain, where the user's interest in a finite set of items is represented by the corresponding item's reward.
- The Type-1 Diabetes Mellitus Simulator (T1DMS) for the treatment of type-1 diabetes.
- A continuous-state Gridworld with partial observability (which also makes the domain non-Markovian in the observations), stochastic transitions, and eight discrete actions corresponding to up, down, left, right, and the four diagonal movements.
Experiments [3]
[3] P. Thomas, G. Theocharous, and M. Ghavamzadeh. "High confidence policy improvement." ICML, 2015.
Experiments [4]
[4] Y. Chandak, S. Shankar, and P. S. Thomas. "High-confidence off-policy (or counterfactual) variance estimation." AAAI, 2021.
Theorem
Estimator:
$$\forall \nu \in \mathbb{R}, \quad \hat{F}_n(\nu) := \frac{1}{n} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}}$$
Theoretical guarantee: under Assumption 1, $\hat{F}_n$ is an unbiased and uniformly consistent estimator of $F_\pi$.
Part 1: Unbiasedness
Recall that
$$F_\pi(\nu) = \sum_{h \in \mathcal{H}_\beta} \Pr(H_\pi = h)\, \mathbb{1}_{\{g(h) \le \nu\}} = \sum_{h \in \mathcal{H}_\beta} \Pr(H_\beta = h)\, \frac{\Pr(H_\pi = h)}{\Pr(H_\beta = h)}\, \mathbb{1}_{\{g(h) \le \nu\}}$$
The probability of a trajectory under a policy $\pi$ with partial observations and non-Markovian structure is
$$\Pr(H_\pi = h) = \Pr(S_0) \Pr(O_0 \mid S_0) \Pr(\tilde{O}_0 \mid O_0, S_0) \Pr(A_0 \mid S_0, O_0, \tilde{O}_0; \pi)$$
$$\times \prod_{t=0}^{T-1} \Big( \Pr(R_t \mid h_t) \Pr(S_{t+1} \mid h_t) \Pr(O_{t+1} \mid S_{t+1}, h_t) \Pr(\tilde{O}_{t+1} \mid S_{t+1}, O_{t+1}, h_t)$$
$$\times \Pr(A_{t+1} \mid S_{t+1}, O_{t+1}, \tilde{O}_{t+1}, h_t; \pi) \Big) \Pr(R_T \mid h_T)$$
The ratio can be written as
$$\frac{\Pr(H_\pi = h)}{\Pr(H_\beta = h)} = \frac{\Pr(A_0 \mid S_0, O_0, \tilde{O}_0; \pi)}{\Pr(A_0 \mid S_0, O_0, \tilde{O}_0; \beta)} \prod_{t=0}^{T-1} \frac{\Pr(A_{t+1} \mid S_{t+1}, O_{t+1}, \tilde{O}_{t+1}, h_t; \pi)}{\Pr(A_{t+1} \mid S_{t+1}, O_{t+1}, \tilde{O}_{t+1}, h_t; \beta)} = \prod_{t=0}^{T} \frac{\pi(A_t \mid O_t)}{\beta(A_t \mid O_t)} = \rho(h)$$
Then we have
$$F_\pi(\nu) = \sum_{h \in \mathcal{H}_\beta} \Pr(H_\beta = h)\, \rho(h)\, \mathbb{1}_{\{g(h) \le \nu\}}.$$
$$\mathbb{E}_D[\hat{F}_n(\nu)] = \mathbb{E}_D\left[ \frac{1}{n} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}} \right] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_D\left[ \rho_i\, \mathbb{1}_{\{G_i \le \nu\}} \right]$$
$$= \frac{1}{n} \sum_{i=1}^{n} \sum_{h \in \mathcal{H}_{\beta_i}} \Pr(H_{\beta_i} = h)\, \rho(h)\, \mathbb{1}_{\{g(h) \le \nu\}} \overset{(a)}{=} \frac{1}{n} \sum_{i=1}^{n} F_\pi(\nu) = F_\pi(\nu)$$
Part 2: Uniform Consistency
First we show pointwise consistency, i.e., for all $\nu$, $\hat{F}_n(\nu) \xrightarrow{a.s.} F_\pi(\nu)$.
Then we use this to establish uniform consistency.
Kolmogorov's Strong Law of Large Numbers
The strong law of large numbers states that the sample average of independent random variables $X_1, X_2, \dots$ converges almost surely to the expected value,
$$\bar{X}_n \xrightarrow{a.s.} \mu \quad \text{as } n \to \infty,$$
if one of the following conditions is satisfied:
1. The random variables are identically distributed; or
2. for each $i$, the variance of $X_i$ is finite, and
$$\sum_{i=1}^{\infty} \frac{\text{Var}[X_i]}{i^2} < \infty.$$
Let $X_i := \rho_i\, \mathbb{1}_{\{G_i \le \nu\}}$.
By Assumption 1, $\beta(a \mid o) \ge \epsilon$ whenever $\pi(a \mid o) > 0$. This implies the importance ratios are bounded above, and hence the $X_i$ are bounded above and have finite variance.
By Kolmogorov's strong law of large numbers:
$$\hat{F}_n(\nu) = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{a.s.} \mathbb{E}_D\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right] = F_\pi(\nu)$$
Some extra notation to handle discontinuities in the CDF $F_\pi$:
$$F_\pi(\nu^{-}) := \Pr(G_\pi < \nu) = F_\pi(\nu) - \Pr(G_\pi = \nu), \qquad \hat{F}_n(\nu^{-}) := \frac{1}{n} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i < \nu\}}$$
Similarly, we have $\hat{F}_n(\nu^{-}) \xrightarrow{a.s.} F_\pi(\nu^{-})$.
Let $\epsilon_1 > 0$, and let $K$ be any value greater than $1/\epsilon_1$. Let $(\kappa_j)_{j=0}^{K}$ be key points,
$$G_{\min} = \kappa_0 < \kappa_1 \le \kappa_2 \le \dots \le \kappa_{K-1} < \kappa_K = G_{\max},$$
which create $K$ intervals such that for all $j \in \{1, \dots, K-1\}$,
$$F_\pi(\kappa_j^{-}) \le \frac{j}{K} \le F_\pi(\kappa_j).$$
Then by construction, if $\kappa_{j-1} < \kappa_j$,
$$F_\pi(\kappa_j^{-}) - F_\pi(\kappa_{j-1}) \le \frac{j}{K} - \frac{j-1}{K} = \frac{1}{K} < \epsilon_1.$$
For any $\nu$, let $\kappa_{j-1}$ and $\kappa_j$ be such that $\kappa_{j-1} \le \nu < \kappa_j$. Then,
$$\hat{F}_n(\nu) - F_\pi(\nu) \le \hat{F}_n(\kappa_j^{-}) - F_\pi(\kappa_{j-1}) \le \hat{F}_n(\kappa_j^{-}) - F_\pi(\kappa_j^{-}) + \epsilon_1.$$
Similarly,
$$\hat{F}_n(\nu) - F_\pi(\nu) \ge \hat{F}_n(\kappa_{j-1}) - F_\pi(\kappa_j^{-}) \ge \hat{F}_n(\kappa_{j-1}) - F_\pi(\kappa_{j-1}) - \epsilon_1.$$
Then, for all $\nu \in \mathbb{R}$,
$$\hat{F}_n(\kappa_{j-1}) - F_\pi(\kappa_{j-1}) - \epsilon_1 \le \hat{F}_n(\nu) - F_\pi(\nu) \le \hat{F}_n(\kappa_j^{-}) - F_\pi(\kappa_j^{-}) + \epsilon_1.$$
Let
$$\Delta_n := \max_{j \in \{1, \dots, K-1\}} \left\{ \left| \hat{F}_n(\kappa_j) - F_\pi(\kappa_j) \right|, \left| \hat{F}_n(\kappa_j^{-}) - F_\pi(\kappa_j^{-}) \right| \right\}.$$
By the pointwise convergence, we have $\Delta_n \xrightarrow{a.s.} 0$, and thus
$$\left| \hat{F}_n(\nu) - F_\pi(\nu) \right| \le \Delta_n + \epsilon_1.$$
Finally, since the inequality holds for all $\nu \in \mathbb{R}$ and is valid for any $\epsilon_1 > 0$, letting $\epsilon_1 \to 0$ gives the desired result,
$$\sup_{\nu \in \mathbb{R}} \left| \hat{F}_n(\nu) - F_\pi(\nu) \right| \xrightarrow{a.s.} 0.$$
Variance Reduction
Inspired by weighted importance sampling:
$$\forall \nu \in \mathbb{R}, \quad \tilde{F}_n(\nu) := \frac{1}{\sum_{i=1}^{n} \rho_i} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}}.$$
Under Assumption 1, $\tilde{F}_n$ may be biased but is a uniformly consistent estimator of $F_\pi$:
$$\forall \nu \in \mathbb{R}, \quad \mathbb{E}_D[\tilde{F}_n(\nu)] \ne F_\pi(\nu), \qquad \sup_{\nu \in \mathbb{R}} \left| \tilde{F}_n(\nu) - F_\pi(\nu) \right| \xrightarrow{a.s.} 0.$$
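The self-normalized estimator differs from $\hat{F}_n$ only in the normalizer; a sketch (names are ours):

```python
import numpy as np

def uno_cdf_weighted(returns, rhos, nu):
    """Self-normalized (WIS-style) CDF estimate: biased for finite n,
    still uniformly consistent, and typically lower variance."""
    returns = np.asarray(returns, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    # normalize by the sum of importance ratios instead of n
    return float(np.sum(rhos * (returns <= nu)) / np.sum(rhos))
```

Unlike $\hat{F}_n$, this estimate always reaches exactly 1 at the largest observed return.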
Part 1: Biasedness. We prove this using a counterexample. Let $n = 1$, so
$$\forall \nu \in \mathbb{R}, \quad \mathbb{E}_D[\tilde{F}_n(\nu)] = \mathbb{E}_D\left[ \frac{1}{\sum_{i=1}^{1} \rho_i} \sum_{i=1}^{1} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}} \right] = \mathbb{E}_D\left[ \mathbb{1}_{\{G_1 \le \nu\}} \right] \overset{(a)}{=} \sum_{h \in \mathcal{H}_{\beta_1}} \Pr(H_{\beta_1} = h)\, \mathbb{1}_{\{g(h) \le \nu\}} = F_{\beta_1}(\nu) \ne F_\pi(\nu).$$
Part 2: Uniform Consistency. First, we establish pointwise consistency, i.e., for any $\nu$, $\tilde{F}_n(\nu) \xrightarrow{a.s.} F_\pi(\nu)$, and then use this to establish uniform consistency, as before.
$$\forall \nu \in \mathbb{R}, \quad \tilde{F}_n(\nu) = \frac{1}{\sum_{i=1}^{n} \rho_i} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}} = \left( \frac{1}{n} \sum_{i=1}^{n} \rho_i \right)^{-1} \left( \frac{1}{n} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}} \right).$$
If both $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \rho_i$ and $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}}$ exist, then using Slutsky's theorem, for all $\nu \in \mathbb{R}$,
$$\lim_{n \to \infty} \tilde{F}_n(\nu) = \left( \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \rho_i \right)^{-1} \left( \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \rho_i\, \mathbb{1}_{\{G_i \le \nu\}} \right).$$
Notice, using Kolmogorov's strong law of large numbers, that the term in the first parentheses converges almost surely to the expected value of the importance ratios, which equals one. Similarly, the term in the second parentheses converges to $F_\pi(\nu)$ almost surely. Therefore,
$$\forall \nu \in \mathbb{R}, \quad \tilde{F}_n(\nu) \xrightarrow{a.s.} (1)^{-1} F_\pi(\nu) = F_\pi(\nu).$$