TRANSCRIPT
Statistical Inference from Dependent Observations
Constantinos Daskalakis
EECS and CSAIL, MIT
Supervised Learning Setting
Given:
• Training set $\{(x_i, y_i)\}_{i=1}^{n}$ of examples
• $x_i$: feature (aka covariate) vector of example $i$
• $y_i$: label (aka response) of example $i$
• Assumption: $\{(x_i, y_i)\}_{i=1}^{n} \sim_{\mathrm{iid}} D$, where $D$ is unknown
• Hypothesis class $\mathcal{H} = \{h_\theta \mid \theta \in \Theta\}$ of (randomized) responses
  • e.g. $\mathcal{H} = \{h_\theta \mid \|\theta\|_2 \le 1\}$
  • e.g.2 $\mathcal{H} = \{$some class of Deep Neural Networks$\}$
  • generally, $h_\theta$ might be randomized
• Loss function: $\ell : A \times A \to \mathbb{R}$
Goal: Select $\theta$ to minimize expected loss: $\min_\theta \mathbb{E}_{(x,y)\sim D}\,\ell(h_\theta(x), y)$
Goal 2: In the realizable setting (i.e. when, under $D$, $y \sim h_{\theta^*}(x)$), estimate $\theta^*$
[slide images: examples labeled "dog", …, "cat"]
E.g. Linear Regression
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and responses $y_i \in \mathbb{R}$, $i = 1,\dots,n$
Assumption: for all $i$, $y_i$ is sampled as $y_i = \theta^{\mathsf T} x_i + \varepsilon_i$ with independent noise $\varepsilon_i$ (e.g. $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$)
Goal: Infer $\theta$
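To make the setting concrete, here is a minimal sketch (not from the talk; sample sizes, seeds, and names are illustrative): draw i.i.d. Gaussian covariates, generate responses from the linear model above, and recover $\theta$ by ordinary least squares, which coincides with the MLE under Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10                          # sample size and dimension (illustrative)
theta_true = rng.normal(size=d)

X = rng.normal(size=(n, d))              # i.i.d. covariate vectors x_i
y = X @ theta_true + rng.normal(size=n)  # y_i = theta^T x_i + N(0,1) noise

# MLE under Gaussian noise = ordinary least squares
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimation error:", np.linalg.norm(theta_hat - theta_true))  # roughly sqrt(d/n)
```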
E.g.2 Logistic Regression
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and binary responses $y_i \in \{\pm 1\}$
Assumption: for all $i$, $y_i$ is sampled as $\Pr[y_i = s \mid x_i] = \dfrac{1}{1 + \exp(-2 s\, \theta^{\mathsf T} x_i)}$ for $s \in \{\pm 1\}$
Goal: Infer $\theta$
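Similarly, a minimal sketch of the logistic MLE (illustrative, assuming the $\pm 1$ parametrization above with its factor of 2): the log-likelihood is concave, so plain gradient ascent suffices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 5
theta_true = rng.normal(size=d) / np.sqrt(d)

X = rng.normal(size=(n, d))
# Pr[y_i = +1 | x_i] = 1 / (1 + exp(-2 theta^T x_i)), labels in {-1, +1}
p_plus = 1.0 / (1.0 + np.exp(-2 * X @ theta_true))
y = np.where(rng.random(n) < p_plus, 1.0, -1.0)

# MLE by gradient ascent on the (concave) mean log-likelihood
# (1/n) sum_i -log(1 + exp(-2 y_i theta^T x_i))
theta = np.zeros(d)
lr = 0.1
for _ in range(500):
    margins = 2 * y * (X @ theta)
    grad = (2 * y / (1.0 + np.exp(margins))) @ X / n   # gradient of the mean log-likelihood
    theta += lr * grad

print("estimation error:", np.linalg.norm(theta - theta_true))
```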
Maximum Likelihood Estimator (MLE)
In the standard linear and logistic regression settings, under mild assumptions about the design matrix ๐, whose rows are the covariate vectors, MLE is strongly consistent.
The MLE $\hat\theta$ satisfies: $\|\hat\theta - \theta\|_2 = O\!\big(\sqrt{d/n}\big)$
Beyond Realizable Setting: Learnability
• Recall:
  • Assumption: $(X_1, Y_1), \dots, (X_n, Y_n) \sim_{\mathrm{iid}} D$, where $D$ is unknown
  • Goal: choose $h \in \mathcal{H}$ to minimize expected loss under some loss function $\ell$, i.e. $\min_{h \in \mathcal{H}} \mathbb{E}_{(x,y)\sim D}\,\ell(h(x), y)$
• Let $\hat h \in \mathcal{H}$ be the Empirical Risk Minimizer, i.e. $\hat h \in \arg\min_{h \in \mathcal{H}} \frac{1}{n}\sum_i \ell(h(x_i), y_i)$
• Then, writing $\mathrm{Loss}_D(h) = \mathbb{E}_{(x,y)\sim D}\,\ell(h(x), y)$:
  $\mathrm{Loss}_D(\hat h) \le \inf_{h \in \mathcal{H}} \mathrm{Loss}_D(h) + O\!\big(\sqrt{VC(\mathcal{H})/n}\big)$, for Boolean $\mathcal{H}$, 0-1 loss $\ell$
  $\mathrm{Loss}_D(\hat h) \le \inf_{h \in \mathcal{H}} \mathrm{Loss}_D(h) + O\!\big(\lambda\,\mathcal{R}_n(\mathcal{H})\big)$, for general $\mathcal{H}$, $\lambda$-Lipschitz $\ell$
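A toy illustration of ERM (a sketch with synthetic data, not from the talk): 0-1-loss empirical risk minimization over a finite class of threshold classifiers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(-1, 1, size=n)
y = np.where(x > 0.3, 1, -1)             # realizable: true threshold at 0.3
y[rng.random(n) < 0.05] *= -1            # flip 5% of labels as noise

# Finite hypothesis class: threshold classifiers h_t(x) = sign(x - t)
thresholds = np.linspace(-1, 1, 201)

def empirical_risk(t):
    preds = np.where(x > t, 1, -1)
    return np.mean(preds != y)           # empirical 0-1 loss

# Empirical Risk Minimizer
t_hat = min(thresholds, key=empirical_risk)
print("ERM threshold:", t_hat, "empirical 0-1 risk:", empirical_risk(t_hat))
```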
Supervised Learning Setting on Steroids
But what do these results really mean?
Broader Perspective
• ML methods commonly operate under stringent assumptions
  o train set: independent and identically distributed (i.i.d.) samples
  o test set: i.i.d. samples from the same distribution as training
• Goal: relax the standard assumptions to accommodate two important challenges
  o (i) censored/truncated samples and (ii) dependent samples
  o Censoring/truncation ⇒ systematic missingness of data
    ⇒ train set ≠ test set
  o Data dependencies ⇒ peer effects, spatial and temporal dependencies
    ⇒ no apparent source of independence
Todayโs Topic: Statistical Inference from Dependent Observations
[figure: independent vs. dependent observations]
Observations $\{(x_i, y_i)\}_i$ are commonly collected on some spatial domain, some temporal domain, or on a social network
As such, they are not independent, but intricately dependent
Why dependent?
e.g. spin glasses: neighboring particles influence each other
[figure: grid of $\pm$ spins]
e.g. social networks: decisions/opinions of nodes are influenced by the decisions of their neighbors (peer effects)
Statistical Physics and Machine Learning
โข Spin Systems [Isingโ25]
• MRFs, Bayesian Networks, Boltzmann Machines
• Probability Theory
โข MCMC
โข Machine Learning
โข Computer Vision
โข Game Theory
โข Computational Biology
โข Causality
โข โฆ
Peer Effects on Social Networks
Several studies of peer effects, in applications such as:
• criminal networks [Glaeser et al.'96]
• welfare participation [Bertrand et al.'00]
• school achievement [Sacerdote'01]
• participation in retirement plans [Duflo-Saez'03]
• obesity [Trogdon et al.'08, Christakis-Fowler'13]
AddHealth Dataset:
• National Longitudinal Study of Adolescent Health
• national study of students in grades 7-12
• friendship networks, personal and school life, age, gender, race, socio-economic background, health, …
Microeconomics:
• behavior/opinion dynamics, e.g. [Schelling'78], [Ellison'93], [Young'93, '01], [Montanari-Saberi'10], …
Econometrics:
• disentangling individual from network effects [Manski'93], [Bramoulle-Djebbari-Fortin'09], [Li-Levina-Zhu'16], …
Menu
โข Motivation
โข Part I: Regression w/ dependent observations
โข Part II: Statistical Learning Theory w/ dependent observations
โข Conclusions
Standard Linear Regression
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and responses $y_i \in \mathbb{R}$, $i = 1,\dots,n$
Assumption: for all $i$, $y_i$ is sampled as $y_i = \theta^{\mathsf T} x_i + \varepsilon_i$ with independent noise $\varepsilon_i$ (e.g. $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$)
Goal: Infer $\theta$
Standard Logistic Regression
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and binary responses $y_i \in \{\pm 1\}$
Assumption: for all $i$, $y_i$ is sampled as $\Pr[y_i = s \mid x_i] = \dfrac{1}{1 + \exp(-2 s\, \theta^{\mathsf T} x_i)}$ for $s \in \{\pm 1\}$
Goal: Infer $\theta$
Maximum Likelihood Estimator
In the standard linear and logistic regression settings, under mild assumptions about the design matrix ๐, whose rows are the covariate vectors, MLE is strongly consistent.
The MLE $\hat\theta$ satisfies: $\|\hat\theta - \theta\|_2 = O\!\big(\sqrt{d/n}\big)$
Standard Linear Regression
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and responses $y_i \in \mathbb{R}$, $i = 1,\dots,n$
Assumption: for all $i$, $y_i$ is sampled as $y_i = \theta^{\mathsf T} x_i + \varepsilon_i$ with independent noise $\varepsilon_i$ (e.g. $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$)
Goal: Infer $\theta$
Linear Regression w/ Dependent Samples
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and responses $y_i \in \mathbb{R}$, $i = 1,\dots,n$
Assumption: for all $i$, $y_i$ is sampled as follows, conditioning on $y_{-i}$:
Parameters:
• coefficient vector $\theta$, inverse temperature $\beta$ (unknown)
• interaction matrix $A$ (known)
Goal: Infer $\theta, \beta$
Standard Logistic Regression
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and binary responses $y_i \in \{\pm 1\}$
Assumption: for all $i$, $y_i$ is sampled as $\Pr[y_i = s \mid x_i] = \dfrac{1}{1 + \exp(-2 s\, \theta^{\mathsf T} x_i)}$ for $s \in \{\pm 1\}$
Goal: Infer $\theta$
Logistic Regression w/ Dependent Samples(todayโs focus)
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and binary responses $y_i \in \{\pm 1\}$
Assumption: for all $i$, $y_i$ is sampled as follows, conditioning on $y_{-i}$:
$\Pr[y_i = s \mid y_{-i}] = \dfrac{1}{1 + \exp\!\big(-2 s\,(\theta^{\mathsf T} x_i + \beta\, A_i^{\mathsf T} y)\big)}$ for $s \in \{\pm 1\}$
Parameters:
• coefficient vector $\theta$, inverse temperature $\beta$ (unknown)
• interaction matrix $A$ (known)
Goal: Infer $\theta, \beta$
Logistic Regression w/ Dependent Samples (today's focus)
Input: $n$ feature vectors $x_i \in \mathbb{R}^d$ and binary responses $y_i \in \{\pm 1\}$
Assumption: $y_1, \dots, y_n$ are sampled jointly according to the measure
$\Pr[\vec y = \sigma] = \dfrac{\exp\!\big(\sum_i \sigma_i\, \theta^{\mathsf T} x_i + \beta\, \sigma^{\mathsf T} A \sigma\big)}{Z_{\theta,\beta}}$
i.e. an Ising model with inverse temperature $\beta$ and external fields $\theta^{\mathsf T} x_i$; for $\beta = 0$ this is equivalent to standard logistic regression.
Parameters:
• coefficient vector $\theta$, inverse temperature $\beta$ (unknown)
• interaction matrix $A$ (known)
Goal: Infer $\theta, \beta$
Challenge: a single sample, and the likelihood contains the partition function $Z$; i.e. there is no law of large numbers to lean on, and the MLE is hard to compute.
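For intuition, here is a sketch of how one might draw the single dependent sample $\vec y$ from this measure by Gibbs sampling. It uses the conditional form displayed later in the talk, $\Pr[y_i = +1 \mid y_{-i}] = 1/(1 + \exp(-2(\theta^{\mathsf T} x_i + \beta A_i^{\mathsf T} y)))$; the exact constant in front of $\beta$ depends on the convention for the quadratic term, and the graph, normalization, and parameter values below are illustrative assumptions.

```python
import numpy as np

def gibbs_sample_ising(X, theta, beta, A, n_sweeps=200, rng=None):
    """One (dependent) sample y in {-1,+1}^n from the Ising-type measure above,
    using the conditional Pr[y_i = +1 | y_-i] = sigmoid(2*(theta^T x_i + beta * A_i^T y))."""
    if rng is None:
        rng = np.random.default_rng()
    n = X.shape[0]
    y = rng.choice([-1.0, 1.0], size=n)
    fields = X @ theta                           # external fields theta^T x_i
    for _ in range(n_sweeps):                    # systematic-scan Gibbs sweeps
        for i in range(n):
            m = fields[i] + beta * (A[i] @ y - A[i, i] * y[i])
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * m))
            y[i] = 1.0 if rng.random() < p_plus else -1.0
    return y

# Illustrative use: a cycle-graph interaction matrix, random covariates.
rng = np.random.default_rng(3)
n, d, beta = 200, 3, 0.2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A /= np.abs(A).sum(axis=1).max()                 # normalize max row sum of |A| to 1 (a common assumption)
theta = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = gibbs_sample_ising(X, theta, beta, A, rng=rng)
```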
Main Result for Logistic Regression from Dependent Samples
N.B. We show similar results for linear regression with peer effects.
Menu
โข Motivation
• Part I: Regression w/ dependent observations
  • Proof Ideas
โข Part II: Statistical Learning Theory w/ dependent observations
โข Conclusions
MPLE instead of MLE
• The likelihood involves $Z = Z_{\theta,\beta}$ (the partition function), so it is non-trivial to compute.
• [Besag'75, …, Chatterjee'07] study the maximum pseudolikelihood estimator (MPLE)
• The log-pseudolikelihood does not contain $Z$ and is concave. Is the MPLE consistent?
• [Chatterjee'07]: yes, when $\beta > 0$, $\theta = 0$
• [BM'18, GM'18]: yes, when $\beta > 0$, $\theta \in \mathbb{R}$, $\theta \ne 0$, and $x_i = 1$ for all $i$
• General case?
Problem: Given
• $\vec x = (x_1, \dots, x_n)$ and
• $\vec y = (y_1, \dots, y_n) \in \{\pm 1\}^n$ sampled as
$\Pr[\vec y = \sigma] = \dfrac{\exp\!\big(\sum_i \sigma_i\, \theta^{\mathsf T} x_i + \beta\, \sigma^{\mathsf T} A \sigma\big)}{Z}$
infer $\theta, \beta$.
$PL(\beta, \theta) \triangleq \prod_i \Pr_{\beta,\theta,A}[y_i \mid y_{-i}]^{1/n} = \prod_i \left(\dfrac{1}{1 + \exp\!\big(-2\, y_i\,(\theta^{\mathsf T} x_i + \beta\, A_i^{\mathsf T} y)\big)}\right)^{1/n}$
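A minimal sketch of the MPLE computation implied by this display (illustrative, not the authors' code): the negative log-pseudolikelihood is convex in $(\theta, \beta)$, so plain gradient descent finds the maximizer.

```python
import numpy as np

def neg_log_pl(params, X, y, A):
    """Negative mean log-pseudolikelihood of (theta, beta); y in {-1,+1}^n.
    Each factor is Pr[y_i | y_-i] = 1 / (1 + exp(-2 y_i (theta^T x_i + beta * A_i^T y)))."""
    theta, beta = params[:-1], params[-1]
    m = X @ theta + beta * (A @ y)               # "local field" for each i
    return np.mean(np.log1p(np.exp(-2.0 * y * m)))

def mple(X, y, A, lr=0.5, steps=2000):
    """Maximum pseudolikelihood estimate by gradient descent on the convex negative log-PL."""
    n, d = X.shape
    params = np.zeros(d + 1)                     # (theta, beta)
    Ay = A @ y
    for _ in range(steps):
        theta, beta = params[:-1], params[-1]
        m = X @ theta + beta * Ay
        w = -2.0 * y / (1.0 + np.exp(2.0 * y * m))   # derivative of log1p(exp(-2 y m)) w.r.t. m
        grad = np.concatenate([w @ X, [w @ Ay]]) / n
        params -= lr * grad
    return params[:-1], params[-1]

# Continuing from the sampling sketch above (same X, y, A):
# theta_hat, beta_hat = mple(X, y, A)
# print(neg_log_pl(np.concatenate([theta_hat, [beta_hat]]), X, y, A))
```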
Analysis
Problem: Given
• $\vec x = (x_1, \dots, x_n)$ and
• $\vec y = (y_1, \dots, y_n) \in \{\pm 1\}^n$ sampled as
$\Pr[\vec y = \sigma] = \dfrac{\exp\!\big(\sum_i \sigma_i\, \theta^{\mathsf T} x_i + \beta\, \sigma^{\mathsf T} A \sigma\big)}{Z}$
infer $\theta, \beta$.
$PL(\beta, \theta) \triangleq \prod_i \Pr_{\beta,\theta,A}[y_i \mid y_{-i}]^{1/n}$
Analysis
Problem: Given
• $\vec x = (x_1, \dots, x_n)$ and
• $\vec y = (y_1, \dots, y_n) \in \{\pm 1\}^n$ sampled as
$\Pr[\vec y = \sigma] = \dfrac{\exp\!\big(\sum_i \sigma_i\, \theta^{\mathsf T} x_i + \beta\, \sigma^{\mathsf T} A \sigma\big)}{Z}$
infer $\theta, \beta$, where $PL(\beta, \theta) \triangleq \prod_i \Pr_{\beta,\theta,A}[y_i \mid y_{-i}]^{1/n}$.
• Concentration: the gradient of the log-pseudolikelihood at the truth should be small;
  show $\|\nabla \log PL(\theta_0, \beta_0)\|_2^2$ is $O(d/n)$ with probability at least 99%
• Anti-concentration: a constant lower bound on $\min_{\theta,\beta} \lambda_{\min}\!\big(-\nabla^2 \log PL(\theta, \beta)\big)$ with probability $\ge 99\%$
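Filling in the step these two ingredients feed into (a standard argument, sketched in the slide's notation):

```latex
% Sketch: how concentration and anti-concentration combine.
% Let f = \log PL, let (\theta_0, \beta_0) be the truth and (\hat\theta, \hat\beta) the MPLE,
% and let \Delta = (\hat\theta, \hat\beta) - (\theta_0, \beta_0).
\begin{align*}
0 \;\le\; f(\hat\theta, \hat\beta) - f(\theta_0, \beta_0)
  \;&\le\; \nabla f(\theta_0, \beta_0)^{\top} \Delta \;-\; \tfrac{\lambda}{2}\, \|\Delta\|_2^2
  && \text{(strong concavity, $\lambda = \min \lambda_{\min}(-\nabla^2 f)$)} \\
\Rightarrow\quad \|\Delta\|_2 \;&\le\; \tfrac{2}{\lambda}\, \big\|\nabla f(\theta_0, \beta_0)\big\|_2
  \;=\; O\!\big(\sqrt{d/n}\big)
  && \text{(gradient concentration; both events hold w.p.\ $\ge 98\%$)}
\end{align*}
```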
Menu
โข Motivation
• Part I: Regression w/ dependent observations
  • Proof Ideas
โข Part II: Statistical Learning Theory w/ dependent observations
โข Conclusions
Supervised Learning
Given:
• Training set $\{(x_i, y_i)\}_{i=1}^{n}$ of examples
• Hypothesis class $\mathcal{H} = \{h_\theta \mid \theta \in \Theta\}$ of (randomized) responses
• Loss function: $\ell : A \times A \to \mathbb{R}$
Assumption: $\{(x_i, y_i)\}_{i=1}^{n} \sim_{\mathrm{iid}} D$, where $D$ is unknown
Goal: Select $\theta$ to minimize expected loss: $\min_\theta \mathbb{E}_{(x,y)\sim D}\,\ell(h_\theta(x), y)$
Goal 2: In the realizable setting (i.e. when, under $D$, $y \sim h_{\theta^*}(x)$), estimate $\theta^*$
[slide images: examples labeled "dog", …, "cat"]
Supervised Learning w/ Dependent Observations
Given:
• Training set $\{(x_i, y_i)\}_{i=1}^{n}$ of examples
• Hypothesis class $\mathcal{H} = \{h_\theta \mid \theta \in \Theta\}$ of (randomized) responses
• Loss function: $\ell : A \times A \to \mathbb{R}$
Assumption′: a joint distribution $D$ samples the training examples and the unknown test sample jointly, i.e.
• $\{(x_i, y_i)\}_{i=1}^{n} \cup \{(x, y)\} \sim D$
Goal: select $\theta$ to minimize: $\min_\theta \mathbb{E}\,\ell(h_\theta(x), y)$
[slide images: examples labeled "dog", …, "cat"]
Main result
Learnability is possible when the joint distribution $D$ over the training set and the test set satisfies Dobrushin's condition.
Dobrushinโs condition
• Given a jointly distributed random vector $\bar Z = (Z_1, \dots, Z_n)$, define the influence of variable $j$ on variable $i$:
  • "the worst-case effect of $Z_j$ on the conditional distribution of $Z_i$ given $Z_{-i-j}$"
  • $\mathrm{Inf}_{j \to i} = \sup_{z_j,\, z_j',\, z_{-i-j}} d_{TV}\!\big(\Pr[Z_i \mid z_j, z_{-i-j}],\ \Pr[Z_i \mid z_j', z_{-i-j}]\big)$
• Dobrushin's condition:
  • "every node has limited total influence exerted on it"
  • For all $i$: $\sum_{j \ne i} \mathrm{Inf}_{j \to i} < 1$.
Implies: concentration of measure; fast mixing of the Gibbs sampler.
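As an illustration (a sketch assuming the Ising-type measure from Part I and the standard bound $\mathrm{Inf}_{j\to i} \le \beta\,|A_{ij}|$ for that parametrization; exact constants depend on conventions), one can certify Dobrushin's condition by checking $\beta \|A\|_\infty < 1$:

```python
import numpy as np

def dobrushin_upper_bound(A, beta):
    """Upper bound on max_i sum_{j != i} Inf_{j -> i} for the Ising-type measure above.

    Flipping y_j moves the conditional local field of y_i by at most 4*beta*|A_ij|,
    and the TV distance between the two resulting conditionals of y_i is at most a
    quarter of that, giving Inf_{j->i} <= beta*|A_ij| (a standard, convention-dependent bound).
    """
    A = np.abs(np.asarray(A, dtype=float))
    np.fill_diagonal(A, 0.0)
    return beta * A.sum(axis=1).max()            # beta * ||A||_inf

def satisfies_dobrushin_sufficient(A, beta):
    """True if the sufficient condition beta*||A||_inf < 1 holds (hence Dobrushin's condition)."""
    return dobrushin_upper_bound(A, beta) < 1.0

# Example: cycle graph with weight 0.5 per edge, so each row of |A| sums to 1.
n = 100
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 0.5
print(satisfies_dobrushin_sufficient(A, beta=0.3))   # True: 0.3 * 1 < 1
```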
Learnability under Dobrushinโs condition
• Suppose:
  • $(X_1, Y_1), \dots, (X_n, Y_n), (X, Y) \sim D$, where $D$ is a joint distribution
  • $D$ satisfies Dobrushin's condition
  • $D$ has the same marginals across examples
• Define $\mathrm{Loss}_D(h) = \mathbb{E}\,\ell(h(x), y)$
• There exists a learning algorithm which, given $(X_1, Y_1), \dots, (X_n, Y_n)$, outputs $\hat h \in \mathcal{H}$ such that
  $\mathrm{Loss}_D(\hat h) \le \inf_{h \in \mathcal{H}} \mathrm{Loss}_D(h) + \tilde O\!\big(\sqrt{VC(\mathcal{H})/n}\big)$, for Boolean $\mathcal{H}$, 0-1 loss $\ell$
  $\mathrm{Loss}_D(\hat h) \le \inf_{h \in \mathcal{H}} \mathrm{Loss}_D(h) + \tilde O\!\big(\lambda\,\mathcal{G}_n(\mathcal{H})\big)$, for general $\mathcal{H}$, $\lambda$-Lipschitz $\ell$
  (a condition stronger than Dobrushin's is needed for general $\mathcal{H}$)
Essentially the same bounds as in the i.i.d. case.
Menu
โข Motivation
• Part I: Regression w/ dependent observations
  • Proof Ideas
โข Part II: Statistical Learning Theory w/ dependent observations
โข Conclusions
Goals of Broader Line of Work
Goal: relax the standard assumptions to accommodate two important challenges
o (i) censored/truncated samples and (ii) dependent samples
o Censoring/truncation ⇒ systematic missingness of data
  ⇒ train set ≠ test set
o Data dependencies ⇒ peer effects, spatial and temporal dependencies
  ⇒ no apparent source of independence
o Regression on a network (linear and logistic, $\sqrt{d/n}$ rates):
  Constantinos Daskalakis, Nishanth Dikkala, Ioannis Panageas: Regression from Dependent Observations. In the 51st Annual ACM Symposium on the Theory of Computing (STOC'19).
o Statistical Learning Theory framework (learnability and generalization bounds):
  Yuval Dagan, Constantinos Daskalakis, Nishanth Dikkala, Siddhartha Jayanti: Generalization and Learning under Dobrushin's Condition. In the 32nd Annual Conference on Learning Theory (COLT'19).
Statistical Estimation from Dependent Observations
Thank you!