joint works with to memory-sample (cmu), ran raz avishay
TRANSCRIPT
![Page 1: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/1.jpg)
Extractor-Based Approach to Memory-Sample Tradeoffs
Joint Works with Pravesh Kothari (CMU), Ran Raz (Princeton) and Avishay Tal (UC Berkeley)
Sumegha Garg
(Harvard)
![Page 2: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/2.jpg)
β Increasing scale of learning problems β Many learning algorithms try to learn a
concept/hypothesis by modeling it as a neural network. The algorithm keeps in the memory some neural network and updates the weights when new samples arrive. Memory used is the size of the network (low if network size is small).
Learning under Memory Constraints?
![Page 3: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/3.jpg)
Brief Description for the Learning ModelA learner tries to learn π: π΄ β {0,1}, π β π» (hypothesis class)
from samples of the form (π!, π(π!)), π", π(π") , β¦
that arrive one by one
![Page 4: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/4.jpg)
[Shamirβ14], [SVWβ15]Can one prove unconditional strong lower bounds on the number of samples needed for learning under memory constraints?
(when samples are viewed one by one)
![Page 5: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/5.jpg)
Outline of the Talkβ’ The first example of such memory-sample tradeoff:
Parity learning [Razβ16]
β’ Spark in the field: Subsequent results
β’ Extractor-based approach to lower bounds [G-Raz-Talβ18]
β’ Shallow dive into the proofs
β’ Implications for refutation of CSPs [G-Kothari-Razβ20]
![Page 6: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/6.jpg)
Example: Parity Learning
![Page 7: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/7.jpg)
Parity Learningπ β# 0,1 $ is unknown
A learner tries to learn π = (π!, π", β¦ , π$) from
(π!, π!), π", π" , β¦ , (π%, π%), where β π‘,
π& β# 0,1 $ and π& =< π&, π > = β' π&' β π' πππ 2 (inner product mod 2)
![Page 8: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/8.jpg)
Parity LearningA learner tries to learn π = (π!, π", β¦ , π$) from
(π!, π!), π", π" , β¦ , (π%, π%), where β π‘,
π& β# 0,1 $ and π& = β' π&' β π' πππ 2
In other words, learner gets random binary linear equations in π!, π", . . , π$, one by one, and need to solve them
![Page 9: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/9.jpg)
Letβs Try Learning (n=5)π! + π( + π) = 1
![Page 10: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/10.jpg)
Letβs Try Learning (n=5)π" + π) + π* = 0
![Page 11: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/11.jpg)
Letβs Try Learning (n=5)π! + π" + π) = 1
![Page 12: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/12.jpg)
Letβs Try Learning (n=5)π" + π( = 0
![Page 13: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/13.jpg)
Letβs Try Learning (n=5)π! + π" + π) + π* = 0
![Page 14: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/14.jpg)
Letβs Try Learning (n=5)π! + π( = 1
![Page 15: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/15.jpg)
Letβs Try Learning (n=5)π" + π( + π) + π* = 1
![Page 16: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/16.jpg)
Letβs Try Learning (n=5)π! + π" + π* = 0
![Page 17: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/17.jpg)
Letβs Try Learning (n=5)π! + π" + π* = 0
π" + π( + π) + π* = 1
π! + π( = 1π! + π" + π) + π* = 0 π" + π( = 0
π! + π" + π) = 1π" + π) + π* = 0
π! + π( + π) = 1
π = (0,1,1,0,1)
![Page 18: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/18.jpg)
Parity Learnersβ’ Solve independent linear equations (Gaussian
Elimination)
πΆ(π) samples and πΆ(ππ) memory
β’ Try all possibilities of π
πΆ(π) memory but exponential number of samples
![Page 19: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/19.jpg)
Razβs Breakthrough β16Any algorithm for parity learning of size π requires either memory of π"/25 bits or exponential number of samples to learn
Memory-Sample Tradeoff
![Page 20: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/20.jpg)
Motivations from Other Fields Bounded Storage Cryptography: [Razβ16, GZβ19, VVβ16, KRTβ16, TTβ18, GZβ19, JTβ19, DTZβ20, GZβ21] Keyβs length: π Encryption/Decryption time: π
Unconditional security, if attackerβs memory size < ππ
ππ
Complexity Theory: Time-Space Lower Bounds have been studied in many models [BJSβ98, Ajtβ99, BSSVβ00, Forβ97, FLvMVβ05, Wilβ06,β¦]
![Page 21: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/21.jpg)
Subsequent Results
![Page 22: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/22.jpg)
Subsequent Generalizationsβ’ [KRTβ17]: Generalization to learning sparse paritiesβ’ [Razβ17, MMβ17, MTβ17, MMβ18, DKSβ19, GKLRβ21]:
Generalization to larger class of learning problems
![Page 23: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/23.jpg)
[Garg-Raz-Talβ18]Extractor-based approach to prove memory-sample tradeoffs for a large class of learning problemsImplied all the previous results + new lower boundsClosely follows the proof technique of [Razβ17]
[Beame-Oveis-Gharan-Yangβ18] independently proved related lower bounds
![Page 24: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/24.jpg)
Subsequent Generalizations[Sharan-Sidford-Valiantβ19]: Any algorithm performing linear regression over a stream of π-dimensional examples, uses a quadratic amount of memory or exhibits a slower rate of convergence than can be achieved without memory constraintFirst-order methods for learning may have provably slower rate of convergence
![Page 25: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/25.jpg)
More Related Workβ’ [GRTβ19] Learning under multiple passes over the
streamβ’ [DSβ18, AMNβ18, DGKRβ19, GKRβ20] Distinguishing, testing
under memory or communication constraintsβ’ [MTβ19, GLMβ20] Towards characterization of memory-
bounded learningβ’ [GRZβ20] Comments on quantum memory-bounded
learning
![Page 26: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/26.jpg)
Extractor-Based Approach to Lower Bounds [G-Raz-Talβ18]
![Page 27: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/27.jpg)
Learning Problem as a Matrixπ΄, πΉ : finite sets, π:π΄ΓπΉ β {β1,1} : a matrix
πΉ : concept class = 0,1 $
π΄ : possible samples = 0,1 $.
![Page 28: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/28.jpg)
Learning Problem as a Matrixπ:π΄ΓπΉ β {β1,1} : a matrix
π β# πΉ is unknown. A learner tries to learn π from a stream(π!, π!), π", π" , β¦ , π%, π% , where βπ‘ : π& β# π΄ and π& = π(π&, π)
π
π(π!, π)π!
π(π", π)π"
![Page 29: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/29.jpg)
Main TheoremAssume that any submatrix of π of at least 2/0|π΄| rows and at least 2/β|πΉ| columns, has a bias of at most 2/2. Then,Any learning algorithm requires either π΄(ππ)memory bits or ππ΄(π)samples
2#$|π΄|
2#%|πΉ|
![Page 30: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/30.jpg)
ApplicationsLow-degree polynomials: A learner tries to learn an π-variate multilinear polynomial π of degree at most π over F", from random evaluations of π over F"$
π(ππ 8π) memory or ππ(π) samples
![Page 31: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/31.jpg)
ApplicationsLearning from sparse equations: A learner tries to learn π= π!, β¦ , π$ β {0,1}$ , from random sparse linear equations, of sparsity π, over F" (each equation depends on π variables)
π(π β π) memory or ππ(π) samplesWe will use this result for
proving sample lower bounds
for refuting CSPs under
memory constraints
![Page 32: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/32.jpg)
Proof Overview
![Page 33: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/33.jpg)
Branching Program (length π, width π)
Each layer represents a time step. Each vertex represents a memory state of the learner (π = 2!"!#$%). Each non-leaf vertex has 2&'() outgoing edges, one for each π, π β 0,1 &!Γ β1,1
(π!,π!)
(π, π)
π
π(π% , π
% )(π", π")
![Page 34: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/34.jpg)
Branching Program (length π, width π)
The samples π), π) , . . , π!, π! define a computation-path. Each vertex π£ in the last layer is labeled by Iπ* β 0,1 &. The output is the label Iπ* of the vertex reached by the path
(π!,π!)
(π, π)
π
π(π% , π
% )(π", π")
![Page 35: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/35.jpg)
Proof OverviewPK|M = distribution of π conditioned on the event that the computation-path reaches π£
Significant vertices: π£ s.t. ||PK|M||" β₯ 2N β 2/$
ππ π£ = probability that the path reaches π£
We prove that if π£ is significant, ππ π£ β€ 2/O(0β N)
Hence, if there are less than 2Q(0β N) vertices, with high probability, we reach a non-significant vertex
![Page 36: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/36.jpg)
Proof OverviewProgress Function [Razβ17]: For layer πΏ',
π' = XMβR"
ππ π£ β PK|M , PK|S 0
β π! = 2"#$β &
β π' is very slowly growing: π! β π(β If π β πΏ(, then π( β₯ ππ π β 2#)β & β 2"#$β &
Hence: If π is significant, ππ π β€ 2/O(0β N)
![Page 37: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/37.jpg)
Hardness of Refuting CSPs [G-Kothari-Razβ20]
![Page 38: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/38.jpg)
Refuting CSPs under Streaming Modelβ’ Fix a predicate π: 0,1 0 β {0,1}
β’ With bounded memory, distinguish between CSPs (with predicate π) where the constraints are randomly generated vs ones where the constraints are generated using a satisfying assignment
β’ Constraints arrive one by one in a stream
![Page 39: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/39.jpg)
Refuting CSPs under Streaming Modelβ’ Fix a predicate π: 0,1 0 β {0,1}
β’ With bounded memory, distinguish between CSPs (with predicate π) where the constraints are randomly generated vs ones where the constraints are generated using a satisfying assignment
β’ Constraints arrive one by one in a stream Refutation is at least as hard as distinguishing between Random CSPs and satisfiable ones
![Page 40: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/40.jpg)
Refuting CSPs under Streaming Modelβ’ π₯ is chosen uniformly from 0,1 $
β’ At each step π, π' β [π] is a randomly chosen (ordered) subset of size π.
β’ Distinguisher sees a stream of either π', π π₯T" or π', π' where π' is chosen uniformly from 0,1
π₯&!: projection of π₯ to coordinates of π'
![Page 41: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/41.jpg)
Reduction from Learning from Sparse EquationsAny algorithm that distinguishes random π βXOR (random CSPs with XOR predicate) on π variables from
π βXOR CSPs with a satisfying assignment (under the streaming model),
requires either π(π β π) memory or ππ(π) constraints
![Page 42: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/42.jpg)
Reduction from Learning from Sparse EquationsAny algorithm that distinguishes random π βXOR (random CSPs with XOR predicate) on π variables from
π βXOR CSPs with a satisfying assignment (under the streaming model),
requires either π(π β π) memory or ππ(π) constraints
For π β« π₯π¨π π, refutation is hard for algorithms using even super-linear memory, when the number of constraints is < ππ(π)
![Page 43: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/43.jpg)
Better Sample Bounds for Sub-Linear Memoryβ’ Stronger lower bounds on the number of constraints
needed to refute
β’ Even for CSPs with π‘-resilient predicates π [OWβ14, ALβ16, KMOWβ17]
β’ π: 0,1 0 β {0,1} is π‘-resilient if π has zero correlation with every parity function on at most π‘ β 1 bits
![Page 44: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/44.jpg)
Better Sample Bounds for Sub-Linear MemoryFor every 0 < π < 1, any algorithm that distinguishes
random CSPs (with a π‘-resilient predicate π) on π variables
from satisfiable ones (when the constraints arrive in a streaming fashion),
requires either ππ memory or ππ
π(π(π/π))constraints
![Page 45: Joint Works with to Memory-Sample (CMU), Ran Raz Avishay](https://reader033.vdocument.in/reader033/viewer/2022051307/6279f724ea17fe0243729c3f/html5/thumbnails/45.jpg)
Thank You!