ee270 large scale matrix computation, …web.stanford.edu/class/ee270/lecture1.pdfi project: i you...

79
EE270 Large scale matrix computation, optimization and learning Instructor : Mert Pilanci Stanford University Tuesday, Jan 7 2020

Upload: others

Post on 08-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

EE270

Large scale matrix computation,

optimization and learning

Instructor : Mert Pilanci

Stanford University

Tuesday, Jan 7 2020

Page 2: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Outline

• Introduction

• Administrative

• Overview of topics and applications

Page 3: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

AdministrativeTeaching sta↵

I Instructor: Mert PilanciI Email: [email protected] O�ce hours: Wednesday 3-5pm in Packard 255

TA: Tolga Ergen, [email protected]

I TA o�ce hours: TBA

I Public web page :http://web.stanford.edu/class/ee270/

Please check Canvas for up-to-date infoFor all questions please use Piazza

Page 4: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

About EE-270

This course will explore the theory and practice of randomizedmatrix computation and optimization for large-scale problems toaddress challenges in modern massive data sets.I Our goal in this course is to help you to learn:

I randomized methods for linear algebra, optimization andmachine learning

I probabilistic tools for analyzing randomized approximationsI how to implement optimization algorithms for large scale

problemsI applications in machine learning, statistics, signal

processing and data mining.

Page 5: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

About EE-270

This course will explore the theory and practice of randomizedmatrix computation and optimization for large-scale problems toaddress challenges in modern massive data sets.I Our goal in this course is to help you to learn:

I randomized methods for linear algebra, optimization andmachine learning

I probabilistic tools for analyzing randomized approximations

I how to implement optimization algorithms for large scaleproblems

I applications in machine learning, statistics, signal

processing and data mining.

Page 6: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

About EE-270

This course will explore the theory and practice of randomizedmatrix computation and optimization for large-scale problems toaddress challenges in modern massive data sets.I Our goal in this course is to help you to learn:

I randomized methods for linear algebra, optimization andmachine learning

I probabilistic tools for analyzing randomized approximationsI how to implement optimization algorithms for large scale

problems

I applications in machine learning, statistics, signal

processing and data mining.

Page 7: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

About EE-270

This course will explore the theory and practice of randomizedmatrix computation and optimization for large-scale problems toaddress challenges in modern massive data sets.I Our goal in this course is to help you to learn:

I randomized methods for linear algebra, optimization andmachine learning

I probabilistic tools for analyzing randomized approximationsI how to implement optimization algorithms for large scale

problemsI applications in machine learning, statistics, signal

processing and data mining.

Page 8: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Prerequisites

I Familiarity with linear algebra (EE 103 or equivalent).

I Probability theory and statistics (EE 178 or equivalent)

I Basic programming skills

Page 9: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Homework

I Assigned homeworks will be bi-weekly.

I The problem sets will include programming assignments toimplement algorithms covered in the class.

I We will also analyze randomized algorithms using probabilistictools.

I We support Python and Matlab.

Page 10: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Group Study

I Homework:I Working in groups is allowed, but each member must submit

their own writeup.I Write the members of your group on your solutions (Up to

four people are allowed).

I Project:I You will be asked to form groups of about 1-2 people and

choose a topicI I will suggest a list of research problems on the course websiteI Proposal (1 page) and progress report (4 pages)I Final presentation (last week of classes)

Page 11: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Topics

I randomized linear algebraI approximate matrix multiplicationI tools from probability theoryI sampling and projection methods

I randomized linear system solvers and regressionI leverage scoresI iterative sketching and preconditioningI sparse linear systemsI robust regression

I matrix decompositionsI randomized QR decompositionI randomized low rank factorizationI column subset selection

Page 12: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Topics

I large-scale optimizationI empirical risk minimizationI stochastic gradient methods and variantsI second order methods and Hessian approximationsI asynchronous distributed optimization

I kernel methodsI Reproducing kernel Hilbert spacesI Nystrom approximationsI Random featuresI neural networks and Neural Tangent Kernel

I information-theoretic methodsI Error-resilient computations via error-correcting codesI Lower-bounds on random projections

Page 13: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

For details see Canvas!

Any questions?

Page 14: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Overview of topics and applications

Page 15: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Scale of data

I Every day, we create 2.5 billion gigabytes of data

I Data stored grows 4x faster than world economy (Mayer-Schonberger)

Page 16: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Scale of data

I Every day, we create 2.5 billion gigabytes of data

I Data stored grows 4x faster than world economy (Mayer-Schonberger)

Page 17: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Scale of data

I Every day, we create 2.5 billion gigabytes of data

I Data stored grows 4x faster than world economy (Mayer-Schonberger)

Page 18: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Scale of data

I Every day, we create 2.5 billion gigabytes of data

I Data stored grows 4x faster than world economy (Mayer-Schonberger)

Page 19: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Deep learning revolution

Page 20: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Big data matricesI n ⇥ d datamatrix

An

d

I Small: we can look at the data and find solutions easilyI Medium: Fits into RAM and one can run computations in

reasonable timeI Large: Doesn’t easily fit into RAM. One can’t relatively easily

run computations.

Page 21: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Typical data matrices

I Rectangular data (object-feature data): n objects, each ofwhich are described by d features, e.g., document-term data,people-SNPs data.

I Correlation matrices

I Kernels and similarity matrices

I Laplacians or Adjacency matrices of graphs.

I Discretizations of dynamical systems, ODEs and PDEs

I Constraint matrices

I ...

Page 22: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Typical data matrices

I Rectangular data:

essentially a two-dimensional matrix with rows indicatingrecords (cases) and columns indicating features (variables)

I Example: Airline dataset

depart arrive origin dest dist weather delay cancelled

00:00:01 13:35:01 RNO LAS 345 0 107:20:01 08:40:01 SFO SAN 447 40 007:25:01 10:15:01 OAK PHX 646 0 007:30:01 08:30:01 OAK BUR 325 0 0

...

Page 23: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

in machine learning, statistics and signal processing

I More data points typically increase the accuracy of models! large scale matrix computation and optimization problems

e.g. matrix multiplication, matrix factorization, singular valuedecomposition, convex optimization, non-convexoptimization...

Can we reduce the data volume with minimal loss of

information ?

Page 24: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

in machine learning, statistics and signal processing

I More data points typically increase the accuracy of models! large scale matrix computation and optimization problems

e.g. matrix multiplication, matrix factorization, singular valuedecomposition, convex optimization, non-convexoptimization...

Can we reduce the data volume with minimal loss of

information ?

Data size

Com

puta

tion

IDEAL

Page 25: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Matrix Computations

I Data matrix A 2 Rn⇥d where n, d are extremely large

Examples:

I Airline dataset (120GB) n = 120⇥ 106, d = 28Flight arrival and departure details from 1987 to 2008

I Imagenet dataset (1.31TB) n = 14⇥ 106, d = 2⇥ 105

14 Million images for visual recognition

[US Department of Transportation][Deng et al. 2009]

Page 26: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Matrix Computations

I Data matrix A 2 Rn⇥d where n, d are extremely large

Examples:

I Airline dataset (120GB) n = 120⇥ 106, d = 28Flight arrival and departure details from 1987 to 2008

I Imagenet dataset (1.31TB) n = 14⇥ 106, d = 2⇥ 105

14 Million images for visual recognition

[US Department of Transportation][Deng et al. 2009]

Page 27: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Matrix Computations

I Data matrix A 2 Rn⇥d where n, d are extremely large

Examples:

I Airline dataset (120GB) n = 120⇥ 106, d = 28Flight arrival and departure details from 1987 to 2008

I Imagenet dataset (1.31TB) n = 14⇥ 106, d = 2⇥ 105

14 Million images for visual recognition

[US Department of Transportation][Deng et al. 2009]

Page 28: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Approximate Matrix Multiplication

I How to approximate the matrix product AB fast ?

Page 29: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Least Squares Problems

Least squaresminx

kAx � yk2

[Gauss, 1795]

variable

response

Page 30: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

DATA OPTIMIZER

Page 31: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

DATA OPTIMIZER

Page 32: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

DATA OPTIMIZER

parameter

cost

all data

Page 33: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

DATA OPTIMIZER

parameter

cost

all data

sample

Page 34: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

DATA OPTIMIZER

parameter

cost

all data

Page 35: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

DATA OPTIMIZER

𝑤1

𝑤2

𝑤3

𝑤4

𝑤5

𝑤6

𝑤7

𝑤8parameter

cost

all data

Page 36: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

DATA OPTIMIZER

𝑤1

𝑤2

𝑤3

𝑤4

𝑤5

𝑤6

𝑤7

𝑤8parameter

cost

all data

combined

Page 37: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Randomized Sketching

An

d

Page 38: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Randomized Sketching

An

d

Sm SA= m

d

Page 39: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Randomized Least Squares Solvers

I A : n ⇥ d feature matrix, and y : n ⇥ 1 response vector

I Original problem OPT = minx2C

kAx � yk2| {z }I Randomized approximation min

x2CkAx � yk2| {z }

I A and y are smaller approximations

Page 40: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

QR decomposition

I The Gram–Schmidt process takes a finite, linearly independentset of vectors v1, ..., vn 2 Rd generates an orthogonal setu1, ..., uk 2 Rd that spans the same n-dimensional subspace.

Page 41: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

QR decomposition

I The Gram–Schmidt process takes a finite, linearly independentset of vectors v1, ..., vn 2 Rd generates an orthogonal setu1, ..., uk 2 Rd that spans the same n-dimensional subspace.

I complexity O(dn2)

I randomized algorithm complexity ⇡ O(dn)

produces an approximately orthogonal basis

Page 42: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Low-rank matrix approximations

I Singular Value Decomposition (SVD)

I A = U⌃V T

I takes O(nd2) time for A 2 Rn⇥d

I best rank-k approximation is Ak := Uk⌃kVTk =

Pki=1 �iuiv

Ti

I kA� Akk2 �k+1

Page 43: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Randomized low-rank matrix approximations

I Randomized (SVD)

I approximation C (e.g. a subset of the columns of A)

I AAT ⇡ CC

T

I Ak = CC†A is a randomized rank-k approximation

I kA� Akk22 �2k+1 + ✏kAk22

Page 44: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Iterative Methods

I Gradient descent and momentum acceleration

I Iterative sketching methods

I Conjugate gradient

I Preconditioning

I Sparse linear systems

I Stochastic Gradient Descent

I Variance reduction

I Adaptive gradient methods: Adagrad, ADAM

Page 45: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Newton’s Method

minx2C

g(x)

xt+1 = argmin

x2Chrg(x t), x � x

ti+ 1

2(x � x

t)Tr2g(x t)(x � x

t)

Page 46: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 47: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

0OE

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 48: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 49: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 50: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 51: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 52: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 53: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 54: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 55: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

0OE��

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 56: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

BGGJOF�JOWBSJBOU

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 57: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

0OE

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 58: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Gradient Descent vs Newton’s Method

time (hours)

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 59: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Randomized Newton’s Method

minx2C

g(x)

xt+1 = argmin

x2Chrg(x t), x � x

ti+ 1

2(x � x

t)T r2g(x t)(x � x

t)

I r2g(x t) ⇡ r2

g(x t) is an approximate Hessian

Page 60: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Randomized Newton’s Method

minx2C

g(x)

xt+1 = argmin

x2Chrg(x t), x � x

ti+ 1

2(x � x

t)T r2g(x t)(x � x

t)

I r2g(x t) ⇡ r2

g(x t) is an approximate Hessian

Diagonal, subsampled, low-rank approximations yield

I Adagrad, ADAM

I Stochastic Variance Reduced Gradient (SVRG)

I Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm

Page 61: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Linear Programming

I LP in standard form where A 2 Rn⇥d

minAxb

cTx

I Log barrier

minx µcT x �nX

i=1

log(bi � aTi x)

I Hessian ATdiag

⇣1

(bi�aTi x)2

⌘A takes O(nd2) operations

Page 62: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Linear Programming

I LP in standard form where A 2 Rn⇥d

minAxb

cTx

I Log barrier

minx µcT x �nX

i=1

log(bi � aTi x)

I Hessian ATdiag

⇣1

(bi�aTi x)2

⌘A takes O(nd2) operations

Page 63: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Exact Newton

Xn

i=1

log(bi���aiT�x)

c�

cT�x�¦

min� cT�x

Axb

µ

Page 64: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Interior Point Methods for Linear Programming

I Hessian of f (x) = cTx �

Pni=1 log(bi � a

Ti x)

r2f (x) = A

Tdiag

✓1

(bi � aTi x)2

◆A ,

I Root of the Hessian

(r2f (x))1/2 = diag

✓1

|bi � aTi x |

◆A ,

I Sketch of the Hessian

St(r2

f (x))1/2 = Stdiag

✓1

|bi � aTi x |

◆A

takes O(md2) operations

Page 65: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Interior Point Methods for Linear Programming

I Hessian of f (x) = cTx �

Pni=1 log(bi � a

Ti x)

r2f (x) = A

Tdiag

✓1

(bi � aTi x)2

◆A ,

I Root of the Hessian

(r2f (x))1/2 = diag

✓1

|bi � aTi x |

◆A ,

I Sketch of the Hessian

St(r2

f (x))1/2 = Stdiag

✓1

|bi � aTi x |

◆A

takes O(md2) operations

Page 66: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Interior Point Methods for Linear Programming

I Hessian of f (x) = cTx �

Pni=1 log(bi � a

Ti x)

r2f (x) = A

Tdiag

✓1

(bi � aTi x)2

◆A ,

I Root of the Hessian

(r2f (x))1/2 = diag

✓1

|bi � aTi x |

◆A ,

I Sketch of the Hessian

St(r2

f (x))1/2 = Stdiag

✓1

|bi � aTi x |

◆A

takes O(md2) operations

Page 67: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Exact NewtonNewton Sketch

Xn

i=1

log(bi���aiT�x)

c�

cT�x�¦�

min� cT�x

Axb

Page 68: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Trial 1 Trial 2 Trial 3

Exact NewtonNewton Sketch

(a) sketch size m = d

(b) sketch size m = 4d

Trial 1 Trial 2 Trial 3

Exact NewtonNewton Sketch

(a) sketch size m = d

(b) sketch size m = 4d

Page 69: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Trial 1 Trial 2 Trial 3

Exact NewtonNewton Sketch

(a) sketch size m = d

Trial 1 Trial 2 Trial 3

Exact NewtonNewton Sketch

(b) sketch size m = 4d

Trial 1 Trial 2 Trial 3

Exact NewtonNewton Sketch

(a) sketch size m = d

(b) sketch size m = 4d

Page 70: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

High dimensional problems n ⌧ d

I xt+1 = Axt + But , t = 1, ...,T

I minimum fuel control from 0 ! xf

minu

kuk1

s.t. [ B AB A2B · · · ]u = xf

I nT decision variables

I We can apply sampling and sketching for the variablesu 2 RnT

I Basic idea: dual linear program has nT constraints

Page 71: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

High dimensional problems n ⌧ d

I xt+1 = Axt + But , t = 1, ...,T

I minimum fuel control from 0 ! xf

minu

kuk1

s.t. [ B AB A2B · · · ]u = xf

I nT decision variables

I We can apply sampling and sketching for the variablesu 2 RnT

I Basic idea: dual linear program has nT constraints

Page 72: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Kernel methodsI Kernel matrices

given data points x1, ..., xn 2 Rd

e.g., Gaussian kernel Kij = e� 1

�2 kxi�xjk22

I large n ⇥ n square matrices

Page 73: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Kernel methodsI Kernel matrices

given data points x1, ..., xn 2 Rd

e.g., Gaussian kernel Kij = e� 1

�2 kxi�xjk22

I large n ⇥ n square matrices

Kn

S SK=n

log 𝑛n

Page 74: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Large Graphs

I Adjacency matrix or Laplacian

I Examples: a gene network and a co-authorship network graph

Page 75: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Sampling Graphs

I Random sampling graphs

Page 76: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Non-convex Optimization Problems

I In general, very di�cult to solve globally

I Need to make further assumptions

Page 77: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Non-convex Optimization Problems

minx

nX

i=1

(fx(ai )� yi )2

Page 78: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Non-convex Optimization Problems

minx

nX

i=1

(fx(ai )� yi )2

! Heuristic: Gauss-Newton method

xt+1 = argminx

k fxt (A) + Jtx| {z }Taylor’s approx for fx

�yk22

where (Jt)ij =@@xj

fx(ai ) is the Jacobian matrix

I Jacobian can be sampled for faster computations

Page 79: EE270 Large scale matrix computation, …web.stanford.edu/class/ee270/Lecture1.pdfI Project: I You will be asked to form groups of about 1-2 people and choose a topic I I will suggest

Questions?