machine learning for mathematicians€¦ · machine learning for mathematicians data science data...
TRANSCRIPT
![Page 1: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/1.jpg)
Machine Learning for Mathematicians
Machine Learning for Mathematicians
Daniel Mckenzie
Tuesday March 6th, 2018
![Page 2: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/2.jpg)
Machine Learning for Mathematicians
Why should we care about Machine Learning
1 Necessary for non-academic jobs.
2 Can be useful in your research.
3 Your (future) students will need to know about it.
![Page 3: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/3.jpg)
Machine Learning for Mathematicians
An outline of this talk
1 Disambiguation of buzzwords.
2 Simple (yet effective) approaches.
3 Deep approaches.
4 Survey of applications.
5 Current research trends.
![Page 4: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/4.jpg)
Machine Learning for Mathematicians
What do we mean by Data?
1 Could be images, audio signals, stock prices, results of surveysetc.
2 Can always vectorize
Example (Greyscale Images)
Suppose each Ia is a 28× 28 array of pixel values. Each pixel valueis a number between 0 and 256 with 0 =black and 256 =white.Can think of each Ia as a 28× 28 matrix Aa
ij . Can make this avector by stacking: xa = [A11, . . . ,A1,28,A2,1, . . . ,A2,28, . . . ,A28,28]
More sophisticated approaches uses Fourier Transform orWavelets (Applied Harmonic Analysis). Each entry of vector iscalled a feature.
3 Three V’s of Big Data: Variety, Volume and Velocity.
![Page 5: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/5.jpg)
Machine Learning for Mathematicians
Data Science, Machine Learning, and Artificial Intelligence1
1 Data Science: Produce insights from data for humans.
2 Machine Learning: Find a function f that predicts y frominput x . Eg f (Image) = cat. How f is doing this is oftenunclear.
3 Artificial Intelligence: Produce or recommend an actionfrom data. Eg AlphaGo, self-driving cars.
Caution: Distinction between ‘general’ AI (long wayoff/impossible) and ‘single purpose’ AI (AlphaGo, self-driving cars).
1Robinson 2018.
![Page 6: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/6.jpg)
Machine Learning for Mathematicians
Data Science
Data Scientists use
1 Statistical know how to ‘wrangle’ complex data in a variety offormats into a clean, usable (vectorized) data set X .
2 Algorithms (Regression, Data Clustering, Neural Networksetc) to extract insights from X . E.g.: identify a trend/correlation, find outliers (fraud prevention), computequantities of interest (likelihood of certain type of consumerto renew cable contract).
3 Domain-specific knowledge to evaluate appropriateness of theabove.
to produce easily interpretable summaries (pie charts, reports,visualizations) to inform decision-making of other parties(management, sales team, R& D, government).
![Page 7: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/7.jpg)
Machine Learning for Mathematicians
(Supervised) Machine Learning
1 Model Problem: Identify people from pictures.
2 Key assumption: Let D be domain of interest (e.g. allpossible 28× 28 pictures). Let C be codomain of interest(e.g. the names of people we wish to identify). We assumethere exists a continuous function f ∗ : D→ C mapping allphotos of Dan to ‘Dan’ ∈ C.
3 Goal of Machine Learning: function approximation. Find anapproximation f # to f ∗.
4 Learning f #: Given training set X = {x1, . . . , xn} ⊂ D andknown labels Y = {y1, . . . , yn} ⊂ C. Find function f # suchthat f #(xi ) ≈ yi for all i .
Caution: Generalizability very important. Need to be confidentthat given x /∈ X f #(x) ≈ f ∗(x)
![Page 8: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/8.jpg)
Machine Learning for Mathematicians
Artificial Intelligence
This slide intentionally left blank
![Page 9: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/9.jpg)
Machine Learning for Mathematicians
Simple Approaches to Machine Learning2
1 Let P be a class of ‘easy functions’ (e.g. piecewisepolynomial). Find f # as:
f # = argmin{L(f ,X ) : f ∈ P}
Think L(f ,X ) =∑n
i=1 ‖f (xi )− yi‖2. Regression, Splines,Finite Elements.
2 K -Nearest Neighbours. LetNK (x) = {xi1 , . . . , xiK nearest to x}. Compute
f #(x) = 1K
∑Kj=1 yij .
3 Support Vector Machine.
4 Decision trees.
2Goodfellow et al. 2016.
![Page 10: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/10.jpg)
Machine Learning for Mathematicians
Simple Approaches: Logistic Regression
Suppose |C| = 2 e.g. C = {‘Dan’, ’Not Dan’}. Define sigmoid/logistic function g(u) = 1/ (1 + e−u). Look for f # of the form:
fw(x) =
{‘Dan’ if g(w>x) ≈ 1
‘not Dan’ if g(w>x) ≈ 0
That is, f # = argmin{L(fw,X ) : w ∈ Rn}. Can think ofz = g(w>x) as probability that the image contains Dan.
Figure: Schematic depiction of Perceptron
![Page 11: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/11.jpg)
Machine Learning for Mathematicians
(Shallow) Neural Networks
Essentially iterated Logistic Regression:
Figure: Schematic depiction of 2-layer Neural Network
![Page 12: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/12.jpg)
Machine Learning for Mathematicians
(Shallow) Neural Networks cont.
Notation: fW denotes Neural Network with weightsW = {w1, . . . ,w5}. fW(x) = z.Typically, z1 = probability x in class 1, z2 = probability x in class 2.Architecture: Choice of number of layers and neurons per layer.Activation function: g . Many other choices, but must benon-linear!.These layers are fully connected.Need to find good W. will vectorize: w = [w1,w2, . . . ,w5].Need to solve f # = argmin{L(fw,X ) : w ∈ Rn1 × Rn2 × . . .Rn5}
![Page 13: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/13.jpg)
Machine Learning for Mathematicians
Gradient Descent
1 The problem: Find minimum of F : Rm → R. Can assumethat F is differentiable.
2 Know that −∇F (w) ⊂ Rm points in direction of steepestdecrease of F at x.
3 Gradient Descent Algorithm: wk+1 = wk − ε∇F (wk).
4 For Neural Networks: F (w) = L(fw,X ) (Think:L(fw,X ) =
∑ni=1 ‖fw(xi )− yi‖2). Randomly initialize w0.
Compute wk+1 using gradient descent until ‘good enough’.
5 Issue 1: Computing ∇L can be costly (typically useStochastic Gradient Descent).
6 Issue 2: L is usually (highly) non-convex. No guarantee thatGradient Descent will converge.
![Page 14: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/14.jpg)
Machine Learning for Mathematicians
Skills necessary for ML
For Undergrads
1 Coursework: Multivariable calculus, Linear Algebra, NumericalAnalysis, Probability.
2 Online resources: http://cs229.stanford.edu/,https://www.coursera.org/learn/machine-learning
Additional resources for Grads
1 Coursework: Harmonic Analysis, Image Processing, Statistics.
2 Some programming.
3 Deep learning book: http://www.deeplearningbook.org/
4 18.657: Graduate Course on Mathematics of MachineLearning taught at MIT (all materials/lecture notes availableonline)
5 Blogs: http://nuit-blanche.blogspot.com/
![Page 15: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/15.jpg)
Machine Learning for Mathematicians
Deep Neural Networks
1 Key Insight: Vectorizing/ feature extraction is the mostimportant step.
2 Many techniques from Applied Harmonic Analysis (e.g.Wavelets, Curvelets,. . . ) could be used.
3 Deep Learning: Use many convolutional layers to extractgood, problem specific features. Then use a few, fullyconnected layers to classify.
4 Hinton, Osindero, and Teh 2006 3 was first to show this wasfeasible.
5 Krizhevsky, Sutskever, and Hinton 2012 presented a Deep NNhalving previous error rate for image classification
6 Key Drivers of DL: Increased processing power (GPU’s).Large training sets (sourced from the internet).
3Geoff Hinton is the great-great-grandson of George Boole, inventor ofBoolean logic.
![Page 16: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/16.jpg)
Machine Learning for Mathematicians
Deep Neural Networks 4
4Figure from:https://developingideas.me/deepneuralnetworkoverview/
![Page 17: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/17.jpg)
Machine Learning for Mathematicians
Prototypical Applications of Machine Learning
1 Handwritten Digit Classification State-of-the-art algorithmsare > 99.75% accurate.
2 Automated Captioning: Given an image, algorithm shouldoutput brief sentence describing what is going on .
3 Natural Language Processing: Alexa, Siri et. al.Sentiment Analysis.
Figure: First two pictures from Karpathy and Fei-Fei 2015
![Page 18: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/18.jpg)
Machine Learning for Mathematicians
Current Research Trends
1 Dealing with Data scarcity.
2 Regularization and priors.
3 Transfer Learning: Getting a neural network trained to do onething (e.g. play ‘Pong’) to learn to do another thing quickly(e.g. play ‘Seaquest’) (see Fernando et al. 2017).
4 What the hell is actually going on here? Still not clear howdeep neural networks do what they do. This leaves themsusceptible to manipulation (adversarial attacks).
![Page 19: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/19.jpg)
Machine Learning for Mathematicians
An Adversarial Patch5
5Brown et al. 2017.
![Page 20: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/20.jpg)
Machine Learning for Mathematicians
Applications to Mathematics: Data Driven DynamicalSystems6
1 For many physical/ biological systems of interest:x(t) = f (x(t)).
2 Can usually collect historical data via observation:X = {x(t1), x(t2), . . . , x(tn)} andY = {x(t1), x(t2), . . . , x(tn)}
3 Model: Assume that f (x(t)) is a sparse linear combinationof elementary functions ϕ1, . . . , ϕN (e.g. polynomials, trig.functions etc)
4 Use Machine Learning to find an optimal f # =∑N
i=1 aiϕi .(Strong connections with Compressive Sensing).
6Brunton, Proctor, and Kutz 2016.
![Page 21: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/21.jpg)
Machine Learning for Mathematicians
Application to Mathematics: Predicting Hodge Numbers
1 Let WP4 be weighted projective space.
2 Large finite number of 3-dim Calabi Yau Ma ⊂WP4. Eachcut out by a degree w =
∑4i=0 wi homogeneous polynomial.
3 Of interest to string theorists to compute Hodge numbers hi ,j
4 To each Ma associate the data vector xa of coefficients ofdefining polynomial7.
5 Training set: X = {x1, . . . , xm} andY = {h2,1(M1), . . . , h2,1(Mm)}.
6 In (He 2017), Neural Network was trained in above dataset topredict whether h2,1(M) large ( > 50) or not large (≤ 50)given M.
7 They report 94.4% accuracy on unseen data.
7Plus possibly some side info like χ
![Page 22: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/22.jpg)
Machine Learning for Mathematicians
Thanks! Any questions?
Figure: Neural Style Transfer from Johnson, Alahi, and Fei-Fei 2016
![Page 23: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/23.jpg)
Machine Learning for Mathematicians
References I
Brown, Tom B et al. (2017). “Adversarial patch”. In: arXivpreprint arXiv:1712.09665.
Brunton, Steven L, Joshua L Proctor, and J Nathan Kutz (2016).“Discovering governing equations from data by sparseidentification of nonlinear dynamical systems”. In: Proceedingsof the National Academy of Sciences 113.15, pp. 3932–3937.
Fernando, Chrisantha et al. (2017). “Pathnet: Evolution channelsgradient descent in super neural networks”. In: arXiv preprintarXiv:1701.08734.
Goodfellow, Ian et al. (2016). Deep learning. Vol. 1. MIT pressCambridge.
He, Yang-Hui (2017). “Deep-learning the landscape”. In: arXivpreprint arXiv:1706.02714.
![Page 24: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/24.jpg)
Machine Learning for Mathematicians
References II
Hinton, Geoffrey E, Simon Osindero, and Yee-Whye Teh (2006).“A fast learning algorithm for deep belief nets”. In: Neuralcomputation 18.7, pp. 1527–1554.
Johnson, Justin, Alexandre Alahi, and Li Fei-Fei (2016).“Perceptual losses for real-time style transfer andsuper-resolution”. In: European Conference on ComputerVision. Springer, pp. 694–711.
Karpathy, Andrej and Li Fei-Fei (2015). “Deep visual-semanticalignments for generating image descriptions”. In: Proceedingsof the IEEE conference on computer vision and patternrecognition, pp. 3128–3137.
![Page 25: Machine Learning for Mathematicians€¦ · Machine Learning for Mathematicians Data Science Data Scientists use 1 Statistical know how to ‘wrangle’ complex data in a variety](https://reader034.vdocument.in/reader034/viewer/2022042521/5f4a11bf91bb81620f672a89/html5/thumbnails/25.jpg)
Machine Learning for Mathematicians
References III
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton (2012).“Imagenet classification with deep convolutional neuralnetworks”. In: Advances in neural information processingsystems, pp. 1097–1105.
Robinson, David (2018). What’s the difference between datascience, machine learning, and artificial intelligence?http://varianceexplained.org. Blog.