![Page 1: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/1.jpg)
DASFAA 2015 Hanoi Tutorial
Scalable Learning Technologies for Big Data Mining
Gerard de Melo, Tsinghua University (http://gerard.demelo.org)
Aparna Varde, Montclair State University (http://www.montclair.edu/~vardea/)
![Page 2: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/2.jpg)
Big Data
Image: Caixin
Image: Corbis
Alibaba: 31 million orders per day! (2014)
![Page 3: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/3.jpg)
Big Data on the Web
Source: Coup Media 2013
![Page 4: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/4.jpg)
Big Data on the Web
Source: Coup Media 2013
![Page 5: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/5.jpg)
From Big Data to Knowledge
Image: Brett Ryder
![Page 6: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/6.jpg)
Learning from Data

|           | Previous Knowledge | Preparation Time | Passed Exam? |
|-----------|--------------------|------------------|--------------|
| Student 1 | 80%                | 48h              | Yes          |
| Student 2 | 50%                | 75h              | Yes          |
| Student 3 | 95%                | 24h              | Yes          |
| Student 4 | 60%                | 24h              | No           |
| Student 5 | 80%                | 10h              | No           |
![Page 7: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/7.jpg)
Learning from Data

|           | Previous Knowledge | Preparation Time | Passed Exam? |
|-----------|--------------------|------------------|--------------|
| Student 1 | 80%                | 48h              | Yes          |
| Student 2 | 50%                | 75h              | Yes          |
| Student 3 | 95%                | 24h              | Yes          |
| Student 4 | 60%                | 24h              | No           |
| Student 5 | 80%                | 10h              | No           |
| Student A | 30%                | 20h              | ?            |
![Page 8: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/8.jpg)
Learning from Data

|           | Previous Knowledge | Preparation Time | Passed Exam? |
|-----------|--------------------|------------------|--------------|
| Student 1 | 80%                | 48h              | Yes          |
| Student 2 | 50%                | 75h              | Yes          |
| Student 3 | 95%                | 24h              | Yes          |
| Student 4 | 60%                | 24h              | No           |
| Student 5 | 80%                | 10h              | No           |
| Student A | 30%                | 20h              | ?            |
| Student B | 80%                | 45h              | ?            |
![Page 9: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/9.jpg)
Machine Learning

[Diagram: data with or without labels feeds unsupervised or supervised learning, producing a classifier model; the model then makes predictions, i.e. labels for test data (e.g. "Probably spam!").]
![Page 10: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/10.jpg)
Data Mining

[Diagram: the data mining pipeline. Data acquisition turns the world into raw data; preprocessing and feature engineering produce data with or without labels; unsupervised or supervised learning yields a classifier model, and unsupervised or supervised analysis yields analysis results; prediction and visualization lead to the use of new knowledge.]
![Page 11: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/11.jpg)
Problem with Classic Methods: Scalability
http://www.whistlerisawesome.com/wp-content/uploads/2011/12/drinking-from-firehose.jpg
![Page 12: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/12.jpg)
Scaling Up
![Page 13: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/13.jpg)
Scaling Up: More Features

|           | Previous Knowledge | Preparation Time | Passed Exam? |
|-----------|--------------------|------------------|--------------|
| Student 1 | 80%                | 48h              | Yes          |
| Student 2 | 50%                | 75h              | Yes          |
| Student 3 | 95%                | 24h              | Yes          |
| Student 4 | 60%                | 24h              | No           |
| Student 5 | 80%                | 10h              | No           |
![Page 14: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/14.jpg)
Scaling Up: More Features

|           | Previous Knowledge | Preparation Time | ... | Passed Exam? |
|-----------|--------------------|------------------|-----|--------------|
| Student 1 | 80%                | 48h              | ... | Yes          |
| Student 2 | 50%                | 75h              | ... | Yes          |
| Student 3 | 95%                | 24h              | ... | Yes          |
| Student 4 | 60%                | 24h              | ... | No           |
| Student 5 | 80%                | 10h              | ... | No           |

For example: words and phrases mentioned in the exam response, Facebook likes, websites visited, user interaction details in online learning.

Could be many millions!
![Page 15: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/15.jpg)
Scaling Up: More Features

|           | Previous Knowledge | Preparation Time | ... | Passed Exam? |
|-----------|--------------------|------------------|-----|--------------|
| Student 1 | 80%                | 48h              | ... | Yes          |
| Student 2 | 50%                | 75h              | ... | Yes          |
| Student 3 | 95%                | 24h              | ... | Yes          |
| Student 4 | 60%                | 24h              | ... | No           |
| Student 5 | 80%                | 10h              | ... | No           |

Classic solution: Feature Selection
![Page 16: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/16.jpg)
Scaling Up: More Features

|           | Previous Knowledge | Preparation Time | ... | Passed Exam? |
|-----------|--------------------|------------------|-----|--------------|
| Student 1 | 80%                | 48h              | ... | Yes          |
| Student 2 | 50%                | 75h              | ... | Yes          |
| Student 3 | 95%                | 24h              | ... | Yes          |
| Student 4 | 60%                | 24h              | ... | No           |
| Student 5 | 80%                | 10h              | Σ   | No           |

Scalable solution: buckets with sums of original features (e.g. "Clicked on http://physics..." and "Clicked on http://icsi.berkeley..." summed into the same bucket).
![Page 17: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/17.jpg)
Scaling Up: More Features

|           | F0  | F1  | F2  | F3  | ... | Fn  | Passed Exam? |
|-----------|-----|-----|-----|-----|-----|-----|--------------|
| Student 1 | ... | ... | ... | ... | ... | ... | Yes          |
| Student 2 | ... | ... | ... | ... | ... | ... | Yes          |
| Student 3 | ... | ... | ... | ... | ... | ... | Yes          |
| Student 4 | ... | ... | ... | ... | ... | ... | No           |
| Student 5 | ... | ... | ... | ... | ... | ... | No           |

Feature Hashing: use a fixed feature dimensionality n. Hash each original feature ID (e.g. "Clicked on http://...") to a bucket number in 0 to n-1, normalize the features, and use bucket-wise sums.

The small loss of precision is usually trumped by the big gains from being able to use more features.
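The hashing trick above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the feature IDs are hypothetical, and MD5 merely stands in for whatever fast hash function a real system would use.

```python
import hashlib

def hash_features(raw_features, n=16):
    """Map arbitrarily many string feature IDs into a fixed
    n-dimensional vector, summing values that share a bucket."""
    vec = [0.0] * n
    for feature_id, value in raw_features.items():
        # Stable hash of the feature ID -> bucket number in 0..n-1
        bucket = int(hashlib.md5(feature_id.encode()).hexdigest(), 16) % n
        vec[bucket] += value
    return vec

# Hypothetical click features; any number of distinct feature IDs
# still yields a vector of exactly length n.
clicks = {"Clicked on http://physics.example.org": 1.0,
          "Clicked on http://icsi.berkeley.edu": 1.0}
x = hash_features(clicks, n=16)
```

Two feature IDs that hash to the same bucket simply have their values summed, which is the small precision loss the slide mentions.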
![Page 18: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/18.jpg)
Scaling Up: More Training Examples
Banko & Brill (2001): Word confusion experiments(e.g. “principal” vs. “principle”)
![Page 19: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/19.jpg)
Scaling Up: More Training Examples

Banko & Brill (2001): word confusion experiments (e.g. "principal" vs. "principle")

More data often trumps better algorithms.

Alon Halevy, Peter Norvig, Fernando Pereira (2009). The Unreasonable Effectiveness of Data.
![Page 20: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/20.jpg)
Scaling Up: More Training Examples
Léon Bottou. Learning with Large Datasets Tutorial.Text Classification experiments
![Page 21: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/21.jpg)
Background: Stochastic Gradient Descent

Images: http://en.wikipedia.org/wiki/File:Hill_climb.png
http://en.wikipedia.org/wiki/Hill_climbing#mediaviewer/File:Local_maximum.png

Move towards the optimum by approximating the gradient based on 1 (or a small batch of) random training examples.

The stochastic nature may help us escape local optima.

Improved variants: AdaGrad (Duchi et al. 2011), AdaDelta (Zeiler et al. 2012).
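The per-example update can be shown on a toy problem. The following sketch fits a one-parameter least-squares model with one random example per step; the data, learning rate, and step count are all made up for illustration.

```python
import random

def sgd(data, lr=0.1, steps=100, seed=0):
    """Minimal SGD for 1-D least squares: fit w so that w*x ~ y,
    using the gradient of a single random example per step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)           # 1 random training example
        grad = 2 * (w * x - y) * x        # d/dw of the loss (w*x - y)^2
        w -= lr * grad                    # step against the gradient
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # generated by y = 2x
w = sgd(data)
```

Each step uses a noisy but cheap gradient estimate; averaged over many steps, w still converges towards the optimum (here w = 2).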
![Page 22: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/22.jpg)
Scaling Up: More Training Examples

Recommended tool: Vowpal Wabbit, by John Langford et al. http://hunch.net/~vw/
![Page 23: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/23.jpg)
Scaling Up: More Training Examples

Parallelization? Use a lock-free approach to updating the weight vector components (HogWild! by Niu, Recht, et al.).
![Page 24: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/24.jpg)
Problem: Where to Get Training Examples?

Labeled data is expensive!
● Penn Chinese Treebank: 2 years for 4,000 sentences
● Adaptation is difficult
● Wall Street Journal ≠ Novels ≠ Twitter
● For speech recognition, we ideally need training data for each domain, voice/accent, microphone, microphone setup, social setting, etc.

http://en.wikipedia.org/wiki/File:Chronic_fatigue_syndrome.JPG
![Page 25: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/25.jpg)
Semi-Supervised Learning
![Page 26: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/26.jpg)
Semi-Supervised Learning

● Goal: when learning a model, use unlabeled data in addition to labeled data
● Example: cluster-and-label
● Run a clustering algorithm on labeled and unlabeled data
● Assign the cluster's majority label to the unlabeled examples of every cluster

Image: Wikipedia
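Cluster-and-label can be sketched concretely. The toy below runs plain k-means on 1-D points (both labeled and unlabeled), then assigns each unlabeled point the majority label of its cluster; the data and the naive initialization are invented for illustration.

```python
def cluster_and_label(labeled, unlabeled, k=2, iters=10):
    """Toy cluster-and-label on 1-D points: k-means over labeled and
    unlabeled data, then give each unlabeled point the majority label
    of its cluster."""
    points = [x for x, _ in labeled] + list(unlabeled)
    centers = points[:k]                          # naive initialization
    for _ in range(iters):                        # plain k-means
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: abs(p - centers[c]))].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    nearest = lambda p: min(range(k), key=lambda c: abs(p - centers[c]))
    majority = {}
    for c in range(k):                            # majority label per cluster
        votes = [y for x, y in labeled if nearest(x) == c]
        majority[c] = max(set(votes), key=votes.count) if votes else None
    return {p: majority[nearest(p)] for p in unlabeled}

labels = cluster_and_label([(0.0, "pass"), (10.0, "fail")],
                           [0.5, 1.0, 9.0, 9.5])
```

The unlabeled points near 0 inherit the "pass" label, those near 10 inherit "fail".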
![Page 27: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/27.jpg)
Semi-Supervised Learning

● Bootstrapping or Self-Training (e.g. Yarowsky 1995)
– Use the classifier to label the unlabeled examples
– Add the labels with the highest confidence to the training data and re-train
– Repeat
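A minimal self-training loop, using a toy 1-D nearest-centroid classifier and the margin between the two centroids as a stand-in confidence score (both are illustrative choices, not part of Yarowsky's original method):

```python
def self_train(labeled, unlabeled, rounds=3):
    """Toy self-training: each round, retrain a nearest-centroid
    classifier, then move the single most confidently classified
    point (largest distance margin) into the training data with
    its predicted label."""
    labeled = dict(labeled)
    pool = list(unlabeled)
    for _ in range(min(rounds, len(pool))):
        cents = {}
        for x, y in labeled.items():              # per-class centroid
            cents.setdefault(y, []).append(x)
        cents = {y: sum(xs) / len(xs) for y, xs in cents.items()}
        def margin(x):                            # confidence proxy
            d = sorted(abs(x - c) for c in cents.values())
            return d[1] - d[0]
        best = max(pool, key=margin)
        labeled[best] = min(cents, key=lambda y: abs(best - cents[y]))
        pool.remove(best)
    return labeled

out = self_train({0.0: "A", 10.0: "B"}, [1.0, 4.0, 9.0])
```

Easy points (1.0 and 9.0) are absorbed first; the ambiguous point (4.0) is only labeled once the centroids have been refined.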
![Page 28: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/28.jpg)
Semi-Supervised Learning

● Co-Training (Blum & Mitchell 1998)
● Given: multiple (ideally independent) views of the same data (e.g. the left context and the right context of a word)
● Learn separate models for each view
● Allow the different views to teach each other: model 1 can generate labels that will be helpful to improve model 2, and vice versa
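A toy rendering of the idea: every example is a pair of views, and two nearest-centroid models (one per view) take turns labeling the pooled example they are most confident about, so each newly labeled example retrains both models. All data and the centroid classifier are invented for illustration.

```python
def centroids(data, view):
    """Per-class centroid of one view of the labeled data."""
    cents = {}
    for views, y in data:
        cents.setdefault(y, []).append(views[view])
    return {y: sum(xs) / len(xs) for y, xs in cents.items()}

def co_train(labeled, pool, rounds=2):
    """Toy co-training: two models, one per view, alternately label
    the pooled example they are most confident about (largest
    distance margin); the new label teaches the other view too."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        for view in (0, 1):                       # the models alternate
            if not pool:
                break
            cents = centroids(labeled, view)
            def margin(v):
                d = sorted(abs(v[view] - c) for c in cents.values())
                return d[1] - d[0]
            best = max(pool, key=margin)
            y = min(cents, key=lambda k: abs(best[view] - cents[k]))
            labeled.append((best, y))             # teaches the other view too
            pool.remove(best)
    return labeled

out = co_train([((0.0, 0.0), "A"), ((10.0, 10.0), "B")],
               [(1.0, 2.0), (9.0, 8.0)])
```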
![Page 29: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/29.jpg)
Semi-Supervised Learning: Transductive Setting
Image: Partha Pratim Talukdar
![Page 30: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/30.jpg)
Semi-Supervised Learning: Transductive Setting
Image: Partha Pratim Talukdar
Algorithms: Label Propagation (Zhu et al. 2003), Adsorption (Baluja et al. 2008),Modified Adsorption (Talukdar et al. 2009)
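A stripped-down sketch in the spirit of Label Propagation (Zhu et al. 2003), on a tiny hand-made chain graph: seed nodes keep their label scores clamped, and every other node repeatedly averages its neighbors' scores. Real implementations use weighted graphs and convergence checks; this is only an illustration.

```python
def label_propagation(graph, seeds, iters=20):
    """Toy label propagation: seed nodes are clamped to their label,
    all other nodes iteratively take the average of their neighbors'
    per-label scores, then adopt the highest-scoring label."""
    labels = sorted(set(seeds.values()))
    score = {n: [1.0 if seeds.get(n) == l else 0.0 for l in labels]
             for n in graph}
    for _ in range(iters):
        score = {n: score[n] if n in seeds else
                 [sum(score[m][i] for m in nbrs) / len(nbrs)
                  for i in range(len(labels))]
                 for n, nbrs in graph.items()}
    return {n: labels[max(range(len(labels)), key=lambda i: s[i])]
            for n, s in score.items()}

# Chain a - b - c - d, with a and d as labeled seed nodes
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
out = label_propagation(graph, {"a": "X", "d": "Y"})
```

The unlabeled middle nodes b and c end up with the label of the nearer seed.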
![Page 31: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/31.jpg)
Distant Supervision

● Sentiment analysis:
● Look for Twitter tweets with emoticons like ":)" or ":("
● Remove the emoticons, then use the tweets as training data!

Crimson Hexagon
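The emoticon trick fits in a few lines. A sketch with made-up tweets: the emoticon supplies a noisy label and is then stripped, so a downstream classifier cannot simply memorize it.

```python
# Emoticons act as noisy, "distant" labels
EMOTICON_LABELS = {":)": "positive", ":(": "negative"}

def harvest_training_data(tweets):
    """Distant supervision for sentiment: keep tweets containing a
    known emoticon, label them by it, and remove the emoticon from
    the text before using it as a training example."""
    examples = []
    for tweet in tweets:
        for emo, label in EMOTICON_LABELS.items():
            if emo in tweet:
                examples.append((tweet.replace(emo, "").strip(), label))
                break
    return examples

data = harvest_training_data([
    "great game today :)",
    "stuck in traffic again :(",
    "no emoticon here",
])
```

Tweets without any known emoticon are simply discarded rather than guessed at.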
![Page 32: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/32.jpg)
Representation Learning to Better Exploit Big Data
![Page 33: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/33.jpg)
Representations
Image: David Warde-Farley via Bengio et al. Deep Learning Book
![Page 34: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/34.jpg)
Representations

Inputs as bits: 0011001…
Images: Marc'Aurelio Ranzato

Note the sharing between classes.
![Page 35: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/35.jpg)
Representations

Inputs as bits: 0011001…
Images: Marc'Aurelio Ranzato

Massive improvements in image object recognition (human-level?) and speech recognition. Good improvements in NLP and IR-related tasks.
![Page 36: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/36.jpg)
Example

Google's image recognition. Source: Jeff Dean, Google
![Page 37: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/37.jpg)
Inspiration: The Brain

Source: Alex Smola

Input: delivered via dendrites from other neurons.
Processing: synapses may alter the input signals; the cell then combines all input signals.
Output: if there is enough activation from the inputs, an output signal is sent through a long cable (the "axon").
![Page 38: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/38.jpg)
Perceptron

Input: features. Every feature fᵢ gets a weight wᵢ.

| feature   | weight |
|-----------|--------|
| dog       | 7.2    |
| food      | 3.4    |
| bank      | -7.3   |
| delicious | 1.5    |
| train     | -4.2   |

[Diagram: features f1..f4 feed into a neuron with weights w1..w4.]
![Page 39: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/39.jpg)
Perceptron

[Diagram: features f1..f4, weighted by w1..w4, feed into a neuron that produces an output.]

Activation of the neuron: multiply the feature values of an object x with the feature weights:

a(x) = Σᵢ wᵢ fᵢ(x) = wᵀ f(x)
![Page 40: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/40.jpg)
Perceptron

[Diagram: features f1..f4, weighted by w1..w4, feed into a neuron that produces an output.]

Output of the neuron: check whether the activation exceeds a threshold t = -b:

output(x) = g(wᵀ f(x) + b)

For example, g could return 1 if the activation is positive and -1 otherwise, e.g. 1 for "spam", -1 for "not spam".
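The perceptron above is directly executable. This sketch reuses the weight table from the earlier slide and a sign threshold for g; the input document is a made-up example.

```python
def perceptron_output(features, weights, bias=0.0):
    """Perceptron as on the slides: activation a(x) = w . f(x),
    output g(w . f(x) + b) with g = sign (threshold t = -b)."""
    activation = sum(weights[f] * v for f, v in features.items())
    return 1 if activation + bias > 0 else -1   # 1 = "spam", -1 = "not spam"

# Weight table from the slide
weights = {"dog": 7.2, "food": 3.4, "bank": -7.3, "delicious": 1.5, "train": -4.2}

# Hypothetical document containing the words "bank" and "train":
# activation = -7.3 + -4.2 = -11.5, so the output is -1 ("not spam")
label = perceptron_output({"bank": 1.0, "train": 1.0}, weights)
```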
![Page 41: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/41.jpg)
Decision Surfaces

Images: Vibhav Gogate

Decision Trees and Linear Classifiers (Perceptron, SVM) yield only straight decision surfaces, and the Perceptron is not max-margin. Kernel-based Classifiers (Kernel Perceptron, Kernel SVM) and the Multi-Layer Perceptron can produce any decision surface.
![Page 42: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/42.jpg)
Deep Learning: Multi-Layer Perceptron

[Diagram: features f1..f4 in the input layer feed hidden-layer neurons 1 and 2, which feed an output neuron: input layer, hidden layer, output layer.]
![Page 43: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/43.jpg)
Deep Learning: Multi-Layer Perceptron

[Diagram: the same network with an additional hidden-layer neuron: input layer, hidden layer, output layer.]
![Page 44: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/44.jpg)
Deep Learning: Multi-Layer Perceptron

[Diagram: the network extended with a second output neuron: input layer, hidden layer, and an output layer producing outputs 1 and 2.]
![Page 45: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/45.jpg)
Deep Learning: Multi-Layer Perceptron

Input layer (feature extraction): f(x)

Single-layer: output(x) = g(W f(x) + b)

Three-layer network: output(x) = g₂(W₂ g₁(W₁ f(x) + b₁) + b₂)

Four-layer network: output(x) = g₃(W₃ g₂(W₂ g₁(W₁ f(x) + b₁) + b₂) + b₃)
![Page 46: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/46.jpg)
Deep Learning: Multi-Layer Perceptron
![Page 47: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/47.jpg)
Deep Learning: Computing the Output

Simply evaluate the output function: for each node, compute an output based on the node's inputs.

[Diagram: inputs x1, x2 feed hidden nodes z1, z2, z3, which feed outputs y1, y2.]
![Page 48: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/48.jpg)
Deep Learning: Training

Compute the error on the output; if it is non-zero, take a stochastic gradient step on the error function to fix it.

Backpropagation: the error is propagated back from the output nodes towards the input layer.

[Diagram: inputs x1, x2 feed hidden nodes z1, z2, z3, which feed outputs y1, y2.]
![Page 49: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/49.jpg)
Deep Learning: Training

Compute the error on the output; if it is non-zero, take a stochastic gradient step on the error function to fix it. Backpropagation exploits the chain rule to compute the gradient, propagating the error back from the output nodes towards the input layer.

With y = f(x) and z = g(y):

∂z/∂x = (∂z/∂y) (∂y/∂x)

We are interested in the gradient, i.e. the partial derivatives of the output function z = g(y) with respect to all [inputs and] weights, including those at a deeper part of the network.
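The chain rule above can be verified numerically on a tiny composition. Here f and g are made-up scalar functions; the analytic chain-rule product is checked against a central finite difference, which is exactly the kind of sanity check used to debug backpropagation code.

```python
def f(x):  return x * x        # inner function: y = f(x)
def g(y):  return 3.0 * y      # outer function: z = g(y)
def df(x): return 2.0 * x      # dy/dx
def dg(y): return 3.0          # dz/dy

x = 2.0
# Chain rule: dz/dx = (dz/dy) * (dy/dx) = 3 * 2x = 12 at x = 2
chain = dg(f(x)) * df(x)
# Numerical check via central finite differences
eps = 1e-6
numeric = (g(f(x + eps)) - g(f(x - eps))) / (2 * eps)
```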
![Page 50: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/50.jpg)
DropOut Technique

Basic idea: while training, randomly drop inputs (set the feature to zero).

Effect: training on variations of the original training data (an artificial increase of the training data size). The trained network relies less on the existence of specific features.

Reference: Hinton et al. (2012)
Also: Maxout Networks by Goodfellow et al. (2013)
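The basic mechanism in a few lines. This sketch uses the common "inverted dropout" variant (a detail not on the slide): survivors are scaled by 1/(1-p) so each feature keeps its expected value and nothing special is needed at test time.

```python
import random

def dropout(features, p=0.5, seed=0):
    """Training-time dropout: zero each input with probability p and
    scale the survivors by 1/(1-p) ("inverted dropout"), so the
    expected value of every feature is unchanged."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v / (1 - p) for v in features]

# Each call with a different seed yields a different "variation"
# of the same training example.
dropped = dropout([1.0] * 1000, p=0.5)
```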
![Page 51: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/51.jpg)
Deep Learning: Convolutional Neural Networks
Image: http://torch.cogbits.com/doc/tutorials_supervised/
Reference: Yann LeCun's work
![Page 52: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/52.jpg)
Deep Learning: Recurrent Neural Networks
Source: Bayesian Behavior Lab, Northwestern University
![Page 53: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/53.jpg)
Deep Learning: Recurrent Neural Networks

Source: Bayesian Behavior Lab, Northwestern University

Then we can do backpropagation. Challenge: vanishing/exploding gradients.
![Page 54: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/54.jpg)
Deep Learning: Long Short-Term Memory Networks
Source: Bayesian Behavior Lab, Northwestern University
![Page 55: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/55.jpg)
Deep Learning: Long Short-Term Memory Networks

Deep LSTMs for sequence-to-sequence learning (Sutskever et al. 2014, Google)
![Page 56: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/56.jpg)
Deep Learning: Long Short-Term Memory Networks

French original: La dispute fait rage entre les grands constructeurs aéronautiques à propos de la largeur des sièges de la classe touriste sur les vols long-courriers, ouvrant la voie à une confrontation amère lors du salon aéronautique de Dubaï qui a lieu ce mois-ci.

LSTM's English translation: The dispute is raging between large aircraft manufacturers about the size of the tourist seats on the long-haul flights, leading to a bitter confrontation at the Dubai Airshow in the month of October.

Ground-truth English translation: A row has flared up between leading plane makers over the width of tourist-class seats on long-distance flights, setting the tone for a bitter confrontation at this month's Dubai Airshow.

Sutskever et al. 2014 (Google)
![Page 57: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/57.jpg)
Deep Learning: Neural Turing Machines
Source: Bayesian Behavior Lab, Northwestern University
![Page 58: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/58.jpg)
Deep Learning: Neural Turing Machines
Source: Bayesian Behavior Lab, Northwestern University
![Page 59: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/59.jpg)
Deep Learning: Neural Turing Machines
Source: Bayesian Behavior Lab, Northwestern University
![Page 60: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/60.jpg)
Deep Learning: Neural Turing Machines
Source: Bayesian Behavior Lab, Northwestern University
![Page 61: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/61.jpg)
Deep Learning: Neural Turing Machines
Source: Bayesian Behavior Lab, Northwestern University
![Page 62: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/62.jpg)
Deep Learning: Neural Turing Machines

Learning to sort! The vectors for the numbers are random.
![Page 63: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/63.jpg)
Big Data in Feature Engineering and Representation Learning
![Page 64: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/64.jpg)
Web Semantics: Statistics from Big Data as Features

● Language models for autocompletion
![Page 65: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/65.jpg)
Source: Wang et al. An Overview of Microsoft Web N-gram Corpus and Applications
Word Segmentation
![Page 66: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/66.jpg)
● NP Coordination
Source: Bansal & Klein (2011)
Parsing: Ambiguity
![Page 67: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/67.jpg)
Source: Bansal & Klein (2011)
Parsing: Web Semantics
![Page 68: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/68.jpg)
Adjective Ordering

● Lapata & Keller (2004): The Web as a Baseline (also: Bergsma et al. 2010)
● "big fat Greek wedding" but not "fat Greek big wedding"

Source: Shane Bergsma
![Page 69: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/69.jpg)
Source: Bansal & Klein 2012
Coreference Resolution
![Page 70: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/70.jpg)
Source: Bansal & Klein 2012
Coreference Resolution
![Page 71: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/71.jpg)
Distributional Semantics

● Data sparsity: e.g. most words are rare (in the "long tail") → missing in training data
● Solution (Blitzer et al. 2006, Koo & Collins 2008, Huang & Yates 2009, etc.):
● Cluster together similar features
● Use the clustered features instead of / in addition to the original features

Brown Corpus
Source: Baroni & Evert
![Page 72: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/72.jpg)
Even worse: Arnold Schwarzenegger
Spelling Correction
![Page 73: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/73.jpg)
Vector Representations
[Scatter plot: words such as petronia, sparrow, bird, arid, dry, parched placed as points in a vector space]
Put words into a vector space (e.g. with d=300 dimensions)
![Page 74: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/74.jpg)
Word Vector Representations
Tomas Mikolov et al. Proc. ICLR 2013.
Available from https://code.google.com/p/word2vec/
![Page 75: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/75.jpg)
Wikipedia
![Page 76: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/76.jpg)
● Exploit edit history, especially on Simple English Wikipedia
● “collaborate” → “work together”“stands for” → “is the same as”
Text Simplification
![Page 77: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/77.jpg)
Answering Questions
IBM's Jeopardy!-winning Watson system
Gerard de Melo
![Page 78: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/78.jpg)
Knowledge Integration
![Page 79: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/79.jpg)
UWN/MENTA
Multilingual extension of WordNet covering word senses and taxonomic information for over 200 languages
www.lexvo.org/uwn/
![Page 80: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/80.jpg)
WebChild: Common-Sense Knowledge
AAAI 2014, WSDM 2014, AAAI 2011
![Page 81: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/81.jpg)
Challenge: From Really Big Data to Real Insights
Image:Brett Ryder
![Page 82: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/82.jpg)
Big Data Mining in Practice
![Page 83: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/83.jpg)
Gerard de Melo (Tsinghua University, Beijing, China), Aparna Varde (Montclair State University, NJ, USA)
DASFAA, Hanoi, Vietnam, April 2015
![Page 84: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/84.jpg)
Dr. Aparna Varde
![Page 85: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/85.jpg)
Internet-based computing: shared resources, software & data provided on demand, like the electricity grid
Follows a pay-as-you-go model
![Page 86: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/86.jpg)
Several technologies, e.g., MapReduce & Hadoop
MapReduce: Data-parallel programming model for
clusters of commodity machines
• Pioneered by Google
• Processes 20 PB of data per day
Hadoop: Open-source framework, distributed
storage and processing of very large data sets
• HDFS (Hadoop Distributed File System) for storage
• MapReduce for processing
• Developed by Apache
![Page 87: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/87.jpg)
• Scalability
– To large data volumes
– Scan 100 TB on 1 node @ 50 MB/s = 24 days
– Scan on 1000-node cluster = 35 minutes
• Cost-efficiency – Commodity nodes (cheap, but unreliable)
– Commodity network (low bandwidth)
– Automatic fault-tolerance (fewer admins)
– Easy to use (fewer programmers)
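The 24-days-vs-35-minutes figures above follow from simple arithmetic; a quick sketch (assuming binary units, i.e. 1 TB = 2^40 bytes and 1 MB = 2^20 bytes):

```python
def scan_time_seconds(data_bytes, nodes, mb_per_sec=50):
    """Time to scan data_bytes across `nodes` machines at mb_per_sec each."""
    rate = nodes * mb_per_sec * 2**20  # aggregate bytes/second (1 MB = 2**20 bytes)
    return data_bytes / rate

TB = 2**40
single = scan_time_seconds(100 * TB, nodes=1)      # one node
cluster = scan_time_seconds(100 * TB, nodes=1000)  # 1000-node cluster

print(single / 86400)   # ~24.3 days
print(cluster / 60)     # ~35 minutes
```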
![Page 88: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/88.jpg)
Data type
key-value records
Map function
(Kin, Vin) → list(Kinter, Vinter)
Reduce function
(Kinter, list(Vinter)) → list(Kout, Vout)
![Page 89: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/89.jpg)
MapReduce Example
Input: “the quick brown fox” / “the fox ate the mouse” / “how now brown cow”
Map: each mapper emits one (word, 1) pair per word, e.g. (the, 1) (quick, 1) (brown, 1) (fox, 1)
Shuffle & Sort: pairs are grouped by key across all mappers
Reduce: each reducer sums the counts for its keys
Output: ate, 1; brown, 2; cow, 1; fox, 2; how, 1; mouse, 1; now, 1; quick, 1; the, 3
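The word-count dataflow can be simulated in a few lines of plain Python (a toy single-process sketch of the model, not the actual Hadoop API):

```python
from collections import defaultdict

def map_fn(_, line):                 # (Kin, Vin) -> list((Kinter, Vinter))
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):         # (Kinter, list(Vinter)) -> list((Kout, Vout))
    return [(word, sum(counts))]

def mapreduce(inputs, map_fn, reduce_fn):
    # Map phase
    intermediate = []
    for key, value in inputs:
        intermediate.extend(map_fn(key, value))
    # Shuffle & sort: group intermediate values by key
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return dict(output)

docs = [(0, "the quick brown fox"),
        (1, "the fox ate the mouse"),
        (2, "how now brown cow")]
counts = mapreduce(docs, map_fn, reduce_fn)
print(counts["the"])   # 3
```

In a real cluster the map and reduce calls run in parallel on different machines, and the shuffle moves data over the network; the logical dataflow is the same.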
![Page 90: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/90.jpg)
40 nodes/rack, 1000-4000 nodes per cluster
1 Gbps bandwidth within a rack, 8 Gbps out of the rack (rack switches feed an aggregation switch)
Node specs (Facebook): 8-16 cores, 32 GB RAM, 8×1.5 TB disks, no RAID
![Page 91: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/91.jpg)
Files split into 128 MB blocks
Blocks replicated across several data nodes (often 3)
Name node stores metadata (file names, block locations, etc.)
Optimized for large files and sequential reads
Files are append-only
[Diagram: name node mapping File1 to blocks 1-4, each block replicated on three data nodes]
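A toy sketch of the block/replica bookkeeping a name node performs (a hypothetical simplification: real HDFS placement is rack-aware and load-aware, and the function and node names here are made up):

```python
def place_blocks(file_size_bytes, datanodes, block_size=128 * 2**20, replicas=3):
    """Split a file into fixed-size blocks and assign each block to
    `replicas` distinct data nodes (round-robin; toy placement policy)."""
    num_blocks = -(-file_size_bytes // block_size)  # ceiling division
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % len(datanodes)]
                        for r in range(replicas)]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
meta = place_blocks(500 * 2**20, nodes)  # a 500 MB file -> 4 blocks
print(meta)
```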
![Page 92: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/92.jpg)
Hive: relational database layer on Hadoop, developed at Facebook
Provides an SQL-like query language
![Page 93: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/93.jpg)
Supports table partitioning, complex data types, sampling, some query optimization
These help discover knowledge via various tasks, e.g.:
• Search for relevant terms
• Operations such as word count
• Aggregates like MIN, AVG
![Page 94: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/94.jpg)
/* Find documents in the enron table with word frequencies between 75 and 80 */
SELECT DISTINCT D.DocID
FROM docword_enron D
WHERE D.count > 75 AND D.count < 80
LIMIT 10;
OK
1853 … 11578 16653
Time taken: 64.788 seconds
![Page 95: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/95.jpg)
/* Create a view with the count for WordID=90 and DocID=40 in the nips table */
CREATE VIEW Word_Freq AS
SELECT D.DocID, D.WordID, V.word, D.count
FROM docword_Nips D JOIN vocabNips V
  ON D.WordID = V.WordID AND D.DocId = 40 AND D.WordId = 90;
OK
Time taken: 1.244 seconds
![Page 96: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/96.jpg)
/* Find documents which use the word "rational" in the nips table */
SELECT D.DocID, V.word
FROM docword_Nips D JOIN vocabnips V
  ON D.wordID = V.wordID AND V.word = "rational"
LIMIT 10;
OK
434 rational
275 rational
158 rational
…
290 rational
422 rational
Time taken: 98.706 seconds
![Page 97: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/97.jpg)
/* Find the average frequency of all words in the enron table */
SELECT AVG(count)
FROM docWord_enron;
OK
1.728152608060543
Time taken: 68.2 seconds
![Page 98: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/98.jpg)
Query Execution Time for HQL & MySQL on big data sets
Similar claims for other SQL packages
![Page 99: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/99.jpg)
[Chart: server storage capacity and maximum storage per instance]
![Page 100: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/100.jpg)
![Page 101: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/101.jpg)
Hive supports rich data types: Map, Array & Struct; Complex types
It supports queries with SQL Filters, Joins, Group By, Order By etc.
Here is where (original) Hive users miss SQL:
No support in Hive to update data after insert
No (or little) support in Hive for relational semantics (e.g., ACID)
No "delete from" command in Hive - only bulk delete is possible
No concept of primary and foreign keys in Hive
![Page 102: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/102.jpg)
Ensure the dataset is already compliant with integrity constraints before load
Ensure that only compliant data rows are loaded, via SELECT & a temporary staging table
Check referential constraints using an equi-join and query only those rows that comply
![Page 103: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/103.jpg)
Providing more power than Hive, Hadoop & MR
![Page 104: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/104.jpg)
Cloudera’s Impala: more efficient, SQL-compliant analytic database
Hortonworks’ Stinger: driving the future of Hive with enterprise SQL at Hadoop scale
Apache Mahout: machine learning algorithms for Big Data
Spark: lightning-fast framework for Big Data
MLlib: machine learning library of Spark
MLbase: platform base for MLlib in Spark
![Page 105: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/105.jpg)
Fully integrated, state-of-the-art analytic database that leverages the flexibility & scalability of Hadoop
Combines the benefits of both:
• Hadoop: flexibility, scalability, cost-effectiveness
• SQL: performance, usability, semantics
![Page 106: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/106.jpg)
MPP: Massively Parallel Processing
![Page 107: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/107.jpg)
NDV: function for counting distinct values, e.g., on a table with 1 billion rows
COUNT(DISTINCT): precise answer, but slow for large-scale data
NDV(): approximate result, much faster
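The accuracy-for-speed trade that NDV() makes can be demonstrated with a minimal K-minimum-values distinct-count sketch (an illustration of the general idea; it is not Impala's actual algorithm, which is HyperLogLog-based):

```python
import hashlib

def kmv_estimate(values, k=256):
    """Estimate the number of distinct values from the k smallest hash
    fractions: if hashes are uniform in [0, 1), the k-th smallest is
    roughly k / (n + 1) for n distinct values."""
    hashes = set()
    for v in values:
        h = int(hashlib.md5(str(v).encode()).hexdigest(), 16) / 2**128
        hashes.add(h)                      # duplicates collapse here
    smallest = sorted(hashes)[:k]
    if len(smallest) < k:
        return len(smallest)               # fewer than k distinct: exact
    return int(k / smallest[-1]) - 1       # invert k/(n+1) ~ k-th smallest

data = [i % 1000 for i in range(100_000)]  # 100k rows, 1000 distinct values
print(kmv_estimate(data))                  # close to 1000
```

Only k hash values need to be kept per partition, and sketches from different nodes merge trivially, which is why approximate counting scales where COUNT(DISTINCT) does not.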
![Page 108: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/108.jpg)
Hardware configuration
• Generates less CPU load than Hive
• Typical performance gains: 3x-4x
• Impala cannot go faster than the hardware permits!
Query complexity
• Single-table aggregation queries: less gain
• Queries with at least one join: gains of 7x-45x
Main memory as cache
• When the data accessed by a query is in cache, the speedup is greater
• Typical gains with cache: 20x-90x
![Page 109: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/109.jpg)
No: many viable use cases remain for MR & Hive
• Long-running data transformation workloads & traditional DW frameworks
• Complex analytics on limited, structured data sets
Impala is a complement to these approaches
• Supports cases with very large data sets
• Especially for getting focused result sets quickly
![Page 110: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/110.jpg)
Drives the future of Hive with enterprise SQL at Hadoop scale
3 main objectives:
• Speed: sub-second query response times
• Scale: from GB to TB & PB
• SQL: transactions & SQL:2011 analytics for Hive
![Page 111: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/111.jpg)
Wider use cases with modifications to data
BEGIN, COMMIT, ROLLBACK for multi-stmt transactions
![Page 112: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/112.jpg)
Hybrid engine with LLAP (Live Long and Process)
• Caching & data reuse across queries
• Multi-threaded execution
• High-throughput I/O
• Granular column-level security
![Page 113: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/113.jpg)
Common Table Expressions
Sub-queries: correlated & uncorrelated
Rollup, Cube, Standard Aggregates
Inner, Outer, Semi & Cross Joins
Non Equi-Joins
Set Functions: Union, Except & Intersect
Most sub-queries, nested and otherwise
![Page 114: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/114.jpg)
![Page 115: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/115.jpg)
Supervised and Unsupervised Learning Algorithms
![Page 116: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/116.jpg)
ML algorithms on distributed frameworks, well suited for mining big data on the cloud
Supervised Learning: e.g. Classification
Unsupervised Learning: e.g. Clustering
The word Mahout means “elephant rider” in Hindi (from India), an interesting analogy
![Page 117: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/117.jpg)
Clustering (Unsupervised): k-means, fuzzy k-means, streaming k-means, etc.
Classification (Supervised): Random Forest, Naïve Bayes, etc.
Collaborative Filtering (Semi-Supervised): item-based, matrix factorization, etc.
Dimensionality Reduction (for learning): SVD, PCA, etc.
Others: LDA for topic models, sparse TF-IDF vectors from text, etc.
![Page 118: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/118.jpg)
Input: Big Data from emails
• Goal: automatically classify text into various categories
Prior to classifying text, TF-IDF is applied (Term Frequency - Inverse Document Frequency)
• TF-IDF increases with the frequency of a word in a document, offset by the frequency of the word in the corpus
Naïve Bayes is used for classification
• A simple classifier using posterior probabilities
• Assumes each attribute is distributed independently of the others
![Page 119: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/119.jpg)
Building the Model
Training data (historical data with reference decisions): a set of e-mails organized in directories labeled with the predictor categories Mahout, Hive and Other, stored as text in HDFS on an EC2 virtual server
Pre-processing with Apache Mahout: convert the text files to HDFS SequenceFile format, then create TF-IDF-weighted sparse vectors
Training: build and evaluate the model with Mahout’s implementation of the Naïve Bayes classifier
Result: a classification model that takes vectorized text documents as input and assigns one of three document topics: Mahout, Hive or Other
![Page 120: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/120.jpg)
Using the Model to Classify New Data
New data: a set of new email documents stored as text in HDFS on an EC2 virtual server, pre-processed using Apache Mahout’s libraries
The existing model predicts topics for the new text files. This was implemented in Java with the Apache Mahout libraries, using Apache Maven to manage dependencies and build the project; the executable JAR file is submitted with the project documentation and works with data files stored in HDFS
Output: for each input document, the model returns one of the following categories: Mahout, Hive or Other
![Page 121: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/121.jpg)
Text Classification: Results
Example: input email → predicted category: Hive
Possible uses
• Automatic email sorting
• Automatic news classification
• Topic modeling
![Page 122: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/122.jpg)
Lightning fast processing for Big Data
Open-source cluster computing developed in
AMPLab at UC Berkeley
Advanced DAG execution engine for cyclic data
flow & in-memory computing
Very well-suited for large-scale Machine Learning
![Page 123: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/123.jpg)
Spark runs much faster
than Hadoop & MR
100x faster in memory
10x faster on disk
![Page 124: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/124.jpg)
![Page 125: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/125.jpg)
Uses TimSort (derived from merge-sort & insertion-sort), faster than quick-sort
Exploits cache locality due to in-memory computing
Fault-tolerant when scaling, well-designed for failure-recovery
Deploys power of cloud through enhanced N/W & I/O intensive throughput
![Page 126: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/126.jpg)
Spark Core: Distributed task dispatching, scheduling, basic I/O
Spark SQL: Has SchemaRDD (RDD = Resilient Distributed Dataset); SQL support with CLI, ODBC/JDBC
Spark Streaming: Uses Spark’s fast scheduling for stream analytics, code written for batch analytics can be used for streams
MLlib: Distributed machine learning framework (10x of Mahout)
GraphX: Distributed graph processing framework with API
![Page 127: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/127.jpg)
MLBase - tools & interfaces to bridge gap b/w
operational & investigative ML
Platform to support MLlib - the distributed
Machine Learning Library on top of Spark
![Page 128: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/128.jpg)
MLlib: Distributed ML library for classification, regression, clustering & collaborative filtering
MLI: API for feature extraction, algorithm development, high-level ML programming abstractions
ML Optimizer: Simplifies ML problems for end users by automating model selection
![Page 129: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/129.jpg)
Classification: Support Vector Machines (SVM), Naïve Bayes, decision trees
Regression: linear regression, regression trees
Collaborative Filtering: Alternating Least Squares (ALS)
Clustering: k-means
Optimization: Stochastic Gradient Descent (SGD), limited-memory BFGS
Dimensionality Reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA)
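The clustering workhorse in the list, k-means, fits in a few lines; a minimal single-machine Lloyd's-algorithm sketch (MLlib's distributed version differs in initialization and parallelism):

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm on points (tuples); initial centers = first k points."""
    centers = [points[i] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)]
    return centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))   # two centers, near (0.33, 0.33) and (10.33, 10.33)
```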
![Page 130: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/130.jpg)
Easily interpretable; ensembles are top performers
Support for categorical variables; can handle missing data
Distributed decision trees scale well to massive datasets
![Page 131: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/131.jpg)
Uses a modified version of k-means
Feature extraction & selection:
• Extraction: takes a lot of time and tooling
• Selection: requires domain expertise
• Wrong selection of features yields bad-quality clusters
Glassbeam’s SCALAR platform
• SPL (Semiotic Parsing Language)
• Makes feature engineering easier & faster
![Page 132: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/132.jpg)
![Page 133: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/133.jpg)
High-dimensional data: not all features are important for building the model & answering questions
Many applications reduce dimensionality before building the model
MLlib provides 2 algorithms for dimensionality reduction:
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)
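The core idea behind PCA, finding the direction of greatest variance, can be sketched with power iteration on the sample covariance matrix (pure stdlib; MLlib computes this differently, on distributed row matrices):

```python
import math

def top_principal_component(rows, iters=100):
    """Power iteration for the leading eigenvector of the sample covariance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Multiply by the covariance matrix and renormalize
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points spread mostly along the y = x direction
data = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9)]
v = top_principal_component(data)
print(v)   # roughly (0.71, 0.71), i.e. the y = x direction
```

Projecting the data onto the leading components discards the low-variance directions, which is the dimensionality reduction step described above.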
![Page 134: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/134.jpg)
![Page 135: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/135.jpg)
![Page 136: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/136.jpg)
Grow into unified platform for data scientists
Reduce time to market with platforms like
Glassbeam’s SCALAR for feature engineering
Include more ML algorithms
Introduce enhanced filters
Improve visualization for better performance
![Page 137: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/137.jpg)
Processing Streaming Data on the Cloud
![Page 138: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/138.jpg)
Apache Storm
• Reliably processes unbounded streams; does for real-time processing what Hadoop did for batch processing
Apache Flink
• Fast & reliable large-scale data processing engine with batch- & stream-based alternatives
![Page 139: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/139.jpg)
Integrates queueing & database technologies
Nimbus node
• Uploads computations
• Distributes code on the cluster
• Launches workers on the cluster
• Monitors computation
ZooKeeper nodes
• Coordinate the Storm cluster
Supervisor nodes
• Interact with Nimbus through ZooKeeper
• Start & stop workers on signals from Nimbus
![Page 140: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/140.jpg)
Tuple: ordered list of elements, e.g., a “4-tuple” (7, 1, 3, 7)
Stream: unbounded sequence of tuples
Spout: source of streams in a computation (e.g., the Twitter API)
Bolt: processes input streams & produces output streams, running functions over the data
Topology: the overall calculation, as a network of spouts and bolts
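These concepts map naturally onto generator pipelines; a toy single-process analogue of a word-count topology (hypothetical illustration; real Storm topologies are defined in Java/Clojure and run distributed):

```python
from collections import Counter

def sentence_spout():
    """Spout: emits a stream of tuples (unbounded in principle; finite here)."""
    for line in ["the quick brown fox", "the fox ate the mouse"]:
        yield (line,)

def split_bolt(stream):
    """Bolt: consumes sentence tuples, emits one tuple per word."""
    for (line,) in stream:
        for word in line.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: keeps running counts, emits (word, running count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])

# Topology: spout -> split bolt -> count bolt
final = dict(count_bolt(split_bolt(sentence_spout())))
print(final["the"])   # 3
```

Unlike batch word count, the count bolt emits an updated count after every tuple, which is the essence of stream processing: results are continuously refreshed rather than produced once at the end.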
![Page 141: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/141.jpg)
Exploits in-memory data streaming & adds iterative processing to the system
This makes the system very fast for data-intensive & iterative jobs
Performance Comparison
![Page 142: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/142.jpg)
Requires few config parameters
Built-in optimizer finds best way to run program
Supports all Hadoop I/O & data types
Runs MR operators unmodified & faster
Reads data from HDFS
Execution of Flink
![Page 143: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/143.jpg)
Summary and Ongoing Work
![Page 144: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/144.jpg)
MapReduce & Hadoop: Pioneering tech
Hive: SQL-like, good for querying
Impala: Complementary to Hive, overcomes its drawbacks
Stinger: Drives future of Hive w/ advanced SQL semantics
Mahout: ML algorithms (Sup & Unsup)
Spark: Framework more efficient & scalable than Hadoop
MLlib: Machine Learning Library of Spark
MLbase: Platform supporting MLlib
Storm: Stream processing for cloud & big data
Flink: Both stream & batch processing
![Page 145: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/145.jpg)
Store & process Big Data? • MR & Hadoop - Classical technologies
• Spark – Very large data, Fast & scalable
Query over Big Data?
• Hive – Fundamental, SQL-like
• Impala – More advanced alternative
• Stinger - Making Hive itself better
ML supervised / unsupervised?
• Mahout - Cloud based ML for big data
• MLlib - Super large data sets, super fast
Mine over streaming big data?
• Only streams – Storm
• Batch & Streams – Flink
![Page 146: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/146.jpg)
Big Data skills in demand
• Cloud technology • Deep learning • Business perspectives • Scientific domains
Salaries of $100k+
• Big Data programmers
• Big Data analysts
University programs & concentrations
• Data Analytics • Data Science
![Page 147: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/147.jpg)
Big Data concentration being developed in the CS Dept (http://cs.montclair.edu/)
• Data Mining, Remote Sensing, HCI, Parallel Computing, Bioinformatics, Software Engineering
Global Education Programs available for visiting and exchange students (http://www.montclair.edu/global-education/)
Please contact me for details
![Page 148: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/148.jpg)
Include more cloud intelligence in big data analytics
Further bridge gap between Hive & SQL
Add features from standalone ML packages to Mahout, MLlib …
Extend big data capabilities to focus more on PB & higher scales
Enhance mining of big data streams w/ cloud services & deep learning
Address security & privacy issues on a deeper level for cloud & big data
Build lucrative applications w/ scalable technologies for big data
Conduct domain-specific research, e.g., Cloud & GIS, Cloud & Green Computing
![Page 149: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/149.jpg)
[1] M. Bansal, D. Klein. Coreference Semantics from Web Features. In Proceedings of ACL 2012.
[2] R. Bekkerman, M. Bilenko, J. Langford (Eds.). Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2011.
[3] T. Brants, A. Franz. Web 1T 5-Gram Version 1. Linguistic Data Consortium, 2006.
[4] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research (JMLR), 2011.
[5] G. de Melo. Exploiting Web Data Sources for Advanced NLP. In Proceedings of COLING 2012.
[6] G. de Melo, K. Hose. Searching the Web of Data. In Proceedings of ECIR 2013. Springer LNCS.
[7] J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In USENIX OSDI 2004, San Francisco, CA, pp. 137-149.
[8] C. Engle, A. Lupher, R. Xin, M. Zaharia, M. Franklin, S. Shenker, I. Stoica. Shark: Fast Data Analysis Using Coarse-Grained Distributed Memory. In Proceedings of SIGMOD 2013. ACM, pp. 689-692.
[9] A. Ghoting, P. Kambadur, E. Pednault, R. Kannan. NIMBLE: A Toolkit for the Implementation of Parallel Data Mining and Machine Learning Algorithms on MapReduce. In Proceedings of KDD 2011. ACM, New York, NY, USA, pp. 334-342.
[10] K. Hammond, A. Varde. Cloud-Based Predictive Analytics. In ICDM-13 KDCloud Workshop, Dallas, Texas, December 2013.
![Page 150: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/150.jpg)
[11] K. Hose, R. Schenkel, M. Theobald, G. Weikum. Database Foundations for Scalable RDF Processing. Reasoning Web 2011, pp. 202-249.
[12] R. Kiros, R. Salakhutdinov, R. Zemel. Multimodal Neural Language Models. In Proceedings of ICML 2014.
[13] T. Kraska, A. Talwalkar, J. Duchi, R. Griffith, M. Franklin, M. Jordan. MLbase: A Distributed Machine Learning System. In Conference on Innovative Data Systems Research, 2013.
[14] Q.V. Le, T. Mikolov. Distributed Representations of Sentences and Documents. In Proceedings of ICML 2014.
[15] J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Datasets. Cambridge University Press, 2011.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean. Distributed Representations of Words and Phrases and Their Compositionality. In NIPS 26, pp. 3111-3119, 2013.
[17] R. Nayak, P. Senellart, F. Suchanek, A. Varde. Discovering Interesting Information with Advances in Web Technology. SIGKDD Explorations 14(2): 63-81, 2012.
[18] M. Riondato, J. DeBrabant, R. Fonseca, E. Upfal. PARMA: A Parallel Randomized Algorithm for Approximate Association Rules Mining in MapReduce. In Proceedings of CIKM 2012. ACM, pp. 85-94.
[19] F. Suchanek, A. Varde, R. Nayak, P. Senellart. The Hidden Web, XML and the Semantic Web: Scientific Data Management Perspectives. EDBT 2011, Uppsala, Sweden, pp. 534-537.
![Page 151: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/151.jpg)
[20] N. Tandon, G. de Melo, G. Weikum. Deriving a Web-Scale Common Sense Fact Database. In Proceedings of AAAI 2011.
[21] J. Tancer, A. Varde. The Deployment of MML for Data Analytics over the Cloud. In ICDM-11 KDCloud Workshop, December 2011, Vancouver, Canada, pp. 188-195.
[22] A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, R. Murthy. Hive: A Petabyte Scale Data Warehouse Using Hadoop, 2009.
[23] J. Turian, L. Ratinov, Y. Bengio. Word Representations: A Simple and General Method for Semi-Supervised Learning. In Proceedings of ACL 2010.
[24] A. Varde, F. Suchanek, R. Nayak, P. Senellart. Knowledge Discovery over the Deep Web, Semantic Web and XML. DASFAA 2009, Brisbane, Australia, pp. 784-788.
[25] T. White. Hadoop: The Definitive Guide. O’Reilly, 2011.
[26] http://flink.incubator.apache.org
[27] https://github.com/twitter/scalding
[28] http://hortonworks.com/labs/stinger/
[29] http://linkeddata.org/
[30] http://mahout.apache.org/
[31] https://storm.apache.org
![Page 152: Scalable Learning Technologies for Big Data Mining](https://reader033.vdocument.in/reader033/viewer/2022042818/55b745f8bb61ebc4188b4781/html5/thumbnails/152.jpg)
Contact Information
Gerard de Melo ([email protected]) [http://gerard.demelo.org]
Aparna Varde ([email protected]) [http://www.montclair.edu/~vardea]