CMU SCS
U Kang (CMU), KDD 2012
GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries
U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos
School of Computer Science, Carnegie Mellon University
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
Background: Tensor
Tensors (= multi-dimensional arrays) are everywhere, e.g., hyperlinks and anchor texts in Web graphs.
[Figure: a tensor with modes URL 1 × URL 2 × Anchor Text (e.g., Java, C++, C#); nonzero entries are 1.]
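A minimal Python sketch of how such a Web graph could be stored as a sparse 3-mode tensor in coordinate (COO) form: every (URL 1, URL 2, anchor text) triple becomes one nonzero entry equal to 1. The helper `build_coo_tensor` and the example links are hypothetical; both URL modes share one index here.

```python
from typing import Dict, List, Tuple

def build_coo_tensor(triples: List[Tuple[str, str, str]]):
    """Map (url1, url2, anchor) triples to a sparse COO tensor.

    Returns {(i, j, k): value} for the nonzeros and the tensor shape.
    Hypothetical helper for illustration only.
    """
    url_ids: Dict[str, int] = {}     # one shared index for both URL modes
    anchor_ids: Dict[str, int] = {}
    entries: Dict[Tuple[int, int, int], float] = {}
    for src, dst, anchor in triples:
        # setdefault assigns the next free index the first time a label is seen
        i = url_ids.setdefault(src, len(url_ids))
        j = url_ids.setdefault(dst, len(url_ids))
        k = anchor_ids.setdefault(anchor, len(anchor_ids))
        entries[(i, j, k)] = 1.0     # a link from src to dst with this anchor text
    shape = (len(url_ids), len(url_ids), len(anchor_ids))
    return entries, shape

# Hypothetical example links
links = [
    ("a.com", "b.com", "Java"),
    ("a.com", "c.com", "C++"),
    ("b.com", "c.com", "C#"),
    ("c.com", "b.com", "Java"),
]
entries, shape = build_coo_tensor(links)
print(shape, len(entries))
```

Only the nonzeros are kept, which is what makes tensors like the ones in this talk storable at all.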
Background: Tensor
Tensors (= multi-dimensional arrays) are everywhere:
Sensor streams: (time, location, type)
Predicates in a knowledge base: (subject, verb, object), e.g.,
“Barack Obama is the president of the U.S.”
“Eric Clapton plays guitar”
[Figure: NELL (Never Ending Language Learner) data, a 26M × 26M × 48M tensor with 144M nonzeros]
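A quick back-of-the-envelope check in Python shows why only the nonzeros of the NELL tensor can be stored. It uses the rounded mode sizes from the slide, so the result is approximate:

```python
# NELL tensor statistics as given on the slide (rounded values)
I, J, K = 26_000_000, 26_000_000, 48_000_000
nnz = 144_000_000

cells = I * J * K          # total number of tensor entries
density = nnz / cells      # fraction of entries that are nonzero
print(f"{cells:.3e} cells, density = {density:.2e}")
```

The density comes out around 4.4 × 10^-15, i.e., far fewer than one nonzero per trillion cells.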
Problem Definition
Q1: How can we decompose a billion-scale tensor? (This corresponds to SVD in the 2D, i.e., matrix, case.)
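Assuming the decomposition meant here is PARAFAC (also called CP), the model behind the ALS updates and the factor vectors a1, …, aR in this talk, a rank-R decomposition writes the tensor as a sum of R outer products, $\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$, much as a truncated SVD writes a matrix as $\sum_r \sigma_r u_r v_r^\top$. A small NumPy sketch of rebuilding a tensor from hypothetical factor matrices A (I × R), B (J × R), C (K × R):

```python
import numpy as np

def cp_reconstruct(A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Sum of R rank-one tensors: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 2
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
X = cp_reconstruct(A, B, C)

# Same tensor built explicitly from the outer products a_r ∘ b_r ∘ c_r
X_outer = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r]) for r in range(R)
)
print(X.shape, np.allclose(X, X_outer))
```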
Problem Definition
Q2: What are the important concepts and synonyms in a KB tensor?
Q2.1: What are the dominant concepts in the knowledge base tensor?
Q2.2: What are the synonyms of a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data, a 26M × 26M × 48M tensor with 144M nonzeros]
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
Algorithm: Problem Definition
Q1: How can we decompose a billion-scale tensor? (This corresponds to SVD in the 2D, i.e., matrix, case.)
Challenge
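Alternating Least Squares (ALS) algorithm; the update for factor A is

$$A \leftarrow X_{(1)}\,(C \odot B)\,(C^\top C \ast B^\top B)^{\dagger}$$

and B and C are updated analogously.
$^{\dagger}$: pseudo-inverse; $\ast$: Hadamard product; $\odot$: Khatri-Rao product

How can we design a fast MapReduce algorithm for ALS?

[Figure: tensor X with I = 26M, J = 26M, K = 48M (NELL data)]
(Details)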
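A small dense NumPy sketch of the ALS update for A, useful for checking the matrix conventions (the mode-1 unfolding is ordered to match C ⊙ B). The function names are hypothetical. It deliberately materializes C ⊙ B, which is fine at toy size but is exactly the (J·K) × R matrix that cannot be formed for NELL.

```python
import numpy as np

def khatri_rao(C: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product C ⊙ B, shape (K*J, R); row k*J + j is C[k] * B[j]."""
    K, R = C.shape
    J, _ = B.shape
    return np.einsum("kr,jr->kjr", C, B).reshape(K * J, R)

def unfold_mode1(X: np.ndarray) -> np.ndarray:
    """Mode-1 matricization X_(1), shape (I, K*J); column k*J + j is X[:, j, k]."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

def als_update_A(X: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """One ALS step for A: A <- X_(1) (C ⊙ B) (C^T C * B^T B)^†."""
    gram = (C.T @ C) * (B.T @ B)   # Hadamard product of two R x R Gram matrices
    return unfold_mode1(X) @ khatri_rao(C, B) @ np.linalg.pinv(gram)

# Toy check: build X from known factors, then update A with B and C fixed
rng = np.random.default_rng(42)
I, J, K, R = 6, 5, 4, 3
A_true, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
X = np.einsum("ir,jr,kr->ijk", A_true, B, C)
A_new = als_update_A(X, B, C)
print(np.allclose(A_new, A_true))
```

With the other two factors fixed at their true values, one update recovers A, which is a convenient sanity check of the conventions.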
Main Idea
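1. Ordering of computation
Our choice: $\big(X_{(1)}(C \odot B)\big)(C^\top C \ast B^\top B)^{\dagger}$, $8 \cdot 10^{9}$ FLOPs (NELL data)
Alternative: $X_{(1)}\big((C \odot B)(C^\top C \ast B^\top B)^{\dagger}\big)$, $2.5 \cdot 10^{17}$ FLOPs (NELL data)
(Details)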
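A rough FLOP model for the two ways of ordering the ALS product, as a sketch only: it assumes rank R = 10 (not stated on the slide), counts a multiply-add as 2 FLOPs, and ignores the R × R pseudo-inverse and the cost of forming C ⊙ B. Under these assumptions the totals land near the slide's numbers.

```python
# Simplified FLOP model; R = 10 and "multiply-add = 2 FLOPs" are assumptions.
I, J, K = 26e6, 26e6, 48e6
nnz = 144e6
R = 10

# Ordering 1: compute X_(1) (C ⊙ B) from the nonzeros (I x R result),
# then multiply that I x R matrix by the R x R pseudo-inverse.
ordering_1 = 2 * nnz * R + 2 * I * R * R

# Ordering 2: multiply the dense (J*K) x R matrix C ⊙ B by the R x R
# pseudo-inverse first, then multiply by the sparse X_(1).
ordering_2 = 2 * J * K * R * R + 2 * nnz * R

print(f"ordering 1: {ordering_1:.1e} FLOPs, ordering 2: {ordering_2:.1e} FLOPs")
```

The gap comes from the second ordering touching every one of the J·K rows of C ⊙ B, whereas the first touches only the nonzeros and the I × R result.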
Main Idea
2. Avoiding Intermediate Data Explosion
Size of intermediate data (NELL), naïve: 100 PB
[Figure: tensor X with I = 26M, J = 26M, K = 48M]
(Details)
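A rough sketch of where a figure like 100 PB can come from: a naïve implementation materializes C ⊙ B, a dense (J·K) × R matrix. This assumes R = 10 and 8-byte floating-point values, neither of which is stated on the slide:

```python
# Naive ALS materializes C ⊙ B, a dense (J*K) x R matrix.
J, K = 26e6, 48e6
R = 10                 # assumed rank
bytes_per_value = 8    # assumed 64-bit floats

naive_bytes = J * K * R * bytes_per_value
print(f"{naive_bytes / 1e15:.0f} PB")   # 1 PB = 10^15 bytes
```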
Main Idea
2. Avoiding Intermediate Data Explosion
Size of intermediate data (NELL):
Naïve (before): 100 PB
Proposed (after): 1.5 GB
(Details)
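A single-machine NumPy sketch of the principle behind avoiding the explosion: compute $X_{(1)}(C \odot B)$ directly from the nonzeros of X, so the intermediate data grows with the number of nonzeros rather than with J·K. This is not GigaTensor's MapReduce formulation; the function name `mttkrp_sparse` is hypothetical.

```python
import numpy as np

def mttkrp_sparse(coords: np.ndarray, values: np.ndarray,
                  B: np.ndarray, C: np.ndarray, I: int) -> np.ndarray:
    """Compute M = X_(1) (C ⊙ B) using only the nonzeros of X.

    coords: (nnz, 3) integer array of (i, j, k); values: (nnz,) array.
    Row i of M accumulates value * B[j, :] * C[k, :] for each nonzero, so the
    intermediate data is one length-R row per nonzero, never the (J*K) x R C ⊙ B.
    """
    i, j, k = coords[:, 0], coords[:, 1], coords[:, 2]
    contrib = values[:, None] * B[j] * C[k]   # shape (nnz, R)
    M = np.zeros((I, B.shape[1]))
    np.add.at(M, i, contrib)                  # scatter-add rows sharing the same i
    return M

# Toy check against the dense formula X_(1) (C ⊙ B)
rng = np.random.default_rng(1)
I, J, K, R = 5, 4, 3, 2
X = rng.random((I, J, K)) * (rng.random((I, J, K)) < 0.3)   # sparse-ish tensor
coords, values = np.argwhere(X != 0), X[X != 0]             # both in C order
B, C = rng.random((J, R)), rng.random((K, R))

X1 = X.transpose(0, 2, 1).reshape(I, K * J)                  # mode-1 unfolding
KR = np.einsum("kr,jr->kjr", C, B).reshape(K * J, R)         # dense C ⊙ B
print(np.allclose(mttkrp_sparse(coords, values, B, C, I), X1 @ KR))
```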
Experiments
GigaTensor solves 100× larger problems
[Figure: scalability on tensors of size I × J × K with number of nonzeros = I / 50; Tensor Toolbox runs out of memory, while GigaTensor handles tensors 100× larger.]
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
Discoveries: Problem Definition
Q2: What are the important concepts and synonyms in a KB tensor?
Q2.1: What are the dominant concepts in the knowledge base tensor?
Q2.2: What are the synonyms of a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data, a 26M × 26M × 48M tensor with 144M nonzeros]
A2.1: Concept Discovery
Concept Discovery in Knowledge Base
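A minimal sketch of how a concept can be read off one rank-one component, assuming (as is common for PARAFAC) that the highest-weighted entries of a_r, b_r, and c_r describe concept r. The helper name, labels, and factor values below are made up for illustration:

```python
import numpy as np

def top_k_per_component(F: np.ndarray, labels, k: int = 2):
    """For each column r of factor matrix F, return the labels of its k largest entries."""
    concepts = []
    for r in range(F.shape[1]):
        order = np.argsort(-F[:, r])[:k]   # row indices by decreasing weight in column r
        concepts.append([labels[idx] for idx in order])
    return concepts

# Hypothetical noun phrases and a toy 5 x 2 factor matrix A
noun_phrases = ["Java", "C++", "Python", "guitar", "piano"]
A = np.array([
    [0.9, 0.0],
    [0.8, 0.1],
    [0.7, 0.0],
    [0.0, 0.9],
    [0.1, 0.8],
])
concepts = top_k_per_component(A, noun_phrases, k=2)
print(concepts)
```

The same helper would be applied to B (noun phrases) and C (contexts) to describe each concept from all three modes.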
A2.1: Concept Discovery
A2.2: Synonym Discovery
Synonym Discovery in Knowledge Base
[Figure: columns a1, a2, …, aR of the factor matrix; a (given) noun phrase and its (discovered) synonym 1 and synonym 2]
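A minimal sketch of synonym discovery from the factor matrix A, assuming each noun phrase is represented by its row of A and that rows are compared by cosine similarity (the slide does not name the measure). The function name, noun phrases, and values are hypothetical:

```python
import numpy as np

def find_synonyms(A: np.ndarray, labels, query: str, top_n: int = 2):
    """Rank other noun phrases by cosine similarity of their rows in factor matrix A."""
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0              # avoid division by zero for all-zero rows
    unit = A / norms[:, None]            # L2-normalize every row
    q = labels.index(query)
    scores = unit @ unit[q]              # cosine similarity to the query row
    scores[q] = -np.inf                  # exclude the query itself
    best = np.argsort(-scores)[:top_n]
    return [(labels[i], float(scores[i])) for i in best]

# Hypothetical noun phrases with 2-dimensional rows of A
labels = ["car", "automobile", "vehicle", "banana"]
A = np.array([[1.0, 0.0], [0.9, 0.1], [0.7, 0.3], [0.0, 1.0]])
result = find_synonyms(A, labels, "car")
print(result)
```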
A2.2: Synonym Discovery
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
Conclusion
GigaTensor: a scalable tensor decomposition algorithm for tensors with billion-scale modes
Algorithm: avoids intermediate data explosion
Discoveries: concept discovery and contextual synonym detection on a KB tensor
Thank you!
www.cs.cmu.edu/~pegasus
www.cs.cmu.edu/~ukang