![Page 2: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/2.jpg)
Braxton McKee, Founder, CEO
Keith Kegley, President
Eyal Goldwerger, Exec. Chairman
Ben Beinecke, Head of Strategy
Ronen Hilewicz, VP Engineering
Tom Peters, Engineer
Steven Benario, Head of Product
Jay Moolenaar, Head of Sales
Alex Tsannes, Engineer
Ross Goodwin, Engineer
Amichai Levy, Engineer
Tony Jebara, Scientific Advisor & Head of ML Lab at Columbia
![Page 3: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/3.jpg)
Why should I have to write a different program if I have 1000 rows or 1 billion?
![Page 4: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/4.jpg)
| Approach | Advantage | Disadvantage |
| --- | --- | --- |
| Write in single-threaded Python, R, or equivalent. | Shortest time-to-code | Slow, can’t handle scale. |
![Page 5: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/5.jpg)
| Approach | Advantage | Disadvantage |
| --- | --- | --- |
| Write in single-threaded Python, R, or equivalent. | Shortest time-to-code | Slow, can’t handle scale. |
| Code parallel version by hand | Maximal performance | Lots of code, hard to get right (race conditions). |
![Page 6: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/6.jpg)
| Approach | Advantage | Disadvantage |
| --- | --- | --- |
| Write in single-threaded Python, R, or equivalent. | Shortest time-to-code | Slow, can’t handle scale. |
| Code parallel version by hand | Maximal performance | Lots of code, hard to get right (race conditions). |
| Code against APIs that implement particular patterns (MapReduce). | Only have to implement a few patterns. | Can be hard to fit a problem into the given pattern, or it’s really inefficient. |
![Page 7: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/7.jpg)
| Approach | Advantage | Disadvantage |
| --- | --- | --- |
| Write in single-threaded Python, R, or equivalent. | Shortest time-to-code | Slow, can’t handle scale. |
| Code parallel version by hand | Maximal performance | Lots of code, hard to get right (race conditions). |
| Code against APIs that implement particular patterns (MapReduce). | Only have to implement a few patterns. | Can be hard to fit a problem into the given pattern, or it’s really inefficient. |
| Figure out how to automatically scale up programs written in Python, R, etc. | Users can implement lots of algorithms very quickly without a lot of code. | Tough to implement. Can be hard to make optimal decisions without knowing what the computation will do. |
![Page 8: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/8.jpg)
![Page 9: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/9.jpg)
Business Vision:
Empower data scientists to run any computation on any data quickly and easily
by automating the engineering.
![Page 10: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/10.jpg)
Technical Vision:
Create a VM that can automatically scale any algorithm across a large number of machines
without direct supervision by the user.
![Page 11: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/11.jpg)
What is Ufora?
A data-processing platform that automatically parallelizes user programs and executes them across a cluster of machines.
![Page 12: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/12.jpg)
Use Cases:
• Data processing and cleaning
• Large-scale machine learning
• Modeling and simulation
![Page 13: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/13.jpg)
Design Goal:
Completely separate “what” from “how”
![Page 14: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/14.jpg)
Key Components
• Implicit parallelism at language level
• JIT compilation
• Fault tolerance
• Automatic co-location of data and compute
![Page 15: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/15.jpg)
What do we give up?
• Restrict mutability of data structures
• Restrict side effects
• Emphasize “functional” programming style
• Some features of host languages won’t work
![Page 16: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/16.jpg)
A Nice Example

Naturally sequential (because of the loop):

    def isPrime(p):
        x = 2
        while x*x <= p:
            if p%x == 0:
                return 0
            x = x + 1
        return 1

Naturally parallel (divide and conquer):

    def filter(v,f):
        if len(v) == 0:
            return []
        if len(v) == 1:
            return v if f(v[0]) else []
        mid = len(v)/2
        return filter(v[:mid],f) + \
               filter(v[mid:],f)

    print filter(range(100000000),isPrime)
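The slide code is Python 2. A minimal Python 3 sketch of the same divide-and-conquer filter, small enough to run locally, might look like the following. The names `is_prime` and `dc_filter` are ours, and nothing here runs in parallel by itself; it only shows the recursive shape that leaves the two halves independent.

```python
def is_prime(p):
    # Same trial-division loop as on the slide: inherently sequential.
    # Like the slide version, it does not special-case 0 and 1.
    x = 2
    while x * x <= p:
        if p % x == 0:
            return False
        x += 1
    return True


def dc_filter(v, f):
    # Divide and conquer: the two recursive calls share no state,
    # so a runtime is free to evaluate them on different cores.
    if len(v) == 0:
        return []
    if len(v) == 1:
        return list(v) if f(v[0]) else []
    mid = len(v) // 2
    return dc_filter(v[:mid], f) + dc_filter(v[mid:], f)


primes = dc_filter(range(2, 50), is_prime)
```

Slicing a Python 3 `range` yields another `range`, so the recursion never copies the input.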
![Page 17: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/17.jpg)
Adaptive Parallelism
filter(v, isPrime) over 100M integers
Splitting:
0 – 50M | 50M – 100M
0 – 25M | 25M – 50M | 50M – 75M | 75M – 100M
CORE #1 | CORE #2 | CORE #3 | CORE #4
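The splitting picture can be reproduced with a few lines of Python. This is only a static sketch: `split_range` is a hypothetical helper, and it assumes the number of pieces is a power of two, whereas the talk describes the runtime deciding adaptively when to split.

```python
def split_range(lo, hi, pieces):
    # Halve [lo, hi) repeatedly until there are `pieces` sub-ranges.
    # `pieces` is assumed to be a power of two, as in the 4-core slide.
    ranges = [(lo, hi)]
    while len(ranges) < pieces:
        halved = []
        for a, b in ranges:
            mid = (a + b) // 2
            halved += [(a, mid), (mid, b)]
        ranges = halved
    return ranges


M = 10**6
two_way = split_range(0, 100 * M, 2)
four_way = split_range(0, 100 * M, 4)
```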
![Page 18: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/18.jpg)
Key Components
• Implicit parallelism
• JIT compilation
• Fault tolerance
• Automatic co-location of data and compute
![Page 19: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/19.jpg)
Our solution – react dynamically as the program runs
Watch running threads to see what blocks of data they’re accessing.
![Page 20: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/20.jpg)
Our solution – react dynamically as the program runs
Watch running threads to see what blocks of data they’re accessing.
Detect when blocks of data absolutely have to be on the same machine.
![Page 21: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/21.jpg)
Our solution – react dynamically as the program runs
Watch running threads to see what blocks of data they’re accessing.
Detect when blocks of data absolutely have to be on the same machine.
Build a statistical model of correlations between block accesses.
![Page 22: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/22.jpg)
Our solution – react dynamically as the program runs
Watch running threads to see what blocks of data they’re accessing.
Detect when blocks of data absolutely have to be on the same machine.
Build a statistical model of correlations between block accesses.
Place data to minimize expected future number of machine boundary crossings.
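The talk does not say what the statistical model looks like. As a deliberately crude stand-in, the sketch below counts how often two blocks are touched in the same observation window and then places blocks greedily next to their most frequent companions. The trace format, `coaccess_counts`, and `greedy_placement` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def coaccess_counts(trace):
    # trace: list of sets; each set holds the block ids one thread
    # touched during a short observation window (hypothetical format).
    counts = Counter()
    for window in trace:
        for a, b in combinations(sorted(window), 2):
            counts[(a, b)] += 1
    return counts


def greedy_placement(blocks, counts, machines, capacity):
    # Place blocks one at a time on the machine that already holds the
    # blocks they are most often accessed with. Assumes the total
    # capacity (machines * capacity) is enough for all blocks.
    placement = {}
    load = [0] * machines
    for blk in blocks:
        def affinity(m):
            return sum(c for (a, b), c in counts.items()
                       if (a == blk and placement.get(b) == m)
                       or (b == blk and placement.get(a) == m))
        candidates = [m for m in range(machines) if load[m] < capacity]
        best = max(candidates, key=lambda m: (affinity(m), -load[m], -m))
        placement[blk] = best
        load[best] += 1
    return placement


trace = [{1, 2}, {1, 2}, {3, 4}, {3, 4}, {2, 3}]
counts = coaccess_counts(trace)
placement = greedy_placement([1, 2, 3, 4], counts, machines=2, capacity=2)
```

In this toy trace, blocks 1 and 2 end up together and blocks 3 and 4 end up together, so only the rare 2-and-3 access crosses a machine boundary.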
![Page 23: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/23.jpg)
A Simple Example
User writes

    v = range(0, 2*10**9)

To the user, ‘v’ is now a big contiguous array:

    [0.0,1.0,2.0,3.0,4.0,…,1999999999.0]
![Page 24: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/24.jpg)
What’s going on under the hood?

    v = range(0, 2*10**9)

“v” is actually a pointer to a bunch of blocks of data
v → blocks 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
In this example, each block is 1 GB. Block 1 contains the first 125,000,000 numbers.
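The block arithmetic on this slide can be checked directly. The sketch assumes 8 bytes per number and reads "1 GB" as 10**9 bytes, which is what makes 125,000,000 numbers fit in a block; `block_of` is a hypothetical helper.

```python
BYTES_PER_NUMBER = 8      # assumption: 64-bit values
BLOCK_BYTES = 10**9       # "1 GB" taken as 10**9 bytes
N = 2 * 10**9             # len(v)

numbers_per_block = BLOCK_BYTES // BYTES_PER_NUMBER
num_blocks = -(-N // numbers_per_block)   # ceiling division


def block_of(i):
    # 1-based block number holding element i, matching the slide's numbering.
    return i // numbers_per_block + 1
```

Under these assumptions the vector occupies 16 blocks, which is the 16 GB quoted on the next slide.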
![Page 25: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/25.jpg)
What if the dataset is bigger than one machine can hold?
16 GB of data, machines w/ 8 GB RAM
Machine 1: blocks 1 2 3 4
Machine 2: blocks 5 6 7 8
Machine 3: blocks 9 10 11 12
Machine 4: blocks 13 14 15 16
![Page 26: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/26.jpg)
Simple case – just stripe data linearly
Place first 4 GB on machine 1
Machine 1: blocks 1 2 3 4
Machine 2: blocks 5 6 7 8
Machine 3: blocks 9 10 11 12
Machine 4: blocks 13 14 15 16
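Linear striping is just integer division on the block number. A small sketch, with `machine_of_block` as a hypothetical helper and blocks and machines numbered from 1 as in the diagrams:

```python
BLOCKS_PER_MACHINE = 4   # 4 GB of 1 GB blocks per machine, as on the slide


def machine_of_block(block):
    # Blocks and machines are numbered from 1, as in the diagrams.
    return (block - 1) // BLOCKS_PER_MACHINE + 1


layout = {}
for block in range(1, 17):
    layout.setdefault(machine_of_block(block), []).append(block)
```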
![Page 27: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/27.jpg)
What happens when we start using the data?
User writes

    for x in v:
        # some complicated state machine

Now the computation wants to scan sequentially over the dataset
![Page 28: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/28.jpg)
    for x in v: …

Computation starts on Machine 1
When the computation exhausts the data on one machine, the runtime moves it to the next
Machine 1: blocks 1 2 3 4
Machine 2: blocks 5 6 7 8
Machine 3: blocks 9 10 11 12
Machine 4: blocks 13 14 15 16
![Page 29: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/29.jpg)
But real access patterns are more complex!
User writes

    res = 0
    def f(x,y):
        # some function
    for i in xrange(0, len(v)-10):
        res = res + f(v[i], v[i+10])

Now the computation is looking at all pairs v[i] and v[i+10]
![Page 30: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/30.jpg)
At first, everything is OK, since v[i] and v[i+10] are close to each other in the data
But when the computation reaches the end of block 4, v[i] and v[i+10] aren’t on the same machine!
Machine 1: blocks 1 2 3 4
Machine 2: blocks 5 6 7 8
Machine 3: blocks 9 10 11 12
Machine 4: blocks 13 14 15 16
![Page 31: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/31.jpg)
When v[i+10] needs data on machine 2, the runtime has to move the computation to the other machine
Block 4 on Machine 1: v[i]
Block 5 on Machine 2: v[i+10]
![Page 32: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/32.jpg)
But then we increment ‘i’ and have to go back to the first machine.
Block 4 on Machine 1: v[i]
Block 5 on Machine 2: v[i+10]
![Page 33: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/33.jpg)
Now the computation is alternating between accessing data on Machine 1 and Machine 2
Block 4 on Machine 1: v[i]
Block 5 on Machine 2: v[i+10]
![Page 34: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/34.jpg)
Every time we have to move the computation, we’re hitting the network.
Block 4 on Machine 1: v[i]
Block 5 on Machine 2: v[i+10]
This is really slow!
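To make the cost concrete, the sketch below counts machine changes for the two access patterns on a scaled-down vector (1,600 elements instead of 2 billion) under plain striping. `count_moves` and `striped` are hypothetical helpers, and the model simply assumes the computation always moves to wherever the next element lives.

```python
def count_moves(accesses, machine_of):
    # accesses: sequence of element indices in the order they are read.
    # Counts how often consecutive reads land on different machines.
    moves = 0
    current = None
    for i in accesses:
        m = machine_of(i)
        if current is not None and m != current:
            moves += 1
        current = m
    return moves


N = 1600                      # scaled-down stand-in for 2 * 10**9
PER_MACHINE = N // 4          # striped: 400 consecutive elements each


def striped(i):
    return i // PER_MACHINE


scan = range(N)                                            # for x in v
pairs = [j for i in range(N - 10) for j in (i, i + 10)]    # v[i], v[i+10]

scan_moves = count_moves(scan, striped)
pair_moves = count_moves(pairs, striped)
```

The sequential scan crosses each of the three machine boundaries once. The offset pattern ping-pongs 19 times around each boundary, 57 moves in total, and in the real system each of those would be a network round trip.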
![Page 35: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/35.jpg)
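Solution: Replicate blocks so that they overlap
Machine 1: blocks 1 2 3 4, plus a replica of block 5
Machine 2: blocks 5 6 7 8, plus a replica of block 9
Machine 3: blocks 9 10 11 12, plus a replica of block 13
Machine 4: blocks 13 14 15 16
Data can live on two different machines at the same time because it’s immutable!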
![Page 36: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/36.jpg)
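Solution: Replicate blocks so that they overlap
Machine 1: blocks 1 2 3 4, plus a replica of block 5
Machine 2: blocks 5 6 7 8, plus a replica of block 9
Machine 3: blocks 9 10 11 12, plus a replica of block 13
Machine 4: blocks 13 14 15 16
Now v[i] and v[i+10] are always available together on some machine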
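The claim that replication makes v[i] and v[i+10] always available together can be checked at block granularity. The sketch assumes that each machine keeps a read-only copy of the next machine's first block, which is our reading of the diagram; `holdings` and `always_colocated` are hypothetical helpers.

```python
NUM_BLOCKS = 16
BLOCKS_PER_MACHINE = 4


def holdings(replicate):
    # Machine m (1-based) owns blocks 4m-3 .. 4m; with replication it
    # also keeps a read-only copy of the next machine's first block.
    held = {}
    for m in range(1, 5):
        first = (m - 1) * BLOCKS_PER_MACHINE + 1
        blocks = set(range(first, first + BLOCKS_PER_MACHINE))
        if replicate and first + BLOCKS_PER_MACHINE <= NUM_BLOCKS:
            blocks.add(first + BLOCKS_PER_MACHINE)
        held[m] = blocks
    return held


def always_colocated(held):
    # With an offset far smaller than a block, v[i] and v[i+10] fall in
    # the same block or in two adjacent blocks b and b+1.
    return all(any({b, b + 1} <= blocks for blocks in held.values())
               for b in range(1, NUM_BLOCKS))


plain = always_colocated(holdings(replicate=False))
overlapped = always_colocated(holdings(replicate=True))
```

Without replicas the check fails at the 4/5, 8/9, and 12/13 boundaries; with three extra block copies it passes everywhere.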
![Page 37: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/37.jpg)
What about a different access pattern?
Imagine the user writes

    res = 0
    def f(x,y):
        # some function
    for i in xrange(0, len(v)):
        res = res + f(v[i], v[-i])

Now the computation will jump back and forth between the beginning and the end of the vector.
![Page 38: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/38.jpg)
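    f(v[i], v[-i])

The computation will jump back and forth between the beginning and the end of the vector.
From the front: v[1], v[2], v[3]
From the back: v[-1], v[-2], v[-3]
Machine 1: blocks 1 2 3 4, plus a replica of block 5
Machine 2: blocks 5 6 7 8, plus a replica of block 9
Machine 3: blocks 9 10 11 12, plus a replica of block 13
Machine 4: blocks 13 14 15 16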
![Page 39: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/39.jpg)
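Machine 1: blocks 1 2 3 4, plus a replica of block 5
Machine 2: blocks 5 6 7 8, plus a replica of block 9
Machine 3: blocks 9 10 11 12, plus a replica of block 13
Machine 4: blocks 13 14 15 16
But blocks 1 and 16 are on separate machines, which we know is slow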
![Page 40: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/40.jpg)
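Machine 1: blocks 1 2 3 4, plus a replica of block 5
Machine 2: blocks 5 6 7 8, plus a replica of block 9
Machine 3: blocks 9 10 11 12, plus a replica of block 13
Machine 4: blocks 13 14 15 16
Under this layout, the computation will have to move between machines 2 billion times!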
![Page 41: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/41.jpg)
[Diagram: the 16 blocks regrouped across Machines 1–4]
But if we lay the data out like this, we’ll only have to move a few times
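The exact grouping in the diagram did not survive extraction. One layout with the stated property pairs each block with its mirror image, so the front and the back of the vector share a machine. The sketch below compares that assumed layout with plain striping on a scaled-down vector; `mirrored`, `striped`, and `count_moves` are hypothetical helpers, and blocks are numbered from 0 here.

```python
def count_moves(accesses, machine_of):
    # Counts how often consecutive reads land on different machines.
    moves, current = 0, None
    for i in accesses:
        m = machine_of(i)
        if current is not None and m != current:
            moves += 1
        current = m
    return moves


N = 1600                       # scaled-down stand-in for 2 * 10**9
BLOCK = N // 16                # 16 blocks, numbered 0..15 here


def striped(i):
    return (i // BLOCK) // 4   # blocks 0-3, 4-7, 8-11, 12-15


def mirrored(i):
    # Assumed layout: block b shares a machine with block 15 - b,
    # two such pairs per machine.
    b = i // BLOCK
    return min(b, 15 - b) // 2


# v[i] followed by v[-i]; Python maps v[-i] to position (N - i) % N.
accesses = [j for i in range(N) for j in (i, (N - i) % N)]

striped_moves = count_moves(accesses, striped)
mirrored_moves = count_moves(accesses, mirrored)
```

Under striping nearly every access changes machine, while the mirrored layout only moves when the scan crosses from one pair of blocks to the next.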
![Page 42: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/42.jpg)
There are lots of different access patterns.
Each one has a different optimal layout.
Let’s look at a couple that show up in ML applications
![Page 43: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/43.jpg)
Example: Simple Dot Product

    def dot(v1, v2):
        return sum(0, len(v1),
                   lambda i: v1[i] * v2[i])
![Page 44: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/44.jpg)
Example: Simple Dot Product

    def dot(v1, v2):
        return sum(0, len(v1),
                   lambda i: v1[i] * v2[i])

[Figure: machines M1, M2, M3, M4]
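The three-argument `sum(0, len(v1), f)` on the slide is not Python's builtin `sum`; it reads like a range-sum primitive. A plain-Python stand-in, with the hypothetical name `sum_range`, could be written by divide and conquer so that the two halves stay independent:

```python
def sum_range(a, b, f):
    # Sum f(i) for i in [a, b) by splitting the index range in half.
    # The two halves are independent, so they could run on different
    # machines; here they simply run one after the other.
    if a >= b:
        return 0
    if a + 1 == b:
        return f(a)
    mid = (a + b) // 2
    return sum_range(a, mid, f) + sum_range(mid, b, f)


def dot(v1, v2):
    return sum_range(0, len(v1), lambda i: v1[i] * v2[i])


result = dot([1, 2, 3], [4, 5, 6])
```

Each half only needs the matching index range of `v1` and `v2`, which is why co-locating corresponding blocks of the two vectors matters.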
![Page 45: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/45.jpg)
Example: Linear Regression

    def range(a,b,f):
        if a >= b:
            return []
        if a+1 >= b:
            return [f(a)]
        mid = (a+b)/2
        return range(a,mid,f) + range(mid,b,f)

    def computeXtX(columns):
        return range(0, len(columns), lambda c1:
            range(0, len(columns), lambda c2:
                dot(columns[c1], columns[c2])))

Basically computing a covariance matrix
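A runnable Python 3 rendering of the same idea is sketched below. The slide's `range(a, b, f)` is renamed `build` so it does not shadow the builtin, integer division uses `//`, and `dot` is a simple local definition; these names are ours.

```python
def build(a, b, f):
    # Divide-and-conquer list builder; the slide calls this `range`,
    # renamed here so it does not shadow Python's builtin.
    if a >= b:
        return []
    if a + 1 >= b:
        return [f(a)]
    mid = (a + b) // 2
    return build(a, mid, f) + build(mid, b, f)


def dot(v1, v2):
    return sum(x * y for x, y in zip(v1, v2))


def compute_xtx(columns):
    n = len(columns)
    return build(0, n, lambda c1:
                 build(0, n, lambda c2: dot(columns[c1], columns[c2])))


columns = [[1, 2], [3, 4]]
xtx = compute_xtx(columns)
```

The result is symmetric, so half of the dot products are redundant.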
![Page 46: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/46.jpg)
Example: Linear Regression
[Figure: 100 columns; each column is 1 GB]
![Page 47: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/47.jpg)
Example: Linear Regression
[Figure: 100 columns]
![Page 48: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/48.jpg)
Example: Linear Regression
[Figure: 100 columns, 4950 pairs]
![Page 49: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/49.jpg)
Example: Linear Regression
[Figure: 100 columns, 4950 pairs; machine M1]
![Page 50: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/50.jpg)
Example: Linear Regression
[Figure: 100 columns, 4950 pairs; machines M1, M2, M3, M4]
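The 4950 on these slides matches the number of unordered pairs of distinct columns, assuming the 100 in the figure is the column count. A quick check:

```python
from itertools import combinations
from math import comb

NUM_COLUMNS = 100

pairs = list(combinations(range(NUM_COLUMNS), 2))   # c1 < c2 only
distinct_pairs = len(pairs)
closed_form = comb(NUM_COLUMNS, 2)                  # 100 * 99 / 2
with_diagonal = distinct_pairs + NUM_COLUMNS        # add each column with itself
```

Every pair needs both of its columns on one machine at some point, which is what makes the placement problem harder than the single dot product.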
![Page 51: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/51.jpg)
Example: Decision Tree
• Tries to learn a function of Y given a set of X’s in N dimensions.
• Common algorithm for both GBM and random forest
• Complex parallelism pattern
• Has both static data (that sits still) and rapidly moving data
![Page 52: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/52.jpg)
Example: Decision Tree
[Figure: example decision tree with root split X0 > 10, further splits X9 > 1.2, X33 > -5.0, and X65 > .111, and leaf values Y=0, Y=1, Y=10, Y=20, Y=30]
![Page 53: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/53.jpg)
Example: Decision Tree
How do we pick split points?
![Page 54: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/54.jpg)
Example: Decision Tree
Scan over the data for each column
[Figure: plot of Y against X0]
![Page 55: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/55.jpg)
Example: Decision Tree
Build a histogram
[Figure: plot of Y against X0]
![Page 56: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/56.jpg)
Example: Decision Tree
Pick the “X” point that maximizes separation of Y
[Figure: plot of Y against X0]
![Page 57: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/57.jpg)
Example: Decision Tree
Pick the best rule over all the columns, and divide the dataset in half according to that rule
[Figure: root split X9 > 1.2]
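The scan, histogram, and pick steps can be sketched as follows. The talk does not say how "separation of Y" is measured; this sketch assumes squared-error reduction, and the bin count, `best_split_for_column`, and `best_split` are hypothetical choices made for illustration.

```python
def best_split_for_column(xs, ys, bins=8):
    # One scan builds a histogram of (count, sum of y) per x-bucket;
    # candidate thresholds are then the bucket boundaries.
    lo, hi = min(xs), max(xs)
    if lo == hi:
        return None
    width = (hi - lo) / bins
    count = [0] * bins
    total = [0.0] * bins
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), bins - 1)
        count[b] += 1
        total[b] += y
    n, s = sum(count), sum(total)
    best = None
    n_left, s_left = 0, 0.0
    for b in range(bins - 1):
        n_left += count[b]
        s_left += total[b]
        n_right, s_right = n - n_left, s - s_left
        if n_left == 0 or n_right == 0:
            continue
        # Larger score = lower squared error of y after the split.
        score = s_left**2 / n_left + s_right**2 / n_right
        threshold = lo + (b + 1) * width
        if best is None or score > best[0]:
            best = (score, threshold)
    return best


def best_split(columns, ys):
    # columns: dict mapping a column name to its list of x values.
    best = None
    for name, xs in columns.items():
        cand = best_split_for_column(xs, ys)
        if cand is not None and (best is None or cand[0] > best[0]):
            best = (cand[0], name, cand[1])
    return best   # (score, column name, threshold) or None


columns = {
    "X0": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "X1": [3.0, 7.0, 1.0, 5.0, 0.0, 6.0, 2.0, 4.0],
}
ys = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
score, column, threshold = best_split(columns, ys)
```

Only the per-bucket counts and sums need to leave the machine that holds a column, so the histogram is what travels, not the data.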
![Page 58: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/58.jpg)
Example: Decision Tree
And recurse!
[Figure: tree with splits X9 > 1.2, X0 > 0.5, X33 > 7.1]
![Page 59: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/59.jpg)
Example: Decision Tree
And compute averages over Y over all the leaf datasets.
[Figure: tree with splits X9 > 1.2, X0 > 0.5, X33 > 7.1; leaf values Y = 20, Y = 30]
![Page 60: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/60.jpg)
![Page 61: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/61.jpg)
Example: Decision Tree
We don’t actually make a full copy of the data at each split.
Instead, we can track “Active Indices,” which contain the set of row indices in a given subset.
This is much smaller than the whole set.
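A minimal sketch of the active-indices idea: a split partitions a list of row indices and leaves the column data untouched. The column values, `split_indices`, and `leaf_mean` are made up for illustration; the rule X9 > 1.2 is borrowed from the earlier slide.

```python
def split_indices(active, column, threshold):
    # Partition row indices by the rule `column[i] > threshold`;
    # the column data itself is never copied or moved.
    left = [i for i in active if not column[i] > threshold]
    right = [i for i in active if column[i] > threshold]
    return left, right


def leaf_mean(active, ys):
    return sum(ys[i] for i in active) / len(active)


x9 = [0.5, 2.0, 1.0, 3.0, 1.5, 0.1]
ys = [10.0, 30.0, 10.0, 30.0, 30.0, 10.0]

root = list(range(len(ys)))            # every row is active at the root
left, right = split_indices(root, x9, 1.2)
means = (leaf_mean(left, ys), leaf_mean(right, ys))
```

The index lists are the rapidly moving data mentioned earlier, while the columns are the static data that sits still.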
![Page 62: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/62.jpg)
Example: Decision Tree
[Figure: 100 columns and Y; Active Indices; machines M1, M2, M3, M4]
![Page 63: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/63.jpg)
Example: Decision Tree
[Figure: 100 columns and Y; Active Indices; machines M1, M2, M3, M4]
![Page 64: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/64.jpg)
![Page 65: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/65.jpg)
![Page 66: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/66.jpg)
How is this implemented?
Backend in C++
Codegen using LLVM
Language bindings in Python
![Page 67: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/67.jpg)
Python Code
Ufora IR
FORA code
Threads
Data
![Page 68: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/68.jpg)
Ufora Worker
Thread Scheduler
![Page 69: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/69.jpg)
Ufora Worker
JIT compiler (LLVM)
Thread Scheduler
![Page 70: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/70.jpg)
![Page 71: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/71.jpg)
Ufora Worker
JIT compiler (LLVM)
Thread Scheduler
Data Scheduler
![Page 72: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/72.jpg)
![Page 73: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/73.jpg)
Ufora Worker
JIT compiler (LLVM)
Thread Scheduler
Data Scheduler
Global Scheduler
![Page 74: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/74.jpg)
Data backplane to other Machines
Ufora Worker
JIT compiler (LLVM)
Thread Scheduler
Data Scheduler
Global Scheduler
Local Disk
![Page 75: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/75.jpg)
![Page 76: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/76.jpg)
![Page 77: NYC ML Meetup Talk July2015 Final ML Meetup Talk - Ufora.pdf · Braxton McKee Founder, CEO Keith Kegley President EyalGoldwerger Exec. Chairman Ben Beinecke Head of Strategy Ronen](https://reader034.vdocument.in/reader034/viewer/2022052102/603c52b65877394cc63c157d/html5/thumbnails/77.jpg)
The End
• We’re Hiring
• We give out trial licenses for compelling projects
• Contact me: [email protected]