![Page 1: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/1.jpg)
go.indeed.com/IndeedEngTalks
![Page 2: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/2.jpg)
Machine Learning at Indeed
Scaling Decision Trees
![Page 3: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/3.jpg)
Andrew Hudson, CTO
![Page 4: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/4.jpg)
I help people get jobs.
![Page 5: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/5.jpg)
Indeed is a Search Engine for Jobs
![Page 6: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/6.jpg)
![Page 7: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/7.jpg)
Which jobs to show?
![Page 8: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/8.jpg)
![Page 9: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/9.jpg)
18,749 jobs
![Page 10: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/10.jpg)
Which jobs to show?
Maximize job seeker’s chance to get the job
![Page 11: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/11.jpg)
Which jobs to show?
Maximize job seeker’s chance to get the job
● Will job seeker click on the job?
● Is the job still available?
● Will job seeker apply to the job?
● Is job seeker qualified for the job?
![Page 12: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/12.jpg)
Which jobs to show?
Maximize job seeker’s chance to get the job
● Will job seeker click on the job?
● Is the job still available?
● Will job seeker apply to the job?
● Is job seeker qualified for the job?
![Page 13: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/13.jpg)
How?
Log job seeker behavior
Analyze logs: what best explains why they clicked on some jobs and not on others?
May help predict future behavior
![Page 14: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/14.jpg)
How?
Log job seeker behavior
Analyze logs: what best explains why they clicked on some jobs and not on others?
May help predict future behavior
Supervised learning
![Page 15: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/15.jpg)
Supervised Learning Approaches
Neural networks
Bayesian methods
Decision trees
Genetic programming
Logistic model tree
Nearest neighbor
Support Vector Machines
Random forests
Boosting
Bagging
Regression
Ensemble methods
![Page 16: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/16.jpg)
Supervised Learning Approaches
Neural networks
Bayesian methods
Decision trees
Genetic programming
Logistic model tree
Nearest neighbor
Support Vector Machines
Random forests
Boosting
Bagging
Regression
Ensemble methods
![Page 17: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/17.jpg)
Supervised Learning Approaches
Decision trees
Genetic programming
Logistic model tree
Random forests
Boosting
Bagging
Ensemble methods
![Page 18: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/18.jpg)
Decision Trees
![Page 19: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/19.jpg)
What is a Decision Tree?
A tree-like structure that presents a relevant sequence of questions, which determine a path and ultimately some outcome or prediction
![Page 20: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/20.jpg)
I’m Thinking About Buying a Laptop
![Page 21: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/21.jpg)
I’m Thinking About Buying a Laptop
Is quality important?
![Page 22: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/22.jpg)
I’m Thinking About Buying a Laptop
Is quality important?
NO → ASUS
![Page 23: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/23.jpg)
I’m Thinking About Buying a Laptop
Is quality important?
NO → ASUS (or whatever woot has)
![Page 24: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/24.jpg)
I’m Thinking About Buying a Laptop
Is quality important?
NO → ASUS (or whatever woot has)
YES → Want to run linux?
![Page 25: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/25.jpg)
I’m Thinking About Buying a Laptop
Is quality important?
NO → ASUS (or whatever woot has)
YES → Want to run linux?
    NO → MACBOOK
![Page 26: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/26.jpg)
I’m Thinking About Buying a Laptop
Is quality important?
NO → ASUS (or whatever woot has)
YES → Want to run linux?
    NO → MACBOOK
    YES → LENOVO
![Page 27: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/27.jpg)
I’m Thinking About Buying a Laptop
Is quality important?
NO → ASUS (or whatever woot has)
YES → Want to run linux?
    NO → MACBOOK
    YES → LENOVO
    HELLYES → SYSTEM76
    IDGAF → DELL
![Page 28: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/28.jpg)
Benefits of Decision Trees
Algorithm relatively simple to understand and implement
Model produced is also human-understandable
![Page 29: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/29.jpg)
Decision Tree Learning
Programmatic creation of decision trees
![Page 30: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/30.jpg)
Decision Tree Learning
Given a set of documents, split it into two or more subsets that optimize some criterion
Repeat this process until a set can no longer be split
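The split-and-repeat loop described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the talk: it assumes binary "feature equals value" splits, an entropy-based criterion, and dictionary rows; the names `entropy`, `best_split`, and `build_tree` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in nats) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log(c / total) for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (score, feature, value) with the lowest weighted entropy, or None."""
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            inside = [y for r, y in zip(rows, labels) if r[f] == v]
            outside = [y for r, y in zip(rows, labels) if r[f] != v]
            if not inside or not outside:
                continue  # not a real split
            score = (len(inside) * entropy(inside)
                     + len(outside) * entropy(outside)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, v)
    return best

def build_tree(rows, labels, features, min_size=2):
    """Split recursively until a set can no longer be usefully split."""
    split = best_split(rows, labels, features) if len(rows) >= min_size else None
    if split is None or split[0] >= entropy(labels):
        # Leaf: remember the positive rate observed in this subset.
        return {"rate": sum(labels) / len(labels), "count": len(labels)}
    _, f, v = split
    yes = [(r, y) for r, y in zip(rows, labels) if r[f] == v]
    no = [(r, y) for r, y in zip(rows, labels) if r[f] != v]
    return {
        "test": (f, v),
        "yes": build_tree([r for r, _ in yes], [y for _, y in yes], features, min_size),
        "no": build_tree([r for r, _ in no], [y for _, y in no], features, min_size),
    }

# Toy data (made up for illustration): the outcome depends only on gender.
rows = [{"gender": "f", "class": 1}, {"gender": "f", "class": 3},
        {"gender": "m", "class": 1}, {"gender": "m", "class": 3}]
labels = [1, 1, 0, 0]
tree = build_tree(rows, labels, ["class", "gender"])
```

On the toy data the root test is on `gender`, and both children become leaves because no further split lowers the entropy.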
![Page 31: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/31.jpg)
Titanic Example
1309 passengers
500 survivors
38.2% survival rate
What best explains who survived?
![Page 32: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/32.jpg)
class: class of ticket; first, second or third
fsize: family size; number of family members onboard
gender: male or female
What best explains who survived?
![Page 33: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/33.jpg)
![Page 34: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/34.jpg)
1309 passengers, 500 survivors, 38.2% survival
![Page 35: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/35.jpg)
1309 passengers, 500 survivors, 38.2% survival
class = 1
![Page 36: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/36.jpg)
1309 passengers, 500 survivors, 38.2% survival
class = 1: 323 passengers, 200 survivors, 61.9% survival
![Page 37: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/37.jpg)
1309 passengers, 500 survivors, 38.2% survival
class = 1: 323 passengers, 200 survivors, 61.9% survival
class ≠ 1: 986 passengers, 300 survivors, 30.4% survival
![Page 38: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/38.jpg)
1309 passengers, 500 survivors, 38.2% survival
class = 1: 323 passengers, 200 survivors, 61.9% survival
class ≠ 1: 986 passengers, 300 survivors, 30.4% survival
Score = ?
![Page 39: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/39.jpg)
Score
conditional entropy
![Page 40: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/40.jpg)
Conditional Entropy as Score
lower conditional entropy↓
less uncertainty about prediction based on term
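As a rough sketch of how such a score could be computed: the conditional entropy of a binary split is the size-weighted average of the entropy on each side. The function names below are illustrative and not from the talk. Natural logarithms are an assumption, chosen because they appear to reproduce the 0.6267 reported for the `class = 1` split on the Titanic slides.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a yes/no outcome that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def conditional_entropy(groups):
    """Size-weighted entropy over (passengers, survivors) groups of a split."""
    total = sum(n for n, _ in groups)
    return sum(n / total * binary_entropy(s / n) for n, s in groups)

# class = 1 vs. class != 1, using the passenger counts from the Titanic slides
score = conditional_entropy([(323, 200), (986, 300)])
print(f"{score:.4f}")
```

A split that separates survivors from non-survivors more cleanly pushes each side's entropy toward zero, which is why a lower score is preferred.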
![Page 41: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/41.jpg)
1309 passengers, 500 survivors, 38.2% survival
class = 1: 323 passengers, 200 survivors, 61.9% survival
class ≠ 1: 986 passengers, 300 survivors, 30.4% survival
Score = 0.6267
![Page 42: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/42.jpg)
Best Score: 0.6267, class = 1
1309 passengers, 500 survivors, 38.2% survival
class = 1: 323 passengers, 200 survivors, 61.9% survival
class ≠ 1: 986 passengers, 300 survivors, 30.4% survival
Score = 0.6267
![Page 43: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/43.jpg)
Best Score: 0.6267, class = 1
1309 passengers, 500 survivors, 38.2% survival
class = 1
![Page 44: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/44.jpg)
Best Score: 0.6267, class = 1
1309 passengers, 500 survivors, 38.2% survival
class ≤ 2
![Page 45: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/45.jpg)
Best Score: 0.6267, class = 1
1309 passengers, 500 survivors, 38.2% survival
class ≤ 2: 600 passengers, 319 survivors, 53.2% survival
![Page 46: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/46.jpg)
Best Score: 0.6267, class = 1
1309 passengers, 500 survivors, 38.2% survival
class ≤ 2: 600 passengers, 319 survivors, 53.2% survival
class > 2: 709 passengers, 181 survivors, 25.5% survival
![Page 47: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/47.jpg)
Best Score: 0.6267, class = 1
1309 passengers, 500 survivors, 38.2% survival
class ≤ 2: 600 passengers, 319 survivors, 53.2% survival
class > 2: 709 passengers, 181 survivors, 25.5% survival
Score = 0.6244
![Page 48: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/48.jpg)
Best Score: 0.6244, class ≤ 2
1309 passengers, 500 survivors, 38.2% survival
class ≤ 2: 600 passengers, 319 survivors, 53.2% survival
class > 2: 709 passengers, 181 survivors, 25.5% survival
Score = 0.6244
![Page 49: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/49.jpg)
Best Score: 0.6244, class ≤ 2
1309 passengers, 500 survivors, 38.2% survival
class ≠ 3: 600 passengers, 319 survivors, 53.2% survival
class = 3: 709 passengers, 181 survivors, 25.5% survival
Score = 0.6244
![Page 50: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/50.jpg)
Best Score: 0.6244, class ≤ 2
1309 passengers, 500 survivors, 38.2% survival
gender = female
![Page 51: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/51.jpg)
Best Score: 0.6244, class ≤ 2
1309 passengers, 500 survivors, 38.2% survival
gender = female: 466 passengers, 339 survivors, 72.7% survival
![Page 52: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/52.jpg)
Best Score: 0.6244, class ≤ 2
1309 passengers, 500 survivors, 38.2% survival
gender = female: 466 passengers, 339 survivors, 72.7% survival
gender ≠ female: 843 passengers, 161 survivors, 19.1% survival
![Page 53: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/53.jpg)
Best Score: 0.6244, class ≤ 2
1309 passengers, 500 survivors, 38.2% survival
gender = female: 466 passengers, 339 survivors, 72.7% survival
gender ≠ female: 843 passengers, 161 survivors, 19.1% survival
Score = 0.5525
![Page 54: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/54.jpg)
Best Score: 0.5525, gender = female
1309 passengers, 500 survivors, 38.2% survival
gender = female: 466 passengers, 339 survivors, 72.7% survival
gender ≠ female: 843 passengers, 161 survivors, 19.1% survival
Score = 0.5525
![Page 55: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/55.jpg)
Best Score: 0.5525, gender = female
1309 passengers, 500 survivors, 38.2% survival
fsize = 0: 790 passengers, 239 survivors, 30.3% survival
fsize ≠ 0: 519 passengers, 261 survivors, 50.3% survival
Score = 0.6448
![Page 56: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/56.jpg)
Best Score: 0.5525, gender = female
![Page 57: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/57.jpg)
![Page 58: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/58.jpg)
![Page 59: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/59.jpg)
![Page 60: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/60.jpg)
![Page 61: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/61.jpg)
![Page 62: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/62.jpg)
![Page 63: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/63.jpg)
![Page 64: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/64.jpg)
![Page 65: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/65.jpg)
![Page 66: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/66.jpg)
19.1% survival
72.7% survival
![Page 67: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/67.jpg)
![Page 68: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/68.jpg)
![Page 69: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/69.jpg)
![Page 70: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/70.jpg)
![Page 71: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/71.jpg)
![Page 72: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/72.jpg)
![Page 73: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/73.jpg)
![Page 74: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/74.jpg)
gender = male: 843 passengers, 161 survivors, 19.1% survival
![Page 75: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/75.jpg)
gender = male: 843 passengers, 161 survivors, 19.1% survival
class = 1: 179 passengers, 61 survivors, 34.1% survival
![Page 76: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/76.jpg)
gender = male: 843 passengers, 161 survivors, 19.1% survival
class = 1: 179 passengers, 61 survivors, 34.1% survival
class ≠ 1: 664 passengers, 100 survivors, 15.1% survival
Score = 0.4700
![Page 77: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/77.jpg)
class = 1 class ≠ 1
![Page 78: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/78.jpg)
class = 1 class ≠ 1
![Page 79: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/79.jpg)
![Page 80: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/80.jpg)
![Page 81: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/81.jpg)
![Page 82: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/82.jpg)
![Page 83: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/83.jpg)
![Page 84: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/84.jpg)
15.1% survival
34.1% survival
![Page 85: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/85.jpg)
38.2%
![Page 86: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/86.jpg)
38.2%
FEMALE → 72.7%
MALE → 19.1%
![Page 87: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/87.jpg)
38.2%
FEMALE → 72.7%
MALE → 19.1%
    CLASS=1 → 34.1%
    CLASS≠1 → 15.1%
![Page 88: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/88.jpg)
38.2%
FEMALE → 72.7%
MALE → 19.1%
    CLASS=1 → 34.1%
    CLASS≠1 → 15.1%
        FSIZE=2 → 33.9%
        FSIZE≠2 → 13.1%
![Page 89: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/89.jpg)
38.2%
FEMALE → 72.7%
    CLASS<=2 → 93.2%
    CLASS>2 → 49.1%
MALE → 19.1%
    CLASS=1 → 34.1%
    CLASS≠1 → 15.1%
        FSIZE=2 → 33.9%
        FSIZE≠2 → 13.1%
![Page 90: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/90.jpg)
38.2%
FEMALE → 72.7%
    CLASS<=2 → 93.2%
    CLASS>2 → 49.1%
        FSIZE<=2 → 54.9%
        FSIZE>2 → 24.4%
MALE → 19.1%
    CLASS=1 → 34.1%
    CLASS≠1 → 15.1%
        FSIZE=2 → 33.9%
        FSIZE≠2 → 13.1%
![Page 91: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/91.jpg)
Predicting Click Probabilities
Passenger → Job Impression
Survived → Clicked on Job
For each candidate job, follow its path through the tree, then take the click-through rate of the terminal node
![Page 92: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/92.jpg)
Simplified Decision Tree for query="sales"
sales?
    YES → representative?
        YES → outside?
            YES → 5.1%
            NO → service?
                YES → 2.9%
                NO → inside?
                    YES → 4.4%
                    NO → 4.6%
        NO → manager?
            YES → 2.9%
            NO → associate?
                YES → 1.8%
                NO → 2.6%
    NO → account?
        YES → 3.8%
        NO → manager?
            YES → 2.1%
            NO → 1.9%
![Page 93: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/93.jpg)
job title = “sales representative”
sales YES → representative YES → outside NO → service NO → inside NO → 4.6%
![Page 94: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/94.jpg)
job title = “account executive”
sales NO → account YES → 3.8%
![Page 95: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/95.jpg)
job title = “outside sales representative”
sales YES → representative YES → outside YES → 5.1%
![Page 96: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/96.jpg)
job title = “sales associate”
sales YES → representative NO → manager NO → associate YES → 1.8%
![Page 97: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/97.jpg)
job title = “inside sales representative”
sales YES → representative YES → outside NO → service NO → inside YES → 4.4%
![Page 98: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/98.jpg)
job title = “sales manager”
sales YES → representative NO → manager YES → 2.9%
![Page 99: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/99.jpg)
job title = “sales consultant”
sales YES → representative NO → manager NO → associate NO → 2.6%
![Page 100: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/100.jpg)
job title = “store manager”
sales NO → account NO → manager YES → 2.1%
![Page 101: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/101.jpg)
job title = “service sales representative”
sales YES → representative YES → outside NO → service YES → 2.9%
![Page 102: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/102.jpg)
job title = “customer service representative”
sales NO → account NO → manager NO → 1.9%
![Page 103: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/103.jpg)
Final CTR Predictions
5.1% outside sales representative
4.6% sales representative
4.4% inside sales representative
3.8% account executive
2.9% sales manager
2.9% service sales representative
2.6% sales consultant
2.1% store manager
1.9% customer service representative
1.8% sales associate
![Page 104: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/104.jpg)
Single Machine Implementation
![Page 105: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/105.jpg)
Overview
![Page 106: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/106.jpg)
Tree Building Strategies
One node at a time
- depth first
- breadth first
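The difference between the two one-node-at-a-time strategies comes down to which pending node is split next. The sketch below is only an illustration under assumptions: it splits every node down to a fixed depth and numbers nodes as they are created, and the name `expansion_order` is hypothetical.

```python
from collections import deque

def expansion_order(strategy, max_depth):
    """Return (parent, left, right) triples in the order nodes are split."""
    frontier = deque([(1, 0)])  # pending nodes as (node id, depth)
    next_id = 2
    splits = []
    while frontier:
        # Depth first takes the newest pending node, breadth first the oldest.
        node, depth = frontier.pop() if strategy == "depth" else frontier.popleft()
        if depth == max_depth:
            continue  # treated as a leaf in this sketch
        left, right = next_id, next_id + 1
        next_id += 2
        splits.append((node, left, right))
        if strategy == "depth":
            # Push right first so the left child is split next.
            frontier.append((right, depth + 1))
            frontier.append((left, depth + 1))
        else:
            frontier.append((left, depth + 1))
            frontier.append((right, depth + 1))
    return splits

depth_first = expansion_order("depth", 3)
breadth_first = expansion_order("breadth", 3)
```

With a stack the builder finishes one branch before returning to a sibling; with a queue it completes each level before starting the next.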
![Page 107: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/107.jpg)
1
Depth First
![Page 108: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/108.jpg)
1
2 3
Depth First
![Page 109: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/109.jpg)
Depth First
1 → 2, 3
2 → 4, 5
![Page 110: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/110.jpg)
Depth First
1 → 2, 3
2 → 4, 5
4 → 6, 7
![Page 111: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/111.jpg)
Depth First
1 → 2, 3
2 → 4, 5
4 → 6, 7
![Page 112: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/112.jpg)
Depth First
1 → 2, 3
2 → 4, 5
4 → 6, 7
![Page 113: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/113.jpg)
Depth First
1 → 2, 3
2 → 4, 5
4 → 6, 7
![Page 114: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/114.jpg)
Depth First
1 → 2, 3
2 → 4, 5
4 → 6, 7
3 → 8, 9
![Page 115: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/115.jpg)
1
Breadth First
![Page 116: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/116.jpg)
1
2 3
Breadth First
![Page 117: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/117.jpg)
Breadth First
1 → 2, 3
2 → 4, 5
![Page 118: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/118.jpg)
1
Breadth First
2
4 5
3
6 7
![Page 119: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/119.jpg)
1
Breadth First
2
5
3
6 7
8 9
4
![Page 120: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/120.jpg)
1
Breadth First
2 3
6 7
8 9
4 5
![Page 121: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/121.jpg)
1
Breadth First
2 3
8 9
4 5
10 11
6 7
![Page 122: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/122.jpg)
1
Breadth First
2 3
8 9
4 5
10 11
6
12 13
7
![Page 123: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/123.jpg)
Tree Building Strategies
One node at a time
- depth first
- breadth first
One layer at a time, all nodes simultaneously
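A minimal sketch of how these strategies order the work, assuming a full binary tree whose nodes carry heap-style ids (root 1, children of n are 2n and 2n+1). The ids are an illustrative assumption and differ from the creation-order labels drawn on the slides; `expansion_order` and `layers` are hypothetical helpers.

```python
from collections import deque

def expansion_order(depth, strategy):
    """Order in which internal nodes are split, one node at a time."""
    frontier = deque([1])
    order = []
    while frontier:
        # A stack gives depth-first order, a queue gives breadth-first order.
        node = frontier.pop() if strategy == "depth" else frontier.popleft()
        level = node.bit_length() - 1  # root is level 0
        if level >= depth:
            continue  # leaf: nothing to split
        order.append(node)
        left, right = 2 * node, 2 * node + 1
        if strategy == "depth":
            frontier.append(right)
            frontier.append(left)  # left child is popped first
        else:
            frontier.append(left)
            frontier.append(right)
    return order

def layers(depth):
    """Nodes split together when building one layer at a time."""
    return [list(range(2 ** level, 2 ** (level + 1))) for level in range(depth)]

print(expansion_order(3, "depth"))
print(expansion_order(3, "breadth"))
print(layers(3))
```

Both one-node-at-a-time orders perform one split per step; the layer-at-a-time variant groups every split at the same depth into a single iteration.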
![Page 124: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/124.jpg)
1
![Page 125: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/125.jpg)
1
iter #1
![Page 126: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/126.jpg)
1
2 3
iter #1
![Page 127: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/127.jpg)
1
2 3
iter #1
iter #2
![Page 128: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/128.jpg)
1
2
4
3
5 6 7
iter #1
iter #2
![Page 129: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/129.jpg)
iter #3
1
2 3
5 6 7
iter #1
iter #2
4
![Page 130: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/130.jpg)
8 9 0 10 11 12 13
iter #3
1
2
4
3
5 6 7
iter #1
iter #2
![Page 131: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/131.jpg)
8 9 0 10 11 12 13
iter #3
1
2
4
3
5 6 7
iter #1
iter #2
![Page 132: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/132.jpg)
iter #4
8 9 0 10 11 12 13
iter #3
1
2
4
3
5 6 7
iter #1
iter #2
![Page 133: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/133.jpg)
8 9 0 10 11 12 13
iter #3
1
2
4
3
5 6 7
iter #1
iter #2
![Page 134: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/134.jpg)
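Data Format

| id | class | fsize | gender | survived |
|---|---|---|---|---|
| 0 | 1 | 0 | f | 1 |
| 1 | 1 | 3 | m | 1 |
| 2 | 1 | 3 | f | 0 |
| 3 | 1 | 3 | m | 0 |
| 4 | 1 | 3 | f | 0 |
| 5 | 1 | 0 | m | 1 |
| 6 | 1 | 1 | f | 1 |
| 7 | 1 | 0 | m | 0 |
| 8 | 1 | 2 | f | 1 |
| 9 | 1 | 0 | m | 0 |
| 10 | 1 | 1 | m | 0 |
| 11 | 1 | 1 | f | 1 |
| 12 | 1 | 0 | f | 1 |
| 13 | 1 | 0 | f | 1 |
| 14 | 1 | 0 | m | 1 |
| 15 | 1 | 0 | m | 0 |
| 16 | 1 | 1 | m | 0 |
| 17 | 1 | 1 | f | 1 |
| 18 | 1 | 0 | f | 1 |
| 19 | 1 | 0 | m | 0 |

…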
![Page 135: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/135.jpg)
Data Format
Create an inverted index
Key to efficiently building one layer at a time
![Page 136: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/136.jpg)
Inverted Index
Maps terms to the list of documents that contain that term
Terms and docs stored in sorted order
![Page 137: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/137.jpg)
Inverted Index
class=1 → 0,1,2,3,4,5,6,7,8,9,10,11,12,13…
class=2 → 323,324,325,326,327,328,329…
class=3 → 600,601,602,603,604,605,606…
![Page 138: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/138.jpg)
Inverted Index
class=1 → 0,1,2,3,4,5,6,7,8,9,10,11,12,13…
class=2 → 323,324,325,326,327,328,329…
class=3 → 600,601,602,603,604,605,606…
Field
![Page 139: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/139.jpg)
Inverted Index
class=1 → 0,1,2,3,4,5,6,7,8,9,10,11,12,13…
class=2 → 323,324,325,326,327,328,329…
class=3 → 600,601,602,603,604,605,606…
Term
![Page 140: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/140.jpg)
Inverted Index
class=1 → 0,1,2,3,4,5,6,7,8,9,10,11,12,13…
class=2 → 323,324,325,326,327,328,329…
class=3 → 600,601,602,603,604,605,606…
Docs
![Page 141: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/141.jpg)
Inverted Index
class=1 → 0,1,2,3,4,5,6,7,8,9,10,11,12,13…
class=2 → 323,324,325,326,327,328,329…
class=3 → 600,601,602,603,604,605,606…
Docs
![Page 142: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/142.jpg)
Inverted Index
class=1 → 0,1,2,3,4,5,6,7,8,9,10,11,12,13…
class=2 → 323,324,325,326,327,328,329…
class=3 → 600,601,602,603,604,605,606…
Docs
![Page 143: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/143.jpg)
Inverted Index
fsize=0 → 0,5,7,9,12,13,14,15,18,19,22…
fsize=1 → 6,10,11,16,17,26,27,36,49,50…
fsize=2 → 8,20,21,42,76,77,78,79,81,82…
fsize=3 → 1,2,3,4,54,55,56,57,90,339…
fsize=4 → 249,250,251,252,253,449,806…
…
![Page 144: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/144.jpg)
Inverted Index
gender=f → 0,2,4,6,8,11,12,13,17,18,21…
gender=m → 1,3,5,7,9,10,14,15,16,19,20…
![Page 145: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/145.jpg)
Inverted Index
survived=0 → 2,3,4,7,9,10,15,16,19,25…
survived=1 → 0,1,5,6,8,11,12,13,14,17…
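As an illustration of the mapping shown on these slides, here is a minimal in-memory sketch that builds one inverted index per field from the first ten rows of the passenger table. `build_inverted_index` and the dict-of-lists layout are assumptions made for this example; they are not how Lucene or Flamdex store their indexes.

```python
from collections import defaultdict

# First ten rows of the passenger table: (id, class, fsize, gender, survived).
ROWS = [
    (0, 1, 0, "f", 1),
    (1, 1, 3, "m", 1),
    (2, 1, 3, "f", 0),
    (3, 1, 3, "m", 0),
    (4, 1, 3, "f", 0),
    (5, 1, 0, "m", 1),
    (6, 1, 1, "f", 1),
    (7, 1, 0, "m", 0),
    (8, 1, 2, "f", 1),
    (9, 1, 0, "m", 0),
]
FIELDS = ("class", "fsize", "gender", "survived")

def build_inverted_index(rows, fields):
    """Map field -> term -> list of doc ids containing that term."""
    index = {field: defaultdict(list) for field in fields}
    # Visiting rows in id order keeps every doc list sorted.
    for doc_id, *terms in sorted(rows):
        for field, term in zip(fields, terms):
            index[field][term].append(doc_id)
    return index

index = build_inverted_index(ROWS, FIELDS)
for term in sorted(index["gender"]):  # iterate terms in sorted order
    print(f"gender={term} ->", index["gender"][term])
```

Restricted to ids 0 through 9, the doc lists agree with the prefixes shown on the slides, for example fsize=3 → 1,2,3,4.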
![Page 146: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/146.jpg)
Inverted Index Implementations
Lucene
Flamdex
![Page 147: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/147.jpg)
Primary Lookup Tables
groups[doc]
Where in the tree each doc is
Initialized to all ones; all docs start in the root
values[doc]
Value to be classified, for each doc
In this case it’s 1 if survived, 0 otherwise
![Page 148: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/148.jpg)
Primary Lookup Tables
values[doc]
Constructed from an inverted index of the values
Invert the field of interest (e.g. survived)
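A short sketch of that inversion, assuming the `survived` doc lists for the first ten docs as they appear in the inverted index slides. `invert_field` is a hypothetical helper name, and plain Python lists stand in for whatever array type a real implementation would use.

```python
def invert_field(postings, num_docs, default=0):
    """Turn term -> doc list back into a per-doc array of term values."""
    values = [default] * num_docs
    for term, docs in postings.items():
        for doc in docs:
            values[doc] = term
    return values

# survived field, restricted to docs 0..9
survived_postings = {0: [2, 3, 4, 7, 9], 1: [0, 1, 5, 6, 8]}
num_docs = 10

values = invert_field(survived_postings, num_docs)
groups = [1] * num_docs  # every doc starts in the root, group 1

print(values)
print(groups)
```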
![Page 149: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/149.jpg)
Main Loop Overview
foreach field
  foreach term
    get group stats
    evaluate splits
apply best splits
repeat n times or until no more splits found
![Page 150: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/150.jpg)
Main Loop - First Iteration
foreach field (class, fsize, gender)
![Page 151: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/151.jpg)
Main Loop - First Iteration
foreach field (class, fsize, gender)
  foreach term (class=1,class=2,class=3...)
![Page 152: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/152.jpg)
Main Loop - First Iteration
foreach field (class, fsize, gender)
  foreach term (class=1,class=2,class=3...)
    get group stats
![Page 153: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/153.jpg)
Get Group Stats
count[grp]
Count of how many documents within that group contain the current term, initialized to zeros
vsum[grp]
Sum of the value to be classified over the documents within that group that contain the current term, initialized to zeros
![Page 154: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/154.jpg)
Get Group Stats
for current field/term
![Page 155: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/155.jpg)
Get Group Stats
for current field/term
  foreach doc
![Page 156: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/156.jpg)
Get Group Stats
for current field/term
  foreach doc
    grp = grps[doc]
![Page 157: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/157.jpg)
Get Group Stats
for current field/term
  foreach doc
    grp = grps[doc]
    if grp == 0 skip
![Page 158: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/158.jpg)
Get Group Stats
for current field/term
  foreach doc
    grp = grps[doc]
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc]
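The pseudocode above translates almost line for line into Python. This is a sketch under assumptions: `get_group_stats` and `num_groups` are illustrative names, and group 0 is treated only as "skip this doc" because the slides do not say what it represents.

```python
def get_group_stats(docs, grps, vals, num_groups):
    """Per-group count and value sum over the docs that contain one term."""
    count = [0] * (num_groups + 1)  # index 0 is unused
    vsum = [0] * (num_groups + 1)
    for doc in docs:
        grp = grps[doc]
        if grp == 0:
            continue  # doc is not assigned to any group
        count[grp] += 1
        vsum[grp] += vals[doc]
    return count, vsum

# First nine docs of class=1 during the first iteration: all in group 1.
docs = list(range(9))
grps = [1] * 9
vals = [1, 1, 0, 0, 0, 1, 1, 0, 1]

count, vsum = get_group_stats(docs, grps, vals, num_groups=1)
print(count[1], vsum[1])
```

The loop touches only the docs listed for the current term, so its cost is proportional to the length of that doc list rather than to the number of groups.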
![Page 159: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/159.jpg)
Get Group Stats
for current field/term (class=1)
![Page 160: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/160.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
![Page 161: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/161.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
![Page 162: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/162.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
![Page 163: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/163.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
![Page 164: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/164.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 0, vsum[1] = 0
![Page 165: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/165.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 1, vsum[1] = 1
![Page 166: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/166.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 2, vsum[1] = 2
![Page 167: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/167.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 3, vsum[1] = 2
![Page 168: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/168.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 4, vsum[1] = 2
![Page 169: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/169.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 5, vsum[1] = 2
![Page 170: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/170.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 6, vsum[1] = 3
![Page 171: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/171.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[1] = 323, vsum[1] = 200
![Page 172: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/172.jpg)
1309 passengers
500 survivors
38.2% survival
class = 1
![Page 173: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/173.jpg)
1309 passengers
500 survivors
38.2% survival
class = 1
Group 1
![Page 174: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/174.jpg)
1309 passengers
500 survivors
38.2% survival
class = 1
Group 1
![Page 175: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/175.jpg)
1309 passengers
500 survivors
38.2% survival
class = 1
323 passengers (count[1])
Group 1
![Page 176: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/176.jpg)
1309 passengers
500 survivors
38.2% survival
class = 1
323 passengers (count[1])
200 survivors (vsum[1])
Group 1
![Page 177: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/177.jpg)
Get Group Stats
for current field/term (class=2)
  foreach doc (323,324,325,326,327,328,329...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (0,1,0,0,0,0,1,0,1…)
…
count[1] = 277, vsum[1] = 119
![Page 178: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/178.jpg)
Get Group Stats
for current field/term (class=3)
  foreach doc (600,601,602,603,604,605,606...)
    grp = grps[doc] (1,1,1,1,1,1,1,1,1…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (0,0,0,1,1,1,1,1,0…)
…
count[1] = 709, vsum[1] = 181
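A quick consistency check on the three class results, assuming every passenger carries exactly one class term: the per-term stats for the root group should then add up to the totals on the overview slides (1309 passengers, 500 survivors). The variable names are illustrative.

```python
# (count[1], vsum[1]) reported on the slides for each term of the class field
class_stats = {
    "class=1": (323, 200),
    "class=2": (277, 119),
    "class=3": (709, 181),
}

total_count = sum(count for count, _ in class_stats.values())
total_survivors = sum(vsum for _, vsum in class_stats.values())
print(total_count, total_survivors)
print(f"overall survival: {total_survivors / total_count:.1%}")

# Survival rate inside each class, i.e. vsum / count for that term
for term, (count, vsum) in class_stats.items():
    print(term, f"{vsum / count:.1%}")
```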
![Page 179: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/179.jpg)
Main Loop - First Iteration
foreach field (class, fsize, gender)
  foreach term (class=1,class=2,class=3...)
    get group stats
    evaluate splits
![Page 180: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/180.jpg)
Evaluate Splits
Consider current field/term as a potential split for each group
1) check if split is admissible
   balance check, significance check
2) score the split
   conditional entropy or some other heuristic
3) keep best scoring split
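One way step 2 could look is sketched below, assuming a binary target and conditional entropy as the score, with lower being better. The argument order follows the `calcscore(cnt, vsum, totcnt, totval)` call used in the talk's pseudocode, but the body is an assumption, `binary_entropy` is a hypothetical helper, and the admissibility checks of step 1 are not shown because the slides give no details for them.

```python
import math

def binary_entropy(positives, n):
    """Entropy in bits of a 0/1 variable with `positives` ones out of n."""
    if n == 0 or positives == 0 or positives == n:
        return 0.0
    p = positives / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def calcscore(cnt, vsum, totcnt, totval):
    """Conditional entropy of the target given 'doc has the term'.

    cnt / vsum:      docs in the group that have the term, and their value sum
    totcnt / totval: all docs in the group, and their value sum
    Assumes the group is non-empty (totcnt > 0).
    """
    rest_cnt = totcnt - cnt
    rest_sum = totval - vsum
    weighted = (cnt * binary_entropy(vsum, cnt)
                + rest_cnt * binary_entropy(rest_sum, rest_cnt))
    return weighted / totcnt

# Root group, candidate split class=1: 323 of 1309 docs match, 200 of them survived.
print(calcscore(323, 200, 1309, 500))
print(binary_entropy(500, 1309))  # entropy of the root before any split
```

A split that separates the classes perfectly scores 0, and a split that tells nothing about the target scores the same as the unsplit group.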
![Page 181: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/181.jpg)
Evaluate Splits
totalcount[group] / totalvalue[group]
Total number of documents and total values for each group, i.e. # passengers / # survivors
bestsplit[group] / bestscore[group]
Current best split and score for each group, initially nulls
![Page 182: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/182.jpg)
foreach field/term (class=1)
  get group stats (count[1]=323, vsum[1]=200)
  foreach group
    if not admissible( … ) skip
    score = calcscore(cnt[grp], vsum[grp], totcnt[grp], totval[grp])
    if score < bestscore[grp]
      bestscore[grp] = score
      bestsplit[grp] = field/term
![Page 183: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/183.jpg)
foreach field/term (class=1)
  get group stats (count[1]=323, vsum[1]=200)
  foreach group
    if not admissible( … ) skip
    score = calcscore(cnt[grp], vsum[grp], totcnt[grp], totval[grp])
    if score < bestscore[grp]
      bestscore[grp] = score
      bestsplit[grp] = field/term
![Page 184: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/184.jpg)
Main Loop - First Iteration
foreach field (class, fsize, gender)
  foreach term (class=1,class=2,class=3...)
    get group stats
    evaluate splits
apply best splits (bestsplit[1]=“gender=f”)
![Page 185: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/185.jpg)
Apply Best Splits
Each split is a combination of a target group, a condition, a positive destination group, and a negative destination group
![Page 186: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/186.jpg)
Apply Best Splits
Each split is a combination of a target group, a condition, a positive destination group, and a negative destination group
target group: 1
1
![Page 187: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/187.jpg)
Apply Best Splits
Each split is a combination of a target group, a condition, a positive destination group, and a negative destination group
target group: 1
condition: gender=female
1
![Page 188: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/188.jpg)
Apply Best Splits
Each split is a combination of a target group, a condition, a positive destination group, and a negative destination group
target group: 1
condition: gender=female
positive group: 3
3
1
![Page 189: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/189.jpg)
Apply Best Splits
Each split is a combination of a target group, a condition, a positive destination group, and a negative destination group
target group: 1
condition: gender=female
positive group: 3
negative group: 2
2 3
1
![Page 190: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/190.jpg)
Apply Best Splits
Each split is a combination of a target group, a condition, a positive destination group, and a negative destination group
target group: 1
condition: gender=female
positive group: 3
negative group: 2
2 3
1
![Page 191: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/191.jpg)
Apply Best Splits
Using inverted index, iterate over docs that match split condition
If current document is in targeted group, move it to the positive group
At the end, move anything left in target group to negative group
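These three steps can be sketched directly, assuming `groups` is a plain list indexed by doc id and the positive and negative groups both differ from the target. `apply_split` is an illustrative name; the doc list is the gender=f prefix from the inverted index slides, padded to 24 docs so every listed id is in range.

```python
def apply_split(groups, matching_docs, target, positive, negative):
    """Move docs of `target` that match the condition to `positive`,
    and every other doc of `target` to `negative` (in place)."""
    # Pass 1: walk only the docs that match the split condition.
    for doc in matching_docs:
        if groups[doc] == target:
            groups[doc] = positive
    # Pass 2: whatever is still in the target group did not match.
    for doc, grp in enumerate(groups):
        if grp == target:
            groups[doc] = negative

groups = [1] * 24  # all docs start in the root
gender_f = [0, 2, 4, 6, 8, 11, 12, 13, 17, 18, 21, 23]

apply_split(groups, gender_f, target=1, positive=3, negative=2)
print(groups[:21])
```

The first pass never scans docs that lack the term, which is the reason for driving it from the inverted index.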
![Page 192: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/192.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 1   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 1   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 1   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 193: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/193.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 1   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 1   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 1   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 194: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/194.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 1   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 1   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 1   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 195: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/195.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 1   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 1   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 1   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 196: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/196.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 1   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 1   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 197: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/197.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 3   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 1   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 198: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/198.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 3   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 3   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 199: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/199.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 1   group[15] = 1
group[2] = 3   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 1
group[4] = 3   group[11] = 1   group[18] = 1
group[5] = 1   group[12] = 1   group[19] = 1
group[6] = 1   group[13] = 1   group[20] = 1
![Page 200: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/200.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 3   group[15] = 1
group[2] = 3   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 3
group[4] = 3   group[11] = 3   group[18] = 3
group[5] = 1   group[12] = 3   group[19] = 1
group[6] = 3   group[13] = 3   group[20] = 1
![Page 201: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/201.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 3   group[15] = 1
group[2] = 3   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 3
group[4] = 3   group[11] = 3   group[18] = 3
group[5] = 1   group[12] = 3   group[19] = 1
group[6] = 3   group[13] = 3   group[20] = 1
![Page 202: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/202.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 1   group[14] = 1
group[1] = 1   group[8] = 3   group[15] = 1
group[2] = 3   group[9] = 1   group[16] = 1
group[3] = 1   group[10] = 1   group[17] = 3
group[4] = 3   group[11] = 3   group[18] = 3
group[5] = 1   group[12] = 3   group[19] = 1
group[6] = 3   group[13] = 3   group[20] = 1
![Page 203: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/203.jpg)
Apply Best Splits
gender=f -> 0,2,4,6,8,11,12,13,17,18,21,23….
group[0] = 3   group[7] = 2   group[14] = 2
group[1] = 2   group[8] = 3   group[15] = 2
group[2] = 3   group[9] = 2   group[16] = 2
group[3] = 2   group[10] = 2   group[17] = 3
group[4] = 3   group[11] = 3   group[18] = 3
group[5] = 2   group[12] = 3   group[19] = 2
group[6] = 3   group[13] = 3   group[20] = 2
![Page 204: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/204.jpg)
Main Loop
foreach field
  foreach term
    get group stats
    evaluate splits
apply best splits
repeat n times or until no more splits found
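Putting the loop together, here is a compact end-to-end sketch that builds one layer per iteration. Several choices are assumptions rather than details from the talk: `build_tree` is a hypothetical name, the score is conditional entropy for a 0/1 target, admissibility is reduced to a minimum size on both sides plus a minimum entropy gain, and the children of group g are numbered 2g (negative) and 2g+1 (positive), which agrees with the 1 → 2, 3 example on the slides.

```python
import math
from collections import defaultdict

def entropy(pos, n):
    """Entropy in bits of a 0/1 variable with `pos` ones out of n."""
    if n == 0 or pos == 0 or pos == n:
        return 0.0
    p = pos / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def build_tree(index, values, n_iters, min_count=1, min_gain=1e-9):
    """index: {field: {term: sorted doc ids}}; values: 0/1 label per doc."""
    groups = [1] * len(values)  # every doc starts in the root, group 1
    splits = {}                 # group -> (field, term) chosen for it
    for _ in range(n_iters):
        totcnt, totval = defaultdict(int), defaultdict(int)
        for doc, grp in enumerate(groups):
            if grp:
                totcnt[grp] += 1
                totval[grp] += values[doc]
        bestscore, bestsplit = {}, {}  # empty dicts play the role of nulls
        for field, terms in index.items():
            for term, docs in terms.items():
                # get group stats
                count, vsum = defaultdict(int), defaultdict(int)
                for doc in docs:
                    grp = groups[doc]
                    if grp == 0:
                        continue
                    count[grp] += 1
                    vsum[grp] += values[doc]
                # evaluate splits
                for grp, cnt in count.items():
                    rest = totcnt[grp] - cnt
                    if cnt < min_count or rest < min_count:
                        continue  # balance check (assumed form)
                    score = (cnt * entropy(vsum[grp], cnt)
                             + rest * entropy(totval[grp] - vsum[grp], rest)) / totcnt[grp]
                    if entropy(totval[grp], totcnt[grp]) - score <= min_gain:
                        continue  # stand-in for the significance check
                    if grp not in bestscore or score < bestscore[grp]:
                        bestscore[grp], bestsplit[grp] = score, (field, term)
        if not bestsplit:
            break  # no more splits found
        # apply best splits: matching docs go to 2g+1, the rest to 2g
        for grp, (field, term) in bestsplit.items():
            for doc in index[field][term]:
                if groups[doc] == grp:
                    groups[doc] = 2 * grp + 1
        for doc, grp in enumerate(groups):
            if grp in bestsplit:
                groups[doc] = 2 * grp
        splits.update(bestsplit)
    return groups, splits

# Toy data: gender predicts the label perfectly, class does not.
toy_index = {
    "gender": {"f": [0, 2], "m": [1, 3]},
    "class": {1: [0, 1], 2: [2, 3]},
}
toy_values = [1, 0, 1, 0]
toy_groups, toy_splits = build_tree(toy_index, toy_values, n_iters=3)
print(toy_groups, toy_splits)
```

On the toy data the loop stops after one split, because both children are pure and no further split gains anything.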
![Page 205: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/205.jpg)
1
![Page 206: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/206.jpg)
iter #1
1
![Page 207: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/207.jpg)
iter #1
gender = female
1
![Page 208: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/208.jpg)
2 3
iter #1
gender = female
gender ≠ female
1
![Page 209: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/209.jpg)
iter #1
iter #2
2 3
iter #1
1
![Page 210: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/210.jpg)
Main Loop - Second Iteration
foreach field (class, fsize, gender)
  foreach term (class=1,class=2,class=3...)
    get group stats
![Page 211: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/211.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (3,2,3,2,3,2,3,2,3…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
![Page 212: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/212.jpg)
Get Group Stats
for current field/term (class=1)
  foreach doc (0,1,2,3,4,5,6,7,8...)
    grp = grps[doc] (3,2,3,2,3,2,3,2,3…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc] (1,1,0,0,0,1,1,0,1…)
…
count[2] = 179, vsum[2] = 61
count[3] = 144, vsum[3] = 139
![Page 213: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/213.jpg)
Get Group Stats
for current field/term (class=2)
  foreach doc (323,324,325,326,327,328,329...)
    grp = grps[doc]  (2,3,2,2,2,2,3,2,2…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc]  (0,1,0,0,0,0,1,0,1…)
…
count[2] = 171, vsum[2] = 25
count[3] = 106, vsum[3] = 94
![Page 214: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/214.jpg)
Get Group Stats
for current field/term (class=3)
  foreach doc (600,601,602,603,604,605,606...)
    grp = grps[doc]  (2,2,2,3,3,2,2,3,2…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc]  (0,0,0,1,1,1,1,1,0…)
…
count[2] = 493, vsum[2] = 75
count[3] = 216, vsum[3] = 106
![Page 215: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/215.jpg)
Get Group Stats
for current field/term (gender=female)
  foreach doc (0,2,4,6,8,11,12,13,17,18,21,23…)
    grp = grps[doc]  (3,3,3,3,3,3,3,3,3,3,3,3…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc]  (1,0,0,1,1,1,1,1,1…)
…
count[2] = 0, vsum[2] = 0
count[3] = 466, vsum[3] = 339
![Page 216: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/216.jpg)
Get Group Stats
for current field/term (gender=male)
  foreach doc (1,3,5,7,9,10,14,15,16,19,20,22...)
    grp = grps[doc]  (2,2,2,2,2,2,2,2,2,2,2…)
    if grp == 0 skip
    count[grp]++
    vsum[grp] += vals[doc]  (1,0,1,0,0,0,1,0,0…)
…
count[2] = 843, vsum[2] = 161
count[3] = 0, vsum[3] = 0
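The "Get Group Stats" step above can be sketched in Python roughly as follows. This is a minimal sketch, not the implementation from the talk: the function name `get_group_stats`, the use of plain lists for `grps` and `vals`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def get_group_stats(postings, grps, vals):
    """Accumulate per-group doc count and value sum for one field/term.

    postings: doc ids that contain the current term (from the inverted index)
    grps:     grps[doc] is the doc's current group (tree node); 0 means skip
    vals:     vals[doc] is the 0/1 outcome being summed
    """
    count = defaultdict(int)
    vsum = defaultdict(int)
    for doc in postings:
        grp = grps[doc]
        if grp == 0:          # doc is no longer in any active group
            continue
        count[grp] += 1
        vsum[grp] += vals[doc]
    return dict(count), dict(vsum)

# Toy data (hypothetical, not the dataset shown on the slides).
grps = [3, 2, 3, 2, 3, 2, 0, 2]
vals = [1, 1, 0, 0, 0, 1, 1, 0]
postings = [0, 1, 2, 3, 6]    # docs containing the current term

count, vsum = get_group_stats(postings, grps, vals)
print(count, vsum)
```

Only the docs in the term's posting list are touched, so the cost of one pass over a field is proportional to the number of postings rather than to the number of docs times the number of terms.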
![Page 217: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/217.jpg)
What About Inequality Splits?
e.g. class ≤ 2
![Page 218: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/218.jpg)
Main Loop + Inequality Splits
foreach field
  foreach term
    get group stats
    evaluate splits
apply best splits for each group
repeat n times or until no more splits found
![Page 219: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/219.jpg)
Main Loop + Inequality Splits
foreach field
  reset inequality stats
  foreach term
    get group stats
    evaluate splits
apply best splits for each group
repeat n times or until no more splits found
![Page 220: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/220.jpg)
Main Loop + Inequality Splits
foreach field
  reset inequality stats
  foreach term
    get group stats
    update inequality stats
    evaluate splits
apply best splits for each group
repeat n times or until no more splits found
![Page 221: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/221.jpg)
Main Loop + Inequality Splits
foreach field
  reset inequality stats
  foreach term
    get group stats
    update inequality stats
    evaluate splits
    evaluate inequality splits
apply best splits for each group
repeat n times or until no more splits found
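One way to read the three added steps: because terms are visited in sorted order, the stats for a split such as class ≤ 2 are just a running total of the per-term group stats seen so far within the field. The sketch below assumes that reading; the function name `inequality_stats` and the dict-of-tuples layout are illustrative choices, and the input numbers are the per-term stats for `class` from the second iteration above.

```python
def inequality_stats(term_stats):
    """Yield (term, stats for docs with field <= term) for one field.

    term_stats: iterable of (term, {group: (count, vsum)}) in ascending
    term order, i.e. the per-term output of "get group stats".
    """
    running = {}  # "reset inequality stats" at the start of each field
    for term, stats in term_stats:
        for grp, (count, vsum) in stats.items():
            prev_count, prev_vsum = running.get(grp, (0, 0))
            # "update inequality stats": fold this term into the totals
            running[grp] = (prev_count + count, prev_vsum + vsum)
        yield term, dict(running)

# Per-term stats for the class field, taken from the second iteration.
class_stats = [
    (1, {2: (179, 61), 3: (144, 139)}),
    (2, {2: (171, 25), 3: (106, 94)}),
    (3, {2: (493, 75), 3: (216, 106)}),
]

cumulative = dict(inequality_stats(class_stats))
print(cumulative[2])  # class <= 2 -> {2: (350, 86), 3: (250, 233)}
```

The stats for the other side of the split (class > 2) follow by subtracting the running total from the group's overall total, so no second pass over the data is needed.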
![Page 222: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/222.jpg)
Scalability
Performs quite well on a single machine
Worked well for a while, but started to hit limits
Ultimately needed to distribute to multiple machines
![Page 223: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/223.jpg)
Multiple Machine Implementation
![Page 224: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/224.jpg)
Hadoop?
![Page 225: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/225.jpg)
Hadoop
Experimented with using Hadoop
Each tree level took five sequential MapReduce jobs
Much slower than the single-machine version: intermediate data is written out repeatedly, with lots of shuffling
![Page 226: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/226.jpg)
Hadoop
Experimented with using Hadoop
Each tree level took five sequential MapReduce jobs
Much slower than the single-machine version: intermediate data is written out repeatedly, with lots of shuffling
Hadoop is not great for iterative algorithms
![Page 227: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/227.jpg)
Partition Data
![Page 228: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/228.jpg)
![Page 229: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/229.jpg)
Inverted Index
![Page 230: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/230.jpg)
Inverted Index
![Page 231: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/231.jpg)
Inverted Index
![Page 232: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/232.jpg)
Inverted Index
Shard 1 Shard 2
![Page 233: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/233.jpg)
Shard 1 Shard 2
Machine 1 Machine 2
![Page 234: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/234.jpg)
Main Loop
foreach field
  foreach term
    get group stats
    evaluate splits
apply best splits for each group
repeat n times or until no more splits found
![Page 235: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/235.jpg)
![Page 236: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/236.jpg)
Main Loop
foreach field
  foreach term
    get group stats
    evaluate splits
apply best splits for each group
repeat n times or until no more splits found
FTGS
![Page 237: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/237.jpg)
![Page 238: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/238.jpg)
![Page 239: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/239.jpg)
![Page 240: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/240.jpg)
FTGS Stream - Single Machine
class=1|1|323|200
class=2|1|277|119
class=3|1|709|181
fsize=0|1|790|239
fsize=1|1|235|126
fsize=2|1|159|90
fsize=3|1|43|30
fsize=4|1|22|6
fsize=5|1|25|5
fsize=6|1|16|4
fsize=7|1|8|0
fsize=10|1|11|0
gender=f|1|466|339
gender=m|1|843|161
![Page 241: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/241.jpg)
![Page 242: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/242.jpg)
![Page 243: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/243.jpg)
![Page 244: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/244.jpg)
FTGS Stream - Single Machine
Sorted
![Page 245: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/245.jpg)
![Page 246: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/246.jpg)
![Page 247: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/247.jpg)
![Page 248: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/248.jpg)
![Page 249: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/249.jpg)
![Page 250: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/250.jpg)
![Page 251: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/251.jpg)
![Page 252: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/252.jpg)
![Page 253: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/253.jpg)
![Page 254: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/254.jpg)
![Page 255: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/255.jpg)
![Page 256: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/256.jpg)
![Page 257: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/257.jpg)
![Page 258: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/258.jpg)
class=1|1|323|200
class=2|1|277|119
class=3|1|709|181
fsize=0|1|790|239
fsize=1|1|235|126
fsize=2|1|159|90
fsize=3|1|43|30
fsize=4|1|22|6
fsize=5|1|25|5
fsize=6|1|16|4
fsize=7|1|8|0
fsize=10|1|11|0
gender=f|1|466|339
gender=m|1|843|161
FTGS Stream - Single Machine
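The stream above can be read as one row per field, term and group, in the form `field=term|group|count|vsum`, sorted by field and then term. That reading is an interpretation of the slide, though it matches the group stats computed earlier. Under that assumption, a single-machine stream could be produced along these lines; the nested-dict inverted index, the function names and the toy data are all hypothetical.

```python
def ftgs_stream(index, grps, vals):
    """Yield (field, term, group, count, vsum) rows in sorted order.

    index: {field: {term: [doc ids]}}, a toy stand-in for an inverted index
    grps:  grps[doc] is the doc's current group; 0 means skip
    vals:  vals[doc] is the 0/1 outcome being summed
    """
    for field in sorted(index):
        for term in sorted(index[field]):
            count, vsum = {}, {}
            for doc in index[field][term]:
                grp = grps[doc]
                if grp == 0:
                    continue
                count[grp] = count.get(grp, 0) + 1
                vsum[grp] = vsum.get(grp, 0) + vals[doc]
            for grp in sorted(count):
                yield (field, term, grp, count[grp], vsum[grp])

def format_row(row):
    """Render a row in the field=term|group|count|vsum layout."""
    field, term, grp, count, vsum = row
    return f"{field}={term}|{grp}|{count}|{vsum}"

# Toy index over four docs (hypothetical data).
index = {
    "gender": {"f": [0, 2], "m": [1, 3]},
    "class": {1: [0, 1], 2: [2, 3]},
}
grps = [1, 1, 1, 1]
vals = [1, 0, 1, 0]

lines = [format_row(r) for r in ftgs_stream(index, grps, vals)]
print("\n".join(lines))
```

Integer terms are kept as integers here so that they sort numerically, which is how `fsize=7` comes before `fsize=10` in the stream above.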
![Page 259: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/259.jpg)
FTGS Stream
How to distribute?
![Page 260: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/260.jpg)
Shard 1 Shard 2
Machine 1 Machine 2
![Page 261: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/261.jpg)
Shard 1 Shard 2
FTGS 1
Machine 2
![Page 262: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/262.jpg)
Shard 1 Shard 2
FTGS 1 FTGS 2
![Page 263: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/263.jpg)
Shard 1 Shard 2
FTGS 1 FTGS 2
Machine 3
![Page 264: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/264.jpg)
Shard 1 Shard 2
FTGS 1 FTGS 2Merge
Machine 3
![Page 265: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/265.jpg)
FTGS Stream Merge
Machine 1
class=1|1|198|111
class=2|1|277|119
class=3|1|511|129
fsize=0|1|790|239
fsize=1|1|94|53
fsize=2|1|75|48
fsize=3|1|21|17
fsize=4|1|3|1
fsize=5|1|3|1
gender=f|1|308|237
gender=m|1|678|122
![Page 266: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/266.jpg)
FTGS Stream Merge
Machine 2
class=1|1|125|89
class=3|1|198|52
fsize=1|1|141|73
fsize=2|1|84|42
fsize=3|1|22|13
fsize=4|1|19|5
fsize=5|1|22|4
fsize=6|1|16|4
fsize=7|1|8|0
fsize=10|1|11|0
gender=f|1|158|102
gender=m|1|165|39
![Page 267: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/267.jpg)
FTGS Stream Merge
Machine 1 Machine 2
![Page 268: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/268.jpg)
FTGS Stream Merge
Machine 1 + Machine 2
Merged output:
class=1|1|323|200
![Page 269: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/269.jpg)
![Page 270: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/270.jpg)
FTGS Stream Merge
Machine 1 Machine 2
Merged output:
class=1|1|323|200
class=2|1|277|119
![Page 271: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/271.jpg)
![Page 272: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/272.jpg)
FTGS Stream Merge
Machine 1 + Machine 2
Merged output:
class=1|1|323|200
class=2|1|277|119
class=3|1|709|181
![Page 273: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/273.jpg)
![Page 274: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/274.jpg)
FTGS Stream Merge
Machine 1 Machine 2
Merged output:
class=1|1|323|200
class=2|1|277|119
class=3|1|709|181
fsize=0|1|790|239
![Page 275: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/275.jpg)
![Page 276: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/276.jpg)
FTGS Stream Merge
Machine 1 + Machine 2
Merged output:
class=1|1|323|200
class=2|1|277|119
class=3|1|709|181
fsize=0|1|790|239
fsize=1|1|235|126
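The merge walked through above behaves like the merge step of merge sort, except that rows with the same field, term and group are added together instead of being emitted twice. A minimal two-stream sketch, assuming rows are `(field, term, group, count, vsum)` tuples already sorted by their first three elements (the function name `merge_two` is illustrative), using the first few rows from each machine:

```python
def merge_two(a, b):
    """Merge two sorted FTGS streams, summing stats on equal keys."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        key_a, key_b = a[i][:3], b[j][:3]
        if key_a == key_b:
            # Same field/term/group on both machines: add count and vsum.
            out.append(key_a + (a[i][3] + b[j][3], a[i][4] + b[j][4]))
            i += 1
            j += 1
        elif key_a < key_b:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out

machine_1 = [
    ("class", 1, 1, 198, 111),
    ("class", 2, 1, 277, 119),
    ("class", 3, 1, 511, 129),
    ("fsize", 0, 1, 790, 239),
    ("fsize", 1, 1, 94, 53),
]
machine_2 = [
    ("class", 1, 1, 125, 89),
    ("class", 3, 1, 198, 52),
    ("fsize", 1, 1, 141, 73),
]

merged = merge_two(machine_1, machine_2)
for row in merged:
    print(row)
```

Because both inputs are sorted, the merge needs only one cursor per stream and never has to hold a full stream in memory; a streaming version would consume iterators instead of lists.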
![Page 277: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/277.jpg)
Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6
![Page 278: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/278.jpg)
FTGS 1 FTGS 2 FTGS 3 FTGS 4 FTGS 5 FTGS 6
![Page 279: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/279.jpg)
FTGS 1 FTGS 2 FTGS 3 FTGS 4 FTGS 5 FTGS 6
k-way merge
![Page 280: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/280.jpg)
FTGS 1-6
FTGS 1 FTGS 2 FTGS 3 FTGS 4 FTGS 5 FTGS 6
![Page 281: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/281.jpg)
FTGS 1-6 FTGS 7-12 FTGS 13-18
![Page 282: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/282.jpg)
FTGS 1-6 FTGS 7-12 FTGS 13-18
FTGS 1-18
![Page 283: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/283.jpg)
FTGS 1-18 FTGS 19-36
FTGS 1-36
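The k-way and multi-level merges above generalize the two-stream case. A rough sketch using Python's standard `heapq.merge` and `itertools.groupby` follows; the tuple layout and the name `merge_k` are assumptions for illustration, and the toy streams are hypothetical. Since the merge only sums counts and value sums, merging in stages (for example 1-6, then 1-18, then 1-36) gives the same result as merging everything at once.

```python
import heapq
from itertools import groupby

def row_key(row):
    """Sort/merge key: (field, term, group)."""
    return row[:3]

def merge_k(*streams):
    """Lazily merge any number of sorted FTGS streams, summing equal keys."""
    merged = heapq.merge(*streams, key=row_key)
    for key, rows in groupby(merged, key=row_key):
        count = vsum = 0
        for _, _, _, c, v in rows:
            count += c
            vsum += v
        yield key + (count, vsum)

# Three toy shard streams (hypothetical data).
s1 = [("class", 1, 1, 1, 1), ("class", 2, 1, 2, 0)]
s2 = [("class", 1, 1, 3, 2)]
s3 = [("class", 2, 1, 1, 1), ("gender", "f", 1, 4, 3)]

flat = list(merge_k(s1, s2, s3))
two_level = list(merge_k(merge_k(s1, s2), s3))
print(flat)
```

Here `two_level` merges `s1` and `s2` first and then merges that result with `s3`, mirroring the tree of merges on the slides.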
![Page 284: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/284.jpg)
Main Loop
foreach field
  foreach term
    get group stats
    evaluate splits
apply best splits for each group
repeat n times or until no more splits found
FTGS
![Page 285: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/285.jpg)
![Page 286: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/286.jpg)
![Page 287: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/287.jpg)
Main Loop
foreach field
  foreach term
    get group stats
    evaluate splits
apply best splits for each group
repeat n times or until no more splits found
Regroup
![Page 288: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/288.jpg)
FTGS 1-6 FTGS 7-12 FTGS 13-18
FTGS
![Page 289: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/289.jpg)
Regroup 1-6 Regroup 7-12 Regroup 13-18
Regroup
![Page 290: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/290.jpg)
FTGS 1-6 FTGS 7-12 FTGS 13-18
FTGS
![Page 291: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/291.jpg)
Regroup 1-6 Regroup 7-12 Regroup 13-18
Regroup
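A regroup pass ("apply best splits for each group") can be pictured as rewriting the per-doc group array on each shard independently. The sketch below is a guess at the shape of that step, not Imhotep's actual operation: the numbering of children as `2*g` and `2*g + 1` is assumed because it matches the earlier example where group 1 becomes groups 2 and 3, and sending docs in unsplit groups to group 0 is assumed from the `if grp == 0 skip` rule. The per-doc dicts are a hypothetical stand-in for a real index lookup.

```python
def regroup(grps, docs, splits):
    """Return the new group of every doc after applying the chosen splits.

    grps:   grps[doc] is the doc's current group; 0 means inactive
    docs:   docs[doc] is a {field: term} dict (toy forward lookup)
    splits: {group: (field, term)}, the best equality split per group
    """
    new_grps = []
    for doc, grp in enumerate(grps):
        split = splits.get(grp)
        if grp == 0 or split is None:
            new_grps.append(0)  # assumed: unsplit groups become inactive
            continue
        field, term = split
        matches = docs[doc].get(field) == term
        # assumed numbering: matching docs go to 2g+1, the rest to 2g
        new_grps.append(2 * grp + 1 if matches else 2 * grp)
    return new_grps

docs = [{"gender": "f"}, {"gender": "m"}, {"gender": "f"}, {"gender": "m"}]
grps = [1, 1, 1, 1]
splits = {1: ("gender", "f")}

new_grps = regroup(grps, docs, splits)
print(new_grps)  # [3, 2, 3, 2]
```

Each shard only needs the list of chosen splits, which is small, so this step distributes without any merge; that is consistent with regroup being much faster than FTGS in the timings that follow.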
![Page 292: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/292.jpg)
Imhotep
![Page 293: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/293.jpg)
Imhotep
Distributed System that does efficient FTGS and Regroup operations on inverted indexes
![Page 294: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/294.jpg)
Imhotep
32 machines
2 CPU x 6 core Xeon Westmere E5649
128GB RAM
10x1TB 7200 RPM SATA
Total: 384 cores, 4TB RAM, 320TB disk
![Page 295: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/295.jpg)
Decision tree on 13 billion documents
Imhotep
![Page 296: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/296.jpg)
Decision tree on 13 billion documents
330GB → ~25 bytes per doc
Imhotep
![Page 297: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/297.jpg)
Decision tree on 13 billion documents
330GB → ~25 bytes per doc
First FTGS: 314 seconds
First Regroup: 9.6 seconds
Imhotep
![Page 298: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/298.jpg)
Decision tree on 13 billion documents
330GB → ~25 bytes per doc
First FTGS: 314 seconds (36.3 million terms)
First Regroup: 9.6 seconds
Imhotep
![Page 299: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/299.jpg)
Decision tree on 13 billion documents
330GB → ~25 bytes per doc
First FTGS: 314 seconds (36.3 million terms)
First Regroup: 9.6 seconds (7 groups)
Imhotep
![Page 300: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/300.jpg)
Decision tree on 13 billion documents
330GB → ~25 bytes per doc
First FTGS: 314 seconds (36.3 million terms)
First Regroup: 9.6 seconds (7 groups)
Second FTGS: 57 seconds
Second Regroup: 23 seconds (217 groups)
Imhotep
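The figures on this slide can be sanity-checked with a little arithmetic. This is a rough sketch that assumes decimal gigabytes (1GB = 10^9 bytes) and treats the four reported timings as sequential steps; the variable names are illustrative.

```python
DOCS = 13_000_000_000
INDEX_BYTES = 330 * 10**9          # assumes decimal gigabytes
FIRST_FTGS_TERMS = 36_300_000
STEP_SECONDS = {
    "first FTGS": 314,
    "first regroup": 9.6,
    "second FTGS": 57,
    "second regroup": 23,
}

bytes_per_doc = INDEX_BYTES / DOCS
total_seconds = sum(STEP_SECONDS.values())
terms_per_second = FIRST_FTGS_TERMS / STEP_SECONDS["first FTGS"]

print(f"{bytes_per_doc:.1f} bytes per doc")
print(f"{total_seconds:.1f} seconds across the four steps")
print(f"{terms_per_second:,.0f} terms per second in the first FTGS")
```

Under these assumptions the index comes to about 25.4 bytes per document, which matches the slide's ~25, and the four steps total about 403.6 seconds.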
![Page 301: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/301.jpg)
Imhotep
Distributed System that does efficient FTGS and Regroup operations
Powers our internal analytical tools
![Page 302: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/302.jpg)
Imhotep
Distributed System that does efficient FTGS and Regroup operations
Powers our internal analytical tools
… and more
![Page 303: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/303.jpg)
Imhotep - Next @IndeedEng Talk
Sharding and shard management
Session / FTGS network protocol
Memory management
Inverted Indexes
FTGS Merge
Regroup operations
Fault Tolerance
![Page 304: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/304.jpg)
Conclusion
Now scales to larger and larger data sets by adding more machines
Increased freshness and frequency of builds
Decision trees have lots of tunable components; we regularly get 1% wins via A/B tests
![Page 305: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/305.jpg)
Continuous Improvement
Sponsored Job Click-through Rate (CTR)
![Page 306: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/306.jpg)
Thanks.
![Page 307: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/307.jpg)
Q & A
![Page 308: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/308.jpg)
More Questions?
Jason, David, James, Jeff
![Page 309: [@IndeedEng] Machine Learning at Indeed: Scaling Decision Trees](https://reader035.vdocument.in/reader035/viewer/2022081413/546e8e27b4af9fc8268b4700/html5/thumbnails/309.jpg)
Next @IndeedEng Talk
Imhotep: Large Scale Analytics and Machine Learning at Indeed
Jeff Plaisance, Engineering Manager
March 26, 2014
http://engineering.indeed.com/talks