![Page 1: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/1.jpg)
Before we begin, visit https://github.com/radanalyticsio/workshop to download images and code!
![Page 2: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/2.jpg)
Insightful Apps with Apache Spark and OpenShift
William Benton (@willb) Michael McCune (@FOSSJunkie)
![Page 3: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/3.jpg)
Forecast
Introducing insightful apps
Learning from data
Meet Apache Spark
Hands-on: data engineering and machine learning in Spark and building an insightful application in OpenShift
![Page 4: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/4.jpg)
Preliminaries
Make sure you have OpenShift Origin installed (if you want to build an app) or at least Docker (if you just want to try out Apache Spark)
Pull all of the necessary images for the hands-on portion
Details here: https://github.com/radanalyticsio/workshop
![Page 5: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/5.jpg)
Introducing insightful apps
![Page 6: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/6.jpg)
Insightful applications
Insightful applications collect and learn from the data that users generate and provide, so that they work better as they gain longevity and popularity.
Almost every exciting or important contemporary app is insightful!
![Page 7: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/7.jpg)
![Page 8: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/8.jpg)
![Page 9: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/9.jpg)
![Page 10: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/10.jpg)
![Page 11: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/11.jpg)
![Page 12: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/12.jpg)
![Page 13: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/13.jpg)
Learning from data
![Page 14: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/14.jpg)
BASIC CONCEPTS
![Page 15: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/15.jpg)
    def classify(bike):
        if bar_type(bike) == "flat":
            if tire_width(bike) > 80:
                return "winter bike"
            if tire_width(bike) > 50 or has_suspension(bike):
                return "mountain bike"
            if frame_type(bike) == "step-through":
                return "city bike"
        elif bar_type(bike) == "drop":
            if tire_width(bike) <= 27:
                return "road bike"
            if tire_type(bike) == "knobby":
                return "cyclocross bike"
            return "touring bike"
        return "unknown bike"
![Page 16: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/16.jpg)
road bike
cyclocross bike
touring bike
mountain bike
![Page 17: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/17.jpg)
![Page 18: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/18.jpg)
road bike
cyclocross bike
touring bike
mountain bike
triathlon bike
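To make the limits of hand-written rules concrete, here is a minimal runnable sketch of the `classify` function from these slides. The slides call `bar_type`, `tire_width`, and the other helpers without defining them, so the dictionary-based bike records and accessor functions below are assumptions for illustration only, as is the `"aero"` handlebar value used for the triathlon bike.

```python
# Hypothetical accessors: the slides use these helpers but never define them.
# A bike is assumed to be a plain dict of attributes.
def bar_type(bike):
    return bike.get("bar")

def tire_width(bike):
    return bike.get("tire_width_mm", 0)

def tire_type(bike):
    return bike.get("tire")

def frame_type(bike):
    return bike.get("frame")

def has_suspension(bike):
    return bike.get("suspension", False)

def classify(bike):
    # Rules copied from the slide; only the layout is restored.
    if bar_type(bike) == "flat":
        if tire_width(bike) > 80:
            return "winter bike"
        if tire_width(bike) > 50 or has_suspension(bike):
            return "mountain bike"
        if frame_type(bike) == "step-through":
            return "city bike"
    elif bar_type(bike) == "drop":
        if tire_width(bike) <= 27:
            return "road bike"
        if tire_type(bike) == "knobby":
            return "cyclocross bike"
        return "touring bike"
    return "unknown bike"

road = {"bar": "drop", "tire_width_mm": 25, "tire": "slick"}
# "aero" is an assumed value; the rules have no branch for it.
triathlon = {"bar": "aero", "tire_width_mm": 23, "tire": "slick"}

print(classify(road))       # road bike
print(classify(triathlon))  # unknown bike
```

Under these assumptions, a bike type the rules did not anticipate falls through to "unknown bike", and supporting it means editing the rules by hand.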
![Page 19: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/19.jpg)
Feature engineering
![Page 29: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/29.jpg)
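Feature engineering

| LABEL | HANDLEBAR TYPE: DROP | HANDLEBAR TYPE: FLAT | TIRE SIZE | TIRE KNOBS | SUSPENSION? FRONT | SUSPENSION? REAR |
| --- | --- | --- | --- | --- | --- | --- |
| mountain bike | 0 | 1 | 60 | 1 | 1 | 1 |
| cyclocross bike | 1 | 0 | 33 | 1 | 0 | 0 |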
![Page 30: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/30.jpg)
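one-hot encoding

| LABEL | HANDLEBAR TYPE: DROP | HANDLEBAR TYPE: FLAT | TIRE SIZE | TIRE KNOBS | SUSPENSION? FRONT | SUSPENSION? REAR |
| --- | --- | --- | --- | --- | --- | --- |
| mountain bike | 0 | 1 | 60 | 1 | 1 | 1 |
| cyclocross bike | 1 | 0 | 33 | 1 | 0 | 0 |

(convert from a categorical feature with n values to an n-bit vector)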
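As a small illustration of the conversion described on this slide, the sketch below turns a handlebar type into a two-bit vector. The function name `one_hot` and the lowercase category strings are assumptions for this example; the column order (drop, then flat) follows the slide.

```python
def one_hot(value, categories):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

# Column order taken from the slide: DROP, then FLAT.
HANDLEBAR_TYPES = ["drop", "flat"]

print(one_hot("flat", HANDLEBAR_TYPES))  # [0, 1]  (mountain bike)
print(one_hot("drop", HANDLEBAR_TYPES))  # [1, 0]  (cyclocross bike)
```

A feature with n possible values always yields a vector of length n with exactly one bit set.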
![Page 31: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/31.jpg)
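value scaling

| LABEL | HANDLEBAR TYPE: DROP | HANDLEBAR TYPE: FLAT | TIRE SIZE | TIRE KNOBS | SUSPENSION? FRONT | SUSPENSION? REAR |
| --- | --- | --- | --- | --- | --- | --- |
| mountain bike | 0 | 1 | 0.35 | 1 | 1 | 1 |
| cyclocross bike | 1 | 0 | 0.13 | 1 | 0 | 0 |

(assuming that all tires are between 19mm and 130mm wide)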
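One common way to scale a value into the range 0 to 1 is min-max scaling, sketched below with the 19mm to 130mm range stated on the slide. The slide does not say which formula it used, so this is an assumption: it reproduces the 0.13 shown for the 33mm tire, but gives about 0.37 for the 60mm tire where the slide shows 0.35, which suggests the slide may have used a slightly different range or rounding.

```python
def min_max_scale(x, lo, hi):
    """Map x from the range [lo, hi] onto [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (x - lo) / (hi - lo)

# Tire width range stated on the slide, in millimeters.
MIN_MM, MAX_MM = 19, 130

print(round(min_max_scale(33, MIN_MM, MAX_MM), 2))  # 0.13
print(round(min_max_scale(60, MIN_MM, MAX_MM), 2))  # 0.37
```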
![Page 32: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/32.jpg)
Approximation techniques
![Page 33: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/33.jpg)
Approximation techniques
![Page 34: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/34.jpg)
Approximation techniques
![Page 35: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/35.jpg)
Approximation techniques
![Page 36: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/36.jpg)
Feature hashing
0 0 0 0 0 … 0 0 0 0 0
A a aa aal aalii zythem Zythia zythum Zyzomys Zyzzogeton
![Page 37: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/37.jpg)
Feature hashing

    def hash_bucket(s):
        """
        Assumes the existence of an external hash function.
        Returns a tuple of
          * a bucket (from 0-127, inclusive) and
          * a sign value (either +1 or -1).
        """
        raw_hash = my_hash(s) & 0xFF
        # The high bit of the byte picks the sign; the low seven bits pick the bucket.
        sign = -1 if raw_hash & 0x80 else 1
        bucket = raw_hash & ~0x80
        return (bucket, sign)
![Page 38: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/38.jpg)
"the" → (37, 1) "quick" → (121, -1) "brown" → (50, -1)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
![Page 39: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/39.jpg)
"the" → (37, 1) "quick" → (121, -1) "brown" → (50, -1) "fox" → (71, 1)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0
![Page 40: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/40.jpg)
"the" → (37, 1) "quick" → (121, -1) "brown" → (50, -1) "fox" → (71, 1) "jumps" → (39, 1)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0
![Page 41: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/41.jpg)
"the" → (37, 1) "quick" → (121, -1) "brown" → (50, -1) "fox" → (71, 1) "jumps" → (39, 1) "over" → (100, -1) "the" → (37, 1)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0
![Page 42: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/42.jpg)
"the" → (37, 1) "quick" → (121, -1) "brown" → (50, -1) "fox" → (71, 1) "jumps" → (39, 1) "over" → (100, -1) "the" → (37, 1) "lazy" → (120, -1) "dog" → (54, 1)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0
0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0
![Page 43: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/43.jpg)
"the" → (37, 1) "quick" → (121, -1) "brown" → (50, -1) "fox" → (71, 1) "jumps" → (39, 1) "over" → (100, -1) "the" → (37, 1) "lazy" → (120, -1) "dog" → (54, 1)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0
0 0 -1 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 -1 -1 0 0 0 0 0 0
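The grids above show a 128-entry vector (8 rows of 16) being filled in one word at a time. The sketch below reproduces that accumulation end to end. The slides assume an external `my_hash` without naming it, so the CRC32-based stand-in here is an assumption, and its buckets will not match the numbers on the slides; `hashed_vector` is likewise a hypothetical helper name.

```python
import zlib

NUM_BUCKETS = 128

def my_hash(s):
    """Stand-in hash function (an assumption): CRC32 of the UTF-8 bytes."""
    return zlib.crc32(s.encode("utf-8"))

def hash_bucket(s):
    """Return (bucket in 0..127, sign in {+1, -1}) for the string s."""
    raw_hash = my_hash(s) & 0xFF
    sign = -1 if raw_hash & 0x80 else 1
    bucket = raw_hash & 0x7F
    return (bucket, sign)

def hashed_vector(words):
    """Add each word's sign into its bucket of a fixed-size vector."""
    vec = [0] * NUM_BUCKETS
    for word in words:
        bucket, sign = hash_bucket(word)
        vec[bucket] += sign
    return vec

sentence = "the quick brown fox jumps over the lazy dog".split()
vec = hashed_vector(sentence)

# Print the vector as 8 rows of 16, like the grids on the slides.
for row in range(0, NUM_BUCKETS, 16):
    print(" ".join(str(v) for v in vec[row:row + 16]))
```

The vector stays at 128 entries no matter how large the vocabulary grows; the cost is that distinct words can collide in a bucket, and the sign bit lets some of those collisions cancel rather than pile up.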
![Page 44: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/44.jpg)
CLASSIFICATION
![Page 45: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/45.jpg)
![Page 46: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/46.jpg)
![Page 47: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/47.jpg)
![Page 48: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/48.jpg)
![Page 49: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/49.jpg)
![Page 50: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/50.jpg)
![Page 51: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/51.jpg)
![Page 52: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/52.jpg)
CLUSTERING
![Page 53: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/53.jpg)
![Page 54: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/54.jpg)
![Page 55: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/55.jpg)
RECOMMENDATION
![Page 56: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/56.jpg)
![Page 57: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/57.jpg)
?
![Page 58: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/58.jpg)
OUTLIER DETECTION
![Page 59: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/59.jpg)
![Page 60: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/60.jpg)
![Page 61: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/61.jpg)
map tiles by Stamen Design (CC-BY 3.0) • map data © OpenStreetMap
![Page 62: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/62.jpg)
![Page 63: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/63.jpg)
![Page 64: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/64.jpg)
![Page 65: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/65.jpg)
![Page 66: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/66.jpg)
UNDERSTANDING DATA WITH MANY DIMENSIONS
![Page 67: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/67.jpg)
![Page 68: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/68.jpg)
[4,7]
![Page 69: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/69.jpg)
[4,7]
![Page 70: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/70.jpg)
[4,7] [2,3,5]
![Page 71: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/71.jpg)
[4,7] [2,3,5]
![Page 72: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/72.jpg)
[4,7] [2,3,5] [7,1,6,5,12,8,9,2,2,4,7,11,6,1,5]
![Page 73: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/73.jpg)
[4,7] [2,3,5] [7,1,6,5,12,8,9,2,2,4,7,11,6,1,5]
![Page 74: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/74.jpg)
Similarity and distance
![Page 75: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/75.jpg)
Similarity and distance
![Page 76: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/76.jpg)
Similarity and distance
(q - p) • (q - p)
![Page 77: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/77.jpg)
Similarity and distance
$\sum_{i=1}^{n} |p_i - q_i|$
![Page 78: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/78.jpg)
Similarity and distance
$\sum_{i=1}^{n} |p_i - q_i|$
![Page 79: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/79.jpg)
Similarity and distance
$\frac{p \cdot q}{\|p\| \, \|q\|}$
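The three measures on these slides (Euclidean distance, Manhattan distance, and cosine similarity) can be sketched in plain Python. This is only an illustration: the function names and the example vectors are assumptions and do not come from the workshop code.

```python
import math

def euclidean(p, q):
    """Euclidean distance: square root of (q - p) . (q - p)."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    """Manhattan distance: sum of |p_i - q_i| over all dimensions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def cosine_similarity(p, q):
    """Cosine similarity: (p . q) divided by the product of the norms."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q)

# Made-up example vectors (an assumption, not data from the slides).
p, q = [0.0, 3.0], [4.0, 0.0]
print(euclidean(p, q))          # 5.0
print(manhattan(p, q))          # 7.0
print(cosine_similarity(p, q))  # 0.0, because the vectors are orthogonal
```

The two distances grow as points move apart, while cosine similarity only looks at direction, so it ignores vector length.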
![Page 80: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/80.jpg)
Similarity and distance
![Page 81: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/81.jpg)
Similarity and distance
![Page 82: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/82.jpg)
Similarity and distance
![Page 83: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/83.jpg)
Similarity and distance
$\frac{10 - 3}{10} = .7$
![Page 84: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/84.jpg)
Eliminating inessential features
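| LABEL | HANDLEBAR TYPE: DROP | HANDLEBAR TYPE: FLAT | TIRE SIZE | TIRE KNOBS | SUSPENSION? FRONT | SUSPENSION? REAR |
| --- | --- | --- | --- | --- | --- | --- |
| mountain bike | 0 | 1 | 0.35 | 1 | 1 | 1 |
| cyclocross bike | 1 | 0 | 0.13 | 1 | 0 | 0 |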
![Page 85: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/85.jpg)
Eliminating inessential features
![Page 86: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/86.jpg)
Very simple: random projection
0 0 0 1 1 0 1 0 1 0
0 0 1 0 0 0 1 1 0 0
1 0 1 1 0 1 0 0 0 0
0 0 0 0 0 0 1 1 0 1
0 1 0 0 1 0 0 1 0 0
1 0 0 0 0 1 0 1 1 0
0 0 1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0 1 1
0 0 0 0 1 0 0 1 0 1
1 1 0 0 0 0 0 0 0 1
![Page 87: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/87.jpg)
Very simple: random projection
0 0 0 1 1 0 1 0 1 0
0 0 1 0 0 0 1 1 0 0
1 0 1 1 0 1 0 0 0 0
0 0 0 0 0 0 1 1 0 1
0 1 0 0 1 0 0 1 0 0
1 0 0 0 0 1 0 1 1 0
0 0 1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0 1 1
0 0 0 0 1 0 0 1 0 1
1 1 0 0 0 0 0 0 0 1
*
0.13 0.13
0.06 0.07
0.07 0.06
0.02 0.08
0.17 0.11
0.11 0.09
0.04 0.18
0.13 0.04
0.13 0.21
0.14 0.03
![Page 88: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/88.jpg)
Very simple: random projection
0 0 0 1 1 0 1 0 1 0
0 0 1 0 0 0 1 1 0 0
1 0 1 1 0 1 0 0 0 0
0 0 0 0 0 0 1 1 0 1
0 1 0 0 1 0 0 1 0 0
1 0 0 0 0 1 0 1 1 0
0 0 1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0 1 1
0 0 0 0 1 0 0 1 0 1
1 1 0 0 0 0 0 0 0 1
*
0.13 0.13
0.06 0.07
0.07 0.06
0.02 0.08
0.17 0.11
0.11 0.09
0.04 0.18
0.13 0.04
0.13 0.21
0.14 0.03
=
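The multiplication on this slide can be sketched in plain Python using the two matrices shown. It assumes that the 10 x 2 matrix is the random projection matrix and that each row of the 10 x 10 matrix is one record; the helper names `project` and `random_matrix` are illustrative, and normalizing each random column to sum to 1 is an assumption based on the slide's columns, which each sum to 1.

```python
import random

# The 10 x 10 binary matrix from the slide (assumed: one record per row).
data = [
    [0, 0, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
]

# The 10 x 2 matrix from the slide (assumed to be the random projection).
projection = [
    [0.13, 0.13], [0.06, 0.07], [0.07, 0.06], [0.02, 0.08], [0.17, 0.11],
    [0.11, 0.09], [0.04, 0.18], [0.13, 0.04], [0.13, 0.21], [0.14, 0.03],
]

def project(rows, matrix):
    """Multiply every row vector by the matrix (rows x matrix)."""
    n_out = len(matrix[0])
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(n_out)]
        for row in rows
    ]

def random_matrix(n_in, n_out, seed=0):
    """Hypothetical helper: random weights with each column summing to 1."""
    rng = random.Random(seed)
    cols = [[rng.random() for _ in range(n_in)] for _ in range(n_out)]
    cols = [[v / sum(col) for v in col] for col in cols]
    return [[cols[j][i] for j in range(n_out)] for i in range(n_in)]

reduced = project(data, projection)   # 10 records, now with 2 features each
print([round(v, 2) for v in reduced[0]])  # [0.36, 0.58]
```

Each record shrinks from ten features to two, and no training step is needed: the projection matrix is simply drawn at random.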
![Page 89: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/89.jpg)
A linear approach: PCA
0 0 0 1 1 0 1 0 1 0
0 0 1 0 0 0 1 1 0 0
1 0 1 1 0 1 0 0 0 0
0 0 0 0 0 0 1 1 0 1
0 1 0 0 1 0 0 1 0 0
1 0 0 0 0 1 0 1 1 0
0 0 1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0 1 1
0 0 0 0 1 0 0 1 0 1
1 1 0 0 0 0 0 0 0 1
![Page 90: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/90.jpg)
A linear approach: PCA
![Page 91: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/91.jpg)
![Page 92: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/92.jpg)
A nonlinear approach: t-SNE
![Page 93: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/93.jpg)
A nonlinear approach: t-SNE
p( | )
![Page 94: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/94.jpg)
A nonlinear approach: t-SNE
p( | ) ≈ p( | )
![Page 95: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/95.jpg)
![Page 96: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/96.jpg)
Tree-based approaches
![Page 97: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/97.jpg)
Tree-based approaches
[Figure: a decision tree; branches are labeled `if orange` / `if !orange`, `if red` / `if !red`, and `if !gray`, and leaves are labeled yes / no]
![Page 98: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/98.jpg)
Tree-based approaches
[Figure: the decision tree from the previous slide (branches `if orange` / `if !orange`, `if red` / `if !red`, `if !gray`; yes / no leaves) repeated alongside several more trees with yes / no leaves]
![Page 99: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/99.jpg)
Tree-based approaches
![Page 100: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/100.jpg)
Self-organizing maps
![Page 101: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/101.jpg)
Self-organizing maps
![Page 102: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/102.jpg)
Self-organizing maps
![Page 103: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/103.jpg)
Self-organizing maps
![Page 104: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/104.jpg)
Self-organizing maps
https://github.com/radanalyticsio/silex
![Page 105: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/105.jpg)
Meet Apache Spark
![Page 106: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/106.jpg)
A FUNDAMENTAL ABSTRACTION, NOT AN EXECUTION MODEL
![Page 107: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/107.jpg)
Resilient Distributed Datasets are partitioned, lazy, and immutable homogeneous collections.
![Page 108: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/108.jpg)
![Page 109: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/109.jpg)
1 2 3
![Page 110: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/110.jpg)
1 2 3 → FILTER (λ x: x % 2 != 0)
![Page 111: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/111.jpg)
1 2 3 → FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3)
![Page 112: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/112.jpg)
1 2 3 → FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3) → FLATMAP (λ x: [x, x+1])
![Page 113: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/113.jpg)
3 → FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3) → FLATMAP (λ x: [x, x+1]) → COLLECT: 3 4 9 10
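The pipeline on these slides can be imitated with Python's lazy built-in iterators. This is a single-machine sketch of the idea, not Spark itself: the variable names are illustrative, and the commented PySpark form assumes an existing SparkContext named `sc`.

```python
from itertools import chain

source = [1, 2, 3]

# Each step only describes work; filter() and map() return lazy iterators,
# so nothing is computed yet.
odds = filter(lambda x: x % 2 != 0, source)                         # FILTER
tripled = map(lambda x: x * 3, odds)                                # MAP
expanded = chain.from_iterable(map(lambda x: [x, x + 1], tripled))  # FLATMAP

# Materializing the result plays the role of COLLECT.
result = list(expanded)
print(result)  # [3, 4, 9, 10]

# With PySpark and a SparkContext `sc`, the same pipeline would read:
# sc.parallelize([1, 2, 3]).filter(lambda x: x % 2 != 0) \
#     .map(lambda x: x * 3).flatMap(lambda x: [x, x + 1]).collect()
```

As in the slides, the elements only flow through the three functions once the final step asks for a result.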
![Page 114: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/114.jpg)
1 2 3
![Page 115: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/115.jpg)
1 2 3 → FILTER (λ x: x % 2 != 0)
![Page 116: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/116.jpg)
2 3 → FILTER (λ x: x % 2 != 0)
![Page 117: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/117.jpg)
2 3 → FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3)
![Page 118: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/118.jpg)
2 3 → FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3) → FLATMAP (λ x: [x, x+1])
![Page 119: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/119.jpg)
2 3 → FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3) → FLATMAP (λ x: [x, x+1]) → SAVE AS TEXT FILE: 3 4 9 10
![Page 120: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/120.jpg)
2 3 → FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3) → FLATMAP (λ x: [x, x+1]) → SAVE AS TEXT FILE: 3 4 9 10
CACHE
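Why caching matters for a lazy collection can be shown with a toy model. The `LazyDataset` class below is a hypothetical stand-in written for this illustration, not Spark's RDD implementation: it recomputes its contents on every action unless `cache()` was called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records a computation and runs it on demand."""

    def __init__(self, compute):
        self._compute = compute      # zero-argument function returning a list
        self._cached = None
        self._use_cache = False

    def map(self, f):
        # Build a new dataset lazily; f is not applied until collect().
        return LazyDataset(lambda: [f(x) for x in self.collect()])

    def cache(self):
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()

calls = []

def expensive(x):
    calls.append(x)                  # track how often the function really runs
    return x * 3

uncached = LazyDataset(lambda: [1, 2, 3]).map(expensive)
uncached.collect()
uncached.collect()
uncached_calls = len(calls)
print(uncached_calls)  # 6: the map ran once per action

calls.clear()
cached = LazyDataset(lambda: [1, 2, 3]).map(expensive).cache()
cached.collect()
cached.collect()
cached_calls = len(calls)
print(cached_calls)    # 3: the second action reused the cached result
```

The trade-off is memory for time: the cached result has to be stored somewhere between actions.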
![Page 121: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/121.jpg)
executor 1: 1 2 3
executor n: 10 11 12
cluster manager
driver
![Page 122: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/122.jpg)
executor 1: 1 2 3 → λ x: x * 2 → 2 4 6
executor n: 10 11 12 → λ x: x * 2 → 20 22 24
cluster manager
driver
![Page 123: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/123.jpg)
executor 1: 1 2 3 → λ x: x * 2 → 2 4 6 (CACHE)
executor n: 10 11 12 → λ x: x * 2 → 20 22 24 (CACHE)
cluster manager
driver
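The division of labor in this diagram can be sketched on one machine. In this illustration a thread pool stands in for the executors and the main thread for the driver; that mapping, and the name `run_on_partition`, are assumptions made for the example rather than a description of Spark's scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

# Two partitions of one dataset, as on the slide (executor 1 ... executor n).
partitions = [[1, 2, 3], [10, 11, 12]]

def run_on_partition(partition):
    """What one executor does: apply the function to its own partition."""
    return [x * 2 for x in partition]

# The pool plays the role of the executors; this main thread is the driver.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    results = list(pool.map(run_on_partition, partitions))

print(results)  # [[2, 4, 6], [20, 22, 24]]
```

Each worker only touches its own partition, which is what lets the same function run on every executor without coordination.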
![Page 124: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/124.jpg)
Example: word count
    file = sc.textFile("file://...")
    # parentheses let the method chain span several lines in Python
    counts = (file.flatMap(lambda l: l.split(" "))
                  .map(lambda w: (w, 1))
                  .reduceByKey(lambda x, y: x + y))
    # computation actually occurs here
    counts.saveAsTextFile("file://...")
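The three transformations in the word count can be traced in plain Python without a cluster. The input lines below are made up for the illustration, and each step is only a single-machine analogue of the corresponding Spark operation.

```python
from collections import defaultdict

lines = ["to be or not to be", "to see or not to see"]  # made-up input

# flatMap: split every line into words and flatten the result
words = [w for line in lines for w in line.split(" ")]

# map: pair each word with a count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: combine the counts of equal keys with x + y
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(counts["to"], counts["be"], counts["see"])  # 4 2 2
```

In Spark the same three steps stay lazy until an action such as `saveAsTextFile` runs, whereas the list comprehensions here execute immediately.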
![Page 125: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/125.jpg)
Example: word count
![Page 126: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/126.jpg)
Example: word count
![Page 127: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/127.jpg)
Example: word count
![Page 128: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/128.jpg)
Example: word count
![Page 129: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/129.jpg)
Example: word count
![Page 130: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/130.jpg)
BEYOND THE RDD
![Page 131: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/131.jpg)
Spark core
![Page 132: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/132.jpg)
Spark core
Graph SQL ML Streaming
![Page 133: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/133.jpg)
Spark core
Graph SQL ML Streaming
ad hoc Mesos YARN
![Page 134: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/134.jpg)
Spark core
Graph SQL ML Streaming
ad hoc Mesos YARN k8s
![Page 135: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/135.jpg)
Machine learning with Spark
Support code for feature engineering and learning pipelines.
Many parallel implementations of classic algorithms for machine learning tasks: dimensionality reduction, classification, regression, clustering, recommendation engines, etc.
Parallel optimization primitives (gradient descent, etc.) and linear algebra to implement your own algorithms.
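To make "parallel optimization primitives" concrete, here is a small sketch of gradient descent written as a map step (one partial gradient per partition) followed by a reduce step (summing them). It runs on one machine in plain Python; the data, the learning rate, and the name `partial_gradient` are assumptions for the example and are not Spark's API.

```python
# Made-up data following y = 2x, split into two partitions.
partitions = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    [(4.0, 8.0), (5.0, 10.0), (6.0, 12.0)],
]
n = sum(len(p) for p in partitions)

def partial_gradient(partition, w):
    """Gradient of the squared error on one partition (the 'map' step)."""
    return sum(2.0 * (w * x - y) * x for x, y in partition)

w = 0.0
learning_rate = 0.01
for _ in range(200):
    # 'reduce' step: add the per-partition gradients, then take one step
    gradient = sum(partial_gradient(p, w) for p in partitions) / n
    w -= learning_rate * gradient

print(round(w, 3))  # 2.0
```

Only the partial gradients have to travel between workers, never the data itself, which is why this pattern parallelizes well.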
![Page 136: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/136.jpg)
Streaming data
Goal: use the same abstraction for batch and “streaming” (micro-batch) data by dividing a stream into many small RDDs.
input stream
![Page 137: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/137.jpg)
Streaming data
Goal: use the same abstraction for batch and “streaming” (micro-batch) data by dividing a stream into many small RDDs.
input stream → Streaming engine
![Page 138: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/138.jpg)
Streaming data
Goal: use the same abstraction for batch and “streaming” (micro-batch) data by dividing a stream into many small RDDs.
input stream → Streaming engine → windowed data (RDDs)
![Page 139: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/139.jpg)
Streaming data
Goal: use the same abstraction for batch and “streaming” (micro-batch) data by dividing a stream into many small RDDs.
input stream → Streaming engine → windowed data (RDDs) → Spark
![Page 140: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/140.jpg)
Streaming data
Goal: use the same abstraction for batch and “streaming” (micro-batch) data by dividing a stream into many small RDDs.
input stream → Streaming engine → windowed data (RDDs) → Spark → processed data (RDDs)
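The micro-batch idea can be sketched in plain Python: cut the stream into small batches and run ordinary batch code on each one. One simplifying assumption here is that batches are cut by element count, whereas a streaming engine would normally cut them by time interval. The names `micro_batches` and `process` are illustrative.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Cut a (possibly endless) iterator into small lists, one per batch."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def process(batch):
    """Ordinary batch logic, reused unchanged for every micro-batch."""
    return [x * 2 for x in batch if x % 2 != 0]

input_stream = range(1, 8)   # stands in for an unbounded source
processed = [process(batch) for batch in micro_batches(input_stream, 3)]
print(processed)  # [[2, 6], [10], [14]]
```

The point of the design is that `process` never needs to know whether its input came from a file or from a live stream.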
![Page 141: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/141.jpg)
Structured queries
The capacity to run arbitrary code in RDDs is powerful but comes with an important tradeoff: Spark can’t rearrange RDD programs to improve their performance.
Writing Spark programs with a query DSL allows Spark to generate optimized execution plans.
![Page 142: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/142.jpg)
Query planning
    # each joined element kv is (n, (a, b)); Python 3 lambdas cannot unpack tuple parameters
    (hugeCollection
        .join(anotherHugeCollection)
        .filter(lambda kv: ultraRare(kv[1][0]) and ultraRare(kv[1][1])))
![Page 143: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/143.jpg)
Query planning
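    # filter first: each element kv is (n, value), so test the value before joining
    (hugeCollection.filter(lambda kv: ultraRare(kv[1]))
        .join(anotherHugeCollection.filter(lambda kv: ultraRare(kv[1]))))
    # join first: each joined element kv is (n, (a, b))
    (hugeCollection
        .join(anotherHugeCollection)
        .filter(lambda kv: ultraRare(kv[1][0]) and ultraRare(kv[1][1])))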
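The two plans can be compared on a small scale in plain Python. The naive `join` helper, the `ultra_rare` predicate, and the data are all hypothetical stand-ins chosen for this sketch; the point is only that both plans return the same rows while the second one joins far fewer of them.

```python
def join(left, right):
    """Naive inner join of two lists of (key, value) pairs."""
    return [(k1, (a, b)) for k1, a in left for k2, b in right if k1 == k2]

def ultra_rare(value):
    """Hypothetical predicate that is true for very few values."""
    return value % 100 == 0

huge = [(n, n) for n in range(500)]
another_huge = [(n, n * 2) for n in range(500)]

# Plan 1: join everything, then filter (500 * 500 key comparisons).
slow = [(n, (a, b)) for n, (a, b) in join(huge, another_huge)
        if ultra_rare(a) and ultra_rare(b)]

# Plan 2: filter each side first, then join the few surviving rows.
fast = join([kv for kv in huge if ultra_rare(kv[1])],
            [kv for kv in another_huge if ultra_rare(kv[1])])

print(slow == fast)  # True
print(len(fast))     # 5
```

Because the filter is an opaque Python function, an engine running plan 1 cannot discover on its own that plan 2 is equivalent, which is the tradeoff the slide describes.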
![Page 144: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/144.jpg)
Structured query in Spark
SQL interface (unchecked syntax or semantics)
    SELECT word, COUNT(*) FROM words GROUP BY word
Data frame interface (semantics checked at run-time)
    words.groupBy('word').count()
Dataset interface (mostly checked at compile-time)
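The difference between a query written as a string and one written as method calls can be tried without Spark. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a SQL engine and `collections.Counter` as a stand-in for a programmatic interface; both substitutions, and the sample rows, are assumptions made so the example runs anywhere.

```python
import sqlite3
from collections import Counter

words = ["spark", "openshift", "spark", "data", "spark", "data"]  # made-up rows

# SQL text: the query is only a string until the engine parses it at run time,
# so a typo in it is not caught any earlier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
sql_counts = dict(
    conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word").fetchall()
)
conn.close()

# The same aggregation written as ordinary Python calls.
api_counts = dict(Counter(words))

print(sql_counts == api_counts)  # True
```

Both forms describe what to compute rather than how, which is what leaves an engine free to choose an execution plan.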
![Page 145: Before we begin, visit radanalyticsio/workshop … · radanalyticsio/workshop to download images and code! Insightful Apps with Apache Spark and OpenShift William Benton (@willb)](https://reader033.vdocument.in/reader033/viewer/2022050121/5f515b83e5f918157102d2b0/html5/thumbnails/145.jpg)
Questions & hands-on exercises