Feature Store: the missing data layer in ML pipelines? [1]
Spotify ML Guild Fika
Kim Hammar
February 26, 2019
[1] Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store February 26, 2019 1 / 29
Figure: the model ϕ(x), mapping data to predictions, surrounded by the supporting infrastructure: distributed training, data validation, feature engineering, data collection, hardware management, hyperparameter tuning, model serving, pipeline management, A/B testing, and monitoring. [2]
[2] Image inspired by Sculley et al. (Google), Hidden Technical Debt in Machine Learning Systems.
Outline
1 Hopsworks: Quick background of the platform
2 What is a Feature Store
3 Why You Need a Feature Store, Things to Consider:
    - How to encourage feature reusage?
    - How to store large-scale datasets for deep learning?
    - How to serve features for inference?
4 How to Build a Feature Store (Hopsworks Feature Store Case Study)
5 Demo
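Figure: the Hopsworks platform. HopsML pipeline stages: Data Ingestion, Data Prep, Feature Store, Training, Serving, with orchestration across the stages. Services shown: REST API, Kafka, TF Serving. Resources: CPUs, GPUs. Underlying layers: HopsYARN (fork of YARN) and HopsFS (fork of HDFS).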
Figure: the model $\phi(x)$ together with its data: the feature matrix
$$\begin{bmatrix} x_{1,1} & \dots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \dots & x_{n,n} \end{bmatrix}$$
the vector $(y_1, \dots, y_n)^\top$, and $y$.
? ¯\_(ツ)_/¯
“Data is the hardest part of ML and the most important piece to get right.
Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.”
- Uber [3]
[3] Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018.
Disentangle ML Pipelines with a Feature Store
Figure: Raw/Structured Data → Feature Engineering → Feature Store → Training → Models.
A feature store is a central vault for storing documented, curated, and access-controlled features.
The feature store is the interface between data engineering and data model development.
Figure: Dataset 1, Dataset 2, ..., Dataset n feed several different models (among them a neural network and a decision tree) through separate Feature Engineering steps, with no shared feature store.
Figure: Dataset 1, Dataset 2, ..., Dataset n feed a shared Feature Store, which in turn feeds the models. The feature store provides backfilling, analysis, versioning, and documentation.
What is a Feature?
A feature is a measurable property of some data sample.
A feature could be:
- An aggregate value (min, max, mean, sum)
- A raw value (a pixel, a word from a piece of text)
- A value from a database table (the age of a customer)
- A derived representation, e.g. an embedding or a cluster
Features are the fuel for AI systems:
Figure: features $x_1, \dots, x_n$ → model $\theta$ → prediction $\hat{y}$ → loss $L(y, \hat{y})$ → gradient $\nabla_\theta L(y, \hat{y})$.
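The loop from features to gradient can be traced by hand on a tiny example. The sketch below assumes a linear model and a squared-error loss; both are illustrative choices rather than anything the slides prescribe, and the function names `predict`, `loss`, `gradient`, and `sgd_step` are hypothetical.

```python
def predict(theta, x):
    """Linear model: y_hat = sum_i theta_i * x_i."""
    return sum(t * xi for t, xi in zip(theta, x))

def loss(y, y_hat):
    """Squared error L(y, y_hat) = (y - y_hat)^2."""
    return (y - y_hat) ** 2

def gradient(theta, x, y):
    """dL/dtheta_i = -2 * (y - y_hat) * x_i for the linear model above."""
    y_hat = predict(theta, x)
    return [-2.0 * (y - y_hat) * xi for xi in x]

def sgd_step(theta, x, y, lr=0.1):
    """One gradient-descent update: theta <- theta - lr * gradient."""
    grad = gradient(theta, x, y)
    return [t - lr * g for t, g in zip(theta, grad)]

# Two features for one data sample, and its label (made-up values).
x = [1.0, 2.0]
y = 3.0
theta = [0.0, 0.0]

before = loss(y, predict(theta, x))  # loss with the initial parameters
theta = sgd_step(theta, x, y)        # follow the negative gradient once
after = loss(y, predict(theta, x))   # loss after the update
```

With these made-up numbers a single update already drives the loss from 9 to roughly zero; real models need many such steps over many samples.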
Figure: a text feature pipeline. Raw text → lower-case & remove noise → tokenization → lemmatization → words.txt → group by post → words_post.csv → word2vec, TF-IDF, LDA, ontology-matching, annotation with weak supervision → Model.
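The first stages of such a pipeline (lower-casing, noise removal, tokenization, and TF-IDF) can be sketched with the Python standard library alone. Lemmatization, word2vec, and LDA are left out because they need external libraries. The function names and the particular TF-IDF variant (relative term frequency times log(N / document frequency)) are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case, replace non-letter noise with spaces, split on whitespace."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return cleaned.split()

def tf_idf(posts):
    """Return one {word: tf-idf} dict per post.

    tf  = count of the word in the post / number of tokens in the post
    idf = log(number of posts / number of posts containing the word)
    """
    tokenized = [tokenize(p) for p in posts]
    n_posts = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    result = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty posts
        result.append({
            word: (count / total) * math.log(n_posts / doc_freq[word])
            for word, count in counts.items()
        })
    return result

posts = ["Great match, GREAT goals!", "Terrible match..."]
features = tf_idf(posts)
```

A word that appears in every post (here "match") gets weight zero, which is the point of the IDF term: it carries no information for telling posts apart.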
Figure: Raw text → Feature Store (normalization, TF-IDF, word2vec, LDA, weak annotation) → Model.
How to Encourage Feature Reusage?
Feature Marketplace
Figure: a feature marketplace where users publish, search, and download features.
Feature Store API Service
from hops import featurestore
features_df = featurestore.get_features([
    "average_attendance",
    "average_player_age"
])
Figure: Feature Store API Service. Components shown: feature relationships, feature groups, shared storage, and feature metadata. Use: include features in ML pipelines.
How to Store Datasets for Deep Learning?
- Should be framework agnostic
- Need to be able to store tensor datasets
- Should support sharding for distributed training
- Advanced features: row-predicate filtering, SQL interface, columnar selection
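To make three of these requirements concrete (sharding, row-predicate filtering, columnar selection), here is a framework-agnostic sketch over plain Python dictionaries. The helper `read_shard`, its parameters, and the round-robin sharding scheme are all hypothetical; a real dataset format would push these operations down into the file reader.

```python
def read_shard(rows, shard_index, num_shards, columns=None, predicate=None):
    """Yield this worker's share of the rows.

    rows        -- iterable of dicts (one dict per example)
    shard_index -- which shard this worker reads (0-based)
    num_shards  -- total number of workers
    columns     -- optional list of column names to keep (columnar selection)
    predicate   -- optional function row -> bool (row-predicate filtering)
    """
    for i, row in enumerate(rows):
        if i % num_shards != shard_index:
            continue  # row belongs to another worker
        if predicate is not None and not predicate(row):
            continue  # row is filtered out
        if columns is not None:
            row = {name: row[name] for name in columns}
        yield row

# Made-up dataset: six examples with two features and a label.
dataset = [
    {"id": 0, "age": 31, "label": 1},
    {"id": 1, "age": 17, "label": 0},
    {"id": 2, "age": 45, "label": 0},
    {"id": 3, "age": 22, "label": 1},
    {"id": 4, "age": 15, "label": 0},
    {"id": 5, "age": 60, "label": 1},
]

# Worker 0 of 2 reads only adults and only the columns it needs.
worker_0 = list(read_shard(dataset, 0, 2, columns=["id", "label"],
                           predicate=lambda r: r["age"] >= 18))
# Worker 1 of 2 reads its shard unfiltered.
worker_1 = list(read_shard(dataset, 1, 2))
```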
Figure: HDF5 and TFRecords as candidate storage formats.
- Petastorm [5] is a dataset format designed for deep learning
- Petastorm stores data as Parquet files with extra metadata to handle multi-dimensional tensors
- Petastorm contains readers for popular machine learning frameworks such as SparkML, TensorFlow, and PyTorch
[5] Robbie Gruener, Owen Cheng, and Yevgeni Litvin. Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning. https://eng.uber.com/petastorm/. 2018.
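The idea of keeping a multi-dimensional tensor in a flat column by attaching shape metadata can be illustrated without Petastorm itself. The sketch below is not Petastorm's API or its on-disk format; `encode_tensor` and `decode_tensor` are hypothetical helpers that flatten a 2-D list into bytes and store the shape next to it.

```python
import struct

def encode_tensor(matrix):
    """Flatten a 2-D list of floats into bytes plus shape metadata."""
    rows, cols = len(matrix), len(matrix[0])
    flat = [value for row in matrix for value in row]
    payload = struct.pack(f"<{rows * cols}d", *flat)  # little-endian float64
    return {"shape": (rows, cols), "dtype": "float64", "data": payload}

def decode_tensor(cell):
    """Rebuild the 2-D list from the bytes using the stored shape."""
    rows, cols = cell["shape"]
    flat = struct.unpack(f"<{rows * cols}d", cell["data"])
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

# A made-up 2x3 "image" stored as one cell of a flat column.
image = [[0.0, 0.5, 1.0], [0.25, 0.75, 0.125]]
cell = encode_tensor(image)
restored = decode_tensor(cell)
```

Without the stored shape, the 48 bytes could equally be a 3x2 or a 1x6 tensor; the extra metadata is what lets a reader hand the right shape to the training framework.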
How to Serve Features for Inference?
Delivering Features for Training and Serving is Different
- Serving can require real-time features
- Ideally we want consistency between real-time features and batch features used for training
- Complex engineering problem
Figure: an inference request x is combined with real-time features from the Feature Store and passed to the model, which returns a prediction.
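One common way to aim for this consistency is to define each feature transformation once and call it from both the batch path and the request path. The sketch below assumes that approach; it is not how Hopsworks implements serving, and apart from the feature name `average_player_age`, which is borrowed from the API slide, every name is hypothetical.

```python
def average_player_age(ages):
    """Single definition of the feature, shared by training and serving."""
    return sum(ages) / len(ages)

def build_training_rows(teams):
    """Batch path: compute the feature for every team in a historical dataset."""
    return {team: average_player_age(ages) for team, ages in teams.items()}

def handle_inference_request(request, model):
    """Serving path: compute the same feature from the request payload."""
    feature = average_player_age(request["ages"])
    return model(feature)

# Batch features for training (made-up data).
historical = {"team_a": [24, 26, 31], "team_b": [19, 22]}
training_rows = build_training_rows(historical)

# Stand-in model: predicts 1 when the team is older than 25 on average.
model = lambda avg_age: int(avg_age > 25)

# Real-time feature for a single inference request.
prediction = handle_inference_request({"ages": [24, 26, 31]}, model)
```

Because both paths call the same function, a change to the feature definition cannot silently apply to training but not to serving.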
How to Implement a (batch) Feature Store?
The Components of a Feature Store
- The Storage Layer: For storing feature data in the feature store
- The Metadata Layer: For storing feature metadata (versioning, feature analysis, documentation, jobs)
- The Feature Engineering Jobs: For computing features
- The Feature Registry: A user interface to share and discover features
- The Feature Store API: For writing/reading to/from the feature store
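As a rough picture of what the metadata layer and the registry have to track, the sketch below keeps versioned feature metadata in memory and supports keyword search. The classes `FeatureMetadata` and `FeatureRegistry` and their fields are hypothetical and far simpler than a database-backed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureMetadata:
    name: str
    version: int
    description: str
    job: str  # name of the job that computes the feature

@dataclass
class FeatureRegistry:
    """In-memory stand-in for the metadata layer plus registry search."""
    entries: dict = field(default_factory=dict)  # name -> list of versions

    def register(self, name, description, job):
        """Store a new version of a feature; versions start at 1."""
        versions = self.entries.setdefault(name, [])
        meta = FeatureMetadata(name, len(versions) + 1, description, job)
        versions.append(meta)
        return meta

    def latest(self, name):
        """Return the most recent version of a feature."""
        return self.entries[name][-1]

    def search(self, keyword):
        """Return latest versions whose name or description mentions the keyword."""
        keyword = keyword.lower()
        return [
            versions[-1] for versions in self.entries.values()
            if keyword in versions[-1].name.lower()
            or keyword in versions[-1].description.lower()
        ]

registry = FeatureRegistry()
registry.register("average_attendance", "Mean attendance per team", "attendance_job")
registry.register("average_attendance", "Mean home attendance per team", "attendance_job")
registry.register("average_player_age", "Mean age of players per team", "players_job")
hits = registry.search("players")
```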
Feature Storage
Figure: Raw/structured data from the data lake → feature computation → Feature Groups 1-4 in project_featurestore.db, connected to the Hive Metastore through foreign keys.
Feature Metadata
Figure: Raw/structured data from the data lake → feature computation → Feature Groups 1-4 in project_featurestore.db, tracked by the Hive Metastore and a Featurestore Metadata store that are connected through foreign keys.
Feature Registry and API
Figure: the storage and metadata layers (Feature Groups 1-4 in project_featurestore.db, Hive Metastore, Featurestore Metadata, foreign keys) exposed through the Hopsworks feature registry (UI), a REST API, and program APIs.
Figure (demo setting): Raw/structured data in the data lake → feature computation → Feature Store → curated features → Model.
Summary
- Machine learning comes with a high technical cost
- Machine learning pipelines need proper data management
- A feature store is a place to store curated and documented features
- The feature store serves as an interface between feature engineering and model development; it can help disentangle complex ML pipelines
- Hopsworks [6] provides the world’s first open-source feature store
@hopshadoop
www.hops.io
@logicalclocks
www.logicalclocks.com
We are open source: https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
[6] Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
[7] Thanks to the Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, and Alex Ormenisan.
References
- Hopsworks’ feature store [8] (the only open-source one!)
- Uber’s feature store [9]
- Airbnb’s feature store [10]
- Comcast’s feature store [11]
- GO-JEK’s feature store [12]
- HopsML [13]
- Hopsworks [14]
[8] Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.
[9] Li Erran Li et al. “Scaling Machine Learning as a Service”. In: Proceedings of The 3rd International Conference on Predictive Applications and APIs. Ed. by Claire Hardgrove et al. Vol. 67. Proceedings of Machine Learning Research. Microsoft NERD, Boston, USA: PMLR, 2017, pp. 14–29. URL: http://proceedings.mlr.press/v67/li17a.html.
[10] Nikhil Simha and Varant Zanoyan. Zipline: Airbnb’s Machine Learning Data Management Platform. https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform. 2018.
[11] Nabeel Sarwar. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions. https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to-predictions. 2018.
[12] Willem Pienaar. Building a Feature Platform to Scale Machine Learning | DataEngConf BCN ’18. https://www.youtube.com/watch?v=0iCXY6VnpCc. 2018.
[13] Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
[14] Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
Backup Slides
Modeling Data in the Feature Store
- A feature group is a logical grouping of features
    - Typically from the same input dataset and computed with the same job
- A training dataset is a set of features suitable for a prediction task
    - Features in a training dataset are often from several feature groups
    - E.g. features on customers, features on user activities, etc.
Figure: features f1-f5 belong to feature group g1, f6-f10 to g2, and f11-f12 to g3; training datasets d1 and d2 are assembled from features across these groups.
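The relationship between feature groups and training datasets can be sketched as a join on a shared key. In the sketch below the function `create_training_dataset`, the key `team_id`, and the nested-dictionary layout are all hypothetical; only the two feature names come from the API slide.

```python
def create_training_dataset(feature_groups, feature_names, key="team_id"):
    """Join the requested features from several feature groups on a shared key.

    feature_groups -- {group_name: {key_value: {feature_name: value}}}
    feature_names  -- features wanted in the training dataset
    Only key values present in every contributing group are kept (inner join).
    """
    # Find which group provides each requested feature.
    providers = {}
    for group_name, rows in feature_groups.items():
        sample = next(iter(rows.values()), {})
        for name in feature_names:
            if name in sample:
                providers[name] = group_name
    missing = [n for n in feature_names if n not in providers]
    if missing:
        raise KeyError(f"features not found: {missing}")

    # Keep only the key values that every contributing group knows about.
    used_groups = set(providers.values())
    common_keys = set.intersection(
        *(set(feature_groups[g]) for g in used_groups)
    )
    return [
        {key: k, **{n: feature_groups[providers[n]][k][n] for n in feature_names}}
        for k in sorted(common_keys)
    ]

# Two feature groups keyed by a made-up team id.
g1 = {1: {"average_attendance": 30000.0}, 2: {"average_attendance": 12000.0}}
g2 = {1: {"average_player_age": 27.0}, 2: {"average_player_age": 24.5},
      3: {"average_player_age": 26.0}}

d1 = create_training_dataset({"g1": g1, "g2": g2},
                             ["average_attendance", "average_player_age"])
```

Team 3 is dropped from `d1` because it has no row in `g1`; whether to drop or impute such rows is a design decision the sketch does not try to settle.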
Hopsworks Feature Store API Service
Figure: Client interface (SQL, hops-util) → Feature Store service (a query planner over the feature store data in Hive on HopsFS and the feature store metadata) → Output (a dataframe with features).
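A query planner of this kind has to work out which feature groups hold the requested features and how to join them. The sketch below shows that idea in its simplest form and is not the Hopsworks planner; the function `plan_query`, the table names, and the join key `team_id` are hypothetical.

```python
def plan_query(feature_names, metadata, join_key="team_id"):
    """Build a SQL string that selects the features, joining groups as needed.

    metadata -- {feature_group_table: [feature names stored in that table]}
    """
    tables = []  # feature groups needed, in order of first use
    for name in feature_names:
        owners = [t for t, feats in metadata.items() if name in feats]
        if not owners:
            raise KeyError(f"unknown feature: {name}")
        if owners[0] not in tables:
            tables.append(owners[0])

    select = ", ".join(feature_names)
    sql = f"SELECT {select} FROM {tables[0]}"
    for table in tables[1:]:
        sql += f" JOIN {table} ON {tables[0]}.{join_key} = {table}.{join_key}"
    return sql

# Metadata maps each feature-group table to the features it stores.
metadata = {
    "attendance_features": ["average_attendance"],
    "player_features": ["average_player_age"],
}
sql = plan_query(["average_attendance", "average_player_age"], metadata)
```

The user only names features; the metadata lookup is what lets the service hide the table layout and the join from the client.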
Training Pipeline in HopsML
1 Create a job/notebook to compute features and publish them to the feature store
2 Create a job/notebook to read features/labels and save them to a training dataset
3 Read the training dataset into your model for training
Figure labels: HopsFS data lake (raw data), feature computation, Hive feature store (feature store features), hops-util / hops-util-py, basis features, training dataset, model.
Hopsworks Feature Store API
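Reading from the Feature Store:
from hops import featurestore
features_df = featurestore.get_features([
    "average_attendance",
    "average_player_age"
])
Writing to the Feature Store:
from hops import featurestore
from pyspark.sql import functions as F
# `spark` (a SparkSession) and `filename` are assumed to be defined by the notebook
raw_data = spark.read.parquet(filename)
# Square every column (assumes all columns are numeric); `^` is XOR in Python, so use `**`
pol_features = raw_data.select([(F.col(c) ** 2).alias(c) for c in raw_data.columns])
featurestore.insert_into_featuregroup(pol_features, "pol_featuregroup")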