towards scalable multimedia analytics · 2018. 12. 5. · towards engineering a web-scale...
![Page 1: Towards Scalable Multimedia Analytics · 2018. 12. 5. · Towards Engineering a Web-Scale Multimedia Service: A Case Study Using Spark Proceedings of the ACM Multimedia Systems Conference](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/1.jpg)
Towards Scalable Multimedia Analytics
Björn Þór Jónsson, datasys group, Computer Science Department, IT University of Copenhagen
![Page 2](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/2.jpg)
Today’s Media Collections
• Massive and growing
  – Europeana > 50 million items
  – DeviantArt > 250 million items (160K/day)
  – Facebook > 1,000 billion items (200M/day)
• Variety of users and applications
  – Novices → enthusiasts → scholars → experts
  – Current systems aimed at helping experts
• Need for understanding and insights
![Page 3](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/3.jpg)
Media Tasks
[Figure: Search and Exploration]
![Page 4](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/4.jpg)
Media Tasks
[Zahálka and Worring, 2014]
![Page 5](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/5.jpg)
Multimedia Analytics
[Figure: Multimedia Analytics at the intersection of Multimedia Analysis and Visual Analytics]
[Zahálka and Worring, 2014]
![Page 6](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/6.jpg)
From Data to Insight
[Zahálka and Worring, 2014; Keim et al., 2010]
![Page 7](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/7.jpg)
The Two Gaps
• Semantic Gap [Smeulders et al., 2000]: generic data and annotation based on objective understanding ↔ specific context and task-driven subjective understanding
• Pragmatic Gap [Zahálka and Worring, 2014]: predefined, fixed annotation based on understanding of the collection ↔ dynamically evolving and interaction-driven understanding of collections
![Page 8](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/8.jpg)
Multimedia Analytics: State of the Art
• Theory is developing
• Early systems have appeared
• No real-life applications (?)
• Small collections only
![Page 9](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/9.jpg)
Scalable Multimedia Analytics
[Figure: Scalable Multimedia Analytics at the intersection of Multimedia Analysis, Visual Analytics, and Data Management]
[Jónsson et al., 2016]
![Page 10](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/10.jpg)
The Three Gaps
• Semantic Gap [Smeulders et al., 2000]: generic data and annotation based on objective understanding ↔ specific context and task-driven subjective understanding
• Pragmatic Gap [Zahálka and Worring, 2014]: predefined, fixed annotation based on understanding of the collection ↔ dynamically evolving and interaction-driven understanding of collections
• Scale Gap [Jónsson et al., 2016]: pre-computed indices and bulk processing of large datasets ↔ serendipitous and highly interactive sessions on small data subsets
![Page 11](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/11.jpg)
Ten Research Questions for Scalable Multimedia Analytics
[Figure: Velocity, Volume, Variety, Visual Interaction]
[Jónsson et al., MMM 2016]
![Page 12](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/12.jpg)
Big Data Framework: Lambda Architecture
[Figure: Storage Layer, Batch Layer, Speed Layer, Service Layer]
[Marz and Warren, 2015]
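The slide only names the layers. As a minimal sketch of the Lambda idea, assuming a simple count-per-key view (the function names `batch_view`, `speed_view` and `query` are hypothetical): the batch layer periodically recomputes a view from the full, immutable master dataset in the storage layer, the speed layer incrementally covers records that arrived since the last batch run, and the service layer (the "serving layer" in Marz and Warren's terminology) merges both views at query time.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute the view from scratch over every stored record."""
    return Counter(master_dataset)

def speed_view(recent_records):
    """Speed layer: incrementally maintained view over not-yet-batched records."""
    view = Counter()
    for record in recent_records:
        view[record] += 1
    return view

def query(key, batch, speed):
    """Service layer: answer a query by merging the batch and real-time views."""
    return batch.get(key, 0) + speed.get(key, 0)

# Storage layer: append-only master dataset (here, hypothetical image tags).
master = ["cat", "dog", "cat"]
recent = ["cat", "bird"]  # arrived after the last batch run

batch = batch_view(master)
speed = speed_view(recent)
print(query("cat", batch, speed))   # 3
print(query("bird", batch, speed))  # 1
```

The point of the split is that the expensive batch recomputation can lag behind, while queries still see fresh data through the small speed-layer view.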
![Page 13](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/13.jpg)
![Page 14](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/14.jpg)
Outline
• Motivation: Scalable multimedia analytics
• Batch Layer: Spark and 43 billion high-dimensional features
• Service Layer: Blackthorn and 100 million images
• Conclusion: Importance and challenges of scale!
![Page 15](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/15.jpg)
Gylfi Þór Guðmundsson, Laurent Amsaleg, Björn Þór Jónsson, Michael J. Franklin
Towards Engineering a Web-Scale Multimedia Service: A Case Study Using Spark
Proceedings of the ACM Multimedia Systems Conference (MMSys)
Taipei, Taiwan, June 2017
![Page 16](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/16.jpg)
Spark Case Study: Motivation
• How can multimedia tasks harness the power of cloud computing?
  – Multimedia collections are growing
  – Computing power is abundant
• ADCFs = Hadoop or Spark
  – Automatically Distributed Computing Frameworks
  – Designed for high-throughput processing
![Page 17](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/17.jpg)
Design Choices: ADCF = Spark
• Hadoop is not suitable (more later)
• Resilient Distributed Datasets (RDDs)
  – Transform one RDD to another via operators
  – Lazy execution
  – Master and Workers paradigm
• Supports deep pipelines
• Supports memory sharing within a worker
• Lazy execution allows for optimizations
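To make "lazy execution" concrete, here is a toy, pure-Python stand-in for an RDD (the class `LazyDataset` is hypothetical and is not Spark's API): transformations only record a step in a lineage and return a new dataset, and nothing runs until an action such as `collect` is called. Because the whole chain is known before execution, consecutive steps can be pipelined element by element without materializing intermediate results, which is one example of the optimizations laziness enables.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, not executed."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)  # recorded lineage of transformations

    def map(self, fn):
        # Returns a new dataset; self is not modified (RDDs are immutable).
        return LazyDataset(self.source, self.steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self.source, self.steps + (("filter", pred),))

    def collect(self):
        # Action: only now does the pipeline run. Steps are applied per element,
        # so no intermediate collection is built between them.
        out = []
        for item in self.source:
            for kind, fn in self.steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    break  # filtered out
            else:
                out.append(item)
        return out

calls = []
def square(x):
    calls.append(x)
    return x * x

evens = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
print(calls)            # prints [] because nothing has run yet
print(evens.collect())  # [0, 4, 16]
```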
![Page 18](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/18.jpg)
Design Choices: Application Domain
• Content-Based Image Retrieval (CBIR)
  – Well-known application
  – Two phases: Off-line & “On-line”
[Figure: Query Image → CBIR System → Search results]
![Page 19](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/19.jpg)
Design Choices: DeCP Algorithm
Properties:
• Clustering-based
• Deep hierarchical index
• Approximate k-NN search
• Trades response time for throughput by batching
Why?
• Very simple
• Prototypical of many CBIR algorithms
• Previous Hadoop implementation facilitates comparison
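The off-line side of a clustering-based index with a deep hierarchy can be sketched in a few lines. This is an illustration under assumptions made for the example, not the DeCP code: leaf representatives are sampled at random from the collection, each upper level is a random sample of the level below, and a feature is routed by greedy top-down descent (always following the nearest representative), which is what makes the resulting search approximate. `build_index`, `identify` and `cluster_collection` are hypothetical names.

```python
import math
import random

def build_index(features, fanout, levels, seed=0):
    """Off-line step 1: build a tree of representatives, top level first.

    Assumption: leaves are a random sample of the collection and each upper
    level is a random sample of the level below. Each level is a list of
    (vector, children) pairs; children index into the next level down.
    """
    rng = random.Random(seed)
    leaves = rng.sample(features, fanout ** levels)
    tree = [[(vec, []) for vec in leaves]]
    for _ in range(levels - 1):
        below = tree[0]
        picked = rng.sample(range(len(below)), max(1, len(below) // fanout))
        above = [(below[p][0], []) for p in picked]
        owner = {p: j for j, p in enumerate(picked)}  # sampled nodes keep themselves
        for i, (vec, _) in enumerate(below):
            j = owner.get(i)
            if j is None:
                j = min(range(len(above)), key=lambda k: math.dist(vec, above[k][0]))
            above[j][1].append(i)
        tree.insert(0, above)
    return tree

def identify(tree, vec):
    """Greedy descent: follow the nearest representative at every level."""
    candidates = range(len(tree[0]))
    for depth, level in enumerate(tree):
        best = min(candidates, key=lambda i: math.dist(vec, level[i][0]))
        if depth == len(tree) - 1:
            return best  # index of a leaf, i.e. a cluster id
        candidates = level[best][1]

def cluster_collection(tree, features):
    """Off-line step 2: group every feature id under the leaf it descends to."""
    clusters = {}
    for fid, vec in enumerate(features):
        clusters.setdefault(identify(tree, vec), []).append(fid)
    return clusters

rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(200)]
tree = build_index(data, fanout=3, levels=2)
clusters = cluster_collection(tree, data)
print([len(level) for level in tree])              # [3, 9]
print(sum(len(ids) for ids in clusters.values()))  # 200
```

Only the small tree of representatives needs to sit in RAM; the clusters themselves can be written out and read back one at a time during search.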
![Page 20](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/20.jpg)
DeCP as a CBIR System
• Off-line
  – Build the index hierarchy
  – Cluster the data collection
• On-line
  – Approximate k-NN search
  – Vote aggregation
[Figure: Searching a single feature. The index is in RAM; the clusters of the clustered collection reside on disk. Steps: Identify, Retrieve, Scan, k-NN]
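The on-line steps for one query image can be sketched as follows, assuming for brevity a flat list of cluster representatives in place of the hierarchy (a simplification, not DeCP itself): each query descriptor identifies its nearest cluster, that cluster is retrieved and scanned, the k nearest descriptors are kept, and each neighbor casts one vote for the image it came from. The function name, the one-vote-per-neighbor rule, and the toy data are assumptions for illustration.

```python
import heapq
import math
from collections import Counter

def search_image(query_descriptors, representatives, clusters, k):
    """Approximate k-NN for each query descriptor, followed by vote aggregation.

    representatives: cluster representative vectors (the in-RAM index).
    clusters: cluster id -> list of (image_id, vector); on disk in DeCP,
              a plain dict here.
    """
    votes = Counter()
    for q in query_descriptors:
        # Identify: the nearest representative selects the cluster to read.
        cid = min(range(len(representatives)),
                  key=lambda c: math.dist(q, representatives[c]))
        # Retrieve + scan: only that one cluster is examined, hence "approximate".
        neighbours = heapq.nsmallest(k, clusters[cid],
                                     key=lambda item: math.dist(q, item[1]))
        # Vote aggregation: every neighbour votes for the image it came from.
        for image_id, _ in neighbours:
            votes[image_id] += 1
    return votes.most_common()

representatives = [(0.0, 0.0), (10.0, 10.0)]
clusters = {
    0: [("imgA", (0.1, 0.0)), ("imgA", (0.0, 0.2)), ("imgB", (1.0, 1.0))],
    1: [("imgC", (10.0, 10.1)), ("imgB", (9.0, 9.0))],
}
query = [(0.0, 0.1), (0.1, 0.1), (10.0, 10.0)]
results = search_image(query, representatives, clusters, k=2)
print(results[0])  # ('imgA', 4)
```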
![Page 21](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/21.jpg)
Design Choices: Feature Collection
• YLI feature corpus from YFCC100M
  – Various feature sets (visual, semantic, …)
  – 99.2M images and 0.8M videos
  – Largest publicly available dataset
• Use all 42.9 billion SIFT features!
  – Goal is to test at a very large scale
  – No feature aggregation or compression
  – Largest feature collection reported!
![Page 22](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/22.jpg)
Research Questions
• What is the complexity of the Spark pipeline for typical multimedia-related tasks?
• How well does background processing scale as collection size and resources grow?
• How does batch size impact the throughput of an online service?
![Page 23](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/23.jpg)
Requirements for the ADCF
R1 Scalability: Ability to scale out with additional computing power
R2 Computational flexibility: Ability to carefully balance system resources as needed
R3 Capacity: Ability to gracefully handle data that vastly exceeds main memory capacity
R4 Updates: Ability to gracefully update the data structures for dynamic workloads
R5 Flexible pipeline: Ability to easily implement variations of the indexing and/or retrieval process
R6 Simplicity: How efficiently the programmer’s time is spent
![Page 24](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/24.jpg)
DeCP on Hadoop
• Prior work evaluated DeCP on Hadoop using 30 billion SIFTs on 100+ machines
• Conclusion = limited success
  – Scalability limited due to RAM per core
  – Two-step MapReduce pipeline is too rigid
    • Ex: Single data source only
    • Ex: Could not search multiple clusters
  – R1, R2, R3 = partially; R4, R5, R6 = no
![Page 25](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/25.jpg)
DeCP on Spark
• A very different ADCF from Hadoop
• Several advantages
  – Arbitrarily deep pipelines
    • Easily implement all features and functionality
  – Broadcast variables
    • Solves the RAM per core limitation
  – Multiple data sources
    • Ex: Allows join operations for maintenance (R4)
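Multiple data sources turn index maintenance into a keyed merge. As a hedged sketch (pure Python standing in for a pair-RDD join or co-group; `assign_cluster`, the 1-D representatives and the data are hypothetical): new descriptors are keyed by their cluster id and co-grouped with the existing clustered collection, so each cluster ends up with its old plus new members. Conceptually only clusters that receive new data change, although the summary slide notes that on Spark this still required a full re-shuffle.

```python
from collections import defaultdict

def assign_cluster(value, representatives):
    """Toy 1-D assignment: index of the nearest representative (hypothetical)."""
    return min(range(len(representatives)),
               key=lambda i: abs(value - representatives[i]))

def cogroup_merge(existing, new_items, representatives):
    """Merge new (item_id, value) pairs into a cluster_id -> members mapping.

    Mirrors keying the update batch by cluster id and co-grouping it with the
    existing clustered collection; the input mapping is not modified.
    """
    keyed_new = defaultdict(list)
    for item_id, value in new_items:
        keyed_new[assign_cluster(value, representatives)].append((item_id, value))
    merged = {}
    for cid in set(existing) | set(keyed_new):
        merged[cid] = existing.get(cid, []) + keyed_new.get(cid, [])
    return merged

reps = [0.0, 5.0, 10.0]
clusters = {0: [("a", 0.2)], 2: [("b", 9.7)]}
updated = cogroup_merge(clusters, [("c", 4.6), ("d", 0.4)], reps)
print(sorted(updated))  # [0, 1, 2]
print(updated[0])       # [('a', 0.2), ('d', 0.4)]
```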
![Page 26](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/26.jpg)
Spark Pipeline Symbols
• .map = one-to-one transformation
• .flatMap = one-to-many (zero or more) transformation
• .groupByKey = Hadoop’s Shuffle
• .reduceByKey = Hadoop’s Reduce
• .collectAsMap = collect to Master
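For readers without a Spark background, the semantics of these operators can be mimicked on plain Python lists. This is a single-machine illustration with hypothetical helper names (it is not PySpark): in Spark the same operators act on partitioned RDDs, and `groupByKey`/`reduceByKey` move data between workers in a shuffle. The toy data rounds each descriptor value to get a made-up cluster id.

```python
from collections import defaultdict
from functools import reduce

def rdd_map(data, fn):             # .map: exactly one output per input
    return [fn(x) for x in data]

def rdd_flat_map(data, fn):        # .flatMap: zero or more outputs per input
    return [y for x in data for y in fn(x)]

def rdd_group_by_key(pairs):       # .groupByKey: gather all values of each key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())

def rdd_reduce_by_key(pairs, fn):  # .reduceByKey: fold the values of each key
    return [(key, reduce(fn, values)) for key, values in rdd_group_by_key(pairs)]

def rdd_collect_as_map(pairs):     # .collectAsMap: bring results to the master as a dict
    return dict(pairs)

# Toy data: each image has descriptor values; round() gives a made-up cluster id.
images = {"img1": [0.1, 0.9], "img2": [0.8]}
pairs = rdd_flat_map(images.items(), lambda kv: [(round(d), kv[0]) for d in kv[1]])
members = rdd_collect_as_map(rdd_group_by_key(pairs))
counts = rdd_collect_as_map(
    rdd_reduce_by_key(rdd_map(pairs, lambda p: (p[0], 1)), lambda a, b: a + b))
print(members)  # {0: ['img1'], 1: ['img1', 'img2']}
print(counts)   # {0: 1, 1: 2}
```

In real Spark, `reduceByKey` combines values within each partition before the shuffle, which is why it usually moves less data than `groupByKey` followed by a reduction.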
![Page 27](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/27.jpg)
Search Pipeline
[Figure: Spark pipelines for Indexing and Search]
![Page 28](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/28.jpg)
Evaluation: Off-line Indexing
• Hardware: 51 AWS c3.8xlarge nodes
  – 800 real cores + 800 virtual cores
  – 2.8 TB of RAM and 30 TB of SSD space
• Indexing time as collection grows

| Features (billions) | Indexing time (seconds) | Scaling (relative) |
|---|---|---|
| 8.5 | 3,287 | – |
| 17.2 | 5,030 | 1.53 |
| 26.0 | 11,943 | 3.63 |
| 34.5 | 14,192 | 4.31 |
| 42.9 | 19,749 | 6.00 |

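
As a quick check on the Scaling column, dividing each indexing time by the 8.5-billion baseline reproduces the slide's values when they are cut off (not rounded) at two decimals; setting this against the growth in features shows roughly 6x the indexing time for roughly 5x the data over this range. Only the numbers in the table are used:

```python
import math

# (features in billions, indexing time in seconds), copied from the table
rows = [(8.5, 3287), (17.2, 5030), (26.0, 11943), (34.5, 14192), (42.9, 19749)]
base_features, base_seconds = rows[0]

def truncate2(x):
    """Cut off (rather than round) after two decimals, as the slide appears to do."""
    return math.floor(x * 100) / 100

time_ratios = [truncate2(s / base_seconds) for _, s in rows[1:]]
data_ratios = [truncate2(f / base_features) for f, _ in rows[1:]]
print(time_ratios)  # [1.53, 3.63, 4.31, 6.0]
print(data_ratios)  # [2.02, 3.05, 4.05, 5.04]
```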
![Page 29](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/29.jpg)
Evaluation: “On-line” Search
• Throughput with batching
[Figure: throughput with batching, annotated with the Hadoop limit]
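Why batching raises throughput in a clustering-based design such as DeCP can be seen by counting cluster reads: if the query descriptors in a batch are grouped by the cluster they need, each cluster is read and scanned once per batch instead of once per descriptor, at the cost of every query waiting for its batch to fill. This is a hypothetical cost count, not the paper's measurement, and the cluster ids below are invented.

```python
def cluster_reads_unbatched(query_clusters):
    """Each query descriptor triggers its own cluster read and scan."""
    return len(query_clusters)

def cluster_reads_batched(query_clusters, batch_size):
    """Within one batch, descriptors that need the same cluster share a single read."""
    reads = 0
    for start in range(0, len(query_clusters), batch_size):
        batch = query_clusters[start:start + batch_size]
        reads += len(set(batch))
    return reads

# Hypothetical cluster ids needed by 8 query descriptors, in arrival order
needed = [3, 7, 3, 1, 7, 3, 1, 1]
for size in (1, 4, 8):
    print(size, cluster_reads_batched(needed, size))
```

With batch sizes 1, 4 and 8 this prints 8, 6 and 3 cluster reads respectively: larger batches share more scans, which is the response-time-for-throughput trade-off listed among DeCP's properties.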
![Page 30](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/30.jpg)
Summary

| | R1 Scalability | R2 Computational Flexibility | R3 Capacity | R4 Updates | R5 Flexible Pipelines | R6 Simplicity |
|---|---|---|---|---|---|---|
| Spark | Yes | Yes | Yes | Partial (full re-shuffle) | Yes | Yes |
| Hadoop | Partial (RAM per core) | Partial | Partial | No | No | No |

![Page 31](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/31.jpg)
Outline
• Motivation: Scalable multimedia analytics
• Batch Layer: Spark and 43 billion high-dimensional features
• Service Layer: Blackthorn and 100 million images
• Conclusion: Importance and challenges of scale!
![Page 32](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/32.jpg)
Jan Zahálka, Stevan Rudinac, Björn Þór Jónsson, Dennis C. Koelma, Marcel Worring
Blackthorn: Large-Scale Interactive Multimodal Learning
Under revision at IEEE Transactions on Multimedia
Jan Zahálka, Stevan Rudinac, Björn Þór Jónsson, Dennis C. Koelma, Marcel Worring
Interactive Multimodal Learning on 100 Million Images
Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR)
New York, NY, USA, June 2016
![Page 33](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/33.jpg)
Framework: Lambda Architecture
[Figure: Storage Layer, Batch Layer, Speed Layer, Service Layer]
![Page 34](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/34.jpg)
Blackthorn Motivation
• Do not impose a dictionary on the user
• Let the user synthesize categories of relevance from semantic annotations on the fly
• Let the user search and explore along those categories interactively
• Interactive semi-supervised learning at scale!
![Page 35](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/35.jpg)
Honza’s Scalability Illustration
• “Yesterday”: 10-100K images
• YFCC: 100M images
Image credit: http://demonocracy.info/infographics/usa/us_debt/us_debt.html
![Page 36](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/36.jpg)
Scale > Size
• Single (high-end) workstation
• 1000D features → 800GB
• Interactive response time!
• Computing feature scores takes minutes!
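The 800GB figure is consistent with storing one dense 1000-dimensional vector for each of the roughly 100 million YFCC images at 8 bytes per value; the 8-byte (double-precision) assumption is ours, not the slide's. If relevance scores come from a linear model (also an assumption here), every scoring pass touches all of that data with about 10^11 multiply-adds, which helps explain why naive scoring takes minutes on one workstation. `top_k_scores` is a hypothetical helper showing what one such pass computes.

```python
import heapq

images = 100_000_000     # YFCC100M: on the order of 10^8 images
dims = 1000              # 1000-D features per image
bytes_per_value = 8      # assumption: 64-bit floating-point values

raw_gb = images * dims * bytes_per_value / 10**9
print(raw_gb)                  # 800.0
print(f"{images * dims:.0e}")  # 1e+11 multiply-adds per linear scoring pass

def top_k_scores(weights, features, k):
    """Score every item with a linear model and keep the k highest (item, score)."""
    scored = ((i, sum(w * x for w, x in zip(weights, f)))
              for i, f in enumerate(features))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

print(top_k_scores([1, -1], [[5, 1], [2, 9], [9, 0]], k=2))  # [(2, 9), (0, 4)]
```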
![Page 37](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/37.jpg)
Blackthorn Overview
![Page 38](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/38.jpg)
Blackthorn Compression
![Page 39](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/39.jpg)
Blackthorn Results: 1.2M Collection
• Compression: 880GB → 5GB
• Precision: 89-108% of uncompressed
• Scoring time: 60-80x faster
• Recall over time: Blackthorn rocks!
![Page 40](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/40.jpg)
Blackthorn Results: YFCC100M Collection
• Scoring time: ~1 second!
![Page 41](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/41.jpg)
Blackthorn Future Work
• More (user) evaluation is needed
• Other applications may (will) require adaptations
• Further scalability: Combine eCP and Blackthorn
![Page 42](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/42.jpg)
Outline
• Motivation: Scalable multimedia analytics
• Batch Layer: Spark and 43 billion high-dimensional features
• Service Layer: Blackthorn and 100 million images
• Conclusion: Importance and challenges of scale!
![Page 43](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/43.jpg)
Why Scale?
• Current and future applications
• Future of computing
• Because we cannot yet!
“We choose to … in this decade and do the other things, not because they are easy, but because they are hard, …”
We choose to go to the moon. We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too.
JFK, September 12, 1962
![Page 44](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/44.jpg)
Scalability Hurdles: Can Industry Help?
Industry-Level Collections
  – Data
  – Processing capacity
The Small-Minded Reviewer
  – “Are there users willing to explore 100M data sets interactively?”
Interactive Applications
  – Application knowledge
  – User study “victims”
![Page 45](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/45.jpg)
![Page 46](https://reader033.vdocument.in/reader033/viewer/2022051901/5fefb5fda3293969d21b7884/html5/thumbnails/46.jpg)
Summary
• Motivation: Scalable multimedia analytics
• Batch Layer: Spark and 43 billion high-dimensional features
• Serving Layer: Blackthorn and 100 million images
• Conclusion: Importance and challenges of scale!