Cloudberry - Big Data Visualization
Sadeem Alsudais, Qiushi Bai, Chen Li
UC Irvine, BOSS Workshop 2019


Post on 15-Jul-2020


Page 1:

Cloudberry - Big Data Visualization

Sadeem Alsudais, Qiushi Bai, Chen Li

UC Irvine, BOSS Workshop 2019

Page 2:

Big Data Visualization Tools

Page 3:

Big Data Visualization Tools

Page 4:

Our solution: Cloudberry

A middleware solution for interactive analytics and visualization on large data

Page 5:

Cloudberry Architecture

Page 6:

Prototype: Twittermap

1.6+ billion records; 2TB; temporal/spatial/textual conditions; hardware: < $6K

Page 7:

Tutorial Overview

● Twittermap demo
● Cloudberry overview
● Instructions to set up a Cloudberry application on social media visualization
● Under-the-hood details

Page 8:

Cloudberry Tutorial

Time: 11 AM & 2 PM
Location: Santa Monica (3rd level)

Page 9:

Cloudberry - Big Data Visualization

Sadeem Alsudais, Qiushi Bai, Chen Li

UC Irvine, BOSS Workshop 2019

Page 10:

Tutorial Overview

● Twittermap demo
● Cloudberry overview
● Instructions to set up a Cloudberry application on social media visualization
● Under-the-hood details

Page 11:

Twittermap Application

http://cloudberry.ics.uci.edu/apps/twittermap/

Page 12:

Twittermap Settings

● # of tweets: >1.6B (2TB)
● Continuous tweet ingestion
  ○ 3M tweets / day
● A cluster of 5 Intel NUC machines
  ○ Intel Core i7
  ○ 32GB memory
  ○ Samsung 1TB EVO NVMe SSD
  ○ < $6K

Page 13:

Cloudberry

A middleware solution for interactive analytics and visualization on large data

http://cloudberry.ics.uci.edu/

Page 14:

Cloudberry Architecture

Page 15:

Cloudberry Architecture

Page 16:

Metadata

Page 17:

Cloudberry Architecture

Page 18:

Answering Queries Using Views

Towards Interactive Analytics and Visualization on One Billion Tweets, Jianfeng Jia, Chen Li, Xi Zhang, Chao Li, Michael J. Carey, Simon Su, ACM SIGSPATIAL 2016 (Demo Paper)

Ask original dataset and view
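The idea of asking both the original dataset and a view can be sketched as follows. This is a minimal illustration, not Cloudberry's actual planner: it assumes a view that materializes one contiguous time range, and the function name `split_by_view` and the half-open integer intervals are hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), e.g. day numbers (assumption)


def split_by_view(
    query: Interval, view: Interval
) -> Tuple[Optional[Interval], List[Interval]]:
    """Split a query range into the part a view covers and the remainder.

    Returns (view_part, base_parts): view_part can be answered from the
    view, base_parts must still be asked of the original dataset.
    """
    q_start, q_end = query
    v_start, v_end = view
    lo, hi = max(q_start, v_start), min(q_end, v_end)
    if lo >= hi:
        # No overlap: the whole query goes to the original dataset.
        return None, [query]
    base_parts: List[Interval] = []
    if q_start < lo:
        base_parts.append((q_start, lo))
    if hi < q_end:
        base_parts.append((hi, q_end))
    return (lo, hi), base_parts


# A view covering days 0..89 answers most of a query over days 0..99;
# only the last 10 days are requested from the original dataset.
view_part, base_parts = split_by_view(query=(0, 100), view=(0, 90))
print(view_part, base_parts)
```

The results of the two parts would then be merged by the middleware; how Cloudberry merges them is not described in these slides.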


Page 19:

Cloudberry Architecture

Page 20:

Drum: Adaptive Framework for Query Slicing

Drum: A Rhythmic Approach to Interactive Analytics on Large Data, Jianfeng Jia, Chen Li, Michael J. Carey, IEEE Big Data 2017
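Query slicing can be illustrated with a short sketch. It assumes the query is a time-range query that is cut into consecutive mini-queries processed from the newest data backward; the function `slice_query` and the `next_width` callback are hypothetical names, and the doubling widths in the usage line only stand in for whatever adaptive choice Drum makes.

```python
from typing import Callable, Iterator, Tuple


def slice_query(
    start: int, end: int, next_width: Callable[[int], int]
) -> Iterator[Tuple[int, int]]:
    """Yield half-open mini-query ranges that cover [start, end), newest first.

    next_width(i) returns the desired width of the i-th mini-query; an
    adaptive scheduler would compute it from observed running times.
    """
    hi = end
    i = 0
    while hi > start:
        width = max(1, next_width(i))  # always make progress
        lo = max(start, hi - width)    # do not run past the query start
        yield (lo, hi)
        hi = lo
        i += 1


# Widths 1, 2, 4, 8 over the range [0, 10):
print(list(slice_query(0, 10, lambda i: 2 ** i)))
# → [(9, 10), (7, 9), (3, 7), (0, 3)]
```

Each mini-query's result can be shown to the user as soon as it arrives, which is what makes the delivery progressive.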

Page 21:

Tutorial Steps

● Requirements
  ○ Shell terminal
  ○ Web browser
● Google “UCI Cloudberry”
  ○ “Resources” -> “BOSS 19 Tutorial”

Page 22:

Under-the-hood details

Page 23:

Drum: Adaptive Framework for Query Slicing

Page 24:

Schedule cost

● Total running time
● Smoothness of result delivery

Page 25:

Linear regression with uncertainty
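The slide does not say which quantities are regressed. A plausible reading, used here purely as an assumption, is that the running time of a mini-query is modeled as a linear function of its range size, with the spread of the residuals serving as the uncertainty. The sketch below fits such a line by ordinary least squares; `fit_line` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def fit_line(
    ranges: Sequence[float], times: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of time = a * range + b.

    Returns (a, b, sigma), where sigma is the residual standard deviation
    and serves as a simple uncertainty estimate. Requires at least two
    observations with distinct range values.
    """
    n = len(ranges)
    mean_r = sum(ranges) / n
    mean_t = sum(times) / n
    sxx = sum((r - mean_r) ** 2 for r in ranges)
    sxy = sum((r - mean_r) * (t - mean_t) for r, t in zip(ranges, times))
    a = sxy / sxx
    b = mean_t - a * mean_r
    residuals = [t - (a * r + b) for r, t in zip(ranges, times)]
    # n - 2 degrees of freedom for a two-parameter fit (guarded for n == 2).
    sigma = math.sqrt(sum(e * e for e in residuals) / max(1, n - 2))
    return a, b, sigma


# Made-up observations: (range size in days, running time in seconds).
a, b, sigma = fit_line([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(a, b, sigma)
```

The observations in the usage line are invented for illustration and are not measurements from Twittermap.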

Page 26:

Tradeoff of Running Time and Penalty

Page 27:

Choosing r_i to maximize the expected score
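The slides do not define the score, so the following is only a sketch under stated assumptions, not the scoring function from the Drum paper. It assumes the running time of a mini-query with range r is normally distributed around a * r + b with standard deviation sigma, that results are expected within a `deadline`, and that the score rewards progress (r) and subtracts `penalty` times the expected overshoot past the deadline. All names and the score formula are hypothetical.

```python
import math
from typing import Iterable


def expected_overshoot(mu: float, sigma: float, deadline: float) -> float:
    """E[max(0, T - deadline)] for T ~ Normal(mu, sigma)."""
    if sigma <= 0:
        return max(0.0, mu - deadline)
    z = (mu - deadline) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))           # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # phi(z)
    return (mu - deadline) * cdf + sigma * pdf


def choose_range(
    candidates: Iterable[float],
    a: float,
    b: float,
    sigma: float,
    deadline: float,
    penalty: float,
) -> float:
    """Pick the candidate range r with the highest expected score."""

    def score(r: float) -> float:
        # Progress made minus the expected lateness penalty (assumed form).
        return r - penalty * expected_overshoot(a * r + b, sigma, deadline)

    return max(candidates, key=score)


# With no uncertainty the largest range that meets the deadline wins;
# with uncertainty the choice becomes more conservative.
print(choose_range(range(1, 11), a=1.0, b=0.0, sigma=0.0, deadline=5.0, penalty=10.0))
print(choose_range(range(1, 11), a=1.0, b=0.0, sigma=1.0, deadline=5.0, penalty=10.0))
```

A larger `penalty` pushes the choice toward smaller ranges, which matches the tradeoff between running time and penalty named on the previous slide.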