Big Data Analytics Sunnie Chung Electrical Engineering and Computer Science

Page 1: Big Data Analytics - Electrical Engineering and …eecs.csuohio.edu/~sschung/cis612/BigDataAnalytics_Ignite...Big Data Analytics Research Group Math, Statistics and Databases Big Data

Big Data Analytics

Sunnie Chung, Electrical Engineering and Computer Science


2

Big Data: How Much Data? In Petabytes!

• Google processes 30 PB a day (2015)
• eBay has 6.5 PB of user data + 50 TB/day (2009)
• Facebook has 36 PB of user data + 80-90 TB/day (2013)
• CERN’s LHC: 15 PB a year (~2015)
• LSST: 6-10 PB a year (~2015)

How many female WWF fans under the age of 30 visited the Toyota community over the last 4 days and saw a Class A ad?

How are these people similar to those who visited Nissan?
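The first question is, at heart, a filtered aggregate query over visit records. A minimal Python sketch, assuming a hypothetical `Visit` record whose fields mirror the attributes the question mentions (the field names, the "F" gender code and the "A" ad-class label are all assumptions):

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Visit:
    """Hypothetical clickstream record; field names are assumptions."""
    user_id: int
    gender: str       # assumed coding: "F" / "M"
    age: int
    is_wwf_fan: bool
    community: str    # e.g. "Toyota", "Nissan"
    day: date
    ad_class: str     # class of ad shown during the visit, e.g. "A"


def count_matching_users(visits, community, today, days=4, max_age=30):
    """Count distinct female WWF fans under max_age who visited `community`
    in the last `days` days (today included) and saw a Class A ad."""
    start = today - timedelta(days=days)
    users = {
        v.user_id
        for v in visits
        if v.gender == "F"
        and v.age < max_age
        and v.is_wwf_fan
        and v.community == community
        and start < v.day <= today
        and v.ad_class == "A"
    }
    return len(users)
```

Counting distinct `user_id`s means a fan who visited several times is counted once. At petabyte scale the same predicate would run as a distributed scan rather than an in-memory loop.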

Unstructured Text Stream in PB a day

What Does Your Big Data Stream Look Like?


3

1. Data Cleaning/Extraction/Transformation

2. Data Staging/Processing

3. Data Mining Strategies: Data Modeling/Validation

4. Data Visualization
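The four stages listed above can be illustrated end to end on a toy scale. The sketch below is a hypothetical Python example: the sample tweets, the stop-word list and the "most frequent terms" model are assumptions chosen only to show where each stage fits, not the group's actual pipeline.

```python
import re
from collections import Counter

# Hypothetical raw input: a few unstructured tweet-like strings.
RAW_TWEETS = [
    "Big Data is HUGE!! http://example.com",
    "  big data and hadoop #bigdata ",
    "Hadoop + MapReduce = big data processing",
]

STOPWORDS = {"is", "and", "the", "a"}  # assumed, deliberately tiny stop-word list


def clean(text: str) -> str:
    """Stage 1, cleaning/extraction: drop URLs and non-letters, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    return text.lower()


def stage(texts):
    """Stage 2, staging: tokenize each text and remove stop words."""
    return [[w for w in t.split() if w not in STOPWORDS] for t in texts]


def mine(token_lists, k=3):
    """Stage 3, a deliberately simple 'model': the k most frequent terms."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return counts.most_common(k)


def visualize(top_terms):
    """Stage 4, visualization as a plain-text bar chart."""
    return "\n".join(f"{term:<10} {'#' * n}" for term, n in top_terms)


if __name__ == "__main__":
    print(visualize(mine(stage([clean(t) for t in RAW_TWEETS]))))
```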

Massively Parallel Processing Systems
• Hadoop-Based Multi-Node Cluster: NoSQL Stack
• Cloud-Based Hadoop Cluster (20–2000 Nodes)
Software: Automatic Parallel Execution in MapReduce
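In MapReduce the programmer supplies only a map function and a reduce function; the framework splits the input, runs the map calls in parallel, groups intermediate values by key (the shuffle) and runs the reduce calls in parallel. The following single-machine Python simulation of that programming model uses a thread pool in place of cluster nodes. It is a sketch of the idea, not Hadoop's API; `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_fn(line):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    """Reduce: sum all counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(splits, workers=4):
    # Map phase: splits are independent, so they are processed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, splits))
    # Shuffle phase: group every intermediate value under its key.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    # Reduce phase: one independent reduce call per key, also in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda kv: reduce_fn(*kv), groups.items()))
```

On a real cluster the same two functions would be shipped to many nodes, and the framework would also handle data locality and failed tasks.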

Analytic Parallel Data Warehouse Systems

Information Retrieval

Machine Learning: Neural Network, SVM, Classification

Database Research: Multi-Level Association Rule Mining
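Multi-level association rule mining looks for frequent item combinations at several levels of a concept hierarchy, e.g. "milk and bread" as well as "2% milk and wheat bread". One common strategy works top-down: mine the higher level first, then examine only lower-level itemsets whose ancestors were frequent, often with a reduced minimum support at the lower level. A minimal Python sketch restricted to item pairs; the transactions, taxonomy and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions and a two-level taxonomy.
TRANSACTIONS = [
    {"2% milk", "wheat bread"},
    {"2% milk", "wheat bread", "butter"},
    {"skim milk", "white bread"},
    {"2% milk", "butter"},
]
TAXONOMY = {  # item -> higher-level category
    "2% milk": "milk",
    "skim milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
    "butter": "butter",
}


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs whose support >= min_support."""
    n = len(transactions)
    counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def confidence(transactions, lhs, rhs):
    """confidence(lhs -> rhs) = support({lhs, rhs}) / support({lhs})."""
    with_lhs = [t for t in transactions if lhs in t]
    if not with_lhs:
        return 0.0
    return sum(rhs in t for t in with_lhs) / len(with_lhs)


def multilevel_pairs(transactions, taxonomy, min_support=(0.6, 0.5)):
    """Top-down two-level mining with a reduced support at level 2."""
    # Level 1: replace each item by its category, then mine.
    level1_tx = [{taxonomy[i] for i in t} for t in transactions]
    level1 = frequent_pairs(level1_tx, min_support[0])
    # Level 2: keep item pairs that are frequent AND whose parent pair is frequent.
    level2 = {
        (a, b): s
        for (a, b), s in frequent_pairs(transactions, min_support[1]).items()
        if tuple(sorted((taxonomy[a], taxonomy[b]))) in level1
    }
    return level1, level2
```

Here ("2% milk", "butter") reaches the level-2 support threshold but is pruned because its parent pair ("butter", "milk") is not frequent at level 1.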

Statistics Based Methods: Clustering
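As one concrete statistics-based clustering method, the sketch below implements plain k-means (Lloyd's algorithm) in Python. The choice of k-means, the toy points and the naive initialization (the first k points) are assumptions for illustration; production code would typically use randomized or k-means++ initialization and a library implementation.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means on numeric tuples; returns (centroids, clusters)."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    clusters = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the algorithm separates the three points near the origin from the three near (10, 10).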


4


Information Retrieval

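$$\cos(\vec{q},\vec{d}) = \frac{\vec{q}\cdot\vec{d}}{|\vec{q}|\,|\vec{d}|} = \frac{\vec{q}}{|\vec{q}|}\cdot\frac{\vec{d}}{|\vec{d}|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$$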
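The cosine similarity on this slide is the standard vector-space measure for ranking documents against a query: each text becomes a vector over the vocabulary V, and documents are sorted by the cosine of the angle to the query vector. A minimal Python sketch, assuming raw term-frequency weights and whitespace tokenization (real retrieval systems usually use tf-idf weights and an inverted index); the function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector: term -> count (missing terms count as 0)."""
    return Counter(text.lower().split())


def cosine(q, d):
    """cos(q, d) = sum_i q_i d_i / (sqrt(sum_i q_i^2) * sqrt(sum_i d_i^2))."""
    vocab = set(q) | set(d)
    dot = sum(q[t] * d[t] for t in vocab)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty text has no direction; treat as dissimilar
    return dot / (norm_q * norm_d)


def rank(query, docs):
    """Return (score, doc) pairs sorted from most to least similar."""
    q = tf_vector(query)
    return sorted(((cosine(q, tf_vector(d)), d) for d in docs), reverse=True)
```

For the query "big data", the document "big data analytics" scores 2/√6 ≈ 0.82 and "data" scores 1/√2 ≈ 0.71, so the longer match ranks first.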

Machine Learning: Neural Network, SVM, Classification

Database-Research-Based Methods: Multi-Level Association Rule Mining

Statistics-Based Methods: Clustering


5

[Chart: tweet counts (axis 0–8000) by region: Pacific, Paris, London, Eastern…, Amsterdam, Athens, Central…, Jakarta, Greenland, Bangkok, Brasilia, Hawaii, Atlantic…, Arizona, Ljubljana, Beijing, Belgrade, New Delhi, Berlin]

Topics Most Talked About on Nov 22, 2015

Regions Most Tweeted on Nov 22, 2015

Data Extraction/Transformation

What Your Tweet Data Looked Like on Nov 22, 2015
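Charts such as "Regions Most Tweeted" and "Topics Most Talked About" reduce to group-by counts over fields extracted from the raw tweets. A hypothetical Python sketch, assuming each tweet has already been extracted into a record with a `region` field and its text, and treating hashtags as a stand-in for topics (both assumptions, not necessarily how these charts were produced):

```python
from collections import Counter

# Hypothetical extracted tweet records (field names are assumptions).
TWEETS = [
    {"region": "Pacific", "text": "Learning #BigData today"},
    {"region": "London", "text": "New #Hadoop release"},
    {"region": "Pacific", "text": "#bigdata meetup tonight"},
    {"region": "Paris", "text": "Cluster tuning #hadoop #BigData"},
]


def top_regions(tweets, n=10):
    """Most tweeted regions, as (region, count) pairs."""
    return Counter(t["region"] for t in tweets).most_common(n)


def top_topics(tweets, n=10):
    """Most frequent hashtags (case-insensitive) as a proxy for topics."""
    return Counter(
        word.lower()
        for t in tweets
        for word in t["text"].split()
        if word.startswith("#")
    ).most_common(n)
```

Over a full day's stream the same counts would be computed in parallel, e.g. as a MapReduce job keyed by region or hashtag.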


6

Top Job Titles Recently Listed

Locations of Jobs Listed 1 Day Ago

Profile Headlines with Highest Connections


7

Tweet Data Stream on Nov 5, 2015

Tweet Topics on Nov 5, 2015

Unusual Cluster on the Company Name

Unusual Negative Tweets on the Company Led to the Company's Stock Fall
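One simple way to surface an "unusual" burst like this is to compare each day's count of negative tweets mentioning the company against a trailing baseline. The Python sketch below flags days that exceed the trailing mean by more than a chosen number of standard deviations. It assumes the tweets have already been filtered to the company and labeled negative by some sentiment step; the function name, window and threshold are arbitrary illustrations, not the method behind this slide.

```python
import statistics


def flag_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is more than `threshold` standard
    deviations above the mean of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            is_spike = daily_counts[i] > mean  # flat history: any rise is unusual
        else:
            is_spike = (daily_counts[i] - mean) / stdev > threshold
        if is_spike:
            flagged.append(i)
    return flagged
```

For daily negative-tweet counts `[10, 12, 11, 9, 10, 11, 12, 10, 95, 11]`, only day 8 (the jump to 95) is flagged; the spike then inflates the baseline for the following days, which a more robust detector might guard against.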


8

Tweet Data Stream on Nov 13, 2015

Tweets Per Topic on Nov 13, 2015


9

Database Security in the Cloud

Encrypting Databases in the Cloud to Retrieve Sensitive Data Without Decrypting It
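One well-known way to look up encrypted records without decrypting them on the server is a keyed-hash ("blind") index: the client stores, next to each encrypted value, an HMAC of the plaintext computed with a secret key, and later searches by sending the HMAC of the search term. The Python sketch below shows only the index side; the class name and the placeholder ciphertext bytes are hypothetical, and encrypting the payload itself (with an authenticated cipher from a vetted cryptography library) is omitted. The scheme supports exact-match lookups only and reveals which rows share a value; it is offered as an illustration, not as the technique used in this research.

```python
import hashlib
import hmac
import secrets

# Client-side secret used only for index tokens; it never leaves the client.
INDEX_KEY = secrets.token_bytes(32)


def blind_index(value: str, key: bytes = INDEX_KEY) -> str:
    """Deterministic keyed hash of a plaintext value (HMAC-SHA256, hex)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class EncryptedTable:
    """Hypothetical server-side store holding only tokens and ciphertexts."""

    def __init__(self):
        self.rows = []  # list of (index_token, ciphertext) pairs

    def insert(self, token: str, ciphertext: bytes) -> None:
        self.rows.append((token, ciphertext))

    def lookup(self, token: str) -> list:
        # The server compares opaque tokens; it never sees plaintext or the key.
        return [ct for t, ct in self.rows if hmac.compare_digest(t, token)]
```

The client computes `blind_index("123-45-6789")` both when inserting and when querying; the server returns the matching ciphertexts, which only the client can decrypt.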

Achieving Cyber Security with Big Data Analytics

Credit Card Fraud Detection

Intrusion Detection in Systems with Sensitive Data

Machine Fault Detection


10

Annual Big Data Workshop at CSU

Big Data Analytics Curriculum at EECS

Big Data Analytics Research Group

• Math, Statistics and Databases
• Big Data Specific Processing Techniques
• Cloud Computing
• Massively Parallel Big Data Processing Systems
• Data Source Modeling
• Data Mining Strategies

Data-Driven Solutions

President’s Advisory Committee for Center of Excellence
• Data Analytics
• Cyber Security
• Cloud Computing