Big-Data Processing utilizing Open-Source Technologies - May 2015

32 Slides, Amir Sedighi, Rayanesh Dadegan Data Solutions Ltd., May 2015

Upload: amir-sedighi

Post on 28-Jul-2015


Page 1: Big Data Processing Utilizing Open-source Technologies - May 2015

Big-Data Processing utilizing Open-Source Technologies

32 Slides

Amir Sedighi
Rayanesh Dadegan Data Solutions Ltd.

May 2015

Page 2:

Amir Sedighi - May 2015 2

References

● http://www.slideshare.net/BernardMarr/140228-big-data-slide-share?qid=017848e2-9e2a-4dc3-963c-52b6a90fba2a&v=default&b=&from_search=1

● http://www.forbes.com/fdc/welcome_mjx.shtml

● ZYMR Spark Your Real-Time Big Data Analytics

● http://dataconomy.com

● https://datakulfi.wordpress.com/2013/03/27/big-data-open-source-technology-landscape/

● http://www.slideshare.net/andrefaria/big-data-abc?qid=1ac97e4a-4acc-460a-b3f8-9122f7210440&v=qf1&b=&from_search=12

● https://wiki.apache.org/hadoop/PoweredBy

● Making Sense Of Streaming Processing by Martin Kleppmann

Page 3:

Data Explosion

Page 4:

Data Explosion

Page 5:

● Big-Data means that everything we do increasingly leaves a digital trace, which we (or others) can gather, use, and analyze.
– Data Providers
● Business Companies
● People

Page 6:

Volume, Velocity, Variety

● “There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Eric Schmidt)
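To put the quoted figure in terms of velocity, the following is a small back-of-the-envelope sketch. It assumes decimal units (1 exabyte = 10^18 bytes) and a perfectly even rate over the two days; the function name `bytes_per_second` is hypothetical and used only for illustration.

```python
# Assumption: decimal units, 1 exabyte = 10**18 bytes.
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(exabytes: float, days: float) -> float:
    """Average rate implied by producing `exabytes` of data every `days` days."""
    return exabytes * EXABYTE / (days * SECONDS_PER_DAY)


# The quoted claim: 5 exabytes every 2 days.
rate = bytes_per_second(5, 2)
print(f"{rate / 10**12:.1f} TB/s")  # roughly 28.9 terabytes per second
```

Under these assumptions the quote corresponds to an average of about 29 terabytes of new data per second.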

Page 7:

Big-Data Processing

Page 8:

How to set up a Big-Data processing platform using commodity machines?

Page 9:

Vertical or Horizontal?

Page 10:

Scale Up vs Scale Out

Page 11:

Scale Up vs Scale Out

Page 12:

Big-Data Processing Open-Source Technology Stack

Page 13:

Map-Reduce
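A minimal sketch of the Map-Reduce idea named on this slide, written as plain single-process Python rather than Hadoop code. The word-count task and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions chosen for illustration; the slide itself carries only the title.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data processing"]))
```

In a real cluster the map and reduce calls would run in parallel on different machines and the shuffle would move data over the network; here everything runs in one process so that only the data flow is visible.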

Page 14:

Hadoop Framework

Page 15:

Apache Hadoop Main Projects

Page 16:

Page 17:

SQL on Hadoop

● Apache Hive
● Apache Drill (Dremel)
● Cloudera Impala
● Facebook Presto
● Apache Kylin

Page 18:

More Map-Reduce (YARN)

● Apache Spark
● Apache Flink (Stratosphere)
● Apache Hama
● Apache Tez (DAG, Complex Data Processing)

Page 19:

Service Programming

● Apache Thrift
● Apache Zookeeper
● Apache Avro
● Google Kryo

Page 20:

Data Stores

● Data Stores
– Key-Value
– Graph
– Columnar
– Document Store
– In-Memory

Page 21:

Data Transfer

● Apache Flume
● Apache Sqoop

Page 22:

Search

● Elasticsearch
● Apache Solr

Page 23:

Log Management

● ELK
● Logstash
● Fluentd

Page 24:

Machine Learning

● Apache Mahout
● MLlib
● GraphX

Page 25:

Messaging and Queuing

● Apache Kafka
● ZeroMQ

Page 26:

Stream Processing

● Apache Storm
● Apache Samza
● Apache Spark

Page 27:

Data Processing

Transient Query
– Issued once, then forgotten

Persistent Data
– Stored until deleted by user or apps

Page 28:

Stream Processing

Transient Data
– Deleted as the window slides forward

Persistent Queries
– Generate up-to-date answers as time goes on

Window types: Time Based, Count Based
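The contrast above (a persistent query over transient, windowed data) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the API of Storm, Samza, or Spark: the class names `CountWindow` and `TimeWindow`, the running-average query, and the numeric timestamps are all hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class CountWindow:
    """Count-based window: keeps only the last `size` values."""

    def __init__(self, size: int) -> None:
        # deque(maxlen=...) silently drops the oldest value when full.
        self.items: Deque[float] = deque(maxlen=size)

    def add(self, value: float) -> float:
        """Add one event and return the current average (the persistent query)."""
        self.items.append(value)
        return sum(self.items) / len(self.items)


class TimeWindow:
    """Time-based window: keeps events newer than `span` time units."""

    def __init__(self, span: float) -> None:
        self.span = span
        self.items: Deque[Tuple[float, float]] = deque()

    def add(self, timestamp: float, value: float) -> float:
        """Add one event, evict expired ones, and return the current average."""
        self.items.append((timestamp, value))
        # Transient data: events at or before (timestamp - span) are deleted.
        while self.items[0][0] <= timestamp - self.span:
            self.items.popleft()
        return sum(v for _, v in self.items) / len(self.items)


if __name__ == "__main__":
    cw = CountWindow(size=3)
    print([cw.add(v) for v in [1, 2, 3, 4]])

    tw = TimeWindow(span=10)
    print([tw.add(t, v) for t, v in [(0, 10), (5, 20), (12, 30)]])
```

The query (the average) stays in place and is re-evaluated on every new event, while the data it sees is discarded as the window moves on. This sketch assumes events arrive in timestamp order; out-of-order events would need extra handling.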

Page 29:

Page 30:

Page 31:

● http://recommender.ir

● http://helio.ir

Page 32:

Thank You!

Find these slides here: http://www.slideshare.net/AmirSedighi

LinkedIn: http://www.linkedin.com/in/amirsedighi

Blog: http://hexican.com

Email: [email protected]

Twitter: @amirsedighi