big data & hadoop

by Thanakrit

BIG DATA & HADOOP
The future of the information economy

A Technology Blueprint

Big Data Storymap

Big Data Concept

Big Data Architecture

Big Data Ecosystem

Big Data Landscape

Big Data Life-cycle Management

Hadoop Concept

Hadoop Architecture

Name Node: Maintains the mapping of file blocks to Data Node slaves.

Job Tracker: Schedules jobs across Task Tracker slaves.

Data Node: Stores and serves blocks of data.

Task Tracker: Runs tasks (work units) within a job.

Hadoop Client: Contacts the Name Node for data, or the Job Tracker to submit jobs.

A Data Node and a Task Tracker share a physical node.
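The division of roles above can be sketched in a few lines of Python. This is a toy, single-process illustration, not Hadoop's actual API: the class names `NameNode` and `JobTracker`, their methods, the block ids, and the node names are all hypothetical. It assumes one task per block and assumes the scheduler prefers a Task Tracker on a node that already stores the block, which is one reason a Data Node and a Task Tracker share a physical node.

```python
from collections import defaultdict


class NameNode:
    """Toy stand-in for the Name Node: tracks which data nodes hold each block."""

    def __init__(self):
        self.file_blocks = defaultdict(list)     # file path -> ordered block ids
        self.block_locations = defaultdict(set)  # block id -> data node names

    def add_block(self, path, block_id, data_nodes):
        self.file_blocks[path].append(block_id)
        self.block_locations[block_id].update(data_nodes)

    def locate(self, path):
        """Return [(block_id, sorted data node names)] for a file, in block order."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[path]]


class JobTracker:
    """Toy stand-in for the Job Tracker: assigns one task per block."""

    def __init__(self, name_node, task_trackers):
        self.name_node = name_node
        self.task_trackers = list(task_trackers)  # named after their physical node

    def schedule(self, path):
        """Map each block to a task tracker, preferring one co-located with the data."""
        assignments = {}
        for i, (block_id, nodes) in enumerate(self.name_node.locate(path)):
            local = [n for n in nodes if n in self.task_trackers]
            # Assumption: fall back to round-robin when no co-located tracker exists.
            fallback = self.task_trackers[i % len(self.task_trackers)]
            assignments[block_id] = local[0] if local else fallback
        return assignments


# A client asks the Name Node where the data lives, then submits a job.
name_node = NameNode()
name_node.add_block("/logs/a.txt", "blk_1", ["node1", "node2"])
name_node.add_block("/logs/a.txt", "blk_2", ["node3"])
tracker = JobTracker(name_node, ["node1", "node3"])
plan = tracker.schedule("/logs/a.txt")
print(plan)
```

In this sketch the client never reads data through the Name Node; it only learns block locations, and the tasks run where the blocks already are.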

Hadoop Process
MapReduce Example for Word Count

Map(in_key, in_value) => list of (out_key, intermediate_value)
Reduce(out_key, list of intermediate_values) => out_value(s)

Data flow:

Split 1 ... Split i ... Split N: input splits, read as (docid, text)

Map 1 ... Map i ... Map M: each map task takes (docid, text) and emits (words, counts)

Shuffle: turns (words, counts) into (sorted words, counts)

Reduce 1 ... Reduce i ... Reduce R: each reduce task takes (sorted words, counts) and writes Output File 1 ... Output File i ... Output File R as (sorted words, sum of counts)

Example: “To Be Or Not To Be?”

Partial counts: (Be, 5), (Be, 12), (Be, 7), (Be, 6)

Reduced count: (Be, 30)

The same flow as a Unix pipeline:

cat *.txt | mapper.pl | sort | reducer.pl > out.txt
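The Map, Shuffle, and Reduce stages for word count can be simulated in a single Python process. This is a minimal sketch under stated assumptions, not Hadoop code: the function names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are hypothetical, each word is emitted with a count of 1, and words are lowercased and split on non-letter characters.

```python
import re
from collections import defaultdict


def map_fn(doc_id, text):
    """Map(in_key, in_value) => list of (out_key, intermediate_value)."""
    return [(word, 1) for word in re.findall(r"[a-z']+", text.lower())]


def shuffle(pairs):
    """Group intermediate values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(word, counts):
    """Reduce(out_key, list of intermediate_values) => out_value."""
    return word, sum(counts)


def word_count(documents):
    """Run map over every (docid, text), shuffle, then reduce each group."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_fn(doc_id, text))
    return [reduce_fn(word, counts) for word, counts in shuffle(intermediate)]


docs = {"doc1": "To Be Or Not To Be?"}
result = word_count(docs)
print(result)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]

# The reducer also accepts partial counts, as in the slide's example for "Be".
print(reduce_fn("be", [5, 12, 7, 6]))  # ('be', 30)
```

Because summing is associative, the reducer gives the same total whether it receives raw 1s or partial sums, which is why the partial counts 5, 12, 7, and 6 combine to 30.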

Hadoop Ecosystem

Thank You