big data
DESCRIPTION
Learn about big data and the basics of Hadoop.
TRANSCRIPT
PYOTR SMIRNOV, 1860
©copyright Ankur Raina 2012
IT stands for Industry in Transition
Did you know?
• 3 million lines of code track your checked baggage.
• A billion lines of code go into running the latest Airbus plane.
• There are a billion transistors per person.
• There are 4 billion mobile phone subscribers.
• The St. Anthony Falls Bridge (Minneapolis) is fitted with 200 embedded sensors.
Oops!!!
DATA EXHAUST
• 2001: 8 lakh petabytes (800,000 PB) of data
• 2020: 35 zettabytes of data
[Figure: data exhaust, labelled 7 TB/day and 10 TB/day]
The trouble begins here…
• 80% of the world’s information is unstructured.
• Unstructured information is growing at 15 times the rate of structured information.
Are we prepared?
BIG DATA
Ankur Raina
09-IT-4505
Contents
• What is Big Data?
• The 3Vs
• What is a Big Data platform?
• The needle-in-a-haystack problem
• Big Data & social media
• The call centre mantra
• ABCs of Hadoop
Big Data
Information that cannot be processed or analyzed using traditional processes or tools.
• Instrumentation
• Interconnection
• M2M interconnectivity
• Intelligent machines
3Vs
• Volume
• Variety
• Velocity
Big Data Platform
• Lets you store data in its native business object format and get value out of it through massive parallelism on readily available components.
• It is not a replacement for a data warehouse.
Is it worth it?
This is what I need!!!
IT yearns for log longevity?
Service Oriented Architecture (SOA)
Social Media: From the Business Perspective
We know…
• What are people saying?
But…
• Why are people saying what they are saying, and behaving the way they are behaving?
Twitter: Tweets per Second (Ttps)
• Super Bowl 2011 (4064 Ttps, Feb 2011)
• Bin Laden’s death (5106 Ttps)
• Japan earthquake (6939 Ttps)
• Paraguay’s football penalty shootout win over Brazil in the Copa America quarter-final (peaked at 7166 Ttps)
• The U.S. match win in the FIFA Women’s World Cup on the same day (7196 Ttps)
• Singer Beyonce’s pregnancy announcement (8868 Ttps)
Call Centre mantra: “This call may be recorded for quality assurance purposes”
• In-motion analytics (Streams Computing)
• At-rest analytics (BigInsights)
HADOOP
• Creator: Doug Cutting
• Top-level Apache project
• Inspired by Google’s work on its Google File System (GFS)
• Function-to-data model, not a data-to-function model: the computation is sent to the nodes that hold the data.
Does the word “Hadoop” mean anything?
Hadoop
• HDFS
• MapReduce
• Hadoop Common Components
Hadoop Distributed File System
• Data is broken into blocks and distributed throughout the cluster.
• Data locality
• Mean Time To Failure (MTTF)
• Block size (64 MB default)
• Larger block sizes are available for larger files, to reduce the amount of metadata (BigInsights: 128 MB).
• Redundancy
• NameNode server
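The block arithmetic above can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names block_count and place_replicas are made up for illustration, the replication factor of 3 is an assumption (the slide only says "Redundancy"), and the round-robin placement ignores the rack awareness a real NameNode applies.

```python
MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes):
    """Number of blocks needed for one file; the last block may be partly filled."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def place_replicas(num_blocks, nodes, replication=3):
    """Toy round-robin placement (assumption): each block lands on
    `replication` distinct nodes. Real HDFS placement is more involved."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


one_gb = 1024 * MB
print(block_count(one_gb, 64 * MB))   # blocks at the 64 MB default
print(block_count(one_gb, 128 * MB))  # blocks at the 128 MB BigInsights size
print(place_replicas(2, ["n1", "n2", "n3", "n4"]))
```

Doubling the block size halves the number of block entries tracked per file, which is the metadata saving the slide refers to.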
MapReduce
• The map job takes a set of data and converts it into another set of data, in which individual elements are broken down into tuples (key/value pairs).
• The reduce job takes the output of a map as input and combines those tuples into a smaller set of tuples.
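The map and reduce steps described above can be illustrated with a word count written in plain Python. This is an in-memory sketch of the idea, not the Hadoop API: the names map_phase, shuffle, reduce_phase and word_count are hypothetical, and the grouping step in the middle stands in for work that Hadoop does across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input line into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key, so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce: combine the tuples for one key into a single, smaller tuple."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "big cluster"]))
```

Four mapped tuples go in and three reduced tuples come out, which is the "smaller set of tuples" in the description.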
MapReduce
• Job
• Tasks
• Job Tracker
• Task Tracker agents
• Shuffle
• Combiner
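To make some of these terms concrete, the sketch below models a job as a set of map tasks (one per input split) and shows what a combiner does: it pre-sums each task's output before the shuffle, so fewer tuples have to move between nodes. Everything here is an illustrative assumption in plain Python (the names map_task, combine and run_job are invented), and the Job Tracker and Task Tracker scheduling roles are not modelled at all.

```python
from collections import Counter


def map_task(lines):
    """One map task: emit (word, 1) for every word in its input split."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs):
    """Combiner: sum counts locally, on the map side, before the shuffle."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return sorted(local.items())


def run_job(splits, use_combiner):
    """Run one toy job; return the totals and the number of tuples shuffled."""
    shuffled = []
    for split in splits:  # one map task per split
        output = map_task(split)
        if use_combiner:
            output = combine(output)
        shuffled.extend(output)  # stands in for the network shuffle
    totals = Counter()
    for word, n in shuffled:  # reduce side: sum per key
        totals[word] += n
    return dict(totals), len(shuffled)


splits = [["a a b"], ["a b b"]]
print(run_job(splits, use_combiner=False))
print(run_job(splits, use_combiner=True))
```

Both runs produce the same totals; only the number of tuples crossing the shuffle changes.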
Hadoop Common Components
• Set of libraries that support the various Hadoop subprojects.
• /bin/hdfs dfs <args>

Command         Function
chmod           Changes the read and write permissions of a given file or set of files.
chown           Changes the owner of a given file or set of files.
copyFromLocal   Copies a file from the local file system into HDFS.
copyToLocal     Copies a file from HDFS to the local file system.
cp              Copies HDFS files from one directory to another.
expunge         Empties the trash.
cat             Copies files to standard output.
ls              Displays a listing of the files in a given directory.
mkdir           Creates a directory in HDFS.
mv              Moves files from one directory to another.
rm              Deletes a file and sends it to the trash (use the -skipTrash option to delete it permanently).
References
• www.ibm.com
• www.hadoop.apache.org
• Understanding Big Data, by Chris, Dirk, Tom, George & Paul (McGraw Hill)
• Oracle Magazine