Introduction to Big Data
Intro to Big Data On Premise
Presented by: Jon Bloom
Senior Consultant, Agile Bay, Inc.
Jon Bloom
Blog: http://www.bloomconsultingbi.com
Twitter: @sqljon
Linked-in: http://www.linkedin.com/in/BloomConsultingBI
Email: [email protected]
Customers & Partners
www.agilebay.com
Session Agenda
What is Big Data?
What is Hadoop?
BI vs. Hadoop
Demo
Terms and Acronyms
Hadoop: An Apache open source project to develop software for reliable, scalable, distributed computing.
Cluster: A group of computers (nodes) linked together to perform highly available, compute-intensive work.
HDFS: A distributed file system that provides high-throughput access to application data.
YARN: A framework for job scheduling and cluster resource management.
MapReduce: A system for parallel processing of large data sets.
What is Big Data?
Volume, Velocity, Variety
What is Hadoop?
Apache open source project
Batch oriented
Parallel processing across commodity servers
Ecosystem
• Ambari
• HBase
• Avro
• Cassandra
• Chukwa
• Hive
• Mahout
• Pig
• ZooKeeper
Distributed Computing & MapReduce
Mapper
Reducer
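The mapper and reducer roles can be sketched with a word count, the usual first MapReduce example. This is a minimal single-process illustration in plain Python, not Hadoop's actual API: the function names (`mapper`, `shuffle`, `reducer`, `word_count`) and the sample input are assumptions made for this sketch. On a real cluster, the map and reduce steps would run in parallel on many nodes and the framework would handle the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one line of a file.
    sample = ["big data big clusters", "data moves to clusters"]
    print(word_count(sample))
```

Each mapper call only sees one line and each reducer call only sees one word, which is what lets a framework spread the work across commodity servers.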
BI vs. Hadoop
Hadoop is not a replacement for BI
Extends BI capabilities
BI = Scale up to 100s of gigabytes
Hadoop = From 100s of gigabytes to terabytes (1,000s of gigabytes) and petabytes (1,000,000s of gigabytes)
Demo
Thank you for attending!
Q & A
Blog: www.bloomconsultingbi.com
Twitter: @sqljon
Linked-in: http://www.linkedin.com/in/BloomConsultingBI
Email: [email protected]