Spark Will Replace Hadoop! Know Why
TRANSCRIPT
![Page 1: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/1.jpg)
http://www.edureka.co/apache-spark-scala-training
Spark Will Replace Hadoop! Know Why?
![Page 2: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/2.jpg)
Agenda
At the end of the session, you will be able to:
Understand why to learn Spark
Know the advantages of Spark and the results of the 2015 Spark survey
Discover the Spark career path
Understand how companies are using Spark
![Page 3: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/3.jpg)
Why Spark?
![Page 4: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/4.jpg)
Rise of Big Data
IDC (International Data Corporation) predicts that by 2020 the world’s data will have reached 40,000 exabytes (EB), or 40 zettabytes (ZB).
The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on Earth.
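The two figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes decimal (SI) units, and the names `implied_population` and `growth_factor` are made up for this example.

```python
# Assumption: decimal (SI) units, so 1 GB = 10**9 bytes and 1 ZB = 10**21 bytes.
GB = 10**9
ZB = 10**21

total_bytes_2020 = 40 * ZB      # total data volume quoted on the slide
per_person_bytes = 5_200 * GB   # per-person figure quoted on the slide

# Population implied by dividing the total by the per-person share.
implied_population = total_bytes_2020 / per_person_bytes


def growth_factor(years: float) -> float:
    """Growth multiple if the volume doubles every two years."""
    return 2 ** (years / 2)


print(f"Implied population: {implied_population / 1e9:.2f} billion")
print(f"Growth over 10 years: {growth_factor(10):.0f}x")
```

Under these assumptions the two slide figures imply a population of roughly 7.7 billion, and doubling every two years compounds to a 32x increase over a decade.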
[Chart: structured vs. unstructured data by year, 2005 to 2015; y-axis 0 to 7000, units not given]
![Page 5: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/5.jpg)
Application of Big Data
Source: Twitter
![Page 6: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/6.jpg)
Application of Big Data
![Page 7: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/7.jpg)
Hadoop is not Enough!
Limitations:
Real-time Processing: Hadoop MapReduce is limited to batch processing. Real-time processing was a big “No” in Hadoop.
Not Fast Enough: Hadoop MapReduce is fast, but not fast enough.
Conclusion:
Fast, real-time processing is essential and can be achieved using Spark!
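To make the batch versus real-time distinction concrete, here is a minimal pure-Python sketch, not Spark code. It assumes a micro-batch model, which is how Spark Streaming approaches near-real-time processing; real Spark Streaming groups events by time interval, whereas this sketch groups by count to stay deterministic. All function names and event values are hypothetical.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group an incoming stream into batches of at most batch_size events."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def batch_total(events: Iterable[int]) -> int:
    """Batch style: one result, available only after all input has been read."""
    return sum(events)


def running_totals(events: Iterable[int], batch_size: int) -> List[int]:
    """Streaming style: an updated total after every micro-batch."""
    totals: List[int] = []
    total = 0
    for batch in micro_batches(events, batch_size):
        total += sum(batch)
        totals.append(total)
    return totals


events = [3, 1, 4, 1, 5, 9, 2]  # hypothetical event values
print(batch_total(events))        # 25
print(running_totals(events, 3))  # [8, 23, 25]
```

The batch job yields a single answer at the end, while the micro-batch version produces intermediate answers as data arrives, which is the property the slide calls real-time processing.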
![Page 8: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/8.jpg)
Spark Survey and its Advantages
![Page 9: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/9.jpg)
Spark Survey 2015!
Source: Typesafe
![Page 10: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/10.jpg)
Advantages of Spark
Ease of Use
Generality
Runs Everywhere
Up to 100x faster than MapReduce (MR) for in-memory workloads
![Page 11: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/11.jpg)
Feature Comparison

| Hadoop MapReduce | Spark |
| --- | --- |
| Fast | Up to 100x faster than MapReduce |
| Batch processing | Batch and real-time processing |
| Stores data on disk | Stores data in memory |
| Open source | Open source |
| Written in Java | Written in Scala |

Source: Databricks
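The disk versus memory row is easiest to see with an iterative job that makes several passes over the same data. The following is a pure-Python sketch under stated assumptions, not Spark or Hadoop code: `DiskDataset`, `CachedDataset`, and `iterate` are hypothetical names, and the file is a small temporary file created just for the illustration.

```python
import os
import tempfile


def write_numbers(path: str, numbers: list) -> None:
    """Write one integer per line to a text file."""
    with open(path, "w") as f:
        f.write("\n".join(str(n) for n in numbers))


class DiskDataset:
    """Re-reads the file on every pass and counts how often it does so."""

    def __init__(self, path: str):
        self.path = path
        self.disk_reads = 0

    def load(self) -> list:
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]


class CachedDataset(DiskDataset):
    """Reads the file once, then serves later passes from memory."""

    def __init__(self, path: str):
        super().__init__(path)
        self._cache = None

    def load(self) -> list:
        if self._cache is None:
            self._cache = super().load()
        return self._cache


def iterate(dataset: DiskDataset, passes: int) -> int:
    """Stand-in for an iterative job: sum the data once per pass."""
    return sum(sum(dataset.load()) for _ in range(passes))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    write_numbers(path, [1, 2, 3, 4])
    disk, cached = DiskDataset(path), CachedDataset(path)
    disk_result = iterate(disk, passes=5)
    cached_result = iterate(cached, passes=5)

print(disk_result, disk.disk_reads)      # 50 5
print(cached_result, cached.disk_reads)  # 50 1
```

Both versions compute the same result, but the cached one touches the disk once instead of once per pass. That saving is the intuition behind the in-memory column; how large the speed-up is in practice depends on the workload.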
![Page 12: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/12.jpg)
Spark Features/Modules in Demand
Source: Typesafe
![Page 13: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/13.jpg)
New Features in 2015
DataFrames
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR
• Released in Spark 1.4
• Exposes DataFrames, RDDs and the ML library in R
Machine Learning Pipelines
• High-level API
• Featurization
• Evaluation
• Model tuning
External Data Sources
• Platform API to plug data sources into Spark
• Pushes logic into sources
Source: Databricks
![Page 14: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/14.jpg)
Spark Career Path
![Page 15: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/15.jpg)
Job Roles & Industry Focus
Source: Typesafe
![Page 16: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/16.jpg)
Salary Trends
![Page 17: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/17.jpg)
Major Companies Using Hadoop
![Page 18: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/18.jpg)
Industry Adoption
Source: Typesafe
![Page 19: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/19.jpg)
How Are Companies Using Spark?
![Page 20: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/20.jpg)
General Business Goals
Source: Typesafe
![Page 21: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/21.jpg)
Demo
![Page 22: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/22.jpg)
The Big Question!
Is Spark going to replace Hadoop?
![Page 23: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/23.jpg)
The Big Question!
Is Spark going to replace Hadoop?
Answer: Yes, Spark will be used on top of Hadoop and will replace MapReduce.
Reasons:
1. Hadoop MapReduce cannot handle real-time processing
2. Hadoop MapReduce is slower than Spark
3. With the rise of IoT (Internet of Things), Spark is a must
![Page 24: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/24.jpg)
Questions
![Page 25: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/25.jpg)
Survey
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better!
Please spare a few minutes to take the survey after the webinar.
![Page 26: Spark Will Replace Hadoop ! Know Why](https://reader030.vdocument.in/reader030/viewer/2022032504/55c3c33dbb61ebd2048b47fd/html5/thumbnails/26.jpg)