Overview of big data zoo
DESCRIPTION
Explains different open source big data tools and where they fit.
TRANSCRIPT
![Page 1: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/1.jpg)
Data Analysis as a Service
Iou Fag(halv)dag, 2014
Gurvinder Singh, Uninett
![Page 2: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/2.jpg)
Data is the King
![Page 3: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/3.jpg)
Big-Data is ...... ?
![Page 4: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/4.jpg)
Big-Data is relative
![Page 5: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/5.jpg)
What the hype is ..
Cheap commodity hardware with amazing computing and storage capacity
... but this time software is also catching up with hardware
![Page 6: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/6.jpg)
Hype ingredient list is ..
Cheap commodity hardware
Good network capacity
Software based on the principle of "Divide and Conquer"
..thus scale out horizontally
![Page 7: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/7.jpg)
Storage
![Page 8: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/8.jpg)
Unstructured Storage
Store data reliably, cheaply and scalably
Hadoop Distributed File System (HDFS)
Divide data into smaller chunks
Heterogeneous storage medium support
Similar DFS e.g. Lustre, IBM GPFS, Ceph, MooseFS
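The "divide data into smaller chunks" idea can be illustrated with a toy sketch. This is not HDFS code: the function names, the tiny block size, the node names and the round-robin placement are all assumptions made for illustration. Real HDFS uses far larger blocks and a rack-aware placement policy.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign every block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


# Hypothetical input: 100 bytes of data, 32-byte blocks, four storage nodes.
data = b"0123456789" * 10
blocks = split_into_blocks(data, block_size=32)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"], replication=3)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

Because every block lives on several nodes, losing one cheap machine does not lose data, which is what makes commodity hardware usable for reliable storage.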
![Page 9: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/9.jpg)
Structured Storage
Store structured data reliably, scalably and indexed
NoSQL databases to store structured data
HBase and Accumulo store their underlying data in HDFS
Many more in the big data zoo: Cassandra, VoltDB, NuoDB...
BlinkDB offers a tradeoff between accuracy and response time
Full-text search is offered by Elasticsearch and Solr
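The accuracy versus response time tradeoff can be sketched by answering an aggregate query from a random sample instead of the full data set. The code below is only an illustration of that sampling idea, not BlinkDB's API; the function name, the synthetic data and the sample sizes are assumptions.

```python
import math
import random
import statistics
from typing import List, Tuple


def approximate_mean(values: List[int], sample_size: int, rng: random.Random) -> Tuple[float, float]:
    """Estimate the mean from a random sample and report its standard error."""
    sample = rng.sample(values, sample_size)
    estimate = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, std_error


# Synthetic "table column": 100,000 values between 0 and 999.
population = [i % 1000 for i in range(100_000)]
true_mean = statistics.mean(population)

rng = random.Random(42)  # fixed seed so the run is repeatable
est_small, err_small = approximate_mean(population, 100, rng)
est_large, err_large = approximate_mean(population, 10_000, rng)

print(f"exact mean          : {true_mean:.1f}")
print(f"sample of 100       : {est_small:.1f} +/- {err_small:.1f}")
print(f"sample of 10,000    : {est_large:.1f} +/- {err_large:.1f}")
```

A smaller sample touches less data and so answers faster, at the price of a wider error bar; a larger sample narrows the error bar but costs more time.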
![Page 10: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/10.jpg)
Processing
MapReduce methodology to process data in a distributed fashion
Data locality with Hadoop MapReduce and HDFS
Spark supports MapReduce and utilizes the cluster's RAM
Support machine learning algorithms
Support Python, Scala, Java
Support R, framework for data scientists
Hive, Shark, Pig to process structured data in a distributed way
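The MapReduce methodology mentioned on this slide can be sketched in a single process as a word count. This is a minimal illustration that assumes nothing about Hadoop's or Spark's APIs; the function names are hypothetical. In a real cluster the map calls run on the nodes that hold the data, and the shuffle moves the grouped keys across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce one after another on the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data zoo", "Big data is relative"])
print(counts)
```

Each map call only needs its own line and each reduce call only needs one key, so both phases can be spread over many machines, which is the "divide and conquer" principle applied to processing.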
![Page 11: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/11.jpg)
Some performance numbers to guide..
L1 cache reference 0.5 ns
L2 cache reference 7 ns
RAM reference 100 ns (Queen)
Flash IO card reference 75,000 ns (Princess)
RTT within same datacenter 500,000 ns
Disk reference 10,000,000 ns
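To put these numbers in proportion, the short sketch below expresses every latency from the slide as a multiple of one RAM reference. The figures are copied from the slide; the dictionary and function names are made up for this example.

```python
from typing import Dict

# Latencies in nanoseconds, as listed on the slide.
LATENCY_NS: Dict[str, float] = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "RAM reference": 100,
    "Flash IO card reference": 75_000,
    "RTT within same datacenter": 500_000,
    "Disk reference": 10_000_000,
}


def relative_to(base: str, table: Dict[str, float]) -> Dict[str, float]:
    """Express every latency as a multiple of the chosen base latency."""
    return {name: ns / table[base] for name, ns in table.items()}


ratios = relative_to("RAM reference", LATENCY_NS)
for name, ratio in ratios.items():
    print(f"{name:<28} {ratio:>12,.3f} x RAM")
```

By these figures a single disk reference costs as much as 100,000 RAM references, and a round trip inside the datacenter is 20 times cheaper than a disk reference.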
![Page 12: Overview of Big data zoo](https://reader035.vdocument.in/reader035/viewer/2022081907/54c6ef104a795909498b4569/html5/thumbnails/12.jpg)
THE END
By Gurvinder Singh