Big Data Analytics - Introduction
Post on 16-Apr-2017
Big Data Analytics
What Is Big Data Analytics?
Big Data:
Buzzword
Two definitions:
Data sets too large for modern relational databases
Semi-structured/unstructured data sets
Analytics:
The science of measuring and discovering patterns and trends with data
Source: http://www.socialtalent.co/blog/big-data-whats-the-big-deal
Data, Data, Everywhere...
In 2004:
Internet traffic: 1 Exabyte (that's 134,217,728 8 GB flash drives)
A lot of other media:
Newspapers/books/magazines
DVDs
Data, Data, Everywhere...
Today:
Internet traffic: 1.3 Zettabytes (that's roughly 178.7 billion 8 GB sticks), or 110.3 exabytes per month
Even more media:
Mobile devices (phones/tablets/mp3 players/etc.)
The Internet of Things
Streaming Media
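The flash-drive comparisons on these two slides can be checked with a few lines of arithmetic. This is a minimal sketch that assumes binary units (1 GB = 2^30 bytes, 1 exabyte = 2^60 bytes), which is what the 134,217,728 figure implies; the function name `drives_needed` is made up for illustration.

```python
# Assumption: binary units, as implied by the slide's 134,217,728 figure.
GIGABYTE = 2**30
EXABYTE = 2**60
ZETTABYTE = 2**70


def drives_needed(total_bytes: int, drive_gb: int = 8) -> int:
    """Return how many whole drives of `drive_gb` GB fit into `total_bytes`."""
    return total_bytes // (drive_gb * GIGABYTE)


traffic_2004 = EXABYTE
traffic_today = 13 * ZETTABYTE // 10  # 1.3 zettabytes, using integer math

print(f"{drives_needed(traffic_2004):,}")   # 134,217,728
print(f"{drives_needed(traffic_today):,}")  # about 178.67 billion
print(traffic_today // traffic_2004)        # growth factor of roughly 1,331
```

Integer arithmetic is used on purpose: multiplying by the float 1.3 would introduce rounding noise at this magnitude.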
The Internet of Things
How many of you have...
Fitness trackers?
E-readers?
iPods?
Tie them to social sites (e.g., Facebook)?
The Internet of Things
You're being tracked!
So what?
Marketing
Medical
Government
Building a fuller picture of what's tracked.
Social Network Integration
Six Degrees of Separation
Source: http://www.83toinfinity.com
Source: http://www.math.cornell.edu/~numb3rs/blanco/social_net.jpg
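The "six degrees" idea is a shortest-path question on a friendship graph. A minimal sketch, assuming the network fits in memory as a dictionary of friend lists; the names and the helper `degrees_of_separation` are hypothetical, and breadth-first search is one common way to compute the answer rather than anything the slides prescribe.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Breadth-first search: fewest friendship hops from start to target."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # the two people are not connected


# Hypothetical toy network (undirected, so each edge is listed both ways).
friends = {
    "Ann": ["Bob", "Cy"],
    "Bob": ["Ann", "Dee"],
    "Cy": ["Ann"],
    "Dee": ["Bob", "Eve"],
    "Eve": ["Dee"],
}

print(degrees_of_separation(friends, "Ann", "Eve"))  # 3
```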
Data Storage
Relational Databases:
Structured data
Can scale to huge volumes of data
Hadoop:
Semi-structured/unstructured data
Massively parallel storage and processing
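To give a feel for "massively parallel processing", here is a toy, single-process imitation of the map / shuffle / reduce pattern that Hadoop popularized. It does not use any Hadoop API; the function names and the sample text are made up, and on a real cluster each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a file block stored on a separate node.
chunks = ["big data big deal", "data everywhere"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each chunk is mapped independently, adding more nodes lets more chunks be processed at the same time.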
Relational Database
Source: http://www.ntu.edu.sg/home/ehchua/programming/sql/images/ManyToOne.png
Unstructured Data
Source: http://storagegaga.com/2011/12/
Semi-structured
Source: http://www.stylusstudio.com/images/figures/sql_xml_xml_fragment.gif
What Solution to Pick?
Data Volume and Speed:
Relational Databases Will Cap Out
Big Data Stores Scale (For Now):
Hadoop
Spark
Lucene
Alternative Modeling Techniques:
Hyper Normalized (6-8NF):
Inmon's Textual Disambiguation
Anchor Modeling
Data Vault
Hadoop
Version 1:
Giant data store
File distribution
File parsing tools
Generic security
Version 2:
Giant data store
Replaced foundation work
Unified security: LDAP/Kerberos support
Tools
Oozie
Hive
NoSQL Databases:
HBase
MongoDB
JSON
{"employees": [
  { "firstName":"John" , "lastName":"Doe" },
  { "firstName":"Anna" , "lastName":"Smith" },
  { "firstName":"Peter" , "lastName":"Jones" }
]}
Source: http://www.w3schools.com/json/json_syntax.asp
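A short sketch of how a document like this can be read from code, using Python's standard `json` module. The variable names are illustrative; the data is the employee record shown on the slide.

```python
import json

# The same semi-structured document as on the slide, held as a string.
raw = """
{"employees": [
  {"firstName": "John", "lastName": "Doe"},
  {"firstName": "Anna", "lastName": "Smith"},
  {"firstName": "Peter", "lastName": "Jones"}
]}
"""

doc = json.loads(raw)  # parse the text into dicts and lists

# No schema is declared up front; fields are looked up by name at read time.
full_names = [f'{e["firstName"]} {e["lastName"]}' for e in doc["employees"]]
print(full_names)  # ['John Doe', 'Anna Smith', 'Peter Jones']
```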
How to Analyze?
Performance
Timeliness
Accuracy
Feedback
Big Data Solutions
Search the entire data set
Great performance
Highly accurate
Integrates into analytics tools:
Only some of the tools are able to support Hadoop, etc.
Statistics
Designed for all sizes of data sets
Decreases time to results
As accurate as needed
Fully supported by analytics tools
Supported by most Big Data tools
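The trade-off between scanning the entire data set and using statistics can be sketched by comparing a full-scan mean with an estimate from a random sample. Everything here is an assumption for illustration: the data is synthetic, and the population and sample sizes are arbitrary.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic "big" data set: 100,000 made-up measurements.
population = [random.gauss(100, 15) for _ in range(100_000)]

# Big Data approach: touch every record.
full_mean = statistics.fmean(population)

# Statistical approach: look at a 1% random sample instead.
sample = random.sample(population, 1_000)
sample_mean = statistics.fmean(sample)

# The standard error indicates how far the estimate is likely to be off.
std_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"full scan mean: {full_mean:.2f}")
print(f"sample mean:    {sample_mean:.2f} (+/- {std_error:.2f})")
```

Growing the sample shrinks the standard error, which is one way to read "as accurate as needed": the sample size is picked to match the accuracy the question requires.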
Analytics Tools
Can access data of most sizes:
Most can handle Hadoop and some NoSQL databases
Built for Predictive Modeling
Starting to handle social/network modeling
How to Get Started
Grab some tools!
RapidMiner (http://rapidminer.com/)
R (http://www.r-project.org/)
Weka (http://www.cs.waikato.ac.nz/ml/weka/)
Grab some data!
http://www.kdnuggets.com/datasets/index.html
http://aws.amazon.com/publicdatasets/
http://www.reddit.com/r/datasets
Prizes/Challenges
Kaggle - https://www.kaggle.com/
MIT - http://bigdata.csail.mit.edu/challenge
Heritage Health Prize - http://www.heritagehealthprize.com/c/hhp
Twitter - @OpenDataAlex
LinkedIn - alexmeadows
Github - dbaAlex
Questions? Comments?