Big Data Technologies
Introduction
• Big data is data that is hard to capture, store, and analyze with commonly used software tools because of its very large size.
• “World’s nervous system—a real-time feedback loop which didn’t exist before” - Yahoo CEO Marissa Mayer
January 25, 2013, www.societyconsulting.com
Why should you care?
• Mobile devices, smart energy meters, remote sensing, wireless sensors, software machine logs, cameras, RFID readers, etc. are creating massive amounts of data that businesses and governments now have the opportunity to analyze and act upon.
• Every day, approximately 2.5 quintillion (2.5×10^18) bytes of data are created.
• The business and economic possibilities of big data, and its wider implications, are important issues that business leaders and policy makers will tackle in the years ahead.
Industry verticals using Big Data
Any industry vertical which accumulates a sufficient quantity of data can leverage big data technologies. Here are some of the verticals:
• Digital Media & E-Commerce: Real-time ad targeting, Web analytics & trends
• Energy and Utilities: Smart meter analytics, Asset management
• Financial Services: Risk and fraud management, Portfolio management, Customer analytics
• Government: Threat management, Law enforcement (Real-time multimodal surveillance, Cyber security detection), Macroeconomic analytics
• Healthcare and Life Sciences: New drug development, Medical record text analytics, Genomic analytics
• Retail: CRM, Targeted marketing analysis, Vendor delivery & Supply chain optimization, Market basket analysis, Click-stream analysis
• Telecommunications: CRM, Call detail record analysis, Least cost routing, Fraud management
• Transportation: Logistics optimization, Traffic congestion
Big Data landscape/technologies
Source:
http://www.forbes.com/sites/oracle/2012/12/13/billions-of-reasons-to-get-ready-for-big-data/
http://www.rosebt.com/1/post/2012/6/big-data-vendor-landscape.html
http://www.dataart.com/software-outsourcing/big-data
http://www.capgemini.com/technology-blog/2012/09/big-data-vendors-technologies/
Big Data Process/Steps
At a basic level, data processing can be broken into three stages: data (raw indicators), information (the meaningful interpretation of those signals), and insight (an actionable piece of knowledge).
Why Web Analytics quickly leads to Big Data Science
• Consider 10 million page views a day on a popular web site
• Capture the user ID for every page view and store it as a 32-bit integer
• 10 million x 4 bytes = 40 MB of storage/day
• 40 MB x 30 days = 1,200 MB, or about 1.2 GB/month
• Data grows quickly, and so do the challenges around storage, processing, and analytics.
Figure: 10^7 elements from the domain of 32-bit integers, 40 MB/day
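The storage estimate above can be checked with a few lines of Python. This is a minimal sketch: the function name and defaults are illustrative, and it assumes one 4-byte (32-bit) user ID per page view and nothing else stored. Note that 1,200 MB is 1.2 GB in decimal units, or roughly 1.12 GiB in binary units.

```python
BYTES_PER_ID = 4  # one 32-bit integer per page view (assumption from the slide)


def storage_bytes(page_views_per_day, days=1, bytes_per_id=BYTES_PER_ID):
    """Raw bytes needed to store one user ID per page view."""
    return page_views_per_day * bytes_per_id * days


daily = storage_bytes(10_000_000)
monthly = storage_bytes(10_000_000, days=30)

# Decimal units (1 MB = 10**6 bytes) next to binary units (1 GiB = 2**30 bytes)
print(f"per day:   {daily / 10**6:.0f} MB")
print(f"per month: {monthly / 10**9:.2f} GB ({monthly / 2**30:.2f} GiB)")
```

Adding more fields per page view (timestamp, URL, referrer) multiplies these figures accordingly.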
Dealing with large datasets
New algorithmic techniques in traditional computing
• Probabilistic data structures
• Cardinality estimation, frequency estimation, range query, membership query, etc.
Distributed computing / divide and conquer
• Break processing units into equal parts, get individual results, and aggregate
• Distributed systems are complex to build and maintain
• Depended on academia & research labs for renting compute
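The divide-and-conquer steps (break the work into equal parts, get individual results, aggregate) can be sketched in Python as follows. This is an illustration only, not a distributed system: all function names are hypothetical, and local threads stand in for separate compute nodes. The example counts page views per user ID.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Break the input into n_chunks parts of (nearly) equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_views(chunk):
    """Individual result: page views per user ID within one chunk."""
    counts = {}
    for user_id in chunk:
        counts[user_id] = counts.get(user_id, 0) + 1
    return counts


def aggregate(partials):
    """Combine the per-chunk counts into one overall result."""
    total = {}
    for partial in partials:
        for user_id, n in partial.items():
            total[user_id] = total.get(user_id, 0) + n
    return total


def views_per_user(user_ids, n_workers=4):
    chunks = split_into_chunks(user_ids, n_workers)
    # Threads stand in for worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_views, chunks))
    return aggregate(partials)


page_views = [7, 3, 7, 1, 3, 7, 9, 1]
print(views_per_user(page_views, n_workers=3))
```

The sketch works because counting is associative: partial counts can be merged in any order. It leaves out the hard parts of a real distributed system, such as moving data to the workers and recovering from a worker that fails.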
New approach is needed
Traditional distributed system challenges
• Data exchange requires synchronization
• Temporal dependencies are complicated
• It is difficult to deal with partial failures of the system
• Data is mostly copied to the compute nodes at compute time
• Developers spend more time designing for failure than working on the problem itself
• Transferring data to compute nodes becomes a bottleneck
• Typical disk data transfer rate: 75 MB/sec. Time taken to transfer 100 GB of data to the processor: approx. 22 mins.
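The transfer-time figure can be reproduced with simple arithmetic. A minimal sketch, assuming a single disk read sequentially at a constant 75 MB/sec with no other overhead; the function name and the `mb_per_gb` parameter are illustrative.

```python
def transfer_minutes(data_gb, rate_mb_per_s, mb_per_gb=1024):
    """Minutes needed to move data_gb gigabytes at a constant rate."""
    seconds = data_gb * mb_per_gb / rate_mb_per_s
    return seconds / 60


# 100 GB at 75 MB/sec, using binary (1024) and decimal (1000) MB per GB
print(f"{transfer_minutes(100, 75):.1f} min")
print(f"{transfer_minutes(100, 75, mb_per_gb=1000):.1f} min")
```

Either convention gives a little over 22 minutes, which is time spent before any computation starts.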