big data

Post on 13-Nov-2014

Category:

Technology

DESCRIPTION

It's about a book I read recently http://www.amazon.com/Big-Data-Revolution-Transform-Think/dp/0544002695

TRANSCRIPT

Big Data

me@zynick.com

26th Dec 2013

Google Flu Trends Prediction (2008)

● Epidemiologists use early detection of disease outbreaks to reduce the number of people affected
● The CDC (Centers for Disease Control and Prevention) collects influenza-like illness (ILI) data from its surveillance network and publishes it weekly

Hurricane in 2004

Result: 7 times their normal sales rate!

Grammar Checking (Machine Learning) Algorithms

● Improve the algorithm, or pump in more data?
● Test
  ○ Data sets of 1 million, 10 million, 100 million, and 1 billion data points
● Result
  ○ The worst algorithm performed better once it had a billion data points
    ■ Accuracy rate rose from 75% to 95%
  ○ The best algorithm performed worst once it had a billion data points
    ■ Accuracy rate rose from 85% to 94%

Farecast.com (2006)

● Flight price prediction
  ○ The model had no understanding of why, only what
● Accuracy of 74.5%
● Average saving of $50 per ticket
● $10 million in potential customer savings
● Acquired by Microsoft
  ○ Bing.com/travel

http://www.prnewswire.com/news-releases/farecast-launches-new-tools-to-help-savvy-travelers-catch-elusive-airfare-price-drops-this-summer-58165652.html

Decide.com (2011)

● Analyzed 4 million products using 25 billion price observations
  ○ Identified patterns that people had never been able to ‘see’ before, e.g. prices might temporarily increase for older models once new ones are introduced
● Price predictions 77% accurate
● Average savings of $87 per product
● Total savings of $72 million+
● Acquired by eBay

[1] http://techcrunch.com/2012/05/03/decide-com-brings-its-price-comparisons-to-ipad-reveals-plans-to-expand-to-household-goods-cars/
[2] http://newbooksinbrief.com/2013/03/21/31-a-summary-of-big-data-a-revolution-that-will-transform-how-we-live-work-and-think-by-viktor-mayer-schonberger-and-kenneth-cukier/

UPS

● Uses geolocation data in multiple ways
  ○ Sensors, wireless modules, GPS
  ○ Predict engine trouble
  ○ Know each truck's whereabouts (in case of delays)
● Monitor employees
● Scrutinize itineraries to optimize routes
● Result (2011):
  ○ 30 million miles and 3 million gallons of fuel saved
● Safety and efficiency: fewer turns, which tend to lead to accidents, waste time, and consume more fuel when trucks are stuck in traffic

Pregnancy Prediction

● Shopping behavior is about to change: openness to new brands and new loyalties
● Signals: baby gift registry, lotions (around the 3rd month), supplements (magnesium, calcium, zinc, etc.)
● Pregnancy prediction score
● Coupons are sent

* http://icebreakerconsulting.com/target-predicts-pregnancy-with-big-data
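
One way to picture the prediction score above is as a weighted sum of purchase signals. This is a minimal sketch only: the signal names echo the slide, but the weights, the threshold, and the function names are hypothetical and are not the retailer's actual model.

```python
# Hypothetical integer weights per purchase signal. Illustrative values only,
# not taken from the book or from any real retailer model.
SIGNAL_WEIGHTS = {
    "baby_gift_registry": 5,
    "lotion": 2,
    "supplements": 2,
}


def pregnancy_score(purchases):
    """Sum the weights of the distinct known signals among the purchases."""
    # set() ensures a repeated purchase of the same item counts only once.
    return sum(SIGNAL_WEIGHTS.get(item, 0) for item in set(purchases))


def should_send_coupon(purchases, threshold=6):
    """Return True when the score reaches the (assumed) coupon threshold."""
    return pregnancy_score(purchases) >= threshold
```

With these assumed weights, lotion plus supplements alone stays below the threshold, while a registry entry plus lotion crosses it.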

Geo Local Data

● Targeted advertising based on where a person is, or where they are headed
● Aggregated to reveal trends
● Detect traffic jams without seeing the cars, from the number and speed of smartphones traveling on a highway
● Estimate how many protesters turn out at a demonstration
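
The traffic-jam idea can be sketched as a check on the speeds reported by phones on one stretch of road. This is an illustration under stated assumptions: the function name, the free-flow speed, the minimum number of phones, and the 30% ratio are all hypothetical choices, not values from the book.

```python
from statistics import mean


def is_jammed(phone_speeds_kmh, free_flow_kmh=100.0, min_phones=5, ratio=0.3):
    """Flag a road segment as jammed when enough phones are moving slowly.

    phone_speeds_kmh: speeds of individual smartphones on the segment.
    free_flow_kmh, min_phones, ratio: assumed tuning parameters.
    """
    # Too few phones means too little evidence either way.
    if len(phone_speeds_kmh) < min_phones:
        return False
    # Jammed when the average speed is well below the free-flow speed.
    return mean(phone_speeds_kmh) < ratio * free_flow_kmh
```

No car is observed directly here: only the count and speed of phones feed the decision, which is the point of the slide.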

Data Reuse (Secondary Usage)

● Google Street View
  ○ Primary usage: Street View
  ○ Secondary usage: collecting geolocation data and open Wi-Fi connections to improve GPS location
● Amazon
  ○ Primary usage: sales
  ○ Secondary usage: book recommendations

Values of Big Data

● Data can be gathered easily and cheaply
● What > Why (correlation vs causation)
● Traditional sampling (n) vs big data (n = all)
● Quantification > Qualification

Values of Big Data

● Data driven
  ○ Less bias
  ○ More accurate
  ○ Faster results
● Pattern prediction
  ○ Saves lives
  ○ Predict problems and correct them before users realize something is wrong

Big Data: 3 Major Shifts

● Ability to analyze vast amounts of data about a topic rather than settle for a smaller set
● Willingness to embrace the messiness of data rather than privilege exactitude
● Growing respect for correlation rather than a continuing quest for causality

Correlation vs Causation

● Cause → Effect
● Correlation → Effect
  ○ Correlation → Cause? Optional
● Chris Anderson
  ○ Big data makes the scientific method obsolete
  ○ “With enough data, the numbers speak for themselves”

* http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
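
To make "correlation" concrete, here is a minimal sketch of the Pearson correlation coefficient, written by hand so the arithmetic is visible. The function name and the sample numbers are illustrative only. A value near +1 or -1 says two series move together; it says nothing about which one, if either, causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up numbers: the second series is exactly double the first.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
```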

Is Correlation Good Enough?

It depends.

“For many everyday needs, knowing what not why is good enough.” The book is full of such examples, from making better diagnostic decisions when caring for premature babies to which flavor Pop-Tarts to stock at the front of the Walmart store before a hurricane. Big data can help answer these questions, but they never required “knowing why.” Big data analysis can be about correlations OR causation: it all depends, as it has always been, on what question we are asking, what problem we are solving, and what goal we are trying to achieve.

Is Correlation Good Enough?

“If millions of electronic medical records reveal that cancer sufferers who take a certain combination of aspirin and orange juice see their disease go into remission, then the exact cause for the improvement in health may be less important than the fact that they lived. Likewise, if we can save money by knowing the best time to buy a plane ticket without understanding the method behind airfare madness, that’s good enough.”

Risk (The Dark Side of Big Data)

● Privacy invasion
  ○ Viewing data at a lower level
  ○ NSA, GCHQ
  ○ Dangerous when it falls into the wrong hands
● Minority Report (2002)
  ○ “If we hold people responsible for predicted future acts, ones they may never commit, we also deny that humans have a capacity for moral choice.”

Embracing Big Data

● Data
● Skills
● Ideas (Big Data Mindset)

Things to Be Aware Of

● Data validity
  ○ Books you read 10 years ago may no longer be relevant for Amazon recommendations
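
One common way to handle aging data is to weight each record by its age, so old purchases count for less. The sketch below uses exponential decay with a half-life. Both the function name and the 2-year half-life are assumptions made for illustration, not something the book or Amazon specifies.

```python
def recency_weight(age_years, half_life_years=2.0):
    """Weight in (0, 1] that halves every `half_life_years` (assumed value)."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    return 0.5 ** (age_years / half_life_years)


# A purchase from today keeps full weight; one from 10 years ago keeps
# only a small fraction of it under the assumed 2-year half-life.
today = recency_weight(0)
decade_old = recency_weight(10)
```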

Questions?

Read the book.

End.

me@zynick.com
