tools and techniques adopted for big data analytics

TAMING BIG DATA

Upload: joseph-francis

Post on 14-Jul-2015


Category:

Data & Analytics



TRANSCRIPT

TAMING BIG DATA

Tools and techniques adopted for Big Data Analytics

JOSEPH FRANCIS

1BI11IM020

WHAT IS BIG DATA?

WHY ANALYSIS ON BIG DATA IS CRUCIAL FOR VALUE BASED SERVICES AND PRODUCTS?

BIG DATA CHARACTERISTICS

A BRIEF HISTORY ON ORIGINS OF BIG DATA

PHASES IN BIG DATA ANALYSIS

CHALLENGES IN BIG DATA ANALYSIS

TOOLS AND TECHNIQUES FOR DATA ANALYTICS

CASE STUDIES

CONCLUSION

WHAT IS BIG DATA?

Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.

WHY ANALYSIS ON BIG DATA IS CRUCIAL FOR VALUE BASED SERVICES AND PRODUCTS?

BUSINESS INTELLIGENCE

DECISION SUPPORT

PREDICTIVE ANALYTICS

GOVERNMENTS

HEALTHCARE

RESEARCH

MARKETING STRATEGIES

A brief history on origins of big data

1880 - The Start of Information Overload

8 years to complete the US census

1932 - The Population Boom

1956 - Virtual Memory

1966 - Centralized Computing Systems Enter the Scene

1970 - Relational Database

1985 - Manufacturing Resources Planning Systems

1989 - Business Intelligence

1995 - The World Wide Web Explodes

1999 - Predictive Analysis Changes Business as Usual

http://www.winshuttle.com/big-data-timeline/

Phases in Big Data analysis

Data Acquisition and Recording

Information Extraction and Cleaning

Data Integration, Aggregation, and Representation

Query Processing, Data Modelling, and Analysis

Interpretation

CHALLENGES IN DATA ANALYSIS

HETEROGENEITY

TIMELINESS

SCALE

PRIVACY

HUMAN COLLABORATION

Tools and Techniques

A/B testing

Crowdsourcing

Genetic algorithms

Machine learning

Natural language processing

Time series analysis

Visualization

Data mining

Association rule learning

Classification tree analysis

Regression analysis

A/B testing

It is a form of statistical hypothesis testing with two variants, A and B. In statistics the technique is known as two-sample hypothesis testing.

H0 (null hypothesis): variants A and B perform the same

H1 (alternative hypothesis): variants A and B perform differently
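The two-sample hypothesis test described above can be sketched as a two-proportion z-test, one common way to compare the conversion rates of two variants. This is a minimal illustration, not the method used in any case study in this deck: the function name `two_proportion_z_test` and all counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a: conversions and visitors for variant A (control).
    conv_b / n_b: conversions and visitors for variant B (treatment).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both variants share one true rate, so the rates are pooled.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value (for example below a chosen threshold such as 0.05) is evidence against H0, i.e. that the two variants do not perform the same.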

Crowdsourcing

Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.

Analysis of the reviews for opinion

Analysis of the interactions for need and intent

Analysis of social network interactions

Machine learning

- A scientific discipline that explores the construction and study of algorithms that learn from data.

- These algorithms work by building a model from example inputs and using it to make predictions or decisions.

- They adapt their behaviour to the data instead of following fixed, static instructions.

Machine learning is closely related to, and often overlaps with, computational statistics, a discipline that also specializes in prediction-making.
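The idea of building a model from example inputs and using it to make predictions can be sketched with the simplest possible model, a straight line fitted by least squares. This is only an illustration: the helper names `fit_line` and `predict` and the example data are hypothetical.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from example (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned model to predict y for an unseen x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training examples, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
model = fit_line(xs, ys)
print(predict(model, 6.0))
```

Nothing in the code states the relationship between x and y; it is estimated from the examples, which is the point the bullets above make.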

Indian Elections 2014

- Size of the Indian electorate: 814 million voters, compared with the USA’s 193.6 million and the UK’s 45.5 million.

[Bar chart comparing INDIA, USA and UK; axis from 0 to 900]

- Variety of data: India’s voter rolls, in 12 different languages and 900,000 PDFs amounting to 25 million pages, made for a heterogeneous, non-uniform and deeply diverse information set.

- Veracity: the information was often questionable. One report noted that some voters were listed as 19,545 years old, and others as a confounding 0 years old. Name overlap (there are 327,000 women named “Sita” in Bihar alone) only further complicated the process.
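The veracity problems above (impossible ages, heavily shared names) are the kind of issue handled in the information extraction and cleaning phase. A minimal sketch of such a check follows. The record layout, the function name `split_by_age_validity`, and the upper age bound are assumptions made for illustration and do not describe how the actual voter rolls were processed.

```python
from collections import Counter

MIN_VOTING_AGE = 18      # voting age in India
MAX_PLAUSIBLE_AGE = 120  # assumed upper bound, for illustration only


def split_by_age_validity(records):
    """Separate records with a plausible age from records that need review."""
    valid, suspect = [], []
    for record in records:
        if MIN_VOTING_AGE <= record["age"] <= MAX_PLAUSIBLE_AGE:
            valid.append(record)
        else:
            suspect.append(record)
    return valid, suspect


# Hypothetical records, for illustration only.
records = [
    {"name": "Sita", "age": 34},
    {"name": "Sita", "age": 0},
    {"name": "Ravi", "age": 19545},
    {"name": "Sita", "age": 52},
]

valid, suspect = split_by_age_validity(records)
# Names shared by many records cannot serve as a unique key on their own.
name_counts = Counter(record["name"] for record in records)

print(len(valid), "plausible,", len(suspect), "flagged for review")
print(name_counts.most_common(1))
```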

A tactical scenario

[Diagram: BJP WEBSITE, BIKE WEBSITE, JOB PORTAL, INDIA DESERVES BETTER]

Airbnb

- Airbnb’s team had a hunch that better photos would increase rentals.

- They tested the idea by putting the least effort possible into a test that would give them valid results.

- When the experiment showed good results, they built the necessary components and rolled it out to all customers.

Shoppers Stop

Shoppers Stop stores retail clothing, accessories, handbags, shoes, jewelry, fragrances, cosmetics, health and beauty products, and home furnishing and decor products.

Shoppers Stop launched its e-store with delivery across major cities in India in 2008. The website retails all the products available at Shoppers Stop stores, including apparel, cosmetics and accessories. Shoppers Stop opened stores in Amritsar, Bhopal and Aurangabad.

After analysing its First Citizen base, the company observed that not all those who buy shirts also buy trousers. But those who buy both men’s shirts and trousers spend 60% more a year on average than those who buy only shirts, and thrice as much as those who don’t buy men’s shirts at all.
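The shirts-and-trousers observation is the kind of pattern that association rule learning quantifies. The sketch below computes support, confidence and lift for the rule {shirt} -> {trousers}. The function name `rule_metrics` and the baskets are hypothetical and are not Shoppers Stop data.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one basket.
    """
    n = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent <= b)
    with_consequent = sum(1 for b in baskets if consequent <= b)
    with_both = sum(1 for b in baskets if (antecedent | consequent) <= b)

    support = with_both / n                      # share of baskets with both
    confidence = with_both / with_antecedent     # P(consequent | antecedent)
    lift = confidence / (with_consequent / n)    # above 1: positive association
    return support, confidence, lift


# Hypothetical baskets, for illustration only.
baskets = [
    {"shirt", "trousers"},
    {"shirt"},
    {"shirt", "trousers", "belt"},
    {"trousers"},
    {"shirt", "belt"},
    {"belt"},
]

support, confidence, lift = rule_metrics(baskets, {"shirt"}, {"trousers"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A confidence well below 1 for this rule corresponds to the finding that not every shirt buyer also buys trousers, which is what makes those customers a target for cross-selling.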

9,00,000

- included customers who showed a pattern of being interested in new brands in other non-trouser categories. They were sent information on new trouser brand launches and fits.

- exhibited multiple buying patterns in other categories. They were sent attractive deals if they bought two or more trousers.

- “control group” to measure success or failure of the promotions.

The campaign delivered a 30% increase in sales, equivalent to 30 crore.