Summer school: Web Science and the Mind (UQAM)

Post on 24-Apr-2015

DESCRIPTION

Presentation to the Web Science summer school at UQAM, on the rise of the data scientist in the new economy

TRANSCRIPT

The opportunity for Social Data Scientists

@cgtheoret

Part 1 The Explosion

Every minute, 8-10 months ago:

• 48 hours of video are uploaded to YouTube
• 320 new accounts and 98,000 tweets appear on Twitter
• 168 million emails are sent
• 20,000 new posts appear on Tumblr
• 6,600 photos appear on Flickr
• Over 20% of all websites run on a CMS (WordPress, etc.)

Every minute today:

• 100 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear on Twitter
• 204 million emails are sent
• 28,000 new posts appear on Tumblr
• 1,600 photos appear on Flickr!!! No shit!
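To make the comparison between the two "every minute" slides concrete, here is a minimal sketch that computes the percent change for each figure. It assumes the email counts are meant as 168 million and 204 million, and it leaves out new Twitter accounts because today's number is unknown. The names `snapshots` and `percent_change` are illustrative only.

```python
# Per-minute figures copied from the two slides: (earlier, today).
# Assumption: the email counts are read as 168 million and 204 million.
snapshots = {
    "YouTube video hours": (48, 100),
    "tweets": (98_000, 236_000),
    "emails": (168_000_000, 204_000_000),
    "Tumblr posts": (20_000, 28_000),
    "Flickr photos": (6_600, 1_600),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


changes = {name: percent_change(b, a) for name, (b, a) in snapshots.items()}

for name, change in changes.items():
    # A leading "+" marks growth, "-" marks decline.
    print(f"{name}: {change:+.1f}%")
```

Everything grows except Flickr, which is the only figure with a negative change.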

But…

• Facebook has lost 1.5 million users in Canada and 6 million in the United States
• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)

Gartner is predicting an explosion in Social Media Analytics IT spending

In a lot of ways Social “Big Data” is like Oil…

• Difficult and expensive to extract

Difficult and expensive to store and distribute

Cheapest (and least useful) when it's unrefined

In a lot of ways “Big Data” is like Oil…

• Can’t be used by consumers unless refined
• More expensive at every step of refinement

The market is producing a plethora of derived, higher-value data products

In a lot of ways “Big Data” is like Oil…

• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest in its unrefined form
• More expensive at every step of refinement
• Produces a plethora of derived products
• and it’s actually quite “dirty”!!!!

Part 2

Social Data is one of the reasons why IBM added a 4th V to the Big Data Definition

VERACITY

Social Data Analytics = Oil Refineries

6 factors affect Data Veracity…

1. Accuracy: Is it true?
2. Precision: If true, error margin?
3. Reliability: Is it there all the time?
4. Provenance: Can you trace the source?
5. Fidelity: Did it change from the source?
6. Permission: Can you use it for the context?
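The six factors can be read as a checklist applied to each piece of social data. The sketch below is one hypothetical way to record the answers. The class name `VeracityCheck`, the yes/no simplification (precision, for example, is really a margin rather than a flag), and the equal weighting in `score` are all assumptions for illustration, not part of the talk.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """One yes/no answer per factor from the slide (hypothetical structure)."""
    accuracy: bool     # Is it true?
    precision: bool    # If true, is the error margin acceptable?
    reliability: bool  # Is it there all the time?
    provenance: bool   # Can you trace the source?
    fidelity: bool     # Is it unchanged from the source?
    permission: bool   # Can you use it for the context?

    def failed_factors(self):
        """Names of the factors answered 'no'."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def score(self):
        """Fraction of the six factors that pass (equal weights assumed)."""
        return 1 - len(self.failed_factors()) / len(fields(self))


# Made-up example: a tweet whose source account cannot be traced.
tweet = VeracityCheck(accuracy=True, precision=True, reliability=True,
                      provenance=False, fidelity=True, permission=True)
print(tweet.failed_factors())
print(round(tweet.score(), 2))
```

Listing the failed factors by name, rather than reporting only a score, keeps it visible which question a record fails.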

Black Hat SEO: Blogs

Twitter: 46% of brand followers are bots

Black Hat Social Marketing: Twitter

Or in some cases over 90%…

Disappearing Romney: FB as well…

And it is getting worse …

Trying to solve the Veracity problem …

The Big Guys are now doing Veracity …

Murali Krishnam <murali.krishnam@saama.com>

Part 3 The Opportunity for Social Data Scientists

“McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the U.S. that require quantitative and analytical skills. However, there will be a potential shortfall of 1.5 million data-savvy managers and analysts to fill these positions”

@cgtheoret @fffady

Zeitgeist

cg.theoret@nexalogy.com

Thank you!
