Data science project presentation

DATA SCIENCE PROJECT, KEVIN BLUER


Posted on 22-Apr-2015


Page 1: Data science project presentation

DATA SCIENCE PROJECT, KEVIN BLUER

Page 4: Data science project presentation

GOALS

Derive Insight from Dsruption (www.dsruption.com)

Focus on establishing company (startup) momentum & insights

#1 Article popularity (FB / Twitter shares)

#2 Auto generation of article tags

Page 5: Data science project presentation

FEATURES

dsruption.activity, 691 documents (744 KB)

dsruption.articles, 14022 documents (125.61 MB)

dsruption.comment, 43 documents (40 KB)

dsruption.companies, 524 documents (3.65 MB)

dsruption.tags, 329 documents (40 KB)

dsruption.trends, 32 documents (140 KB)

dsruption.users, 39 documents (632 KB)

Page 6: Data science project presentation

TECHNOLOGIES

MongoDB

JavaScript and Node.js

D3.js

Hadoop

Python

Facebook and Twitter APIs

Page 7: Data science project presentation

ARTICLE POPULARITY

Page 10: Data science project presentation

COMPANY TAGS FROM ARTICLES

Page 12: Data science project presentation

BEAUTIFUL SOUP

<p><ul><li><span style="font-family: arial;"><i>100,000 refrigerators and freezers have now made their way through the revolutionary UNTHA Recycling Technology system</i></span></li><li><span style="font-family: arial;"><i>Innovative recycling system reduces landfill waste and greenhouse gas and ozone-depleting substance emissions</i></span></li><li><span style="font-family: arial;"><i>Initiative has diverted 5.5 million pounds of material from U.S. landfills<b><a href="#_ftn1" name="_ftnref1">[1]</a></b></i></span></li></ul> </p><p style="text-indent: -0.25in;"><i><b><br/><br/></b></i></p><p><div><br/><div id="ftn1"> </div> </div></p>

  

100,000 refrigerators and freezers have now made their way through the revolutionary UNTHA Recycling Technology system

Innovative recycling system reduces landfill waste and greenhouse gas and ozone-depleting substance emissions

Initiative has diverted 5.5 million pounds of material from U.S. landfills.
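
The before/after pair above (raw article markup, then plain text) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses Python's standard-library html.parser rather than Beautiful Soup so that it runs without extra dependencies, and the names TextExtractor, strip_html and sample are hypothetical. With Beautiful Soup installed, the analogous call would be BeautifulSoup(html, "html.parser").get_text().

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text; whitespace-only nodes are skipped.
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_html(html):
    """Return the text of `html` with one space between text nodes."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample, shortened from the markup shown on the slide.
sample = (
    '<p><ul><li><span style="font-family: arial;"><i>Innovative recycling '
    'system reduces landfill waste</i></span></li>'
    '<li><i>Initiative has diverted 5.5 million pounds of material</i></li>'
    '</ul></p>'
)
clean = strip_html(sample)
print(clean)
```

Joining the text nodes with a space keeps adjacent list items from running together, which matters if the cleaned text is later split into words.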

Page 14: Data science project presentation

EXCLUDE NOISE

count: 252, word: "Dwolla"

count: 73, word: "money"

count: 45, word: "photo"

count: 44, word: "people"

count: 42, word: "pay"

count: 39, word: "payment"

count: 35, word: "payments"

count: 34, word: "business"
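
The slide does not say how the counts were produced or how noise was excluded. One possible reading is a word-frequency count that skips a list of uninformative words, so that a company name such as "Dwolla" rises to the top as a tag candidate. The sketch below assumes that reading; NOISE_WORDS, top_words and the article text are all hypothetical and are not taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical noise list; a real project would likely use a fuller
# stop-word list and might also drop generic domain words.
NOISE_WORDS = {"the", "a", "an", "and", "to", "of", "for", "with", "is", "in"}


def top_words(text, n=5, noise=NOISE_WORDS):
    """Return the n most common words in `text`, skipping noise words."""
    words = re.findall(r"[A-Za-z']+", text)
    # Noise matching is case-insensitive; counted words keep their casing.
    counts = Counter(w for w in words if w.lower() not in noise)
    return counts.most_common(n)


# Hypothetical article text, written only for this example.
article = (
    "Dwolla lets people pay with a phone. Dwolla moves money for a business "
    "and Dwolla keeps the payment cheap."
)
result = top_words(article, n=1)
print(result)
```

Because the words are not stemmed, "payment" and "payments" are counted separately, as in the list above.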

Page 15: Data science project presentation

WHAT’S NEXT?

Sentiment Analysis (on both articles and comments)

Integration of Additional Datasets (Crunchbase, etc.)

Broader Visualization

Page 16: Data science project presentation

THANK YOU