data science project presentation
DATA SCIENCE PROJECT
KEVIN BLUER
DSRUPTION
http://www.dsruption.com/
TRENDS TECHNOLOGY
http://www.dsruption.com/trend/wearable-computing
GOALS
Derive Insight from Dsruption (www.dsruption.com)
Focus on establishing company (startup) momentum & insights
#1 Article popularity (FB / Twitter shares)
#2 Auto generation of article tags
FEATURES
dsruption.activity, 691 documents (744 KB)
dsruption.articles, 14022 documents (125.61 MB)
dsruption.comment, 43 (40 KB)
dsruption.companies, 524 (3.65 MB)
dsruption.tags, 329 (40 KB)
dsruption.trends, 32 (140 KB)
dsruption.users, 39 (632 KB)
TECHNOLOGIES
MongoDB
JavaScript and Node.js
D3.js
Hadoop
Python
Facebook and Twitter APIs
ARTICLE POPULARITY
IMPORTING TWEETS & SHARES
http://www.dsruption.com/dwolla/json-social
SIMPLE D3.JS VISUALIZATION
http://www.dsruption.com/dwolla/visualize
COMPANY TAGS FROM ARTICLES
HADOOP -> MONGO
http://www.dsruption.com/dwolla/articles
http://www.dsruption.com/data/dwolla.json
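The slides do not show the Hadoop job itself. A minimal sketch of the kind of map/reduce word count that could feed the word tallies shown later, written in plain Python in the style of a Hadoop Streaming mapper and reducer. The function names `mapper`, `reducer` and `word_count` are hypothetical, and the sort stands in for Hadoop's shuffle phase; the step that writes the results to MongoDB is left out.

```python
from itertools import groupby
from operator import itemgetter
import re


def mapper(text):
    """Emit a (word, 1) pair for every word in one article body."""
    for word in re.findall(r"[a-z']+", text.lower()):
        yield word, 1


def reducer(pairs):
    """Sum the counts per word; expects the pairs sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(articles):
    """Run map, sort (the stand-in for shuffle) and reduce in one process."""
    pairs = sorted(pair for text in articles for pair in mapper(text))
    # Shape each result like the count/word records on the later slides.
    return [{"count": count, "word": word} for word, count in reducer(pairs)]
```

Each returned dictionary is one candidate document for a MongoDB collection.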
BEAUTIFUL SOUP
<p><ul><li><span style="font-family: arial;"><i>100,000 refrigerators and freezers have now made their way through the revolutionary UNTHA Recycling Technology system</i></span></li><li><span style="font-family: arial;"><i>Innovative recycling system reduces landfill waste and greenhouse gas and ozone-depleting substance emissions</i></span></li><li><span style="font-family: arial;"><i>Initiative has diverted 5.5 million pounds of material from U.S. landfills<b><a href="#_ftn1" name="_ftnref1">[1]</a></b></i></span></li></ul> </p><p style="text-indent: -0.25in;"><i><b><br/><br/></b></i></p><p><div><br/><div id="ftn1"> </div> </div></p>
100,000 refrigerators and freezers have now made their way through the revolutionary UNTHA Recycling Technology systemInnovative recycling system reduces landfill waste and greenhouse gas and ozone-depleting substance emissionsInitiative has diverted 5.5 million pounds of material from U.S. landfills.
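A minimal sketch of the tag-stripping step. To stay dependency-free it uses the standard library's `html.parser` as a stand-in for Beautiful Soup; with Beautiful Soup installed, `BeautifulSoup(html, "html.parser").get_text()` plays the same role. The names `TextExtractor` and `strip_tags` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html, separator=""):
    """Return the text of `html` with all tags removed.

    `separator` is placed between text nodes, then runs of
    whitespace are collapsed to single spaces.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(separator.join(parser.chunks).split())
```

With the default empty separator, adjacent list items run together, as in the stripped text above ("systemInnovative"). Passing `separator=" "` keeps the words apart, which matters once the text is split into words for counting.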
LOTS OF NOISE
http://www.dsruption.com/dwolla/words
EXCLUDE NOISE
count: 252, word: "Dwolla"
count: 73, word: "money"
count: 45, word: "photo"
count: 44, word: "people"
count: 42, word: "pay"
count: 39, word: "payment"
count: 35, word: "payments"
count: 34, word: "business"
count: 34, word: "business"
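A minimal sketch of how noise words could be excluded before ranking. The `NOISE` set and the `top_words` function are hypothetical; the slides do not say which words were filtered or how.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the list actually used is not given in the slides.
NOISE = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def top_words(text, noise=NOISE, limit=8):
    """Return the `limit` most frequent words in `text`, skipping noise words."""
    words = re.findall(r"[a-z']+", text.lower())
    # Drop listed noise words and anything shorter than three letters.
    counts = Counter(w for w in words if w not in noise and len(w) > 2)
    return counts.most_common(limit)
```

The sketch lowercases words and does no stemming, so "pay", "payment" and "payments" would still be counted separately, as they are in the list above.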
WHAT’S NEXT?
Sentiment Analysis (on both articles and comments)
Integration of Additional Datasets (Crunchbase, etc.)
Broader Visualization
THANK YOU