Online news popularity analysis


Upload: ankur-vora

Post on 15-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Online news popularity analysis

WEB ANALYTICS - ONLINE NEWS POPULARITY

TEAM – 11

KRUTIKA DEDHIA

KINJAL GADA

ANKUR VORA

ADVANCES IN DATA SCIENCES AND ARCHITECTURE

- PROF. SRIKANTH KRISHNAMURTHY

Page 2: Online news popularity analysis

INTRODUCTION

• The dataset summarizes a set of features about articles published by Mashable, a well-known news website, over a period of two years.

• The objective is to predict, from these features, the number of shares an article will receive and whether it will be popular on the internet once published.

Page 3: Online news popularity analysis

GOALS

• Create and evaluate regression, classification and clustering models in Microsoft Azure Machine Learning Studio.

• Deploy the models as a web service to generate a REST API.

• Build an interactive web interface that returns the predictions.

Page 4: Online news popularity analysis

DATASET

• Data Source : UCI ML Repository

https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity

• Number of attributes: 61

• Number of records: 39,645

• Dependent variable: Number of shares

Page 5: Online news popularity analysis

DATA MODIFICATION

• Type of Data : 1 – business, 2 – lifestyle, 3 – entertainment, 4 – social media, 5 – technology, 6 – world

• Extracted the date from the URL column.

• Day of week : 0 – Sunday, 1 – Monday, 2 – Tuesday, 3 – Wednesday, 4 – Thursday, 5 – Friday, 6 – Saturday

• Web Scraping : Topics, Channel, Author
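The date extraction and day-of-week encoding listed above can be sketched as follows. This is a minimal sketch, not the team's code: it assumes each URL embeds the publication date as /YYYY/MM/DD/, the sample URL is made up for illustration, and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed URL layout: http://mashable.com/YYYY/MM/DD/slug/
DATE_IN_URL = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def extract_date(url):
    """Return the publication date embedded in the URL, or None if absent."""
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def day_of_week_code(d):
    """Encode the weekday as on the slide: 0 = Sunday ... 6 = Saturday."""
    # date.weekday() uses Monday = 0 ... Sunday = 6, so shift by one day.
    return (d.weekday() + 1) % 7


# Illustrative URL only; not taken from the dataset.
published = extract_date("http://mashable.com/2013/01/07/example-article/")
print(published, day_of_week_code(published))
```

Keeping the weekday mapping in one small function makes the Sunday-first convention explicit, since Python's own numbering starts on Monday.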

Page 6: Online news popularity analysis

PROCESS

• Created training models for regression, classification and clustering in Azure ML.

• Created predictive experiments for the trained models above.

• Deployed the models as a web service and generated a REST API.

• Designed the UI using Java Spring MVC, HTML, Bootstrap and Ajax, with user input validation.
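A client call to the deployed web service might look roughly like the sketch below. It is written in Python for brevity, although the slides describe a Java Spring MVC front end. The endpoint URL, API key and input name "input1" are placeholders, and the request body follows the shape commonly used by Azure ML Studio (classic) request-response services; the API help page of the actual service is the authority on the exact schema. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholders, not real values: substitute the endpoint and key of the deployed service.
ENDPOINT_URL = "https://example.invalid/score"
API_KEY = "YOUR_API_KEY"


def build_request(column_names, rows):
    """Build a JSON scoring request (assumed schema) without sending it."""
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )


# Two dataset columns used purely as an example input row.
request = build_request(["n_tokens_title", "num_imgs"], [["12", "3"]])
print(request.get_method(), json.loads(request.data)["Inputs"]["input1"])
# Sending it would be: urllib.request.urlopen(request).read()
```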

Page 7: Online news popularity analysis

MACHINE LEARNING ALGORITHMS

Page 8: Online news popularity analysis

REGRESSION MODELS

• Used Azure ML regression modules

• Decision Forest, Neural Network, Poisson Regression and Boosted Decision Tree

• Best Model: Decision Forest (Random Forest), selected for its lowest RMSE value
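Selecting a model by lowest RMSE (root mean squared error) can be illustrated with a short sketch. The share counts, predictions and model names below are made up for illustration and are not results from the project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up numbers for illustration only.
actual_shares = [1200, 800, 3000, 1500]
predictions = {
    "model_a": [1000, 900, 2500, 1600],
    "model_b": [1500, 500, 4000, 1000],
}

scores = {name: rmse(actual_shares, predicted) for name, predicted in predictions.items()}
best_model = min(scores, key=scores.get)  # lowest RMSE wins
print(scores, best_model)
```

Because RMSE squares each error before averaging, a few badly mispredicted viral articles can dominate the score, which is worth keeping in mind for a heavy-tailed target such as share counts.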

Page 9: Online news popularity analysis

RANDOM FOREST

Page 10: Online news popularity analysis

CLASSIFICATION MODELS

• Used Azure ML classification components: Two Class Decision Forest, Two Class Neural Network and Two Class Boosted Decision Tree

• Added attribute isPopular :

• Shares > 1400 : highly popular

• Shares <= 1400 : less popular

• Best Model : Two Class Boosted Decision Tree, based on the highest accuracy and AUC value
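The isPopular label and the two selection metrics can be sketched as follows. This sketch assumes that articles with more than 1400 shares form the popular class (label 1); the share counts and classifier scores are made up, and the function names are hypothetical.

```python
POPULARITY_THRESHOLD = 1400  # share threshold stated on the slide


def is_popular(shares):
    """Return 1 if shares exceed the threshold (assumed popular class), else 0."""
    return 1 if shares > POPULARITY_THRESHOLD else 0


def accuracy(labels, scores, cutoff=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predicted = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up share counts and classifier scores for illustration only.
shares = [500, 1400, 1401, 9000]
labels = [is_popular(s) for s in shares]
scores = [0.2, 0.6, 0.4, 0.9]
print(labels, accuracy(labels, scores), auc(labels, scores))
```

Accuracy depends on the chosen score cutoff, while AUC measures ranking quality across all cutoffs, which is one reason to report both.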

Page 11: Online news popularity analysis

TWO CLASS BOOSTED DECISION TREE CLASSIFICATION

Page 12: Online news popularity analysis

CLUSTERING MODELS

• Used K-means Clustering

• Number of clusters: 3 (k = 3).

• Computes each article's distance from the cluster centroids, based on a few selected parameters.
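The idea of k-means with k = 3 and of measuring an article's distance from the cluster centroids can be sketched in plain Python. This is a minimal version of Lloyd's algorithm, not the Azure ML module; the two-dimensional points are made up, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to the point."""
    distances = [euclidean(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def k_means(points, k=3, iterations=20):
    """Minimal Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = [
            [sum(axis) / len(members) for axis in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


# Made-up 2-D feature vectors for illustration only.
points = [(1.0, 1.0), (9.0, 9.0), (5.0, 1.0), (1.2, 0.8), (8.8, 9.2), (5.2, 1.2)]
centroids = k_means(points, k=3)
cluster, distance = nearest((1.1, 1.0), centroids)
print(centroids, cluster, distance)
```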

Page 13: Online news popularity analysis

DEMO

• Web User Interface

Page 14: Online news popularity analysis

ANALYSIS

Page 15: Online news popularity analysis

TABLEAU ANALYSIS

Page 16: Online news popularity analysis

TABLEAU ANALYSIS

Page 17: Online news popularity analysis

TABLEAU ANALYSIS

Page 18: Online news popularity analysis

CHALLENGES

• Formatting data after Web Scraping.

• Understanding variables such as keywords and subjectivity.

• Finding relationships between variables and selecting features for modelling.

Page 19: Online news popularity analysis

LINKS

• URL – http://sample-env-1.xhmp4ynr7g.us-east-1.elasticbeanstalk.com/

• Github – https://github.com/voraankur/ADS/tree/master/Final%20Project

Page 20: Online news popularity analysis

REFERENCES

• https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity

• https://repositorium.sdum.uminho.pt/bitstream/1822/39169/1/main.pdf

Page 21: Online news popularity analysis

CONTRIBUTION

• Ankur – Regression Models, Web Interface

• Kinjal – Data cleaning, Web Scraping, Clustering, Report

• Krutika – Classification Models, Presentation, Tableau Analysis

Page 22: Online news popularity analysis

THANK YOU