Business Analytics with SAS EM on IMDB Data Set - Group 7 - Final Presentation


Upload: rahul-prasad

Post on 17-Jan-2017


Page 1: Business Analytics with SAS EM on IMDB Data Set - Group 7 - Final Presentation

Business Analytics using SAS Enterprise Miner for

Group 7: Rahul Prasad, Kenny McDowell, Deepika Gadhella Thulasiram, Rushit Sanjay Shah, Vatsal Ajmera


Agenda:

Background and Motivation

Data Description

Data Pre-processing

Modeling in SAS EM

Classifier Evaluation

Conclusion – Business Implications

Questions?


Background and Motivation

• Movies: the most popular source of entertainment

• Global box office revenue statistics:

• Audience reliance on social media and movie rating websites

• Our focus: i) a movie’s rating (the most important movie attribute); ii) the gross money that a movie makes (interesting trends related to the commercial and financial success of movies)

• Built models to predict a movie’s IMDB rating before its release, based on different predictors

• Conclusions and insights will help people in the movie business produce highly rated movies and make choosing a movie easier for the average movie watcher


Data Description

• Our search for IMDB data led us to second-hand data from kaggle.com

• Motivation for choosing the data set – rich data content

• File format made it easier to interpret and edit the data

• An interesting fact – “human faces in primary poster”

• Data contained characteristics for 5043 movies spanning 100 years and 66 countries

• 2399 unique director names with 1000+ actors/actresses

• The data received had a few missing values, which were cleansed in the data pre-processing stage


• Data set contained attributes of nominal, interval and text data types

Attribute list:


Data Pre-processing

• Originally, our data set contained information on 5043 movies, i.e. 5043 rows and 28 predictors

• Cleaned the data set by eliminating records with missing values in attributes
o Missing values: Gross, Budget, Aspect ratio and Content rating
o Elimination of unimportant predictors: Color and IMDB_link

• After removing all records with missing values, we had a data set of 3,754 records to work on

• Missing values affected 1,288 rows, about 25.5% of the data set – the remaining data set was still rich
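The clean-up described above was done in SAS EM, but the same row-wise elimination can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (gross, budget, aspect_ratio, content_rating, color, imdb_link) and the three toy records are invented for the example and are not taken from the Kaggle file.

```python
# Columns that must be present for a record to be kept (names are assumed).
REQUIRED = ("gross", "budget", "aspect_ratio", "content_rating")
# Predictors the slide calls unimportant (names are assumed).
UNIMPORTANT = ("color", "imdb_link")


def clean(records):
    """Drop rows missing any required value, then drop unimportant columns."""
    kept = []
    for row in records:
        if any(row.get(col) in (None, "") for col in REQUIRED):
            continue  # listwise deletion: skip the whole record
        kept.append({k: v for k, v in row.items() if k not in UNIMPORTANT})
    return kept


# Three invented records, for illustration only.
movies = [
    {"title": "A", "gross": 120.0, "budget": 60.0, "aspect_ratio": 2.35,
     "content_rating": "PG-13", "color": "Color", "imdb_link": "link-a"},
    {"title": "B", "gross": None, "budget": 15.0, "aspect_ratio": 1.85,
     "content_rating": "R", "color": "Color", "imdb_link": "link-b"},
    {"title": "C", "gross": 40.0, "budget": 20.0, "aspect_ratio": 1.85,
     "content_rating": "", "color": "Color", "imdb_link": "link-c"},
]

cleaned = clean(movies)
print(len(cleaned), "of", len(movies), "toy records kept")

# Share of rows removed, using the counts reported on the slide.
print(f"{1288 / 5043:.1%}")
```

Only record A survives in the toy data, because B lacks a gross value and C lacks a content rating. The last line reproduces the "about 25.5%" figure from the reported row counts.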


Reasons for not using the Impute or Replacement node of SAS EM:

• Rich data set: about 3,754 records remained after elimination

• The attributes with missing values, such as Gross, Budget, Content Rating, and Fb likes, are factual in nature; imputing them may disrupt the underlying natural logic used for prediction and lead to inaccurate predictions

• It was not feasible to research the missing values and fix each missing field manually
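One generic way to see how imputation can distort factual attributes is to look at what mean imputation does to the spread of a variable. The sketch below is not the team's analysis and does not reproduce what the SAS EM Impute node does. It uses invented budget values and simply shows that filling gaps with the mean shrinks the variance.

```python
from statistics import mean, pvariance

# Invented budget values (in millions), for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
n_missing = 2  # assume two further movies have no recorded budget

# Mean imputation: every missing budget is replaced by the observed mean.
imputed = observed + [mean(observed)] * n_missing

var_observed = pvariance(observed)
var_imputed = pvariance(imputed)

print("variance of observed values:", var_observed)
print("variance after mean imputation:", round(var_imputed, 2))
```

The imputed column looks more uniform than the real budgets, which is one reason a model trained on it may pick up patterns that are not in the underlying data.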


Modeling in SAS EM

1. Objective

2. Analysis of Predictors

3. Predictor Transformations

4. Train and Valid Data sets

5. Predictive Models

6. Model Evaluation


Classifier Evaluation

• Variant of hold-out, based on partitioning the data in the ratio of 1:1 between training and validation data

• Model evaluation criterion depends on the target data type

• Model evaluation based on Average Squared Error on the validation data set

• Neural Network emerged as the champion
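The evaluation procedure above (a 1:1 hold-out split, then choosing the model with the lowest Average Squared Error on the validation data) can be sketched as follows. This is a minimal sketch, not the SAS EM implementation: the function names, the random shuffle, the model names "model_a" and "model_b", and all ratings and predictions are assumptions made for the example.

```python
import random


def split_half(rows, seed=0):
    """Shuffle the rows and split them 1:1 into training and validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def average_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pick_champion(actual, predictions_by_model):
    """Return the name of the model with the lowest validation ASE."""
    return min(
        predictions_by_model,
        key=lambda name: average_squared_error(actual, predictions_by_model[name]),
    )


# Invented validation ratings and predictions from two hypothetical models.
validation_ratings = [7.0, 6.0, 8.0]
predictions = {
    "model_a": [7.5, 5.5, 8.5],
    "model_b": [7.0, 6.5, 8.0],
}

train, valid = split_half(range(10))
champion = pick_champion(validation_ratings, predictions)
print(len(train), len(valid), champion)
```

Average Squared Error suits an interval target such as the IMDB rating. A categorical target would call for a different criterion, such as misclassification rate.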


Conclusion – Business Implications

• Our analysis led to very interesting insights into the critical and commercial success of a movie

• The success of a movie correlates with social media presence, user reviews, actor/director popularity, movie duration and budget

• Social Media Presence: crucial marketing strategy
o Data showed strong correlations between social media brand value, such as on Facebook, and high movie ratings
o Social media presence on websites like Facebook, YouTube, and Instagram is associated with higher movie ratings, and thus potentially better business at the box office

• User Reviews: these influence a prospective viewer’s decision in the entertainment or “experience” industry
o People rely heavily on user ratings when deciding to see a new or old movie

Interesting insights:
o Actor/Director Popularity – negatively correlated with the movie rating
o Duration – the longer the movie, the better the ratings
o Budget – weak correlation with movie rating but strongly related to commercial success
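The insights above are statements about correlation. As a hedged illustration of how such a relationship can be measured, the sketch below computes a Pearson correlation coefficient by hand. The slides do not say which correlation measure was used, and the duration and rating values here are invented, so the result says nothing about the actual data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented values: movie duration in minutes and IMDB-style rating.
durations = [95, 110, 125, 160]
ratings = [6.1, 6.4, 7.2, 7.9]

r_duration = pearson(durations, ratings)
print("duration vs rating:", round(r_duration, 2))
```

A coefficient near +1 corresponds to "the longer the movie, the better the ratings", a value near 0 to a weak relationship such as budget versus rating, and a negative value to the pattern reported for actor/director popularity.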


Questions?