behind the tv shows: top-rated series characterization and...

Linear regression models Features included Estimate of generalizati on error Linear regression All 0.3456 Locally weighted linear regression All 0.3398 Linear regression with selected features and interaction effects between original features Features selected by backward selection 0.3321 The results demonstrate that the assumptions of linear regression on IID normally distributed error terms were not perfectly met. So we considered the following improvements. 3. Improved Model II: Linear Regression with Selected Features and Interaction Effects between Original Features We used backward search starting with the full sets including all possible interaction features. For each iteration of the backward search, we did cross validation to evaluate the model. The selected model gave a lower error. Behind the TV Shows: Top-Rated Series Characterization and Audience Rating Prediction Yushu Chai – Yiwen Xu – Zihui Liu CS 229 – Machine Learning Professor Andrew Ng In this project, we are going to use the viewers’ ratings on IMDb website as an indicator of audience’s preferences of various TV series, try to discover features of well-received TV series and make recommendations. Motivation The TV series data with record on IMDb were provided by Andrej Krevl from Stanford SNAP Lab. Data Classification Unsupervised Learning Acknowledgements & References Name Type Genre Vector of Booleans Release Year Numeric # Seasons Numeric # Ratings Numeric # Critics Numeric # Ep./Season Numeric # Reviewers Numeric Run Time Numeric Aspect Ratio Categorical Color Booleans Comparisons Between Models Models and Discussions Models and Discussions (Cont’d) [1] Ng. Andrew, CS 229 Lecture notes. [2] James, Gareth, et al. An introduction to statistical learning. New York: springer, 2013. 373-413 [3] N.A. (2015, Oct. 15). The Internet Movie Database. Retrieved from http://www.imdb.com/ 5. Unsupervised Learning • K-means clustering. K = 3 because we would like to find a more reasonable segment for the three classes above. But other values of K will also be considered. • PCA The plots are projections of the data onto the first three principle components, i.e. the scores for the first three principal components. The observations of different subgroups lie near each other in the low-dimensional space. Original Rating Class Counts # in this group Misclassification Rate < 8 Fair 65 52 0.08 8-9 Good 148 166 9-10 Very Popular 12 5 4. Classification Based on the current dataset, we segment the ratings into several categories: Fair (rating below 8), Good (8-9), and Very Popular (9-10), to fit a classification model using L1- regularized logistic regression. When predicting the ratings of an upcoming TV show, this classification model can predict the most likely category for it. We want to thank Professor Ng for such an inspiring course and this project opportunity, and CS 229 TA’s, in particular Youssef Ahres, for their help and support. Linear Model Diagnostic Plots Feature Selection Process

Upload: others

Post on 18-Aug-2020

2 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Behind the TV Shows: Top-Rated Series Characterization and ...cs229.stanford.edu/proj2015/256_poster.pdf · Behind the TV Shows: Top-Rated Series Characterization and Audience Rating

Linear regression models Features included Estimate of

generalizati

on error

Linear regression All 0.3456

Locally weighted linear regression All 0.3398

Linear regression with selected

features and interaction effects

between original features

Features selected by

backward selection

0.3321

The results demonstrate that the assumptions of linear

regression on IID normally distributed error terms were

not perfectly met. So we considered the following

improvements.

3. Improved Model II: Linear Regression with Selected

Features and Interaction Effects between Original Features

We used backward search starting with the full sets including all

possible interaction features. For each iteration of the backward

search, we did cross validation to evaluate the model. The

selected model gave a lower error.

Behind the TV Shows: Top-Rated Series Characterization and Audience Rating Prediction

Yushu Chai – Yiwen Xu – Zihui Liu

CS 229 – Machine Learning Professor Andrew Ng

In this project, we are going to use the viewers’ ratings

on IMDb website as an indicator of audience’s

preferences of various TV series, try to discover

features of well-received TV series and make

recommendations.

Motivation

The TV series data with record on IMDb were provided by

Andrej Krevl from Stanford SNAP Lab.

Data

Classification

Unsupervised Learning

Acknowledgements & References

Name Type

Genre Vector of Booleans

Release Year Numeric

# Seasons Numeric

# Ratings Numeric

# Critics Numeric

# Ep./Season Numeric

# Reviewers Numeric

Run Time Numeric

Aspect Ratio Categorical

Color Booleans

Comparisons Between ModelsModels and Discussions

Models and Discussions (Cont’d)

[1] Ng. Andrew, CS 229 Lecture notes.

[2] James, Gareth, et al. An introduction to statistical learning. New York: springer, 2013.

373-413

[3] N.A. (2015, Oct. 15). The Internet Movie Database. Retrieved from

http://www.imdb.com/

5. Unsupervised Learning

• K-means clustering. K = 3 because we would like to find

a more reasonable segment for the three classes above.

But other values of K will also be considered.

• PCA

The plots are projections of the data onto the first three

principle components, i.e. the scores for the first three

principal components. The observations of different

subgroups lie near each other in the low-dimensional

space.

Original Rating Class Counts # in this group Misclassification

Rate

< 8 Fair 65 52

0.088-9 Good 148 166

9-10 Very Popular 12 5

4. Classification

Based on the current dataset, we segment the ratings into

several categories: Fair (rating below 8), Good (8-9), and

Very Popular (9-10), to fit a classification model using L1-

regularized logistic regression. When predicting the ratings

of an upcoming TV show, this classification model can

predict the most likely category for it.

We want to thank Professor Ng for such an inspiring course and this project opportunity,

and CS 229 TA’s, in particular Youssef Ahres, for their help and support.

Linear Model Diagnostic Plots Feature Selection Process

RATED WHAT YOU DID LAST SUMMERs3.drafthouse.com/menus/Oct._2019_Final_Chandler__Drinks.pdf · 2019-10-30 · RATED PG-13 RATED G 235 cal RATED R RATED PG WHAT YOU DID LAST SUMMER

Fire Stopping Guide - Passive Fire Systems, Fyreflex Fire ... Selector Guide... · Fire Rated Pillows Fire Rated Batts Fire Rated Mortar Fire Rated Downlight Covers Fire Rated Access

The Power of Oil Palm2011, Guatemala was rated ninth among palm oil–exporting countries worldwide and second in Latin America, only behind Ecuador. The cultivated area dedicated

BIDDING REQUIREMENTS - Memphis · BIDDING REQUIREMENTS REMOVAL AND REPLACEMENT OF DAMAGED PLUMBING COMPONENTS BEHIND FIRE RATED CHASE WALLS IN 24 DWELLING UNITS AT BARRY TOWERS [TN

Charles Thomas Homes - Department of Energy · 2015-10-06 · Charles Thomas Homes ... drywall at hard-to-insulate spots such as behind the tubs and shower surrounds. ... rated faucets

· PCX2 335M. 250Vac , luF 250Vac , iuF rated : 250Vac 3300pF 3300pF JN, rated 250Vac AH, rated : 250Vac , BBI 130 series, rated : 250Vac Epcos Illinois Evox Rifa

PRODUCT MANUAL - SA spa · 2018. 12. 13. · Rated Power(infrared): 2500W（K095） Rated Power(steam): 3000W Rated Current: 13.2A Rated frequency: 50Hz Insulated Resistance: >20MΩ

306 Series Biaxial Load Frames - shorewestern.com · 306 Series Biaxial Load Frames Rigid two-column fatigue-rated tension/torsion load frames for biaxial material characterization

WALK BEHIND SCRUBBERd2z4qs2e3spnc1.cloudfront.net/product_file/file/1901/...TECHNICAL SPECIFICATIONS 3-1 CUTTER 24V 86039080 01/20/12 ITEM DIMENSION/CAPACITY Nominal power 1450 W Rated

Characterization of Hadoop Jobs using Unsupervised Learningsalsahpc.indiana.edu/CloudCom2010/slides/PDF...Hadoop at Yahoo! • Behind Every Click ! • 38,000+ Servers • Largest

Austria · Austria ranks among the most prosperous and innovative countries in the European Union. According to Eurostat’s Prosperity Index, Austria is rated second in the EU behind

GENERAL CATALOGUE - · PDF fileappliances, A+ and A++. ... in product design now means that the energy label must be updated ... EER rated COP rated EER rated COP rated

THE PEOPLE BEHIND BIRT THE PEOPLE BEHIND BIRT THE PEOPLE BEHIND BIRT THE PEOPLE BEHIND BIRT THE PEOPLE BEHIND BIRT THE PEOPLE BEHIND BIRT THE PEOPLE BEHIND

BUSHINGS: GENERAL INFORMATION - · PDF fileBUSHINGS: GENERAL INFORMATION 1.0 Electrical characteristics (Ir = rated current; Ur = rated voltage) 1.1Standard insulation levels Rated

Characterization Sheet · Temp.Chara. TCBias(1/2Rated) Application Series: Dimensions Temperature Characteristic Rated Voltage Capacitance Capacitance Tolerance Dissipation Factor

asset.conrad.com€¦ · max. case temperature Product data - type 574223 rated voltage rated output power rated output current max. case temperature Product data - type 574224 rated

Prices Rated

General Introductiondaoud/uploads/product_pdf/inside_… · Tst/Tn Locked Torque Tm/Tn Max Torque Rated Current Type Rated Output Full Load KW Rated Torque Rated Torque HP Speed r/min

Characterization of Selected Bed-Sediment-Bound Organic ... · in sediment toxicity and 12% rated fair or poor in sediment contaminants respectively (EPA 2007a). The sediment contaminant

Intelligent Fire Rated Technology - iCageicage.co.uk/files/iCage Catalogue.pdf · Intelligent Fire Rated Technology Intelligent Fire Rated Technology Part C - Moisture Protection

DC Motor Φ8 FF-K30WD-XXXXX · Type FC-260SA-XXXXX 12250 Characteristics* Rated voltage V 24.0 Rated power P N 6.43 Rated troque T N 3.51 Rated speed n N 17530 Rated current I N 0.46

FIRE RATED STEEL DOORS FIRE RATED WOOD DOORS FIRE …

Asynchronous Spindle Motors...2 Asynchronous motors with solid shaft Rated power output Rated speed Max. speed Rated torque Rated current Standard bearing Spindle bearing ASM 200 M

CHINT Italiachint.it/PDF/VDE_NBH8.pdfNBH8-40 Bemessungsspannung Rated voltage Bemessungsstrom Rated current Bemessungsfrequenz Rated frequency Bemessungsschaltvermögen Rated making

Abtech BPG Fire Rated Enclosures - Fire Rated Resistant Enclosures

01fire Rated

Automatically Classifying Self-Rated Personality …venus.cs.qc.edu/~gan/paper/Automatically-Classifying-Self-Rated...Automatically Classifying Self-Rated Personality Scores from Speech

Outcrop/Behind Outcrop Characterization of Deepwater ...archives.aapg.org/education/dist_lect/slides/2001_02/slatt.pdf · Outcrop/Behind Outcrop Characterization of Deepwater

Preparation and characterization of polystyrene based ... · The overall success of the membrane technology, how-ever, is lagging behind these expectations. In some applica-tions,

Self-Rated versus Clinician-Rated Assessment of

REPETITIVE AVALANCHE AND dv/dt RATED 400V, N-CHANNEL ... · data contained herein is a characterization of the component based on internal standards and is intended to demonstrate

New Safe One · 2020. 10. 9. · Serge Ferrari. Stamisol Safe One, made of glass fibre fabric with elastomer coating, is a vertical façade membrane behind cladding rated Euroclass

High-Speed Nondestructive Testing Methods for Mapping ... · •Detection and characterization of cavities, flaws, cracks, honeycombing, and grouting defects in or behind tunnel lining

Environmental, Energy Market, and Health Characterization ...€¦ · This unit is a pellet burning HH rated at 40 kW (137,000 Btu/hour). Combustion occurs on a round burner plate

3URILOH - heko-electronic.com€¦ · Model Rated Voltage Frequency Rated Current Rated Power Rated Speed Air Flow Air pressure Noise Level Perm.Amb. Temp. Weight Part NO. VAC Hz