8/18/2019 Homework 4 - Sebastian Rojas v1
Homework 4: Time Series Analysis of “Gone Girl” Daily Box Office
Sebastian Rojas
Regression and Multivariate Data Analysis
Prof. Jeffrey Simonoff, Fall 2014
Finding predictors of a film's box office is a complex problem that has haunted the film industry for years. In the last decade, though, Google search and social media have been praised as good thermometers of social phenomena. In fact, in June 2013, Andrea Chen, Google's principal industry analyst for media and entertainment, claimed that a movie's box office could be predicted as far as four weeks in advance using Google search.¹ For the purpose of this paper, I wanted to test this claim by combining the publicly available Google Trends tool with Facebook data and Twitter data. The movie analyzed for this paper was “Gone Girl”, directed by David Fincher and starring Ben Affleck and Rosamund Pike, released on October 3rd, 2014. However, given the nature of the publicly available data, this relation was analyzed on a day-to-day simultaneous basis, and not in the predictive way described by Chen.
The target variable of the analysis is the daily box office of the movie between October 10th, 2014 and November 8th, 2014 (30 days), taken from Box Office Mojo. Only 30 days are used because most of the predicting variables are only available for the 30 days prior to the moment the data is extracted. Data covering the complete period from the release date of the movie until now would be ideal, but is only available through paid social media analytics tools intended for businesses. The predicting variables are the following:

a) Google Trend Results for the search topic “Gone Girl”, with label “2014 Film”
Google separates topics that might share a title using labels, which lets us distinguish the film from the homonymous book on which it is based. Google Trends results are presented as observations in the range 0-100. The value 100 is the reference for the rest of the observations: it is assigned to the day with the highest number of searches in the defined period, and every other number is relative to that peak. Google doesn't allow setting customized dates. The only possible fine-tuning is to define the search as relative to the past week, past 30 days, 90 days, 3
Since we are trying to predict a money variable, and our hypothesis is that all the predictors are audience variables that should account for proportional changes in the target variable (they act as a sample of the overall population's interest in the movie), the relation between them was treated as multiplicative/multiplicative. Therefore, the target variable as well as the predictors were logged in base 10. The variables are then:
Predictors

Log(gti): Logged Google Trends
Log(FBuschange): Logged change in American fans of the Facebook page for the movie
Log(twGG): Logged number of mentions of “Gone Girl”
Log(tw@GG): Logged number of mentions of the official Twitter account of “Gone Girl”

Target Variable

Log(DDG): Logged Daily Domestic Gross
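Written out, the multiplicative assumption and the base-10 transform correspond to the form below. This is only a sketch of the model family: the coefficient symbols \beta_j and the error term \varepsilon are notation introduced here for illustration and do not appear in the original output.

```latex
\begin{align*}
\mathrm{DDG} &= 10^{\beta_0}\,
  \mathrm{gti}^{\beta_1}\,
  \mathrm{FBuschange}^{\beta_2}\,
  \mathrm{twGG}^{\beta_3}\,
  \mathrm{tw@GG}^{\beta_4}\,
  10^{\varepsilon} \\[4pt]
\log_{10}\mathrm{DDG} &= \beta_0
  + \beta_1 \log_{10}\mathrm{gti}
  + \beta_2 \log_{10}\mathrm{FBuschange}
  + \beta_3 \log_{10}\mathrm{twGG}
  + \beta_4 \log_{10}\mathrm{tw@GG}
  + \varepsilon
\end{align*}
```

Under this form each slope is an elasticity: multiplying a predictor by a factor c multiplies the expected daily gross by c raised to that predictor's slope.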
The scatterplots for the different variables are the following:
All four scatterplots appear to show a strong and significant relation between the predictors and the target variable, particularly Log(gti). If we run a best subsets regression, the output is the following:
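Best Subsets Regression: Log(DDG) versus Log(gti), Log(FBuschange), ...

Response is Log(DDG)

[In this output the digits 0, 6 and 7 did not survive extraction. A "?" marks a lost digit whose position is certain; other numbers show only their surviving digits. The X marks belong to the columns Log(gti), Log(FBuschange), Log(twGG), Log(tw@GG), but their column positions were lost.]

               R-Sq    R-Sq   Mallows
Vars   R-Sq   (adj)  (pred)        Cp        S   X marks
   1   85.9    85.4    84.1         .   .13252   X
   1   44.9    42.9    34.9      99.3    .2239   X
   2   88.?    8?.8    8?.?       1.8   .12134   X X
   2   8?.8    85.8    84.5        .1     .138   X X
   3   89.?    8?.?    84.8       3.?    .1212   X X X
   3   88.8    8?.5    84.8       3.?    .1229   X X X
   4   89.?    8?.3    83.4       5.?    .1242   X X X X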
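The quantities in the table above can be sketched in a few lines. The block below is a minimal illustration, not the Minitab procedure itself: it enumerates every non-empty subset (Minitab prints only the best two per size), it omits predicted R-squared, the data is synthetic stand-in data rather than the paper's, and the function names `solve`, `sse_of_fit` and `best_subsets` are hypothetical helpers introduced here.

```python
import itertools
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def sse_of_fit(rows, y):
    """Residual sum of squares of y regressed on rows (each row starts with 1.0)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def best_subsets(X, y, names):
    """Return R-sq, adjusted R-sq and Mallows' Cp for every non-empty predictor subset."""
    n, k = len(X), len(names)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse_full = sse_of_fit([[1.0] + list(r) for r in X], y) / (n - k - 1)
    out = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            sse = sse_of_fit([[1.0] + [r[c] for c in cols] for r in X], y)
            p = size + 1  # number of parameters, intercept included
            out.append({
                "vars": size,
                "predictors": [names[c] for c in cols],
                "r_sq": 1 - sse / sst,
                "r_sq_adj": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "mallows_cp": sse / mse_full - n + 2 * p,
            })
    return out


# Synthetic stand-in data (NOT the paper's data): 30 days, 4 logged predictors.
random.seed(0)
names = ["Log(gti)", "Log(FBuschange)", "Log(twGG)", "Log(tw@GG)"]
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(30)]
y = [4.0 + 2.0 * r[0] - 0.5 * r[1] + random.gauss(0, 0.1) for r in X]

rows = best_subsets(X, y, names)
for row in sorted(rows, key=lambda r: r["mallows_cp"])[:3]:
    print(row["vars"], row["predictors"], round(row["mallows_cp"], 2))
```

By construction, Mallows' Cp of the full model equals its number of parameters, which is why a smaller model with Cp close to (or below) its own parameter count is attractive.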
With these results, the best model is clearly the two-variable model that keeps Log(gti) and Log(FBuschange). Under this model Mallows' Cp is the smallest, and both predicted R² and adjusted R² are maximized. It is interesting to notice that when we take the simple linear model that only considers Log(gti), which from the scatterplots appears to be the predictor with the strongest relation to the target variable, Mallows' Cp is
The Durbin-Watson statistic for the regression is d = 1.2[?]
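For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below assumes the residuals are available as a plain list in time order; the function name `durbin_watson` and the toy inputs are illustrative, not taken from the original analysis.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    diffs = [b - a for a, b in zip(residuals, residuals[1:])]
    return sum(d * d for d in diffs) / sum(e * e for e in residuals)


# Toy residuals (not the paper's): perfectly alternating signs push d above 2,
# while residuals that never change give d = 0.
alternating = [1.0, -1.0, 1.0, -1.0]
constant = [2.0, 2.0, 2.0]
print(durbin_watson(alternating), durbin_watson(constant))
```

Values of d near 2 are consistent with no first-order autocorrelation, while values well below 2 point to positive autocorrelation.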
A runs test was also performed, with the following results:
Runs test for SRES1
Runs abo
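A runs test of this kind can be sketched as follows. This is a minimal version using the usual normal approximation; the function name `runs_test`, the `center` parameter (the cut-off K) and the convention that values equal to the cut-off count as "below" are assumptions made for this illustration, and the input is a toy series rather than the SRES1 residuals.

```python
import math


def runs_test(values, center=0.0):
    """Runs test above/below `center` using the normal approximation.

    Returns (observed runs, expected runs, z statistic, two-sided p-value).
    """
    above = [v > center for v in values]
    n1 = sum(above)              # observations above the cut-off
    n2 = len(above) - n1         # observations at or below the cut-off
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the cut-off")
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return runs, expected, z, p_value


# Toy series (not the paper's residuals): strict alternation gives too many runs.
runs, expected, z, p_value = runs_test([1, -1, 1, -1, 1, -1, 1, -1])
print(runs, expected, round(z, 3), round(p_value, 3))
```

Too few runs relative to the expected count suggests positive autocorrelation; too many suggests negative autocorrelation.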
This spike starts on Friday, and is one of the reasons why most movies premiere on Thursdays. Since the first observation corresponds to a Friday and these spikes are spaced one week apart, there is an autocorrelation of order 7.
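The link between a weekly spike and a lag-7 autocorrelation can be illustrated with a small sketch. The series below is hypothetical (a unit spike every seventh day over 30 days, starting on day 0), not the paper's residuals, and `autocorrelation` is a helper name introduced here.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    if not 0 < lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    numerator = sum(dev[t] * dev[t - lag] for t in range(lag, len(dev)))
    return numerator / sum(d * d for d in dev)


# Hypothetical 30-day series with a spike every 7th day, starting on day 0.
weekly_spike = [1.0 if day % 7 == 0 else 0.0 for day in range(30)]
r7 = autocorrelation(weekly_spike, 7)
r1 = autocorrelation(weekly_spike, 1)
print(round(r7, 3), round(r1, 3))
```

A pattern that repeats every seven observations shows up as a large positive autocorrelation at lag 7 even when the lag-1 autocorrelation is small or negative.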
In order to correct this, a seasonal indicator variable was added for the observations corresponding to a Friday. The output and residual plots for the new regression model are the following:
Regression Analysis: Log(DDG) versus Log(gti), Log(FBuschange), FR

[In this output the digits 0, 6 and 7 did not survive extraction. P-values, degrees of freedom, the coding line and the FR level labels are restored because they are unambiguous; every other number shows only its surviving digits.]

Method

Categorical predictor coding  (1, 0)

Analysis of Variance

Source              DF   Adj SS   Adj MS  F-Value  P-Value
Regression           3   3.3248    1.183    11.15    0.000
  Log(gti)           1    1.524    1.524     24.2    0.000
  Log(FBuschange)    1        .        .      11.    0.003
  FR                 1   .21893   .21893     31.8    0.000
Error               26    .1858       .8
Total               29    3.499

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
 .82858   94.9%     94.31%      93.39%

Coefficients

Term               Coef  SE Coef  T-Value  P-Value  VIF
Constant           4.49       .3      14.    0.000
Log(gti)          2.148     .139     15.5    0.000   2.
Log(FBuschange)    -.43     .131    -3.33    0.003   2.
FR
  1                .229       .4      5.5    0.000   1.

Regression Equation

FR
0   Log(DDG) = 4.49 + 2.148 Log(gti) - .43 Log(FBuschange)
1   Log(DDG) = 4.2 + 2.148 Log(gti) - .43 Log(FBuschange)

Fits and Diagnostics for Unusual Observations
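The Friday indicator used in the model above can be built directly from the calendar. The sketch below assumes the 30 daily observations run from October 10th, 2014 to November 8th, 2014, as stated for the target variable; the variable names `days` and `fr` are illustrative.

```python
from datetime import date, timedelta

# Assumed observation window: 30 consecutive days starting October 10th, 2014.
start = date(2014, 10, 10)
days = [start + timedelta(days=offset) for offset in range(30)]

# date.weekday() returns 0 for Monday through 6 for Sunday, so Friday is 4.
fr = [1 if day.weekday() == 4 else 0 for day in days]

for day, flag in zip(days[:8], fr[:8]):
    print(day.isoformat(), flag)
```

Because the first observation is itself a Friday, the indicator is 1 exactly at observations 1, 8, 15, 22 and 29, which matches the one-week spacing of the spikes.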
If we look at the autocorrelation function under this new model, all forms of autocorrelation seem to have gone away:
Similarly, the p-value in the runs test is above the level of significance, hence we fail to reject the null hypothesis of randomness (no autocorrelation):
Runs Test: SRES3
Runs test for SRES3
Runs abo
In conclusion, the estimated regression equation indicates that for the month of the analysis:
• Leaving everything else fixed, proportional increments in the Google Trends Index have multiplicative effects that are more or less equal to that proportional increment squared. This strong relation could be seen in the initial scatterplot. Unlike social network activity, performing a search on Google is most probably what everybody who goes to the movies does to find a theater, exhibition times, etc. Part of the exponential effect in the predictor could also be tied to the fact that a single search on Google could imply more than one ticket bought per search.
• Leaving everything else fixed, Fridays have a daily box office about 70% higher than the rest of the days of the week.
• Leaving everything else fixed, a proportional increase in Facebook likers of the official movie page is tied to a proportional decrease in box office that is slightly less than half the proportional increase in Facebook likers. This result is certainly surprising, and the difficulty of understanding it could either be linked to the possibility that the predictor helps to calibrate an overestimation in the relation between the Google Trends Index and daily box office, or speak to the fact that we still don't fully understand how behavior on social networks affects consumption, so that the effect was not properly coded in the regression.
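The three readings above follow from back-transforming the logged equation. The block below is a sketch of that arithmetic only: the symbols b_0 to b_3 are notation introduced here, and the approximate values (b_1 close to 2, a Friday multiplier close to 1.70, b_2 between -0.5 and 0) are taken from the statements in the bullets, not from the printed coefficients.

```latex
\begin{align*}
\widehat{\mathrm{DDG}} &= 10^{b_0}\cdot \mathrm{gti}^{\,b_1}\cdot
  \mathrm{FBuschange}^{\,b_2}\cdot 10^{\,b_3\,\mathrm{FR}} \\[4pt]
\text{gti multiplied by } c:&\quad
  \widehat{\mathrm{DDG}} \text{ multiplied by } c^{\,b_1},
  \qquad b_1 \approx 2 \;\Rightarrow\; c^{2}
  \quad (\text{e.g. } 1.10^{2} = 1.21) \\[4pt]
\text{Friday vs. other days}:&\quad
  10^{\,b_3} \approx 1.70 \;\Longleftrightarrow\;
  b_3 \approx \log_{10} 1.70 \approx 0.23 \\[4pt]
\text{FBuschange multiplied by } c:&\quad
  \widehat{\mathrm{DDG}} \text{ multiplied by } c^{\,b_2},
  \qquad -0.5 < b_2 < 0
\end{align*}
```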
1.