a report on baseball using r
UC DAVIS
FALL STA 141 LANG
FINAL PROJECT
American Baseball and
A Collection of Thoughts
Author:
Ray Peralta
ID: 997579589
December 17, 2014
Contents
1 HW 6
1.1 Part 1: Results
1.1.1 Part 1 i
1.1.2 Part 1 ii
1.2 Part 2: Results
1.2.1 Problem 1
1.2.2 Problem 2
1.2.3 Problem 3
1.2.4 Problem 4
1.2.5 Problem 5
1.2.6 Problem 6
1.2.7 Problem 7
Problem 7A
Problem 7B
Problem 7C
1.2.8 Problem 8
1.2.9 Problem 9
Problem 9A
Problem 9B
Problem 9C
1.2.10 Problem 10
2 APPENDIX: Plots for HW6
2.1 Problem 5
2.1.1 Graphs: Number of Games played by World Series Winners and Losers
2.2 Problem 7C
2.2.1 Graphs: Baseball Team Payrolls 1971-2013
2.3 Problem 9A
2.3.1 Graphs: American League Payroll
2.3.2 Graphs: National League Payroll
2.4 Problem 9B
2.4.1 Graphs: Division ALC Team Payroll
2.4.2 Graphs: Division ALE Team Payroll
2.4.3 Graphs: Division ALW Team Payroll
2.4.4 Graphs: Division NLC Team Payroll
2.4.5 Graphs: Division NLE Team Payroll
2.4.6 Graphs: Division NLW Team Payroll
2.5 Problem 10
2.5.1 Graphs: Distribution of Home Runs from 1875-2013
3 APPENDIX: EXPLICIT R CODE
3.1 Part 1
3.1.1 Part 1i
3.1.2 Part 1ii
3.2 Functions used in Part 2
3.2.1 sortTeamReturnSalary(dataFrame,name)
3.2.2 sortTeamYearly(dataFrame,name)
3.2.3 sortYearlyPay(dataFrame,year)
3.2.4 frameFill(list)
3.2.5 sortTeam
3.2.6 sortYearMean(dataFrame, year)
3.2.7 meanFrame(list)
3.2.8 plotStart(plotData,meanData,listNames)
3.2.9 plotInformation(dataFrame)
3.2.10 divisionPlot(division,leagueAL,leagueNL)
3.2.11 groupFrame(list)
3.2.12 plot.HR(list)
3.3 Part 2
3.3.1 Problem 1
3.3.2 Problem 2
3.3.3 Problem 3
3.3.4 Problem 4
3.3.5 Problem 5
3.3.6 Problem 6
3.3.7 Problem 7
Problem 7A
Problem 7B
Problem 7C
3.3.8 Problem 8
3.3.9 Problem 9
Problem 9A
Problem 9B
Problem 9C
3.3.10 Problem 10
4 Extra Credit: A Collection of Thoughts
4.1 A Small Quotes from Piazza
4.2 Results
4.2.1 Home Runs and Handedness
Based on Population: All Available Players
Based on Population: Home Run Hitters
4.2.2 Pitchers: Throwing Home Runs and Handedness
4.2.3 Errors in the World Series
4.2.4 Hall of Fame and Salary
4.2.5 Positions and Salary
A particular group: Single Position Players
Salaries by Position
4.2.6 All about the All Stars
The All Star and Non Star Salaries
Salary Disparities: Amongst All Stars and Non Stars
Salary Differences: Amongst All Stars
4.2.7 The All Star Algorithm
The Algorithm
5 EXPLICIT: R CODE
5.1 Functions Used in Bonus Section
5.1.1 matchFrame(dataFrame,xFrame)
5.1.2 groupNameReturnHR(dataFrame,name)
5.1.3 matchFrameFame(dataFrame,xFrame)
5.1.4 matchUniqueFrame(dataFrame,xFrame)
5.1.5 uniquePositions(dataFrame,names)
5.1.6 plotPosSalary(uniqueSalary)
5.1.7 plotPosSalaryAll(uniqueSalary)
5.1.8 matchNonUniqueFrame(dataFrame,xFrame)
5.1.9 sortSalaryYearly
5.1.10 Home Runs and Handedness
Based on Population: All Available Players
Based on Population: Home Run Hitters
5.1.11 Throwing Home Runs and Handedness
5.1.12 Errors in the World Series
5.1.13 Hall of Fame and Salary
5.1.14 Positions and Salary
A particular group: Single Position Players
Salaries by Position
5.1.15 All about the All Stars
The All Star Salary
Salary Disparities: Amongst All Stars and Non Stars
Salary Disparities: Amongst All Stars
Chapter 1
HW 6
1.1 Part 1: Results
1.1.1 Part 1 i
SHELL RESULTS:
1263519 total Outbound Flights
468200 LAX Outbound Flights
367914 SFO Outbound Flights
251290 JFK Outbound Flights
89821 OAK Outbound Flights
86294 SMF Outbound Flights
Running Time: 0m22.633s
R RESULTS:
584916 total Outbound Flights
222029 LAX Outbound Flights
169734 SFO Outbound Flights
105097 JFK Outbound Flights
44911 OAK Outbound Flights
43145 SMF Outbound Flights
Running Time: 6m13.820s
1.1.2 Part 1 ii
Results:
12489332 TOTAL ORIGIN Out-Bound & In-Bound
468211 LAX Out-Bound & In-Bound Flights
241228 JFK Out-Bound & In-Bound Flights
362781 SFO Out-Bound & In-Bound Flights
89819 OAK Out-Bound & In-Bound Flights
86293 SMF Out-Bound & In-Bound Flights
Shell Running Time: 0m47.440s
R Running Time: 1m13.290s
total time: 2m00.730s
1.2 Part 2: Results
1.2.1 Problem 1
1. What years does the data cover? Are there data for each of these years?
[1] 1871 2013
The database covers the years 1871 to 2013, and every year in that range
has data in at least one table. Coverage varies by table, however: the
Pitching table ranges from 1871 to 2013, while the Salaries table only
ranges from 1985 to 2013, so data on some factors are missing for some
years.
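A minimal sketch of how the per-table coverage could be checked, assuming each table has been read into a data frame with a yearID column. The data frames below are toy stand-ins, not the actual database tables, and the names pitching, salaries, and coverage are illustrative only.

```r
# Toy stand-ins for two tables (assumption: each real table has a yearID column)
pitching <- data.frame(yearID = c(1871, 1900, 2013))
salaries <- data.frame(yearID = c(1985, 1999, 2013))
tables <- list(Pitching = pitching, Salaries = salaries)

# First and last year covered by each table, one row per table
coverage <- t(sapply(tables, function(tbl) range(tbl$yearID)))
colnames(coverage) <- c("first", "last")
print(coverage)

# Years inside the overall span that have no rows in the salary table
all_years <- seq(min(coverage[, "first"]), max(coverage[, "last"]))
missing_salary_years <- setdiff(all_years, unique(salaries$yearID))
```

With the real tables, the years in all_years that are absent from a given table are the years for which that factor has no data.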
1.2.2 Problem 2
2. How many (unique) people are included in the database?
How many are players, managers, etc?
[1] 682
[1] 18354
[1] 19036
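There are 682 unique managers and 18354 unique baseball players, for a
grand total of 19036 people. Since 19036 = 682 + 18354, this total is a
count of unique people only if no one appears as both a player and a manager.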
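The grand total above equals the sum of the two counts. A small sketch of why a union can be the safer way to count people, using made-up IDs (the real ID columns and tables are not shown here, so every name below is an assumption):

```r
# Hypothetical person IDs; "x100" appears in both roles
manager_ids <- c("m001", "m002", "x100", "m002")
player_ids  <- c("p001", "p002", "x100", "p001")

n_managers <- length(unique(manager_ids))
n_players  <- length(unique(player_ids))

# Adding the two counts includes "x100" twice
n_sum <- n_managers + n_players

# union() removes duplicates across both vectors
n_people <- length(union(manager_ids, player_ids))

cat("sum of counts:", n_sum, " unique people:", n_people, "\n")
```

If the two numbers agree on the real data, no one holds both roles and the simple sum is correct.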
1.2.3 Problem 3
3. What team won the World Series in 2000?
2334 Y NYA 2000
The team that won was NYA (the New York Yankees).
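A lookup of this kind can be sketched as a simple row filter. The data frame below is a toy stand-in, and the column names yearID, teamID, and WSWin are assumed to match the columns visible in the output above.

```r
# Toy stand-in for the team table (team codes are invented)
teams <- data.frame(
  yearID = c(1999, 1999, 2000, 2000),
  teamID = c("AAA", "BBB", "CCC", "DDD"),
  WSWin  = c("Y", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

# which() drops any rows where WSWin is NA instead of returning NA
is_winner_2000 <- teams$yearID == 2000 & teams$WSWin == "Y"
winner_2000 <- teams$teamID[which(is_winner_2000)]
print(winner_2000)
```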
1.2.4 Problem 4
4. What teams lost the World Series each year?
dataSub.teamID dataSub.yearID
1 NY4 1884
2 SL4 1885
3 CHN 1885
4 CHN 1886
5 SL4 1887
6 SL4 1888
7 BR3 1889
8 LS2 1890
9 BRO 1890
10 PIT 1903
11 PHA 1905
12 CHN 1906
13 DET 1907
14 DET 1908
15 DET 1909
16 CHN 1910
17 NY1 1911
18 NY1 1912
19 NY1 1913
20 PHA 1914
21 PHI 1915
22 BRO 1916
23 NY1 1917
24 CHN 1918
25 CHA 1919
26 BRO 1920
27 NYA 1921
28 NYA 1922
29 NY1 1923
30 NY1 1924
31 WS1 1925
32 NYA 1926
33 PIT 1927
34 SLN 1928
35 CHN 1929
36 SLN 1930
37 PHA 1931
38 CHN 1932
39 WS1 1933
40 DET 1934
41 CHN 1935
42 NY1 1936
43 NY1 1937
44 CHN 1938
45 CIN 1939
46 DET 1940
47 BRO 1941
48 NYA 1942
49 SLN 1943
50 SLA 1944
51 CHN 1945
52 BOS 1946
53 BRO 1947
54 BSN 1948
55 BRO 1949
56 PHI 1950
57 NY1 1951
58 BRO 1952
59 BRO 1953
60 CLE 1954
61 NYA 1955
62 BRO 1956
63 NYA 1957
64 ML1 1958
65 CHA 1959
66 NYA 1960
67 CIN 1961
68 SFN 1962
69 NYA 1963
70 NYA 1964
71 MIN 1965
72 LAN 1966
73 BOS 1967
74 SLN 1968
75 BAL 1969
76 CIN 1970
77 BAL 1971
78 CIN 1972
79 NYN 1973
80 LAN 1974
81 BOS 1975
82 NYA 1976
83 LAN 1977
84 LAN 1978
85 BAL 1979
86 KCA 1980
87 NYA 1981
88 ML4 1982
89 PHI 1983
90 SDN 1984
91 SLN 1985
92 BOS 1986
93 SLN 1987
94 OAK 1988
95 SFN 1989
96 OAK 1990
97 ATL 1991
98 ATL 1992
99 PHI 1993
100 CLE 1995
101 ATL 1996
102 CLE 1997
103 SDN 1998
104 ATL 1999
105 NYN 2000
106 NYA 2001
107 SFN 2002
108 NYA 2003
109 SLN 2004
110 HOU 2005
111 DET 2006
112 COL 2007
113 TBA 2008
114 PHI 2009
115 TEX 2010
116 TEX 2011
117 DET 2012
118 SLN 2013
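One way such a list of losing teams could be produced is sketched below. It assumes the team table flags league champions in a column LgWin and series winners in WSWin, as the Lahman Teams table does; the data and team codes are invented.

```r
# Toy stand-in for the team table; 1994 has no series, so the flags are NA
teams <- data.frame(
  yearID = c(1993, 1993, 1993, 1994, 1994, 1995, 1995),
  teamID = c("AAA", "BBB", "CCC", "AAA", "BBB", "AAA", "CCC"),
  LgWin  = c("Y", "Y", "N", NA, NA, "Y", "Y"),
  WSWin  = c("Y", "N", "N", NA, NA, "N", "Y"),
  stringsAsFactors = FALSE
)

# A series loser won its league but did not win the World Series
is_loser <- teams$LgWin == "Y" & teams$WSWin == "N"

# which() skips the NA rows, so years without a series drop out
losers <- teams[which(is_loser), c("teamID", "yearID")]
print(losers)
```

Under this definition a year can contribute two losers when neither league champion is recorded as the winner, which is consistent with the two entries for 1885 and for 1890 in the table above.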
1.2.5 Problem 5
5. Do you see a relationship between the number of games won in a season
and winning the World Series?
*To the reader: Please refer to the appendix for the plots.
Since 1984, the teams that win the World Series have typically won a
large number of games. To check this trend, however, we should also
look at the teams that did NOT win the World Series. The graphs show a
clear similarity between the two groups, so we cannot say that winning
more games decides who wins a World Series.
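The comparison between the two groups can also be made numerically. The sketch below uses invented win totals and assumes a column W for season wins and WSWin for the series result; it is an illustration of the comparison, not the code behind the appendix plots.

```r
# Toy data: one series winner and one series loser per year
teams <- data.frame(
  yearID = c(2010, 2010, 2011, 2011, 2012, 2012),
  teamID = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  W      = c(92, 90, 90, 96, 94, 88),
  WSWin  = c("Y", "N", "Y", "N", "Y", "N"),
  stringsAsFactors = FALSE
)

# Mean season wins for winners ("Y") and losers ("N")
wins_by_group <- aggregate(W ~ WSWin, data = teams, FUN = mean)
print(wins_by_group)

# Difference in mean wins between the two groups
mean_gap <- wins_by_group$W[wins_by_group$WSWin == "Y"] -
  wins_by_group$W[wins_by_group$WSWin == "N"]
```

A gap that is small relative to the spread of wins within each group supports the conclusion that the two groups look alike; boxplot(W ~ WSWin, data = teams) gives the same comparison visually.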
1.2.6 Problem 6
6. In 2003, what were the three highest salaries? (We refer here to unique
salaries, i.e., more than one player might be paid one of these
salaries.)
[1] 22000000 20000000 18700000
The three highest salaries are $22,000,000, $20,000,000, and $18,700,000.
(Sidenote: Wow... that is quite ridiculous.)
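Because the question asks for unique salaries, duplicates have to be removed before taking the top three. A minimal sketch with invented figures, assuming a salary table with yearID and salary columns:

```r
# Toy salary table; two players share the 9e6 salary in 2003
salaries <- data.frame(
  yearID = c(2003, 2003, 2003, 2003, 2003, 2002),
  salary = c(5e6, 9e6, 9e6, 7e6, 1e6, 2e7)
)

# Keep 2003 only, drop repeated values, sort high to low, take three
salaries_2003 <- salaries$salary[salaries$yearID == 2003]
top3 <- head(sort(unique(salaries_2003), decreasing = TRUE), 3)
print(top3)
```

Without unique(), the shared 9e6 salary would take two of the three slots.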
1.2.7 Problem 7
Problem 7A
A) For 1999, compute the total payroll
of each of the different teams.
[1] "ANA" "55388166"
[1] "ARI" "68703999"
[1] "ATL" "73140000"
[1] "BAL" "80605863"
[1] "BOS" "63497500"
[1] "CHA" "25620000"
[1] "CHN" "62343000"
[1] "CIN" "33962761"
[1] "CLE" "72978462"
[1] "COL" "61935837"
[1] "DET" "36489666"
[1] "FLO" "21085000"
[1] "HOU" "54914000"
[1] "KCA" "26225000"
[1] "LAN" "80862453"
[1] "MIL" "43377395"
[1] "MIN" "21257500"
[1] "MON" "17903000"
[1] "NYA" "86734359"
[1] "NYN" "65092092"
[1] "OAK" "24431833"
[1] "PHI" "31692500"
[1] "PIT" "24697666"
[1] "SDN" "49768179"
[1] "SEA" "54125003"
[1] "SFN" "46595057"
[1] "SLN" "49778195"
[1] "TBA" "38870000"
[1] "TEX" "76709931"
[1] "TOR" "45444333"
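Team payrolls like these are sums of player salaries grouped by team. The sketch below shows one possible way to compute them with aggregate(), both for a single year and for every team and year at once. The data are invented and the column names yearID, teamID, and salary are assumptions; this is not the report's own sortTeamReturnSalary or sortYearlyPay code.

```r
# Toy salary table (invented teams and amounts)
salaries <- data.frame(
  yearID = c(1998, 1999, 1999, 1999, 1999),
  teamID = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  salary = c(100, 200, 300, 400, 50),
  stringsAsFactors = FALSE
)

# Total payroll per team for 1999 only
payroll_1999 <- aggregate(salary ~ teamID,
                          data = subset(salaries, yearID == 1999),
                          FUN = sum)
print(payroll_1999)

# Total payroll for every team and year with salary data
payroll_all <- aggregate(salary ~ teamID + yearID, data = salaries, FUN = sum)
print(payroll_all)
```

This keeps the totals numeric, which makes later sorting and plotting straightforward; the quoted values in the output above suggest the totals were printed as character strings.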
Problem 7B
B) Next, compute the team payrolls for all years in the database for
which we have salary information.
[1] "ATL" "1985" "14807000"
[1] "ATL" "1986" "17102786"
[1] "ATL" "1987" "16544560"
[1] "ATL" "1988" "12728174"
[1] "ATL" "1989" "11112334"
[1] "ATL" "1990" "14555501"
[1] "ATL" "1991" "18403500"
[1] "ATL" "1992" "34625333"
[1] "ATL" "1993" "41641417"
[1] "ATL" "1994" "49383513"
[1] "ATL" "1995" "47235445"
[1] "ATL" "1996" "49698500"
[1] "ATL" "1997" "52278500"
[1] "ATL" "1998" "61186000"
[1] "ATL" "1999" "73140000"
[1] "ATL" "2000" "84537836"
[1] "ATL" "2001" "91936166"
[1] "ATL" "2002" "92870367"
[1] "ATL" "2003" "106243667"
[1] "ATL" "2004" "90182500"
[1] "ATL" "2005" "86457302"
[1] "ATL" "2006" "90156876"
[1] "ATL" "2007" "87290833"
[1] "ATL" "2008" "102365683"
[1] "ATL" "2009" "96726166"
[1] "ATL" "2010" "84423666"
[1] "ATL" "2011" "87002692"
[1] "ATL" "2012" "82829942"
[1] "ATL" "2013" "87871525"
[1] "BAL" "1985" "11560712"
[1] "BAL" "1986" "13001258"
[1] "BAL" "1987" "13900273"
[1] "BAL" "1988" "13532075"
[1] "BAL" "1989" "8275167"
[1] "BAL" "1990" "9680084"
[1] "BAL" "1991" "17519000"
[1] "BAL" "1992" "23780667"
[1] "BAL" "1993" "29096500"
[1] "BAL" "1994" "38849769"
[1] "BAL" "1995" "43942521"
[1] "BAL" "1996" "54490315"
[1] "BAL" "1997" "58516400"
[1] "BAL" "1998" "72355634"
[1] "BAL" "1999" "80605863"
[1] "BAL" "2000" "81447435"
[1] "BAL" "2001" "67599540"
[1] "BAL" "2002" "60493487"
[1] "BAL" "2003" "73877500"
[1] "BAL" "2004" "51623333"
[1] "BAL" "2005" "73914333"
[1] "BAL" "2006" "72585582"
[1] "BAL" "2007" "93174808"
[1] "BAL" "2008" "67196246"
[1] "BAL" "2009" "67101666"
[1] "BAL" "2010" "81612500"
[1] "BAL" "2011" "85304038"
[1] "BAL" "2012" "77353999"
[1] "BAL" "2013" "84393333"
[1] "BOS" "1985" "10897560"
[1] "BOS" "1986" "14402239"
[1] "BOS" "1987" "10144167"
[1] "BOS" "1988" "13896092"
[1] "BOS" "1989" "17481748"
[1] "BOS" "1990" "20558333"
[1] "BOS" "1991" "35167500"
[1] "BOS" "1992" "43610584"
[1] "BOS" "1993" "37120583"
[1] "BOS" "1994" "37859084"
[1] "BOS" "1995" "32455518"
[1] "BOS" "1996" "42393500"
[1] "BOS" "1997" "43558750"
[1] "BOS" "1998" "56757000"
[1] "BOS" "1999" "63497500"
[1] "BOS" "2000" "77940333"
[1] "BOS" "2001" "110035833"
[1] "BOS" "2002" "108366060"
[1] "BOS" "2003" "99946500"
[1] "BOS" "2004" "127298500"
[1] "BOS" "2005" "123505125"
[1] "BOS" "2006" "120099824"
[1] "BOS" "2007" "143026214"
[1] "BOS" "2008" "133390035"
[1] "BOS" "2009" "121345999"
[1] "BOS" "2010" "162447333"
[1] "BOS" "2011" "161762475"
[1] "BOS" "2012" "173186617"
[1] "BOS" "2013" "151530000"
[1] "CAL" "1985" "14427894"
[1] "CAL" "1986" "14427258"
[1] "CAL" "1987" "12843499"
[1] "CAL" "1988" "11947388"
[1] "CAL" "1989" "15097833"
[1] "CAL" "1990" "21720000"
[1] "CAL" "1991" "33060001"
[1] "CAL" "1992" "34749334"
[1] "CAL" "1993" "28588334"
[1] "CAL" "1994" "25156218"
[1] "CAL" "1995" "31223171"
[1] "CAL" "1996" "28738000"
[1] "CHA" "1985" "9846178"
[1] "CHA" "1986" "10418819"
[1] "CHA" "1987" "10641843"
[1] "CHA" "1988" "6390000"
[1] "CHA" "1989" "7265410"
[1] "CHA" "1990" "9491500"
[1] "CHA" "1991" "16919667"
[1] "CHA" "1992" "30160833"
[1] "CHA" "1993" "39696166"
[1] "CHA" "1994" "39183836"
[1] "CHA" "1995" "46961282"
[1] "CHA" "1996" "45139500"
[1] "CHA" "1997" "57740000"
[1] "CHA" "1998" "38335000"
[1] "CHA" "1999" "25620000"
[1] "CHA" "2000" "31133500"
[1] "CHA" "2001" "65653667"
[1] "CHA" "2002" "57052833"
[1] "CHA" "2003" "51010000"
[1] "CHA" "2004" "65212500"
[1] "CHA" "2005" "75178000"
[1] "CHA" "2006" "102750667"
[1] "CHA" "2007" "108671833"
[1] "CHA" "2008" "121189332"
[1] "CHA" "2009" "96068500"
[1] "CHA" "2010" "105530000"
[1] "CHA" "2011" "127789000"
[1] "CHA" "2012" "96919500"
[1] "CHA" "2013" "120065277"
[1] "CHN" "1985" "12702917"
[1] "CHN" "1986" "17208165"
[1] "CHN" "1987" "14307999"
[1] "CHN" "1988" "13119198"
[1] "CHN" "1989" "10668000"
[1] "CHN" "1990" "13624000"
[1] "CHN" "1991" "23175667"
[1] "CHN" "1992" "29829686"
[1] "CHN" "1993" "39386666"
[1] "CHN" "1994" "36287333"
[1] "CHN" "1995" "29505834"
[1] "CHN" "1996" "33081000"
[1] "CHN" "1997" "42155333"
[1] "CHN" "1998" "50838000"
[1] "CHN" "1999" "62343000"
[1] "CHN" "2000" "60539333"
[1] "CHN" "2001" "64715833"
[1] "CHN" "2002" "75690833"
[1] "CHN" "2003" "79868333"
[1] "CHN" "2004" "90560000"
[1] "CHN" "2005" "87032933"
[1] "CHN" "2006" "94424499"
[1] "CHN" "2007" "99670332"
[1] "CHN" "2008" "118345833"
[1] "CHN" "2009" "134809000"
[1] "CHN" "2010" "146609000"
[1] "CHN" "2011" "125047329"
[1] "CHN" "2012" "88197033"
[1] "CHN" "2013" "100567726"
[1] "CIN" "1985" "8359917"
[1] "CIN" "1986" "11906388"
[1] "CIN" "1987" "9281500"
[1] "CIN" "1988" "8888409"
[1] "CIN" "1989" "11072000"
[1] "CIN" "1990" "14370000"
[1] "CIN" "1991" "26305333"
[1] "CIN" "1992" "35931499"
[1] "CIN" "1993" "44879666"
[1] "CIN" "1994" "40961833"
[1] "CIN" "1995" "43144670"
[1] "CIN" "1996" "42526334"
[1] "CIN" "1997" "49768000"
[1] "CIN" "1998" "23005000"
[1] "CIN" "1999" "33962761"
[1] "CIN" "2000" "46867200"
[1] "CIN" "2001" "48986000"
[1] "CIN" "2002" "45050390"
[1] "CIN" "2003" "59355667"
[1] "CIN" "2004" "46615250"
[1] "CIN" "2005" "61892583"
[1] "CIN" "2006" "60909519"
[1] "CIN" "2007" "68524980"
[1] "CIN" "2008" "74117695"
[1] "CIN" "2009" "73558500"
[1] "CIN" "2010" "71761542"
[1] "CIN" "2011" "75947134"
[1] "CIN" "2012" "82203616"
[1] "CIN" "2013" "106404462"
[1] "CLE" "1985" "6551666"
[1] "CLE" "1986" "7809500"
[1] "CLE" "1987" "8513750"
[1] "CLE" "1988" "8936500"
[1] "CLE" "1989" "9094500"
[1] "CLE" "1990" "14487000"
[1] "CLE" "1991" "17635000"
[1] "CLE" "1992" "9373044"
[1] "CLE" "1993" "18561000"
[1] "CLE" "1994" "30490500"
[1] "CLE" "1995" "37937835"
[1] "CLE" "1996" "48107360"
[1] "CLE" "1997" "56802460"
[1] "CLE" "1998" "60800166"
[1] "CLE" "1999" "72978462"
[1] "CLE" "2000" "75880771"
[1] "CLE" "2001" "93152001"
[1] "CLE" "2002" "78909449"
[1] "CLE" "2003" "48584834"
[1] "CLE" "2004" "34319300"
[1] "CLE" "2005" "41502500"
[1] "CLE" "2006" "56031500"
[1] "CLE" "2007" "61673267"
[1] "CLE" "2008" "78970066"
[1] "CLE" "2009" "81579166"
[1] "CLE" "2010" "61203966"
[1] "CLE" "2011" "48776566"
[1] "CLE" "2012" "78430300"
[1] "CLE" "2013" "75771800"
[1] "DET" "1985" "10348143"
[1] "DET" "1986" "12335714"
[1] "DET" "1987" "12122881"
[1] "DET" "1988" "12869571"
[1] "DET" "1989" "15146404"
[1] "DET" "1990" "17593238"
[1] "DET" "1991" "23838333"
[1] "DET" "1992" "27322834"
[1] "DET" "1993" "38150165"
[1] "DET" "1994" "41446501"
[1] "DET" "1995" "37044168"
[1] "DET" "1996" "23438000"
[1] "DET" "1997" "17272000"
[1] "DET" "1998" "24065000"
[1] "DET" "1999" "36489666"
[1] "DET" "2000" "58265167"
[1] "DET" "2001" "53416167"
[1] "DET" "2002" "55048000"
[1] "DET" "2003" "49168000"
[1] "DET" "2004" "46832000"
[1] "DET" "2005" "69092000"
[1] "DET" "2006" "82612866"
[1] "DET" "2007" "94800369"
[1] "DET" "2008" "137685196"
[1] "DET" "2009" "115085145"
[1] "DET" "2010" "122864928"
[1] "DET" "2011" "105700231"
[1] "DET" "2012" "132300000"
[1] "DET" "2013" "145989500"
[1] "HOU" "1985" "9993051"
[1] "HOU" "1986" "9873276"
[1] "HOU" "1987" "12608371"
[1] "HOU" "1988" "12286167"
[1] "HOU" "1989" "15029500"
[1] "HOU" "1990" "18330000"
[1] "HOU" "1991" "12852500"
[1] "HOU" "1992" "15407500"
[1] "HOU" "1993" "30210500"
[1] "HOU" "1994" "33126000"
[1] "HOU" "1995" "34169834"
[1] "HOU" "1996" "28487000"
[1] "HOU" "1997" "34777500"
[1] "HOU" "1998" "42374000"
[1] "HOU" "1999" "54914000"
[1] "HOU" "2000" "51289111"
[1] "HOU" "2001" "60612667"
[1] "HOU" "2002" "63448417"
[1] "HOU" "2003" "71040000"
[1] "HOU" "2004" "75397000"
[1] "HOU" "2005" "76779000"
[1] "HOU" "2006" "88694435"
[1] "HOU" "2007" "87759000"
[1] "HOU" "2008" "88930414"
[1] "HOU" "2009" "102996414"
[1] "HOU" "2010" "92355500"
[1] "HOU" "2011" "70694000"
[1] "HOU" "2012" "60651000"
[1] "HOU" "2013" "17890700"
[1] "KCA" "1985" "9321179"
[1] "KCA" "1986" "13043698"
[1] "KCA" "1987" "11828056"
[1] "KCA" "1988" "14556562"
[1] "KCA" "1989" "18683568"
[1] "KCA" "1990" "23361084"
[1] "KCA" "1991" "26319834"
[1] "KCA" "1992" "33893834"
[1] "KCA" "1993" "41346167"
[1] "KCA" "1994" "40541334"
[1] "KCA" "1995" "29532834"
[1] "KCA" "1996" "20281250"
[1] "KCA" "1997" "34655000"
[1] "KCA" "1998" "36862500"
[1] "KCA" "1999" "26225000"
[1] "KCA" "2000" "23433000"
[1] "KCA" "2001" "35422500"
[1] "KCA" "2002" "47257000"
[1] "KCA" "2003" "40518000"
[1] "KCA" "2004" "47609000"
[1] "KCA" "2005" "36881000"
[1] "KCA" "2006" "47294000"
[1] "KCA" "2007" "67116500"
[1] "KCA" "2008" "58245500"
[1] "KCA" "2009" "70519333"
[1] "KCA" "2010" "71405210"
[1] "KCA" "2011" "35712000"
[1] "KCA" "2012" "60916225"
[1] "KCA" "2013" "80091725"
[1] "LAN" "1985" "10967917"
[1] "LAN" "1986" "14913776"
[1] "LAN" "1987" "13675403"
[1] "LAN" "1988" "16850515"
[1] "LAN" "1989" "21071562"
[1] "LAN" "1990" "21318704"
[1] "LAN" "1991" "32790664"
[1] "LAN" "1992" "44788166"
[1] "LAN" "1993" "39331999"
[1] "LAN" "1994" "38000001"
[1] "LAN" "1995" "39273201"
[1] "LAN" "1996" "35355000"
[1] "LAN" "1997" "45380304"
[1] "LAN" "1998" "48820000"
[1] "LAN" "1999" "80862453"
[1] "LAN" "2000" "87924286"
[1] "LAN" "2001" "109105953"
[1] "LAN" "2002" "94850953"
[1] "LAN" "2003" "105572620"
[1] "LAN" "2004" "92902001"
[1] "LAN" "2005" "83039000"
[1] "LAN" "2006" "98447187"
[1] "LAN" "2007" "108454524"
[1] "LAN" "2008" "118588536"
[1] "LAN" "2009" "100414592"
[1] "LAN" "2010" "95358016"
[1] "LAN" "2011" "104188999"
[1] "LAN" "2012" "95143575"
[1] "LAN" "2013" "223362196"
[1] "MIN" "1985" "5764821"
[1] "MIN" "1986" "8748167"
[1] "MIN" "1987" "6397500"
[1] "MIN" "1988" "12462666"
[1] "MIN" "1989" "15531666"
[1] "MIN" "1990" "14602000"
[1] "MIN" "1991" "23361833"
[1] "MIN" "1992" "28027834"
[1] "MIN" "1993" "28217933"
[1] "MIN" "1994" "28438500"
[1] "MIN" "1995" "25410500"
[1] "MIN" "1996" "23117000"
[1] "MIN" "1997" "34072500"
[1] "MIN" "1998" "27927500"
[1] "MIN" "1999" "21257500"
[1] "MIN" "2000" "16519500"
[1] "MIN" "2001" "24130000"
[1] "MIN" "2002" "40425000"
[1] "MIN" "2003" "55505000"
[1] "MIN" "2004" "53585000"
[1] "MIN" "2005" "56186000"
[1] "MIN" "2006" "63396006"
[1] "MIN" "2007" "71439500"
[1] "MIN" "2008" "56932766"
[1] "MIN" "2009" "65299266"
[1] "MIN" "2010" "97559166"
[1] "MIN" "2011" "112737000"
[1] "MIN" "2012" "94085000"
[1] "MIN" "2013" "75337500"
[1] "ML4" "1985" "11284107"
[1] "ML4" "1986" "9943642"
[1] "ML4" "1987" "7293224"
[1] "ML4" "1988" "8402000"
[1] "ML4" "1989" "11533000"
[1] "ML4" "1990" "19719167"
[1] "ML4" "1991" "23115500"
[1] "ML4" "1992" "31013667"
[1] "ML4" "1993" "23806834"
[1] "ML4" "1994" "24350500"
[1] "ML4" "1995" "17798825"
[1] "ML4" "1996" "21730000"
[1] "ML4" "1997" "23655338"
[1] "MON" "1985" "9470166"
[1] "MON" "1986" "11103600"
[1] "MON" "1987" "6942052"
[1] "MON" "1988" "9603333"
[1] "MON" "1989" "13807389"
[1] "MON" "1990" "16586388"
[1] "MON" "1991" "10732333"
[1] "MON" "1992" "15822334"
[1] "MON" "1993" "18899333"
[1] "MON" "1994" "19098000"
[1] "MON" "1995" "12364000"
[1] "MON" "1996" "16264500"
[1] "MON" "1997" "19295500"
[1] "MON" "1998" "10641500"
[1] "MON" "1999" "17903000"
[1] "MON" "2000" "32994333"
[1] "MON" "2001" "35159500"
[1] "MON" "2002" "38670500"
[1] "MON" "2003" "51948500"
[1] "MON" "2004" "40897500"
[1] "NYA" "1985" "14238204"
[1] "NYA" "1986" "18494253"
[1] "NYA" "1987" "17099714"
[1] "NYA" "1988" "19441152"
[1] "NYA" "1989" "17114375"
[1] "NYA" "1990" "20912318"
[1] "NYA" "1991" "27344168"
[1] "NYA" "1992" "37543334"
[1] "NYA" "1993" "42624900"
[1] "NYA" "1994" "45731334"
[1] "NYA" "1995" "48874851"
[1] "NYA" "1996" "54191792"
[1] "NYA" "1997" "62241545"
[1] "NYA" "1998" "66806867"
[1] "NYA" "1999" "86734359"
[1] "NYA" "2000" "92338260"
[1] "NYA" "2001" "112287143"
[1] "NYA" "2002" "125928583"
[1] "NYA" "2003" "152749814"
[1] "NYA" "2004" "184193950"
[1] "NYA" "2005" "208306817"
[1] "NYA" "2006" "194663079"
[1] "NYA" "2007" "189259045"
[1] "NYA" "2008" "207896789"
[1] "NYA" "2009" "201449189"
[1] "NYA" "2010" "206333389"
[1] "NYA" "2011" "202275028"
[1] "NYA" "2012" "196522289"
[1] "NYA" "2013" "231978886"
[1] "NYN" "1985" "10834762"
[1] "NYN" "1986" "15393714"
[1] "NYN" "1987" "13846714"
[1] "NYN" "1988" "15269314"
[1] "NYN" "1989" "19885071"
[1] "NYN" "1990" "21722834"
[1] "NYN" "1991" "32590001"
[1] "NYN" "1992" "44602002"
[1] "NYN" "1993" "39043667"
[1] "NYN" "1994" "30956583"
[1] "NYN" "1995" "27674992"
[1] "NYN" "1996" "24479500"
[1] "NYN" "1997" "39800400"
[1] "NYN" "1998" "52077999"
[1] "NYN" "1999" "65092092"
[1] "NYN" "2000" "79509776"
[1] "NYN" "2001" "93174428"
[1] "NYN" "2002" "94633593"
[1] "NYN" "2003" "116876429"
[1] "NYN" "2004" "96660970"
[1] "NYN" "2005" "101305821"
[1] "NYN" "2006" "101084963"
[1] "NYN" "2007" "115231663"
[1] "NYN" "2008" "137793376"
[1] "NYN" "2009" "149373987"
[1] "NYN" "2010" "134422942"
[1] "NYN" "2011" "118847309"
[1] "NYN" "2012" "93353983"
[1] "NYN" "2013" "49448346"
[1] "OAK" "1985" "9058606"
[1] "OAK" "1986" "9779421"
[1] "OAK" "1987" "11680839"
[1] "OAK" "1988" "9690000"
[1] "OAK" "1989" "15613070"
[1] "OAK" "1990" "19887501"
[1] "OAK" "1991" "36999167"
[1] "OAK" "1992" "41035000"
[1] "OAK" "1993" "37812333"
[1] "OAK" "1994" "34172500"
[1] "OAK" "1995" "37739225"
[1] "OAK" "1996" "21243000"
[1] "OAK" "1997" "24018500"
[1] "OAK" "1998" "21303000"
[1] "OAK" "1999" "24431833"
[1] "OAK" "2000" "31971333"
[1] "OAK" "2001" "33810750"
[1] "OAK" "2002" "40004167"
[1] "OAK" "2003" "50260834"
[1] "OAK" "2004" "59425667"
[1] "OAK" "2005" "55425762"
[1] "OAK" "2006" "62243079"
[1] "OAK" "2007" "79366940"
[1] "OAK" "2008" "47967126"
[1] "OAK" "2009" "61910000"
[1] "OAK" "2010" "55254900"
[1] "OAK" "2011" "66536500"
[1] "OAK" "2012" "55372500"
[1] "OAK" "2013" "60132500"
[1] "PHI" "1985" "10124966"
[1] "PHI" "1986" "11590166"
[1] "PHI" "1987" "11514233"
[1] "PHI" "1988" "13838000"
[1] "PHI" "1989" "10604000"
[1] "PHI" "1990" "13173667"
[1] "PHI" "1991" "22487332"
[1] "PHI" "1992" "24383834"
[1] "PHI" "1993" "28538334"
[1] "PHI" "1994" "31599000"
[1] "PHI" "1995" "30555945"
[1] "PHI" "1996" "34314500"
[1] "PHI" "1997" "36656500"
[1] "PHI" "1998" "36297500"
[1] "PHI" "1999" "31692500"
[1] "PHI" "2000" "47308000"
[1] "PHI" "2001" "41663833"
[1] "PHI" "2002" "57954999"
[1] "PHI" "2003" "70780000"
[1] "PHI" "2004" "92919167"
[1] "PHI" "2005" "95522000"
[1] "PHI" "2006" "88273333"
[1] "PHI" "2007" "89428213"
[1] "PHI" "2008" "97879880"
[1] "PHI" "2009" "113004046"
[1] "PHI" "2010" "141928379"
[1] "PHI" "2011" "172976379"
[1] "PHI" "2012" "174538938"
[1] "PHI" "2013" "169863189"
[1] "PIT" "1985" "9227500"
[1] "PIT" "1986" "10843500"
[1] "PIT" "1987" "7652000"
[1] "PIT" "1988" "5998500"
[1] "PIT" "1989" "12737500"
[1] "PIT" "1990" "15556000"
[1] "PIT" "1991" "23634667"
[1] "PIT" "1992" "33944167"
[1] "PIT" "1993" "24822467"
[1] "PIT" "1994" "24217250"
[1] "PIT" "1995" "18355345"
[1] "PIT" "1996" "23017500"
[1] "PIT" "1997" "10771667"
[1] "PIT" "1998" "15065000"
[1] "PIT" "1999" "24697666"
[1] "PIT" "2000" "28928334"
[1] "PIT" "2001" "57760833"
[1] "PIT" "2002" "42323599"
[1] "PIT" "2003" "54812429"
[1] "PIT" "2004" "32227929"
[1] "PIT" "2005" "38133000"
[1] "PIT" "2006" "46717750"
[1] "PIT" "2007" "38537833"
[1] "PIT" "2008" "48689783"
[1] "PIT" "2009" "48693000"
[1] "PIT" "2010" "34943000"
[1] "PIT" "2011" "45047000"
[1] "PIT" "2012" "62951999"
[1] "PIT" "2013" "77062000"
[1] "SDN" "1985" "11036583"
[1] "SDN" "1986" "11380693"
[1] "SDN" "1987" "11065796"
[1] "SDN" "1988" "9561002"
[1] "SDN" "1989" "14195000"
[1] "SDN" "1990" "17588334"
[1] "SDN" "1991" "22150001"
[1] "SDN" "1992" "26854167"
[1] "SDN" "1993" "25511333"
[1] "SDN" "1994" "14916333"
[1] "SDN" "1995" "26382334"
[1] "SDN" "1996" "28348172"
[1] "SDN" "1997" "37363672"
[1] "SDN" "1998" "46861500"
[1] "SDN" "1999" "49768179"
[1] "SDN" "2000" "54821000"
[1] "SDN" "2001" "39182833"
[1] "SDN" "2002" "41425000"
[1] "SDN" "2003" "45210000"
[1] "SDN" "2004" "55384833"
[1] "SDN" "2005" "63290833"
[1] "SDN" "2006" "69896141"
[1] "SDN" "2007" "58110567"
[1] "SDN" "2008" "73677616"
[1] "SDN" "2009" "43333700"
[1] "SDN" "2010" "37799300"
[1] "SDN" "2011" "45869140"
[1] "SDN" "2012" "55244700"
[1] "SDN" "2013" "65585500"
[1] "SEA" "1985" "4613000"
[1] "SEA" "1986" "5958309"
[1] "SEA" "1987" "2263500"
[1] "SEA" "1988" "7342450"
[1] "SEA" "1989" "9779500"
[1] "SEA" "1990" "12553667"
[1] "SEA" "1991" "15691833"
[1] "SEA" "1992" "23179833"
[1] "SEA" "1993" "32696333"
[1] "SEA" "1994" "29228500"
[1] "SEA" "1995" "36481311"
[1] "SEA" "1996" "41328501"
[1] "SEA" "1997" "41540661"
[1] "SEA" "1998" "54087036"
[1] "SEA" "1999" "54125003"
[1] "SEA" "2000" "58915000"
[1] "SEA" "2001" "74720834"
[1] "SEA" "2002" "80282668"
[1] "SEA" "2003" "86959167"
[1] "SEA" "2004" "81515834"
[1] "SEA" "2005" "87754334"
[1] "SEA" "2006" "87959833"
[1] "SEA" "2007" "106460833"
[1] "SEA" "2008" "117666482"
[1] "SEA" "2009" "98904166"
[1] "SEA" "2010" "86510000"
[1] "SEA" "2011" "86110600"
[1] "SEA" "2012" "81978100"
[1] "SEA" "2013" "74005043"
[1] "SFN" "1985" "8221714"
[1] "SFN" "1986" "8947000"
[1] "SFN" "1987" "7290000"
[1] "SFN" "1988" "12380000"
[1] "SFN" "1989" "14962834"
[1] "SFN" "1990" "19335333"
[1] "SFN" "1991" "30967666"
[1] "SFN" "1992" "33163168"
[1] "SFN" "1993" "35050000"
[1] "SFN" "1994" "42638666"
[1] "SFN" "1995" "36462777"
[1] "SFN" "1996" "37144725"
[1] "SFN" "1997" "35592378"
[1] "SFN" "1998" "42565834"
[1] "SFN" "1999" "46595057"
[1] "SFN" "2000" "53737826"
[1] "SFN" "2001" "63280167"
[1] "SFN" "2002" "78299835"
[1] "SFN" "2003" "82852167"
[1] "SFN" "2004" "82019166"
[1] "SFN" "2005" "90199500"
[1] "SFN" "2006" "90056419"
[1] "SFN" "2007" "90219056"
[1] "SFN" "2008" "76594500"
[1] "SFN" "2009" "83026450"
[1] "SFN" "2010" "98641333"
[1] "SFN" "2011" "118198333"
[1] "SFN" "2012" "117620683"
[1] "SFN" "2013" "140180334"
[1] "SLN" "1985" "11817083"
[1] "SLN" "1986" "9875010"
[1] "SLN" "1987" "11758000"
[1] "SLN" "1988" "12880000"
[1] "SLN" "1989" "16078833"
[1] "SLN" "1990" "20523334"
[1] "SLN" "1991" "21860001"
[1] "SLN" "1992" "27583836"
[1] "SLN" "1993" "23367334"
[1] "SLN" "1994" "29275601"
[1] "SLN" "1995" "37101000"
[1] "SLN" "1996" "40269667"
[1] "SLN" "1997" "45456667"
[1] "SLN" "1998" "54672521"
[1] "SLN" "1999" "49778195"
[1] "SLN" "2000" "61453863"
[1] "SLN" "2001" "78538333"
[1] "SLN" "2002" "74660875"
[1] "SLN" "2003" "83786666"
[1] "SLN" "2004" "83228333"
[1] "SLN" "2005" "92106833"
[1] "SLN" "2006" "88891371"
[1] "SLN" "2007" "90286823"
[1] "SLN" "2008" "99624449"
[1] "SLN" "2009" "88528409"
[1] "SLN" "2010" "93540751"
[1] "SLN" "2011" "105433572"
[1] "SLN" "2012" "110300862"
[1] "SLN" "2013" "92260110"
[1] "TEX" "1985" "7676500"
[1] "TEX" "1986" "6743119"
[1] "TEX" "1987" "880000"
[1] "TEX" "1988" "5342131"
[1] "TEX" "1989" "11893781"
[1] "TEX" "1990" "14874372"
[1] "TEX" "1991" "18224500"
[1] "TEX" "1992" "30128167"
[1] "TEX" "1993" "36376959"
[1] "TEX" "1994" "32973597"
[1] "TEX" "1995" "34581451"
[1] "TEX" "1996" "39041528"
[1] "TEX" "1997" "53448838"
[1] "TEX" "1998" "56572095"
[1] "TEX" "1999" "76709931"
[1] "TEX" "2000" "70795921"
[1] "TEX" "2001" "88633500"
[1] "TEX" "2002" "105526122"
[1] "TEX" "2003" "103491667"
[1] "TEX" "2004" "55050417"
[1] "TEX" "2005" "55849000"
[1] "TEX" "2006" "68228662"
[1] "TEX" "2007" "68318675"
[1] "TEX" "2008" "67712326"
[1] "TEX" "2009" "68178798"
[1] "TEX" "2010" "55250544"
[1] "TEX" "2011" "92299264"
[1] "TEX" "2012" "120510974"
[1] "TEX" "2013" "112522600"
[1] "TOR" "1985" "8812550"
[1] "TOR" "1986" "12611047"
[1] "TOR" "1987" "10479501"
[1] "TOR" "1988" "12241225"
[1] "TOR" "1989" "16261666"
[1] "TOR" "1990" "17756834"
[1] "TOR" "1991" "19902417"
[1] "TOR" "1992" "44788666"
[1] "TOR" "1993" "47279166"
[1] "TOR" "1994" "43433668"
[1] "TOR" "1995" "50590000"
[1] "TOR" "1996" "29555083"
[1] "TOR" "1997" "47079833"
[1] "TOR" "1998" "51376000"
[1] "TOR" "1999" "45444333"
[1] "TOR" "2000" "44838332"
[1] "TOR" "2001" "76895999"
[1] "TOR" "2002" "76864333"
[1] "TOR" "2003" "51269000"
[1] "TOR" "2004" "50017000"
[1] "TOR" "2005" "45719500"
[1] "TOR" "2006" "71365000"
[1] "TOR" "2007" "81942800"
[1] "TOR" "2008" "97793900"
[1] "TOR" "2009" "80538300"
[1] "TOR" "2010" "62234000"
[1] "TOR" "2011" "62567800"
[1] "TOR" "2012" "75009200"
[1] "TOR" "2013" "126288100"
[1] "COL" "1993" "10353500"
[1] "COL" "1994" "23887333"
[1] "COL" "1995" "34154717"
[1] "COL" "1996" "40179823"
[1] "COL" "1997" "43559667"
[1] "COL" "1998" "50484648"
[1] "COL" "1999" "61935837"
[1] "COL" "2000" "61111190"
[1] "COL" "2001" "71541334"
[1] "COL" "2002" "56851043"
[1] "COL" "2003" "67179667"
[1] "COL" "2004" "65445167"
[1] "COL" "2005" "47839000"
[1] "COL" "2006" "41233000"
[1] "COL" "2007" "54041000"
[1] "COL" "2008" "68655500"
[1] "COL" "2009" "75201000"
[1] "COL" "2010" "84227000"
[1] "COL" "2011" "88148071"
[1] "COL" "2012" "78069571"
[1] "COL" "2013" "74409071"
[1] "FLO" "1993" "19330545"
[1] "FLO" "1994" "21633000"
[1] "FLO" "1995" "24515781"
[1] "FLO" "1996" "31022500"
[1] "FLO" "1997" "48692500"
[1] "FLO" "1998" "41322667"
[1] "FLO" "1999" "21085000"
[1] "FLO" "2000" "19872000"
[1] "FLO" "2001" "35762500"
[1] "FLO" "2002" "41979917"
[1] "FLO" "2003" "49450000"
[1] "FLO" "2004" "42143042"
[1] "FLO" "2005" "60408834"
[1] "FLO" "2006" "14671500"
[1] "FLO" "2007" "30507000"
[1] "FLO" "2008" "21811500"
[1] "FLO" "2009" "36834000"
[1] "FLO" "2010" "57029719"
[1] "FLO" "2011" "56944000"
[1] "ANA" "1997" "31135472"
[1] "ANA" "1998" "41281000"
[1] "ANA" "1999" "55388166"
[1] "ANA" "2000" "51464167"
[1] "ANA" "2001" "47535167"
[1] "ANA" "2002" "61721667"
[1] "ANA" "2003" "79031667"
[1] "ANA" "2004" "100534667"
[1] "ARI" "1998" "32347000"
[1] "ARI" "1999" "68703999"
[1] "ARI" "2000" "81027833"
[1] "ARI" "2001" "85082999"
[1] "ARI" "2002" "102819999"
[1] "ARI" "2003" "80657000"
[1] "ARI" "2004" "69780750"
[1] "ARI" "2005" "62329166"
[1] "ARI" "2006" "59684226"
[1] "ARI" "2007" "52067546"
[1] "ARI" "2008" "66202712"
[1] "ARI" "2009" "73115666"
[1] "ARI" "2010" "60718166"
[1] "ARI" "2011" "53639833"
[1] "ARI" "2012" "73804833"
[1] "ARI" "2013" "90132000"
[1] "MIL" "1998" "33914904"
[1] "MIL" "1999" "43377395"
[1] "MIL" "2000" "36505333"
[1] "MIL" "2001" "43886833"
[1] "MIL" "2002" "50287833"
[1] "MIL" "2003" "40627000"
[1] "MIL" "2004" "27528500"
[1] "MIL" "2005" "39934833"
[1] "MIL" "2006" "57568333"
[1] "MIL" "2007" "70986500"
[1] "MIL" "2008" "80937499"
[1] "MIL" "2009" "80182502"
[1] "MIL" "2010" "81108278"
[1] "MIL" "2011" "85497333"
[1] "MIL" "2012" "97653944"
[1] "MIL" "2013" "76947033"
[1] "TBA" "1998" "27280000"
[1] "TBA" "1999" "38870000"
[1] "TBA" "2000" "62765129"
[1] "TBA" "2001" "56980000"
[1] "TBA" "2002" "34380000"
[1] "TBA" "2003" "19630000"
[1] "TBA" "2004" "29556667"
[1] "TBA" "2005" "29679067"
[1] "TBA" "2006" "34917967"
[1] "TBA" "2007" "24123500"
[1] "TBA" "2008" "43820597"
[1] "TBA" "2009" "63313034"
[1] "TBA" "2010" "71923471"
[1] "TBA" "2011" "41053571"
[1] "TBA" "2012" "64173500"
[1] "TBA" "2013" "52955272"
[1] "LAA" "2005" "94867822"
[1] "LAA" "2006" "103472000"
[1] "LAA" "2007" "109251333"
[1] "LAA" "2008" "119216333"
[1] "LAA" "2009" "113709000"
[1] "LAA" "2010" "104963866"
[1] "LAA" "2011" "138543166"
[1] "LAA" "2012" "154485166"
[1] "LAA" "2013" "124174750"
[1] "WAS" "2005" "48581500"
[1] "WAS" "2006" "63143000"
[1] "WAS" "2007" "36947500"
[1] "WAS" "2008" "54961000"
[1] "WAS" "2009" "59928000"
[1] "WAS" "2010" "61400000"
[1] "WAS" "2011" "63856928"
[1] "WAS" "2012" "80855143"
[1] "WAS" "2013" "113703270"
[1] "MIA" "2012" "118078000"
[1] "MIA" "2013" "33601900"
Problem 7C
c) Display these in a plot.
*To the reader: Please refer to appendix for Plots.
1.2.8 Problem 8
Study the change in salary over time.
Have salaries kept up with inflation, fallen behind, or grown faster?
Overall, the growth in salary has shown a steady linear trend.
This, however, reflects the OVERALL growth amongst all
teams. Some teams clearly show
near-exponential payroll growth, which is balanced out by a group
of teams with either below-average or negative
payroll growth. Regarding inflation, yes, the salaries seem, overall,
to keep up with creeping inflation.
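One way to make the inflation comparison concrete is to compute a compound annual growth rate and set it against an inflation rate. The sketch below uses the SLN payrolls for 1989 and 2013 from the Problem 7B output; the helper name annualGrowth and the 3% inflation figure are illustrative assumptions, not values from the database.

```r
# Compound annual growth rate between two payroll observations.
annualGrowth <- function(startValue, endValue, years){
  (endValue / startValue)^(1 / years) - 1
}

# SLN payrolls taken from the Problem 7B output.
sln1989 <- 16078833
sln2013 <- 92260110
growthSLN <- annualGrowth(sln1989, sln2013, 2013 - 1989)

# ASSUMPTION: placeholder inflation rate; substitute a real CPI-based figure.
assumedInflation <- 0.03

print(round(growthSLN, 4))
print(growthSLN > assumedInflation)
```

The same helper could be applied to any team and year range in the output to see which teams grow faster or slower than the chosen rate.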
1.2.9 Problem 9
Problem 9A
Compare payrolls for the teams that are in the same leagues
*To the reader: Please refer to appendix for Plots.
Problem 9B
Compare payrolls for the teams that are in the same division.
*To the reader: Please refer to appendix for Plots.
Problem 9C
Are there any interesting characteristics?
Although most teams show positive growth in payroll, there are stark
differences in rates. For example, the Boston Red Sox and the New York
Yankees show monstrous payroll growth compared to other teams in the
AL, while teams such as Kansas City are typically under the
mean growth curve. The same can be said of the NL; however, most
NL teams typically stay closer to the mean growth. There
is one exception with monstrous growth, and that is LAN (I believe this to be
the Dodgers).
Have certain teams always had top payrolls over the years?
There are a few teams that have clearly higher payrolls than others. They
are the following: the Boston Red Sox, the New York Yankees, the Los
Angeles Dodgers, and to some extent the Philadelphia Phillies.
Is there a connection between payroll and performance?
One clear correlation that I observed was between the Yankees’
ridiculous number of W-Series wins and their enormous payroll.
It appears that the more frequently a team appears in the W-Series,
the higher its payroll. This is clear when observing teams that
lost the World Series multiple times but still tend to have fairly high
payrolls, such as CHN and BOS. Of course, more recent appearances in
the W-Series garner higher salaries as well; this can be observed in
SFN, for example.
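The payroll and performance link could also be checked numerically rather than by eye. The sketch below is a minimal example on made-up values (the team names, payrolls, and win totals are all hypothetical); with the real data, payRoll would come from the Salaries table and W from the Teams table.

```r
# ASSUMPTION: toy values standing in for one season of team payrolls and wins.
teamSeason <- data.frame(
  teamID  = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  payRoll = c(50e6, 75e6, 90e6, 120e6, 200e6),
  W       = c(70, 78, 85, 88, 97),
  stringsAsFactors = FALSE
)

# Pearson correlation between payroll and wins.
payWinCor <- cor(teamSeason$payRoll, teamSeason$W)

# Rank correlation is less sensitive to a single very large payroll.
payWinRank <- cor(teamSeason$payRoll, teamSeason$W, method = "spearman")

print(c(pearson = payWinCor, spearman = payWinRank))
```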
1.2.10 Problem 10
10. Has the distribution of home runs for players increased over the years?
*To the reader: Please refer to appendix for Plots.
We can see that there is an overall increase in the number of home runs
as time progresses. Compared to previous years, larger home run
totals occur more frequently, but are
still infrequent.
Chapter 2
APPENDIX: Plots for HW6
2.1 Problem 5
2.1.1 Graphs: Number of Games played by World Series Winners and Losers
2.2 Problem 7C
2.2.1 Graphs: Baseball Team Payrolls 1971-2013
2.3 Problem 9A
2.3.1 Graphs: American League Payroll
2.3.2 Graphs: National League Payroll
2.4 Problem 9B
2.4.1 Graphs: Division ALC Team Payroll
2.4.2 Graphs: Division ALE Team Payroll
2.4.3 Graphs: Division ALW Team Payroll
2.4.4 Graphs: Division NLC Team Payroll
2.4.5 Graphs: Division NLE Team Payroll
2.4.6 Graphs: Division NLW Team Payroll
2.5 Problem 10
2.5.1 Graphs: Distribution of Home Runs from 1875-2013
Chapter 3
APPENDIX: EXPLICIT RCODE
3.1 Part 1
3.1.1 Part 1i
############## START OF PART 1i
INSIDE SHELL:
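# Run the searches one after another and time the whole group.
# Only the .csv inputs are searched, so the .txt outputs are never re-read.
time {
grep -n "\"OAK\",\"Oakland" *.csv > OAK.txt
grep -n "\"SMF\",\"Sacramento" *.csv > SMF.txt
grep -n "\"LAX\",\"Los Angeles" *.csv > LAX.txt
grep -n "\"SFO\",\"San Francisco" *.csv > SFO.txt
grep -n "\"JFK\",\"New York" *.csv > JFK.txt
wc -l *.txt | sort -r
}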
INSIDE R:
outboundCount <- function(dataFrameList, string){
i <- 1
sum <- 0
while (i <= length(dataFrameList)){
sum <- sum + length(which(dataFrameList[[i]] == string))
i <- i + 1
}
print(sum)
}
time <- proc.time()
data1 <- read.csv("2012_August.csv")
data2 <- read.csv("2012_July.csv")
data3 <- read.csv("2012_September.csv")
data4 <- read.csv("2012_October.csv")
data5 <- read.csv("2012_November.csv")
data6 <- read.csv("2012_December.csv")
data7 <- read.csv("2013_January.csv")
data8 <- read.csv("2013_February.csv")
data9 <- read.csv("2013_March.csv")
data10 <- read.csv("2013_April.csv")
data11 <- read.csv("2013_May.csv")
data12 <- read.csv("2013_June.csv")
data1 <- data1$ORIGIN
data2 <- data2$ORIGIN
data3 <- data3$ORIGIN
data4 <- data4$ORIGIN
data5 <- data5$ORIGIN
data6 <- data6$ORIGIN
data7 <- data7$ORIGIN
data8 <- data8$ORIGIN
data9 <- data9$ORIGIN
data10 <- data10$ORIGIN
data11 <- data11$ORIGIN
data12 <- data12$ORIGIN
dataFrameList <- list(data1,data2,data3,
data4,data5,data6,data7,data8,
data9,data10,data11,data12)
outboundCount(dataFrameList, "LAX")
outboundCount(dataFrameList, "SFO")
outboundCount(dataFrameList, "JFK")
outboundCount(dataFrameList, "OAK")
outboundCount(dataFrameList, "SMF")
proc.time() - time
############## END OF PART 1i
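The counting in Part 1i could also be done in one pass with table(). The sketch below is a minimal alternative on toy data; originList is a hypothetical stand-in for the list of monthly ORIGIN columns, and the airport codes are the five used above.

```r
# ASSUMPTION: each element stands in for one month's ORIGIN column.
originList <- list(
  c("LAX", "SFO", "LAX", "JFK"),
  c("OAK", "LAX", "SMF", "SFO")
)
airports <- c("LAX", "SFO", "JFK", "OAK", "SMF")

# Combine the months, keep only the airports of interest, and tabulate.
allOrigins <- as.character(unlist(originList))
outbound <- table(factor(allOrigins[allOrigins %in% airports],
                         levels = airports))
print(outbound)
```

This gives all five counts at once instead of scanning the full list once per airport.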
3.1.2 Part 1ii
############## START OF PART 1ii
INSIDE SHELL:
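# Run the searches one after another and time the whole group.
# Inputs are the monthly files named like 2012_August.csv (see Part 1i),
# so the *2.csv outputs are never searched.
time {
grep -n "OAK" 201[23]_*.csv > OAK2.csv
grep -n "SMF" 201[23]_*.csv > SMF2.csv
grep -n "LAX" 201[23]_*.csv > LAX2.csv
grep -n "SFO" 201[23]_*.csv > SFO2.csv
grep -n "JFK" 201[23]_*.csv > JFK2.csv
}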
INSIDE R:
time <- proc.time()
OAK <- read.csv("OAK2.csv")
SFO <- read.csv("SFO2.csv")
SMF <- read.csv("SMF2.csv")
LAX <- read.csv("LAX2.csv")
JFK <- read.csv("JFK2.csv")
names(SFO)[24] <- "DESTINATION"
names(SFO)[15] <- "ORIGIN"
names(OAK)[24] <- "DESTINATION"
names(OAK)[15] <- "ORIGIN"
names(LAX)[24] <- "DESTINATION"
names(LAX)[15] <- "ORIGIN"
names(SMF)[24] <- "DESTINATION"
names(SMF)[15] <- "ORIGIN"
names(JFK)[24] <- "DESTINATION"
names(JFK)[15] <- "ORIGIN"
#scratch container; it must exist before we assign into it
stuff <- list()
stuff$ORIGIN <- subset(OAK, OAK$ORIGIN %in% "OAK")
stuff$DESTINATION <- subset(OAK, OAK$DESTINATION %in% "OAK")
outinCount <- function(dFrame, string){
sum <- 0
sum <- (length(subset(dFrame, dFrame$ORIGIN %in% string)$ORIGIN) +
length(subset(dFrame, dFrame$DESTINATION %in% string)$DESTINATION))
print(sum)
}
outinCount(LAX,"LAX")
outinCount(JFK,"JFK")
outinCount(SFO,"SFO")
outinCount(OAK,"OAK")
outinCount(SMF,"SMF")
proc.time() - time
############## END OF PART 1ii
3.2 Functions used in Part 2
3.2.1 sortTeamReturnSalary(dataFrame,name)
USE: Used to subset a data frame by team name and print that team’s total
salary over the years contained in the data frame (e.g. a single-year subset).
sortTeamReturnSalary <- function(dataFrame,name){
dataWork <- subset(dataFrame, teamID %in% name)
print(c(dataWork$teamID[1] , sum(dataWork$salary)))
}
3.2.2 sortTeamYearly(dataFrame,name)
USE: Used to sort information by teams and print out a team’s total salary for
every year available in the database
sortTeamYearly <- function(dataFrame, name){
#Group together our data for a team
dataWork <-subset(dataFrame, teamID %in% name)
#Group each year together.
year <- (unique(dataWork$yearID))
return(sortYearlyPay(dataWork,year))
}
3.2.3 sortYearlyPay(dataFrame,year)
USE: Used inside sortTeamYearly to return the Salary per Year for a specific
Baseball team.
sortYearlyPay <- function(dataFrame, year){
i <- 1
#an empty list to add to
list <- list()
#Print each year’s salary
while( i <= length(year)){
dataYear <- subset(dataFrame, yearID %in% year[i])
list[[i]] <- print(c(dataYear$teamID[1], year[i], sum(dataYear$salary)))
i <- i + 1
}
return(list)
}
3.2.4 frameFill(list)
USE: Used to build a data frame that suits my needs,
typically one with a teamID, a yearID, and
the "payRoll" that hw6 is interested in.
frameFill <- function(list){
#dummy variables to be removed later
dataFrame <- data.frame("dummy", 0000, 0000,stringsAsFactors=FALSE)
names(dataFrame) <- list("teamID", "yearID", "payRoll")
i <- 1
while(i <= length(list)){
j <- 1
while(j <= length(list[[i]])){
k <- 1
while(k <= length(list[[i]][j])){
row <- list[[i]][j][k]
dataFrame <- rbind(dataFrame,unlist(row))
k <- k + 1
}
j <- j +1
}
i <- i + 1
}
return(dataFrame[-1,])
}
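The helpers above build a teamID/yearID/payRoll data frame by looping and binding rows. As a point of comparison, base R’s aggregate() can produce the same shape in one call. The sketch below is an alternative, not the method used for the results in this report; the salaries frame holds made-up values standing in for the Salaries query result.

```r
# ASSUMPTION: a toy stand-in for the result of
# dbGetQuery(db, "Select salary, yearID, teamID from Salaries").
salaries <- data.frame(
  teamID = c("SLN", "SLN", "SLN", "TEX", "TEX"),
  yearID = c(1999, 1999, 2000, 1999, 2000),
  salary = c(100, 250, 400, 300, 500),
  stringsAsFactors = FALSE
)

# Sum salary within each team/year pair.
payrolls <- aggregate(salary ~ teamID + yearID, data = salaries, FUN = sum)
names(payrolls)[names(payrolls) == "salary"] <- "payRoll"

# Order by team, then year, to mirror the printed output.
payrolls <- payrolls[order(payrolls$teamID, payrolls$yearID), ]
print(payrolls)
```

One difference worth noting: here payRoll stays numeric, whereas frameFill stores it as text, which is why the plotting code converts with as.integer.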
3.2.5 sortTeam
USE: Used to create a dataFrame where teams are grouped together.
sortTeam<- function(dataFrame, name){
#Group together our data for a team
dataWork <-subset(dataFrame, teamID %in% name)
return(dataWork)
}
3.2.6 sortYearMean(dataFrame, year)
USE: Used to group together information based on year. Specifically made
to create a mean Line for graphs.
sortYearMean<- function(dataFrame, year){
#Group together our data for a year
dataWork <-subset(dataFrame, yearID %in% year)
return(dataWork)
}
3.2.7 meanFrame(list)
USE: Used to create a new data frame made specifically for creating a mean line
for our graphs.
meanFrame <- function(list){
#dummy variables to be removed later
dataFrame <- data.frame(0000, 0000,stringsAsFactors=FALSE)
names(dataFrame) <- list( "yearID", "payRoll")
i <- 1
while (i <= length(list)){
row <-list(list[[i]]$yearID[1],
round(mean(as.integer(unlist(list[[i]][3])))))
dataFrame <- rbind(dataFrame,unlist(row))
i <- i + 1
}
return(dataFrame[-1,])
}
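For reference, the yearly mean line can also be computed without an explicit loop. The sketch below is a minimal alternative using aggregate(); the payrolls frame contains hypothetical values in the same teamID/yearID/payRoll layout that frameFill produces, but with payRoll kept numeric.

```r
# ASSUMPTION: toy payroll table in the teamID / yearID / payRoll layout.
payrolls <- data.frame(
  teamID  = c("SLN", "TEX", "SLN", "TEX"),
  yearID  = c(1999, 1999, 2000, 2000),
  payRoll = c(350, 300, 400, 500),
  stringsAsFactors = FALSE
)

# Mean payroll across teams for each year, rounded as in meanFrame.
meanData <- aggregate(payRoll ~ yearID, data = payrolls,
                      FUN = function(x) round(mean(x)))
print(meanData)
```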
3.2.8 plotStart(plotData,meanData,listNames)
USE: Used to actually plot our information. This will plot a number of lines
on separate plots in order to avoid messiness. BLACK DOTS here represent
the overall mean growth of salary. Its functionality saved me a
great deal of time!
plotStart <- function(plotData,meanData,listNames){
list <- lapply(listNames, sortTeam, data = plotData)
#it turns out that color goes from 1 to 657
colorList <- list(sample(657,length(list)))
nameList <- list(unique(plotData$teamID))
#To fix the parameters, we will find the max and min payRoll Magnitude
yMAX <- max(as.integer(plotData$payRoll))
yMIN <- min(as.integer(plotData$payRoll))
xMAX <- max(as.integer(plotData$yearID))
xMIN <- min(as.integer(plotData$yearID))
#To fill up the plot with a certain number of lines
lineCounter <- round(length(listNames)/6)
i <- 1
while( i <= length(list)){
if(i > length(list))break
plot(meanData$yearID, meanData$payRoll,
xlab = "Year", ylab = "payRoll Magnitude",
main = "Baseball team’s total payroll by year.",
pch = 20,cex = 2.5, xlim = c(xMIN,xMAX), ylim = c(yMIN,yMAX))
legend(xMIN,yMAX, fill = c(colorList[[1]][i:(i+lineCounter-1)],0),
pch = c(NA,NA,NA,NA,NA,NA,20),
legend = c(nameList[[1]][i:(i+lineCounter-1)], "mean"))
j <- 1
while( j <= lineCounter){
if(i > length(list))break
lines(list[[i]]$yearID, list[[i]]$payRoll, col = colorList[[1]][i])
j <- j + 1
i <- i + 1
}
}
}
3.2.9 plotInformation(dataFrame)
USE: Master function to create data frames and plot information based on the
question at hand. Takes a dataFrame, and serves as a great functional
tool for the hw problems.
plotInformation <- function(dataFrame){
listNames <- c(as.vector(unique(dataFrame$teamID)))
cData <- invisible(lapply(listNames, sortTeamYearly, dataFrame = dataFrame))
dataPlot <- frameFill(cData)
listYear <- c(min(as.integer(dataPlot$yearID)):
max(as.integer(dataPlot$yearID)))
meanData <- dataPlot[with(dataPlot,order(dataPlot$yearID)),]
list <- lapply(listYear, sortYearMean, data = meanData)
meanData <- meanFrame(list)
plotStart(dataPlot, meanData,listNames)
}
3.2.10 divisionPlot(division,league)
USE: Used specifically for a HW6 Problem. This function is used to create a
data frame that supplies team salaries from the Salaries table. It was
problematic that the Teams table did not include salaries. So by cross
referencing, we are able to create a data frame for each division that
includes every team’s salary.
divisionPlot <-function(division,league){
dFrame1 <- (subset(league, teamID %in% unique(division$teamID)))
plotInformation(dFrame1)
}
3.2.11 groupFrame(list)
USE: Used specifically for HW6 Problem 10. This function groups together
our list of lists (where each list holds a year’s number of homeruns)
into groups of 20 years for easier display and observation.
groupFrame <- function(list){
#Our list to return
groupFrame <- list()
i <- 1
j <- 1
while (j <= (length(list)/20)){
megaList <- list()
while ( i <= (j*20)){
megaList <- c(megaList,list[[i]]$HR)
i <- i + 1
}
#Keep only the non-zero counts (also safe when a group has no zeros)
hrValues <- unlist(megaList)
groupFrame[[j]] <- hrValues[hrValues != 0]
j <- j + 1
}
return(groupFrame)
}
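The 20-year grouping can also be expressed with integer division and split(). The sketch below is a minimal illustration on toy data; the batting frame and its values are hypothetical, and the start year 1871 is the first year in the database according to Problem 1.

```r
# ASSUMPTION: toy player-season rows standing in for the Batting query result.
batting <- data.frame(
  yearID = c(1871, 1880, 1890, 1891, 1905, 1911),
  HR     = c(0, 3, 1, 0, 7, 2)
)
startYear <- 1871

# Index of the 20-year block each season falls into (1 = 1871-1890, ...).
block <- (batting$yearID - startYear) %/% 20 + 1

# Drop zero-HR seasons, then split the HR counts by block.
nonZero <- batting$HR != 0
hrByBlock <- split(batting$HR[nonZero], block[nonZero])
print(hrByBlock)
```

Unlike the loop above, this keeps a final partial block instead of dropping the years left over after the last full group of 20.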
3.2.12 plot.HR(list)
USE: Used specifically for HW6 Problem 10. This is used to explicitly plot
and label our graphs of interest.
plot.HR <- function(list){
i <- 1
#groupFrame builds blocks of 20 years starting at 1871
mainList <- c( "Distributions of Homeruns over 1871-1890",
"Distributions of Homeruns over 1891-1910",
"Distributions of Homeruns over 1911-1930",
"Distributions of Homeruns over 1931-1950",
"Distributions of Homeruns over 1951-1970",
"Distributions of Homeruns over 1971-1990",
"Distributions of Homeruns over 1991-2010")
j <- 1
maxList <- list()
while(j <= length(list)){
maxList <- c(maxList, max(table(list[[j]])))
j <- j + 1
}
while(i <= length(list)){
barplot(table(list[[i]]), main = mainList[i],xlim = c(min(unlist(list)),max(unlist(list))), ylim = c(0,max(unlist(maxList))))
i <- i + 1
}
}
3.3 Part 2
3.3.1 Problem 1
#1. What years does the data cover? are there data for each of these years?
dataDate <- dbGetQuery(db, "Select yearID from Teams")
print(c(min(dataDate),max(dataDate)))
[1] 1871 2013
#The database covers the years 1871 to 2013,
#there exists data for each of these years
3.3.2 Problem 2
#2. How many (unique) people are included in the database?
#How many are players, managers, etc?
#MANAGER TABLE FOR MANAGERS
#MASTER TABLE FOR PLAYERS
#THEN ADD THE SUM
sum <- 0
dbListFields(db, "Managers")
[1] "playerID" "yearID" "teamID" "lgID" "inseason" "G"
[7] "W" "L" "rank" "plyrMgr"
data <- dbGetQuery(db, "Select playerID from Managers")
length(unique(data$playerID))
[1] 682
sum <- sum + length(unique(data$playerID))
data <- dbGetQuery(db, "Select playerID from Master")
length(unique(data$playerID))
[1] 18354
sum <- sum + length(unique(data$playerID))
print(sum)
[1] 19036
#There are 682 UNIQUE managers and 18354 UNIQUE Baseball Players.
#There is a grand total of 19036 UNIQUE people.
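If some playerIDs appear in both the Managers and Master tables, adding the two counts would count those people twice. The sketch below shows how union() and intersect() could be used to check for that; the ID vectors are made-up examples, not values from the database.

```r
# ASSUMPTION: toy ID vectors standing in for the two playerID columns.
managerIDs <- c("a01", "b02", "c03")
masterIDs  <- c("a01", "d04", "e05", "b02")

# People counted once, no matter how many tables they appear in.
uniquePeople <- length(union(managerIDs, masterIDs))

# People who appear in both tables.
bothRoles <- length(intersect(managerIDs, masterIDs))

print(c(uniquePeople = uniquePeople, bothRoles = bothRoles))
```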
3.3.3 Problem 3
#3. What team won the World Series in 2000?
data <- (dbGetQuery(db, "Select WSWin , teamID, yearID from Teams"))
dataSub <- subset(data, yearID %in% 2000)
dataSub[which(subset(data, yearID %in% 2000)$WSWin == "Y"),]
WSWin teamID yearID
2334 Y NYA 2000
#The team that won was NYA.
3.3.4 Problem 4
#4. What teams lost the World Series each year?
data <- (dbGetQuery(db, "Select WSWin , teamID, yearID, LgWin from Teams"))
dataSub <- subset(data, LgWin %in% "Y" )
dataSub <- subset(dataSub, WSWin %in% "N")
dataSub <- data.frame(dataSub$teamID, dataSub$yearID)
dataSub
3.3.5 Problem 5
#5. Do you see a relationship between the number of games won in a season
# and winning the World Series?
par(mfrow =c(2,1))
data <- (dbGetQuery(db, "Select WSWin , G, yearID from Teams"))
dataSub <- subset(data, WSWin %in% "Y")
plot(dataSub$yearID,dataSub$G, type = "l",
ylab = "# of Games Won", xlab = "Year",
main = "Number of Games played by World Series Winners")
#There are a few outliers, noted by the sharp drops on the plot.
#But it seems that from 1984 the trend has been that teams winning
#a large number of games typically win the World Series.
#However, to check this trend we should also look at the teams that DID
#not win the World Series.
data <- (dbGetQuery(db, "Select WSWin,G , yearID, LgWin from Teams"))
dataSub <- subset(data, LgWin %in% "Y" )
dataSub <- subset(dataSub, WSWin %in% "N")
plot(dataSub$yearID,dataSub$G, type = "l",
ylab = "# of Games Won", xlab = "Year",
main = "Number of Games played by World Series Losers")
#We can clearly see a similarity between the two graphs,
#so we cannot say that winning more games during playoffs will decide
#who wins the World Series.
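The queries above select G, the number of games played. A variant that compares wins directly is sketched below, assuming the Teams table exposes a W (wins) column as the Managers table does. All values here are toy data, not results from the database.

```r
# ASSUMPTION: toy rows standing in for
# dbGetQuery(db, "Select WSWin, LgWin, W, G, yearID from Teams").
teams <- data.frame(
  yearID = c(2000, 2000, 2000, 2001, 2001, 2001),
  W      = c(87, 94, 72, 92, 95, 116),
  G      = rep(162, 6),
  LgWin  = c("Y", "Y", "N", "Y", "Y", "N"),
  WSWin  = c("Y", "N", "N", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Win percentage puts seasons of different length on one scale.
teams$winPct <- teams$W / teams$G

winners <- subset(teams, WSWin %in% "Y")
losers  <- subset(teams, LgWin %in% "Y" & WSWin %in% "N")

meanWinPct <- c(winners = mean(winners$winPct),
                losers  = mean(losers$winPct))
print(round(meanWinPct, 3))
```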
3.3.6 Problem 6
#6. In 2003, what were the three highest salaries?
#(We refer here to unique salaries, i.e., more than one
# player might be paid one of these salaries.)
data <- (dbGetQuery(db, "Select salary, yearID from Salaries"))
dataSub <- subset(data, data$yearID %in% 2003)
print(c(
sort(unique(dataSub$salary))[length(sort(unique(dataSub$salary)))],
sort(unique(dataSub$salary))[length(sort(unique(dataSub$salary)))-1],
sort(unique(dataSub$salary))[length(sort(unique(dataSub$salary)))-2]))
[1] 22000000 20000000 18700000
#The above will print the 3 highest salaries in the year 2003 from
#largest to smallest.
#The highest 3 salaries are: 22,000,000 20,000,000 18,700,000
#(Sidenote: Wow. . .)
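The same top-three lookup can be written more compactly by sorting in decreasing order. In the sketch below the three largest values are the ones reported above; the remaining entries of salary2003 are made up to illustrate ties and smaller salaries.

```r
# ASSUMPTION: toy salary vector; only the top three values come from the output.
salary2003 <- c(22000000, 20000000, 20000000, 18700000, 15000000, 300000)

# Unique salaries, largest first, then keep the first three.
topThree <- head(sort(unique(salary2003), decreasing = TRUE), 3)
print(topThree)
```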
3.3.7 Problem 7
#7.
# a) For 1999, compute the total payroll
# of each of the different teams.
# I am understanding this as the combined salaries
# of EVERYONE on the team.
# b) Next compute the team payrolls for all years in the database for
# which we have salary information.
# c) Display these in a plot.
# I understand that this will be a very messy plot. I will try
# to present it clearly.
Problem 7A
#PART A
data <- (dbGetQuery(db, "Select salary, yearID, teamID from Salaries"))
data1999 <- subset(data,yearID %in% 1999)
#I will just make a function to do this for me.
list <- (as.vector(unique(data1999$teamID)))
invisible(lapply(list, sortTeamReturnSalary, dataFrame = data1999))
#The above will print out each team name
#and its total payroll for 1999.
Problem 7B
#PART B
#I will write another function here:
#The first must group all of the yearly data by team.
#the second must group all of the data by year and
#print each year’s total payroll
#The previous function (sortTeamReturnSalary) will not work, and will
#be a pain to remake to fit both needs.
listNames <- (as.vector(unique(data$teamID)))
invisible(lapply(listNames, sortTeamYearly, dataFrame = data))
Problem 7C
#PART C
#This is how I will display "These" in a plot.
#Group each total salary by team name
#Put each dataFrame into a list. The first item will be the first plot,
#succeeding items will be added.
#Each line should be a different team on the same plot.
#So I plotted everything on one graph, and I now realize that that
#plot is useless.
#Instead I will make separate plots with fixed parameters,
#allowing us to compare the plots
par(mfrow=c(2,3))
plotInformation(data)
3.3.8 Problem 8
#8. Study the change in salary over time.
#Have salaries kept up with inflation, fallen behind, or grown faster?
#Overall, the growth in salary has shown a steady linear trend.
#This, however, reflects the OVERALL growth amongst
#all teams. Some teams clearly show
#near-exponential payroll growth, which is balanced out by a group
#of teams with either below-average or negative
#payroll growth. Regarding inflation, yes, the salaries seem, overall,
#to keep up with creeping inflation.
3.3.9 Problem 9
#9. Compare payrolls for the teams that are in the same leagues,
# and then in the same divisions.
#For me: There are 3 divisions and 2 leagues in the Salaries table,
# but there are 7 leagues in the Teams table.
# CLARIFICATIONS: The Salaries table covers only the two
# current major leagues (AL and NL).
# The Teams table also includes
# older leagues that no longer exist.
# APPARENTLY TEAMS CAN SWITCH LEAGUES,
# which causes some strangeness in the graphs!
# For example: HOUSTON is in the AL in 2013
# instead of the NL!
Problem 9A
dataLeague <-
dbGetQuery(db, "Select yearID, teamID, lgID, salary from Salaries")
#Side note: I no longer needed the league category,
#Since we’re grouping them in Explicit Leagues.
#So I just got rid of them
#Keep one league's rows and drop the lgID column in the same call
leagueAL <- subset(dataLeague, lgID %in% "AL", select = -lgID)
leagueNL <- subset(dataLeague, lgID %in% "NL", select = -lgID)
plotInformation(leagueAL)
plotInformation(leagueNL)
Problem 9B
#We must find a way to deal with division
dataDivisions <- dbGetQuery(db, "Select yearID, teamID, lgID, divID from Teams")
#SPLIT THEM INTO LEAGUES FIRST
#AL LEAGUE
#Filter on league as well as division, then drop the divID column
divisionALC <- subset(dataDivisions, lgID %in% "AL" & divID %in% "C", select = -divID)
divisionALE <- subset(dataDivisions, lgID %in% "AL" & divID %in% "E", select = -divID)
divisionALW <- subset(dataDivisions, lgID %in% "AL" & divID %in% "W", select = -divID)
#NL LEAGUE
divisionNLC <- subset(dataDivisions, lgID %in% "NL" & divID %in% "C", select = -divID)
divisionNLE <- subset(dataDivisions, lgID %in% "NL" & divID %in% "E", select = -divID)
divisionNLW <- subset(dataDivisions, lgID %in% "NL" & divID %in% "W", select = -divID)
#The strategy:
#We will look for each division within both leagues and create a new
#dataFrame, so we can throw it into our plotInformation function.
par(mfrow=c(3,3))
divisionPlot(divisionALC,leagueAL)
divisionPlot(divisionALE,leagueAL)
par(mfrow=c(2,3))
divisionPlot(divisionALW,leagueAL)
divisionPlot(divisionNLC,leagueNL)
divisionPlot(divisionNLE,leagueNL)
divisionPlot(divisionNLW,leagueNL)
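Another way to attach divisions to salaries is to join the two query results on yearID and teamID, so that each salary row carries the division its team played in that year. The sketch below is a minimal illustration with merge(); both data frames hold made-up salary values, and the column names follow the queries above.

```r
# ASSUMPTION: toy stand-ins for the Salaries and Teams query results.
salaries <- data.frame(
  yearID = c(2013, 2013, 2013, 2013),
  teamID = c("HOU", "HOU", "TEX", "SLN"),
  salary = c(10, 15, 40, 30),
  stringsAsFactors = FALSE
)
teams <- data.frame(
  yearID = c(2013, 2013, 2013),
  teamID = c("HOU", "TEX", "SLN"),
  lgID   = c("AL", "AL", "NL"),
  divID  = c("W", "W", "C"),
  stringsAsFactors = FALSE
)

# Attach each salary row to the league/division of its team in that year.
withDivision <- merge(salaries, teams, by = c("yearID", "teamID"))

# Total payroll per team and year, keeping league and division labels.
divisionPayroll <- aggregate(salary ~ lgID + divID + teamID + yearID,
                             data = withDivision, FUN = sum)
print(divisionPayroll)
```

Joining on both keys means a team that changes league or division is placed by year rather than by team name alone.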
Problem 9C
#Are there any interesting characteristics?
#Although most teams show positive growth in payroll, there are stark
#differences in rates.
#For example, the Boston Red Sox and the New York Yankees show monstrous
#payroll growth compared to
#other teams in the AL, while teams such as Kansas City are
#typically under the mean growth curve.
#The same can be said of the NL; however, most NL teams
#typically stay closer to the mean growth.
#There is one exception with monstrous growth, and that is LAN (I believe
#this to be the Dodgers).
#Have certain teams always had top payrolls over the years?
#There are a few teams that have clearly higher payrolls than others. They
#are the following:
#the Boston Red Sox, the New York Yankees, the Los Angeles Dodgers, and to
#some extent the Philadelphia Phillies.
#Is there a connection between payroll and performance?
#One clear correlation that I observed was between the Yankees’ ridiculous
#number of W-Series wins
#and their enormous payroll. It appears that teams appearing more
#frequently in the W-Series have much higher
#payrolls. This is clear when observing teams that lost the World Series
#multiple times but still
#tend to have fairly high payrolls, such as CHN,NYA. Of course, more
#recent appearances in the W-Series
#garner higher payrolls, which can be observed in SFN, for example.
3.3.10 Problem 10
#10. Has the distribution of home runs for players increased over the years?
dataHomeRun <- dbGetQuery(db, "Select HR, yearID, playerID from Batting")
#Just to Cross Reference, that we have a single # of HR’s
dataHomeRunCheck <- dbGetQuery(db, "Select HR from Pitching")
#if this is 0, then we definitely have all "recorded" HR’s
sum(dataHomeRunCheck$HR) - sum(dataHomeRun$HR, na.rm = TRUE)
#Not zero, but insignificant to whole.
#What do we want to produce?
#We want to produce a graph that shows, by year, the distribution of the
#number of occurrences vs. the number of homeruns.
#We need to keep scaling in mind, so that we may ’eyeball’
#multiple graphs with ease.
#Let’s make a dataFrame of all our information.
#We will remove the NA’s and 0’s of our data as we are more concerned with
#the "growth" of home runs.
#Logical indexing keeps every row when there are no NA's
HR <- dataHomeRun$HR[!is.na(dataHomeRun$HR)]
yearID <- dataHomeRun$yearID[!is.na(dataHomeRun$HR)]
#dataFrame of players, yearID and HR
dFrame <- data.frame(yearID)
dFrame$HR <- HR
names(dFrame) <- list("yearID", "HR")
listYear <- min(yearID):max(yearID)
sortedFrame <- lapply(listYear, sortYearMean, data = dFrame)
#We will group our data into every twenty years,
#we will also remove zeros here
gFrame <- groupFrame(sortedFrame)
#This will print the plot we are interested in
par(mfrow=c(4,2))
#PLOT the histograms
plot.HR(gFrame)
#We can see that there is an overall increase in the number of
#homeruns as time progresses.
#We can also observe that, compared to previous years, larger
#homerun totals occur more frequently,
#but are still infrequent.
Chapter 4
Extra Credit: A Collection of Thoughts
4.1 A Small Quote from Piazza
4.2 Results
4.2.1 Home Runs and Handedness
Here we will explore the relationship between hitting home runs and a
player’s handedness. There is a slight difference in the number of HRs a
particular player is able to hit in a season, based on their handedness.
Based on Population: All Available Players
As we observe the entire population of available players in the database,
we can see that right-handed players are the most likely to have 0
home runs in a season. Left-handed players are more likely to hit at least a
single home run, and ambidextrous players are the most likely of the
three groups to hit at least 1 home run.
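A comparison like this can be put into numbers with a proportion table. The sketch below is a minimal example on made-up player-seasons; the column name bats and its codes (R, L, B for both) are assumptions about how handedness is stored, and the HR values are hypothetical.

```r
# ASSUMPTION: toy player-seasons with a handedness code and a HR count.
batting <- data.frame(
  bats = c("R", "R", "R", "L", "L", "B"),
  HR   = c(0, 0, 5, 0, 3, 1),
  stringsAsFactors = FALSE
)

# TRUE when the player hit at least one home run that season.
hitAtLeastOne <- batting$HR > 0

# Row-wise proportions: within each handedness, share with and without a HR.
shareByHand <- prop.table(table(batting$bats, hitAtLeastOne), margin = 1)
print(round(shareByHand, 3))
```

Working with proportions within each group keeps the comparison fair when one group is much larger than the others.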
Based on Population: Home Run Hitters
When we consider the population of players who have hit at least a
single home run, we can observe a different relationship. Right handed
players tend to
4.2.2 Pitchers: Throwing Home Runs and Handedness
Does the hand a pitcher uses have any effect on the chance of a
home run being hit?
We can see that ambidextrous pitchers have the highest population size
for throwing zero home runs, followed by left-handed pitchers, with
right-handed pitchers being the smallest of the three populations.
Although there is a difference amongst the "0" population, it does not
appear to be significant in magnitude, and we can conclude that specific
handedness does not necessarily dictate the chance of throwing a home run.
4.2.3 Errors in the World Series
I wondered the following:
"With all of the pressure in the final game of the season, are there
many errors that occur? Do teams win by capitalizing on those
errors? Have they decreased or increased or stayed roughly the same
over the years?"
We can clearly observe a definite decreasing trend
in the number of errors made in the World Series as time progresses.
The trend seems to follow a hyperbolic curve, and it would seem that
the number of errors being made will eventually level off.
We can also note that W-Series winners typically make fewer mistakes than
the teams that they beat, but there are times when the winners make more
errors during the game than their opponents. It is unfortunate that the
losing team did not make better use of those opportunities!
The most striking feature of the graph is the ridiculous number of
errors made in the games in the 1880s, seriously. What was going on?
Unfortunately, I could not find any answer through use of search
engines. There is also a great deal of data missing between the 1880s
and 1900.
4.2.4 Hall of Fame and Salary
Players who are inducted into the Hall of Fame are voted into the
hall by a committee that comes to a decision via elections. I was curious to know
how these famed players’ salaries would match up with today’s players’ salaries.
At first glance it may seem that hall of famers almost never come close to the
overall average salary of baseball players, but the reason for this is the
inflation of salaries that appear in our data.
4.2.5 Positions and Salary
In this portion I wanted to see if there was a relationship between a player’s
salary and their position. Do particular positions get paid noticeably
larger salaries, and does that happen more frequently for particular positions?
A particular group: Single Position Players
This was my initial graph. I thought that players were often
picked to play a specific position, so I aimed to find each player’s position
and cross reference it with the salaries table in order to find a relationship
between position and salary. However, as the above graph shows, there are
very few players who play a single position (with the exception of
catchers and pitchers; especially the latter). As we can already see, it would
appear that pitchers show variation in the magnitude of their salaries.
Salaries by Position
After coming to the understanding that players will often rotate among
positions, I decided to simply cross reference players in the fielding table
with their salary in the salaries table. These graphs, all having the same
scaling, show that there isn’t any ridiculous
focus on a single position. . .except for pitchers. Pitchers tend to
have higher salaries compared to the other positions, which makes sense.
Pitchers also have the most outliers in salary (those ostracized points in
the pitcher’s graph). It would also appear that first basemen are considered
more "valuable" than the other two basemen, and that left fielders
and center fielders are much more valuable than right fielders. (Accidentally
labeled as Right Stop. Apologies.)
4.2.6 All about the All Stars
Are All Stars paid any more on a team than their "Non Star" teammates?
The All Star and Non Star Salaries
Yes.
There has always been a stark difference between the two groups’ salaries.
I would like to draw attention to the black line, which represents the mean
salary of the entire population. All Stars always seem to be above the mean,
which places the rest of the players under the mean and makes them seem like
less "valuable assets" to the team.
Side anecdote: After a brief dialogue with my roommate, he has me
cynically believing that baseball is a game of All Stars and "others."
Salary Disparities: Amongst All Stars and Non Stars
The graph above shows us the growth of the disparity, that is, the difference
between the mean All Star salary and the mean Non Star salary. The curve looks
slightly quadratic and follows a positive trend throughout. This could indicate
that in the near future, the disparity will only become larger and larger.
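One hedged way to check the "slightly quadratic" impression is to fit a second-degree polynomial in time and look at the sign of the squared term. The numbers below are invented so that the sketch is self-contained; they are not the disparities from the report.

```r
# Invented yearly disparity values that grow quadratically.
year      <- 1985:1994
disparity <- 50000 + 2000 * (year - 1985)^2

# Fit disparity ~ b0 + b1 * elapsed + b2 * elapsed^2,
# with elapsed measured in years since the first season.
elapsed <- year - min(year)
fit <- lm(disparity ~ elapsed + I(elapsed^2))

# A positive coefficient on elapsed^2 is consistent with a widening gap.
print(coef(fit))
```

A positive squared term only describes the seasons that were observed; extending the curve into future seasons remains an extrapolation.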
Salary Differences: Amongst All Stars
How does the distribution of salary differ amongst the All Stars themselves?
As we can see, the distribution of salary is definitely not normal, and
salaries are definitely not spread evenly. The distribution is heavily skewed:
most salaries sit at the low end, with a long tail toward the high end.
It would benefit us to explore All Star salary disparity through other
visual means.
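A quick numerical companion to the bar plot is to compare the mean with the median and to look at a few quantiles. The vector below is invented for illustration and is not taken from the All Star salaries.

```r
# Invented salaries in dollars: many modest values and a few very large ones.
toyStarSalary <- c(4e5, 5e5, 5e5, 6e5, 8e5, 1e6, 2e6, 5e6, 1.5e7)

# When a few very large values are present, the mean sits above the median.
print(c(mean = mean(toyStarSalary), median = median(toyStarSalary)))

# Quantiles summarise the spread without assuming normality.
print(quantile(toyStarSalary, probs = c(0.25, 0.5, 0.75, 0.95)))
```

A mean well above the median is one simple sign that a handful of very large salaries dominate the average.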
As we can see, the mean salary of All Stars is relatively low when
compared with the full spread of All Star salaries. We can clearly see that
salary in millions trends upward over time, and this
is partially due to the inflation of modern salaries. We can quickly
verify that salaries are far from uniform amongst All Star players in every
season. It is worth mentioning that the red line of the graph represents the
mean salary of the All Stars of THAT year.
4.2.7 The All Star Algorithm
Unfortunately I ran out of time when my roommate proposed a radical
idea to me. He asked me the following:
"With the data that you do have, can you actually come up with a model
to express what a generic All Star player is? I mean, can you create a
model that clearly states something like the following:
if this particular criterion is not achieved or exceeded, we can predict
that a player is or is not going to be an All Star player."
The answer I have is yes. It is possible to create such a model, but
we would have to do some data retrieval in order to build it.
The following subsection details how I would approach my roommate’s
suggestion.
The Algorithm
I would first consider which factors could be good predictors
of being an All Star player. I would take an approach much like our
homework for classifying an item as SPAM or HAM. In this case we are
classifying our players as STARS or BARS. I would then observe which
variables show distinct differences between players that are STARS and
players that belong in BARS.
Taking those variables, I would create my data frame and create
a model using categorical analysis methods I learned from STA 138.
I would most likely employ a multiple logistic regression using
R’s glm function with the family set to binomial. Then I would use this
model to predict STARS out of the data that I have used, and see how well
it could predict BARS and STARS from our data set. I would then bring
this model to my roommate and verify the model’s parameters with him.
(He has greater experience and knowledge in baseball than I do, and
I would like to make sure that the factors I had selected are in fact
useful in separating the STARS from the BARS.)
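The approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a fitted model from the report: the data are simulated, and the predictors `HR` and `AVG`, the coefficients used to simulate the labels, and the 0.5 cutoff are all hypothetical choices.

```r
set.seed(1)

# Invented player-season data; the predictors are hypothetical choices.
n <- 200
toyPlayers <- data.frame(
  HR  = rpois(n, 12),
  AVG = round(runif(n, 0.200, 0.340), 3)
)

# Simulate the label so that better hitters are more likely to be STARS.
p <- plogis(-9 + 0.15 * toyPlayers$HR + 20 * toyPlayers$AVG)
toyPlayers$STAR <- rbinom(n, 1, p)

# Multiple logistic regression: STARS (1) versus BARS (0).
fit <- glm(STAR ~ HR + AVG, data = toyPlayers, family = binomial)

# Predicted probabilities, classified as STARS above a 0.5 cutoff.
predicted <- as.integer(predict(fit, type = "response") > 0.5)

# Confusion table of predicted against actual labels.
print(table(predicted = predicted, actual = toyPlayers$STAR))
```

Because the model is checked on the same rows it was fitted to, the table gives an optimistic picture; holding out some seasons for testing would be a natural next step.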
Chapter 5
EXPLICIT:R CODE
5.1 Functions Used in Bonus Section
5.1.1 matchFrame(dataFrame,xFrame)
USE: Takes two frames for cross referencing and returns a combined frame.
This is used to combine our dataframe using playerID’s.
matchFrame <- function(dataFrame,xFrame){
  i <- 1
  #Dummy first row; the caller removes it afterwards
  dFrame <- data.frame("John", 1776, "R", stringsAsFactors = FALSE)
  names(dFrame) <- list("playerID", "HR","HAND")
  while (i <= length(xFrame$playerID)){
    workHorse <- subset(dataFrame, playerID %in% xFrame$playerID[i])
    if(is.na(workHorse$playerID[1])) i <- i + 1 #If there is no match
    else{
      crossFrame <- subset(xFrame, playerID %in% xFrame$playerID[i])
      row <- list(workHorse$playerID[1],
                  crossFrame$HR[1], workHorse$bats[1])
      dFrame <- rbind (dFrame,row)
      i <- i + 1
      print(row)
      print(c(i, "out of", length(xFrame$playerID)))
    }
  }
  return(dFrame)
}
5.1.2 groupNameReturnHR(dataFrame,name)
USE: This function groups together a player’s number of home runs,
so that we may observe a player’s total number of homeruns during
their career.
groupNameReturnHR <- function(dataFrame, name){
  i <- 1
  #Dummy first row, to be removed later
  dFrame <- data.frame("John", 1776, stringsAsFactors = FALSE)
  names(dFrame) <- list("playerID", "HR" )
  while (i <= length(name[[1]])){
    #All rows for the i-th player, then their career HR total
    workHorse <- subset(dataFrame, playerID %in% name[[1]][i])
    row <- list(unique(workHorse$playerID), sum(workHorse$HR))
    dFrame <- rbind(dFrame,row)
    i <- i + 1
  }
  return(dFrame) #The dummy first row is still present here
}
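For comparison, the same career totals can be sketched with base R's `aggregate`, which sums within each `playerID` without an explicit loop. This is an alternative illustration, not the function used in the report, and `toyBatting` is an invented stand-in for the Batting table.

```r
# Toy stand-in for the Batting table: one row per player-season (invented).
toyBatting <- data.frame(
  playerID = c("a", "a", "b", "b", "b", "c"),
  HR       = c(10, 12, 0, 3, 5, 1),
  stringsAsFactors = FALSE
)

# Sum HR within each playerID; the result has one row per player.
careerHR <- aggregate(HR ~ playerID, data = toyBatting, FUN = sum)
print(careerHR)
```

This form needs no dummy first row, so there is nothing to strip from the result afterwards.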
5.1.3 matchFrameFame(dataFrame,xFrame)
USE: Used for cross referencing a player’s ID and creates a
brand new data frame that returns players that are Hall
of Famers.
matchFrameFame <- function(dataFrame,xFrame){
  i <- 1
  #Dummy first row (not a real player)
  dFrame <- data.frame("John", 1776, stringsAsFactors = FALSE)
  names(dFrame) <- list("playerID", "salary")
  while (i <= length(xFrame)){
    workHorse <- subset(dataFrame, playerID %in% xFrame[i])
    if(is.na(workHorse$playerID[1])) i <- i + 1 #If there is no match
    else{
      #Keep the first salary on record for this player
      row <- list(workHorse$playerID[1],workHorse$salary[1])
      dFrame <- rbind (dFrame,row)
      i <- i + 1
    }
  }
  return(dFrame)
}
5.1.4 matchUniqueFrame(dataFrame,xFrame)
USE: Matches players in the unique position frame to their respective
salary.
matchUniqueFrame <- function(dataFrame,xFrame){
  i <- 1
  #Dummy first row (not a real player)
  dFrame <- data.frame("John", "C", 1 ,stringsAsFactors = FALSE)
  names(dFrame) <- list("playerID", "POS", "salary")
  while (i <= length(xFrame$playerID)){
    workHorse <- subset(dataFrame, playerID %in% xFrame$playerID[i])
    if(is.na(workHorse$playerID[1])) i <- i + 1 #If there is no match
    else{
      crossFrame <- subset(xFrame, playerID %in% xFrame$playerID[i])
      row <- list(workHorse$playerID[1], crossFrame$Pos[1],
                  workHorse$salary[1])
      dFrame <- rbind (dFrame,row)
      i <- i + 1
      print(row)
      print(c(i, "out of", length(xFrame$playerID)))
    }
  }
  return(dFrame)
}
5.1.5 uniquePositions(dataFrame,names)
USE: Used to find players that played only ONE position during
their entire career in baseball.
uniquePositions<- function(dataFrame, names){
  i <- 1
  #Dummy first row, to be removed later
  dFrame <- data.frame("John", "C", stringsAsFactors = FALSE)
  names(dFrame) <- list("playerID", "Pos")
  while(i <= length(names[[1]])){
    workHorse <- subset(dataFrame, playerID %in% names[[1]][i])
    #Skip players with anything other than exactly one position row
    if(length(workHorse$playerID) != 1){
      i <- i + 1
      print(c(i, "out of", length(names[[1]])))
    }
    else{
      row <- list(workHorse$playerID, workHorse$POS)
      dFrame <- rbind(dFrame,row)
      i <- i + 1
      print(row)
      print(c(i, "out of", length(names[[1]])))
    }
  }
  return(dFrame) #The caller removes the dummy first row
}
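The same selection can be sketched without a loop by counting distinct positions per player. This is an alternative illustration under the assumption that the fielding data may repeat a player/position pair across seasons; `toyFielding` and its values are invented.

```r
# Toy stand-in for the Fielding table (invented values).
toyFielding <- data.frame(
  playerID = c("a", "a", "b", "c", "c", "c"),
  POS      = c("P", "1B", "P", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Keep one row per distinct player/position pair.
posPairs <- unique(toyFielding)

# Count how many distinct positions each player has.
nPos <- table(posPairs$playerID)

# Players whose count is exactly one played a single position.
singleIDs <- names(nPos)[nPos == 1]
singlePos <- posPairs[posPairs$playerID %in% singleIDs, ]
print(singlePos)
```

Counting once with `table` avoids subsetting the full table for every player, which is what makes the looped version slow on the whole Fielding table.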
5.1.6 plotPosSalary(uniqueSalary)
USE: Used specifically to plot single position baseball
player Salaries grouped by Positions.
plotPosSalary <- function(uniqueSalary){
pitchers <- uniqueSalary[which(uniqueSalary$POS == "P"),]
pitchers <- pitchers[-which(pitchers$salary == max(pitchers$salary)),]
catchers <- uniqueSalary[which(uniqueSalary$POS == "C"),]
firstBase <- uniqueSalary[which(uniqueSalary$POS == "1B"),]
secondBase <- uniqueSalary[which(uniqueSalary$POS == "2B"),]
thirdBase <- uniqueSalary[which(uniqueSalary$POS == "3B"),]
shortStop <- uniqueSalary[which(uniqueSalary$POS == "SS"),]
leftField <- uniqueSalary[which(uniqueSalary$POS == "LF"),]
centerField <- uniqueSalary[which(uniqueSalary$POS == "CF"),]
par(mfrow=c(3,4))
plot(c(1:length(pitchers$playerID)), pitchers$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Pitchers", xlab = "Pitcher #",
ylab = "Salary Magnitude")
plot(c(1:length(catchers$playerID)), catchers$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Catchers", xlab = "Catchers #",
ylab = "Salary Magnitude")
plot(c(1:length(firstBase$playerID)), firstBase$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: First Base", xlab = "First Base #",
ylab = "Salary Magnitude")
plot(c(1:length(secondBase$playerID)), secondBase$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Second Base", xlab = "Second Base #",
ylab = "Salary Magnitude")
plot(c(1:length(thirdBase$playerID)), thirdBase$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Third Base", xlab = "Third Base #",
ylab = "Salary Magnitude")
plot(c(1:length(shortStop$playerID)), shortStop$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Short Stop", xlab = "Short Stop #",
ylab = "Salary Magnitude")
plot(c(1:length(leftField$playerID)), leftField$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Left Field", xlab = "Left Field #",
ylab = "Salary Magnitude")
plot(c(1:length(centerField$playerID)), centerField$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Center Field", xlab = "Center Field #",
ylab = "Salary Magnitude")
}
5.1.7 plotPosSalaryAll(uniqueSalary)
USE: Used specifically to plot all baseball player Salaries grouped
by Positions.
plotPosSalaryAll <- function(uniqueSalary){
pitchers <- uniqueSalary[which(uniqueSalary$POS == "P"),]
pitchers <- pitchers[-which(pitchers$salary == max(pitchers$salary)),]
catchers <- uniqueSalary[which(uniqueSalary$POS == "C"),]
firstBase <- uniqueSalary[which(uniqueSalary$POS == "1B"),]
secondBase <- uniqueSalary[which(uniqueSalary$POS == "2B"),]
thirdBase <- uniqueSalary[which(uniqueSalary$POS == "3B"),]
shortStop <- uniqueSalary[which(uniqueSalary$POS == "SS"),]
leftField <- uniqueSalary[which(uniqueSalary$POS == "LF"),]
centerField <- uniqueSalary[which(uniqueSalary$POS == "CF"),]
rightStop <- uniqueSalary[which(uniqueSalary$POS == "RF"),]
outField <- uniqueSalary[which(uniqueSalary$POS == "OF"),]
desHitter <- uniqueSalary[which(uniqueSalary$POS == "DH"),]
par(mfrow=c(3,4))
plot(c(1:length(pitchers$playerID)), pitchers$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Pitchers", xlab = "Pitcher #",
ylab = "Salary Magnitude")
plot(c(1:length(catchers$playerID)), catchers$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Catchers", xlab = "Catchers #",
ylab = "Salary Magnitude")
plot(c(1:length(firstBase$playerID)), firstBase$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: First Base", xlab = "First Base #",
ylab = "Salary Magnitude")
plot(c(1:length(secondBase$playerID)), secondBase$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Second Base", xlab = "Second Base #",
ylab = "Salary Magnitude")
plot(c(1:length(thirdBase$playerID)), thirdBase$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Third Base", xlab = "Third Base #",
ylab = "Salary Magnitude")
plot(c(1:length(shortStop$playerID)), shortStop$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Short Stop", xlab = "Short Stop #",
ylab = "Salary Magnitude")
plot(c(1:length(leftField$playerID)), leftField$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Left Field", xlab = "Left Field #",
ylab = "Salary Magnitude")
plot(c(1:length(centerField$playerID)), centerField$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Center Field", xlab = "Center Field #",
ylab = "Salary Magnitude")
#"RF" is right field; the figure in the report was drawn with the
#label "Right Stop".
plot(c(1:length(rightStop$playerID)), rightStop$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Right Field", xlab = "Right Field #",
ylab = "Salary Magnitude")
plot(c(1:length(outField$playerID)), outField$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: Out Field", xlab = "Out Field #",
ylab = "Salary Magnitude")
plot(c(1:length(desHitter$playerID)), desHitter$salary, pch = 20,
ylim = c(0, max(pitchers$salary)),
main = "Salary by Positions: D-Hitter", xlab = "Designated Hitter #",
ylab = "Salary Magnitude")
}
5.1.8 matchNonUniqueFrame(dataFrame,xFrame)
USE: Used to match together baseball player’s positions,
with their salaries. Unlike the unique match, this will
put a salary value to each of a player’s position.
matchNonUniqueFrame <- function(dataFrame,xFrame){
  i <- 1
  #Dummy first row (not a real player)
  dFrame <- data.frame("John", "C", 1 ,stringsAsFactors = FALSE)
  names(dFrame) <- list("playerID", "POS", "salary")
  while (i <= length(xFrame$playerID)){
    workHorse <- subset(dataFrame, playerID %in% xFrame$playerID[i])
    if(is.na(workHorse$playerID[1])) i <- i + 1 #If there is no match
    else{
      crossFrame <- subset(xFrame, playerID %in% xFrame$playerID[i])
      #One output row per position the player has in xFrame
      j <- 1
      while (j <= length(crossFrame$playerID)){
        row <- list(workHorse$playerID[1], crossFrame$POS[j],
                    workHorse$salary[1])
        dFrame <- rbind (dFrame,row)
        j <- j + 1
      }
      i <- i + 1
    }
  }
  return(dFrame)
}
5.1.9 sortSalaryYearly
USE: Used to create a data frame sorted by years, of salaries
of baseball players. In our case, this was used to organize
the All Star’s Salaries for further processing.
sortSalaryYearly <- function(dataFrame, yearList){
  #dataFrame is a list with one element per entry of yearList
  #Dummy first row, dropped before returning
  dFrame <- data.frame(1000,1000)
  names(dFrame) <- c("yearID", "Salary")
  i <- 1
  while ( i <= length(dataFrame)){
    row <-subset(dataFrame[[i]], yearID %in% yearList[i])
    dFrame <- rbind(dFrame,row)
    i <- i + 1
  }
  #Each year's rows are now stacked together.
  return(dFrame[-1,])
}
5.1.10 Home Runs and Handedness
dataBats <- dbGetQuery(db, "Select playerID,bats from MASTER")
#This data only has player ID and Batting hand.
#We need to cross reference this table with the HR’s list.
dataHR <- dbGetQuery(db, "Select playerID, HR from Batting")
#Then we need to create a dataFrame that matches
#a unique player name, a handedness and the number of HR’s they have.
#We also need to remove NA’s
#!is.na() keeps every row when there are no NA’s; -which() would drop them all
dataBats <- dataBats[!is.na(dataBats$bats),]
dataHR <- dataHR[!is.na(dataHR$HR),]
#This function will return a row of a playerID and their total number of HR’s
listNames <- list(unique(dataHR$playerID))
crossFrame <- lapply(listNames, groupNameReturnHR, dataFrame = dataHR)
xFrame <- groupNameReturnHR(dataHR,listNames)
#This function will match together players based on their player ID
tableBattingHR <- matchFrame(dataBats,xFrame)
tableBattingHR <- tableBattingHR[-1,]
#Split them into 3 groups, RH, LH, BH
rightHand <- tableBattingHR[which(tableBattingHR$HAND == "R"),]
leftHand <- tableBattingHR[which(tableBattingHR$HAND == "L"),]
bothHand <- tableBattingHR[which(tableBattingHR$HAND == "B"),]
par(mfrow=c(3,1))
Based on Population: All Available Players
#We will plot %of players with that frequency.
barplot(table(rightHand$HR)/length(rightHand$playerID),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.6),
main = "Right Handed Players Homerun Distribution",
ylab = "% of all Right-Handed Players", xlab = "# of HomeRuns")
barplot(table(leftHand$HR)/length(leftHand$playerID),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.6), main = "Left Handed Players Homerun Distribution",
ylab = "% of all Left-Handed Players", xlab = "# of HomeRuns")
barplot(table(bothHand$HR)/length(bothHand$playerID),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.6),
main = "Ambidextrous Handed Players Homerun Distribution",
ylab = "% of all Ambidextrous Players", xlab = "# of HomeRuns")
Based on Population: Home Run Hitters
barplot(table(rightHand$HR[rightHand$HR != 0])/length(rightHand$HR),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.08), main = "Right Handed Players Homerun Distribution",
ylab = "% of all Right-Handed Players", xlab = "# of HomeRuns")
barplot(table(leftHand$HR[leftHand$HR != 0])/length(leftHand$HR),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.08), main = "Left Handed Players Homerun Distribution",
ylab = "% of all Left-Handed Players", xlab = "# of HomeRuns")
barplot(table(bothHand$HR[bothHand$HR != 0])/length(bothHand$HR),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.08),
main = "Ambidextrous Handed Players Homerun Distribution",
ylab = "% of all Ambidextrous Players", xlab = "# of HomeRuns")
5.1.11 Throwing Home Runs and Handedness
#Pitching Hand # of HR’s and/or Hits (Master # Pitching)
#Does it make a difference?
dataPitch <- dbGetQuery(db, "Select playerID,throws from MASTER")
#This data only has player ID and throwing hand.
#We need to cross reference this table with the HR’s list.
dataHR <- dbGetQuery(db, "Select playerID, HR from Pitching")
#Then we need to create a dataFrame that matches
#a unique player name, a handedness and the number of HR’s they have.
#We also need to remove NA’s
dataPitch <- dataPitch[!is.na(dataPitch$throws),]
listNames <- list(unique(dataHR$playerID))
xFrame <- groupNameReturnHR(dataHR,listNames)
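#matchFrame reads the handedness from a column named "bats",
#so rename "throws" before matching on the throwing-hand data.
names(dataPitch)[names(dataPitch) == "throws"] <- "bats"
tablePitchingHR <- matchFrame(dataPitch,xFrame)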
tablePitchingHR <- tablePitchingHR[-1,]
#Split them into 3 groups, RH, LH, BH
rightHand <- tablePitchingHR[which(tablePitchingHR$HAND == "R"),]
leftHand <- tablePitchingHR[which(tablePitchingHR$HAND == "L"),]
bothHand <- tablePitchingHR[which(tablePitchingHR$HAND == "B"),]
par(mfrow=c(3,1))
#We will plot %of players with that frequency.
barplot(table(rightHand$HR)/length(rightHand$playerID),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.2),
main = "Right Handed Pitchers Homerun Distribution",
ylab = "% of all Right-Handed Pitchers", xlab = "# of HomeRuns")
barplot(table(leftHand$HR)/length(leftHand$playerID),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.2),
main = "Left Handed Pitchers Homerun Distribution",
ylab = "% of all Left-Handed Pitchers", xlab = "# of HomeRuns")
barplot(table(bothHand$HR)/length(bothHand$playerID),
xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),
ylim = c(0,.2),
main = "Ambidextrous Handed Pitchers Homerun Distribution",
ylab = "% of all Ambidextrous Pitchers", xlab = "# of HomeRuns")
5.1.12 Errors in the World Series
#Number of Errors in W-Series Games
par(mfrow=c(1,1))
data <- (dbGetQuery(db, "Select yearID,LgWin,WSWin,E from Teams"))
Winners <- subset(data, WSWin %in% "Y")
data <- (dbGetQuery(db, "Select yearID,LgWin,WSWin,E from Teams"))
dataSub <- subset(data, LgWin %in% "Y" )
Losers <- subset(dataSub, WSWin %in% "N")
plot(Winners$yearID,Winners$E,pch = 20, col = "black", cex = 1.5,
main = "# of Errors made by W-Series losers and winners from 1884 - 2013",
xlab = "Year", ylab = "# of Errors made")
lines(Losers$yearID,Losers$E, pch = 20, col = "red", cex = 1.5)
legend(x = 1960, y = 500, legend = c("W-Series Winners", "W-Series Losers"),
fill = c("black", "red"))
5.1.13 Hall of Fame and Salary
dataSalary <- dbGetQuery(db, "Select playerID,teamID,salary from Salaries")
dataFame <- dbGetQuery(db, "Select playerID,inducted from HallofFame")
#We need to get the players that are in the hall of Fame
fameID <- dataFame[which(dataFame$inducted == "Y"),]
fameID <- fameID[,-2] #Remove the inducted column, now we just have names.
#This Frame contains all salaries of HALL of Famers
fameFrame <- matchFrameFame(dataSalary,fameID)
fameFrame <- fameFrame[-1,] #Remove the dummy first row
#Let’s plot this.
par(mfrow=c(1,1))
plot(c(1:length(fameFrame$playerID)),fameFrame$salary,
main = "Salary of Hall of Famers",
xlab = "Hall of Famer #", ylab = "Salary", pch = 20, col = "red")
lines(c(0:length(fameFrame$playerID)),rep(mean(dataSalary$salary),
length(fameFrame$playerID)+1), cex = 5, lwd = 10)
legend(x = 2.5, y = 1500000,
legend = c("Hall of Famers", "Overall Mean Salary"),
fill = c("red", "black"))
5.1.14 Positions and Salary
#Are certain positions paid more than others?
#This data only concerns people that played a SINGLE position in their career.
#Which positions are more lucrative?
dataPos <- dbGetQuery(db, "Select playerID, Pos from Fielding")
dataPos <- unique(dataPos) #Keep one row per player/position pair
dataSalary <- dbGetQuery(db, "Select playerID,salary from Salaries")
A particular group: Single Position Players
names <- list(unique(dataPos$playerID))
uniqueFrame <- uniquePositions(dataPos,names)
uniqueFrame <- uniqueFrame[-1,]
uniqueSalary <- matchUniqueFrame(dataSalary, uniqueFrame)
uniqueSalary <- uniqueSalary[-1,] #Remove the dummy first row
Salaries by Position
nonUniqueSalary <- matchNonUniqueFrame(dataSalary, dataPos)
nonUniqueSalary <- nonUniqueSalary[-1,] #Remove the dummy first row
plotPosSalaryAll(nonUniqueSalary)
5.1.15 All about the All Stars
dataStars <- dbGetQuery(db, "Select playerID from AllstarFULL")
dataStars <- unique(dataStars)
dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")
dataStars <- dataStars[which(dataStars$playerID %in% unique(dataSalary$playerID) == TRUE),]
dataSalaryStars <-
dataSalary[which((dataSalary$playerID %in% dataStars) == TRUE),]
The All Star Salary
listYear <- (min(dataSalary$yearID):max(dataSalary$yearID))
sortedFrame <- lapply(listYear, sortYearMean, data = dataSalaryStars)
meanSalaryAllStars <- meanFrame(sortedFrame)
dataStars <- dbGetQuery(db, "Select playerID from AllstarFULL")
dataStars <- unique(dataStars)
dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")
dataStars <- dataStars[which(dataStars$playerID %in% unique(dataSalary$playerID) == TRUE),]
dataSalaryNonStars <-
dataSalary[-which((dataSalary$playerID %in% dataStars) == TRUE),]
listYear <- (min(dataSalary$yearID):max(dataSalary$yearID))
sortedFrame <- lapply(listYear, sortYearMean, data = dataSalaryNonStars)
meanSalaryNonStars <- meanFrame(sortedFrame)
dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")
listYear <- (min(dataSalary$yearID):max(dataSalary$yearID))
sortedFrame <- lapply(listYear, sortYearMean, data = dataSalary)
meanSalaryAll <- meanFrame(sortedFrame)
plot(meanSalaryAllStars$yearID,
meanSalaryAllStars$payRoll, type = "l", col = "Gold",
ylim = c(min(meanSalaryNonStars$payRoll),
max(meanSalaryAllStars$payRoll)),
main = "Mean Salary of Baseball Players by AllStars",
ylab = "Yearly Average Salary",
xlab = "Year")
lines(meanSalaryNonStars$yearID, meanSalaryNonStars$payRoll, col = "Blue")
lines(meanSalaryAll$yearID, meanSalaryAll$payRoll, col = "Black")
legend(x = 1985, y = 8000000, fill = c("Gold", "Black", "Blue"),
legend = c("Mean Salary: AllStar Players",
"Mean Salary: All Players", "Mean Salary: Non Allstars Players"))
Salary Disparities: Amongst All Stars and Non Stars
meanSalaryDisparity <- meanSalaryAllStars
meanSalaryDisparity$payRoll <-
(meanSalaryAllStars$payRoll - meanSalaryNonStars$payRoll)
plot(meanSalaryDisparity$yearID, meanSalaryDisparity$payRoll,
pch = 20, col = "Red",
main = "Growth of Salary Disparity: Allstars Vs. Nonstars",
ylab = "Salary Difference", xlab = "Year")
lines(meanSalaryDisparity$yearID,
predict(loess(meanSalaryDisparity$payRoll~
meanSalaryDisparity$yearID)), col = "black", lwd = 2)
legend(x = 1985, y = 5000000, fill = c("Black", "Red"),
legend = c("Fitted Curve", "Observed Distances"))
Salary Disparities: Amongst All Stars
par(mfrow = c(1,1))
groupedSalary <- table(round(dataSalaryStars$salary/1000000))
barplot(groupedSalary, ylab = "Frequency",
xlab = "Salary by Millions",
main = "Distributions of Salaries for All Stars")
dataStars <- dbGetQuery(db, "Select playerID from AllstarFULL")
dataStars <- unique(dataStars)
dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")
dataStars <- dataStars[which(dataStars$playerID %in% unique(dataSalary$playerID) == TRUE),]
dataSalaryStars <-
dataSalary[which((dataSalary$playerID %in% dataStars) == TRUE),]
dFrame <- data.frame(dataSalaryStars$yearID)
dFrame$Salary <- dataSalaryStars$salary
names(dFrame) <- list("yearID", "Salary")
listYear <- min(dFrame$yearID):max(dFrame$yearID)
sortedFrame <- lapply(listYear, sortYearMean, data = dFrame)
salaryFrame <- sortSalaryYearly(sortedFrame,listYear)
plot(salaryFrame$yearID,round(salaryFrame$Salary/1000000),
xlab = "Year", ylab = "Salary in Millions",
main = "Salary of All Stars in Millions")
lines(meanSalaryAllStars$yearID,
round(meanSalaryAllStars$payRoll/1000000),
pch = 20, col = "Red", lwd = 5)
legend(x = 1985, y = 30, fill = c("Black", "Red"),
legend = c("Salary in Millions", "Mean Salary of All Stars"))