
A Report on Baseball Using R

UC DAVIS

FALL STA 141 LANG

FINAL PROJECT

American Baseball And

A Collection of Thoughts

Author:

Ray Peralta

ID: 997579589

December 17 2014


Contents

1 HW 6
  1.1 Part 1: Results
    1.1.1 Part 1 i
    1.1.2 Part 1 ii
  1.2 Part 2: Results
    1.2.1 Problem 1
    1.2.2 Problem 2
    1.2.3 Problem 3
    1.2.4 Problem 4
    1.2.5 Problem 5
    1.2.6 Problem 6
    1.2.7 Problem 7
      Problem 7A
      Problem 7B
      Problem 7C
    1.2.8 Problem 8
    1.2.9 Problem 9
      Problem 9A
      Problem 9B
      Problem 9C
    1.2.10 Problem 10

2 APPENDIX: Plots for HW6
  2.1 Problem 5
    2.1.1 Graphs: Number of Games played by World Series Winners and Losers
  2.2 Problem 7C
    2.2.1 Graphs: Baseball Team Payrolls 1971-2013
  2.3 Problem 9A
    2.3.1 Graphs: American League Payroll
    2.3.2 Graphs: National League Payroll
  2.4 Problem 9B
    2.4.1 Graphs: Division ALC Team Payroll
    2.4.2 Graphs: Division ALE Team Payroll
    2.4.3 Graphs: Division ALW Team Payroll
    2.4.4 Graphs: Division NLC Team Payroll
    2.4.5 Graphs: Division NLE Team Payroll
    2.4.6 Graphs: Division NLW Team Payroll
  2.5 Problem 10
    2.5.1 Graphs: Distribution of Home Runs from 1875-2013

3 APPENDIX: EXPLICIT R CODE
  3.1 Part 1
    3.1.1 Part 1i
    3.1.2 Part 1ii
  3.2 Functions used in Part 2
    3.2.1 sortTeamReturnSalary(dataFrame,name)
    3.2.2 sortTeamYearly(dataFrame,name)
    3.2.3 sortYearlyPay(dataFrame,year)
    3.2.4 frameFill(list)
    3.2.5 sortTeam
    3.2.6 sortYearMean(dataFrame, year)
    3.2.7 meanFrame(list)
    3.2.8 plotStart(plotData,meanData,listNames)
    3.2.9 plotInformation(dataFrame)
    3.2.10 divisionPlot(division,leagueAL,leagueNL)
    3.2.11 groupFrame(list)
    3.2.12 plot.HR(list)
  3.3 Part 2
    3.3.1 Problem 1
    3.3.2 Problem 2
    3.3.3 Problem 3
    3.3.4 Problem 4
    3.3.5 Problem 5
    3.3.6 Problem 6
    3.3.7 Problem 7
      Problem 7A
      Problem 7B
      Problem 7C
    3.3.8 Problem 8
    3.3.9 Problem 9
      Problem 9A
      Problem 9B
      Problem 9C
    3.3.10 Problem 10

4 Extra Credit: A Collection of Thoughts
  4.1 A Small Quotes from Piazza
  4.2 Results
    4.2.1 Home Runs and Handedness
      Based on Population: All Available Players
      Based on Population: Home Run Hitters
    4.2.2 Pitchers: Throwing Home Runs and Handedness
    4.2.3 Errors in the World Series
    4.2.4 Hall of Fame and Salary
    4.2.5 Positions and Salary
      A particular group: Single Position Players
      Salaries by Position
    4.2.6 All about the All Stars
      The All Star and Non Star Salaries
      Salary Disparities: Amongst All Stars and Non Stars
      Salary Differences: Amongst All Stars
    4.2.7 The All Star Algorithm
      The Algorithm

5 EXPLICIT R CODE
  5.1 Functions Used in Bonus Section
    5.1.1 matchFrame(dataFrame,xFrame)
    5.1.2 groupNameReturnHR(dataFrame,name)
    5.1.3 matchFrameFame(dataFrame,xFrame)
    5.1.4 matchUniqueFrame(dataFrame,xFrame)
    5.1.5 uniquePositions(dataFrame,names)
    5.1.6 plotPosSalary(uniqueSalary)
    5.1.7 plotPosSalaryAll(uniqueSalary)
    5.1.8 matchNonUniqueFrame(dataFrame,xFrame)
    5.1.9 sortSalaryYearly
    5.1.10 Home Runs and Handedness
      Based on Population: All Available Players
      Based on Population: Home Run Hitters
    5.1.11 Throwing Home Runs and Handedness
    5.1.12 Errors in the World Series
    5.1.13 Hall of Fame and Salary
    5.1.14 Positions and Salary
      A particular group: Single Position Players
      Salaries by Position
    5.1.15 All about the All Stars
      The All Star Salary
      Salary Disparities: Amongst All Stars and Non Stars
      Salary Disparities: Amongst All Stars


Chapter 1

HW 6

1.1 Part 1: Results

1.1.1 Part 1 i

SHELL RESULTS:

1263519 total Outbound Flights

468200 LAX Outbound Flights

367914 SFO Outbound Flights

251290 JFK Outbound Flights

89821 OAK Outbound Flights

86294 SMF Outbound Flights

Running Time: 0m22.633s

R RESULTS:

584916 total Outbound Flights

222029 LAX Outbound Flights

169734 SFO Outbound Flights

105097 JFK Outbound Flights

44911 OAK Outbound Flights

43145 SMF Outbound Flights

Running Time: 6m13.820s
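For readers who want to see the counting logic on the R side of this comparison, here is a minimal sketch. It runs on a tiny made-up data frame, and the column name `Origin` is an assumption about the airline files; it illustrates the tabulation only and does not reproduce the timings above.

```r
# Tiny made-up flight records; the Origin/Dest column names are assumptions.
flights <- data.frame(
  Origin = c("LAX", "SFO", "LAX", "ORD", "JFK", "SMF", "OAK", "LAX"),
  Dest   = c("SFO", "LAX", "JFK", "LAX", "LAX", "LAX", "SEA", "OAK"),
  stringsAsFactors = FALSE
)
airports <- c("LAX", "SFO", "JFK", "OAK", "SMF")

timing <- system.time({
  # Keep flights that leave one of the five airports and tabulate by origin;
  # factor(levels = airports) fixes the order and keeps zero counts.
  outbound <- table(factor(flights$Origin[flights$Origin %in% airports],
                           levels = airports))
})

print(outbound)
print(sum(outbound))        # total outbound flights from the five airports
print(timing[["elapsed"]])  # wall-clock seconds, comparable to "Running Time"
```

On real data, `flights` would be read from the airline files, which is where almost all of the running time goes.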


1.1.2 Part 1 ii

Results:

TOTAL ORIGIN Out-Bound & In-Bound

12489332

468211 LAX Out-Bound & In-Bound Flights

241228 JFK Out-Bound & In-Bound Flights

362781 SFO Out-Bound & In-Bound Flights

89819 OAK Out-Bound & In-Bound Flights


86293 SMF Out-Bound & In-Bound Flights

Shell Running Time: 0m47.440s

R Running Time: 1m13.290s

total time: 2m00.730s
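A matching sketch for the out-bound and in-bound counts, again on made-up data with assumed `Origin`/`Dest` column names. A flight counts for an airport if it departs from or arrives at it, so a flight between two of the five airports counts once for each of them; the per-airport counts can therefore add up to more than the number of distinct flights.

```r
# Tiny made-up flight records; the Origin/Dest column names are assumptions.
flights <- data.frame(
  Origin = c("LAX", "SFO", "LAX", "ORD", "JFK", "SMF", "OAK", "LAX"),
  Dest   = c("SFO", "LAX", "JFK", "LAX", "LAX", "LAX", "SEA", "OAK"),
  stringsAsFactors = FALSE
)
airports <- c("LAX", "SFO", "JFK", "OAK", "SMF")

# For each airport, count flights where it is the origin or the destination
both_ways <- sapply(airports, function(a) sum(flights$Origin == a | flights$Dest == a))
print(both_ways)

# Distinct flights that touch at least one of the five airports
distinct_total <- sum(flights$Origin %in% airports | flights$Dest %in% airports)
print(c(sum_of_counts = sum(both_ways), distinct = distinct_total))
```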

1.2 Part 2: Results

1.2.1 Problem 1

1. What years does the data cover? Are there data for each of these years?

[1] 1871 2013

The database covers the years 1871 to 2013, and each of these years has
data in at least one table. Coverage differs between tables, however: the
Pitching table has data from 1871 to 2013, but the Salaries table only
runs from 1985 to 2013, so some variables are missing for some years.
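A minimal sketch of how per-table coverage can be checked: take the range of `yearID` in each table and compare it with the range across all tables. The two toy data frames stand in for the Lahman Pitching and Salaries tables (only the `yearID` column is assumed).

```r
# Toy stand-ins for two Lahman tables; only the yearID column is used.
tables <- list(
  Pitching = data.frame(yearID = c(1871, 1950, 2013)),
  Salaries = data.frame(yearID = c(1985, 2000, 2013))
)

# First and last year present in each table (one row per table)
coverage <- t(sapply(tables, function(tab) range(tab$yearID)))
colnames(coverage) <- c("first", "last")
print(coverage)

# Span of the database as a whole
overall <- range(unlist(lapply(tables, function(tab) tab$yearID)))
print(overall)
```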

1.2.2 Problem 2

2. How many (unique) people are included in the database? How many are
players, managers, etc.?

[1] 682

[1] 18354

[1] 19036

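There are 682 unique managers and 18354 unique baseball players, for a
grand total of 19036 unique people. Note that 19036 is the sum of the two
counts, so a person who appears both as a player and as a manager would be
counted twice; the number of distinct people may therefore be smaller.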
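The difference between adding two counts and counting distinct people can be shown with `union()`. The IDs below are hypothetical; in the Lahman data they would be a shared person identifier such as `playerID` (an assumption about how the tables are linked).

```r
# Hypothetical person IDs; "pm1" is a player-manager who appears in both lists.
players  <- c("p1", "p2", "p3", "pm1")
managers <- c("pm1", "m1")

n_players  <- length(unique(players))
n_managers <- length(unique(managers))
n_distinct <- length(union(players, managers))  # each person counted once

print(c(players = n_players, managers = n_managers,
        sum = n_players + n_managers, distinct = n_distinct))
```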

1.2.3 Problem 3

3. What team won the World Series in 2000?

2334 Y NYA 2000

The team that won was NYA (the New York Yankees).
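The lookup above amounts to filtering the Teams table on year and World Series result. A minimal sketch on a toy data frame, assuming the Lahman column names `yearID`, `teamID` and `WSWin`:

```r
# Toy stand-in for the Lahman Teams table (illustrative rows only)
teams <- data.frame(
  yearID = c(2000, 2000, 2001),
  teamID = c("NYA", "NYN", "ARI"),
  WSWin  = c("Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Rows where the team won the World Series in 2000
winner <- subset(teams, yearID == 2000 & WSWin == "Y",
                 select = c(WSWin, teamID, yearID))
print(winner)
```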

1.2.4 Problem 4

4. What teams lost the World Series each year?


teamID yearID

1 NY4 1884

2 SL4 1885

3 CHN 1885

4 CHN 1886

5 SL4 1887

6 SL4 1888

7 BR3 1889

8 LS2 1890

9 BRO 1890

10 PIT 1903

11 PHA 1905

12 CHN 1906

13 DET 1907

14 DET 1908

15 DET 1909

16 CHN 1910

17 NY1 1911

18 NY1 1912

19 NY1 1913

20 PHA 1914

21 PHI 1915

22 BRO 1916

23 NY1 1917

24 CHN 1918

25 CHA 1919

26 BRO 1920

27 NYA 1921

28 NYA 1922

29 NY1 1923

30 NY1 1924

31 WS1 1925

32 NYA 1926

33 PIT 1927

34 SLN 1928

35 CHN 1929

36 SLN 1930

37 PHA 1931

38 CHN 1932

39 WS1 1933

40 DET 1934

41 CHN 1935

42 NY1 1936

43 NY1 1937

44 CHN 1938

45 CIN 1939

46 DET 1940

47 BRO 1941

48 NYA 1942

49 SLN 1943

50 SLA 1944

51 CHN 1945

52 BOS 1946

53 BRO 1947

54 BSN 1948

55 BRO 1949

56 PHI 1950

57 NY1 1951

58 BRO 1952

59 BRO 1953

60 CLE 1954

61 NYA 1955

62 BRO 1956

63 NYA 1957

64 ML1 1958

65 CHA 1959

66 NYA 1960

67 CIN 1961

68 SFN 1962

69 NYA 1963

70 NYA 1964

71 MIN 1965

72 LAN 1966

73 BOS 1967

74 SLN 1968

75 BAL 1969

76 CIN 1970

77 BAL 1971

78 CIN 1972

79 NYN 1973


80 LAN 1974

81 BOS 1975

82 NYA 1976

83 LAN 1977

84 LAN 1978

85 BAL 1979

86 KCA 1980

87 NYA 1981

88 ML4 1982

89 PHI 1983

90 SDN 1984

91 SLN 1985

92 BOS 1986

93 SLN 1987

94 OAK 1988

95 SFN 1989

96 OAK 1990

97 ATL 1991

98 ATL 1992

99 PHI 1993

100 CLE 1995

101 ATL 1996

102 CLE 1997

103 SDN 1998

104 ATL 1999

105 NYN 2000

106 NYA 2001

107 SFN 2002

108 NYA 2003

109 SLN 2004

110 HOU 2005

111 DET 2006

112 COL 2007

113 TBA 2008

114 PHI 2009

115 TEX 2010

116 TEX 2011

117 DET 2012

118 SLN 2013
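One way to build a list like the one above: the World Series loser is a team that won its league pennant but not the Series. The sketch assumes the Lahman `Teams` columns `LgWin` and `WSWin` and uses toy rows; it is not necessarily how the list above was produced.

```r
# Toy stand-in for the Lahman Teams table (illustrative rows only)
teams <- data.frame(
  yearID = c(2000, 2000, 2000, 2001, 2001),
  teamID = c("NYA", "NYN", "SEA", "ARI", "NYA"),
  LgWin  = c("Y", "Y", "N", "Y", "Y"),
  WSWin  = c("Y", "N", "N", "Y", "N"),
  stringsAsFactors = FALSE
)

# Pennant winners that did not win the World Series, in year order
losers <- teams[teams$LgWin == "Y" & teams$WSWin == "N", c("teamID", "yearID")]
losers <- losers[order(losers$yearID), ]
rownames(losers) <- NULL
print(losers)
```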

1.2.5 Problem 5

5. Do you see a relationship between the number of games won in a season
and winning the World Series?

*To the reader: Please refer to the appendix for the plots.

It would seem that since 1984 the teams that win a large number of games
have typically won the World Series. To check this trend, however, we
should also look at the teams that did not win the World Series. The
graphs show a clear similarity between the two groups, so we cannot say
that winning more games during the season decides who wins the World Series.
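The visual comparison can be complemented by a numerical one: average the wins of each group and run a two-sample test. The win totals below are made up for illustration (not Lahman data), and the column names `W` and `WSWin` follow the Lahman Teams table by assumption.

```r
# Made-up regular-season win totals, for illustration only (not Lahman data).
games <- data.frame(
  W     = c(98, 95, 101, 92, 97, 90),
  WSWin = c("Y", "Y", "Y", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# Average wins for World Series winners ("Y") and losers ("N")
avg <- aggregate(W ~ WSWin, data = games, FUN = mean)
print(avg)

# Welch two-sample t-test. A large p-value means the data show no clear
# difference in average wins; it does not prove the groups are identical.
print(t.test(W ~ WSWin, data = games))
```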

1.2.6 Problem 6

6. In 2003, what were the three highest salaries? (We refer here to unique
salaries, i.e., more than one player might be paid one of these
salaries.)

[1] 22000000 20000000 18700000

The three highest salaries were $22,000,000, $20,000,000, and $18,700,000.

(Side note: Wow... that is quite ridiculous.)
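Because the question asks for unique salaries, duplicates must be dropped before sorting; otherwise two players sharing the top salary would fill two of the three slots. A minimal sketch on a toy vector (only the top three amounts are taken from the result above; the rest are made up):

```r
# Toy 2003 salaries; two players share the top amount.
salaries2003 <- c(22000000, 22000000, 20000000, 18700000, 15000000, 300000)

# Remove duplicate amounts, sort from highest to lowest, keep the first three
top3 <- head(sort(unique(salaries2003), decreasing = TRUE), 3)
print(top3)
```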

1.2.7 Problem 7

Problem 7A

A) For 1999, compute the total payroll of each of the different teams.

[1] "ANA" "55388166"

[1] "ARI" "68703999"

[1] "ATL" "73140000"

[1] "BAL" "80605863"

[1] "BOS" "63497500"

[1] "CHA" "25620000"

[1] "CHN" "62343000"

[1] "CIN" "33962761"

[1] "CLE" "72978462"

[1] "COL" "61935837"

[1] "DET" "36489666"

[1] "FLO" "21085000"

[1] "HOU" "54914000"

[1] "KCA" "26225000"

[1] "LAN" "80862453"

[1] "MIL" "43377395"

[1] "MIN" "21257500"

[1] "MON" "17903000"

[1] "NYA" "86734359"

[1] "NYN" "65092092"

[1] "OAK" "24431833"

[1] "PHI" "31692500"

[1] "PIT" "24697666"

[1] "SDN" "49768179"

[1] "SEA" "54125003"

[1] "SFN" "46595057"

[1] "SLN" "49778195"

[1] "TBA" "38870000"

[1] "TEX" "76709931"

[1] "TOR" "45444333"
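A compact way to get per-team totals for a single year is `tapply()`, which sums salaries within each team. The toy data frame below assumes the Lahman Salaries columns `yearID`, `teamID` and `salary`. Keeping the totals as a named numeric vector also avoids the quoted output above, which presumably comes from combining the team ID and total with `c()` and so coercing both to character.

```r
# Toy stand-in for the Lahman Salaries table (illustrative values only)
salaries <- data.frame(
  yearID = c(1999, 1999, 1999, 1999, 2000),
  teamID = c("ANA", "ANA", "ARI", "ARI", "ANA"),
  salary = c(1000000, 500000, 2000000, 250000, 900000),
  stringsAsFactors = FALSE
)

s99 <- salaries[salaries$yearID == 1999, ]

# Sum salaries within each team; the result is named by teamID
payroll1999 <- tapply(s99$salary, s99$teamID, sum)
print(payroll1999)
```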

Problem 7B

B) Next, compute the team payrolls for all years in the database for
which we have salary information.
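The team-by-year totals listed below can be produced in one call by grouping on both team and year. A minimal sketch with `aggregate()`, on the same kind of toy Salaries data (column names assumed from the Lahman package):

```r
# Toy stand-in for the Lahman Salaries table (illustrative values only)
salaries <- data.frame(
  yearID = c(1999, 1999, 1999, 1999, 2000),
  teamID = c("ANA", "ANA", "ARI", "ARI", "ANA"),
  salary = c(1000000, 500000, 2000000, 250000, 900000),
  stringsAsFactors = FALSE
)

# One row per team-year with the summed salary, ordered by team then year
payrolls <- aggregate(salary ~ teamID + yearID, data = salaries, FUN = sum)
payrolls <- payrolls[order(payrolls$teamID, payrolls$yearID), ]
rownames(payrolls) <- NULL
print(payrolls)
```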

[1] "ATL" "1985" "14807000"

[1] "ATL" "1986" "17102786"

[1] "ATL" "1987" "16544560"

[1] "ATL" "1988" "12728174"

[1] "ATL" "1989" "11112334"

[1] "ATL" "1990" "14555501"

[1] "ATL" "1991" "18403500"

[1] "ATL" "1992" "34625333"


[1] "ATL" "1993" "41641417"

[1] "ATL" "1994" "49383513"

[1] "ATL" "1995" "47235445"

[1] "ATL" "1996" "49698500"

[1] "ATL" "1997" "52278500"

[1] "ATL" "1998" "61186000"

[1] "ATL" "1999" "73140000"

[1] "ATL" "2000" "84537836"

[1] "ATL" "2001" "91936166"

[1] "ATL" "2002" "92870367"

[1] "ATL" "2003" "106243667"

[1] "ATL" "2004" "90182500"

[1] "ATL" "2005" "86457302"

[1] "ATL" "2006" "90156876"

[1] "ATL" "2007" "87290833"

[1] "ATL" "2008" "102365683"

[1] "ATL" "2009" "96726166"

[1] "ATL" "2010" "84423666"

[1] "ATL" "2011" "87002692"

[1] "ATL" "2012" "82829942"

[1] "ATL" "2013" "87871525"

[1] "BAL" "1985" "11560712"

[1] "BAL" "1986" "13001258"

[1] "BAL" "1987" "13900273"

[1] "BAL" "1988" "13532075"

[1] "BAL" "1989" "8275167"

[1] "BAL" "1990" "9680084"

[1] "BAL" "1991" "17519000"

[1] "BAL" "1992" "23780667"

[1] "BAL" "1993" "29096500"

[1] "BAL" "1994" "38849769"

[1] "BAL" "1995" "43942521"

[1] "BAL" "1996" "54490315"

[1] "BAL" "1997" "58516400"

[1] "BAL" "1998" "72355634"

[1] "BAL" "1999" "80605863"

[1] "BAL" "2000" "81447435"

[1] "BAL" "2001" "67599540"

[1] "BAL" "2002" "60493487"

[1] "BAL" "2003" "73877500"

[1] "BAL" "2004" "51623333"

[1] "BAL" "2005" "73914333"

[1] "BAL" "2006" "72585582"

[1] "BAL" "2007" "93174808"

[1] "BAL" "2008" "67196246"

[1] "BAL" "2009" "67101666"

[1] "BAL" "2010" "81612500"

[1] "BAL" "2011" "85304038"

[1] "BAL" "2012" "77353999"

[1] "BAL" "2013" "84393333"

[1] "BOS" "1985" "10897560"

[1] "BOS" "1986" "14402239"

[1] "BOS" "1987" "10144167"

[1] "BOS" "1988" "13896092"

[1] "BOS" "1989" "17481748"

[1] "BOS" "1990" "20558333"

[1] "BOS" "1991" "35167500"

[1] "BOS" "1992" "43610584"

[1] "BOS" "1993" "37120583"

[1] "BOS" "1994" "37859084"

[1] "BOS" "1995" "32455518"

[1] "BOS" "1996" "42393500"

[1] "BOS" "1997" "43558750"

[1] "BOS" "1998" "56757000"

[1] "BOS" "1999" "63497500"

[1] "BOS" "2000" "77940333"

[1] "BOS" "2001" "110035833"

[1] "BOS" "2002" "108366060"

[1] "BOS" "2003" "99946500"

[1] "BOS" "2004" "127298500"

[1] "BOS" "2005" "123505125"

[1] "BOS" "2006" "120099824"

[1] "BOS" "2007" "143026214"

[1] "BOS" "2008" "133390035"

[1] "BOS" "2009" "121345999"

[1] "BOS" "2010" "162447333"

[1] "BOS" "2011" "161762475"

[1] "BOS" "2012" "173186617"

[1] "BOS" "2013" "151530000"

[1] "CAL" "1985" "14427894"

[1] "CAL" "1986" "14427258"

[1] "CAL" "1987" "12843499"

[1] "CAL" "1988" "11947388"

[1] "CAL" "1989" "15097833"

[1] "CAL" "1990" "21720000"

[1] "CAL" "1991" "33060001"

[1] "CAL" "1992" "34749334"

[1] "CAL" "1993" "28588334"

[1] "CAL" "1994" "25156218"

[1] "CAL" "1995" "31223171"


[1] "CAL" "1996" "28738000"

[1] "CHA" "1985" "9846178"

[1] "CHA" "1986" "10418819"

[1] "CHA" "1987" "10641843"

[1] "CHA" "1988" "6390000"

[1] "CHA" "1989" "7265410"

[1] "CHA" "1990" "9491500"

[1] "CHA" "1991" "16919667"

[1] "CHA" "1992" "30160833"

[1] "CHA" "1993" "39696166"

[1] "CHA" "1994" "39183836"

[1] "CHA" "1995" "46961282"

[1] "CHA" "1996" "45139500"

[1] "CHA" "1997" "57740000"

[1] "CHA" "1998" "38335000"

[1] "CHA" "1999" "25620000"

[1] "CHA" "2000" "31133500"

[1] "CHA" "2001" "65653667"

[1] "CHA" "2002" "57052833"

[1] "CHA" "2003" "51010000"

[1] "CHA" "2004" "65212500"

[1] "CHA" "2005" "75178000"

[1] "CHA" "2006" "102750667"

[1] "CHA" "2007" "108671833"

[1] "CHA" "2008" "121189332"

[1] "CHA" "2009" "96068500"

[1] "CHA" "2010" "105530000"

[1] "CHA" "2011" "127789000"

[1] "CHA" "2012" "96919500"

[1] "CHA" "2013" "120065277"

[1] "CHN" "1985" "12702917"

[1] "CHN" "1986" "17208165"

[1] "CHN" "1987" "14307999"

[1] "CHN" "1988" "13119198"

[1] "CHN" "1989" "10668000"

[1] "CHN" "1990" "13624000"

[1] "CHN" "1991" "23175667"

[1] "CHN" "1992" "29829686"

[1] "CHN" "1993" "39386666"

[1] "CHN" "1994" "36287333"

[1] "CHN" "1995" "29505834"

[1] "CHN" "1996" "33081000"

[1] "CHN" "1997" "42155333"

[1] "CHN" "1998" "50838000"

[1] "CHN" "1999" "62343000"

[1] "CHN" "2000" "60539333"

[1] "CHN" "2001" "64715833"

[1] "CHN" "2002" "75690833"

[1] "CHN" "2003" "79868333"

[1] "CHN" "2004" "90560000"

[1] "CHN" "2005" "87032933"

[1] "CHN" "2006" "94424499"

[1] "CHN" "2007" "99670332"

[1] "CHN" "2008" "118345833"

[1] "CHN" "2009" "134809000"

[1] "CHN" "2010" "146609000"

[1] "CHN" "2011" "125047329"

[1] "CHN" "2012" "88197033"

[1] "CHN" "2013" "100567726"

[1] "CIN" "1985" "8359917"

[1] "CIN" "1986" "11906388"

[1] "CIN" "1987" "9281500"

[1] "CIN" "1988" "8888409"

[1] "CIN" "1989" "11072000"

[1] "CIN" "1990" "14370000"

[1] "CIN" "1991" "26305333"

[1] "CIN" "1992" "35931499"

[1] "CIN" "1993" "44879666"

[1] "CIN" "1994" "40961833"

[1] "CIN" "1995" "43144670"

[1] "CIN" "1996" "42526334"

[1] "CIN" "1997" "49768000"

[1] "CIN" "1998" "23005000"

[1] "CIN" "1999" "33962761"

[1] "CIN" "2000" "46867200"

[1] "CIN" "2001" "48986000"

[1] "CIN" "2002" "45050390"

[1] "CIN" "2003" "59355667"

[1] "CIN" "2004" "46615250"

[1] "CIN" "2005" "61892583"

[1] "CIN" "2006" "60909519"

[1] "CIN" "2007" "68524980"

[1] "CIN" "2008" "74117695"

[1] "CIN" "2009" "73558500"

[1] "CIN" "2010" "71761542"

[1] "CIN" "2011" "75947134"

[1] "CIN" "2012" "82203616"

[1] "CIN" "2013" "106404462"

[1] "CLE" "1985" "6551666"

[1] "CLE" "1986" "7809500"


[1] "CLE" "1987" "8513750"

[1] "CLE" "1988" "8936500"

[1] "CLE" "1989" "9094500"

[1] "CLE" "1990" "14487000"

[1] "CLE" "1991" "17635000"

[1] "CLE" "1992" "9373044"

[1] "CLE" "1993" "18561000"

[1] "CLE" "1994" "30490500"

[1] "CLE" "1995" "37937835"

[1] "CLE" "1996" "48107360"

[1] "CLE" "1997" "56802460"

[1] "CLE" "1998" "60800166"

[1] "CLE" "1999" "72978462"

[1] "CLE" "2000" "75880771"

[1] "CLE" "2001" "93152001"

[1] "CLE" "2002" "78909449"

[1] "CLE" "2003" "48584834"

[1] "CLE" "2004" "34319300"

[1] "CLE" "2005" "41502500"

[1] "CLE" "2006" "56031500"

[1] "CLE" "2007" "61673267"

[1] "CLE" "2008" "78970066"

[1] "CLE" "2009" "81579166"

[1] "CLE" "2010" "61203966"

[1] "CLE" "2011" "48776566"

[1] "CLE" "2012" "78430300"

[1] "CLE" "2013" "75771800"

[1] "DET" "1985" "10348143"

[1] "DET" "1986" "12335714"

[1] "DET" "1987" "12122881"

[1] "DET" "1988" "12869571"

[1] "DET" "1989" "15146404"

[1] "DET" "1990" "17593238"

[1] "DET" "1991" "23838333"

[1] "DET" "1992" "27322834"

[1] "DET" "1993" "38150165"

[1] "DET" "1994" "41446501"

[1] "DET" "1995" "37044168"

[1] "DET" "1996" "23438000"

[1] "DET" "1997" "17272000"

[1] "DET" "1998" "24065000"

[1] "DET" "1999" "36489666"

[1] "DET" "2000" "58265167"

[1] "DET" "2001" "53416167"

[1] "DET" "2002" "55048000"

[1] "DET" "2003" "49168000"

[1] "DET" "2004" "46832000"

[1] "DET" "2005" "69092000"

[1] "DET" "2006" "82612866"

[1] "DET" "2007" "94800369"

[1] "DET" "2008" "137685196"

[1] "DET" "2009" "115085145"

[1] "DET" "2010" "122864928"

[1] "DET" "2011" "105700231"

[1] "DET" "2012" "132300000"

[1] "DET" "2013" "145989500"

[1] "HOU" "1985" "9993051"

[1] "HOU" "1986" "9873276"

[1] "HOU" "1987" "12608371"

[1] "HOU" "1988" "12286167"

[1] "HOU" "1989" "15029500"

[1] "HOU" "1990" "18330000"

[1] "HOU" "1991" "12852500"

[1] "HOU" "1992" "15407500"

[1] "HOU" "1993" "30210500"

[1] "HOU" "1994" "33126000"

[1] "HOU" "1995" "34169834"

[1] "HOU" "1996" "28487000"

[1] "HOU" "1997" "34777500"

[1] "HOU" "1998" "42374000"

[1] "HOU" "1999" "54914000"

[1] "HOU" "2000" "51289111"

[1] "HOU" "2001" "60612667"

[1] "HOU" "2002" "63448417"

[1] "HOU" "2003" "71040000"

[1] "HOU" "2004" "75397000"

[1] "HOU" "2005" "76779000"

[1] "HOU" "2006" "88694435"

[1] "HOU" "2007" "87759000"

[1] "HOU" "2008" "88930414"

[1] "HOU" "2009" "102996414"

[1] "HOU" "2010" "92355500"

[1] "HOU" "2011" "70694000"

[1] "HOU" "2012" "60651000"

[1] "HOU" "2013" "17890700"

[1] "KCA" "1985" "9321179"

[1] "KCA" "1986" "13043698"

[1] "KCA" "1987" "11828056"

[1] "KCA" "1988" "14556562"

[1] "KCA" "1989" "18683568"


[1] "KCA" "1990" "23361084"

[1] "KCA" "1991" "26319834"

[1] "KCA" "1992" "33893834"

[1] "KCA" "1993" "41346167"

[1] "KCA" "1994" "40541334"

[1] "KCA" "1995" "29532834"

[1] "KCA" "1996" "20281250"

[1] "KCA" "1997" "34655000"

[1] "KCA" "1998" "36862500"

[1] "KCA" "1999" "26225000"

[1] "KCA" "2000" "23433000"

[1] "KCA" "2001" "35422500"

[1] "KCA" "2002" "47257000"

[1] "KCA" "2003" "40518000"

[1] "KCA" "2004" "47609000"

[1] "KCA" "2005" "36881000"

[1] "KCA" "2006" "47294000"

[1] "KCA" "2007" "67116500"

[1] "KCA" "2008" "58245500"

[1] "KCA" "2009" "70519333"

[1] "KCA" "2010" "71405210"

[1] "KCA" "2011" "35712000"

[1] "KCA" "2012" "60916225"

[1] "KCA" "2013" "80091725"

[1] "LAN" "1985" "10967917"

[1] "LAN" "1986" "14913776"

[1] "LAN" "1987" "13675403"

[1] "LAN" "1988" "16850515"

[1] "LAN" "1989" "21071562"

[1] "LAN" "1990" "21318704"

[1] "LAN" "1991" "32790664"

[1] "LAN" "1992" "44788166"

[1] "LAN" "1993" "39331999"

[1] "LAN" "1994" "38000001"

[1] "LAN" "1995" "39273201"

[1] "LAN" "1996" "35355000"

[1] "LAN" "1997" "45380304"

[1] "LAN" "1998" "48820000"

[1] "LAN" "1999" "80862453"

[1] "LAN" "2000" "87924286"

[1] "LAN" "2001" "109105953"

[1] "LAN" "2002" "94850953"

[1] "LAN" "2003" "105572620"

[1] "LAN" "2004" "92902001"

[1] "LAN" "2005" "83039000"

[1] "LAN" "2006" "98447187"

[1] "LAN" "2007" "108454524"

[1] "LAN" "2008" "118588536"

[1] "LAN" "2009" "100414592"

[1] "LAN" "2010" "95358016"

[1] "LAN" "2011" "104188999"

[1] "LAN" "2012" "95143575"

[1] "LAN" "2013" "223362196"

[1] "MIN" "1985" "5764821"

[1] "MIN" "1986" "8748167"

[1] "MIN" "1987" "6397500"

[1] "MIN" "1988" "12462666"

[1] "MIN" "1989" "15531666"

[1] "MIN" "1990" "14602000"

[1] "MIN" "1991" "23361833"

[1] "MIN" "1992" "28027834"

[1] "MIN" "1993" "28217933"

[1] "MIN" "1994" "28438500"

[1] "MIN" "1995" "25410500"

[1] "MIN" "1996" "23117000"

[1] "MIN" "1997" "34072500"

[1] "MIN" "1998" "27927500"

[1] "MIN" "1999" "21257500"

[1] "MIN" "2000" "16519500"

[1] "MIN" "2001" "24130000"

[1] "MIN" "2002" "40425000"

[1] "MIN" "2003" "55505000"

[1] "MIN" "2004" "53585000"

[1] "MIN" "2005" "56186000"

[1] "MIN" "2006" "63396006"

[1] "MIN" "2007" "71439500"

[1] "MIN" "2008" "56932766"

[1] "MIN" "2009" "65299266"

[1] "MIN" "2010" "97559166"

[1] "MIN" "2011" "112737000"

[1] "MIN" "2012" "94085000"

[1] "MIN" "2013" "75337500"

[1] "ML4" "1985" "11284107"

[1] "ML4" "1986" "9943642"

[1] "ML4" "1987" "7293224"

[1] "ML4" "1988" "8402000"

[1] "ML4" "1989" "11533000"

[1] "ML4" "1990" "19719167"

[1] "ML4" "1991" "23115500"

[1] "ML4" "1992" "31013667"


[1] "ML4" "1993" "23806834"

[1] "ML4" "1994" "24350500"

[1] "ML4" "1995" "17798825"

[1] "ML4" "1996" "21730000"

[1] "ML4" "1997" "23655338"

[1] "MON" "1985" "9470166"

[1] "MON" "1986" "11103600"

[1] "MON" "1987" "6942052"

[1] "MON" "1988" "9603333"

[1] "MON" "1989" "13807389"

[1] "MON" "1990" "16586388"

[1] "MON" "1991" "10732333"

[1] "MON" "1992" "15822334"

[1] "MON" "1993" "18899333"

[1] "MON" "1994" "19098000"

[1] "MON" "1995" "12364000"

[1] "MON" "1996" "16264500"

[1] "MON" "1997" "19295500"

[1] "MON" "1998" "10641500"

[1] "MON" "1999" "17903000"

[1] "MON" "2000" "32994333"

[1] "MON" "2001" "35159500"

[1] "MON" "2002" "38670500"

[1] "MON" "2003" "51948500"

[1] "MON" "2004" "40897500"

[1] "NYA" "1985" "14238204"

[1] "NYA" "1986" "18494253"

[1] "NYA" "1987" "17099714"

[1] "NYA" "1988" "19441152"

[1] "NYA" "1989" "17114375"

[1] "NYA" "1990" "20912318"

[1] "NYA" "1991" "27344168"

[1] "NYA" "1992" "37543334"

[1] "NYA" "1993" "42624900"

[1] "NYA" "1994" "45731334"

[1] "NYA" "1995" "48874851"

[1] "NYA" "1996" "54191792"

[1] "NYA" "1997" "62241545"

[1] "NYA" "1998" "66806867"

[1] "NYA" "1999" "86734359"

[1] "NYA" "2000" "92338260"

[1] "NYA" "2001" "112287143"

[1] "NYA" "2002" "125928583"

[1] "NYA" "2003" "152749814"

[1] "NYA" "2004" "184193950"

[1] "NYA" "2005" "208306817"

[1] "NYA" "2006" "194663079"

[1] "NYA" "2007" "189259045"

[1] "NYA" "2008" "207896789"

[1] "NYA" "2009" "201449189"

[1] "NYA" "2010" "206333389"

[1] "NYA" "2011" "202275028"

[1] "NYA" "2012" "196522289"

[1] "NYA" "2013" "231978886"

[1] "NYN" "1985" "10834762"

[1] "NYN" "1986" "15393714"

[1] "NYN" "1987" "13846714"

[1] "NYN" "1988" "15269314"

[1] "NYN" "1989" "19885071"

[1] "NYN" "1990" "21722834"

[1] "NYN" "1991" "32590001"

[1] "NYN" "1992" "44602002"

[1] "NYN" "1993" "39043667"

[1] "NYN" "1994" "30956583"

[1] "NYN" "1995" "27674992"

[1] "NYN" "1996" "24479500"

[1] "NYN" "1997" "39800400"

[1] "NYN" "1998" "52077999"

[1] "NYN" "1999" "65092092"

[1] "NYN" "2000" "79509776"

[1] "NYN" "2001" "93174428"

[1] "NYN" "2002" "94633593"

[1] "NYN" "2003" "116876429"

[1] "NYN" "2004" "96660970"

[1] "NYN" "2005" "101305821"

[1] "NYN" "2006" "101084963"

[1] "NYN" "2007" "115231663"

[1] "NYN" "2008" "137793376"

[1] "NYN" "2009" "149373987"

[1] "NYN" "2010" "134422942"

[1] "NYN" "2011" "118847309"

[1] "NYN" "2012" "93353983"

[1] "NYN" "2013" "49448346"

[1] "OAK" "1985" "9058606"

[1] "OAK" "1986" "9779421"

[1] "OAK" "1987" "11680839"

[1] "OAK" "1988" "9690000"

[1] "OAK" "1989" "15613070"

[1] "OAK" "1990" "19887501"

[1] "OAK" "1991" "36999167"


[1] "OAK" "1992" "41035000"

[1] "OAK" "1993" "37812333"

[1] "OAK" "1994" "34172500"

[1] "OAK" "1995" "37739225"

[1] "OAK" "1996" "21243000"

[1] "OAK" "1997" "24018500"

[1] "OAK" "1998" "21303000"

[1] "OAK" "1999" "24431833"

[1] "OAK" "2000" "31971333"

[1] "OAK" "2001" "33810750"

[1] "OAK" "2002" "40004167"

[1] "OAK" "2003" "50260834"

[1] "OAK" "2004" "59425667"

[1] "OAK" "2005" "55425762"

[1] "OAK" "2006" "62243079"

[1] "OAK" "2007" "79366940"

[1] "OAK" "2008" "47967126"

[1] "OAK" "2009" "61910000"

[1] "OAK" "2010" "55254900"

[1] "OAK" "2011" "66536500"

[1] "OAK" "2012" "55372500"

[1] "OAK" "2013" "60132500"

[1] "PHI" "1985" "10124966"

[1] "PHI" "1986" "11590166"

[1] "PHI" "1987" "11514233"

[1] "PHI" "1988" "13838000"

[1] "PHI" "1989" "10604000"

[1] "PHI" "1990" "13173667"

[1] "PHI" "1991" "22487332"

[1] "PHI" "1992" "24383834"

[1] "PHI" "1993" "28538334"

[1] "PHI" "1994" "31599000"

[1] "PHI" "1995" "30555945"

[1] "PHI" "1996" "34314500"

[1] "PHI" "1997" "36656500"

[1] "PHI" "1998" "36297500"

[1] "PHI" "1999" "31692500"

[1] "PHI" "2000" "47308000"

[1] "PHI" "2001" "41663833"

[1] "PHI" "2002" "57954999"

[1] "PHI" "2003" "70780000"

[1] "PHI" "2004" "92919167"

[1] "PHI" "2005" "95522000"

[1] "PHI" "2006" "88273333"

[1] "PHI" "2007" "89428213"

[1] "PHI" "2008" "97879880"

[1] "PHI" "2009" "113004046"

[1] "PHI" "2010" "141928379"

[1] "PHI" "2011" "172976379"

[1] "PHI" "2012" "174538938"

[1] "PHI" "2013" "169863189"

[1] "PIT" "1985" "9227500"

[1] "PIT" "1986" "10843500"

[1] "PIT" "1987" "7652000"

[1] "PIT" "1988" "5998500"

[1] "PIT" "1989" "12737500"

[1] "PIT" "1990" "15556000"

[1] "PIT" "1991" "23634667"

[1] "PIT" "1992" "33944167"

[1] "PIT" "1993" "24822467"

[1] "PIT" "1994" "24217250"

[1] "PIT" "1995" "18355345"

[1] "PIT" "1996" "23017500"

[1] "PIT" "1997" "10771667"

[1] "PIT" "1998" "15065000"

[1] "PIT" "1999" "24697666"

[1] "PIT" "2000" "28928334"

[1] "PIT" "2001" "57760833"

[1] "PIT" "2002" "42323599"

[1] "PIT" "2003" "54812429"

[1] "PIT" "2004" "32227929"

[1] "PIT" "2005" "38133000"

[1] "PIT" "2006" "46717750"

[1] "PIT" "2007" "38537833"

[1] "PIT" "2008" "48689783"

[1] "PIT" "2009" "48693000"

[1] "PIT" "2010" "34943000"

[1] "PIT" "2011" "45047000"

[1] "PIT" "2012" "62951999"

[1] "PIT" "2013" "77062000"

[1] "SDN" "1985" "11036583"

[1] "SDN" "1986" "11380693"

[1] "SDN" "1987" "11065796"

[1] "SDN" "1988" "9561002"

[1] "SDN" "1989" "14195000"

[1] "SDN" "1990" "17588334"

[1] "SDN" "1991" "22150001"

[1] "SDN" "1992" "26854167"

[1] "SDN" "1993" "25511333"

[1] "SDN" "1994" "14916333"


[1] "SDN" "1995" "26382334"

[1] "SDN" "1996" "28348172"

[1] "SDN" "1997" "37363672"

[1] "SDN" "1998" "46861500"

[1] "SDN" "1999" "49768179"

[1] "SDN" "2000" "54821000"

[1] "SDN" "2001" "39182833"

[1] "SDN" "2002" "41425000"

[1] "SDN" "2003" "45210000"

[1] "SDN" "2004" "55384833"

[1] "SDN" "2005" "63290833"

[1] "SDN" "2006" "69896141"

[1] "SDN" "2007" "58110567"

[1] "SDN" "2008" "73677616"

[1] "SDN" "2009" "43333700"

[1] "SDN" "2010" "37799300"

[1] "SDN" "2011" "45869140"

[1] "SDN" "2012" "55244700"

[1] "SDN" "2013" "65585500"

[1] "SEA" "1985" "4613000"

[1] "SEA" "1986" "5958309"

[1] "SEA" "1987" "2263500"

[1] "SEA" "1988" "7342450"

[1] "SEA" "1989" "9779500"

[1] "SEA" "1990" "12553667"

[1] "SEA" "1991" "15691833"

[1] "SEA" "1992" "23179833"

[1] "SEA" "1993" "32696333"

[1] "SEA" "1994" "29228500"

[1] "SEA" "1995" "36481311"

[1] "SEA" "1996" "41328501"

[1] "SEA" "1997" "41540661"

[1] "SEA" "1998" "54087036"

[1] "SEA" "1999" "54125003"

[1] "SEA" "2000" "58915000"

[1] "SEA" "2001" "74720834"

[1] "SEA" "2002" "80282668"

[1] "SEA" "2003" "86959167"

[1] "SEA" "2004" "81515834"

[1] "SEA" "2005" "87754334"

[1] "SEA" "2006" "87959833"

[1] "SEA" "2007" "106460833"

[1] "SEA" "2008" "117666482"

[1] "SEA" "2009" "98904166"

[1] "SEA" "2010" "86510000"

[1] "SEA" "2011" "86110600"

[1] "SEA" "2012" "81978100"

[1] "SEA" "2013" "74005043"

[1] "SFN" "1985" "8221714"

[1] "SFN" "1986" "8947000"

[1] "SFN" "1987" "7290000"

[1] "SFN" "1988" "12380000"

[1] "SFN" "1989" "14962834"

[1] "SFN" "1990" "19335333"

[1] "SFN" "1991" "30967666"

[1] "SFN" "1992" "33163168"

[1] "SFN" "1993" "35050000"

[1] "SFN" "1994" "42638666"

[1] "SFN" "1995" "36462777"

[1] "SFN" "1996" "37144725"

[1] "SFN" "1997" "35592378"

[1] "SFN" "1998" "42565834"

[1] "SFN" "1999" "46595057"

[1] "SFN" "2000" "53737826"

[1] "SFN" "2001" "63280167"

[1] "SFN" "2002" "78299835"

[1] "SFN" "2003" "82852167"

[1] "SFN" "2004" "82019166"

[1] "SFN" "2005" "90199500"

[1] "SFN" "2006" "90056419"

[1] "SFN" "2007" "90219056"

[1] "SFN" "2008" "76594500"

[1] "SFN" "2009" "83026450"

[1] "SFN" "2010" "98641333"

[1] "SFN" "2011" "118198333"

[1] "SFN" "2012" "117620683"

[1] "SFN" "2013" "140180334"

[1] "SLN" "1985" "11817083"

[1] "SLN" "1986" "9875010"

[1] "SLN" "1987" "11758000"

[1] "SLN" "1988" "12880000"

[1] "SLN" "1989" "16078833"

[1] "SLN" "1990" "20523334"

[1] "SLN" "1991" "21860001"

[1] "SLN" "1992" "27583836"

[1] "SLN" "1993" "23367334"

[1] "SLN" "1994" "29275601"

[1] "SLN" "1995" "37101000"

[1] "SLN" "1996" "40269667"

[1] "SLN" "1997" "45456667"


[1] "SLN" "1998" "54672521"

[1] "SLN" "1999" "49778195"

[1] "SLN" "2000" "61453863"

[1] "SLN" "2001" "78538333"

[1] "SLN" "2002" "74660875"

[1] "SLN" "2003" "83786666"

[1] "SLN" "2004" "83228333"

[1] "SLN" "2005" "92106833"

[1] "SLN" "2006" "88891371"

[1] "SLN" "2007" "90286823"

[1] "SLN" "2008" "99624449"

[1] "SLN" "2009" "88528409"

[1] "SLN" "2010" "93540751"

[1] "SLN" "2011" "105433572"

[1] "SLN" "2012" "110300862"

[1] "SLN" "2013" "92260110"

[1] "TEX" "1985" "7676500"

[1] "TEX" "1986" "6743119"

[1] "TEX" "1987" "880000"

[1] "TEX" "1988" "5342131"

[1] "TEX" "1989" "11893781"

[1] "TEX" "1990" "14874372"

[1] "TEX" "1991" "18224500"

[1] "TEX" "1992" "30128167"

[1] "TEX" "1993" "36376959"

[1] "TEX" "1994" "32973597"

[1] "TEX" "1995" "34581451"

[1] "TEX" "1996" "39041528"

[1] "TEX" "1997" "53448838"

[1] "TEX" "1998" "56572095"

[1] "TEX" "1999" "76709931"

[1] "TEX" "2000" "70795921"

[1] "TEX" "2001" "88633500"

[1] "TEX" "2002" "105526122"

[1] "TEX" "2003" "103491667"

[1] "TEX" "2004" "55050417"

[1] "TEX" "2005" "55849000"

[1] "TEX" "2006" "68228662"

[1] "TEX" "2007" "68318675"

[1] "TEX" "2008" "67712326"

[1] "TEX" "2009" "68178798"

[1] "TEX" "2010" "55250544"

[1] "TEX" "2011" "92299264"

[1] "TEX" "2012" "120510974"

[1] "TEX" "2013" "112522600"

[1] "TOR" "1985" "8812550"

[1] "TOR" "1986" "12611047"

[1] "TOR" "1987" "10479501"

[1] "TOR" "1988" "12241225"

[1] "TOR" "1989" "16261666"

[1] "TOR" "1990" "17756834"

[1] "TOR" "1991" "19902417"

[1] "TOR" "1992" "44788666"

[1] "TOR" "1993" "47279166"

[1] "TOR" "1994" "43433668"

[1] "TOR" "1995" "50590000"

[1] "TOR" "1996" "29555083"

[1] "TOR" "1997" "47079833"

[1] "TOR" "1998" "51376000"

[1] "TOR" "1999" "45444333"

[1] "TOR" "2000" "44838332"

[1] "TOR" "2001" "76895999"

[1] "TOR" "2002" "76864333"

[1] "TOR" "2003" "51269000"

[1] "TOR" "2004" "50017000"

[1] "TOR" "2005" "45719500"

[1] "TOR" "2006" "71365000"

[1] "TOR" "2007" "81942800"

[1] "TOR" "2008" "97793900"

[1] "TOR" "2009" "80538300"

[1] "TOR" "2010" "62234000"

[1] "TOR" "2011" "62567800"

[1] "TOR" "2012" "75009200"

[1] "TOR" "2013" "126288100"

[1] "COL" "1993" "10353500"

[1] "COL" "1994" "23887333"

[1] "COL" "1995" "34154717"

[1] "COL" "1996" "40179823"

[1] "COL" "1997" "43559667"

[1] "COL" "1998" "50484648"

[1] "COL" "1999" "61935837"

[1] "COL" "2000" "61111190"

[1] "COL" "2001" "71541334"

[1] "COL" "2002" "56851043"

[1] "COL" "2003" "67179667"

[1] "COL" "2004" "65445167"

[1] "COL" "2005" "47839000"

[1] "COL" "2006" "41233000"

[1] "COL" "2007" "54041000"

[1] "COL" "2008" "68655500"


[1] "COL" "2009" "75201000"

[1] "COL" "2010" "84227000"

[1] "COL" "2011" "88148071"

[1] "COL" "2012" "78069571"

[1] "COL" "2013" "74409071"

[1] "FLO" "1993" "19330545"

[1] "FLO" "1994" "21633000"

[1] "FLO" "1995" "24515781"

[1] "FLO" "1996" "31022500"

[1] "FLO" "1997" "48692500"

[1] "FLO" "1998" "41322667"

[1] "FLO" "1999" "21085000"

[1] "FLO" "2000" "19872000"

[1] "FLO" "2001" "35762500"

[1] "FLO" "2002" "41979917"

[1] "FLO" "2003" "49450000"

[1] "FLO" "2004" "42143042"

[1] "FLO" "2005" "60408834"

[1] "FLO" "2006" "14671500"

[1] "FLO" "2007" "30507000"

[1] "FLO" "2008" "21811500"

[1] "FLO" "2009" "36834000"

[1] "FLO" "2010" "57029719"

[1] "FLO" "2011" "56944000"

[1] "ANA" "1997" "31135472"

[1] "ANA" "1998" "41281000"

[1] "ANA" "1999" "55388166"

[1] "ANA" "2000" "51464167"

[1] "ANA" "2001" "47535167"

[1] "ANA" "2002" "61721667"

[1] "ANA" "2003" "79031667"

[1] "ANA" "2004" "100534667"

[1] "ARI" "1998" "32347000"

[1] "ARI" "1999" "68703999"

[1] "ARI" "2000" "81027833"

[1] "ARI" "2001" "85082999"

[1] "ARI" "2002" "102819999"

[1] "ARI" "2003" "80657000"

[1] "ARI" "2004" "69780750"

[1] "ARI" "2005" "62329166"

[1] "ARI" "2006" "59684226"

[1] "ARI" "2007" "52067546"

[1] "ARI" "2008" "66202712"

[1] "ARI" "2009" "73115666"

[1] "ARI" "2010" "60718166"

[1] "ARI" "2011" "53639833"

[1] "ARI" "2012" "73804833"

[1] "ARI" "2013" "90132000"

[1] "MIL" "1998" "33914904"

[1] "MIL" "1999" "43377395"

[1] "MIL" "2000" "36505333"

[1] "MIL" "2001" "43886833"

[1] "MIL" "2002" "50287833"

[1] "MIL" "2003" "40627000"

[1] "MIL" "2004" "27528500"

[1] "MIL" "2005" "39934833"

[1] "MIL" "2006" "57568333"

[1] "MIL" "2007" "70986500"

[1] "MIL" "2008" "80937499"

[1] "MIL" "2009" "80182502"

[1] "MIL" "2010" "81108278"

[1] "MIL" "2011" "85497333"

[1] "MIL" "2012" "97653944"

[1] "MIL" "2013" "76947033"

[1] "TBA" "1998" "27280000"

[1] "TBA" "1999" "38870000"

[1] "TBA" "2000" "62765129"

[1] "TBA" "2001" "56980000"

[1] "TBA" "2002" "34380000"

[1] "TBA" "2003" "19630000"

[1] "TBA" "2004" "29556667"

[1] "TBA" "2005" "29679067"

[1] "TBA" "2006" "34917967"

[1] "TBA" "2007" "24123500"

[1] "TBA" "2008" "43820597"

[1] "TBA" "2009" "63313034"

[1] "TBA" "2010" "71923471"

[1] "TBA" "2011" "41053571"

[1] "TBA" "2012" "64173500"

[1] "TBA" "2013" "52955272"

[1] "LAA" "2005" "94867822"

[1] "LAA" "2006" "103472000"

[1] "LAA" "2007" "109251333"

[1] "LAA" "2008" "119216333"

[1] "LAA" "2009" "113709000"

[1] "LAA" "2010" "104963866"

[1] "LAA" "2011" "138543166"

[1] "LAA" "2012" "154485166"

[1] "LAA" "2013" "124174750"

[1] "WAS" "2005" "48581500"


[1] "WAS" "2006" "63143000"

[1] "WAS" "2007" "36947500"

[1] "WAS" "2008" "54961000"

[1] "WAS" "2009" "59928000"

[1] "WAS" "2010" "61400000"

[1] "WAS" "2011" "63856928"

[1] "WAS" "2012" "80855143"

[1] "WAS" "2013" "113703270"

[1] "MIA" "2012" "118078000"

[1] "MIA" "2013" "33601900"

Problem 7C

c) Display these in a plot.

*To the reader: Please refer to appendix for Plots.

1.2.8 Problem 8

Study the change in salary over time.

Have salaries kept up with inflation, fallen behind, or grown faster?

Overall, salary growth has shown a steady linear trend.

This, however, considers the OVERALL growth across all teams. Some teams clearly show near-exponential payroll growth, which is balanced out by a group of teams with either below-average or negative payroll growth. Regarding inflation: overall, salaries seem to have at least kept up with the creeping inflation.
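The inflation question can be made more concrete by comparing growth rates instead of judging the plots by eye. Below is a minimal R sketch; `cagr` and `growthVsInflation` are hypothetical helper names, and the price-index values are assumptions the reader must supply from an official consumer price index series, since the baseball database does not contain one.

```r
# Compound annual growth rate between two values that are `years` apart
cagr <- function(start, end, years) {
  (end / start)^(1 / years) - 1
}

# Payroll growth divided by price-level growth over the same period.
# A result above 1 means payroll grew faster than prices.
# cpiStart and cpiEnd are price-index values supplied by the reader.
growthVsInflation <- function(payStart, payEnd, cpiStart, cpiEnd) {
  (payEnd / payStart) / (cpiEnd / cpiStart)
}

# Seattle (SEA) payroll totals for 1985 and 2013, from the Problem 7B listing
seaRate <- cagr(4613000, 74005043, 2013 - 1985)
round(100 * seaRate, 1)   # about 10.4 (% per year)
```

Comparing each team's rate with the average inflation rate over 1985-2013, or calling `growthVsInflation` with CPI values for those two years, would answer the question numerically rather than visually.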

1.2.9 Problem 9

Problem 9A

Compare payrolls for the teams that are in the same leagues

*To the reader: Please refer to appendix for Plots.

Problem 9B

Compare payrolls for the teams that are in the same division.

*To the reader: Please refer to appendix for Plots.


Problem 9C

Are there any interesting characteristics?

Although most teams show positive payroll growth, there are stark differences in rates. For example, the Boston Red Sox and the New York Yankees show enormous payroll growth compared with other teams in the AL, while teams such as Kansas City are typically under the mean growth curve. The same can be said about the NL, although most NL teams typically stay closer to the mean growth. There is one exception with enormous growth, and that is LAN (the Los Angeles Dodgers).

Have certain teams always had top payrolls over the years?

There are a few teams that have clearly higher payrolls than others. They are the following: the Boston Red Sox, the New York Yankees, the Los Angeles Dodgers, and to some extent the Philadelphia Phillies.

Is there a connection between payroll and performance?

One clear correlation that I observed was between the Yankees' many World Series wins and their enormous payroll. It appears that the more frequently a team appears in the World Series, the higher its payroll. This is clear when observing teams that lost the World Series multiple times but still tend to have fairly high payrolls, such as CHN and BOS. More recent World Series appearances also seem to go with higher payrolls, as can be observed for SFN.
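One way to put a number on the payroll-performance question is a rank correlation between each team's payroll and its regular-season wins, computed separately for every season. The sketch below assumes a team-year payroll table like the one `frameFill` builds (columns `teamID`, `yearID`, `payRoll`) and a team table with a wins column `W`, for example from `dbGetQuery(db, "Select teamID, yearID, W from Teams")`; the function name `payrollWinCor` is hypothetical.

```r
# Spearman correlation between payroll and wins, one value per season.
# payroll: data frame with teamID, yearID, payRoll (numeric or character)
# teams:   data frame with teamID, yearID, W (regular-season wins)
payrollWinCor <- function(payroll, teams) {
  merged <- merge(payroll, teams, by = c("teamID", "yearID"))
  merged$payRoll <- as.numeric(as.character(merged$payRoll))
  bySeason <- split(merged, merged$yearID)
  sapply(bySeason, function(d) cor(d$payRoll, d$W, method = "spearman"))
}
```

A rank correlation is used so that a few very large payrolls do not dominate the result; values consistently above 0 across seasons would support the connection described above.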

1.2.10 Problem 10

10. Has the distribution of home runs for players increased over the years?

*To the reader: Please refer to appendix for Plots.

We can see an overall increase in the number of home runs as time progresses. Compared with earlier years, larger home-run totals are occurring more often, but they are still infrequent.
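The 20-year comparison can also be summarized numerically. This sketch (hypothetical function `hrByBlock`) mirrors the report's choices of dropping missing values and zero-HR seasons, then reports the mean and maximum home runs per player-season in each 20-year block. Unlike the fixed-count grouping in the appendix, any final partial block (such as 2011-2013) is kept as its own row.

```r
# Mean and maximum HR per player-season in fixed-width year blocks.
# NAs and zero-HR seasons are dropped, as in the report's Problem 10 code.
hrByBlock <- function(yearID, HR, start = 1871, size = 20) {
  keep <- !is.na(HR) & HR > 0
  block <- start + size * ((yearID[keep] - start) %/% size)
  data.frame(blockStart = sort(unique(block)),
             meanHR = as.vector(tapply(HR[keep], block, mean)),
             maxHR  = as.vector(tapply(HR[keep], block, max)))
}
```

With the Problem 10 query, `hrByBlock(dataHomeRun$yearID, dataHomeRun$HR)` would give one row per block.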


Chapter 2

APPENDIX: Plots for HW6

2.1 Problem 5

2.1.1 Graphs: Number of Games played by World Series Winners and Losers


2.2 Problem 7C

2.2.1 Graphs: Baseball Team Payrolls 1985-2013


2.3 Problem 9A

2.3.1 Graphs: American League Payroll


2.3.2 Graphs: National League Payroll


2.4 Problem 9B

2.4.1 Graphs: Division ALC Team Payroll

2.4.2 Graphs: Division ALE Team Payroll

2.4.3 Graphs: Division ALW Team Payroll

2.4.4 Graphs: Division NLC Team Payroll

2.4.5 Graphs: Division NLE Team Payroll

2.4.6 Graphs: Division NLW Team Payroll

2.5 Problem 10

2.5.1 Graphs: Distribution of Home Runs from 1875-2013


Chapter 3

APPENDIX: EXPLICIT RCODE

3.1 Part 1

3.1.1 Part 1i

############## START OF PART 1i

INSIDE SHELL:

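# Run the five searches one after another (a pipe passes nothing here,
# since each grep writes to a file) and search only the monthly CSVs,
# so the .txt output files are not searched themselves
time {
grep -n "\"OAK\",\"Oakland" 201[23]_*.csv > OAK.txt
grep -n "\"SMF\",\"Sacramento" 201[23]_*.csv > SMF.txt
grep -n "\"LAX\",\"Los Angeles" 201[23]_*.csv > LAX.txt
grep -n "\"SFO\",\"San Francisco" 201[23]_*.csv > SFO.txt
grep -n "\"JFK\",\"New York" 201[23]_*.csv > JFK.txt
}
# Line count per file, largest first (-n sorts numerically)
wc -l *.txt | sort -rn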

INSIDE R:

outboundCount <- function(dataFrameList, string){

i <- 1

sum <- 0

while (i <= length(dataFrameList)){

sum <- sum + length(which(dataFrameList[[i]] == string))

i <- i + 1

}

print(sum)

}


time <- proc.time()
#Read each monthly file and keep only its ORIGIN column
files <- c("2012_August.csv", "2012_July.csv", "2012_September.csv",
           "2012_October.csv", "2012_November.csv", "2012_December.csv",
           "2013_January.csv", "2013_February.csv", "2013_March.csv",
           "2013_April.csv", "2013_May.csv", "2013_June.csv")
dataFrameList <- lapply(files, function(f) read.csv(f)$ORIGIN)
outboundCount(dataFrameList, "LAX")
outboundCount(dataFrameList, "SFO")
outboundCount(dataFrameList, "JFK")
outboundCount(dataFrameList, "OAK")
outboundCount(dataFrameList, "SMF")
proc.time() - time

############## END OF PART 1i
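The five separate `outboundCount` calls can also be replaced by a single tabulation over all airports of interest. A minimal sketch, assuming `dataFrameList` holds the ORIGIN columns as above (factor or character); `outboundCounts` is a hypothetical name.

```r
# Count how many flights leave each airport in `codes`, across all months.
# Converting to character first avoids combining factors with different levels.
outboundCounts <- function(originList, codes) {
  origins <- unlist(lapply(originList, as.character))
  counts <- table(factor(origins, levels = codes))
  setNames(as.integer(counts), codes)
}

# Example with two tiny "months"
outboundCounts(list(factor(c("LAX", "SFO", "LAX")), c("OAK", "LAX")),
               c("LAX", "SFO", "OAK", "SMF"))
# returns c(LAX = 3, SFO = 1, OAK = 1, SMF = 0)
```

On the real data this would be `outboundCounts(dataFrameList, c("LAX", "SFO", "JFK", "OAK", "SMF"))`, which makes one pass over the origins instead of five.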


3.1.2 Part 1ii

############## START OF PART 1ii

INSIDE SHELL:

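# Every grep is given the monthly CSVs to search; the outputs are named
# *2.csv, so they do not match the 201[23]_*.csv pattern
time {
grep -n "OAK" 201[23]_*.csv > OAK2.csv
grep -n "SMF" 201[23]_*.csv > SMF2.csv
grep -n "LAX" 201[23]_*.csv > LAX2.csv
grep -n "SFO" 201[23]_*.csv > SFO2.csv
grep -n "JFK" 201[23]_*.csv > JFK2.csv
}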

INSIDE R:

time <- proc.time()

OAK <- read.csv("OAK2.csv")

SFO <- read.csv("SFO2.csv")

SMF <- read.csv("SMF2.csv")

LAX <- read.csv("LAX2.csv")

JFK <- read.csv("JFK2.csv")

names(SFO)[24] <- "DESTINATION"

names(SFO)[15] <- "ORIGIN"

names(OAK)[24] <- "DESTINATION"

names(OAK)[15] <- "ORIGIN"

names(LAX)[24] <- "DESTINATION"

names(LAX)[15] <- "ORIGIN"

names(SMF)[24] <- "DESTINATION"

names(SMF)[15] <- "ORIGIN"

names(JFK)[24] <- "DESTINATION"

names(JFK)[15] <- "ORIGIN"

#Exploratory check: the OAK rows split by direction
stuff <- list()
stuff$ORIGIN <- subset(OAK, ORIGIN %in% "OAK")
stuff$DESTINATION <- subset(OAK, DESTINATION %in% "OAK")

outinCount <- function(dFrame, string){

sum <- 0

sum <- (length(subset(dFrame, dFrame$ORIGIN %in% string)$ORIGIN) +

length(subset(dFrame, dFrame$DESTINATION %in% string)$DESTINATION))

print(sum)

}


outinCount(LAX,"LAX")

outinCount(JFK,"JFK")

outinCount(SFO,"SFO")

outinCount(OAK,"OAK")

outinCount(SMF,"SMF")

proc.time() - time

############## END OF PART 1ii
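`outinCount` adds an origin count and a destination count from two `subset` calls. Assuming the columns have been renamed to ORIGIN and DESTINATION as above, and that no flight has the same airport as both origin and destination, the same total can be read off a single logical test per row. `touchCount` is a hypothetical name.

```r
# Number of flights that leave from or arrive at `code`.
# Counting rows where either column matches gives the same total as adding
# the origin and destination counts, provided no flight has ORIGIN == DESTINATION.
touchCount <- function(dFrame, code) {
  sum(dFrame$ORIGIN == code | dFrame$DESTINATION == code, na.rm = TRUE)
}

flights <- data.frame(ORIGIN = c("OAK", "LAX", "SFO"),
                      DESTINATION = c("LAX", "OAK", "OAK"))
touchCount(flights, "OAK")   # 3
touchCount(flights, "LAX")   # 2
```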

3.2 Functions used in Part 2

3.2.1 sortTeamReturnSalary(dataFrame,name)

USE: Used to filter a data frame by team name and print that team's total
salary. Passing in a single year's data (as in Problem 7A) gives the payroll
for that year.

sortTeamReturnSalary <- function(dataFrame,name){

dataWork <- subset(dataFrame, teamID %in% name)

print(c(dataWork$teamID[1] , sum(dataWork$salary)))

}

3.2.2 sortTeamYearly(dataFrame,name)

USE: Used to sort information by teams and print out a team’s total salary for

every year available in the database

sortTeamYearly <- function(dataFrame, name){

#Group together our data for a team

dataWork <-subset(dataFrame, teamID %in% name)

#Group each year together.

year <- (unique(dataWork$yearID))

return(sortYearlyPay(dataWork,year))

}


3.2.3 sortYearlyPay(dataFrame,year)

USE: Used inside sortTeamYearly to return the Salary per Year for a specific
Baseball team.

sortYearlyPay <- function(dataFrame, year){
  i <- 1
  #an empty list to add to
  list <- list()
  #Print each year’s salary
  while( i <= length(year)){
    dataYear <- subset(dataFrame, yearID %in% year[i])
    list[[i]] <- print(c(dataYear$teamID[1], year[i], sum(dataYear$salary)))
    i <- i + 1
  }
  return(list)
}

3.2.4 frameFill(list)

USE: Used to build a data frame from the nested list returned by
sortTeamYearly: typically a data frame with a teamID, a yearID, and
the "payRoll" that HW6 is interested in.

frameFill <- function(list){

#dummy variables to be removed later

dataFrame <- data.frame("dummy", 0000, 0000,stringsAsFactors=FALSE)

names(dataFrame) <- list("teamID", "yearID", "payRoll")

i <- 1

while(i <= length(list)){

j <- 1

while(j <= length(list[[i]])){

k <- 1

while(k <= length(list[[i]][j])){

row <- list[[i]][j][k]

dataFrame <- rbind(dataFrame,unlist(row))

k <- k + 1


}

j <- j +1

}

i <- i + 1

}

return(dataFrame[-1,])

}
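`frameFill` grows a data frame one row at a time with `rbind`, which copies the frame on every step and leaves every column as character. Base R's `aggregate()` can build the same team-by-year payroll table in one call. A sketch, assuming a salary table with `teamID`, `yearID` and `salary` columns as queried in Problem 7; `teamYearPayroll` is a hypothetical name.

```r
# Total salary per team per year (the payroll table frameFill produces),
# with numeric payroll values.
teamYearPayroll <- function(salaries) {
  out <- aggregate(salary ~ teamID + yearID, data = salaries, FUN = sum)
  names(out)[names(out) == "salary"] <- "payRoll"
  out <- out[order(out$teamID, out$yearID), ]
  rownames(out) <- NULL
  out
}

sal <- data.frame(teamID = c("SEA", "SEA", "SFN"),
                  yearID = c(1985, 1985, 1985),
                  salary = c(100, 200, 50))
teamYearPayroll(sal)   # SEA 1985 -> 300, SFN 1985 -> 50
```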

3.2.5 sortTeam

USE: Used to create a dataFrame where teams are grouped together.

sortTeam<- function(dataFrame, name){

#Group together our data for a team

dataWork <-subset(dataFrame, teamID %in% name)

return(dataWork)

}

3.2.6 sortYearMean(dataFrame, year)

USE: Used to group together information based on year. Specifically made
to create a mean line for graphs.

sortYearMean<- function(dataFrame, year){

#Group together our data for a year

dataWork <-subset(dataFrame, yearID %in% year)

return(dataWork)

}

3.2.7 meanFrame(list)

USE: Used to create a new data frame made specifically for creating a mean line

for our graphs.


meanFrame <- function(list){

#dummy variables to be removed later

dataFrame <- data.frame(0000, 0000,stringsAsFactors=FALSE)

names(dataFrame) <- list( "yearID", "payRoll")

i <- 1

while (i <= length(list)){

row <-list(list[[i]]$yearID[1],

round(mean(as.integer(unlist(list[[i]][3])))))

dataFrame <- rbind(dataFrame,unlist(row))

i <- i + 1

}

return(dataFrame[-1,])

}
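The yearly mean line that `sortYearMean` and `meanFrame` produce can likewise be computed with a single `aggregate()` call. A sketch, assuming the team-year payroll table from `frameFill` (whose columns may be character); `yearlyMeanPayroll` is a hypothetical name, and it rounds the mean as `meanFrame` does.

```r
# Mean team payroll per year (the "mean line" in the plots).
# as.character() first, so factor or character input both convert correctly.
yearlyMeanPayroll <- function(payroll) {
  payroll$payRoll <- as.numeric(as.character(payroll$payRoll))
  payroll$yearID  <- as.integer(as.character(payroll$yearID))
  out <- aggregate(payRoll ~ yearID, data = payroll, FUN = mean)
  out$payRoll <- round(out$payRoll)
  out
}
```

Usage would be `meanData <- yearlyMeanPayroll(dataPlot)` inside `plotInformation`, replacing the `order`, `lapply` and `meanFrame` steps.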

3.2.8 plotStart(plotData,meanData,listNames)

USE: Used to actually plot our information. This will plot a number of lines
on separate plots in order to avoid messiness. BLACK DOTS here represent
the overall mean growth of salary. Its functionality saved me a
great deal of time!

plotStart <- function(plotData,meanData,listNames){
  list <- lapply(listNames, sortTeam, dataFrame = plotData)
  #colors() holds 657 named colors. A bare integer in col= indexes the
  #8-color palette() instead, so pick the names from colors() explicitly
  colorList <- colors()[sample(657, length(list))]
  #To fix the parameters, we will find the max and min payRoll magnitude
  yMAX <- max(as.integer(plotData$payRoll))
  yMIN <- min(as.integer(plotData$payRoll))
  xMAX <- max(as.integer(plotData$yearID))
  xMIN <- min(as.integer(plotData$yearID))
  #Number of team lines per plot; at least 1 so the loop always advances
  lineCounter <- max(1, round(length(listNames)/6))
  i <- 1
  while( i <= length(list)){
    plot(meanData$yearID, meanData$payRoll,
         xlab = "Year", ylab = "payRoll Magnitude",
         main = "Baseball team’s total payroll by year.",
         pch = 20, cex = 2.5, xlim = c(xMIN,xMAX), ylim = c(yMIN,yMAX))
    #Teams drawn on this plot (the last plot may hold fewer)
    idx <- i:min(i + lineCounter - 1, length(list))
    #No point symbol for the team entries, a dot for the mean
    legend(xMIN, yMAX, fill = c(colorList[idx], NA),
           pch = c(rep(NA, length(idx)), 20),
           legend = c(listNames[idx], "mean"))
    for (k in idx){
      lines(list[[k]]$yearID, list[[k]]$payRoll, col = colorList[k])
    }
    i <- i + lineCounter
  }
}

3.2.9 plotInformation(dataFrame)

USE: Master function to create data frames and plot information based on the

question at hand. Takes a dataFrame, and serves as a great functional

tool for the hw problems.

plotInformation <- function(dataFrame){

listNames <- c(as.vector(unique(dataFrame$teamID)))

cData <- invisible(lapply(listNames, sortTeamYearly, dataFrame = dataFrame))

dataPlot <- frameFill(cData)

listYear <- c(min(as.integer(dataPlot$yearID)):

max(as.integer(dataPlot$yearID)))

meanData <- dataPlot[with(dataPlot,order(dataPlot$yearID)),]

list <- lapply(listYear, sortYearMean, data = meanData)

meanData <- meanFrame(list)

plotStart(dataPlot, meanData,listNames)

}

3.2.10 divisionPlot(division,league)

USE: Used specifically for a HW6 Problem. This function is used to create a
data frame that supplies team SALARIES from the Salaries table. It was
problematic that the Teams table did not include salaries, so by cross-
referencing, we are able to create a data frame for each division that
includes every team’s salary.

divisionPlot <-function(division,league){

dFrame1 <- (subset(league, teamID %in% unique(division$teamID)))

plotInformation(dFrame1)

}

3.2.11 groupFrame(list)

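USE: Used specifically for HW6 Problem 10. This function groups together
our list of lists (where each list holds a year’s number of home runs)
into groups of 20 years for easier display and observation.

groupFrame <- function(list){
  #Our list to return
  groupFrame <- list()
  i <- 1
  j <- 1
  #Only complete blocks of 20 years are built, so any leftover years at
  #the end (2011-2013 when the data run from 1871 to 2013) are left out
  while (j <= (length(list)/20)){
    megaList <- list()
    while ( i <= (j*20)){
      megaList <- c(megaList,list[[i]]$HR)
      i <- i + 1
    }
    #Drop the zeros. hr[hr != 0] is safe when there are no zeros at all,
    #whereas hr[-which(hr == 0)] would return an empty vector in that case
    hr <- unlist(megaList)
    groupFrame[[j]] <- hr[hr != 0]
    j <- j + 1
  }
  return(groupFrame)
}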


3.2.12 plot.HR(list)

USE: Used specifically for HW6 Problem 10. This is used to explicitly plot
and label our graphs of interest.

plot.HR <- function(list){
  #Titles assume the data start in 1871 and follow groupFrame's
  #complete 20-year blocks
  mainList <- c("Distributions of Homeruns over 1871-1890",
                "Distributions of Homeruns over 1891-1910",
                "Distributions of Homeruns over 1911-1930",
                "Distributions of Homeruns over 1931-1950",
                "Distributions of Homeruns over 1951-1970",
                "Distributions of Homeruns over 1971-1990",
                "Distributions of Homeruns over 1991-2010")
  #Tabulate every panel over the same range of HR values so the bars line
  #up across panels (barplot's xlim is in bar positions, not HR values)
  allHR <- unlist(list)
  hrLevels <- min(allHR):max(allHR)
  tables <- lapply(list, function(x) table(factor(x, levels = hrLevels)))
  #A shared y-axis limit so the panels can be compared by eye
  yMax <- max(sapply(tables, max))
  for (i in seq_along(tables)){
    barplot(tables[[i]], main = mainList[i], ylim = c(0, yMax))
  }
}

3.3 Part 2

3.3.1 Problem 1

#1. What years does the data cover? are there data for each of these years?

dataDate <- dbGetQuery(db, "Select yearID from Teams")

print(c(min(dataDate),max(dataDate)))

[1] 1871 2013

#The database covers the years 1871 to 2013,
#and there is data for each of these years
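The min/max query shows the range of years but not, by itself, that every year in between is present. A small check makes the second claim explicit; `missingYears` is a hypothetical helper.

```r
# Years between `from` and `to` that have no rows at all
missingYears <- function(years, from = min(years), to = max(years)) {
  setdiff(from:to, unique(years))
}

missingYears(c(1871, 1872, 1874, 1874))   # 1873
missingYears(1871:1875)                    # integer(0): no gaps
```

Running `missingYears(dataDate$yearID)` should return an empty vector if there is data for every year from 1871 to 2013.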


3.3.2 Problem 2

#2. How many (unique) people are included in the database?

#How many are players, managers, etc?

#MANAGER TABLE FOR MANAGERS

#MASTER TABLE FOR PLAYERS

#THEN ADD THE SUM

sum <- 0

dbListFields(db, "Managers")

[1] "playerID" "yearID" "teamID" "lgID" "inseason" "G"

[7] "W" "L" "rank" "plyrMgr"

data <- dbGetQuery(db, "Select playerID from Managers")

length(unique(data$playerID))

[1] 682

sum <- sum + length(unique(data$playerID))

data <- dbGetQuery(db, "Select playerID from Master")

length(unique(data$playerID))

[1] 18354

sum <- sum + length(unique(data$playerID))

print(sum)

[1] 19036

#There are 682 UNIQUE managers and 18354 UNIQUE Baseball Players.

#There is a grand total of 19036 UNIQUE people.
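Adding the two counts treats the Managers and Master tables as disjoint. If some playerIDs appear in both (player-managers, for example, which the Managers table flags in its plyrMgr column), they are counted twice. A sketch of counting the union instead; the function name is hypothetical, and no result on the real database is claimed here.

```r
# Number of distinct people across two ID vectors, plus how many IDs they share
countUniquePeople <- function(managerIDs, masterIDs) {
  c(total  = length(union(managerIDs, masterIDs)),
    shared = length(intersect(managerIDs, masterIDs)))
}

countUniquePeople(c("a", "b"), c("b", "c", "c"))   # total 3, shared 1
```

On the database this would be called with the two `playerID` columns queried above.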

3.3.3 Problem 3

#3. What team won the World Series in 2000?

data <- (dbGetQuery(db, "Select WSWin , teamID, yearID from Teams"))

dataSub <- subset(data, yearID %in% 2000)

dataSub[which(dataSub$WSWin == "Y"),]

WSWin teamID yearID

2334 Y NYA 2000

#The team that won was NYA.


3.3.4 Problem 4

#4. What teams lost the World Series each year?

data <- (dbGetQuery(db, "Select WSWin , teamID, yearID, LgWin from Teams"))

dataSub <- subset(data, LgWin %in% "Y" )

dataSub <- subset(dataSub, WSWin %in% "N")

dataSub <- data.frame(dataSub$teamID, dataSub$yearID)

dataSub

3.3.5 Problem 5

#5. Do you see a relationship between the number of games won in a season

# and winning the World Series?

par(mfrow =c(2,1))
#Note: G is the number of games PLAYED in the regular season; the Teams
#table keeps games WON in the W column.
data <- (dbGetQuery(db, "Select WSWin , G, yearID from Teams"))
dataSub <- subset(data, WSWin %in% "Y")
plot(dataSub$yearID,dataSub$G, type = "l",
ylab = "# of Games Played", xlab = "Year",
main = "Number of Games played by World Series Winners")
#There are a few outliers, noted by the sharp drops on the plot.
#But it seems that from 1984 on, teams winning a large number
#of games have typically won the World Series.
#However, to check this trend we should also look at the pennant winners
#that did NOT win the World Series.
data <- (dbGetQuery(db, "Select WSWin,G , yearID, LgWin from Teams"))
dataSub <- subset(data, LgWin %in% "Y" )
dataSub <- subset(dataSub, WSWin %in% "N")
plot(dataSub$yearID,dataSub$G, type = "l",
ylab = "# of Games Played", xlab = "Year",
main = "Number of Games played by World Series Losers")
#We can clearly see a similarity between the two graphs,
#so we cannot say that winning more games during the season decides
#who wins the World Series.
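To compare winners and losers directly rather than by overlaying two plots, one can pair each season's World Series winner with the pennant winner it beat and check which had more regular-season wins. This sketch assumes the Teams columns `yearID`, `W`, `WSWin` and `LgWin`; `winnerHadMoreWins` is a hypothetical name.

```r
# Fraction of seasons in which the World Series winner had strictly more
# regular-season wins than the team it beat (ties count as "no").
winnerHadMoreWins <- function(teams) {
  pennant <- teams[teams$LgWin %in% "Y", ]
  winners <- pennant[pennant$WSWin %in% "Y", c("yearID", "W")]
  losers  <- pennant[pennant$WSWin %in% "N", c("yearID", "W")]
  both <- merge(winners, losers, by = "yearID", suffixes = c(".win", ".lose"))
  mean(both$W.win > both$W.lose)
}
```

Called as `winnerHadMoreWins(dbGetQuery(db, "Select yearID, W, WSWin, LgWin from Teams"))`, a value close to 0.5 would support the conclusion above. Seasons without a World Series drop out in the merge.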


3.3.6 Problem 6

#6. In 2003, what were the three highest salaries?

#(We refer here to unique salaries, i.e., more than one

# player might be paid one of these salaries.)

data <- (dbGetQuery(db, "Select salary, yearID from Salaries"))

dataSub <- subset(data, data$yearID %in% 2003)

print(head(sort(unique(dataSub$salary), decreasing = TRUE), 3))

[1] 22000000 20000000 18700000

#The above will print the 3 highest salaries in the year 2003, from
#largest to smallest.
#The highest 3 salaries are: 22,000,000 20,000,000 18,700,000
#(Sidenote: Wow...)

3.3.7 Problem 7

#7.

# a) For 1999, compute the total payroll

# of each of the different teams.

# I am understanding this as the combined salaries

# of EVERYONE on the team.

# b) Next compute the team payrolls for all years in the database for

# which we have salary information.

# c) Display these in a plot.

# I understand that this will be a very messy plot. I will try

# to present it clearly.

Problem 7A

#PART A

data <- (dbGetQuery(db, "Select salary, yearID, teamID from Salaries"))

data1999 <- subset(data,yearID %in% 1999)

#I will just make a function to do this for me.


list <- (as.vector(unique(data1999$teamID)))

invisible(lapply(list, sortTeamReturnSalary, dataFrame = data1999))

#The above will print out the team name,

#a year, and the year’s total payroll.

Problem 7B

#PART B

#I will write another function here:

#The first must group all of the yearly data by team.

#the second must group all of the data by year and

#print each year’s total payroll

#The previous function (sortTeamReturnSalary) will not work, and will

#be a pain to remake to fit both needs.

listNames <- (as.vector(unique(data$teamID)))

invisible(lapply(listNames, sortTeamYearly, dataFrame = data))

Problem 7C

#PART C

#This is how I will display "These" in a plot.

#Group each total salary by team name

#Put each dataFrame into a list. The first item will be the first plot,

#succeeding items will be added.

#Each line should be a different team on the same plot.

#So I plotted everything on one graph, and I now realize that that
#plot is useless.
#Instead I will make separate plots with fixed parameters,
#allowing us to compare the plots

par(mfrow=c(2,3))

plotInformation(data)

3.3.8 Problem 8

#8. Study the change in salary over time.

#Have salaries kept up with inflation, fallen behind, or grown faster?

#Overall, salary growth has shown a steady linear trend.
#This, however, considers the OVERALL growth across all teams.
#Some teams clearly show near-exponential payroll growth, which is
#balanced out by a group of teams with either below-average or
#negative payroll growth. Regarding inflation: overall, salaries seem
#to have at least kept up with the creeping inflation.

3.3.9 Problem 9

#9. Compare payrolls for the teams that are in the same leagues,

# and then in the same divisions.

#For me: There are 3 divisions and 2 leagues in this database for Salaries,
# but there are 7 leagues in the Teams table.
# CLARIFICATIONS: The Salaries table covers only the two MAJOR
# leagues (AL and NL).
# The Teams table also includes other, mostly defunct
# 19th- and early 20th-century leagues.
# APPARENTLY TEAMS CAN SWITCH LEAGUES,
# which causes some strangeness in the graphs!
# For example: HOUSTON is in the AL in 2013
# instead of the NL!
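The league and division switches noted above can be listed directly. A sketch assuming a data frame with `teamID`, `lgID` and `divID` columns, such as the Teams query used in Problem 9B; restricting it to the salary era first (for example `subset(dataDivisions, yearID >= 1985)`) keeps only switches that affect the payroll plots. `teamsThatSwitched` is a hypothetical name.

```r
# teamIDs that appear with more than one (league, division) combination
teamsThatSwitched <- function(teams) {
  combos <- unique(teams[, c("teamID", "lgID", "divID")])
  counts <- table(as.character(combos$teamID))
  names(counts)[counts > 1]
}

tm <- data.frame(yearID = c(2012, 2013, 2012, 2013),
                 teamID = c("HOU", "HOU", "SEA", "SEA"),
                 lgID   = c("NL", "AL", "AL", "AL"),
                 divID  = c("C", "W", "W", "W"))
teamsThatSwitched(tm)   # "HOU"
```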

Problem 9A

dataLeague <-
  dbGetQuery(db, "Select yearID, teamID, lgID, salary from Salaries")
#Side note: I no longer needed the league column,
#since we're grouping the teams into explicit leagues,
#so I just got rid of it (select = -lgID drops it)
leagueAL <- subset(dataLeague, lgID %in% "AL", select = -lgID)
leagueNL <- subset(dataLeague, lgID %in% "NL", select = -lgID)
plotInformation(leagueAL)
plotInformation(leagueNL)

Problem 9B

#We must find a way to deal with division
dataDivisions <- dbGetQuery(db, "Select yearID, teamID, lgID, divID from Teams")
#SPLIT THEM INTO LEAGUES FIRST, then by division (C, E, W),
#dropping the divID column we no longer need
#AL LEAGUE
divisionALC <- subset(dataDivisions, lgID %in% "AL" & divID %in% "C", select = -divID)
divisionALE <- subset(dataDivisions, lgID %in% "AL" & divID %in% "E", select = -divID)
divisionALW <- subset(dataDivisions, lgID %in% "AL" & divID %in% "W", select = -divID)
#NL LEAGUE
divisionNLC <- subset(dataDivisions, lgID %in% "NL" & divID %in% "C", select = -divID)
divisionNLE <- subset(dataDivisions, lgID %in% "NL" & divID %in% "E", select = -divID)
divisionNLW <- subset(dataDivisions, lgID %in% "NL" & divID %in% "W", select = -divID)
#The strategy:
#We will look for each division within both leagues, and create a new
#dataFrame that we can pass to our plotInformation function.
par(mfrow=c(3,3))
divisionPlot(divisionALC,leagueAL)
divisionPlot(divisionALE,leagueAL)
par(mfrow=c(2,3))
divisionPlot(divisionALW,leagueAL)
divisionPlot(divisionNLC,leagueNL)
divisionPlot(divisionNLE,leagueNL)
divisionPlot(divisionNLW,leagueNL)

Problem 9C

#Are there any interesting characteristics?

#Although most teams show positive payroll growth, there are stark
#differences in rates.
#For example, the Boston Red Sox and the New York Yankees show enormous
#payroll growth compared with
#other teams in the AL, while teams such as Kansas City are
#typically under the mean growth curve.
#The same can be said about the NL, although most teams in the NL
#typically stay closer to the mean growth.
#There is one exception with enormous growth, and that is LAN (the
#Los Angeles Dodgers).
#Have certain teams always had top payrolls over the years?
#There are a few teams that have clearly higher payrolls than others;
#they are the following:
#The Boston Red Sox, the New York Yankees, the Los Angeles Dodgers, and to
#some extent the Philadelphia Phillies.
#Is there a connection between payroll and performance?
#One clear correlation that I observed was between the Yankees' many
#World Series wins and their enormous payroll. It appears that teams
#that appear in the World Series more frequently have much higher
#payrolls; this is clear when observing which teams lost the World Series
#multiple times but still tend to have fairly high payrolls, such as
#CHN and NYA. More recent World Series appearances also seem to go with
#higher payrolls, which can be observed for SFN, for example.

3.3.10 Problem 10

#10. Has the distribution of home runs for players increased over the years?
dataHomeRun <- dbGetQuery(db, "Select HR, yearID, playerID from Batting")
#Just to cross-reference that we have a single count of HR's
dataHomeRunCheck <- dbGetQuery(db, "Select HR from Pitching")
#If this is 0, then we definitely have all "recorded" HR's
sum(dataHomeRunCheck$HR, na.rm = TRUE) - sum(dataHomeRun$HR, na.rm = TRUE)
#Not zero, but insignificant compared with the whole.
#What do we want to produce?
#We want to produce a graph that shows, by year, the distribution of
#the number of occurrences against the number of home runs.
#We need to keep scaling in mind, so that we may 'eyeball'
#multiple graphs with ease.
#Let's make a dataFrame of all our information.
#We will remove the NA's and 0's of our data, as we are more concerned with
#the "growth" of home runs.
#Keep rows with a recorded HR (a logical index also works when nothing is NA)
keep <- !is.na(dataHomeRun$HR)
HR <- dataHomeRun$HR[keep]
yearID <- dataHomeRun$yearID[keep]
#dataFrame of yearID and HR
dFrame <- data.frame(yearID)
dFrame$HR <- HR
names(dFrame) <- list("yearID", "HR")
listYear <- min(yearID):max(yearID)
sortedFrame <- lapply(listYear, sortYearMean, dataFrame = dFrame)
#We will group our data into blocks of twenty years;
#we will also remove zeros here
gFrame <- groupFrame(sortedFrame)
#This will print the plot we are interested in
par(mfrow=c(4,2))
#PLOT the histograms
plot.HR(gFrame)
#We can see that there is an overall increase in the number of
#home runs as time progresses.
#We can also observe that larger home-run totals are occurring more often,
#but they are still infrequent.


Chapter 4

Extra Credit: A Collection of Thoughts

4.1 A Few Quotes from Piazza

4.2 Results

4.2.1 Home Runs and Handedness

Here we will explore the relationship between hitting home runs and a player's handedness. There is a slight difference in the number of home runs a player is able to hit in a season, based on their handedness.

Based on Population: All Available Players

When we observe the entire population of available players in the database, we can see that right-handed players are the most likely to have 0 home runs in a season. Left-handed players are more likely to hit at least a single home run, and ambidextrous players are the most likely of the three groups to hit at least 1 home run.


Based on Population: Home Run Hitters

When we consider the population of players who have hit at least a single home run, we observe a different relationship. Right-handed players tend to

4.2.2 Pitchers: Throwing Home Runs and Handedness

Does the hand a pitcher throws with have any effect on the chance of a home run being hit?


We can see that ambidextrous pitchers have the largest share of pitchers who allowed zero home runs, followed by left-handed pitchers, with right-handed pitchers the smallest of the three groups. Although there is a difference in the "0" group, it does not appear to be large in magnitude, and handedness does not appear to dictate the chance of allowing a home run. (These plots were generated by matching pitchers against the batting-hand table dataBats, so the groups reflect the pitchers' batting hand rather than their throwing hand.)

4.2.3 Errors in the World Series

I wondered the following:

"With all of the pressure in the final games of the season, are there many errors? Do teams win by capitalizing on those errors? Have errors decreased, increased, or stayed roughly the same over the years?"

There is a clear decreasing trend in the number of errors made by World Series teams as time progresses. The trend seems to follow a hyperbolic curve, which suggests that the number of errors will eventually level off ("cap off"). Note that the E column of the Teams table counts a team's errors over the entire season, so the graph shows season totals for the two World Series teams rather than errors committed in the Series itself.

We can also note that World Series winners typically make fewer errors than the teams they beat, but there are years in which the winners made more errors than their opponents. It is unfortunate that the losing teams did not make better use of those opportunities!

The most striking feature of the graph is the ridiculous number of errors made in the 1880s. Seriously, what was going on? Unfortunately, I could not find an answer using search engines. There is also a great deal of data missing between the 1880s and 1900.

4.2.4 Hall of Fame and Salary

Players inducted into the Hall of Fame are voted in by a committee through an election. I was curious to know how these famed players’ salaries would compare with today’s players’ salaries.

At first glance it may seem that Hall of Famers almost never come close to the overall average salary of baseball players, but the reason for this is salary inflation in our data: the overall mean is pulled up by the much larger salaries of recent years.

4.2.5 Positions and Salary

In this portion I wanted to see if there was a relationship between a player’s salary and their position. Do particular positions get paid noticeably larger salaries, and does that happen more frequently for particular positions?

A particular group: Single Position Players

This was my initial graph. I thought that players were often picked to play a specific position, so I aimed to find each player’s position and cross-reference it with the salaries table in order to find a relationship between position and salary. However, as the graph above shows, very few players play a single position (with the exception of catchers and pitchers, especially the latter). We can already see that pitchers’ salaries vary considerably in magnitude.

Salaries by Position

After coming to understand that players often rotate around positions, I decided to simply cross-reference players in the Fielding table with their salaries in the Salaries table. These graphs, all drawn on the same scale, show that no single position stands out dramatically... except for pitchers. Pitchers tend to have higher salaries than the other positions, which makes sense, and they have the most salary outliers (the isolated points in the pitchers’ graph). It would also appear that first basemen are considered more "valuable" than second and third basemen, and that left fielders and center fielders are much more valuable than right fielders. (The right fielders’ panel is accidentally labeled "Right Stop". Apologies.)

4.2.6 All about the All Stars

Are All Stars paid any more than their "Non Star" teammates?

The All Star and Non Star Salaries

Yes.


There has always been a stark difference between the two groups’ salaries. I would like to draw attention to the black line, which represents the mean salary of the entire population. All Stars seem to always be above this mean, while the rest of the players fall below it, making them seem like less "valuable assets" to the team.

Side anecdote: after a brief conversation with my roommate, I have come to cynically believe that baseball is a game of All Stars and "others."

Salary Disparities: Amongst All Stars and Non Stars

The graph above shows the growth of the disparity, i.e., the difference between the mean All Star and mean Non Star salaries in each year. The curve looks slightly quadratic and has a consistently positive trend. This could indicate that the disparity will keep growing in the near future.

Salary Differences: Amongst All Stars

How does the distribution of salary differ amongst the All Stars themselves? As we can see, the distribution of salary is clearly not normal, and salaries are far from evenly distributed. Most salaries are concentrated at the low end, with a long tail of very large salaries; that is, the distribution is heavily right-skewed. It would benefit us to explore the All Star salary disparity visually through other means.


As we can see, the mean salary of All Stars sits relatively low within the spread of individual All Star salaries. As time increases, there is a clear increasing trend in salary (in millions), which is partially due to the inflation of modern salaries. We can also quickly verify that salary is not roughly the same amongst All Star players in any given season. Note that the red line in the graph represents the mean salary of the All Stars of THAT year.

4.2.7 The All Star Algorithm

Unfortunately, I ran out of time after my roommate proposed a radical idea to me. He asked me the following:

"With the data that you do have, can you actually come up with a model to express what a generic All Star player is? I mean, can you create a model that clearly states something like the following: if this particular criterion is not achieved or exceeded, we can predict that a player is or is not going to be an All Star player."

My answer is yes. It is possible to create such a model, but we would have to do some data retrieval in order to build it. The following subsection details how I would approach my roommate’s suggestion.

The Algorithm

I would first consider which factors could be good predictors of being an All Star player. I would take an approach much like our homework on classifying an item as SPAM or HAM; in this case we are classifying our players as STARS or BARS. I would then observe which variables show distinct differences between players that are STARS and players that belong in BARS.

Taking those variables, I would create my data frame and build a model using the categorical analysis methods I learned in STA 138. I would most likely employ a multiple logistic regression using R’s glm function with the family set to binomial. Then I would use this model to predict STARS from the data I used to fit it, and see how well it separates BARS and STARS in our data set. Finally, I would bring this model to my roommate and verify the model’s parameters with him. (He has greater experience and knowledge of baseball than I do, and I would like to make sure that the factors I selected are in fact useful in separating the STARS from the BARS.)
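A minimal sketch of the STARS/BARS logistic regression described above, using glm() with family = binomial. Everything here is an assumption for illustration: the predictors (HR and avg), the coefficients used to simulate the labels, and the 0.5 cutoff are hypothetical and are not variables chosen or tested in this report. With real data, the simulated players frame would be replaced by features pulled from the database.

```r
# Hypothetical sketch: classify players as STAR (1) or BARS (0) with a
# multiple logistic regression. The predictors and the simulated data are
# illustrative assumptions, not variables selected in this report.
set.seed(141)
n <- 500
players <- data.frame(
  HR  = rpois(n, lambda = 12),               # hypothetical season home runs
  avg = rnorm(n, mean = 0.260, sd = 0.025)   # hypothetical batting average
)

# Simulate labels whose log-odds increase with HR and avg
logOdds <- -14 + 0.15 * players$HR + 40 * players$avg
players$star <- rbinom(n, size = 1, prob = plogis(logOdds))

# Multiple logistic regression: glm() with the binomial family
fit <- glm(star ~ HR + avg, data = players, family = binomial)
print(summary(fit)$coefficients)

# Predicted probabilities, turned into labels with a 0.5 cutoff
prob <- predict(fit, type = "response")
predicted <- ifelse(prob > 0.5, "STAR", "BARS")
actual <- ifelse(players$star == 1, "STAR", "BARS")

# Confusion table and in-sample accuracy
print(table(predicted, actual))
accuracy <- mean(predicted == actual)
print(accuracy)
```

Because the model is evaluated on the same data it was fit to, this accuracy is optimistic; holding out part of the data would give a fairer estimate.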


Chapter 5

EXPLICIT: R CODE

5.1 Functions Used in Bonus Section

5.1.1 matchFrame(dataFrame,xFrame)

USE: Takes two frames for cross referencing and returns a combined frame.

This is used to combine our dataframe using playerID’s.

matchFrame <- function(dataFrame,xFrame){

i <- 1

dFrame <- data.frame("John", 1776, "R", stringsAsFactors = FALSE)

names(dFrame) <- list("playerID", "HR","HAND")

while (i <= length(xFrame$playerID)){

workHorse <- subset(dataFrame, playerID %in% xFrame$playerID[i])

if(is.na(workHorse$playerID[1])) i <- i + 1

else{

crossFrame <- subset(xFrame, playerID %in% xFrame$playerID[i])

row <- list(workHorse$playerID[1],

crossFrame$HR[1], workHorse$bats[1])

dFrame <- rbind (dFrame,row)

i <- i + 1

print(row)

print(c(i, "out of", length(xFrame$playerID)))

}

}

return(dFrame)


}
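matchFrame() grows a data frame one row at a time, which is slow for thousands of players. A vectorized alternative, sketched under the assumption that the inputs are shaped like dataBats (playerID, bats) and the career-HR frame (playerID, HR), is a single inner join with base R's merge(); the name matchFrameMerge is hypothetical. Unlike matchFrame(), it does not add a dummy first row.

```r
# Hypothetical vectorized alternative to matchFrame(): inner-join batting hand
# onto career HR totals by playerID using base R's merge().
# Assumes dataBats has columns playerID, bats and hrFrame has playerID, HR.
matchFrameMerge <- function(dataBats, hrFrame) {
  joined <- merge(hrFrame, dataBats, by = "playerID")   # keeps players found in both
  names(joined)[names(joined) == "bats"] <- "HAND"      # same column name as matchFrame()
  joined[, c("playerID", "HR", "HAND")]
}
```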

5.1.2 groupNameReturnHR(dataFrame,name)

USE: This function groups together a player’s number of home runs,

so that we may observe a player’s total number of homeruns during

their career.

groupNameReturnHR <- function(dataFrame, name){

i <- 1

dFrame <- data.frame("John", 1776, stringsAsFactors = FALSE)

#Dummy variables to be removed later

names(dFrame) <- list("playerID", "HR" )

while (i <= length(name[[1]])){

workHorse <- subset(dataFrame, playerID %in% name[[1]][i])

row <- list(unique(workHorse$playerID), sum(workHorse$HR))

dFrame <- rbind(dFrame,row)

i <- i + 1

}

return(dFrame) #Note: the dummy first row is still included

}
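The same career totals can be computed without an explicit loop using aggregate(). This is a minimal sketch; the name careerHR is hypothetical, and it assumes dataHR has playerID and HR columns with NA's already removed. Unlike groupNameReturnHR(), the result has no dummy row.

```r
# Hypothetical vectorized alternative to groupNameReturnHR(): sum each player's
# HR over all of their rows. Assumes columns playerID and HR, NAs removed.
careerHR <- function(dataHR) {
  aggregate(HR ~ playerID, data = dataHR, FUN = sum)   # one row per playerID
}
```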

5.1.3 matchFrameFame(dataFrame,xFrame)

USE: Cross-references a list of Hall of Famers’ player IDs with the salaries table and creates a brand new data frame containing the Hall of Famers who have a recorded salary.

matchFrameFame <- function(dataFrame,xFrame){

i <- 1

dFrame <- data.frame("John", 1776, stringsAsFactors = FALSE)

names(dFrame) <- list("playerID", "salary")

while (i <= length(xFrame)){

workHorse <- subset(dataFrame, playerID %in% xFrame[i])

if(is.na(workHorse$playerID[1])) i <- i + 1 #If there is no match

else{

row <- list(workHorse$playerID[1],workHorse$salary[1])

dFrame <- rbind (dFrame,row)


i <- i + 1

}

}

return(dFrame)

}

5.1.4 matchUniqueFrame(dataFrame,xFrame)

USE: Matches players in the unique position frame to their respective

salary.

matchUniqueFrame <- function(dataFrame,xFrame){

i <- 1

dFrame <- data.frame("John", "C", 1 ,stringsAsFactors = FALSE)

names(dFrame) <- list("playerID", "POS", "salary")

while (i <= length(xFrame$playerID)){

workHorse <- subset(dataFrame, playerID %in% xFrame$playerID[i])

if(is.na(workHorse$playerID[1])) i <- i + 1

else{

crossFrame <- subset(xFrame, playerID %in% xFrame$playerID[i])

row <- list(workHorse$playerID[1], crossFrame$Pos[1],

workHorse$salary[1])

dFrame <- rbind (dFrame,row)

i <- i + 1

print(row)

print(c(i, "out of", length(xFrame$playerID)))

}

}

return(dFrame)

}

5.1.5 uniquePositions(dataFrame,names)

USE: Used to find players that played only ONE position during

their entire career in baseball.

uniquePositions<- function(dataFrame, names){


i <- 1

dFrame <- data.frame("John", "C", stringsAsFactors = FALSE)

#Dummy variables to be removed later

names(dFrame) <- list("playerID", "Pos")

while(i <= length(names[[1]])){

if(length((subset(dataFrame,

playerID %in% names[[1]][i]))$playerID)!= 1){

i <- i + 1

print(c(i, "out of", length(names[[1]])))

}

else{

workHorse <- subset(dataFrame, playerID %in% names[[1]][i])

row <- list(workHorse$playerID, workHorse$POS)

dFrame <- rbind(dFrame,row)

i <- i + 1

print(row)

print(c(i, "out of", length(names[[1]])))

}

}

return(dFrame) #The caller removes the dummy first row

}
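A vectorized sketch of the single-position filter: deduplicate player/position pairs, count distinct positions per player with table(), and keep players with exactly one. The name singlePositionPlayers is hypothetical, and it assumes columns named playerID and POS.

```r
# Hypothetical vectorized alternative to uniquePositions(): keep players who
# appear with exactly one distinct position. Assumes columns playerID and POS.
singlePositionPlayers <- function(dataPos) {
  dataPos <- unique(dataPos[, c("playerID", "POS")])   # one row per player/position pair
  nPos <- table(dataPos$playerID)                      # distinct positions per player
  single <- names(nPos)[nPos == 1]
  dataPos[dataPos$playerID %in% single, ]
}
```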

5.1.6 plotPosSalary(uniqueSalary)

USE: Used specifically to plot single position baseball

player Salaries grouped by Positions.

plotPosSalary <- function(uniqueSalary){

pitchers <- uniqueSalary[which(uniqueSalary$POS == "P"),]

pitchers <- pitchers[-which(pitchers$salary == max(pitchers$salary)),]

catchers <- uniqueSalary[which(uniqueSalary$POS == "C"),]

firstBase <- uniqueSalary[which(uniqueSalary$POS == "1B"),]

secondBase <- uniqueSalary[which(uniqueSalary$POS == "2B"),]

thirdBase <- uniqueSalary[which(uniqueSalary$POS == "3B"),]

shortStop <- uniqueSalary[which(uniqueSalary$POS == "SS"),]

leftField <- uniqueSalary[which(uniqueSalary$POS == "LF"),]

centerField <- uniqueSalary[which(uniqueSalary$POS == "CF"),]


par(mfrow=c(3,4))

plot(c(1:length(pitchers$playerID)), pitchers$salary, pch = 20,

ylim = c(0, max(pitchers$salary)), main = "Salary by Positions:

Pitchers", xlab = "Pitcher #", ylab = "Salary Magnitude")

plot(c(1:length(catchers $playerID)), catchers $salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Catchers ", xlab = "Catchers #",

ylab = "Salary Magnitude")

plot(c(1:length(firstBase $playerID)), firstBase $salary, pch = 20,

ylim = c(0, max(pitchers$salary)), main = "Salary by Positions: First

Base ", xlab = "First Base #", ylab = "Salary Magnitude")

plot(c(1:length(secondBase$playerID)), secondBase$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Second Base", xlab = "Second Base #",

ylab = "Salary Magnitude")

plot(c(1:length(thirdBase$playerID)), thirdBase$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Third Base", xlab = "Third Base #",

ylab = "Salary Magnitude")

plot(c(1:length(shortStop$playerID)), shortStop$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Short Stop", xlab = "Short Stop #",

ylab = "Salary Magnitude")

plot(c(1:length(leftField$playerID)), leftField$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Left Field", xlab = "Left Field #",

ylab = "Salary Magnitude")

plot(c(1:length(centerField$playerID)), centerField$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Center Field", xlab = "Center Field #",

ylab = "Salary Magnitude")


}

5.1.7 plotPosSalaryAll(uniqueSalary)

USE: Used specifically to plot all baseball player Salaries grouped

by Positions.

plotPosSalaryAll <- function(uniqueSalary){

pitchers <- uniqueSalary[which(uniqueSalary$POS == "P"),]

pitchers <- pitchers[-which(pitchers$salary == max(pitchers$salary)),]

catchers <- uniqueSalary[which(uniqueSalary$POS == "C"),]

firstBase <- uniqueSalary[which(uniqueSalary$POS == "1B"),]

secondBase <- uniqueSalary[which(uniqueSalary$POS == "2B"),]

thirdBase <- uniqueSalary[which(uniqueSalary$POS == "3B"),]

shortStop <- uniqueSalary[which(uniqueSalary$POS == "SS"),]

leftField <- uniqueSalary[which(uniqueSalary$POS == "LF"),]

centerField <- uniqueSalary[which(uniqueSalary$POS == "CF"),]

rightStop <- uniqueSalary[which(uniqueSalary$POS == "RF"),]

outField <- uniqueSalary[which(uniqueSalary$POS == "OF"),]

desHitter <- uniqueSalary[which(uniqueSalary$POS == "DH"),]

par(mfrow=c(3,4))

plot(c(1:length(pitchers$playerID)), pitchers$salary, pch = 20,

ylim = c(0, max(pitchers$salary)), main = "Salary by Positions:

Pitchers", xlab = "Pitcher #", ylab = "Salary Magnitude")

plot(c(1:length(catchers $playerID)), catchers $salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Catchers ", xlab = "Catchers #",

ylab = "Salary Magnitude")

plot(c(1:length(firstBase $playerID)), firstBase $salary, pch = 20,

ylim = c(0, max(pitchers$salary)), main = "Salary by Positions: First

Base ", xlab = "First Base #", ylab = "Salary Magnitude")

plot(c(1:length(secondBase$playerID)), secondBase$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Second Base", xlab = "Second Base #",


ylab = "Salary Magnitude")

plot(c(1:length(thirdBase$playerID)), thirdBase$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Third Base", xlab = "Third Base #",

ylab = "Salary Magnitude")

plot(c(1:length(shortStop$playerID)), shortStop$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Short Stop", xlab = "Short Stop #",

ylab = "Salary Magnitude")

plot(c(1:length(leftField$playerID)), leftField$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Left Field", xlab = "Left Field #",

ylab = "Salary Magnitude")

plot(c(1:length(centerField$playerID)), centerField$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Center Field", xlab = "Center Field #",

ylab = "Salary Magnitude")

plot(c(1:length(rightStop$playerID)), rightStop$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Right Stop", xlab = "Right Stop #",

ylab = "Salary Magnitude")

plot(c(1:length(outField$playerID)), outField$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: Out Field", xlab = "Out Field #",

ylab = "Salary Magnitude")

plot(c(1:length(desHitter$playerID)), desHitter$salary, pch = 20,

ylim = c(0, max(pitchers$salary)),

main = "Salary by Positions: D-Hitter", xlab = "Designated Hitter #",

ylab = "Salary Magnitude")

}


5.1.8 matchNonUniqueFrame(dataFrame,xFrame)

USE: Used to match together baseball player’s positions,

with their salaries. Unlike the unique match, this will

put a salary value to each of a player’s position.

matchNonUniqueFrame <- function(dataFrame,xFrame){

i <- 1

dFrame <- data.frame("John", "C", 1 ,stringsAsFactors = FALSE)

names(dFrame) <- list("playerID", "POS", "salary")

while (i <= length(xFrame$playerID)){

workHorse <- subset(dataFrame, playerID %in% xFrame$playerID[i])

if(is.na(workHorse$playerID[1])) i <- i + 1

else{

crossFrame <- subset(xFrame, playerID %in% xFrame$playerID[i])

j <- 1

while (j <= length(crossFrame$playerID)){

row <- list(workHorse$playerID[1], crossFrame$POS[j],

workHorse$salary[1])

dFrame <- rbind (dFrame,row)

j <- j + 1

}

i <- i + 1

}

}

return(dFrame)

}

5.1.9 sortSalaryYearly(dataFrame,yearList)

USE: Used to create a data frame sorted by years, of salaries

of baseball players. In our case, this was used to organize

the All Star’s Salaries for further processing.

sortSalaryYearly <- function(dataFrame, yearList){

#Dummy first row, dropped before returning

dFrame <- data.frame(1000,1000)


names(dFrame) <- c("yearID", "Salary")

i <- 1

while ( i <= length(dataFrame)){

row <-subset(dataFrame[[i]], yearID %in% yearList[i])

dFrame <- rbind(dFrame,row)

i <- i + 1

}

#Group each year together.

return(dFrame[-1,])

}

5.1.10 Home Runs and Handedness

dataBats <- dbGetQuery(db, "Select playerID,bats from MASTER")

#This data only has player ID and Batting hand.

#We need to cross reference this table with the HR’s list.

dataHR <- dbGetQuery(db, "Select playerID, HR from Batting")

#Then we need to create a dataFrame that matches

#a unique player name, a handedness and the number of HR’s they have.

#We also need to remove NA’s (a logical index is used because
#x[-which(cond), ] returns zero rows when nothing matches)
dataBats <- dataBats[!is.na(dataBats$bats), ]
dataHR <- dataHR[!is.na(dataHR$HR), ]
#groupNameReturnHR returns a row of each playerID and their total number of HR’s
listNames <- list(unique(dataHR$playerID))
xFrame <- groupNameReturnHR(dataHR, listNames)

#This function will match together players based on their player ID

tableBattingHR <- matchFrame(dataBats,xFrame)

tableBattingHR <- tableBattingHR[-1,]

#Split them into 3 groups, RH, LH, BH

rightHand <- tableBattingHR[which(tableBattingHR$HAND == "R"),]

leftHand <- tableBattingHR[which(tableBattingHR$HAND == "L"),]

bothHand <- tableBattingHR[which(tableBattingHR$HAND == "B"),]

par(mfrow=c(3,1))


Based on Population: All Available Players

#Plot the share of players with each career HR total.

barplot(table(rightHand$HR)/length(rightHand$playerID),

xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),

ylim = c(0,.6),

main = "Right Handed Players Homerun Distribution",

ylab = "% of all Right-Handed Players", xlab = "# of HomeRuns")

barplot(table(leftHand$HR)/length(leftHand$playerID),

xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),

ylim = c(0,.6), main = "Left Handed Players Homerun Distribution",

ylab = "% of all Left-Handed Players", xlab = "# of HomeRuns")

barplot(table(bothHand$HR)/length(bothHand$playerID),

xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),

ylim = c(0,.6),

main = "Ambidextrous Handed Players Homerun Distribution",

ylab = "% of all Ambidextrous Players", xlab = "# of HomeRuns")
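The claim that right-handed players are the most likely to have zero home runs can be checked numerically as well as visually. A minimal sketch, where the name zeroShareByHand is hypothetical and the input is assumed to be shaped like tableBattingHR with the dummy row removed (columns HR and HAND, with values "R", "L" and "B"):

```r
# Hypothetical summary: share of players with zero career HR in each hand group.
# Assumes columns HR (numeric) and HAND ("R", "L", "B").
zeroShareByHand <- function(tbl) {
  tapply(tbl$HR == 0, tbl$HAND, mean)   # mean of a logical = proportion TRUE
}

# Possible usage: zeroShareByHand(tableBattingHR)
```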

Based on Population: Home Run Hitters

barplot(table(rightHand$HR[-which(rightHand$HR == 0)])/length(rightHand$HR),

xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),

ylim = c(0,.08), main = "Right Handed Players Homerun Distribution",

ylab = "% of all Right-Handed Players", xlab = "# of HomeRuns")

barplot(table(leftHand$HR[-which(leftHand$HR == 0)])/length(leftHand$HR),

xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),

ylim = c(0,.08), main = "Left Handed Players Homerun Distribution",

ylab = "% of all Left-Handed Players", xlab = "# of HomeRuns")

barplot(table(bothHand$HR[-which(bothHand$HR == 0)])/length(bothHand$HR),

xlim = c(min(tableBattingHR$HR),(mean(tableBattingHR$HR))),

ylim = c(0,.08),

main = "Ambidextrous Handed Players Homerun Distribution",

ylab = "% of all Ambidextrous Players", xlab = "# of HomeRuns")


5.1.11 Throwing Home Runs and Handedness

#Pitching hand vs. # of HR’s allowed (MASTER and Pitching tables)
#Does it make a difference?
dataPitch <- dbGetQuery(db, "Select playerID,throws from MASTER")
#This data only has player ID and throwing hand.
#We need to cross reference this table with the HR’s list.
dataHR <- dbGetQuery(db, "Select playerID, HR from Pitching")
#Then we need to create a data frame that matches a unique player name,
#a handedness and the number of HR’s they allowed.
#We also need to remove NA’s
dataPitch <- dataPitch[!is.na(dataPitch$throws), ]
dataHR <- dataHR[!is.na(dataHR$HR), ]
listNames <- list(unique(dataHR$playerID))
xFrame <- groupNameReturnHR(dataHR, listNames)
#matchFrame() reads the hand from a column named "bats", so rename "throws"
#first; passing dataBats here would group pitchers by their batting hand
names(dataPitch)[names(dataPitch) == "throws"] <- "bats"
tablePitchingHR <- matchFrame(dataPitch, xFrame)
tablePitchingHR <- tablePitchingHR[-1,]
#Split them into 3 groups, RH, LH, BH

rightHand <- tablePitchingHR[which(tablePitchingHR$HAND == "R"),]

leftHand <- tablePitchingHR[which(tablePitchingHR$HAND == "L"),]

bothHand <- tablePitchingHR[which(tablePitchingHR$HAND == "B"),]

par(mfrow=c(3,1))

#Plot the share of pitchers with each HR total (axes use the pitching table).
barplot(table(rightHand$HR)/length(rightHand$playerID),
xlim = c(min(tablePitchingHR$HR),(mean(tablePitchingHR$HR))),
ylim = c(0,.2),
main = "Right Handed Pitchers Homerun Distribution",
ylab = "% of all Right-Handed Pitchers", xlab = "# of HomeRuns")
barplot(table(leftHand$HR)/length(leftHand$playerID),
xlim = c(min(tablePitchingHR$HR),(mean(tablePitchingHR$HR))),
ylim = c(0,.2),
main = "Left Handed Pitchers Homerun Distribution",
ylab = "% of all Left-Handed Pitchers", xlab = "# of HomeRuns")
barplot(table(bothHand$HR)/length(bothHand$playerID),
xlim = c(min(tablePitchingHR$HR),(mean(tablePitchingHR$HR))),
ylim = c(0,.2),
main = "Ambidextrous Handed Pitchers Homerun Distribution",
ylab = "% of all Ambidextrous Pitchers", xlab = "# of HomeRuns")

5.1.12 Errors in the World Series

#Number of errors (season totals from Teams.E) for World Series winners and losers
par(mfrow=c(1,1))
data <- dbGetQuery(db, "Select yearID,LgWin,WSWin,E from Teams")
Winners <- subset(data, WSWin %in% "Y")
#Losers: league champions (LgWin == "Y") that did not win the World Series
dataSub <- subset(data, LgWin %in% "Y")
Losers <- subset(dataSub, WSWin %in% "N")

plot(Winners$yearID,Winners$E,pch = 20, col = "black", cex = 1.5,

main = "# of Errors made by W-Series losers and winners from 1884 - 2013",

xlab = "Year", ylab = "# of Errors made")

lines(Losers$yearID,Losers$E, pch = 20, col = "red", cex = 1.5)

legend(x = 1960, y = 500, legend = c("W-Series Winners", "W-Series Losers"),

fill = c("black", "red"))
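To back up the observation that World Series winners usually commit fewer errors than the teams they beat, the two frames can be paired by year. This is a sketch under the assumption that Winners and Losers each have yearID and E columns with one row per year; the name compareErrors is hypothetical. As with the plot, E here is a team's season error total from the Teams table.

```r
# Hypothetical sketch: pair each year's World Series winner and loser and flag
# the years in which the winner committed fewer errors.
# Assumes data frames with columns yearID and E, one row per year each.
compareErrors <- function(winners, losers) {
  paired <- merge(winners[, c("yearID", "E")], losers[, c("yearID", "E")],
                  by = "yearID", suffixes = c(".win", ".lose"))  # years present in both
  paired$winnerFewer <- paired$E.win < paired$E.lose
  paired
}

# Possible usage: share of years in which the winner made fewer errors
# mean(compareErrors(Winners, Losers)$winnerFewer)
```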

5.1.13 Hall of Fame and Salary

dataSalary <- dbGetQuery(db, "Select playerID,teamID,salary from Salaries")

dataFame <- dbGetQuery(db, "Select playerID,inducted from HallofFame")

#We need to get the players that are in the hall of Fame

fameID <- dataFame[which(dataFame$inducted == "Y"),]

fameID <- fameID[,-2] #Remove the inducted column, now we just have names.

#This Frame contains all salaries of HALL of Famers

fameFrame <- matchFrameFame(dataSalary,fameID)

#Let’s plot this.

par(mfrow=c(1,1))

plot(c(1:length(fameFrame$playerID)),fameFrame$salary,

main = "Salary of Hall of Famers",

xlab = "Hall of Famer #", ylab = "Salary", pch = 20, col = "red")

lines(c(0:length(fameFrame$playerID)),rep(mean(dataSalary$salary),

length(fameFrame$playerID)+1), cex = 5, lwd = 10)

legend(x = 2.5, y = 1500000,


legend = c("Hall of Famers", "Overall Mean Salary"),

fill = c("red", "black"))

5.1.14 Positions and Salary

#Are certain positions paid more than others?
#This data only concerns people that played a SINGLE position in their career.
#Which positions are more lucrative?
dataPos <- dbGetQuery(db, "Select playerID, Pos from Fielding")
#Keep one row per player/position pair, under the column name "POS"
#used by uniquePositions() and matchNonUniqueFrame()
names(dataPos) <- c("playerID", "POS")
dataPos <- unique(dataPos)
dataSalary <- dbGetQuery(db, "Select playerID,salary from Salaries")

A particular group: Single Position Players

names <- list(unique(dataPos$playerID))

uniqueFrame <- uniquePositions(dataPos,names)

uniqueFrame <- uniqueFrame[-1,]

uniqueSalary <- matchUniqueFrame(dataSalary, uniqueFrame)

Salaries by Position

nonUniqueSalary <- matchNonUniqueFrame(dataSalary, dataPos)

plotPosSalaryAll(nonUniqueSalary)

5.1.15 All about the All Stars

dataStars <- dbGetQuery(db, "Select playerID from AllstarFULL")

dataStars <- unique(dataStars)

dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")

dataStars <- dataStars[which(dataStars$playerID %in% unique(dataSalary$playerID) == TRUE),]

dataSalaryStars <-

dataSalary[which((dataSalary$playerID %in% dataStars) == TRUE),]

The All Star Salary

listYear <- (min(dataSalary$yearID):max(dataSalary$yearID))

sortedFrame <- lapply(listYear, sortYearMean, data = dataSalaryStars)

meanSalaryAllStars <- meanFrame(sortedFrame)


dataStars <- dbGetQuery(db, "Select playerID from AllstarFULL")

dataStars <- unique(dataStars)

dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")

dataStars <- dataStars[which(dataStars$playerID %in% unique(dataSalary$playerID) == TRUE),]

dataSalaryNonStars <-

dataSalary[-which((dataSalary$playerID %in% dataStars) == TRUE),]

listYear <- (min(dataSalary$yearID):max(dataSalary$yearID))

sortedFrame <- lapply(listYear, sortYearMean, data = dataSalaryNonStars)

meanSalaryNonStars <- meanFrame(sortedFrame)

dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")

listYear <- (min(dataSalary$yearID):max(dataSalary$yearID))

sortedFrame <- lapply(listYear, sortYearMean, data = dataSalary)

meanSalaryAll <- meanFrame(sortedFrame)

plot(meanSalaryAllStars$yearID,

meanSalaryAllStars$payRoll, type = "l", col = "Gold",

ylim = c(min(meanSalaryNonStars$payRoll),

max(meanSalaryAllStars$payRoll)),

main = "Mean Salary of Baseball Players by AllStars",

ylab = "Yearly Average Salary",

xlab = "Year")

lines(meanSalaryNonStars$yearID, meanSalaryNonStars$payRoll, col = "Blue")

lines(meanSalaryAll$yearID, meanSalaryAll$payRoll, col = "Black")

legend(x = 1985, y = 8000000, fill = c("Gold", "Black", "Blue"),

legend = c("Mean Salary: AllStar Players",

"Mean Salary: All Players", "Mean Salary: Non Allstars Players"))

Salary Disparities: Amongst All Stars and Non Stars

meanSalaryDisparity <- meanSalaryAllStars

meanSalaryDisparity$payRoll <-

(meanSalaryAllStars$payRoll - meanSalaryNonStars$payRoll)

plot(meanSalaryDisparity$yearID, meanSalaryDisparity$payRoll,

pch = 20, col = "Red",

main = "Growth of Salary Disparity: Allstars Vs. Nonstars",

ylab = "Salary Difference", xlab = "Year")


lines(meanSalaryDisparity$yearID,
predict(loess(meanSalaryDisparity$payRoll ~ meanSalaryDisparity$yearID)),
col = "black", lwd = 2)

legend(x = 1985, y = 5000000, fill = c("Black", "Red"),

legend = c("Fitted Curve", "Observed Distances"))
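The disparity series can also be computed without the sortYearMean()/meanFrame() helpers. A minimal sketch using aggregate() and merge(); the name salaryDisparity is hypothetical, and it assumes a salary frame with playerID, yearID and salary columns plus a character vector of All Star playerIDs (such as dataStars after filtering). As in the report, a player counts as an All Star in every year if he was ever selected.

```r
# Hypothetical sketch: yearly mean salary of All Stars minus that of everyone
# else. Assumes columns playerID, yearID, salary and a vector of All Star IDs.
salaryDisparity <- function(dataSalary, starIDs) {
  dataSalary$group <- ifelse(dataSalary$playerID %in% starIDs, "star", "nonstar")
  means <- aggregate(salary ~ yearID + group, data = dataSalary, FUN = mean)
  wide <- merge(means[means$group == "star", c("yearID", "salary")],
                means[means$group == "nonstar", c("yearID", "salary")],
                by = "yearID", suffixes = c(".star", ".nonstar"))
  wide$disparity <- wide$salary.star - wide$salary.nonstar   # All Star minus Non Star
  wide
}

# Possible usage:
# d <- salaryDisparity(dataSalary, dataStars)
# plot(d$yearID, d$disparity, pch = 20, col = "Red")
```

This should give a series comparable to meanSalaryDisparity, since both compare yearly means of the same two groups.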

Salary Disparities: Amongst All Stars

par(mfrow = c(1,1))

groupedSalary <- table(round(dataSalaryStars$salary/1000000))

barplot(groupedSalary, ylab = "Frequency",

xlab = "Salary by Millions",

main = "Distributions of Salaries for All Stars")

dataStars <- dbGetQuery(db, "Select playerID from AllstarFULL")

dataStars <- unique(dataStars)

dataSalary <- dbGetQuery(db, "Select playerID,yearID,salary from Salaries")

dataStars <- dataStars[which(dataStars$playerID %in% unique(dataSalary$playerID) == TRUE),]

dataSalaryStars <-

dataSalary[which((dataSalary$playerID %in% dataStars) == TRUE),]

dFrame <- data.frame(dataSalaryStars$yearID)

dFrame$Salary <- dataSalaryStars$salary

names(dFrame) <- list("yearID", "Salary")

listYear <- min(dFrame$yearID):max(dFrame$yearID)

sortedFrame <- lapply(listYear, sortYearMean, data = dFrame)

salaryFrame <- sortSalaryYearly(sortedFrame,listYear)

plot(salaryFrame$yearID,round(salaryFrame$Salary/1000000),

xlab = "Year", ylab = "Salary in Millions",

main = "Salary of All Stars in Millions")

lines(meanSalaryAllStars$yearID,
round(meanSalaryAllStars$payRoll/1000000),
pch = 20, col = "Red", lwd = 5)

legend(x = 1985, y = 30, fill = c("Black", "Red"),

legend = c("Salary in Millions", "Mean Salary of All Stars"))
