PM2 Project Basketball--FINAL EDIT NEAR


Upload: daniel-morrison

Post on 13-Apr-2017


Page 1: PM2 Project Basketball--FINAL EDIT NEAR

11/29/2016 PM2 Project Basketball--FINAL EDIT NEAR

http://localhost:8888/nbconvert/html/PM_Notebooks/world_cup_learning-master/PM2%20Project%20Basketball--FINAL%20EDIT%20NEAR.ipynb?download=… 1/9

Auburn Basketball

Here I try to predict Auburn basketball schedule results for the coming 2016-2017 season. To check the accuracy of the model, I will compare its predictions for the previous 2015-2016 season with the actual results.

I'll use an MLP neural network classifier. My inputs will be the past matches (replacing each team name with a lot of stats from both teams), and my output will be a number indicating the result (0 = tie, 1 = team1 wins, 2 = team2 wins). I'll be using pybrain for the classifier, pandas to hack my way through the data, and pygal for the graphs (far easier than matplotlib). A lot of extra useful things are implemented in the utils.py file, mostly to abstract the data processing I need before I feed the classifier.

In [1]:
%matplotlib inline

import pandas as pd
from IPython.display import SVG
from utils import get_team_stats, extract_samples, normalize

In [2]:
# Used to avoid including tied matches. I found this greatly improves the accuracy.
# In basketball there are no ties. If a game is tied at the end of the standard
# game time, then the game goes into overtime.
# Excluding ties ensures that there are no data entry errors since ties should not exist.
exclude_ties = True

# Used to duplicate matches data, reversing the teams (team1->team2, and vice versa).
# This helps on visualizations, and also improves precision of the predictions by
# avoiding a dependence on the order of the teams from the input.
duplicate_with_reversed = True

RAW_MATCHES_FILE = 'auburn_basketball_database_new1.csv'
RAW_WINNERS_FILE = 'raw_winners.csv'
TEAM_RENAMES_FILE = 'team_renames.csv'

def show(graph):
    '''Small utility to display pygal graphs'''
    return SVG(graph.render())


In [3]:
def get_matches(with_team_stats=False, duplicate_with_reversed=False,
                exclude_ties=False):
    """Create a dataframe with matches info."""
    # Note: DataFrame.from_csv was removed in pandas 1.0; on newer versions,
    # pd.read_csv(RAW_MATCHES_FILE, index_col=0, parse_dates=True) is the replacement.
    matches = pd.DataFrame.from_csv(RAW_MATCHES_FILE)

    #for column in ('team1', 'team2'):
        #matches[column] = apply_renames(matches[column])

    if duplicate_with_reversed:
        id_offset = len(matches)

        # Mirror every game: swap the teams and their scores.
        matches2 = matches.copy()
        matches2.rename(columns={'team1': 'team2',
                                 'team2': 'team1',
                                 'score1': 'score2',
                                 'score2': 'score1'},
                        inplace=True)
        # Shift the ids of the mirrored games so they stay unique.
        matches2.index = matches2.index.map(lambda x: x + id_offset)

        matches = pd.concat((matches, matches2))

    def winner_from_score_diff(x):
        if x > 0:
            return 1
        elif x < 0:
            return 2
        else:
            return 0

    matches['score_diff'] = matches['score1'] - matches['score2']
    matches['winner'] = matches['score_diff']
    matches['winner'] = matches['winner'].map(winner_from_score_diff)

    if exclude_ties:
        matches = matches[matches['winner'] != 0]

    if with_team_stats:
        stats = get_team_stats(matches)

        matches = matches.join(stats, on='team1')\
                         .join(stats, on='team2', rsuffix='_2')

    return matches
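The mirroring step in get_matches is easier to follow on a toy frame. The sketch below is only an illustration: the two games and the team codes are made up, and it uses plain index arithmetic where the notebook uses index.map.

```python
import pandas as pd

# Two made-up games, for illustration only (not taken from the real CSV).
matches = pd.DataFrame(
    {"team1": ["AAA", "BBB"], "team2": ["BBB", "CCC"],
     "score1": [70, 55], "score2": [64, 61]},
    index=pd.Index([1, 2], name="Game_ID"),
)

# Swap the team and score columns, then shift the ids so they stay unique.
mirrored = matches.rename(columns={"team1": "team2", "team2": "team1",
                                   "score1": "score2", "score2": "score1"})
mirrored.index = mirrored.index + len(matches)

# concat aligns on column names, so the swapped columns line up correctly.
both = pd.concat([matches, mirrored])
both["score_diff"] = both["score1"] - both["score2"]
both["winner"] = both["score_diff"].map(
    lambda d: 1 if d > 0 else 2 if d < 0 else 0)

print(both[["team1", "team2", "score1", "score2", "winner"]])
```

Each game now appears twice, once from each team's point of view, so a classifier trained on the result cannot learn anything from which side a team happens to be listed on.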

In [4]:
def apply_renames(column):
    """Apply team renames to a team column from a dataframe."""
    with open(TEAM_RENAMES_FILE) as renames_file:
        renames = dict(l.strip().split(',')
                       for l in renames_file.readlines()
                       if l.strip())

        def renamer(team):
            return renames.get(team, team)

    return column.map(renamer)


In [5]:
matches = get_matches(with_team_stats=True,
                      duplicate_with_reversed=duplicate_with_reversed,
                      exclude_ties=exclude_ties)

In [14]: #Some descriptive statistics 

In [6]: print(matches.head()) 

               date  score1  score2 team1 team2  winner  year  score_diff  \
Game_ID
1        11/18/2016      44      69  WSSU   EKY       2  2008         -25
2        12/10/2016      99     106   UWS   FAU       2  2009          -7
3        11/26/2016      44      62   SMU   TCU       2  2008         -18
4        11/13/2016      70      99   FIU  MONM       2  2009         -29
5        11/15/2016      79      99  WSSU   UCD       2  2009         -20

         matches_played  matches_won  years_played  matches_won_percent  \
Game_ID
1                 108.0         28.0           4.0            25.925926
2                  12.0          0.0           5.0             0.000000
3                 386.0        190.0           7.0            49.222798
4                 374.0        144.0           7.0            38.502674
5                 108.0         28.0           4.0            25.925926

         matches_played_2  matches_won_2  years_played_2  \
Game_ID
1                   386.0          234.0             7.0
2                   378.0          152.0             7.0
3                   380.0          150.0             7.0
4                   374.0          124.0             7.0
5                   374.0          130.0             7.0

         matches_won_percent_2
Game_ID
1                    60.621762
2                    40.211640
3                    39.473684
4                    33.155080
5                    34.759358

In [7]: team_stats = get_team_stats(matches) 

In [8]: print(team_stats.head()) 

      matches_played  matches_won  years_played  matches_won_percent
team
ROC              2.0          0.0           1.0             0.000000
SIU            376.0        154.0           7.0            40.957447
PRIN           346.0        228.0           7.0            65.895954
OSU            436.0        340.0           7.0            77.981651
HEND             8.0          0.0           4.0             0.000000
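The source of get_team_stats lives in utils.py and is not shown here, so the following is only a guess at how the last column could be derived. The printed numbers are consistent with matches_won_percent = 100 * matches_won / matches_played; the helper name won_percent is hypothetical.

```python
def won_percent(matches_won, matches_played):
    """Percentage of games won; 0.0 when no games were played."""
    if matches_played == 0:
        return 0.0
    return 100.0 * matches_won / matches_played

# (matches_won, matches_played) pairs copied from the team_stats output above.
rows = {"SIU": (154.0, 376.0), "PRIN": (228.0, 346.0), "OSU": (340.0, 436.0)}
for team, (won, played) in rows.items():
    print(team, round(won_percent(won, played), 6))
```

Because the stats are computed after the games were duplicated with reversed teams, every count in the table is twice the number of real games.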


In [9]:
### Split the data set for regression, Bernoulli, SVC
import numpy as np
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
# Note: sklearn.cross_validation was removed in scikit-learn 0.20;
# newer versions provide train_test_split in sklearn.model_selection.
from sklearn.cross_validation import train_test_split

input_features = ['year', 'matches_won_percent', 'matches_won_percent_2']
output_feature = ['winner']

inputs, outputs = extract_samples(matches,
                                  input_features,
                                  output_feature)

normalizer, inputs = normalize(inputs)

# Hold out 20% of the games for testing.
X_train, X_test, y_train, y_test = train_test_split(matches[input_features], matches[output_feature], test_size = 0.2, random_state=12)

prediction = dict()
from sklearn.naive_bayes import MultinomialNB
modelM = MultinomialNB().fit(X_train, y_train)
prediction['Multinomial'] = modelM.predict(X_test)

from sklearn.naive_bayes import BernoulliNB
modelN = BernoulliNB().fit(X_train, y_train)
prediction['Bernoulli'] = modelN.predict(X_test)

from sklearn import linear_model
logreg = linear_model.LogisticRegression(C=1e5)
logreg.fit(X_train, y_train)
prediction['Logistic'] = logreg.predict(X_test)

from sklearn.svm import SVC
svc = SVC(C= 1.0, kernel='linear')
svc.fit(X_train, y_train)
prediction['SVC'] = svc.predict(X_test)


C:\Users\dsm0014\AppData\Local\Continuum\Anaconda2\lib\site-packages\sklearn\utils\validation.py:515: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
C:\Users\dsm0014\AppData\Local\Continuum\Anaconda2\lib\site-packages\sklearn\svm\base.py:514: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y_ = column_or_1d(y, warn=True)
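The warning appears because output_feature is a list (['winner']), so matches[output_feature] is a one-column DataFrame and not a 1-D vector. A minimal sketch of the difference, using made-up labels:

```python
import pandas as pd

# Made-up labels, for illustration only.
frame = pd.DataFrame({"winner": [1, 2, 2, 1]})

y_column = frame[["winner"]]       # list of names -> DataFrame, shape (4, 1)
y_flat = y_column.values.ravel()   # flattened to a 1-D array, shape (4,)
y_series = frame["winner"]         # single name -> Series, already 1-D

print(y_column.shape, y_flat.shape, y_series.shape)
```

Passing either y_flat or y_series as the target to fit() should avoid the DataConversionWarning.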

In [10]:
y_test["winner"] = y_test["winner"] - 1

In [11]:
# Convert 1 to 0, 2 to 1
def formatt(x):
    x = x - 1
    return x

vfunc = np.vectorize(formatt)


In [12]:
cmp = 0
colors = ['b', 'g', 'y', 'm', 'k']
for model, predicted in prediction.items():
    false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, predicted)
    roc_auc = auc(false_positive_rate, true_positive_rate)
    plt.plot(false_positive_rate, true_positive_rate, colors[cmp], label='%s: AUC %0.2f'% (model,roc_auc))
    cmp += 1

plt.title('Classifiers comparison with ROC')
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlim([-0.1,1.2])
plt.ylim([-0.1,1.2])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
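roc_curve builds its curve by sweeping a threshold over the scores it receives, so hard predicted labels yield only a single operating point per model. A minimal sketch of the usual alternative, scoring with predict_proba, is given below. It runs on synthetic data (not the basketball set) and assumes a scikit-learn version that provides sklearn.model_selection.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

# Synthetic two-feature data with labels in {1, 2}, for illustration only.
rng = np.random.RandomState(12)
X = rng.normal(size=(400, 2))
noise = rng.normal(scale=0.5, size=400)
y = np.where(X[:, 0] - X[:, 1] + noise > 0, 1, 2)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12)
model = LogisticRegression().fit(X_train, y_train)

# Probability of class 2 (treated as the positive class here).
pos_col = list(model.classes_).index(2)
scores = model.predict_proba(X_test)[:, pos_col]

fpr, tpr, thresholds = roc_curve(y_test, scores, pos_label=2)
roc_auc = auc(fpr, tpr)
print('AUC %0.2f' % roc_auc)
```

With pos_label given explicitly, the labels do not need to be shifted from {1, 2} to {0, 1} beforehand.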


In [13]:
# select only Auburn games and generate predictions
matches_Auburn = matches[matches['team1'] == 'AUB']

# generate predictions
matches_Auburn['Logistic'] = logreg.predict(matches_Auburn[input_features])
matches_Auburn['Multinomial'] = modelM.predict(matches_Auburn[input_features])
matches_Auburn['Bernoulli'] = modelN.predict(matches_Auburn[input_features])
matches_Auburn['SVC'] = svc.predict(matches_Auburn[input_features])

# print the results
columnlist = ['year', 'team1', 'team2', 'winner', 'Logistic', 'Multinomial', 'Bernoulli', 'SVC']

print(matches_Auburn[columnlist].head(20))


C:\Users\dsm0014\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:6: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
C:\Users\dsm0014\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:7: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
C:\Users\dsm0014\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:8: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
C:\Users\dsm0014\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
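These warnings come from adding columns to matches_Auburn, which pandas treats as a possible view of matches. One common way to avoid them is to take an explicit copy of the filtered rows first. A minimal sketch with made-up rows, where the constant 2 stands in for a model's predict() output:

```python
import pandas as pd

# Made-up rows, for illustration only.
matches = pd.DataFrame({"team1": ["AUB", "XXX", "AUB"],
                        "team2": ["YYY", "AUB", "ZZZ"],
                        "winner": [1, 2, 2]})

# .copy() makes the subset an independent frame, so new columns are
# unambiguously added to the subset and never to `matches`.
matches_auburn = matches[matches["team1"] == "AUB"].copy()
matches_auburn["predicted"] = 2  # placeholder for model.predict(...)

print(matches_auburn)
```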

         year team1 team2  winner  Logistic  Multinomial  Bernoulli  SVC
Game_ID
911      2008   AUB   DAY       2         2            2          2    2
1202     2008   AUB   XAV       2         2            2          2    2
1847     2008   AUB   UVA       1         2            2          2    2
2784     2009   AUB  SCAR       2         2            2          2    2
3305     2009   AUB    UK       2         2            2          2    2
3385     2009   AUB   ARK       1         2            2          2    2
3967     2009   AUB  MISS       2         2            2          2    2
4653     2009   AUB   UGA       1         2            2          2    2
4841     2009   AUB   LSU       2         2            2          2    2
5141     2009   AUB  MSST       1         2            2          2    2
5286     2009   AUB   ALA       1         2            2          2    2
5687     2009   AUB  TENN       2         2            2          2    2
6038     2009   AUB  MOSU       2         2            2          2    2
6397     2009   AUB  NCST       2         2            2          2    2
6858     2009   AUB  AAMU       1         1            1          2    1
7480     2009   AUB   FSU       2         2            2          2    2
8666     2010   AUB  TENN       2         2            2          2    2
8976     2010   AUB   LSU       1         2            2          2    2
9079     2010   AUB   VAN       2         2            2          2    2
9749     2010   AUB   ARK       2         2            2          2    2
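To put a number on the table above, the share of matching rows can be counted directly. The sketch below copies the winner, Logistic and Bernoulli columns from the 20 printed rows; the accuracy helper is a hypothetical name. In these rows the Multinomial and SVC columns are identical to Logistic.

```python
# Columns copied from the 20 printed rows above (1 = team1 wins, 2 = team2 wins).
winner    = [2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2]
logistic  = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2]
bernoulli = [2] * 20

def accuracy(actual, predicted):
    """Fraction of rows where the prediction equals the actual result."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / float(len(actual))

print(accuracy(winner, logistic))   # 14 of 20 rows match
print(accuracy(winner, bernoulli))  # 13 of 20 rows match
```

Since Auburn lost 13 of these 20 games, always predicting a loss already matches 13 rows, which is exactly what the Bernoulli column does. Some of these rows were probably part of the training split, so the figures are not a held-out estimate.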
