- Executive Summary
- Methods and Results
- Appendix 1. Variable Definitions
- Appendix 2. Data Manipulation
- Appendix 3. Check Assumptions
- Appendix 4. Model 1: Logistic Model with Random Effects
- Appendix 5. Model 2: Multinomial Model - All possibilities
- Appendix 6. Model 3: Multinomial Model - Round Selection Given Already in Tournament
- Appendix 7. Model 4: Classification Tree - All possibilities
- Appendix 8. Model 5: Classification Tree - Round Selection Given Already in Tournament
- Appendix 9. Full Error Table (%)
- Appendix 10. 2020 March Madness Predictions
Executive Summary
Sadly, the 2020 NCAA Men’s Basketball Tournament could not be held due to the coronavirus pandemic. I found data covering the past several college basketball seasons and thought it would be interesting to see whether I could reconstruct what might have been. Based on the variables in this dataset, defined in Appendix 1 and previewed in Appendix 2.1, is it possible to predict which teams will make the tournament using machine learning techniques? And given that information, can we estimate the round each team will reach? These questions can be answered with a variety of predictive models.
First, we examine the data (Appendix 2.1). Several variables are potentially highly correlated. The variance inflation factors in Appendix 3.1 flag effective field goal percentage, both for and against, as extreme; this statistic is nearly a direct calculation from other variables in the dataset. Another variable with a high inflation factor is the power rating, which is essentially a summary of a team built from several factors already present in the data. The last two factors of potential concern are offensive and defensive efficiency. Model selection should handle this multicollinearity, and we can check the final models with diagnostic plots. The last step is to view the correlations among our potential responses and regressors with a correlation matrix. My main variable of interest is postseason wins, and several variables correlate strongly with it (Appendix 3.2). Plotting the variables with the highest correlations, the scatter plot matrix in Appendix 3.3 shows the relationships between postseason wins and the regressors; substantial multicollinearity is again apparent.
Before setting up models, we check whether the regressors need transformations. The Box-Cox method provides significant evidence for transformations (Appendix 3.4): testing the case of no transformations gives a p-value below 0.0001, so transformations are clearly indicated, yet even the recommended transformations are rejected with a p-value below 0.0001. Another factor seems to be at play, such as the multicollinearity noted previously, so we again proceed with caution; model selection and diagnostic checks should clarify matters later. Besides, the recommended transformations would substantially hurt the interpretability of our results. Since our models perform classification, we cannot test a transformation of the response.
First, we build a logistic model using all potentially useful variables (Appendix 4). Checking for outliers before model selection, the half-normal plot in Appendix 4.1 flags observation 1329. Upon inspection, this team made the tournament despite a terrible record and other poor statistics; since removing it is unjustified, it stays in the analysis. The summary of the first model shows several insignificant variables, so we repeatedly remove the variable with the highest p-value until the ANOVA comparison favors keeping the remaining variables. The method is not exact, but it gives a good idea of the most significant variables and models. Among the variables kept were power rating, turnover rate, and wins above the bubble. Even the random effects were not significant in the final model (Appendix 4.2). Diagnostic checks were then repeated (Appendix 4.3): observation 1329 is still an outlier, but its departure from the trend is relatively minor, and while some VIF values remain fairly high, they are much lower than before. This model does not drastically violate any assumptions. Last, we test its predictive ability. From the ROC curve (Appendix 4.4), the threshold that best balances sensitivity and specificity appears to be 0.22; with it, the training error rate was 0.125 and the 2019 testing error was 0.127. Now we can estimate the round at which a team will go out.
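The threshold search described above can be sketched with the pROC package. This is an illustrative snippet, not the report's exact Appendix 4.4 code; it assumes the fitted model `model_binom` and `train_data` from the appendices are in scope.

```r
# Hedged sketch: pick the probability threshold from the ROC curve.
library(pROC)

roc_obj <- roc(train_data$TRNMT, predict(model_binom, type = "response"))
# "best" maximizes sensitivity + specificity (Youden's J); ~0.22 in this analysis
coords(roc_obj, "best", ret = c("threshold", "sensitivity", "specificity"))
auc(roc_obj)  # overall discriminative ability of the model
```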
Next, I fit a multinomial model that tries to predict all rounds at once (Appendix 5). Model selection was straightforward and chose variables similar to the previous model, except that two-point shooting percentage is now included. This model also allows easy interpretation of variables: for example, power rating is still a big indicator of tournament success, but as a team progresses, other variables become more important. The training error shows this model predicts tournament entry better than the previous one and does a fair job of predicting the round a team will reach. Another nice feature is the ability to view the important factors associated with each round of the tournament. The error is around 75%, but each team has nine possible outcomes (Appendix 5.2). The model predicted the champion and runner-up in all four training years (Appendix 5.1), which could indicate overfitting on the training set. Since it predicts tournament entry much better than the binomial model, it may also be useful to train a model only on teams already known to make the tournament and see how far they are predicted to go.
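A minimal sketch of such a fit, assuming `nnet::multinom` and an illustrative subset of the predictors named above (the selected model in Appendix 5 may use different variables):

```r
# Hedged sketch: multinomial model over all nine POSTSEASON outcomes.
# The predictors here are illustrative, not the Appendix 5 selection.
library(nnet)

model_multi <- multinom(POSTSEASON ~ BARTHAG + TOR + WAB + TWO_P_O,
                        data = train_data, trace = FALSE)
# training error: share of teams whose predicted round differs from the actual one
mean(predict(model_multi) != train_data$POSTSEASON)
```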
Fitting a model conditional on a team having already made the tournament (Appendix 6), the summary of the most significant model shows that power rating, the biggest determining factor in both previous models, is no longer included. The highest-magnitude predictor is now turnover rate, which is somewhat interesting since it had little effect in the previous models and is not typically viewed as the statistic that wins games. The training error is still high (Appendix 6.2) but drastically lower than under the previous approach, and the test-set error reiterates this point; the model even predicted the correct champion in the test set. The next step is to try this line of thinking with a random forest.
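Conditioning on tournament entry amounts to refitting on the subset of tournament teams. A sketch under the same assumptions as before (`nnet::multinom`, illustrative predictors, objects from Appendix 2 in scope):

```r
# Hedged sketch: round prediction given a team already made the tournament.
library(nnet)

in_trnmt <- droplevels(subset(train_data, TRNMT == "Yes"))  # drop "No Tournament" level
model_round <- multinom(POSTSEASON ~ TOR + TORD + WAB,
                        data = in_trnmt, trace = FALSE)
mean(predict(model_round) != in_trnmt$POSTSEASON)  # training error on tournament teams
```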
This research question seems well suited to a classification tree. We lose the quantitative inference about specific regressors that the previous models gave, but the main point of this analysis is prediction. We compare a bagging approach with a random forest (Appendix 7). Bagging produced lower out-of-bag error as well as lower testing error per round, and it even correctly predicted the champion. The importance measures show both models put strong emphasis on wins above the bubble, the first models to do so. We continue with the bagging model and, as before, also try a model trained only on teams already in the tournament.
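With the `randomForest` package, the only difference between bagging and a random forest is `mtry`, the number of predictors tried at each split. A hedged sketch with an illustrative predictor set (the Appendix 7 trees may use the full set):

```r
# Hedged sketch: bagging is a random forest with mtry = all predictors.
library(randomForest)

set.seed(1)
form <- POSTSEASON ~ ADJOE + ADJDE + BARTHAG + WAB + TOR + TORD  # illustrative subset
model_bag <- randomForest(form, data = train_data, mtry = 6)              # bagging
model_rf  <- randomForest(form, data = train_data, mtry = floor(sqrt(6))) # random forest

model_bag$confusion    # out-of-bag error by round
importance(model_bag)  # mean decrease in Gini per predictor
```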
We fit this model in the same manner as the second multinomial model. Its testing error is worse than the multinomial model’s (Appendix 8), and the importance of wins above the bubble dropped as power rating became important once more. It will therefore not be necessary to combine separate models for making the tournament and for round performance in the final 2020 prediction: the multinomial model that considered all teams at once performed best overall, which is convenient since it leaves the regressors interpretable. The table in Appendix 9 compares error across the models.
Each model has its own strengths. The binomial model was by far the least useful in terms of error; every other model predicted tournament entry much better. The bagging model appears to perform best on test data and was also strong at predicting which teams make the tournament, with the multinomial model a close second, and test error carries the most weight when prediction is the goal. We can predict the 2020 NCAA tournament with both models as a comparison.
The final prediction for the 2020 tournament, which never happened due to the coronavirus, is given in Appendix 10. We put more weight on the bagging model’s predictions given its performance in the error table (Appendix 9). The summary of predictions (Appendix 10) shows the bagging model had 303 teams missing the tournament and 2 teams advancing past the round of 32, while the multinomial model had 309 teams missing and 3 teams advancing past the round of 32. The bagging model predicted Kansas and Gonzaga to reach the Elite Eight; the multinomial model also had Kansas and Gonzaga there, with Dayton coming out of nowhere to finish as runner-up, whereas the bagging model had Dayton out in the round of 32. Last, but not least, we can compare the Big Ten teams and the round in which each was predicted to lose (Appendix 10.2): Indiana was predicted to go out in the first round in both models, while our rival Purdue did not even make the tournament in either.
Given the error seen in the training and testing sets, we cannot put much weight on these models’ specific bracket predictions, though the set of teams projected to make the tournament is far more dependable. Every year brings drastic variability in the form of “bracket busting” teams, and it is difficult to identify winners without comparing specific matchups; this analysis instead gives a good measure of minimum expected performance based on overall team statistics alone. The main takeaway is the set of impactful predictors, such as power rating and wins above the bubble, that an above-average team needs in order to make it deep into the tournament. The final predictive ability is less than desired, but this has been an interesting look, through machine learning, at what could have been.
Methods and Results
Appendix 1. Variable Definitions
Team Information
YEAR: Season
TEAM: The Division I college basketball school
CONF: The athletic conference in which the school participates
A10 = Atlantic 10
ACC = Atlantic Coast Conference
AE = America East
Amer = American
ASun = ASUN
B10 = Big Ten
B12 = Big 12
BE = Big East
BSky = Big Sky
BSth = Big South
BW = Big West
CAA = Colonial Athletic Association
CUSA = Conference USA
Horz = Horizon League
IND = Independent schools
Ivy = Ivy League
MAAC = Metro Atlantic Athletic Conference
MAC = Mid-American Conference
MEAC = Mid-Eastern Athletic Conference
MVC = Missouri Valley Conference
MWC = Mountain West
NEC = Northeast Conference
OVC = Ohio Valley Conference
P12 = Pac-12
Pat = Patriot League
SB = Sun Belt
SC = Southern Conference
SEC = Southeastern Conference
Slnd = Southland Conference
Sum = Summit League
SWAC = Southwestern Athletic Conference
WAC = Western Athletic Conference
WCC = West Coast Conference
Tournament Information
SEED: Seed in the NCAA March Madness Tournament
TRNMT: Made tournament, yes or no
PS_WINS: Post season wins in NCAA tournament
POSTSEASON: Round where the given team was eliminated or where their season ended
R68 = First Four
R64 = Round of 64
R32 = Round of 32
S16 = Sweet Sixteen
E8 = Elite Eight
F4 = Final Four
2ND = Runner-up
Champions = Winner of the NCAA March Madness Tournament for that given year
Team Statistics
G: Number of games played in total
W: Number of games won in total
BARTHAG: Power Rating (Chance of beating an average Division I team)
WAB: Wins Above Bubble (The bubble refers to the cut off between making the NCAA March Madness Tournament and not making it)
Offensive Statistics
ADJOE: Adjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)
EFG_O: Effective Field Goal Percentage Shot
TOR: Turnover Percentage Allowed (Turnover Rate)
ORB: Offensive Rebound Percentage
FTR: Free Throw Rate (How often the given team shoots Free Throws)
TWO_P_O: Two-Point Shooting Percentage
THREE_P_O: Three-Point Shooting Percentage
ADJ_T: Adjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)
Defensive Statistics
ADJDE: Adjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)
EFG_D: Effective Field Goal Percentage Allowed
TORD: Turnover Percentage Committed (Steal Rate)
DRB: Defensive Rebound Percentage
FTRD: Free Throw Rate Allowed
TWO_P_D: Two-Point Shooting Percentage Allowed
THREE_P_D: Three-Point Shooting Percentage Allowed
Appendix 2. Data Manipulation
# Read Data into an Object
# https://www.kaggle.com/andrewsundberg/college-basketball-dataset/data
raw_data_15_19 = fread("cbb.csv")
raw_data_20 = fread("cbb20.csv")
# Combining Dataframes
raw_data_20 <- raw_data_20[,-c("RK")] #shows rank which isn't included in other years
raw_data_20 <- raw_data_20 %>%
  mutate(POSTSEASON = "No Tournament", # arbitrary values so dataframes match columns
         SEED = 99,
         YEAR = 2020)
raw_data <- bind_rows(raw_data_15_19, raw_data_20)
# Remove unneeded dataframes
rm(raw_data_15_19)
rm(raw_data_20)
# View Data
summary(raw_data) #notice NAs
## TEAM CONF G W
## Length:2110 Length:2110 Min. :24.0 Min. : 0.00
## Class :character Class :character 1st Qu.:30.0 1st Qu.:12.00
## Mode :character Mode :character Median :31.0 Median :16.00
## Mean :31.3 Mean :16.48
## 3rd Qu.:33.0 3rd Qu.:21.00
## Max. :40.0 Max. :38.00
##
## ADJOE ADJDE BARTHAG EFG_O
## Min. : 76.7 Min. : 84.0 Min. :0.0077 Min. :39.30
## 1st Qu.: 98.4 1st Qu.: 98.6 1st Qu.:0.2833 1st Qu.:48.00
## Median :103.0 Median :103.3 Median :0.4746 Median :49.90
## Mean :103.3 Mean :103.3 Mean :0.4941 Mean :50.03
## 3rd Qu.:107.9 3rd Qu.:107.8 3rd Qu.:0.7111 3rd Qu.:52.00
## Max. :129.1 Max. :124.0 Max. :0.9842 Max. :59.80
##
## EFG_D TOR TORD ORB
## Min. :39.60 Min. :12.40 Min. :10.20 Min. :14.20
## 1st Qu.:48.30 1st Qu.:17.30 1st Qu.:17.10 1st Qu.:26.30
## Median :50.10 Median :18.60 Median :18.50 Median :29.10
## Mean :50.19 Mean :18.65 Mean :18.58 Mean :29.04
## 3rd Qu.:52.10 3rd Qu.:19.90 3rd Qu.:20.00 3rd Qu.:31.80
## Max. :59.50 Max. :26.60 Max. :28.00 Max. :42.10
##
## DRB FTR FTRD 2P_O
## Min. :18.40 Min. :21.60 Min. :19.70 Min. :37.70
## 1st Qu.:27.10 1st Qu.:31.30 1st Qu.:30.60 1st Qu.:46.90
## Median :29.20 Median :34.60 Median :34.30 Median :49.10
## Mean :29.22 Mean :34.69 Mean :34.94 Mean :49.19
## 3rd Qu.:31.30 3rd Qu.:38.00 3rd Qu.:38.80 3rd Qu.:51.40
## Max. :40.40 Max. :51.00 Max. :58.50 Max. :62.60
##
## 2P_D 3P_O 3P_D ADJ_T
## Min. :37.70 Min. :24.80 Min. :27.1 Min. :57.2
## 1st Qu.:47.20 1st Qu.:32.40 1st Qu.:32.9 1st Qu.:66.4
## Median :49.30 Median :34.30 Median :34.5 Median :68.5
## Mean :49.32 Mean :34.33 Mean :34.5 Mean :68.4
## 3rd Qu.:51.60 3rd Qu.:36.20 3rd Qu.:36.1 3rd Qu.:70.3
## Max. :61.20 Max. :44.10 Max. :43.1 Max. :83.4
##
## WAB POSTSEASON SEED YEAR
## Min. :-25.200 Length:2110 Min. : 1.00 Min. :2015
## 1st Qu.:-13.000 Class :character 1st Qu.: 9.00 1st Qu.:2016
## Median : -8.300 Mode :character Median :99.00 Median :2018
## Mean : -7.814 Mean :54.74 Mean :2018
## 3rd Qu.: -3.100 3rd Qu.:99.00 3rd Qu.:2019
## Max. : 13.100 Max. :99.00 Max. :2020
## NA's :1417
# Data Cleaning
raw_data$POSTSEASON[is.na(raw_data$POSTSEASON)] = "No Tournament" # NA means no tournament appearance
raw_data$SEED[is.na(raw_data$SEED)] = 99 # placeholder seed for non-tournament teams
raw_data <- raw_data %>%
  mutate(TWO_P_O = `2P_O`, # names starting with digits are awkward in R
         TWO_P_D = `2P_D`,
         THREE_P_O = `3P_O`,
         THREE_P_D = `3P_D`,
         TRNMT = ifelse(POSTSEASON == "No Tournament", "No", "Yes")) %>%
  select(-c(`2P_O`, `2P_D`, `3P_O`, `3P_D`))
raw_data$POSTSEASON <- factor(raw_data$POSTSEASON, order = TRUE, levels = c('No Tournament', 'R68', 'R64',
'R32', 'S16', 'E8', 'F4', '2ND',
'Champions'))
# Postseason wins = factor level minus 3, floored at 0 (No Tournament, R68, and R64 all give 0 wins)
raw_data$PS_WINS = pmax(as.numeric(raw_data$POSTSEASON) - 3, 0)
# Changing Data Types
raw_data$CONF <- as.factor(raw_data$CONF)
raw_data$TRNMT <- as.factor(raw_data$TRNMT)
raw_data$YEAR <- as.factor(raw_data$YEAR) # treat season as categorical
# Data Cleaning Done
clean_data <- raw_data
Appendix 2.1. View Data
# View Data Attributes
head(clean_data)
## TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR TORD ORB
## 1: North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 15.4 18.2 40.7
## 2: Wisconsin B10 40 36 129.1 93.6 0.9758 54.8 47.7 12.4 15.8 32.1
## 3: Michigan B10 40 33 114.4 90.4 0.9375 53.9 47.7 14.0 19.5 25.5
## 4: Texas Tech B12 38 31 115.2 85.2 0.9696 53.5 43.0 17.7 22.8 27.4
## 5: Gonzaga WCC 39 37 117.8 86.3 0.9728 56.6 41.1 16.2 17.1 30.0
## 6: Duke ACC 39 35 125.2 90.6 0.9764 56.6 46.5 16.3 18.6 35.8
## DRB FTR FTRD ADJ_T WAB POSTSEASON SEED YEAR TWO_P_O TWO_P_D THREE_P_O
## 1: 30.0 32.3 30.4 71.7 8.6 2ND 1 2016 53.9 44.6 32.7
## 2: 23.7 36.2 22.4 59.3 11.3 2ND 1 2015 54.8 44.7 36.5
## 3: 24.9 30.7 30.0 65.9 6.9 2ND 3 2018 54.7 46.8 35.2
## 4: 28.7 32.9 36.6 67.5 7.0 2ND 3 2019 52.8 41.9 36.5
## 5: 26.2 39.0 26.9 71.5 7.7 2ND 1 2017 56.3 40.0 38.2
## 6: 30.2 39.8 23.9 66.4 10.7 Champions 1 2015 55.9 46.3 38.7
## THREE_P_D TRNMT PS_WINS
## 1: 36.2 Yes 5
## 2: 37.5 Yes 5
## 3: 33.2 Yes 5
## 4: 29.7 Yes 5
## 5: 29.0 Yes 5
## 6: 31.4 Yes 6
summary(clean_data)
## TEAM CONF G W
## Length:2110 ACC : 90 Min. :24.0 Min. : 0.00
## Class :character A10 : 84 1st Qu.:30.0 1st Qu.:12.00
## Mode :character B10 : 84 Median :31.0 Median :16.00
## CUSA : 84 Mean :31.3 Mean :16.48
## SEC : 84 3rd Qu.:33.0 3rd Qu.:21.00
## Slnd : 78 Max. :40.0 Max. :38.00
## (Other):1606
## ADJOE ADJDE BARTHAG EFG_O
## Min. : 76.7 Min. : 84.0 Min. :0.0077 Min. :39.30
## 1st Qu.: 98.4 1st Qu.: 98.6 1st Qu.:0.2833 1st Qu.:48.00
## Median :103.0 Median :103.3 Median :0.4746 Median :49.90
## Mean :103.3 Mean :103.3 Mean :0.4941 Mean :50.03
## 3rd Qu.:107.9 3rd Qu.:107.8 3rd Qu.:0.7111 3rd Qu.:52.00
## Max. :129.1 Max. :124.0 Max. :0.9842 Max. :59.80
##
## EFG_D TOR TORD ORB
## Min. :39.60 Min. :12.40 Min. :10.20 Min. :14.20
## 1st Qu.:48.30 1st Qu.:17.30 1st Qu.:17.10 1st Qu.:26.30
## Median :50.10 Median :18.60 Median :18.50 Median :29.10
## Mean :50.19 Mean :18.65 Mean :18.58 Mean :29.04
## 3rd Qu.:52.10 3rd Qu.:19.90 3rd Qu.:20.00 3rd Qu.:31.80
## Max. :59.50 Max. :26.60 Max. :28.00 Max. :42.10
##
## DRB FTR FTRD ADJ_T
## Min. :18.40 Min. :21.60 Min. :19.70 Min. :57.2
## 1st Qu.:27.10 1st Qu.:31.30 1st Qu.:30.60 1st Qu.:66.4
## Median :29.20 Median :34.60 Median :34.30 Median :68.5
## Mean :29.22 Mean :34.69 Mean :34.94 Mean :68.4
## 3rd Qu.:31.30 3rd Qu.:38.00 3rd Qu.:38.80 3rd Qu.:70.3
## Max. :40.40 Max. :51.00 Max. :58.50 Max. :83.4
##
## WAB POSTSEASON SEED YEAR
## Min. :-25.200 No Tournament:1770 Min. : 1.00 2015:351
## 1st Qu.:-13.000 R64 : 160 1st Qu.:99.00 2016:351
## Median : -8.300 R32 : 80 Median :99.00 2017:351
## Mean : -7.814 S16 : 40 Mean :84.46 2018:351
## 3rd Qu.: -3.100 R68 : 20 3rd Qu.:99.00 2019:353
## Max. : 13.100 E8 : 20 Max. :99.00 2020:353
## (Other) : 20
## TWO_P_O TWO_P_D THREE_P_O THREE_P_D TRNMT
## Min. :37.70 Min. :37.70 Min. :24.80 Min. :27.1 No :1770
## 1st Qu.:46.90 1st Qu.:47.20 1st Qu.:32.40 1st Qu.:32.9 Yes: 340
## Median :49.10 Median :49.30 Median :34.30 Median :34.5
## Mean :49.19 Mean :49.32 Mean :34.33 Mean :34.5
## 3rd Qu.:51.40 3rd Qu.:51.60 3rd Qu.:36.20 3rd Qu.:36.1
## Max. :62.60 Max. :61.20 Max. :44.10 Max. :43.1
##
## PS_WINS
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1493
## 3rd Qu.:0.0000
## Max. :6.0000
##
# Separate data for train and test sets
train_data <- clean_data[which(clean_data$YEAR==2015|clean_data$YEAR==2016|clean_data$YEAR==2017|clean_data$YEAR==2018), ]
test_data_19 <- clean_data[which(clean_data$YEAR==2019), ]
test_data_20 <- clean_data[which(clean_data$YEAR==2020), ]
rm(raw_data, clean_data)
Appendix 3. Check Assumptions
Appendix 3.1. Variance Inflation Factor
# Check multicollinearity
sort(faraway::vif(train_data[,-c(1:4, 18:20, 25:26)])) # removing factors and potential responses
## ADJ_T FTR FTRD DRB ORB TORD TOR
## 1.236197 1.299749 1.740173 2.191951 2.946805 3.098201 3.105086
## WAB ADJDE ADJOE THREE_P_O BARTHAG THREE_P_D TWO_P_O
## 13.243053 22.690760 28.540690 33.348308 36.511312 43.064142 69.770015
## TWO_P_D EFG_O EFG_D
## 121.052453 152.321249 224.862661
Appendix 3.2. Correlation Matrix
# correlation matrix
round(cor(train_data[,-c(1:4, 18:20, 25)])[,c(18)], 4)
## ADJOE ADJDE BARTHAG EFG_O EFG_D TOR TORD ORB
## 0.4683 -0.4000 0.4270 0.2836 -0.2782 -0.2472 0.0534 0.1817
## DRB FTR FTRD ADJ_T WAB TWO_P_O TWO_P_D THREE_P_O
## -0.0908 0.0108 -0.2017 -0.0362 0.4950 0.2764 -0.2553 0.1983
## THREE_P_D PS_WINS
## -0.1937 1.0000
Appendix 3.3. Scatter Plot Matrix
# Data Viz
car::scatterplotMatrix(~PS_WINS + ADJOE + ADJDE + BARTHAG + WAB +
TWO_P_D + TWO_P_O, train_data, plot.points = FALSE)
Appendix 3.4. Transformations
# Transformations
summary(bc_x <- powerTransform(cbind(ADJOE, ADJDE, BARTHAG, EFG_O, EFG_D,
TOR, TORD, ORB, DRB, FTR, FTRD, ADJ_T,
TWO_P_O, TWO_P_D, THREE_P_O, THREE_P_D
) ~ 1, train_data))
## bcPower Transformations to Multinormality
## Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## ADJOE -0.9338 -1.00 -1.1530 -0.7146
## ADJDE 1.6317 1.63 1.3267 1.9367
## BARTHAG 0.7129 0.71 0.6719 0.7539
## EFG_O 0.2583 0.33 0.0695 0.4471
## EFG_D 0.8537 1.00 0.6601 1.0473
## TOR 0.8215 1.00 0.5086 1.1344
## TORD 0.5926 0.50 0.3252 0.8601
## ORB 1.3590 1.36 1.1254 1.5927
## DRB 1.0166 1.00 0.6691 1.3641
## FTR 0.5851 0.50 0.2638 0.9065
## FTRD 0.1689 0.00 -0.0738 0.4116
## ADJ_T 0.7688 1.00 0.0217 1.5158
## TWO_P_O -0.1324 0.00 -0.3323 0.0675
## TWO_P_D 0.6877 0.50 0.4879 0.8876
## THREE_P_O 0.9606 1.00 0.7557 1.1654
## THREE_P_D 1.0243 1.00 0.8111 1.2374
##
## Likelihood ratio test that transformation parameters are equal to 0
## (all log transformations)
## LRT df pval
## LR test, lambda = (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 2724.65 16 < 2.22e-16
##
## Likelihood ratio test that no transformations are needed
## LRT df pval
## LR test, lambda = (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1) 532.7003 16 < 2.22e-16
testTransform(bc_x, c(-1, 1.63, 0.71, 0.33, 1, 1, 0.5, 1.36, 1, 0.50, 0, 1, 0, 0.5, 1, 1))
## LRT
## LR test, lambda = (-1 1.63 0.71 0.33 1 1 0.5 1.36 1 0.5 0 1 0 0.5 1 1) 51.98504
## df
## LR test, lambda = (-1 1.63 0.71 0.33 1 1 0.5 1.36 1 0.5 0 1 0 0.5 1 1) 16
## pval
## LR test, lambda = (-1 1.63 0.71 0.33 1 1 0.5 1.36 1 0.5 0 1 0 0.5 1 1) 1.1015e-05
Appendix 4. Model 1: Logistic Model with Random Effects
model_binom <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +
TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T +
TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
Appendix 4.1. Diagnostic Check
halfnorm(resid(model_binom, type="pearson"))
train_data[1329,]
## TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR TORD ORB DRB
## 1: Holy Cross Pat 35 15 96.7 106.9 0.2398 47.9 53.2 16.8 19.6 23.1 29.6
## FTR FTRD ADJ_T WAB POSTSEASON SEED YEAR TWO_P_O TWO_P_D THREE_P_O
## 1: 36.1 33.4 64.6 -14.5 R64 16 2016 47.2 52.8 32.6
## THREE_P_D TRNMT PS_WINS
## 1: 35.7 Yes 0
summary(model_binom)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## THREE_P_D + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 613.1 712.8 -287.5 575.1 1385
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -11.7565 -0.2281 -0.0917 -0.0114 13.7424
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.641462 7.079618 2.633 0.00846 **
## ADJOE 0.524289 0.131314 3.993 6.53e-05 ***
## ADJDE -0.643822 0.147179 -4.374 1.22e-05 ***
## BARTHAG -31.242596 5.566575 -5.613 1.99e-08 ***
## EFG_O 0.459814 0.405906 1.133 0.25729
## EFG_D -0.910582 0.612143 -1.488 0.13687
## TOR -0.178962 0.105989 -1.688 0.09132 .
## TORD 0.149197 0.096556 1.545 0.12230
## ORB 0.035003 0.046874 0.747 0.45522
## DRB 0.009714 0.054970 0.177 0.85973
## FTR 0.059434 0.027141 2.190 0.02853 *
## FTRD -0.023435 0.024953 -0.939 0.34766
## ADJ_T 0.027996 0.036494 0.767 0.44300
## TWO_P_O -0.145603 0.253550 -0.574 0.56579
## TWO_P_D 0.535918 0.391771 1.368 0.17133
## THREE_P_O -0.118350 0.221847 -0.533 0.59371
## THREE_P_D 0.404400 0.319733 1.265 0.20594
## WAB 0.554127 0.067668 8.189 2.64e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 18 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Appendix 4.2. Model Selection
# Fixed Effects Test
model_binom1 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove DRB
TOR + TORD + ORB + FTR + FTRD + ADJ_T +
TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom, model_binom1) # p-value: 0.8595
## Data: train_data
## Models:
## model_binom1: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom1: ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## model_binom1: THREE_P_D + WAB + (1 | CONF)
## model_binom: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom: ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## model_binom: THREE_P_D + WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom1 18 611.09 705.53 -287.54 575.09
## model_binom 19 613.06 712.75 -287.53 575.06 0.0313 1 0.8595
summary(model_binom1)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## THREE_P_D + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 611.1 705.5 -287.5 575.1 1386
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -11.6255 -0.2299 -0.0912 -0.0113 13.7483
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 4.986e-16 2.233e-08
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.34933 6.86945 2.671 0.00756 **
## ADJOE 0.53131 0.12560 4.230 2.34e-05 ***
## ADJDE -0.63718 0.14203 -4.486 7.25e-06 ***
## BARTHAG -31.27090 5.56993 -5.614 1.97e-08 ***
## EFG_O 0.45365 0.40445 1.122 0.26201
## EFG_D -0.93198 0.59894 -1.556 0.11970
## TOR -0.17176 0.09768 -1.758 0.07867 .
## TORD 0.16007 0.07431 2.154 0.03123 *
## ORB 0.03182 0.04323 0.736 0.46158
## FTR 0.05903 0.02705 2.182 0.02909 *
## FTRD -0.02460 0.02406 -1.022 0.30665
## ADJ_T 0.02752 0.03641 0.756 0.44969
## TWO_P_O -0.14786 0.25327 -0.584 0.55936
## TWO_P_D 0.54361 0.38887 1.398 0.16213
## THREE_P_O -0.12083 0.22149 -0.546 0.58538
## THREE_P_D 0.41092 0.31715 1.296 0.19509
## WAB 0.55395 0.06763 8.191 2.59e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 17 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom2 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove THREE_P_O
TOR + TORD + ORB + FTR + FTRD + ADJ_T +
TWO_P_O + TWO_P_D + THREE_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom1, model_binom2) # p-value: 0.5832
## Data: train_data
## Models:
## model_binom2: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom2: ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_D +
## model_binom2: WAB + (1 | CONF)
## model_binom1: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom1: ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## model_binom1: THREE_P_D + WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom2 17 609.39 698.59 -287.69 575.39
## model_binom1 18 611.09 705.53 -287.54 575.09 0.3011 1 0.5832
summary(model_binom2)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_D +
## WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 609.4 698.6 -287.7 575.4 1387
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -11.5355 -0.2297 -0.0907 -0.0112 13.8210
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.21711 6.87400 2.650 0.00805 **
## ADJOE 0.53349 0.12584 4.239 2.24e-05 ***
## ADJDE -0.63791 0.14241 -4.480 7.48e-06 ***
## BARTHAG -31.29282 5.58442 -5.604 2.10e-08 ***
## EFG_O 0.24236 0.11515 2.105 0.03531 *
## EFG_D -0.93488 0.59985 -1.559 0.11911
## TOR -0.17005 0.09761 -1.742 0.08148 .
## TORD 0.16104 0.07417 2.171 0.02991 *
## ORB 0.02927 0.04298 0.681 0.49582
## FTR 0.05893 0.02703 2.180 0.02922 *
## FTRD -0.02491 0.02403 -1.037 0.29992
## ADJ_T 0.02777 0.03633 0.764 0.44460
## TWO_P_O -0.01579 0.07354 -0.215 0.82994
## TWO_P_D 0.54526 0.38953 1.400 0.16158
## THREE_P_D 0.41070 0.31764 1.293 0.19602
## WAB 0.55209 0.06744 8.186 2.70e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom3 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove TWO_P_O
TOR + TORD + ORB + FTR + FTRD + ADJ_T +
TWO_P_D + THREE_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom2, model_binom3) # p-value: 0.8298
## Data: train_data
## Models:
## model_binom3: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom3: ORB + FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 |
## model_binom3: CONF)
## model_binom2: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom2: ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_D +
## model_binom2: WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom3 16 607.43 691.39 -287.72 575.43
## model_binom2 17 609.39 698.59 -287.69 575.39 0.0462 1 0.8298
summary(model_binom3)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## ORB + FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 607.4 691.4 -287.7 575.4 1388
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -11.6171 -0.2299 -0.0913 -0.0113 13.7168
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.30302 6.86095 2.668 0.00764 **
## ADJOE 0.53646 0.12508 4.289 1.80e-05 ***
## ADJDE -0.63877 0.14238 -4.486 7.24e-06 ***
## BARTHAG -31.37965 5.57202 -5.632 1.78e-08 ***
## EFG_O 0.22495 0.08174 2.752 0.00592 **
## EFG_D -0.92798 0.59859 -1.550 0.12108
## TOR -0.17034 0.09759 -1.745 0.08090 .
## TORD 0.15949 0.07378 2.162 0.03065 *
## ORB 0.02819 0.04266 0.661 0.50866
## FTR 0.05841 0.02692 2.170 0.02998 *
## FTRD -0.02439 0.02391 -1.020 0.30783
## ADJ_T 0.02634 0.03569 0.738 0.46052
## TWO_P_D 0.54016 0.38852 1.390 0.16443
## THREE_P_D 0.40812 0.31727 1.286 0.19833
## WAB 0.55267 0.06740 8.199 2.42e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom4 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove ORB
TOR + TORD + FTR + FTRD + ADJ_T +
TWO_P_D + THREE_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom3, model_binom4) # p-value: 0.5077
## Data: train_data
## Models:
## model_binom4: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom4: FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## model_binom3: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom3: ORB + FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 |
## model_binom3: CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom4 15 605.87 684.58 -287.94 575.87
## model_binom3 16 607.43 691.39 -287.72 575.43 0.4387 1 0.5077
summary(model_binom4)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 605.9 684.6 -287.9 575.9 1389
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -10.6103 -0.2327 -0.0904 -0.0114 13.6077
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 17.75005 6.77934 2.618 0.00884 **
## ADJOE 0.55128 0.12292 4.485 7.29e-06 ***
## ADJDE -0.62165 0.13920 -4.466 7.98e-06 ***
## BARTHAG -31.29170 5.54852 -5.640 1.70e-08 ***
## EFG_O 0.19019 0.06218 3.059 0.00222 **
## EFG_D -0.95654 0.59667 -1.603 0.10890
## TOR -0.13285 0.07933 -1.675 0.09401 .
## TORD 0.17140 0.07141 2.400 0.01639 *
## FTR 0.05566 0.02661 2.092 0.03648 *
## FTRD -0.02430 0.02389 -1.017 0.30895
## ADJ_T 0.02573 0.03559 0.723 0.46979
## TWO_P_D 0.54660 0.38808 1.408 0.15899
## THREE_P_D 0.41365 0.31679 1.306 0.19164
## WAB 0.56268 0.06577 8.555 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom5 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove ADJ_T
TOR + TORD + FTR + FTRD +
TWO_P_D + THREE_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom4, model_binom5) # p-value: 0.4698
## Data: train_data
## Models:
## model_binom5: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom5: FTR + FTRD + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## model_binom4: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom4: FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom5 14 604.40 677.85 -288.20 576.40
## model_binom4 15 605.87 684.58 -287.94 575.87 0.5223 1 0.4698
summary(model_binom5)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## FTR + FTRD + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 604.4 677.9 -288.2 576.4 1390
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -10.3533 -0.2309 -0.0920 -0.0118 12.9150
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.42403 6.72383 2.740 0.00614 **
## ADJOE 0.54672 0.12234 4.469 7.87e-06 ***
## ADJDE -0.61378 0.13830 -4.438 9.08e-06 ***
## BARTHAG -31.03592 5.51870 -5.624 1.87e-08 ***
## EFG_O 0.19197 0.06204 3.094 0.00197 **
## EFG_D -0.92037 0.59287 -1.552 0.12056
## TOR -0.13512 0.07907 -1.709 0.08749 .
## TORD 0.16674 0.07091 2.351 0.01871 *
## FTR 0.05776 0.02643 2.186 0.02884 *
## FTRD -0.02206 0.02364 -0.933 0.35092
## TWO_P_D 0.53090 0.38631 1.374 0.16936
## THREE_P_D 0.39836 0.31527 1.264 0.20638
## WAB 0.56444 0.06571 8.590 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom6 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove FTRD
TOR + TORD + FTR +
TWO_P_D + THREE_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom5, model_binom6) # p-value: 0.3498
## Data: train_data
## Models:
## model_binom6: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom6: FTR + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## model_binom5: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom5: FTR + FTRD + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom6 13 603.27 671.48 -288.63 577.27
## model_binom5 14 604.40 677.85 -288.20 576.40 0.8743 1 0.3498
summary(model_binom6)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## FTR + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 603.3 671.5 -288.6 577.3 1391
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -11.0055 -0.2310 -0.0925 -0.0118 13.3267
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 1.813e-15 4.258e-08
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.86651 6.71840 2.808 0.004982 **
## ADJOE 0.53968 0.12274 4.397 1.10e-05 ***
## ADJDE -0.62169 0.13876 -4.480 7.45e-06 ***
## BARTHAG -31.02780 5.54089 -5.600 2.15e-08 ***
## EFG_O 0.20306 0.06071 3.345 0.000824 ***
## EFG_D -0.85127 0.58762 -1.449 0.147425
## TOR -0.14438 0.07863 -1.836 0.066350 .
## TORD 0.13618 0.06285 2.167 0.030256 *
## FTR 0.05514 0.02624 2.102 0.035573 *
## TWO_P_D 0.49529 0.38413 1.289 0.197265
## THREE_P_D 0.36740 0.31325 1.173 0.240851
## WAB 0.57073 0.06531 8.739 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ADJOE ADJDE BARTHA EFG_O EFG_D TOR TORD FTR TWO_P_
## ADJOE 0.422
## ADJDE -0.702 -0.877
## BARTHAG -0.620 -0.938 0.942
## EFG_O 0.215 0.047 -0.314 -0.202
## EFG_D 0.123 0.029 -0.074 -0.048 0.013
## TOR -0.487 -0.064 0.248 0.166 -0.328 -0.055
## TORD -0.406 0.119 0.114 -0.007 -0.050 -0.189 0.053
## FTR 0.057 0.133 -0.232 -0.155 0.302 0.120 -0.361 -0.087
## TWO_P_D -0.108 -0.049 0.056 0.055 -0.014 -0.990 0.057 0.130 -0.103
## THREE_P_D -0.125 -0.060 0.066 0.060 0.001 -0.981 0.048 0.151 -0.114 0.976
## WAB 0.105 -0.093 0.063 -0.122 -0.093 -0.025 0.183 -0.042 -0.163 0.040
## THREE_
## ADJOE
## ADJDE
## BARTHAG
## EFG_O
## EFG_D
## TOR
## TORD
## FTR
## TWO_P_D
## THREE_P_D
## WAB 0.050
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom7 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove THREE_P_D
TOR + TORD + FTR +
TWO_P_D +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom6, model_binom7) # p-value: 0.2336
## Data: train_data
## Models:
## model_binom7: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom7: FTR + TWO_P_D + WAB + (1 | CONF)
## model_binom6: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom6: FTR + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom7 12 602.69 665.65 -289.34 578.69
## model_binom6 13 603.27 671.48 -288.63 577.27 1.4189 1 0.2336
summary(model_binom7)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## FTR + TWO_P_D + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 602.7 665.7 -289.3 578.7 1392
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -11.1463 -0.2323 -0.0930 -0.0119 12.7045
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 19.91828 6.64933 2.996 0.00274 **
## ADJOE 0.54984 0.12292 4.473 7.70e-06 ***
## ADJDE -0.63418 0.13865 -4.574 4.79e-06 ***
## BARTHAG -31.50604 5.54307 -5.684 1.32e-08 ***
## EFG_O 0.20360 0.06060 3.360 0.00078 ***
## EFG_D -0.17737 0.11311 -1.568 0.11684
## TOR -0.14911 0.07833 -1.904 0.05697 .
## TORD 0.12484 0.06172 2.023 0.04310 *
## FTR 0.05872 0.02608 2.251 0.02437 *
## TWO_P_D 0.05760 0.08315 0.693 0.48848
## WAB 0.56843 0.06538 8.694 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ADJOE ADJDE BARTHA EFG_O EFG_D TOR TORD FTR TWO_P_
## ADJOE 0.418
## ADJDE -0.699 -0.877
## BARTHAG -0.617 -0.938 0.942
## EFG_O 0.217 0.049 -0.315 -0.203
## EFG_D 0.005 -0.157 -0.045 0.059 0.074
## TOR -0.486 -0.063 0.246 0.163 -0.327 -0.039
## TORD -0.391 0.135 0.100 -0.022 -0.051 -0.218 0.048
## FTR 0.040 0.127 -0.225 -0.148 0.305 0.045 -0.359 -0.067
## TWO_P_D 0.061 0.047 -0.038 -0.019 -0.071 -0.760 0.049 -0.075 0.032
## WAB 0.115 -0.089 0.058 -0.127 -0.091 0.122 0.181 -0.053 -0.162 -0.042
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom8 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + # remove TWO_P_D
TOR + TORD + FTR +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom7, model_binom8) # p-value: 0.488
## Data: train_data
## Models:
## model_binom8: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom8: FTR + WAB + (1 | CONF)
## model_binom7: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom7: FTR + TWO_P_D + WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom8 11 601.17 658.89 -289.58 579.17
## model_binom7 12 602.69 665.65 -289.34 578.69 0.4809 1 0.488
summary(model_binom8)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## FTR + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 601.2 658.9 -289.6 579.2 1393
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -11.3517 -0.2339 -0.0917 -0.0122 12.8717
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 19.66107 6.63094 2.965 0.00303 **
## ADJOE 0.54657 0.12279 4.451 8.54e-06 ***
## ADJDE -0.63132 0.13845 -4.560 5.12e-06 ***
## BARTHAG -31.46860 5.54107 -5.679 1.35e-08 ***
## EFG_O 0.20675 0.06041 3.423 0.00062 ***
## EFG_D -0.11798 0.07351 -1.605 0.10849
## TOR -0.15195 0.07828 -1.941 0.05224 .
## TORD 0.12805 0.06148 2.083 0.03725 *
## FTR 0.05821 0.02604 2.235 0.02542 *
## WAB 0.57075 0.06525 8.747 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ADJOE ADJDE BARTHA EFG_O EFG_D TOR TORD FTR
## ADJOE 0.416
## ADJDE -0.698 -0.877
## BARTHAG -0.616 -0.939 0.942
## EFG_O 0.220 0.050 -0.317 -0.202
## EFG_D 0.076 -0.187 -0.113 0.069 0.033
## TOR -0.490 -0.062 0.245 0.161 -0.324 -0.001
## TORD -0.386 0.142 0.094 -0.027 -0.057 -0.424 0.053
## FTR 0.034 0.123 -0.222 -0.144 0.309 0.108 -0.358 -0.064
## WAB 0.116 -0.085 0.056 -0.129 -0.094 0.138 0.186 -0.055 -0.162
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom9 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + # remove EFG_D
TOR + TORD + FTR +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom8, model_binom9) # p-value: 0.108
## Data: train_data
## Models:
## model_binom9: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + TORD + FTR +
## model_binom9: WAB + (1 | CONF)
## model_binom8: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom8: FTR + WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom9 10 601.75 654.22 -290.88 581.75
## model_binom8 11 601.17 658.89 -289.58 579.17 2.5831 1 0.108
summary(model_binom9)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + TORD + FTR +
## WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 601.8 654.2 -290.9 581.8 1394
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -10.1205 -0.2348 -0.0951 -0.0122 11.4464
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 1.23e-15 3.506e-08
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 20.55090 6.54679 3.139 0.001695 **
## ADJOE 0.51194 0.11966 4.278 1.88e-05 ***
## ADJDE -0.65937 0.13591 -4.852 1.22e-06 ***
## BARTHAG -30.98621 5.47740 -5.657 1.54e-08 ***
## EFG_O 0.21105 0.06016 3.508 0.000451 ***
## TOR -0.15231 0.07781 -1.957 0.050302 .
## TORD 0.08614 0.05538 1.555 0.119853
## FTR 0.06296 0.02584 2.436 0.014846 *
## WAB 0.58789 0.06464 9.095 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ADJOE ADJDE BARTHA EFG_O TOR TORD FTR
## ADJOE 0.430
## ADJDE -0.689 -0.919
## BARTHAG -0.618 -0.944 0.957
## EFG_O 0.212 0.050 -0.309 -0.196
## TOR -0.487 -0.053 0.237 0.151 -0.320
## TORD -0.391 0.075 0.049 0.001 -0.053 0.056
## FTR 0.025 0.145 -0.212 -0.151 0.310 -0.360 -0.021
## WAB 0.113 -0.055 0.069 -0.148 -0.107 0.188 0.003 -0.183
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom10 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + # remove TORD
TOR + FTR +
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom9, model_binom10) # p-value: 0.1183
## Data: train_data
## Models:
## model_binom10: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + FTR + WAB + (1 |
## model_binom10: CONF)
## model_binom9: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + TORD + FTR +
## model_binom9: WAB + (1 | CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom10 9 602.19 649.42 -292.10 584.19
## model_binom9 10 601.75 654.22 -290.88 581.75 2.4394 1 0.1183
summary(model_binom10)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + FTR + WAB + (1 |
## CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 602.2 649.4 -292.1 584.2 1395
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.2942 -0.2370 -0.0964 -0.0121 11.5546
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 24.63257 6.00029 4.105 4.04e-05 ***
## ADJOE 0.50122 0.11906 4.210 2.55e-05 ***
## ADJDE -0.67338 0.13587 -4.956 7.20e-07 ***
## BARTHAG -31.15990 5.46869 -5.698 1.21e-08 ***
## EFG_O 0.21664 0.06000 3.611 0.000305 ***
## TOR -0.15942 0.07784 -2.048 0.040540 *
## FTR 0.06424 0.02585 2.485 0.012964 *
## WAB 0.59009 0.06424 9.186 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ADJOE ADJDE BARTHA EFG_O TOR FTR
## ADJOE 0.502
## ADJDE -0.731 -0.926
## BARTHAG -0.673 -0.947 0.958
## EFG_O 0.224 0.063 -0.319 -0.208
## TOR -0.503 -0.056 0.234 0.148 -0.324
## FTR 0.025 0.150 -0.215 -0.154 0.308 -0.364
## WAB 0.115 -0.060 0.074 -0.142 -0.104 0.192 -0.186
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom11 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + FTR + # remove TOR
WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom10, model_binom11) # p-value: 0.03884 so do not remove TOR
## Data: train_data
## Models:
## model_binom11: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + FTR + WAB + (1 | CONF)
## model_binom10: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + FTR + WAB + (1 |
## model_binom10: CONF)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_binom11 8 604.46 646.44 -294.23 588.46
## model_binom10 9 602.19 649.42 -292.10 584.19 4.2679 1 0.03884 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
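Each elimination step above rests on a likelihood-ratio test: the drop-in-deviance statistic is compared against a chi-square distribution with degrees of freedom equal to the difference in parameter counts. As a sanity check, the p-value that stops the elimination here (0.03884 for dropping TOR) can be reproduced directly from the printed deviances; sketched below in Python for illustration, since with 1 degree of freedom the chi-square survival function reduces to erfc(sqrt(x/2)) and needs only the standard library.

```python
from math import erfc, sqrt

def lrt_pvalue_1df(dev_reduced, dev_full):
    """p-value of a likelihood-ratio test for nested models differing by one parameter.

    The test statistic is the deviance difference; for 1 df the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    chisq = dev_reduced - dev_full
    return erfc(sqrt(chisq / 2))

# Deviances printed by anova(model_binom10, model_binom11) above
p = lrt_pvalue_1df(588.46, 584.19)
print(round(p, 4))  # 0.0388 -- below 0.05, so TOR stays in the model
```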
summary(model_binom11)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + FTR + WAB + (1 | CONF)
## Data: train_data
##
## AIC BIC logLik deviance df.resid
## 604.5 646.4 -294.2 588.5 1396
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -8.6476 -0.2347 -0.0981 -0.0134 14.6925
##
## Random effects:
## Groups Name Variance Std.Dev.
## CONF (Intercept) 0 0
## Number of obs: 1404, groups: CONF, 33
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 18.63127 5.13189 3.630 0.000283 ***
## ADJOE 0.49358 0.11700 4.219 2.46e-05 ***
## ADJDE -0.61506 0.12963 -4.745 2.09e-06 ***
## BARTHAG -29.79642 5.32059 -5.600 2.14e-08 ***
## EFG_O 0.17765 0.05667 3.135 0.001720 **
## FTR 0.04517 0.02392 1.888 0.058970 .
## WAB 0.61936 0.06310 9.815 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ADJOE ADJDE BARTHA EFG_O FTR
## ADJOE 0.537
## ADJDE -0.723 -0.939
## BARTHAG -0.692 -0.949 0.960
## EFG_O 0.068 0.040 -0.260 -0.164
## FTR -0.210 0.136 -0.137 -0.104 0.215
## WAB 0.259 -0.047 0.026 -0.182 -0.049 -0.125
## convergence code: 0
## boundary (singular) fit: see ?isSingular
# Random Effects
model_binom12 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + # default Laplace fit; glmer always uses ML (REML does not apply to GLMMs)
TOR + FTR +
WAB + (1|CONF), family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
model_binom13 <- glm(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + # remove CONF
TOR + FTR +
WAB, family = "binomial", train_data)
# anova(model_binom12, model_binom13) # p-value: 1; the CONF variance is estimated at zero, so the two fits coincide
summary(model_binom13)
##
## Call:
## glm(formula = TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR +
## FTR + WAB, family = "binomial", data = train_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.99008 -0.33058 -0.13608 -0.01713 3.13102
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 24.63257 5.95394 4.137 3.52e-05 ***
## ADJOE 0.50122 0.11747 4.267 1.98e-05 ***
## ADJDE -0.67338 0.13383 -5.032 4.86e-07 ***
## BARTHAG -31.15990 5.39211 -5.779 7.52e-09 ***
## EFG_O 0.21664 0.05992 3.616 0.0003 ***
## TOR -0.15942 0.07778 -2.050 0.0404 *
## FTR 0.06424 0.02584 2.486 0.0129 *
## WAB 0.59009 0.06423 9.187 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1380.38 on 1403 degrees of freedom
## Residual deviance: 584.19 on 1396 degrees of freedom
## AIC: 600.19
##
## Number of Fisher Scoring iterations: 7
Appendix 4.3. Post Diagnostic Checks
# Check diagnostics: half-normal plot of Pearson residuals, then variance inflation factors
halfnorm(resid(model_binom13, type = "pearson"))
sort(faraway::vif(train_data[, c(5:8, 10, 14, 17)])) # VIFs for the seven retained predictors
## FTR TOR EFG_O WAB ADJDE ADJOE BARTHAG
## 1.174302 1.849976 2.771205 11.270400 14.118669 19.035144 36.023259
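A predictor's VIF is 1/(1 - R²) from regressing it on the remaining predictors, so BARTHAG's value of roughly 36 means the other six predictors explain about 97% of its variance. In the two-predictor case R² is simply the squared correlation, which makes the formula easy to illustrate; the correlations below are illustrative numbers, not values from this dataset.

```python
# VIF = 1 / (1 - R^2); with exactly two predictors, R^2 is their squared correlation.
def vif_two_predictors(r):
    """VIF shared by two predictors whose pairwise correlation is r."""
    return 1 / (1 - r ** 2)

print(round(vif_two_predictors(0.9), 2))    # 5.26
print(round(vif_two_predictors(0.986), 1))  # 36.0, roughly BARTHAG's territory
```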
Appendix 4.4. Prediction
# Checking performance: sensitivity and specificity across candidate thresholds
predprob <- predict(model_binom13, train_data, type = "response")
thresh <- seq(0.01, 0.5, 0.01)
Sensitivity <- numeric(length(thresh))
Specificity <- numeric(length(thresh))
for (j in seq_along(thresh)) {
  pp <- ifelse(predprob < thresh[j], "No", "Yes")
  xx <- xtabs(~ train_data$TRNMT + pp)
  Specificity[j] <- xx[1, 1] / (xx[1, 1] + xx[1, 2])
  Sensitivity[j] <- xx[2, 2] / (xx[2, 1] + xx[2, 2])
}
matplot(thresh,cbind(Sensitivity,Specificity),type="l",xlab="Threshold",ylab="Proportion",lty=1:2)
plot(1-Specificity,Sensitivity,type="l")
abline(0,1,lty=2)
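The loop above tabulates sensitivity and specificity at each candidate threshold; the same scan is straightforward in any language. A minimal Python sketch, where probs and truth are made-up stand-ins for predprob and train_data$TRNMT:

```python
# Threshold sweep mirroring the R loop above, on illustrative data.
probs = [0.05, 0.10, 0.30, 0.60, 0.15, 0.80, 0.02, 0.45]
truth = [0, 0, 1, 1, 0, 1, 0, 0]

def sens_spec(probs, truth, thresh):
    """Classify prob >= thresh as positive; return (sensitivity, specificity)."""
    tp = sum(p >= thresh and y == 1 for p, y in zip(probs, truth))
    fn = sum(p < thresh and y == 1 for p, y in zip(probs, truth))
    tn = sum(p < thresh and y == 0 for p, y in zip(probs, truth))
    fp = sum(p >= thresh and y == 0 for p, y in zip(probs, truth))
    return tp / (tp + fn), tn / (tn + fp)

for t in (0.18, 0.50):
    print(t, sens_spec(probs, truth, t))
```

Lowering the threshold trades specificity for sensitivity, which is exactly the trade-off the matplot curves display.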
# Classification at the 0.18 threshold suggested by the sensitivity/specificity curves above
predout <- ifelse(predprob < 0.18, "No", "Yes")
xtabs(~ train_data$TRNMT + predout)
## predout
## train_data$TRNMT No Yes
## No 988 144
## Yes 32 240
# Training misclassification rate
1-(988+240)/(988+240+144+32)
## [1] 0.1253561
# Test misclassification rate on the held-out 2019 season
# (model_binom10 and model_binom13 share identical fixed effects since the CONF variance is zero)
predprob_test <- predict(model_binom10, test_data_19, type = "response")
predout_test <- ifelse(predprob_test < 0.18, "No", "Yes")
xtabs( ~ test_data_19$TRNMT + predout_test)
## predout_test
## test_data_19$TRNMT No Yes
## No 247 38
## Yes 7 61
1-(247+61)/(247+61+7+38)
## [1] 0.1274788
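Both error rates are simply the off-diagonal share of a 2x2 confusion table. Recomputing them from the counts printed above (Python sketch for illustration):

```python
def misclass_rate(tn, fp, fn, tp):
    """Fraction of observations falling off the diagonal of a 2x2 confusion table."""
    return (fp + fn) / (tn + fp + fn + tp)

# Counts from the training and 2019 test confusion tables above
print(round(misclass_rate(988, 144, 32, 240), 7))  # 0.1253561
print(round(misclass_rate(247, 38, 7, 61), 7))     # 0.1274788
```

The test error barely exceeds the training error, which suggests the selected logistic model is not badly overfit.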
Appendix 5. Model 2: Multinomial Model - All possibilities
mmod <- multinom(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
THREE_P_O + THREE_P_D + WAB + CONF, train_data, trace = FALSE)
mmod1 <- step(mmod, trace = FALSE) # AIC-based stepwise selection
Appendix 5.1. Prediction
summary(mmod1)
## Call:
## multinom(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O +
## EFG_D + TOR + TWO_P_O + WAB, data = train_data, trace = FALSE)
##
## Coefficients:
## (Intercept) ADJOE ADJDE BARTHAG EFG_O EFG_D
## R68 22.36917 0.6930545 -0.6387554 -35.55184 0.10514273 -0.36310116
## R64 20.05135 0.3591553 -0.4764981 -25.62416 0.18339258 -0.03428303
## R32 43.97908 0.4799637 -0.7631146 -29.61770 0.00383604 -0.08257088
## S16 -25.97634 0.2539697 -0.3535155 24.02096 -0.16766864 0.07755960
## E8 25.74885 0.5936211 -0.9496349 -25.40841 -0.40395542 0.17909511
## F4 31.32057 0.7035847 -1.1607442 -31.56321 0.51679491 0.05044687
## 2ND 38.68386 1.5249836 -1.8073067 -14.92299 -4.01741400 -0.42421578
## Champions -26.29208 4.2496101 -5.5472676 -91.10589 -3.70585697 3.37708667
## TOR TWO_P_O WAB
## R68 -0.20504636 0.0856522548 0.3284524
## R64 -0.04884582 -0.0006071909 0.6555698
## R32 -0.38844792 0.2031092293 0.7157722
## S16 -0.14868146 0.3246490614 0.6036942
## E8 -0.28493368 0.6690769336 0.8535485
## F4 0.22295890 -0.1748514204 0.5565037
## 2ND -3.52135154 4.7837765799 -0.0726802
## Champions 0.62909476 2.9240761067 -1.2029202
##
## Std. Errors:
## (Intercept) ADJOE ADJDE BARTHAG EFG_O EFG_D
## R68 0.24180880 0.07939885 0.08769994 0.71626161 0.2176209 0.15506872
## R64 1.74444597 0.07222554 0.05875595 2.26236503 0.0995162 0.07314185
## R32 2.19415373 0.09344604 0.08559134 3.07789830 0.1635410 0.11333051
## S16 1.09455542 0.09604182 0.12575847 1.02116845 0.2265747 0.14700707
## E8 0.32485850 0.12335647 0.16585797 0.33350867 0.3171549 0.19966822
## F4 0.10823782 0.15291949 0.21733239 0.12743699 0.3413094 0.23504649
## 2ND 0.04078992 0.41198907 0.59784005 0.03145735 1.3029543 0.64280916
## Champions 0.04017241 0.92960088 1.48949072 0.04935019 1.3030123 1.38532950
## TOR TWO_P_O WAB
## R68 0.13954521 0.1670340 0.13315705
## R64 0.06981779 0.0760650 0.06980336
## R32 0.10913873 0.1265307 0.10697108
## S16 0.13881999 0.1784446 0.13511803
## E8 0.20043210 0.2589957 0.18736637
## F4 0.27781350 0.2728535 0.21217333
## 2ND 1.07712119 1.4243489 0.45968890
## Champions 0.91461645 1.1864221 1.38636167
##
## Residual Deviance: 1142.137
## AIC: 1286.137
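The printed AIC is consistent with the parameter count: a multinomial fit estimates one coefficient vector per non-reference outcome level, so with an intercept plus 8 predictors and 8 non-reference rounds the model carries 9 x 8 = 72 parameters, and AIC = deviance + 2 x npar. A quick check in Python:

```python
# Recover the parameter count from the deviance and AIC printed above
deviance, aic = 1142.137, 1286.137
npar = round((aic - deviance) / 2)
print(npar)  # 72 = 9 coefficients x 8 non-reference outcome levels
```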
# Training error
mmod1.pred <- predict(mmod1, train_data)
fct_expand(mmod1.pred, "R68") # ensure the "R68" level is present (result is printed here, not reassigned)
## [1] 2ND 2ND 2ND 2ND Champions
## [6] Champions Champions Champions R32 E8
## [11] R64 E8 E8 R64 F4
## [16] R32 R32 R32 R64 E8
## [21] S16 R32 E8 R32 No Tournament
## [26] R32 R64 S16 R64 R32
## [31] E8 R32 No Tournament No Tournament No Tournament
## [36] No Tournament No Tournament No Tournament No Tournament No Tournament
## [41] No Tournament No Tournament No Tournament No Tournament No Tournament
## [46] No Tournament No Tournament No Tournament No Tournament No Tournament
## [51] No Tournament No Tournament No Tournament No Tournament No Tournament
## [56] No Tournament No Tournament No Tournament No Tournament No Tournament
## [61] No Tournament No Tournament No Tournament No Tournament No Tournament
## [66] No Tournament No Tournament No Tournament No Tournament No Tournament
## [71] No Tournament No Tournament No Tournament No Tournament No Tournament
## [76] No Tournament No Tournament No Tournament No Tournament No Tournament
## [81] No Tournament No Tournament No Tournament No Tournament No Tournament
## [86] No Tournament No Tournament No Tournament No Tournament R32
## [91] No Tournament No Tournament No Tournament No Tournament No Tournament
## [96] No Tournament No Tournament No Tournament No Tournament No Tournament
## [101] No Tournament No Tournament No Tournament No Tournament No Tournament
## [106] No Tournament No Tournament No Tournament No Tournament No Tournament
## [111] No Tournament No Tournament No Tournament No Tournament No Tournament
## [116] No Tournament No Tournament No Tournament No Tournament No Tournament
## [121] No Tournament No Tournament No Tournament No Tournament No Tournament
## [126] No Tournament No Tournament No Tournament No Tournament No Tournament
## [131] No Tournament No Tournament No Tournament No Tournament No Tournament
## [136] No Tournament R64 No Tournament No Tournament No Tournament
## [141] No Tournament No Tournament No Tournament No Tournament No Tournament
## [146] No Tournament No Tournament No Tournament No Tournament R64
## [151] No Tournament No Tournament No Tournament No Tournament No Tournament
## [156] No Tournament No Tournament No Tournament No Tournament No Tournament
## [161] No Tournament No Tournament No Tournament No Tournament No Tournament
## [166] No Tournament No Tournament No Tournament No Tournament No Tournament
## [171] No Tournament No Tournament No Tournament No Tournament No Tournament
## [176] No Tournament No Tournament No Tournament No Tournament No Tournament
## [181] No Tournament No Tournament No Tournament No Tournament No Tournament
## [186] No Tournament No Tournament No Tournament No Tournament No Tournament
## [191] No Tournament No Tournament No Tournament No Tournament No Tournament
## [196] No Tournament No Tournament No Tournament No Tournament No Tournament
## [201] No Tournament No Tournament No Tournament No Tournament No Tournament
## [206] No Tournament No Tournament No Tournament No Tournament No Tournament
## [211] No Tournament No Tournament No Tournament No Tournament No Tournament
## [216] No Tournament No Tournament No Tournament No Tournament No Tournament
## [221] No Tournament No Tournament No Tournament No Tournament No Tournament
## [226] R64 No Tournament S16 No Tournament No Tournament
## [231] No Tournament No Tournament No Tournament No Tournament No Tournament
## [236] No Tournament No Tournament No Tournament No Tournament No Tournament
## [241] No Tournament No Tournament No Tournament No Tournament No Tournament
## [246] No Tournament No Tournament No Tournament No Tournament No Tournament
## [251] No Tournament No Tournament No Tournament No Tournament No Tournament
## [256] No Tournament No Tournament R64 No Tournament No Tournament
## [261] No Tournament No Tournament No Tournament No Tournament No Tournament
## [266] No Tournament No Tournament No Tournament No Tournament No Tournament
## [271] No Tournament No Tournament No Tournament No Tournament No Tournament
## [276] No Tournament No Tournament No Tournament No Tournament No Tournament
## [281] No Tournament No Tournament No Tournament No Tournament No Tournament
## [286] No Tournament No Tournament No Tournament No Tournament No Tournament
## [291] No Tournament No Tournament No Tournament No Tournament No Tournament
## [296] No Tournament No Tournament No Tournament No Tournament No Tournament
##  ...
## [ printed prediction vector truncated: entries [296]-[1160] are almost
##   entirely "No Tournament", with a handful of R64/R32 predictions scattered
##   in; entries [1161]-[1404] mix tournament rounds (R64, R32, S16, E8, F4)
##   with "No Tournament" ]
##  ...
## [1401] R64 R64 R32 R32
## Levels: No Tournament R68 R64 R32 S16 E8 F4 2ND Champions
# Train Error
mmod1.table <- table(mmod1.pred, as_vector(train_data[,"POSTSEASON"]))
mmod1.error <- numeric(dim(mmod1.table)[1])
for(i in 1:dim(mmod1.table)[1]){
  mmod1.error[i] = round(((1-(mmod1.table[i,i])/(sum(mmod1.table[,i])))*100), 4)
}
mmod1.error.table <- data.frame(names(mmod1.table[,1]), mmod1.error)
colnames(mmod1.error.table) <- c("Round", "% Error")
# Test Error
mmod1.pred.test <- predict(mmod1, test_data_19)
mmod1.table.test <- table(mmod1.pred.test, as_vector(test_data_19[,"POSTSEASON"]))
mmod1.error.test <- numeric(dim(mmod1.table.test)[1])
for(i in 1:dim(mmod1.table.test)[1]){
  mmod1.error.test[i] = round(((1-(mmod1.table.test[i,i])/(sum(mmod1.table.test[,i])))*100), 4)
}
mmod1.error.test.table <- data.frame(names(mmod1.table.test[,1]), mmod1.error.test)
colnames(mmod1.error.test.table) <- c("Round", "% Error")
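The per-class error computed by the loops above is one minus the diagonal entry over its column sum, since the confusion tables have predicted rounds in rows and true rounds in columns. A minimal language-agnostic sketch of that formula in Python, using an illustrative toy matrix rather than the report's actual counts:

```python
# Per-class error as in the R loops above: 1 - diag/colsum, as a percentage.
# The matrix here is a toy example (predicted classes in rows, truth in columns).
def per_class_error(confusion):
    n = len(confusion)
    col_sums = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    return [round((1 - confusion[i][i] / col_sums[i]) * 100, 4) for i in range(n)]

# 90 of 100 true "No Tournament" teams and 6 of 10 true "R64" teams correct
toy = [[90, 4],
       [10, 6]]
print(per_class_error(toy))  # [10.0, 40.0]
```

Note that dividing by column sums conditions on the *true* round, so a class the model never predicts correctly shows 100% error even if it is rarely predicted at all.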
Appendix 5.2. Error Tables (%)
knitr::kable(mmod1.error.table)
Round | % Error |
---|---|
No Tournament | 1.5901 |
R68 | 100.0000 |
R64 | 65.6250 |
R32 | 54.6875 |
S16 | 78.1250 |
E8 | 68.7500 |
F4 | 100.0000 |
2ND | 0.0000 |
Champions | 0.0000 |
knitr::kable(mmod1.error.test.table)
Round | % Error |
---|---|
No Tournament | 1.7544 |
R68 | 100.0000 |
R64 | 75.0000 |
R32 | 75.0000 |
S16 | 75.0000 |
E8 | 75.0000 |
F4 | 100.0000 |
2ND | 100.0000 |
Champions | 100.0000 |
Appendix 6. Model 3: Multinomial Model - Round Selection Given Already in Tournament
# Subset to teams that made the tournament; the character round-trip drops the
# unused "No Tournament" level (and re-sorts the remaining levels alphabetically)
train_given_trnmt <- train_data[which(train_data$TRNMT=="Yes"), ]
train_given_trnmt$POSTSEASON <- as.character(train_given_trnmt$POSTSEASON)
train_given_trnmt$POSTSEASON <- as.factor(train_given_trnmt$POSTSEASON)
test_given_trnmt_19 <- test_data_19[which(test_data_19$TRNMT=="Yes"), ]
test_given_trnmt_19$POSTSEASON <- as.character(test_given_trnmt_19$POSTSEASON)
test_given_trnmt_19$POSTSEASON <- as.factor(test_given_trnmt_19$POSTSEASON)
mmod2 <- multinom(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
THREE_P_O + THREE_P_D + WAB + CONF, train_given_trnmt, trace = FALSE)
mmod3 <- step(mmod2, trace=0) # stepwise AIC variable selection
Appendix 6.1. Prediction
summary(mmod3)
## Call:
## multinom(formula = POSTSEASON ~ ADJOE + ADJDE + TOR + ORB + DRB +
## TWO_P_O + THREE_P_D, data = train_given_trnmt, trace = FALSE)
##
## Coefficients:
## (Intercept) ADJOE ADJDE TOR ORB DRB
## Champions -236.056663 2.852387 2.555557 11.78722 -2.025824 -1.3480990
## E8 8.900188 -2.173893 5.519055 19.74867 -2.692411 1.0909452
## F4 3.967331 -1.924539 5.569355 20.86565 -3.114750 1.3208299
## R32 15.016957 -2.274593 5.875645 19.84169 -2.725084 0.9152105
## R64 -8.026639 -2.307673 6.205060 20.28381 -2.765259 0.7592421
## R68 -25.960280 -2.152685 6.488939 20.72191 -2.997752 0.8513871
## S16 -11.220968 -2.103083 5.802243 20.07609 -2.719376 0.9520225
## TWO_P_O THREE_P_D
## Champions -6.599466 -1.821630
## E8 -7.732776 -2.242833
## F4 -8.075189 -2.994908
## R32 -8.013475 -2.467972
## R64 -8.209898 -2.358930
## R68 -8.396799 -3.068424
## S16 -8.022214 -2.227293
##
## Std. Errors:
## (Intercept) ADJOE ADJDE TOR ORB DRB TWO_P_O
## Champions 0.18694932 1.3967303 1.5390742 3.6535538 0.6854387 1.090490 1.141480
## E8 0.44581638 0.8797126 0.8448868 0.5426949 0.6486490 1.168592 1.078119
## F4 0.05398827 0.8881353 0.8596159 0.6125661 0.6667125 1.185734 1.085946
## R32 3.50710706 0.8794955 0.8478554 0.5219588 0.6466125 1.170012 1.076049
## R64 4.00974955 0.8798437 0.8484075 0.5212229 0.6472330 1.170833 1.076647
## R68 0.74323733 0.8820179 0.8511628 0.5477912 0.6548939 1.176055 1.084958
## S16 0.25546310 0.8796789 0.8476392 0.5272317 0.6467417 1.170386 1.077323
## THREE_P_D
## Champions 1.913517
## E8 1.338540
## F4 1.367293
## R32 1.340593
## R64 1.341498
## R68 1.356640
## S16 1.340915
##
## Residual Deviance: 538.46
## AIC: 650.46
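The reported AIC is consistent with the residual deviance: `multinom` fits one coefficient vector per non-reference response level, so the selected model (intercept plus 7 predictors, across 7 non-reference rounds) has 56 parameters. A quick arithmetic check, assuming the usual AIC = deviance + 2k:

```python
# Sanity check of the multinom summary above: AIC = residual deviance + 2k,
# with k = (7 non-reference rounds) x (intercept + 7 predictors) = 56.
k = 7 * 8
deviance = 538.46
print(deviance + 2 * k)  # matches the reported AIC of 650.46
```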
# Train Error
mmod3.pred <- predict(mmod3, train_given_trnmt)
mmod3.table <- table(mmod3.pred, as_vector(train_given_trnmt[,"POSTSEASON"]))
mmod3.error <- numeric(dim(mmod3.table)[1])
for(i in 1:dim(mmod3.table)[1]){
  mmod3.error[i] = round(((1-(mmod3.table[i,i])/(sum(mmod3.table[,i])))*100), 4)
}
mmod3.error.table <- data.frame(names(mmod3.table[,1]), mmod3.error)
colnames(mmod3.error.table) <- c("Round", "% Error")
# Test Error
mmod3.pred.test <- predict(mmod3, test_given_trnmt_19)
mmod3.table.test <- table(mmod3.pred.test, as_vector(test_given_trnmt_19[,"POSTSEASON"]))
mmod3.error.test <- numeric(dim(mmod3.table.test)[1])
for(i in 1:dim(mmod3.table.test)[1]){
  mmod3.error.test[i] = round(((1-(mmod3.table.test[i,i])/(sum(mmod3.table.test[,i])))*100), 4)
}
mmod3.error.test.table <- data.frame(names(mmod3.table.test[,1]), mmod3.error.test)
colnames(mmod3.error.test.table) <- c("Round", "% Error")
Appendix 6.2. Error Tables (%)
knitr::kable(mmod3.error.table)
Round | % Error |
---|---|
2ND | 0.0000 |
Champions | 0.0000 |
E8 | 75.0000 |
F4 | 75.0000 |
R32 | 53.1250 |
R64 | 13.2812 |
R68 | 81.2500 |
S16 | 75.0000 |
knitr::kable(mmod3.error.test.table)
Round | % Error |
---|---|
2ND | 100.0 |
Champions | 0.0 |
E8 | 75.0 |
F4 | 100.0 |
R32 | 50.0 |
R64 | 25.0 |
R68 | 50.0 |
S16 | 62.5 |
Appendix 7. Model 4: Classification Tree - All possibilities
# Bagging
bag.cbb <- randomForest(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
THREE_P_O + THREE_P_D + WAB + CONF, train_data, mtry=18, importance=T)
bag.cbb
##
## Call:
## randomForest(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D + WAB + CONF, data = train_data, mtry = 18, importance = T)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 18
##
## OOB estimate of error rate: 15.95%
## Confusion matrix:
## No Tournament R68 R64 R32 S16 E8 F4 2ND Champions class.error
## No Tournament 1111 0 18 2 1 0 0 0 0 0.01855124
## R68 13 0 3 0 0 0 0 0 0 1.00000000
## R64 68 0 44 12 4 0 0 0 0 0.65625000
## R32 13 0 20 18 9 3 0 0 1 0.71875000
## S16 2 0 7 15 5 3 0 0 0 0.84375000
## E8 0 0 4 6 3 2 1 0 0 0.87500000
## F4 1 0 1 4 2 0 0 0 0 1.00000000
## 2ND 0 0 0 1 0 2 0 0 1 1.00000000
## Champions 0 0 0 2 0 2 0 0 0 1.00000000
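In the OOB confusion matrix above, `class.error` is the row-wise misclassification rate (randomForest's confusion matrix puts the true classes in rows). A sketch reproducing the first row's value:

```python
# class.error for the "No Tournament" row of the OOB confusion matrix above:
# 1111 of the 1132 true No-Tournament teams were classified correctly out of bag.
row = [1111, 0, 18, 2, 1, 0, 0, 0, 0]   # true class: No Tournament
class_error = 1 - row[0] / sum(row)      # diagonal entry over row total
print(round(class_error, 8))  # 0.01855124
```

This row-wise convention matches the column-wise per-class error used elsewhere in the report because both condition on the true class; only the matrix orientation differs.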
importance(bag.cbb)
## No Tournament R68 R64 R32 S16
## ADJOE 14.693006 0.4440248 -3.1185169 -1.4620368 2.4783650
## ADJDE 7.747591 -0.1394167 -2.4419360 5.3325330 -2.7066895
## BARTHAG 21.836832 -0.6047249 -14.1426766 3.2154108 14.0523913
## EFG_O 7.189715 0.3780650 3.6222121 -4.1506084 0.1094855
## EFG_D 11.638146 -1.3074498 -1.5565995 2.9389837 -4.4268763
## TOR 14.608630 3.5769435 -5.9979645 9.7918820 -3.9948242
## TORD 4.321935 2.8410128 -1.7760157 1.9439753 3.9888383
## ORB 7.978463 0.8527321 1.2988782 -2.0801451 -3.1395203
## DRB 6.332592 2.3298065 1.1971745 -0.8983184 -3.2804665
## FTR 9.156979 1.6296585 2.9338975 -0.3980024 -2.2590968
## FTRD 13.738119 1.2843276 -3.1357350 -1.5092088 2.5294373
## ADJ_T 1.007006 0.3207851 -2.5078066 0.0622195 0.7259926
## TWO_P_O 9.962770 1.5545570 2.7418340 -2.6405845 1.9936402
## TWO_P_D 7.825292 -0.1682910 -0.1003262 3.4716651 -4.1357440
## THREE_P_O 5.656172 2.4258361 2.5236871 -0.1341332 -3.5759872
## THREE_P_D 13.429163 2.8294239 -2.5206609 5.0194624 -4.1732219
## WAB 87.003975 3.1484094 26.0803033 42.6016440 28.7116953
## CONF 18.489315 1.2886483 4.9451721 1.2614006 -1.1212501
## E8 F4 2ND Champions MeanDecreaseAccuracy
## ADJOE -1.31320940 0.7672427 -0.2242418 5.507257e+00 14.2589773
## ADJDE 2.86573141 1.3745786 0.4546394 3.966439e+00 7.5037078
## BARTHAG 10.46715959 6.0190629 4.2116839 1.113415e+01 23.3994186
## EFG_O 2.29694603 -1.4803312 0.7596947 1.081386e+00 7.2219512
## EFG_D 0.41291553 0.1857017 1.4170505 7.577059e-01 11.4377413
## TOR -0.02988336 -1.6012229 3.2749397 2.807296e+00 13.7075678
## TORD -1.12474365 -0.5275564 -0.8256994 5.001250e-01 4.2655501
## ORB 1.36027030 0.7450443 -1.7858408 9.184367e-01 6.3490918
## DRB -0.64208963 2.5339079 0.8256994 3.991070e-01 5.4512906
## FTR 3.70598841 -0.8685080 -0.6327087 -3.916044e+00 8.7928130
## FTRD -3.15168876 1.6971147 -0.7279924 1.725445e+00 11.3600602
## ADJ_T -2.24144536 -0.3509110 -1.6373653 5.939417e-01 -0.3785405
## TWO_P_O 6.02674675 -1.0883789 2.0998026 2.707652e+00 10.4583652
## TWO_P_D -0.64129740 -1.7372705 1.5465017 -2.617756e+00 7.4470449
## THREE_P_O 0.10088071 1.6847943 -1.8445339 7.802447e-17 5.2730020
## THREE_P_D -1.52158697 1.2034063 -1.6768545 -5.064996e-01 11.4262795
## WAB 18.24089413 5.6009920 4.5360391 1.052109e+01 101.4158902
## CONF 1.12956856 0.8643815 -0.4459112 2.452705e-01 17.7170519
## MeanDecreaseGini
## ADJOE 13.473373
## ADJDE 10.179459
## BARTHAG 33.620990
## EFG_O 9.845677
## EFG_D 9.572318
## TOR 16.147928
## TORD 15.396427
## ORB 17.143303
## DRB 14.959902
## FTR 18.192945
## FTRD 17.117198
## ADJ_T 14.047700
## TWO_P_O 12.994126
## TWO_P_D 10.510833
## THREE_P_O 14.549332
## THREE_P_D 15.947099
## WAB 184.727023
## CONF 46.207794
# Random Forest
rf.cbb <- randomForest(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
THREE_P_O + THREE_P_D + WAB + CONF, train_data, importance=T)
rf.cbb
##
## Call:
## randomForest(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D + WAB + CONF, data = train_data, importance = T)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 16.03%
## Confusion matrix:
## No Tournament R68 R64 R32 S16 E8 F4 2ND Champions class.error
## No Tournament 1114 0 15 2 1 0 0 0 0 0.01590106
## R68 12 0 4 0 0 0 0 0 0 1.00000000
## R64 74 0 39 12 3 0 0 0 0 0.69531250
## R32 13 0 17 21 10 2 0 0 1 0.67187500
## S16 2 0 8 16 4 2 0 0 0 0.87500000
## E8 0 0 4 7 4 1 0 0 0 0.93750000
## F4 1 0 1 5 1 0 0 0 0 1.00000000
## 2ND 0 0 0 2 0 2 0 0 0 1.00000000
## Champions 0 0 0 3 0 1 0 0 0 1.00000000
importance(rf.cbb)
## No Tournament R68 R64 R32 S16 E8
## ADJOE 19.376366 1.4810319 -3.9703618 4.5258519 9.4644151 5.1730254
## ADJDE 18.412810 -3.1338359 -3.2681625 7.5141304 2.7649426 3.5695577
## BARTHAG 22.291706 -1.5690585 -5.7478830 8.1327102 16.6435303 10.3177051
## EFG_O 14.972747 -0.7469644 1.4069655 -1.7308101 2.2936453 2.1276984
## EFG_D 15.717949 -3.1313464 -2.5467335 4.8399441 -3.8100993 1.4704558
## TOR 9.743195 0.2038443 -4.2960570 8.9859818 -1.3329785 1.0242003
## TORD 6.614555 -0.7634266 -0.7423748 1.5962272 3.6991474 1.8105454
## ORB 9.282291 0.3818331 -1.8835782 -1.3932224 -1.0148649 1.8802359
## DRB 6.039617 0.7067129 2.2308905 -3.5572001 -0.6820759 0.5422508
## FTR 4.448567 2.8281370 2.7113231 0.1697197 -1.4589494 2.5916891
## FTRD 10.138136 -2.0914389 -0.4530209 -2.4241170 2.3954424 -0.7741202
## ADJ_T 1.166151 1.3584661 -0.5803824 0.6512515 -0.1853880 -1.3922390
## TWO_P_O 11.207764 -0.6061895 1.8266307 -2.7369705 2.4911775 5.9226190
## TWO_P_D 13.100090 -1.7363319 -5.4243638 1.1657276 -1.2384335 0.2844982
## THREE_P_O 11.660054 0.7389345 3.1101352 1.3531775 -1.0800993 1.6703342
## THREE_P_D 11.958875 0.8308385 -1.5940261 6.8665252 -3.0526045 0.5090595
## WAB 35.937206 0.3231533 20.3089868 16.7806926 16.8252984 12.1797288
## CONF 10.545908 -0.5163549 3.0979494 1.0373125 1.4133624 3.3157393
## F4 2ND Champions MeanDecreaseAccuracy
## ADJOE -0.47451002 0.2390594 6.3848645 20.0777614
## ADJDE 0.42665499 2.5921489 4.7048375 18.8956049
## BARTHAG 1.63609960 4.1007396 7.8566422 22.9286041
## EFG_O -0.97242524 0.6110753 3.3169263 15.3048099
## EFG_D -0.63270867 1.0386316 1.2937035 15.5460174
## TOR -2.88910819 2.2923020 3.3749150 10.0343684
## TORD -0.06694858 -1.5465017 1.5364267 6.6756427
## ORB -1.19618889 -1.3462975 -2.2443539 6.8959072
## DRB -0.65724770 0.8277367 0.0848195 5.1103273
## FTR -0.65400526 -0.7019921 -1.6901371 4.9100954
## FTRD 0.76706912 1.1562432 1.7230039 9.0296243
## ADJ_T -0.51019412 0.2773714 -0.2582161 0.6882219
## TWO_P_O -2.68725303 1.9834152 2.1090080 11.1858823
## TWO_P_D -1.20316125 2.3702273 -0.7596947 12.0377247
## THREE_P_O 0.70655905 -3.0376946 -0.2349010 11.5137192
## THREE_P_D 1.33390990 -1.7901449 0.2931304 11.7515077
## WAB -0.69987003 2.8375186 6.7405407 39.8475749
## CONF 0.14318512 0.2425499 1.8197293 10.8591068
## MeanDecreaseGini
## ADJOE 38.21148
## ADJDE 31.92614
## BARTHAG 62.23914
## EFG_O 18.63888
## EFG_D 19.87889
## TOR 19.56675
## TORD 16.63633
## ORB 16.33869
## DRB 14.67000
## FTR 18.06851
## FTRD 16.61739
## ADJ_T 16.25925
## TWO_P_O 17.94546
## TWO_P_D 15.28942
## THREE_P_O 17.06756
## THREE_P_D 16.25383
## WAB 85.44423
## CONF 34.04471
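The only difference between the two fits above is `mtry`: bagging forces all 18 predictors to be split candidates at every node, while `randomForest`'s classification default is the floor of the square root of the number of predictors, which is where the "4" in the second call comes from. A quick check:

```python
import math

p = 18  # number of predictors in the model formulas above
print(math.floor(math.sqrt(p)))  # 4, randomForest's default mtry for classification
```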
Appendix 7.1. Prediction
# Testing Error - Bagging
bag.pred_test <- predict(bag.cbb, test_data_19, type = "class")
bag.table <- table(bag.pred_test, as_vector(test_data_19[,"POSTSEASON"]))
bag.error <- numeric(dim(bag.table)[1])
for(i in 1:dim(bag.table)[1]){
  bag.error[i] = round(((1-(bag.table[i,i])/(sum(bag.table[,i])))*100), 4)
}
bag.error.table <- data.frame(names(bag.table[,1]), bag.error)
colnames(bag.error.table) <- c("Round", "% Error")
# Testing Error - Random Forest
rf.pred_test <- predict(rf.cbb, test_data_19, type = "class")
rf.table <- table(rf.pred_test, as_vector(test_data_19[,"POSTSEASON"]))
rf.error <- numeric(dim(rf.table)[1])
for(i in 1:dim(rf.table)[1]){
  rf.error[i] = round(((1-(rf.table[i,i])/(sum(rf.table[,i])))*100), 4)
}
rf.error.table <- data.frame(names(rf.table[,1]), rf.error)
colnames(rf.error.table) <- c("Round", "% Error")
Appendix 7.2. Error Tables (%)
knitr::kable(bag.error.table)
Round | % Error |
---|---|
No Tournament | 1.4035 |
R68 | 100.0000 |
R64 | 65.6250 |
R32 | 81.2500 |
S16 | 87.5000 |
E8 | 50.0000 |
F4 | 100.0000 |
2ND | 100.0000 |
Champions | 0.0000 |
knitr::kable(rf.error.table)
Round | % Error |
---|---|
No Tournament | 1.0526 |
R68 | 100.0000 |
R64 | 75.0000 |
R32 | 81.2500 |
S16 | 100.0000 |
E8 | 50.0000 |
F4 | 100.0000 |
2ND | 100.0000 |
Champions | 100.0000 |
Appendix 8. Model 5: Classification Tree - Round Selection Given Already in Tournament
# Bagging
bag.cbb_trmnt <- randomForest(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
THREE_P_O + THREE_P_D + WAB + CONF, train_given_trnmt, mtry=18, importance=T)
bag.cbb_trmnt
##
## Call:
## randomForest(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D + WAB + CONF, data = train_given_trnmt, mtry = 18, importance = T)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 18
##
## OOB estimate of error rate: 47.79%
## Confusion matrix:
## 2ND Champions E8 F4 R32 R64 R68 S16 class.error
## 2ND 0 1 2 0 1 0 0 0 1.000000
## Champions 0 0 1 0 3 0 0 0 1.000000
## E8 0 0 2 1 5 4 0 4 0.875000
## F4 0 0 1 0 4 1 0 2 1.000000
## R32 0 2 2 0 25 28 0 7 0.609375
## R64 0 0 0 0 12 110 1 5 0.140625
## R68 0 0 0 0 1 15 0 0 1.000000
## S16 0 0 3 0 15 9 0 5 0.843750
importance(bag.cbb_trmnt)
## 2ND Champions E8 F4 R32 R64
## ADJOE -1.76154909 3.8962000 0.6218487 -1.0865020 -1.8490426 5.9537284
## ADJDE -0.61107528 2.7903499 0.4823857 -1.9792418 2.4257830 6.8499007
## BARTHAG 2.62098713 10.5164978 9.5398510 1.8129054 -0.8812557 34.3454209
## EFG_O -0.37801848 -0.2144324 1.3066682 -1.4791500 -0.2706611 2.7474807
## EFG_D -1.00100150 0.0000000 -1.0369149 -0.8546145 -0.7236456 2.1784028
## TOR 0.77506185 2.4282316 0.4901042 -0.1527105 4.8421345 4.7848683
## TORD -2.19329605 0.2294278 -0.9493571 -0.2534856 1.7963168 2.5938395
## ORB -0.81704146 -0.2760473 1.2761118 1.1622040 -0.9982349 -0.8652428
## DRB 1.73727046 -1.6713157 0.1401045 1.4699235 -1.8634389 -1.3863785
## FTR 0.35607964 -1.0010015 1.4384137 1.3518013 1.7406022 0.5130969
## FTRD 1.57532347 -0.3599027 1.0825839 -0.1541351 0.7430720 3.3912628
## ADJ_T -2.53979942 -1.0240638 -0.5482276 -1.1482325 -2.3326360 -1.1190747
## TWO_P_O 0.00000000 3.7445511 2.7560859 -2.2570553 -2.5171149 2.9150121
## TWO_P_D -1.00100150 -2.0080483 1.9808181 -2.1556531 0.2129333 2.4730054
## THREE_P_O -1.63736531 -2.0828173 -0.4326900 0.1235623 2.0929621 3.9321661
## THREE_P_D -0.51846036 0.9211386 0.1636098 1.6493015 5.8643234 6.4408576
## WAB -0.09901573 3.6533577 6.5717247 -0.4581846 -6.2063789 11.7102947
## CONF 0.65493442 -0.2731359 1.8779570 -0.2241620 1.8893849 1.6723992
## R68 S16 MeanDecreaseAccuracy MeanDecreaseGini
## ADJOE -0.03512634 -2.8437507 3.71974753 6.055142
## ADJDE 3.78575477 -2.3906729 6.59445674 8.071508
## BARTHAG 5.17445644 9.4018992 32.01883721 39.648707
## EFG_O -1.24454845 -0.5641005 1.62062571 5.712309
## EFG_D -2.29250102 -1.3904508 0.04543124 5.126351
## TOR 0.77579571 -5.3987485 4.10235212 9.666950
## TORD -3.01446696 2.6874380 2.60708401 9.658453
## ORB -0.65407709 -3.1681778 -2.38353726 8.804123
## DRB 2.75195067 -1.6945776 -1.41904265 6.565659
## FTR -0.60650086 -1.6307059 0.83076517 7.944541
## FTRD 0.71652811 -0.5686724 2.71390069 9.460938
## ADJ_T 0.74415242 -1.5720448 -2.79797646 6.759103
## TWO_P_O -0.31144054 1.0295151 1.83511565 7.505399
## TWO_P_D -1.76817451 -1.9906925 0.77493038 5.342501
## THREE_P_O 0.28480341 -0.2274760 3.38052282 8.348564
## THREE_P_D 3.61867620 -2.8464933 7.74081983 11.918373
## WAB 5.12101132 0.6466023 10.91408080 12.469167
## CONF -0.04941686 -1.5103443 2.42194686 21.304169
Appendix 8.1. Prediction
# Testing Error - Bagging
bag1.pred_test <- predict(bag.cbb_trmnt, test_given_trnmt_19, type = "class")
bag1.table <- table(bag1.pred_test, as_vector(test_given_trnmt_19[,"POSTSEASON"]))
bag1.error <- numeric(dim(bag1.table)[1])
for(i in 1:dim(bag1.table)[1]){
  bag1.error[i] = round(((1-(bag1.table[i,i])/(sum(bag1.table[,i])))*100), 4)
}
bag1.error.table <- data.frame(names(bag1.table[,1]), bag1.error)
colnames(bag1.error.table) <- c("Round", "% Error")
Appendix 8.2. Error Table (%)
knitr::kable(bag1.error.table)
Round | % Error |
---|---|
2ND | 100.00 |
Champions | 0.00 |
E8 | 50.00 |
F4 | 100.00 |
R32 | 81.25 |
R64 | 18.75 |
R68 | 75.00 |
S16 | 87.50 |
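The per-class error loop used above can be factored into a small reusable helper. A minimal sketch (the function name `class_error` and the toy confusion table are illustrative, not part of the analysis):

```r
# Per-class error (%) from a confusion table where rows are predicted
# classes and columns are actual classes: 1 - diag / column totals.
class_error <- function(tab) {
  round((1 - diag(tab) / colSums(tab)) * 100, 4)
}

# Toy example: two classes with 10 actual observations each
tab <- as.table(matrix(c(8, 2, 3, 7), nrow = 2,
                       dimnames = list(pred = c("A", "B"),
                                       actual = c("A", "B"))))
class_error(tab)  # A: 20, B: 30
```

Applied to `bag1.table`, this reproduces the `bag1.error` vector in one call.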
Appendix 9. Full Error Table (%)
# Model 1: Binomial
# Training error classification rate
binom.table.error <- xtabs( ~ train_data$TRNMT + predout)
binom.error = round((1-(binom.table.error[1,1]+binom.table.error[2,2])/(sum(binom.table.error)))*100, 4)
binom.error.full <- c(binom.error, rep(NA, 8))
# Testing Error classification rate
binom.table.error.test <- xtabs( ~ test_data_19$TRNMT + predout_test)
binom.error.test = round((1-(binom.table.error.test[1,1]+binom.table.error.test[2,2])/(sum(binom.table.error.test)))*100, 4)
binom.error.test.full <- c(binom.error.test, rep(NA, 8))
# Model 2: Multi
# Model 3: Multi
mmod3.error.full <- c(NA, mmod3.error)
mmod3.error.test.full <- c(NA, mmod3.error.test)
# Model 4: RF
bag.error.train <- round(bag.cbb$confusion[,"class.error"]*100, 4)
rf.error.train <- round(rf.cbb$confusion[,"class.error"]*100, 4)
# Model 5: RF
bag1.error.train <- round(bag.cbb_trmnt$confusion[,"class.error"]*100, 4)
bag1.error.train.full <- c(NA, bag1.error.train)
bag1.error.full <- c(NA, bag1.error)
# Final Table
mini.error.table <- data.frame(mmod3.error.full, mmod3.error.test.full, bag1.error.train.full, bag1.error.full)
mini.error.table <- mini.error.table[c(1, 8, 7, 6, 9, 4, 5, 2, 3),] # correcting order to match other data
full.error.table1 <- data.frame(binom.error.full, binom.error.test.full,
mmod1.error, mmod1.error.test, bag.error.train,
bag.error)
full.error.table2 <- data.frame(rf.error.train, rf.error, mini.error.table)
names(full.error.table1) <- c("Binom. Train", "Binom. Test", "Multi. Train", "Multi. Test",
"Bag Train", "Bag Test")
names(full.error.table2) <- c("RF Train", "RF Test", "Sp. Multi. Train",
"Sp. Multi. Test", "Sp. Bag Train", "Sp. Bag Test")
knitr::kable(full.error.table1)
Outcome | Binom. Train | Binom. Test | Multi. Train | Multi. Test | Bag Train | Bag Test |
---|---|---|---|---|---|---|
No Tournament | 12.5356 | 12.7479 | 1.5901 | 1.7544 | 1.8551 | 1.4035 |
R68 | NA | NA | 100.0000 | 100.0000 | 100.0000 | 100.0000 |
R64 | NA | NA | 65.6250 | 75.0000 | 65.6250 | 65.6250 |
R32 | NA | NA | 54.6875 | 75.0000 | 71.8750 | 81.2500 |
S16 | NA | NA | 78.1250 | 75.0000 | 84.3750 | 87.5000 |
E8 | NA | NA | 68.7500 | 75.0000 | 87.5000 | 50.0000 |
F4 | NA | NA | 100.0000 | 100.0000 | 100.0000 | 100.0000 |
2ND | NA | NA | 0.0000 | 100.0000 | 100.0000 | 100.0000 |
Champions | NA | NA | 0.0000 | 100.0000 | 100.0000 | 0.0000 |
knitr::kable(full.error.table2)
Outcome | RF Train | RF Test | Sp. Multi. Train | Sp. Multi. Test | Sp. Bag Train | Sp. Bag Test |
---|---|---|---|---|---|---|
No Tournament | 1.5901 | 1.0526 | NA | NA | NA | NA |
R68 | 100.0000 | 100.0000 | 81.2500 | 50.0 | 100.0000 | 75.00 |
R64 | 69.5312 | 75.0000 | 13.2812 | 25.0 | 14.0625 | 18.75 |
R32 | 67.1875 | 81.2500 | 53.1250 | 50.0 | 60.9375 | 81.25 |
S16 | 87.5000 | 100.0000 | 75.0000 | 62.5 | 84.3750 | 87.50 |
E8 | 93.7500 | 50.0000 | 75.0000 | 75.0 | 87.5000 | 50.00 |
F4 | 100.0000 | 100.0000 | 75.0000 | 100.0 | 100.0000 | 100.00 |
2ND | 100.0000 | 100.0000 | 0.0000 | 100.0 | 100.0000 | 100.00 |
Champions | 100.0000 | 100.0000 | 0.0000 | 0.0 | 100.0000 | 0.00 |
Appendix 10. 2020 March Madness Predictions
# Predicted 2020 finishes from the bagging and multinomial models
bag.pred_test_20 <- predict(bag.cbb, test_data_20, type = "class")
mmod.pred_test_20 <- predict(mmod1, test_data_20, type = "class")
final_20 <- data.frame(test_data_20, bag.pred_test_20, mmod.pred_test_20)
summary(final_20[,c("bag.pred_test_20", "mmod.pred_test_20")])
## bag.pred_test_20 mmod.pred_test_20
## No Tournament:304 No Tournament:309
## R64 : 26 R32 : 26
## R32 : 20 R64 : 15
## E8 : 2 E8 : 2
## S16 : 1 2ND : 1
## R68 : 0 R68 : 0
## (Other) : 0 (Other) : 0
Appendix 10.1. Late Round Predictions
final_20[which(final_20$bag.pred_test_20=="E8"), c("TEAM", "bag.pred_test_20")]
## TEAM bag.pred_test_20
## 1 Kansas E8
## 3 Gonzaga E8
final_20[which(final_20$mmod.pred_test_20=="E8"), c("TEAM", "mmod.pred_test_20")]
## TEAM mmod.pred_test_20
## 1 Kansas E8
## 3 Gonzaga E8
final_20[which(final_20$mmod.pred_test_20=="2ND"), c("TEAM", "mmod.pred_test_20")]
## TEAM mmod.pred_test_20
## 4 Dayton 2ND
Appendix 10.2. Big Ten Predictions
final_20[which(final_20$CONF=="B10"), c("TEAM", "bag.pred_test_20", "mmod.pred_test_20")]
## TEAM bag.pred_test_20 mmod.pred_test_20
## 5 Michigan St. S16 R32
## 8 Ohio St. R32 R64
## 14 Michigan R32 R32
## 15 Penn St. R32 R32
## 19 Wisconsin R32 R32
## 23 Purdue No Tournament No Tournament
## 26 Maryland R32 R32
## 27 Minnesota No Tournament No Tournament
## 29 Illinois R32 R32
## 30 Rutgers R32 R32
## 31 Iowa R64 R64
## 36 Indiana R64 R64
## 116 Northwestern No Tournament No Tournament
## 159 Nebraska No Tournament No Tournament
options(op)