Executive Summary

Sadly, the 2020 NCAA Men’s Basketball Tournament could not be held due to the coronavirus pandemic. I found data covering recent college basketball seasons and thought it would be interesting to see whether I could reconstruct what might have been. Based on the variables in this dataset (described in Appendices 1 and 2.1), is it possible to predict which teams will make the tournament using machine learning techniques? And given that information, can we estimate the round each team will reach? These questions can be addressed with a variety of predictive models.

First, we examine the data (Appendix 2.1). Several variables are potentially highly correlated. The variance inflation factors in Appendix 3.1 flag effective field goal percentage, both for shots taken and shots allowed, as extremely collinear; each is nearly a direct calculation from other variables in the dataset. The power rating also has a high inflation factor, as it is essentially a summary of a team built from several statistics already present. The last two factors of potential concern are offensive and defensive efficiency. Model selection should handle this multicollinearity, and we can check the final models with diagnostic plots. The last step is to view the correlations between the potential responses and the regressors with a correlation matrix. My main variable of interest is postseason wins, and several variables correlate strongly with it (Appendix 3.2). Plotting some of the most highly correlated variables, the scatterplot matrix in Appendix 3.3 shows the relationships between postseason wins and the regressors; the multicollinearity found earlier is clearly visible.

Before building models, we check whether the regressors need transformations. The Box-Cox method provides significant evidence for transformations (Appendix 3.4): testing the case of no transformations gives a p-value below 0.0001, but even the recommended transformations are rejected with a p-value below 0.0001. Another factor is likely at play, such as the multicollinearity noted above, so we proceed with caution; model selection and diagnostic checks will give a better picture of what is going on. Moreover, the recommended transformations would hurt the interpretability of the results quite substantially. Since we want models that perform classification, we cannot test a transformation of the response.
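To make the Box-Cox likelihood-ratio test concrete, here is a minimal base-R sketch on simulated data (the data and grid here are illustrative, not taken from the analysis): profile the log-likelihood over the power λ, then compare its value at the MLE with its value at λ = 1 to test "no transformation needed."

```r
# boxcox_loglik(): profile log-likelihood of the Box-Cox power transform,
# i.e. a normal fit to the transformed data plus the Jacobian term
boxcox_loglik <- function(y, lambda) {
  n  <- length(y)
  yt <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  -n / 2 * log(sum((yt - mean(yt))^2) / n) + (lambda - 1) * sum(log(y))
}

set.seed(1)
y <- rlnorm(500)                 # right-skewed, so a log transform is "correct"
grid    <- seq(-1, 2, by = 0.01)
ll      <- sapply(grid, boxcox_loglik, y = y)
lam_hat <- grid[which.max(ll)]   # MLE of the power; near 0 for lognormal data
# LR test of "no transformation needed" (lambda = 1), as in Appendix 3.4
lrt  <- 2 * (boxcox_loglik(y, lam_hat) - boxcox_loglik(y, 1))
pval <- pchisq(lrt, df = 1, lower.tail = FALSE)
```

This is the univariate version of the idea; `car::powerTransform` in Appendix 3.4 does the multivariate analogue for all regressors at once.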

First, we build a model using all potentially useful variables (Appendix 4) and check for outliers before model selection. The half-normal plot in Appendix 4.1 flags observation 1329 as an outlier: this team made the tournament despite a terrible record, among other poor statistics. Since removing the team is unjustified, it stays in the analysis. The summary of the first model shows several insignificant variables, so we iteratively remove the variable with the highest p-value until the ANOVA test provides evidence for retaining the larger model. The method is not exact, but it gives a good idea of which variables and models matter most. The variables kept include power rating, turnover rate, and wins above the bubble; even the random effects were not significant in the final model (Appendix 4.2). Diagnostic checks were then repeated (Appendix 4.3): observation 1329 is still an outlier, but the jump in the trend is relatively minor, and the VIFs, while still somewhat high, are much lower than before. This model does not drastically break any assumptions. Finally, we test predictive ability. From the ROC curve (Appendix 4.4), the threshold that best balances sensitivity and specificity appears to be 0.22, giving a training error rate of 0.125 and a testing error of 0.127 on 2019. Now we can estimate the round at which a team will go out.
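The threshold selection step can be sketched as follows. This is a self-contained illustration on simulated probabilities, not the actual fitted model: it picks the cutoff maximizing Youden's J (sensitivity + specificity − 1), one common way to balance the two rates on an ROC curve.

```r
set.seed(42)
# Simulated stand-ins for the model's fitted probabilities (p) and the true
# tournament indicator (y); roughly 16% of teams make the field, as in the data
y <- rbinom(1000, 1, 0.16)
p <- plogis(qlogis(0.16) + 2 * y + rnorm(1000))

thresholds <- seq(0.01, 0.99, by = 0.01)
youden <- sapply(thresholds, function(t) {
  sens <- mean(p[y == 1] >= t)   # sensitivity: true positive rate
  spec <- mean(p[y == 0] <  t)   # specificity: true negative rate
  sens + spec - 1
})
best <- thresholds[which.max(youden)]   # cutoff balancing the two rates
err  <- mean((p >= best) != y)          # misclassification at that cutoff
```

With real fitted values, packages such as pROC automate this sweep, but the mechanics are exactly the loop above.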

Next, I fit a multinomial model that tries to predict all rounds at once (Appendix 5). Model selection was straightforward and chose similar variables to the previous model, except that two-point shooting percentage now enters the final model. This model also allows easy interpretation of variables: for example, power rating is still a big indicator of tournament success, but as a team progresses, other variables become more important. The training error shows this model predicts tournament entry better than the previous one and even does a fair job of predicting the round a team reaches. Another nice feature is the ability to view the important factors associated with each round of the tournament. The error rate is around 75%, but each team has nine possible outcomes (Appendix 5.2). The model predicted the champion and runner-up correctly in all four training years (Appendix 5.1), which may indicate it is overfit to the training set. It performed much better than the binomial model at predicting tournament entry, but it may be useful to also train a model on teams already known to have made the tournament and see how far they are predicted to go.
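A minimal sketch of the multinomial setup, using `nnet::multinom` (nnet ships with R) on simulated data; the three toy classes stand in for the nine possible tournament outcomes, and all names here are illustrative rather than taken from the fitted model.

```r
library(nnet)   # multinom(): multinomial log-linear model, ships with R
set.seed(7)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Toy 3-class outcome driven mostly by x1, standing in for tournament rounds
d$round <- factor(ifelse(2 * d$x1 + rnorm(n) > 1, "S16",
                  ifelse(2 * d$x1 + rnorm(n) > 0, "R32", "None")))

fit  <- multinom(round ~ x1 + x2, data = d, trace = FALSE)
pred <- predict(fit, d)                # most likely class per observation
train_err <- mean(pred != d$round)     # overall misclassification rate
```

The fitted coefficients give one logit per non-baseline class, which is what makes per-round interpretation of predictors possible.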

Fitting a model conditional on a team having already made the tournament (Appendix 6), the summary of the most significant model shows that power rating, the biggest determining factor in both previous models, is no longer included. The highest-magnitude predictor is now turnover rate, which is somewhat surprising since it had little effect in the previous models and is not typically viewed as the statistic that wins games. The training error is still high (Appendix 6.2) but drastically reduced from the previous approach, and the test-set error reiterates this point; the model was even able to predict the correct champion in the test set. The next step is to try this line of thinking with a random forest.

This research question seems well suited to a classification tree. We lose the quantitative inference about specific regressors that the previous models gave, but the main point of this analysis is prediction. We compare a bagging approach with a random forest (Appendix 7). Bagging produced lower out-of-bag error as well as lower per-round testing error, and it even correctly predicted the champion. The variable-importance measures show both models put strong emphasis on wins above the bubble, the first models to do so. We continue with the bagging model and, as before, try a version trained only on teams already in the tournament.
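Bagging and a random forest differ only in how splits are chosen: bagging lets every predictor compete at each split, while a random forest restricts each split to a random subset (the mtry parameter). A hand-rolled bagging sketch with rpart trees (rpart ships with R) on simulated data shows the bootstrap-and-vote mechanics; all variable names are illustrative.

```r
library(rpart)
set.seed(3)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$cls <- factor(ifelse(d$x1 + 0.5 * d$x2 + rnorm(n, sd = 0.5) > 0,
                       "deep", "early"))

B <- 25
votes <- replicate(B, {
  idx  <- sample(n, replace = TRUE)                        # bootstrap resample
  tree <- rpart(cls ~ ., data = d[idx, ], method = "class")
  as.character(predict(tree, d, type = "class"))           # each tree votes
})
# majority vote across the B trees is the bagged prediction
bagged  <- apply(votes, 1, function(v) names(which.max(table(v))))
bag_err <- mean(bagged != as.character(d$cls))
```

The randomForest package used in Appendix 7 wraps this loop; setting mtry equal to the number of predictors recovers bagging, while smaller mtry decorrelates the trees.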

We fit a model in the same manner as the second multinomial model. Its testing error is worse than the multinomial model’s (Appendix 8), and the importance of wins above the bubble dropped as power rating became important once more. It will therefore not be necessary to combine separate models for making the tournament and for round performance in the final prediction of the 2020 season: the multinomial model that considered all teams at once performed best overall, which is convenient since it also leaves the regressors interpretable. The table in Appendix 9 compares error across the models created.

Each model has its own strengths. The binomial model was by far the least useful in terms of error; every other model predicted tournament entry much better. The bagging model appears to perform best on test data and was also excellent at predicting tournament entry, with the multinomial model a close second, and test error carries the most weight when prediction is the goal. We can try predicting the 2020 NCAA tournament with both models as a comparison.

The final prediction of the 2020 tournament, which never happened due to the coronavirus, is given in Appendix 10. We put more weight on the bagging model’s predictions given its performance in the error table (Appendix 9). The summary of the predictions (Appendix 10) shows the bagging model predicted 303 teams missing the tournament and 2 teams advancing past the round of 32, while the multinomial model predicted 309 teams missing the tournament and 3 teams advancing past the round of 32. The bagging model placed Kansas and Gonzaga in the Elite Eight; the multinomial model also had Kansas and Gonzaga there, with Dayton coming out of nowhere to finish as the tournament runner-up, whereas the bagging model had Dayton going out in the round of 32. Last but not least, Appendix 10.2 compares Big Ten teams and the round in which each was predicted to lose: Indiana was predicted to go out in the first round by both models, while our rival Purdue did not even make the tournament in either model.

Given the error seen in the training and testing sets, we cannot put much weight on these models’ predictions of deep runs, but the teams projected to make the tournament are far more reliable. Every year brings drastic variability in the form of “bracket busting” teams, and it is difficult to identify winners without being able to compare specific matchups. This analysis instead gives a good measure of minimum expected performance based on overall team statistics alone. The main takeaway is the set of impactful predictors, such as power rating and wins above the bubble, that an above-average team needs in order to make it deep into the tournament. The final predictive ability is less than desired, but this has been an interesting look, through machine learning, at what could have been.

Methods and Results

Appendix 1. Variable Definitions

Team Information

YEAR: Season

TEAM: The Division I college basketball school

CONF: The Athletic Conference in which the school participates in

      A10 = Atlantic 10

      ACC = Atlantic Coast Conference

      AE = America East

      Amer = American

      ASun = ASUN

      B10 = Big Ten

      B12 = Big 12

      BE = Big East

      BSky = Big Sky

      BSth = Big South

      BW = Big West

      CAA = Colonial Athletic Association

      CUSA = Conference USA

      Horz = Horizon League

      IND = Independent schools

      Ivy = Ivy League

      MAAC = Metro Atlantic Athletic Conference

      MAC = Mid-American Conference

      MEAC = Mid-Eastern Athletic Conference

      MVC = Missouri Valley Conference

      MWC = Mountain West

      NEC = Northeast Conference

      OVC = Ohio Valley Conference

      P12 = Pac-12

      Pat = Patriot League

      SB = Sun Belt

      SC = Southern Conference

      SEC = Southeastern Conference

      Slnd = Southland Conference

      Sum = Summit League

      SWAC = Southwestern Athletic Conference

      WAC = Western Athletic Conference

      WCC = West Coast Conference

Tournament Information

SEED: Seed in the NCAA March Madness Tournament

TRNMT: Made tournament, yes or no

PS_WINS: Post season wins in NCAA tournament

POSTSEASON: Round where the given team was eliminated or where their season ended

      R68 = First Four

      R64 = Round of 64

      R32 = Round of 32

      S16 = Sweet Sixteen

      E8 = Elite Eight

      F4 = Final Four

      2ND = Runner-up

      Champions = Winner of the NCAA March Madness Tournament for that given year

Team Statistics

G: Number of games played in total

W: Number of games won in total

BARTHAG: Power Rating (Chance of beating an average Division I team)

WAB: Wins Above Bubble (The bubble refers to the cut off between making the NCAA March Madness Tournament and not making it)

Offensive Statistics

ADJOE: Adjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)

EFG_O: Effective Field Goal Percentage Shot

TOR: Turnover Percentage Allowed (Turnover Rate)

ORB: Offensive Rebound Percentage

FTR: Free Throw Rate (How often the given team shoots Free Throws)

TWO_P_O: Two-Point Shooting Percentage

THREE_P_O: Three-Point Shooting Percentage

ADJ_T: Adjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)

Defensive Statistics

ADJDE: Adjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)

EFG_D: Effective Field Goal Percentage Allowed

TORD: Turnover Percentage Committed (Steal Rate)

DRB: Defensive Rebound Percentage

FTRD: Free Throw Rate Allowed

TWO_P_D: Two-Point Shooting Percentage Allowed

THREE_P_D: Three-Point Shooting Percentage Allowed

Appendix 2. Data Manipulation

# Read Data into an Object
# https://www.kaggle.com/andrewsundberg/college-basketball-dataset/data
raw_data_15_19 = fread("cbb.csv")
raw_data_20 = fread("cbb20.csv")

# Combining Dataframes
raw_data_20 <- raw_data_20[,-c("RK")] #shows rank which isn't included in other years
raw_data_20 <- raw_data_20 %>%
                  mutate(POSTSEASON="No Tournament", #including arbitrary values so dataframes match columns
                         SEED=99,
                         YEAR=2020)
raw_data <- bind_rows(raw_data_15_19, raw_data_20)

# Remove unneeded dataframes
rm(raw_data_15_19)
rm(raw_data_20)
# View Data
summary(raw_data) #notice NAs
##      TEAM               CONF                 G              W
##  Length:2110        Length:2110        Min.   :24.0   Min.   : 0.00
##  Class :character   Class :character   1st Qu.:30.0   1st Qu.:12.00
##  Mode  :character   Mode  :character   Median :31.0   Median :16.00
##                                        Mean   :31.3   Mean   :16.48
##                                        3rd Qu.:33.0   3rd Qu.:21.00
##                                        Max.   :40.0   Max.   :38.00
##
##      ADJOE           ADJDE          BARTHAG           EFG_O
##  Min.   : 76.7   Min.   : 84.0   Min.   :0.0077   Min.   :39.30
##  1st Qu.: 98.4   1st Qu.: 98.6   1st Qu.:0.2833   1st Qu.:48.00
##  Median :103.0   Median :103.3   Median :0.4746   Median :49.90
##  Mean   :103.3   Mean   :103.3   Mean   :0.4941   Mean   :50.03
##  3rd Qu.:107.9   3rd Qu.:107.8   3rd Qu.:0.7111   3rd Qu.:52.00
##  Max.   :129.1   Max.   :124.0   Max.   :0.9842   Max.   :59.80
##
##      EFG_D            TOR             TORD            ORB
##  Min.   :39.60   Min.   :12.40   Min.   :10.20   Min.   :14.20
##  1st Qu.:48.30   1st Qu.:17.30   1st Qu.:17.10   1st Qu.:26.30
##  Median :50.10   Median :18.60   Median :18.50   Median :29.10
##  Mean   :50.19   Mean   :18.65   Mean   :18.58   Mean   :29.04
##  3rd Qu.:52.10   3rd Qu.:19.90   3rd Qu.:20.00   3rd Qu.:31.80
##  Max.   :59.50   Max.   :26.60   Max.   :28.00   Max.   :42.10
##
##       DRB             FTR             FTRD            2P_O
##  Min.   :18.40   Min.   :21.60   Min.   :19.70   Min.   :37.70
##  1st Qu.:27.10   1st Qu.:31.30   1st Qu.:30.60   1st Qu.:46.90
##  Median :29.20   Median :34.60   Median :34.30   Median :49.10
##  Mean   :29.22   Mean   :34.69   Mean   :34.94   Mean   :49.19
##  3rd Qu.:31.30   3rd Qu.:38.00   3rd Qu.:38.80   3rd Qu.:51.40
##  Max.   :40.40   Max.   :51.00   Max.   :58.50   Max.   :62.60
##
##       2P_D            3P_O            3P_D          ADJ_T
##  Min.   :37.70   Min.   :24.80   Min.   :27.1   Min.   :57.2
##  1st Qu.:47.20   1st Qu.:32.40   1st Qu.:32.9   1st Qu.:66.4
##  Median :49.30   Median :34.30   Median :34.5   Median :68.5
##  Mean   :49.32   Mean   :34.33   Mean   :34.5   Mean   :68.4
##  3rd Qu.:51.60   3rd Qu.:36.20   3rd Qu.:36.1   3rd Qu.:70.3
##  Max.   :61.20   Max.   :44.10   Max.   :43.1   Max.   :83.4
##
##       WAB           POSTSEASON             SEED            YEAR
##  Min.   :-25.200   Length:2110        Min.   : 1.00   Min.   :2015
##  1st Qu.:-13.000   Class :character   1st Qu.: 9.00   1st Qu.:2016
##  Median : -8.300   Mode  :character   Median :99.00   Median :2018
##  Mean   : -7.814                      Mean   :54.74   Mean   :2018
##  3rd Qu.: -3.100                      3rd Qu.:99.00   3rd Qu.:2019
##  Max.   : 13.100                      Max.   :99.00   Max.   :2020
##                                       NA's   :1417
# Data Cleaning
raw_data$POSTSEASON[is.na(raw_data$POSTSEASON)] = "No Tournament" # replace NAs with explicit values
raw_data$SEED[is.na(raw_data$SEED)] = 99
raw_data <- raw_data %>%
                mutate(TWO_P_O = `2P_O`,  # Not a good naming format for R
                       TWO_P_D = `2P_D`,
                       THREE_P_O = `3P_O`,
                       THREE_P_D = `3P_D`,
                       TRNMT = ifelse(POSTSEASON=="No Tournament", "No", "Yes")) %>%
                select(everything(), -c(`2P_O`, `2P_D`, `3P_O`, `3P_D`))
raw_data$POSTSEASON <- factor(raw_data$POSTSEASON, order = TRUE, levels = c('No Tournament', 'R68', 'R64',
                                                                            'R32', 'S16', 'E8', 'F4', '2ND',
                                                                            'Champions'))
raw_data$PS_WINS = ifelse(as.numeric(raw_data$POSTSEASON) - 3 < 0, 0, as.numeric(raw_data$POSTSEASON) - 3)

# Changing Data Types
raw_data$CONF <- as.factor(raw_data$CONF)
raw_data$TRNMT <- as.factor(raw_data$TRNMT)
raw_data$YEAR <- as.factor(raw_data$YEAR)         # treat season as categorical

# Data Cleaning Done
clean_data <- raw_data

Appendix 2.1. View Data

# View Data Attributes
head(clean_data)
##              TEAM CONF  G  W ADJOE ADJDE BARTHAG EFG_O EFG_D  TOR TORD  ORB
## 1: North Carolina  ACC 40 33 123.3  94.9  0.9531  52.6  48.1 15.4 18.2 40.7
## 2:      Wisconsin  B10 40 36 129.1  93.6  0.9758  54.8  47.7 12.4 15.8 32.1
## 3:       Michigan  B10 40 33 114.4  90.4  0.9375  53.9  47.7 14.0 19.5 25.5
## 4:     Texas Tech  B12 38 31 115.2  85.2  0.9696  53.5  43.0 17.7 22.8 27.4
## 5:        Gonzaga  WCC 39 37 117.8  86.3  0.9728  56.6  41.1 16.2 17.1 30.0
## 6:           Duke  ACC 39 35 125.2  90.6  0.9764  56.6  46.5 16.3 18.6 35.8
##     DRB  FTR FTRD ADJ_T  WAB POSTSEASON SEED YEAR TWO_P_O TWO_P_D THREE_P_O
## 1: 30.0 32.3 30.4  71.7  8.6        2ND    1 2016    53.9    44.6      32.7
## 2: 23.7 36.2 22.4  59.3 11.3        2ND    1 2015    54.8    44.7      36.5
## 3: 24.9 30.7 30.0  65.9  6.9        2ND    3 2018    54.7    46.8      35.2
## 4: 28.7 32.9 36.6  67.5  7.0        2ND    3 2019    52.8    41.9      36.5
## 5: 26.2 39.0 26.9  71.5  7.7        2ND    1 2017    56.3    40.0      38.2
## 6: 30.2 39.8 23.9  66.4 10.7  Champions    1 2015    55.9    46.3      38.7
##    THREE_P_D TRNMT PS_WINS
## 1:      36.2   Yes       5
## 2:      37.5   Yes       5
## 3:      33.2   Yes       5
## 4:      29.7   Yes       5
## 5:      29.0   Yes       5
## 6:      31.4   Yes       6
summary(clean_data)
##      TEAM                CONF            G              W
##  Length:2110        ACC    :  90   Min.   :24.0   Min.   : 0.00
##  Class :character   A10    :  84   1st Qu.:30.0   1st Qu.:12.00
##  Mode  :character   B10    :  84   Median :31.0   Median :16.00
##                     CUSA   :  84   Mean   :31.3   Mean   :16.48
##                     SEC    :  84   3rd Qu.:33.0   3rd Qu.:21.00
##                     Slnd   :  78   Max.   :40.0   Max.   :38.00
##                     (Other):1606
##      ADJOE           ADJDE          BARTHAG           EFG_O
##  Min.   : 76.7   Min.   : 84.0   Min.   :0.0077   Min.   :39.30
##  1st Qu.: 98.4   1st Qu.: 98.6   1st Qu.:0.2833   1st Qu.:48.00
##  Median :103.0   Median :103.3   Median :0.4746   Median :49.90
##  Mean   :103.3   Mean   :103.3   Mean   :0.4941   Mean   :50.03
##  3rd Qu.:107.9   3rd Qu.:107.8   3rd Qu.:0.7111   3rd Qu.:52.00
##  Max.   :129.1   Max.   :124.0   Max.   :0.9842   Max.   :59.80
##
##      EFG_D            TOR             TORD            ORB
##  Min.   :39.60   Min.   :12.40   Min.   :10.20   Min.   :14.20
##  1st Qu.:48.30   1st Qu.:17.30   1st Qu.:17.10   1st Qu.:26.30
##  Median :50.10   Median :18.60   Median :18.50   Median :29.10
##  Mean   :50.19   Mean   :18.65   Mean   :18.58   Mean   :29.04
##  3rd Qu.:52.10   3rd Qu.:19.90   3rd Qu.:20.00   3rd Qu.:31.80
##  Max.   :59.50   Max.   :26.60   Max.   :28.00   Max.   :42.10
##
##       DRB             FTR             FTRD           ADJ_T
##  Min.   :18.40   Min.   :21.60   Min.   :19.70   Min.   :57.2
##  1st Qu.:27.10   1st Qu.:31.30   1st Qu.:30.60   1st Qu.:66.4
##  Median :29.20   Median :34.60   Median :34.30   Median :68.5
##  Mean   :29.22   Mean   :34.69   Mean   :34.94   Mean   :68.4
##  3rd Qu.:31.30   3rd Qu.:38.00   3rd Qu.:38.80   3rd Qu.:70.3
##  Max.   :40.40   Max.   :51.00   Max.   :58.50   Max.   :83.4
##
##       WAB                  POSTSEASON        SEED         YEAR
##  Min.   :-25.200   No Tournament:1770   Min.   : 1.00   2015:351
##  1st Qu.:-13.000   R64          : 160   1st Qu.:99.00   2016:351
##  Median : -8.300   R32          :  80   Median :99.00   2017:351
##  Mean   : -7.814   S16          :  40   Mean   :84.46   2018:351
##  3rd Qu.: -3.100   R68          :  20   3rd Qu.:99.00   2019:353
##  Max.   : 13.100   E8           :  20   Max.   :99.00   2020:353
##                    (Other)      :  20
##     TWO_P_O         TWO_P_D        THREE_P_O       THREE_P_D    TRNMT
##  Min.   :37.70   Min.   :37.70   Min.   :24.80   Min.   :27.1   No :1770
##  1st Qu.:46.90   1st Qu.:47.20   1st Qu.:32.40   1st Qu.:32.9   Yes: 340
##  Median :49.10   Median :49.30   Median :34.30   Median :34.5
##  Mean   :49.19   Mean   :49.32   Mean   :34.33   Mean   :34.5
##  3rd Qu.:51.40   3rd Qu.:51.60   3rd Qu.:36.20   3rd Qu.:36.1
##  Max.   :62.60   Max.   :61.20   Max.   :44.10   Max.   :43.1
##
##     PS_WINS
##  Min.   :0.0000
##  1st Qu.:0.0000
##  Median :0.0000
##  Mean   :0.1493
##  3rd Qu.:0.0000
##  Max.   :6.0000
## 
# Separate data for train and test sets
train_data <- clean_data[which(clean_data$YEAR==2015|clean_data$YEAR==2016|clean_data$YEAR==2017|clean_data$YEAR==2018), ]
test_data_19 <- clean_data[which(clean_data$YEAR==2019), ]
test_data_20 <- clean_data[which(clean_data$YEAR==2020), ]
rm(raw_data, clean_data)

Appendix 3. Check Assumptions

Appendix 3.1. Variance Inflation Factor

# Check multicollinearity
sort(faraway::vif(train_data[,-c(1:4, 18:20, 25:26)]))         # removing factors and potential responses
##      ADJ_T        FTR       FTRD        DRB        ORB       TORD        TOR
##   1.236197   1.299749   1.740173   2.191951   2.946805   3.098201   3.105086
##        WAB      ADJDE      ADJOE  THREE_P_O    BARTHAG  THREE_P_D    TWO_P_O
##  13.243053  22.690760  28.540690  33.348308  36.511312  43.064142  69.770015
##    TWO_P_D      EFG_O      EFG_D
## 121.052453 152.321249 224.862661

Appendix 3.2. Correlation Matrix

# correlation matrix
round(cor(train_data[,-c(1:4, 18:20, 25)])[,c(18)], 4)
##     ADJOE     ADJDE   BARTHAG     EFG_O     EFG_D       TOR      TORD       ORB
##    0.4683   -0.4000    0.4270    0.2836   -0.2782   -0.2472    0.0534    0.1817
##       DRB       FTR      FTRD     ADJ_T       WAB   TWO_P_O   TWO_P_D THREE_P_O
##   -0.0908    0.0108   -0.2017   -0.0362    0.4950    0.2764   -0.2553    0.1983
## THREE_P_D   PS_WINS
##   -0.1937    1.0000

Appendix 3.3. Scatter Plot Matrix

# Data Viz
car::scatterplotMatrix(~PS_WINS + ADJOE + ADJDE + BARTHAG + WAB +
                         TWO_P_D + TWO_P_O, train_data, plot.points = FALSE)

Appendix 3.4. Transformations


# Transformations
summary(bc_x <- powerTransform(cbind(ADJOE, ADJDE, BARTHAG, EFG_O, EFG_D,
                                     TOR, TORD, ORB, DRB, FTR, FTRD, ADJ_T,
                                     TWO_P_O, TWO_P_D, THREE_P_O, THREE_P_D
                                     ) ~ 1, train_data))
## bcPower Transformations to Multinormality
##           Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## ADJOE       -0.9338       -1.00      -1.1530      -0.7146
## ADJDE        1.6317        1.63       1.3267       1.9367
## BARTHAG      0.7129        0.71       0.6719       0.7539
## EFG_O        0.2583        0.33       0.0695       0.4471
## EFG_D        0.8537        1.00       0.6601       1.0473
## TOR          0.8215        1.00       0.5086       1.1344
## TORD         0.5926        0.50       0.3252       0.8601
## ORB          1.3590        1.36       1.1254       1.5927
## DRB          1.0166        1.00       0.6691       1.3641
## FTR          0.5851        0.50       0.2638       0.9065
## FTRD         0.1689        0.00      -0.0738       0.4116
## ADJ_T        0.7688        1.00       0.0217       1.5158
## TWO_P_O     -0.1324        0.00      -0.3323       0.0675
## TWO_P_D      0.6877        0.50       0.4879       0.8876
## THREE_P_O    0.9606        1.00       0.7557       1.1654
## THREE_P_D    1.0243        1.00       0.8111       1.2374
##
## Likelihood ratio test that transformation parameters are equal to 0
##  (all log transformations)
##                                                         LRT df       pval
## LR test, lambda = (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 2724.65 16 < 2.22e-16
##
## Likelihood ratio test that no transformations are needed
##                                                          LRT df       pval
## LR test, lambda = (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1) 532.7003 16 < 2.22e-16
testTransform(bc_x, c(-1, 1.63, 0.71, 0.33, 1, 1, 0.5, 1.36, 1, 0.50, 0, 1, 0, 0.5, 1, 1))
##                                                                             LRT
## LR test, lambda = (-1 1.63 0.71 0.33 1 1 0.5 1.36 1 0.5 0 1 0 0.5 1 1) 51.98504
##                                                                        df
## LR test, lambda = (-1 1.63 0.71 0.33 1 1 0.5 1.36 1 0.5 0 1 0 0.5 1 1) 16
##                                                                              pval
## LR test, lambda = (-1 1.63 0.71 0.33 1 1 0.5 1.36 1 0.5 0 1 0 0.5 1 1) 1.1015e-05

Appendix 4. Model 1: Logistic Model with Random Effects

model_binom <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +
                       TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T +
                       TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular

Appendix 4.1. Diagnostic Check

halfnorm(resid(model_binom, type="pearson"))

train_data[1329,]
##          TEAM CONF  G  W ADJOE ADJDE BARTHAG EFG_O EFG_D  TOR TORD  ORB  DRB
## 1: Holy Cross  Pat 35 15  96.7 106.9  0.2398  47.9  53.2 16.8 19.6 23.1 29.6
##     FTR FTRD ADJ_T   WAB POSTSEASON SEED YEAR TWO_P_O TWO_P_D THREE_P_O
## 1: 36.1 33.4  64.6 -14.5        R64   16 2016    47.2    52.8      32.6
##    THREE_P_D TRNMT PS_WINS
## 1:      35.7   Yes       0
summary(model_binom)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
##     THREE_P_D + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    613.1    712.8   -287.5    575.1     1385
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -11.7565  -0.2281  -0.0917  -0.0114  13.7424
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  18.641462   7.079618   2.633  0.00846 **
## ADJOE         0.524289   0.131314   3.993 6.53e-05 ***
## ADJDE        -0.643822   0.147179  -4.374 1.22e-05 ***
## BARTHAG     -31.242596   5.566575  -5.613 1.99e-08 ***
## EFG_O         0.459814   0.405906   1.133  0.25729
## EFG_D        -0.910582   0.612143  -1.488  0.13687
## TOR          -0.178962   0.105989  -1.688  0.09132 .
## TORD          0.149197   0.096556   1.545  0.12230
## ORB           0.035003   0.046874   0.747  0.45522
## DRB           0.009714   0.054970   0.177  0.85973
## FTR           0.059434   0.027141   2.190  0.02853 *
## FTRD         -0.023435   0.024953  -0.939  0.34766
## ADJ_T         0.027996   0.036494   0.767  0.44300
## TWO_P_O      -0.145603   0.253550  -0.574  0.56579
## TWO_P_D       0.535918   0.391771   1.368  0.17133
## THREE_P_O    -0.118350   0.221847  -0.533  0.59371
## THREE_P_D     0.404400   0.319733   1.265  0.20594
## WAB           0.554127   0.067668   8.189 2.64e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 18 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular

Appendix 4.2. Model Selection

# Fixed Effects Test
model_binom1 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove DRB
                       TOR + TORD + ORB + FTR + FTRD + ADJ_T +
                       TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom, model_binom1)                                                    # p-value: 0.8595
## Data: train_data
## Models:
## model_binom1: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom1:     ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## model_binom1:     THREE_P_D + WAB + (1 | CONF)
## model_binom: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom:     ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## model_binom:     THREE_P_D + WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom1   18 611.09 705.53 -287.54   575.09
## model_binom    19 613.06 712.75 -287.53   575.06 0.0313  1     0.8595
summary(model_binom1)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
##     THREE_P_D + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    611.1    705.5   -287.5    575.1     1386
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -11.6255  -0.2299  -0.0912  -0.0113  13.7483
##
## Random effects:
##  Groups Name        Variance  Std.Dev.
##  CONF   (Intercept) 4.986e-16 2.233e-08
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  18.34933    6.86945   2.671  0.00756 **
## ADJOE         0.53131    0.12560   4.230 2.34e-05 ***
## ADJDE        -0.63718    0.14203  -4.486 7.25e-06 ***
## BARTHAG     -31.27090    5.56993  -5.614 1.97e-08 ***
## EFG_O         0.45365    0.40445   1.122  0.26201
## EFG_D        -0.93198    0.59894  -1.556  0.11970
## TOR          -0.17176    0.09768  -1.758  0.07867 .
## TORD          0.16007    0.07431   2.154  0.03123 *
## ORB           0.03182    0.04323   0.736  0.46158
## FTR           0.05903    0.02705   2.182  0.02909 *
## FTRD         -0.02460    0.02406  -1.022  0.30665
## ADJ_T         0.02752    0.03641   0.756  0.44969
## TWO_P_O      -0.14786    0.25327  -0.584  0.55936
## TWO_P_D       0.54361    0.38887   1.398  0.16213
## THREE_P_O    -0.12083    0.22149  -0.546  0.58538
## THREE_P_D     0.41092    0.31715   1.296  0.19509
## WAB           0.55395    0.06763   8.191 2.59e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 17 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom2 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove THREE_P_O
                       TOR + TORD + ORB + FTR + FTRD + ADJ_T +
                       TWO_P_O + TWO_P_D + THREE_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom1, model_binom2)                                                    # p-value: 0.5832
## Data: train_data
## Models:
## model_binom2: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom2:     ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_D +
## model_binom2:     WAB + (1 | CONF)
## model_binom1: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom1:     ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_O +
## model_binom1:     THREE_P_D + WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom2   17 609.39 698.59 -287.69   575.39
## model_binom1   18 611.09 705.53 -287.54   575.09 0.3011  1     0.5832
summary(model_binom2)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_D +
##     WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    609.4    698.6   -287.7    575.4     1387
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -11.5355  -0.2297  -0.0907  -0.0112  13.8210
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  18.21711    6.87400   2.650  0.00805 **
## ADJOE         0.53349    0.12584   4.239 2.24e-05 ***
## ADJDE        -0.63791    0.14241  -4.480 7.48e-06 ***
## BARTHAG     -31.29282    5.58442  -5.604 2.10e-08 ***
## EFG_O         0.24236    0.11515   2.105  0.03531 *
## EFG_D        -0.93488    0.59985  -1.559  0.11911
## TOR          -0.17005    0.09761  -1.742  0.08148 .
## TORD          0.16104    0.07417   2.171  0.02991 *
## ORB           0.02927    0.04298   0.681  0.49582
## FTR           0.05893    0.02703   2.180  0.02922 *
## FTRD         -0.02491    0.02403  -1.037  0.29992
## ADJ_T         0.02777    0.03633   0.764  0.44460
## TWO_P_O      -0.01579    0.07354  -0.215  0.82994
## TWO_P_D       0.54526    0.38953   1.400  0.16158
## THREE_P_D     0.41070    0.31764   1.293  0.19602
## WAB           0.55209    0.06744   8.186 2.70e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom3 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove TWO_P_O
                       TOR + TORD + ORB + FTR + FTRD + ADJ_T +
                       TWO_P_D + THREE_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom2, model_binom3)                                                    # p-value: 0.8298
## Data: train_data
## Models:
## model_binom3: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom3:     ORB + FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 |
## model_binom3:     CONF)
## model_binom2: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom2:     ORB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D + THREE_P_D +
## model_binom2:     WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom3   16 607.43 691.39 -287.72   575.43
## model_binom2   17 609.39 698.59 -287.69   575.39 0.0462  1     0.8298
summary(model_binom3)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     ORB + FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 |      CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    607.4    691.4   -287.7    575.4     1388
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -11.6171  -0.2299  -0.0913  -0.0113  13.7168
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  18.30302    6.86095   2.668  0.00764 **
## ADJOE         0.53646    0.12508   4.289 1.80e-05 ***
## ADJDE        -0.63877    0.14238  -4.486 7.24e-06 ***
## BARTHAG     -31.37965    5.57202  -5.632 1.78e-08 ***
## EFG_O         0.22495    0.08174   2.752  0.00592 **
## EFG_D        -0.92798    0.59859  -1.550  0.12108
## TOR          -0.17034    0.09759  -1.745  0.08090 .
## TORD          0.15949    0.07378   2.162  0.03065 *
## ORB           0.02819    0.04266   0.661  0.50866
## FTR           0.05841    0.02692   2.170  0.02998 *
## FTRD         -0.02439    0.02391  -1.020  0.30783
## ADJ_T         0.02634    0.03569   0.738  0.46052
## TWO_P_D       0.54016    0.38852   1.390  0.16443
## THREE_P_D     0.40812    0.31727   1.286  0.19833
## WAB           0.55267    0.06740   8.199 2.42e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom4 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove ORB
                       TOR + TORD + FTR + FTRD + ADJ_T +
                       TWO_P_D + THREE_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom3, model_binom4)                                                    # p-value: 0.5077
## Data: train_data
## Models:
## model_binom4: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom4:     FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## model_binom3: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom3:     ORB + FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 |
## model_binom3:     CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom4   15 605.87 684.58 -287.94   575.87
## model_binom3   16 607.43 691.39 -287.72   575.43 0.4387  1     0.5077
summary(model_binom4)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    605.9    684.6   -287.9    575.9     1389
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -10.6103  -0.2327  -0.0904  -0.0114  13.6077
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  17.75005    6.77934   2.618  0.00884 **
## ADJOE         0.55128    0.12292   4.485 7.29e-06 ***
## ADJDE        -0.62165    0.13920  -4.466 7.98e-06 ***
## BARTHAG     -31.29170    5.54852  -5.640 1.70e-08 ***
## EFG_O         0.19019    0.06218   3.059  0.00222 **
## EFG_D        -0.95654    0.59667  -1.603  0.10890
## TOR          -0.13285    0.07933  -1.675  0.09401 .
## TORD          0.17140    0.07141   2.400  0.01639 *
## FTR           0.05566    0.02661   2.092  0.03648 *
## FTRD         -0.02430    0.02389  -1.017  0.30895
## ADJ_T         0.02573    0.03559   0.723  0.46979
## TWO_P_D       0.54660    0.38808   1.408  0.15899
## THREE_P_D     0.41365    0.31679   1.306  0.19164
## WAB           0.56268    0.06577   8.555  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom5 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove ADJ_T
                       TOR + TORD + FTR + FTRD +
                       TWO_P_D + THREE_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom4, model_binom5)                                                    # p-value: 0.4698
## Data: train_data
## Models:
## model_binom5: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom5:     FTR + FTRD + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## model_binom4: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom4:     FTR + FTRD + ADJ_T + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom5   14 604.40 677.85 -288.20   576.40
## model_binom4   15 605.87 684.58 -287.94   575.87 0.5223  1     0.4698
summary(model_binom5)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     FTR + FTRD + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    604.4    677.9   -288.2    576.4     1390
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -10.3533  -0.2309  -0.0920  -0.0118  12.9150
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  18.42403    6.72383   2.740  0.00614 **
## ADJOE         0.54672    0.12234   4.469 7.87e-06 ***
## ADJDE        -0.61378    0.13830  -4.438 9.08e-06 ***
## BARTHAG     -31.03592    5.51870  -5.624 1.87e-08 ***
## EFG_O         0.19197    0.06204   3.094  0.00197 **
## EFG_D        -0.92037    0.59287  -1.552  0.12056
## TOR          -0.13512    0.07907  -1.709  0.08749 .
## TORD          0.16674    0.07091   2.351  0.01871 *
## FTR           0.05776    0.02643   2.186  0.02884 *
## FTRD         -0.02206    0.02364  -0.933  0.35092
## TWO_P_D       0.53090    0.38631   1.374  0.16936
## THREE_P_D     0.39836    0.31527   1.264  0.20638
## WAB           0.56444    0.06571   8.590  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom6 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove FTRD
                       TOR + TORD + FTR +
                       TWO_P_D + THREE_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom5, model_binom6)                                                    # p-value: 0.3498
## Data: train_data
## Models:
## model_binom6: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom6:     FTR + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
## model_binom5: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom5:     FTR + FTRD + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom6   13 603.27 671.48 -288.63   577.27
## model_binom5   14 604.40 677.85 -288.20   576.40 0.8743  1     0.3498
summary(model_binom6)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     FTR + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    603.3    671.5   -288.6    577.3     1391
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -11.0055  -0.2310  -0.0925  -0.0118  13.3267
##
## Random effects:
##  Groups Name        Variance  Std.Dev.
##  CONF   (Intercept) 1.813e-15 4.258e-08
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  18.86651    6.71840   2.808 0.004982 **
## ADJOE         0.53968    0.12274   4.397 1.10e-05 ***
## ADJDE        -0.62169    0.13876  -4.480 7.45e-06 ***
## BARTHAG     -31.02780    5.54089  -5.600 2.15e-08 ***
## EFG_O         0.20306    0.06071   3.345 0.000824 ***
## EFG_D        -0.85127    0.58762  -1.449 0.147425
## TOR          -0.14438    0.07863  -1.836 0.066350 .
## TORD          0.13618    0.06285   2.167 0.030256 *
## FTR           0.05514    0.02624   2.102 0.035573 *
## TWO_P_D       0.49529    0.38413   1.289 0.197265
## THREE_P_D     0.36740    0.31325   1.173 0.240851
## WAB           0.57073    0.06531   8.739  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##           (Intr) ADJOE  ADJDE  BARTHA EFG_O  EFG_D  TOR    TORD   FTR    TWO_P_
## ADJOE      0.422
## ADJDE     -0.702 -0.877
## BARTHAG   -0.620 -0.938  0.942
## EFG_O      0.215  0.047 -0.314 -0.202
## EFG_D      0.123  0.029 -0.074 -0.048  0.013
## TOR       -0.487 -0.064  0.248  0.166 -0.328 -0.055
## TORD      -0.406  0.119  0.114 -0.007 -0.050 -0.189  0.053
## FTR        0.057  0.133 -0.232 -0.155  0.302  0.120 -0.361 -0.087
## TWO_P_D   -0.108 -0.049  0.056  0.055 -0.014 -0.990  0.057  0.130 -0.103
## THREE_P_D -0.125 -0.060  0.066  0.060  0.001 -0.981  0.048  0.151 -0.114  0.976
## WAB        0.105 -0.093  0.063 -0.122 -0.093 -0.025  0.183 -0.042 -0.163  0.040
##           THREE_
## ADJOE
## ADJDE
## BARTHAG
## EFG_O
## EFG_D
## TOR
## TORD
## FTR
## TWO_P_D
## THREE_P_D
## WAB        0.050
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom7 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove THREE_P_D
                       TOR + TORD + FTR +
                       TWO_P_D +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom6, model_binom7)                                                    # p-value: 0.2336
## Data: train_data
## Models:
## model_binom7: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom7:     FTR + TWO_P_D + WAB + (1 | CONF)
## model_binom6: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom6:     FTR + TWO_P_D + THREE_P_D + WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom7   12 602.69 665.65 -289.34   578.69
## model_binom6   13 603.27 671.48 -288.63   577.27 1.4189  1     0.2336
summary(model_binom7)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     FTR + TWO_P_D + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    602.7    665.7   -289.3    578.7     1392
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -11.1463  -0.2323  -0.0930  -0.0119  12.7045
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  19.91828    6.64933   2.996  0.00274 **
## ADJOE         0.54984    0.12292   4.473 7.70e-06 ***
## ADJDE        -0.63418    0.13865  -4.574 4.79e-06 ***
## BARTHAG     -31.50604    5.54307  -5.684 1.32e-08 ***
## EFG_O         0.20360    0.06060   3.360  0.00078 ***
## EFG_D        -0.17737    0.11311  -1.568  0.11684
## TOR          -0.14911    0.07833  -1.904  0.05697 .
## TORD          0.12484    0.06172   2.023  0.04310 *
## FTR           0.05872    0.02608   2.251  0.02437 *
## TWO_P_D       0.05760    0.08315   0.693  0.48848
## WAB           0.56843    0.06538   8.694  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##         (Intr) ADJOE  ADJDE  BARTHA EFG_O  EFG_D  TOR    TORD   FTR    TWO_P_
## ADJOE    0.418
## ADJDE   -0.699 -0.877
## BARTHAG -0.617 -0.938  0.942
## EFG_O    0.217  0.049 -0.315 -0.203
## EFG_D    0.005 -0.157 -0.045  0.059  0.074
## TOR     -0.486 -0.063  0.246  0.163 -0.327 -0.039
## TORD    -0.391  0.135  0.100 -0.022 -0.051 -0.218  0.048
## FTR      0.040  0.127 -0.225 -0.148  0.305  0.045 -0.359 -0.067
## TWO_P_D  0.061  0.047 -0.038 -0.019 -0.071 -0.760  0.049 -0.075  0.032
## WAB      0.115 -0.089  0.058 -0.127 -0.091  0.122  0.181 -0.053 -0.162 -0.042
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom8 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D +             # remove TWO_P_D
                       TOR + TORD + FTR +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom7, model_binom8)                                                    # p-value: 0.488
## Data: train_data
## Models:
## model_binom8: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom8:     FTR + WAB + (1 | CONF)
## model_binom7: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom7:     FTR + TWO_P_D + WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom8   11 601.17 658.89 -289.58   579.17
## model_binom7   12 602.69 665.65 -289.34   578.69 0.4809  1      0.488
summary(model_binom8)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
##     FTR + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    601.2    658.9   -289.6    579.2     1393
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -11.3517  -0.2339  -0.0917  -0.0122  12.8717
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  19.66107    6.63094   2.965  0.00303 **
## ADJOE         0.54657    0.12279   4.451 8.54e-06 ***
## ADJDE        -0.63132    0.13845  -4.560 5.12e-06 ***
## BARTHAG     -31.46860    5.54107  -5.679 1.35e-08 ***
## EFG_O         0.20675    0.06041   3.423  0.00062 ***
## EFG_D        -0.11798    0.07351  -1.605  0.10849
## TOR          -0.15195    0.07828  -1.941  0.05224 .
## TORD          0.12805    0.06148   2.083  0.03725 *
## FTR           0.05821    0.02604   2.235  0.02542 *
## WAB           0.57075    0.06525   8.747  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##         (Intr) ADJOE  ADJDE  BARTHA EFG_O  EFG_D  TOR    TORD   FTR
## ADJOE    0.416
## ADJDE   -0.698 -0.877
## BARTHAG -0.616 -0.939  0.942
## EFG_O    0.220  0.050 -0.317 -0.202
## EFG_D    0.076 -0.187 -0.113  0.069  0.033
## TOR     -0.490 -0.062  0.245  0.161 -0.324 -0.001
## TORD    -0.386  0.142  0.094 -0.027 -0.057 -0.424  0.053
## FTR      0.034  0.123 -0.222 -0.144  0.309  0.108 -0.358 -0.064
## WAB      0.116 -0.085  0.056 -0.129 -0.094  0.138  0.186 -0.055 -0.162
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom9 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O +                      # remove EFG_D
                       TOR + TORD + FTR +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom8, model_binom9)                                                    # p-value: 0.108
## Data: train_data
## Models:
## model_binom9: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + TORD + FTR +
## model_binom9:     WAB + (1 | CONF)
## model_binom8: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR + TORD +
## model_binom8:     FTR + WAB + (1 | CONF)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom9   10 601.75 654.22 -290.88   581.75
## model_binom8   11 601.17 658.89 -289.58   579.17 2.5831  1      0.108
summary(model_binom9)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + TORD + FTR +
##     WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    601.8    654.2   -290.9    581.8     1394
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -10.1205  -0.2348  -0.0951  -0.0122  11.4464
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 1.23e-15 3.506e-08
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  20.55090    6.54679   3.139 0.001695 **
## ADJOE         0.51194    0.11966   4.278 1.88e-05 ***
## ADJDE        -0.65937    0.13591  -4.852 1.22e-06 ***
## BARTHAG     -30.98621    5.47740  -5.657 1.54e-08 ***
## EFG_O         0.21105    0.06016   3.508 0.000451 ***
## TOR          -0.15231    0.07781  -1.957 0.050302 .
## TORD          0.08614    0.05538   1.555 0.119853
## FTR           0.06296    0.02584   2.436 0.014846 *
## WAB           0.58789    0.06464   9.095  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##         (Intr) ADJOE  ADJDE  BARTHA EFG_O  TOR    TORD   FTR
## ADJOE    0.430
## ADJDE   -0.689 -0.919
## BARTHAG -0.618 -0.944  0.957
## EFG_O    0.212  0.050 -0.309 -0.196
## TOR     -0.487 -0.053  0.237  0.151 -0.320
## TORD    -0.391  0.075  0.049  0.001 -0.053  0.056
## FTR      0.025  0.145 -0.212 -0.151  0.310 -0.360 -0.021
## WAB      0.113 -0.055  0.069 -0.148 -0.107  0.188  0.003 -0.183
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom10 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O +                      # remove TORD
                       TOR + FTR +
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom9, model_binom10)                                                    # p-value: 0.1183
## Data: train_data
## Models:
## model_binom10: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + FTR + WAB + (1 |
## model_binom10:     CONF)
## model_binom9: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + TORD + FTR +
## model_binom9:     WAB + (1 | CONF)
##               npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom10    9 602.19 649.42 -292.10   584.19
## model_binom9    10 601.75 654.22 -290.88   581.75 2.4394  1     0.1183
summary(model_binom10)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + FTR + WAB + (1 |
##     CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    602.2    649.4   -292.1    584.2     1395
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -9.2942 -0.2370 -0.0964 -0.0121 11.5546
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  24.63257    6.00029   4.105 4.04e-05 ***
## ADJOE         0.50122    0.11906   4.210 2.55e-05 ***
## ADJDE        -0.67338    0.13587  -4.956 7.20e-07 ***
## BARTHAG     -31.15990    5.46869  -5.698 1.21e-08 ***
## EFG_O         0.21664    0.06000   3.611 0.000305 ***
## TOR          -0.15942    0.07784  -2.048 0.040540 *
## FTR           0.06424    0.02585   2.485 0.012964 *
## WAB           0.59009    0.06424   9.186  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##         (Intr) ADJOE  ADJDE  BARTHA EFG_O  TOR    FTR
## ADJOE    0.502
## ADJDE   -0.731 -0.926
## BARTHAG -0.673 -0.947  0.958
## EFG_O    0.224  0.063 -0.319 -0.208
## TOR     -0.503 -0.056  0.234  0.148 -0.324
## FTR      0.025  0.150 -0.215 -0.154  0.308 -0.364
## WAB      0.115 -0.060  0.074 -0.142 -0.104  0.192 -0.186
## convergence code: 0
## boundary (singular) fit: see ?isSingular
model_binom11 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + FTR +                # remove TOR
                       WAB + (1|CONF), nAGQ = 25, family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
anova(model_binom10, model_binom11)                                                    # p-value: 0.03884 so do not remove TOR
## Data: train_data
## Models:
## model_binom11: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + FTR + WAB + (1 | CONF)
## model_binom10: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR + FTR + WAB + (1 |
## model_binom10:     CONF)
##               npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_binom11    8 604.46 646.44 -294.23   588.46
## model_binom10    9 602.19 649.42 -292.10   584.19 4.2679  1    0.03884 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model_binom11)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
##  Family: binomial  ( logit )
## Formula: TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + FTR + WAB + (1 | CONF)
##    Data: train_data
##
##      AIC      BIC   logLik deviance df.resid
##    604.5    646.4   -294.2    588.5     1396
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -8.6476 -0.2347 -0.0981 -0.0134 14.6925
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CONF   (Intercept) 0        0
## Number of obs: 1404, groups:  CONF, 33
##
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  18.63127    5.13189   3.630 0.000283 ***
## ADJOE         0.49358    0.11700   4.219 2.46e-05 ***
## ADJDE        -0.61506    0.12963  -4.745 2.09e-06 ***
## BARTHAG     -29.79642    5.32059  -5.600 2.14e-08 ***
## EFG_O         0.17765    0.05667   3.135 0.001720 **
## FTR           0.04517    0.02392   1.888 0.058970 .
## WAB           0.61936    0.06310   9.815  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##         (Intr) ADJOE  ADJDE  BARTHA EFG_O  FTR
## ADJOE    0.537
## ADJDE   -0.723 -0.939
## BARTHAG -0.692 -0.949  0.960
## EFG_O    0.068  0.040 -0.260 -0.164
## FTR     -0.210  0.136 -0.137 -0.104  0.215
## WAB      0.259 -0.047  0.026 -0.182 -0.049 -0.125
## convergence code: 0
## boundary (singular) fit: see ?isSingular
# Random Effects
model_binom12 <- glmer(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O +                      # refit with default Laplace approximation (nAGQ = 1)
                       TOR + FTR +
                       WAB + (1|CONF), family = "binomial", train_data)
## boundary (singular) fit: see ?isSingular
model_binom13 <- glm(TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O +                        # remove CONF
                       TOR + FTR +
                       WAB, family = "binomial", train_data)
# anova(model_binom12, model_binom13)                                                   # p-value: 1
summary(model_binom13)
##
## Call:
## glm(formula = TRNMT ~ ADJOE + ADJDE + BARTHAG + EFG_O + TOR +
##     FTR + WAB, family = "binomial", data = train_data)
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -2.99008  -0.33058  -0.13608  -0.01713   3.13102
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  24.63257    5.95394   4.137 3.52e-05 ***
## ADJOE         0.50122    0.11747   4.267 1.98e-05 ***
## ADJDE        -0.67338    0.13383  -5.032 4.86e-07 ***
## BARTHAG     -31.15990    5.39211  -5.779 7.52e-09 ***
## EFG_O         0.21664    0.05992   3.616   0.0003 ***
## TOR          -0.15942    0.07778  -2.050   0.0404 *
## FTR           0.06424    0.02584   2.486   0.0129 *
## WAB           0.59009    0.06423   9.187  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 1380.38  on 1403  degrees of freedom
## Residual deviance:  584.19  on 1396  degrees of freedom
## AIC: 600.19
##
## Number of Fisher Scoring iterations: 7
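The term-by-term eliminations above were carried out by hand with repeated `anova()` calls. As a sketch only (not the procedure actually run, and slow for a `glmer` fit with `nAGQ = 25`), the same backward likelihood-ratio search could be automated with `drop1()`, assuming `model_binom1` from the chunks above:

```r
# Sketch: automate the backward elimination above with drop1() likelihood-ratio
# tests. Each iteration refits every single-term deletion and removes the least
# significant fixed effect until all remaining terms have p < 0.05.
library(lme4)

current <- model_binom1
repeat {
  lrt <- drop1(current, test = "Chisq")          # LRT for each single-term deletion
  pvals <- lrt[[ncol(lrt)]][-1]                  # p-values, skipping the <none> row
  if (all(pvals < 0.05, na.rm = TRUE)) break
  worst <- rownames(lrt)[-1][which.max(pvals)]   # least significant fixed effect
  current <- update(current, paste(". ~ . -", worst))
}
formula(current)
```

An AIC-based `step()`-style search can land on a different model than this p-value rule, so the two approaches are not interchangeable.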

Appendix 4.3. Post Diagnostic Checks

# Check diagnostics: half-normal plot of Pearson residuals (halfnorm() is from the faraway package)
halfnorm(resid(model_binom13, type = "pearson"))

sort(faraway::vif(train_data[,c(5:8, 10, 14, 17)]))                                   # VIFs of the seven retained regressors
##       FTR       TOR     EFG_O       WAB     ADJDE     ADJOE   BARTHAG
##  1.174302  1.849976  2.771205 11.270400 14.118669 19.035144 36.023259
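The extreme VIF on BARTHAG is consistent with the power rating being close to a function of the two efficiency ratings, as noted in the summary. A rough check of that claim (a sketch using the `train_data` columns above) regresses BARTHAG on ADJOE and ADJDE and recovers an inflation factor from the definition VIF = 1 / (1 - R²):

```r
# A predictor's VIF is 1 / (1 - R^2) from regressing it on the other predictors.
# Regressing BARTHAG on the two efficiency ratings alone should recover most of
# its inflation if they are the main source of the collinearity.
r2 <- summary(lm(BARTHAG ~ ADJOE + ADJDE, data = train_data))$r.squared
1 / (1 - r2)
```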

Appendix 4.4. Prediction

# Checking Performance
predprob <- predict(model_binom13, train_data, type = "response")

thresh <- seq(0.01, 0.5, 0.01)
Sensitivity <- numeric(length(thresh))
Specificity <- numeric(length(thresh))
for (j in seq_along(thresh)) {
  pp <- ifelse(predprob < thresh[j], "No", "Yes")
  xx <- xtabs(~ train_data$TRNMT + pp)
  Specificity[j] <- xx[1,1] / (xx[1,1] + xx[1,2])
  Sensitivity[j] <- xx[2,2] / (xx[2,1] + xx[2,2])
}
matplot(thresh, cbind(Sensitivity, Specificity), type = "l",
        xlab = "Threshold", ylab = "Proportion", lty = 1:2)

plot(1 - Specificity, Sensitivity, type = "l")   # ROC curve
abline(0, 1, lty = 2)                            # chance line
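A single-number summary of the curve just plotted is the area under it. A rough trapezoidal estimate from the `Sensitivity` and `Specificity` vectors computed above (it only covers thresholds 0.01 to 0.5, so it understates the full AUC) is:

```r
# Trapezoidal estimate of the area under the computed ROC points,
# sorted by false positive rate
fpr <- 1 - Specificity
ord <- order(fpr)
x <- fpr[ord]; y <- Sensitivity[ord]
sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
```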

# Classification: Sensitivity and Specificity (ROC)
predout <- ifelse(predprob < 0.18, "No", "Yes")
xtabs(~ train_data$TRNMT + predout)
##                 predout
## train_data$TRNMT  No Yes
##              No  988 144
##              Yes  32 240
# Training misclassification rate
1-(988+240)/(988+240+144+32)
## [1] 0.1253561
# Test misclassification rate (model_binom10's random-effect variance is 0, so its
# predictions coincide with model_binom13's)
predprob_test <- predict(model_binom10, test_data_19, type = "response")
predout_test <- ifelse(predprob_test < 0.18, "No", "Yes")
xtabs(~ test_data_19$TRNMT + predout_test)
##                   predout_test
## test_data_19$TRNMT  No Yes
##                No  247  38
##                Yes   7  61
1-(247+61)/(247+61+7+38)
## [1] 0.1274788
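The 0.18 cutoff appears to be read off the sensitivity/specificity crossover plot. One standard automatic alternative, sketched here over the same threshold grid, is to maximize Youden's J statistic:

```r
# Youden's J = Sensitivity + Specificity - 1; the maximizing threshold balances
# the two error types without eyeballing the plot
J <- Sensitivity + Specificity - 1
thresh[which.max(J)]
```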

Appendix 5. Model 2: Multinomial Model - All Possibilities

# Fit the full multinomial model, then prune it by stepwise AIC selection
mmod <- multinom(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
                   TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
                   THREE_P_O + THREE_P_D + WAB + CONF, train_data, trace = FALSE)
mmod1 <- step(mmod, trace=FALSE)
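`step()` greedily drops (or re-adds) terms to minimize AIC, so the full and pruned fits can be compared directly, with lower AIC preferred:

```r
# Side-by-side AIC of the full multinomial model and the stepwise-selected one
AIC(mmod, mmod1)
```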

Appendix 5.1. Prediction

summary(mmod1)
## Call:
## multinom(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O +
##     EFG_D + TOR + TWO_P_O + WAB, data = train_data, trace = FALSE)
##
## Coefficients:
##           (Intercept)     ADJOE      ADJDE   BARTHAG       EFG_O       EFG_D
## R68          22.36917 0.6930545 -0.6387554 -35.55184  0.10514273 -0.36310116
## R64          20.05135 0.3591553 -0.4764981 -25.62416  0.18339258 -0.03428303
## R32          43.97908 0.4799637 -0.7631146 -29.61770  0.00383604 -0.08257088
## S16         -25.97634 0.2539697 -0.3535155  24.02096 -0.16766864  0.07755960
## E8           25.74885 0.5936211 -0.9496349 -25.40841 -0.40395542  0.17909511
## F4           31.32057 0.7035847 -1.1607442 -31.56321  0.51679491  0.05044687
## 2ND          38.68386 1.5249836 -1.8073067 -14.92299 -4.01741400 -0.42421578
## Champions   -26.29208 4.2496101 -5.5472676 -91.10589 -3.70585697  3.37708667
##                   TOR       TWO_P_O        WAB
## R68       -0.20504636  0.0856522548  0.3284524
## R64       -0.04884582 -0.0006071909  0.6555698
## R32       -0.38844792  0.2031092293  0.7157722
## S16       -0.14868146  0.3246490614  0.6036942
## E8        -0.28493368  0.6690769336  0.8535485
## F4         0.22295890 -0.1748514204  0.5565037
## 2ND       -3.52135154  4.7837765799 -0.0726802
## Champions  0.62909476  2.9240761067 -1.2029202
##
## Std. Errors:
##           (Intercept)      ADJOE      ADJDE    BARTHAG     EFG_O      EFG_D
## R68        0.24180880 0.07939885 0.08769994 0.71626161 0.2176209 0.15506872
## R64        1.74444597 0.07222554 0.05875595 2.26236503 0.0995162 0.07314185
## R32        2.19415373 0.09344604 0.08559134 3.07789830 0.1635410 0.11333051
## S16        1.09455542 0.09604182 0.12575847 1.02116845 0.2265747 0.14700707
## E8         0.32485850 0.12335647 0.16585797 0.33350867 0.3171549 0.19966822
## F4         0.10823782 0.15291949 0.21733239 0.12743699 0.3413094 0.23504649
## 2ND        0.04078992 0.41198907 0.59784005 0.03145735 1.3029543 0.64280916
## Champions  0.04017241 0.92960088 1.48949072 0.04935019 1.3030123 1.38532950
##                  TOR   TWO_P_O        WAB
## R68       0.13954521 0.1670340 0.13315705
## R64       0.06981779 0.0760650 0.06980336
## R32       0.10913873 0.1265307 0.10697108
## S16       0.13881999 0.1784446 0.13511803
## E8        0.20043210 0.2589957 0.18736637
## F4        0.27781350 0.2728535 0.21217333
## 2ND       1.07712119 1.4243489 0.45968890
## Champions 0.91461645 1.1864221 1.38636167
##
## Residual Deviance: 1142.137
## AIC: 1286.137
# Train Error
mmod1.pred <- predict(mmod1, train_data)
mmod1.pred <- fct_expand(mmod1.pred, "R68")  # retain all nine levels so the error tables align
## Levels: No Tournament R68 R64 R32 S16 E8 F4 2ND Champions
## (full printout of the 1,404 fitted classes omitted)
# Per-round training error: share of each true round that was misclassified
mmod1.table <- table(mmod1.pred, as_vector(train_data[,"POSTSEASON"]))
mmod1.error <- numeric(dim(mmod1.table)[1])
for(i in 1:dim(mmod1.table)[1]){
  mmod1.error[i] <- round((1 - mmod1.table[i,i]/sum(mmod1.table[,i]))*100, 4)
}
mmod1.error.table <- data.frame(rownames(mmod1.table), mmod1.error)
colnames(mmod1.error.table) <- c("Round", "% Error")

# Per-round test error on the held-out 2019 season
mmod1.pred.test <- predict(mmod1, test_data_19)
mmod1.table.test <- table(mmod1.pred.test, as_vector(test_data_19[,"POSTSEASON"]))
mmod1.error.test <- numeric(dim(mmod1.table.test)[1])
for(i in 1:dim(mmod1.table.test)[1]){
  mmod1.error.test[i] <- round((1 - mmod1.table.test[i,i]/sum(mmod1.table.test[,i]))*100, 4)
}
mmod1.error.test.table <- data.frame(rownames(mmod1.table.test), mmod1.error.test)
colnames(mmod1.error.test.table) <- c("Round", "% Error")
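As an aside, the per-round error loops can be collapsed into one vectorized expression, since `diag()` picks out the correct classifications and `colSums()` the per-round totals (this assumes, as here, a square confusion table with matching row and column order):

```r
# Vectorized per-class error (%), equivalent to the for-loops above
per_class_error <- function(tab) round(100 * (1 - diag(tab) / colSums(tab)), 4)
per_class_error(mmod1.table)
per_class_error(mmod1.table.test)
```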

Appendix 5.2. Error Tables (%)

knitr::kable(mmod1.error.table)
|Round         |  % Error|
|:-------------|--------:|
|No Tournament |   1.5901|
|R68           | 100.0000|
|R64           |  65.6250|
|R32           |  54.6875|
|S16           |  78.1250|
|E8            |  68.7500|
|F4            | 100.0000|
|2ND           |   0.0000|
|Champions     |   0.0000|
knitr::kable(mmod1.error.test.table)
|Round         |  % Error|
|:-------------|--------:|
|No Tournament |   1.7544|
|R68           | 100.0000|
|R64           |  75.0000|
|R32           |  75.0000|
|S16           |  75.0000|
|E8            |  75.0000|
|F4            | 100.0000|
|2ND           | 100.0000|
|Champions     | 100.0000|

Appendix 6. Model 3: Multinomial Model - Round Reached, Given a Tournament Berth

# Restrict to tournament teams; converting POSTSEASON through character and back
# drops the unused "No Tournament" level (and re-sorts the remaining levels)
train_given_trnmt <- train_data[which(train_data$TRNMT=="Yes"), ]
train_given_trnmt$POSTSEASON <- as.factor(as.character(train_given_trnmt$POSTSEASON))
test_given_trnmt_19 <- test_data_19[which(test_data_19$TRNMT=="Yes"), ]
test_given_trnmt_19$POSTSEASON <- as.factor(as.character(test_given_trnmt_19$POSTSEASON))
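The character round-trip drops the unused "No Tournament" level but also re-sorts the remaining levels alphabetically, which is why "2ND" ends up as the reference category in the model below. If the original level ordering should be preserved instead, base R's `droplevels()` is the more direct tool; a sketch:

```r
# Drops unused factor levels while keeping the original level ordering
train_given_trnmt$POSTSEASON <- droplevels(train_given_trnmt$POSTSEASON)
```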

# Full multinomial model on tournament teams only, again pruned by stepwise AIC
mmod2 <- multinom(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
                   TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
                   THREE_P_O + THREE_P_D + WAB + CONF, train_given_trnmt, trace = FALSE)
mmod3 <- step(mmod2, trace=FALSE)

Appendix 6.1. Prediction

summary(mmod3)
## Call:
## multinom(formula = POSTSEASON ~ ADJOE + ADJDE + TOR + ORB + DRB +
##     TWO_P_O + THREE_P_D, data = train_given_trnmt, trace = FALSE)
##
## Coefficients:
##           (Intercept)     ADJOE    ADJDE      TOR       ORB        DRB
## Champions -236.056663  2.852387 2.555557 11.78722 -2.025824 -1.3480990
## E8           8.900188 -2.173893 5.519055 19.74867 -2.692411  1.0909452
## F4           3.967331 -1.924539 5.569355 20.86565 -3.114750  1.3208299
## R32         15.016957 -2.274593 5.875645 19.84169 -2.725084  0.9152105
## R64         -8.026639 -2.307673 6.205060 20.28381 -2.765259  0.7592421
## R68        -25.960280 -2.152685 6.488939 20.72191 -2.997752  0.8513871
## S16        -11.220968 -2.103083 5.802243 20.07609 -2.719376  0.9520225
##             TWO_P_O THREE_P_D
## Champions -6.599466 -1.821630
## E8        -7.732776 -2.242833
## F4        -8.075189 -2.994908
## R32       -8.013475 -2.467972
## R64       -8.209898 -2.358930
## R68       -8.396799 -3.068424
## S16       -8.022214 -2.227293
##
## Std. Errors:
##           (Intercept)     ADJOE     ADJDE       TOR       ORB      DRB  TWO_P_O
## Champions  0.18694932 1.3967303 1.5390742 3.6535538 0.6854387 1.090490 1.141480
## E8         0.44581638 0.8797126 0.8448868 0.5426949 0.6486490 1.168592 1.078119
## F4         0.05398827 0.8881353 0.8596159 0.6125661 0.6667125 1.185734 1.085946
## R32        3.50710706 0.8794955 0.8478554 0.5219588 0.6466125 1.170012 1.076049
## R64        4.00974955 0.8798437 0.8484075 0.5212229 0.6472330 1.170833 1.076647
## R68        0.74323733 0.8820179 0.8511628 0.5477912 0.6548939 1.176055 1.084958
## S16        0.25546310 0.8796789 0.8476392 0.5272317 0.6467417 1.170386 1.077323
##           THREE_P_D
## Champions  1.913517
## E8         1.338540
## F4         1.367293
## R32        1.340593
## R64        1.341498
## R68        1.356640
## S16        1.340915
##
## Residual Deviance: 538.46
## AIC: 650.46
# Per-round training error for the tournament-only model
mmod3.pred <- predict(mmod3, train_given_trnmt)
mmod3.table <- table(mmod3.pred, as_vector(train_given_trnmt[,"POSTSEASON"]))
mmod3.error <- numeric(dim(mmod3.table)[1])
for(i in 1:dim(mmod3.table)[1]){
  mmod3.error[i] <- round((1 - mmod3.table[i,i]/sum(mmod3.table[,i]))*100, 4)
}
mmod3.error.table <- data.frame(rownames(mmod3.table), mmod3.error)
colnames(mmod3.error.table) <- c("Round", "% Error")

# Per-round test error on the held-out 2019 tournament teams
mmod3.pred.test <- predict(mmod3, test_given_trnmt_19)
mmod3.table.test <- table(mmod3.pred.test, as_vector(test_given_trnmt_19[,"POSTSEASON"]))
mmod3.error.test <- numeric(dim(mmod3.table.test)[1])
for(i in 1:dim(mmod3.table.test)[1]){
  mmod3.error.test[i] <- round((1 - mmod3.table.test[i,i]/sum(mmod3.table.test[,i]))*100, 4)
}
mmod3.error.test.table <- data.frame(rownames(mmod3.table.test), mmod3.error.test)
colnames(mmod3.error.test.table) <- c("Round", "% Error")

Appendix 6.2. Error Tables (%)

knitr::kable(mmod3.error.table)
|Round     | % Error|
|:---------|-------:|
|2ND       |  0.0000|
|Champions |  0.0000|
|E8        | 75.0000|
|F4        | 75.0000|
|R32       | 53.1250|
|R64       | 13.2812|
|R68       | 81.2500|
|S16       | 75.0000|
knitr::kable(mmod3.error.test.table)
|Round     | % Error|
|:---------|-------:|
|2ND       |   100.0|
|Champions |     0.0|
|E8        |    75.0|
|F4        |   100.0|
|R32       |    50.0|
|R64       |    25.0|
|R68       |    50.0|
|S16       |    62.5|

Appendix 7. Model 4: Classification Tree - All Possibilities

# Bagging: with mtry = 18 (all predictors considered at every split), the random
# forest reduces to bagged classification trees
bag.cbb <- randomForest(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
                   TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
                   THREE_P_O + THREE_P_D + WAB + CONF, train_data, mtry=18, importance=T)
bag.cbb
##
## Call:
##  randomForest(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG +      EFG_O + EFG_D + TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T +      TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D + WAB + CONF, data = train_data,      mtry = 18, importance = T)
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 18
##
##         OOB estimate of  error rate: 15.95%
## Confusion matrix:
##               No Tournament R68 R64 R32 S16 E8 F4 2ND Champions class.error
## No Tournament          1111   0  18   2   1  0  0   0         0  0.01855124
## R68                      13   0   3   0   0  0  0   0         0  1.00000000
## R64                      68   0  44  12   4  0  0   0         0  0.65625000
## R32                      13   0  20  18   9  3  0   0         1  0.71875000
## S16                       2   0   7  15   5  3  0   0         0  0.84375000
## E8                        0   0   4   6   3  2  1   0         0  0.87500000
## F4                        1   0   1   4   2  0  0   0         0  1.00000000
## 2ND                       0   0   0   1   0  2  0   0         1  1.00000000
## Champions                 0   0   0   2   0  2  0   0         0  1.00000000
importance(bag.cbb)
##           No Tournament        R68         R64        R32        S16
## ADJOE         14.693006  0.4440248  -3.1185169 -1.4620368  2.4783650
## ADJDE          7.747591 -0.1394167  -2.4419360  5.3325330 -2.7066895
## BARTHAG       21.836832 -0.6047249 -14.1426766  3.2154108 14.0523913
## EFG_O          7.189715  0.3780650   3.6222121 -4.1506084  0.1094855
## EFG_D         11.638146 -1.3074498  -1.5565995  2.9389837 -4.4268763
## TOR           14.608630  3.5769435  -5.9979645  9.7918820 -3.9948242
## TORD           4.321935  2.8410128  -1.7760157  1.9439753  3.9888383
## ORB            7.978463  0.8527321   1.2988782 -2.0801451 -3.1395203
## DRB            6.332592  2.3298065   1.1971745 -0.8983184 -3.2804665
## FTR            9.156979  1.6296585   2.9338975 -0.3980024 -2.2590968
## FTRD          13.738119  1.2843276  -3.1357350 -1.5092088  2.5294373
## ADJ_T          1.007006  0.3207851  -2.5078066  0.0622195  0.7259926
## TWO_P_O        9.962770  1.5545570   2.7418340 -2.6405845  1.9936402
## TWO_P_D        7.825292 -0.1682910  -0.1003262  3.4716651 -4.1357440
## THREE_P_O      5.656172  2.4258361   2.5236871 -0.1341332 -3.5759872
## THREE_P_D     13.429163  2.8294239  -2.5206609  5.0194624 -4.1732219
## WAB           87.003975  3.1484094  26.0803033 42.6016440 28.7116953
## CONF          18.489315  1.2886483   4.9451721  1.2614006 -1.1212501
##                    E8         F4        2ND     Champions MeanDecreaseAccuracy
## ADJOE     -1.31320940  0.7672427 -0.2242418  5.507257e+00           14.2589773
## ADJDE      2.86573141  1.3745786  0.4546394  3.966439e+00            7.5037078
## BARTHAG   10.46715959  6.0190629  4.2116839  1.113415e+01           23.3994186
## EFG_O      2.29694603 -1.4803312  0.7596947  1.081386e+00            7.2219512
## EFG_D      0.41291553  0.1857017  1.4170505  7.577059e-01           11.4377413
## TOR       -0.02988336 -1.6012229  3.2749397  2.807296e+00           13.7075678
## TORD      -1.12474365 -0.5275564 -0.8256994  5.001250e-01            4.2655501
## ORB        1.36027030  0.7450443 -1.7858408  9.184367e-01            6.3490918
## DRB       -0.64208963  2.5339079  0.8256994  3.991070e-01            5.4512906
## FTR        3.70598841 -0.8685080 -0.6327087 -3.916044e+00            8.7928130
## FTRD      -3.15168876  1.6971147 -0.7279924  1.725445e+00           11.3600602
## ADJ_T     -2.24144536 -0.3509110 -1.6373653  5.939417e-01           -0.3785405
## TWO_P_O    6.02674675 -1.0883789  2.0998026  2.707652e+00           10.4583652
## TWO_P_D   -0.64129740 -1.7372705  1.5465017 -2.617756e+00            7.4470449
## THREE_P_O  0.10088071  1.6847943 -1.8445339  7.802447e-17            5.2730020
## THREE_P_D -1.52158697  1.2034063 -1.6768545 -5.064996e-01           11.4262795
## WAB       18.24089413  5.6009920  4.5360391  1.052109e+01          101.4158902
## CONF       1.12956856  0.8643815 -0.4459112  2.452705e-01           17.7170519
##           MeanDecreaseGini
## ADJOE            13.473373
## ADJDE            10.179459
## BARTHAG          33.620990
## EFG_O             9.845677
## EFG_D             9.572318
## TOR              16.147928
## TORD             15.396427
## ORB              17.143303
## DRB              14.959902
## FTR              18.192945
## FTRD             17.117198
## ADJ_T            14.047700
## TWO_P_O          12.994126
## TWO_P_D          10.510833
## THREE_P_O        14.549332
## THREE_P_D        15.947099
## WAB             184.727023
## CONF             46.207794
# Random Forest: default mtry (floor(sqrt(18)) = 4 predictors considered at each split)
rf.cbb <- randomForest(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
                   TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
                   THREE_P_O + THREE_P_D + WAB + CONF, train_data, importance=T)
rf.cbb
##
## Call:
##  randomForest(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG +      EFG_O + EFG_D + TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T +      TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D + WAB + CONF, data = train_data,      importance = T)
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 4
##
##         OOB estimate of  error rate: 16.03%
## Confusion matrix:
##               No Tournament R68 R64 R32 S16 E8 F4 2ND Champions class.error
## No Tournament          1114   0  15   2   1  0  0   0         0  0.01590106
## R68                      12   0   4   0   0  0  0   0         0  1.00000000
## R64                      74   0  39  12   3  0  0   0         0  0.69531250
## R32                      13   0  17  21  10  2  0   0         1  0.67187500
## S16                       2   0   8  16   4  2  0   0         0  0.87500000
## E8                        0   0   4   7   4  1  0   0         0  0.93750000
## F4                        1   0   1   5   1  0  0   0         0  1.00000000
## 2ND                       0   0   0   2   0  2  0   0         0  1.00000000
## Champions                 0   0   0   3   0  1  0   0         0  1.00000000
importance(rf.cbb)
##           No Tournament        R68        R64        R32        S16         E8
## ADJOE         19.376366  1.4810319 -3.9703618  4.5258519  9.4644151  5.1730254
## ADJDE         18.412810 -3.1338359 -3.2681625  7.5141304  2.7649426  3.5695577
## BARTHAG       22.291706 -1.5690585 -5.7478830  8.1327102 16.6435303 10.3177051
## EFG_O         14.972747 -0.7469644  1.4069655 -1.7308101  2.2936453  2.1276984
## EFG_D         15.717949 -3.1313464 -2.5467335  4.8399441 -3.8100993  1.4704558
## TOR            9.743195  0.2038443 -4.2960570  8.9859818 -1.3329785  1.0242003
## TORD           6.614555 -0.7634266 -0.7423748  1.5962272  3.6991474  1.8105454
## ORB            9.282291  0.3818331 -1.8835782 -1.3932224 -1.0148649  1.8802359
## DRB            6.039617  0.7067129  2.2308905 -3.5572001 -0.6820759  0.5422508
## FTR            4.448567  2.8281370  2.7113231  0.1697197 -1.4589494  2.5916891
## FTRD          10.138136 -2.0914389 -0.4530209 -2.4241170  2.3954424 -0.7741202
## ADJ_T          1.166151  1.3584661 -0.5803824  0.6512515 -0.1853880 -1.3922390
## TWO_P_O       11.207764 -0.6061895  1.8266307 -2.7369705  2.4911775  5.9226190
## TWO_P_D       13.100090 -1.7363319 -5.4243638  1.1657276 -1.2384335  0.2844982
## THREE_P_O     11.660054  0.7389345  3.1101352  1.3531775 -1.0800993  1.6703342
## THREE_P_D     11.958875  0.8308385 -1.5940261  6.8665252 -3.0526045  0.5090595
## WAB           35.937206  0.3231533 20.3089868 16.7806926 16.8252984 12.1797288
## CONF          10.545908 -0.5163549  3.0979494  1.0373125  1.4133624  3.3157393
##                    F4        2ND  Champions MeanDecreaseAccuracy
## ADJOE     -0.47451002  0.2390594  6.3848645           20.0777614
## ADJDE      0.42665499  2.5921489  4.7048375           18.8956049
## BARTHAG    1.63609960  4.1007396  7.8566422           22.9286041
## EFG_O     -0.97242524  0.6110753  3.3169263           15.3048099
## EFG_D     -0.63270867  1.0386316  1.2937035           15.5460174
## TOR       -2.88910819  2.2923020  3.3749150           10.0343684
## TORD      -0.06694858 -1.5465017  1.5364267            6.6756427
## ORB       -1.19618889 -1.3462975 -2.2443539            6.8959072
## DRB       -0.65724770  0.8277367  0.0848195            5.1103273
## FTR       -0.65400526 -0.7019921 -1.6901371            4.9100954
## FTRD       0.76706912  1.1562432  1.7230039            9.0296243
## ADJ_T     -0.51019412  0.2773714 -0.2582161            0.6882219
## TWO_P_O   -2.68725303  1.9834152  2.1090080           11.1858823
## TWO_P_D   -1.20316125  2.3702273 -0.7596947           12.0377247
## THREE_P_O  0.70655905 -3.0376946 -0.2349010           11.5137192
## THREE_P_D  1.33390990 -1.7901449  0.2931304           11.7515077
## WAB       -0.69987003  2.8375186  6.7405407           39.8475749
## CONF       0.14318512  0.2425499  1.8197293           10.8591068
##           MeanDecreaseGini
## ADJOE             38.21148
## ADJDE             31.92614
## BARTHAG           62.23914
## EFG_O             18.63888
## EFG_D             19.87889
## TOR               19.56675
## TORD              16.63633
## ORB               16.33869
## DRB               14.67000
## FTR               18.06851
## FTRD              16.61739
## ADJ_T             16.25925
## TWO_P_O           17.94546
## TWO_P_D           15.28942
## THREE_P_O         17.06756
## THREE_P_D         16.25383
## WAB               85.44423
## CONF              34.04471
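The importance tables above can also be inspected graphically. As a minimal, self-contained sketch (fit on the built-in iris data as a stand-in, since the tournament data objects are defined elsewhere in this report), randomForest's varImpPlot draws the mean decrease in accuracy and Gini for each predictor; the same call applies directly to the fitted objects above, e.g. varImpPlot(rf.cbb):

```r
# Demo only: iris is a stand-in dataset, not the tournament data
library(randomForest)

set.seed(1)
rf.demo <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Plots MeanDecreaseAccuracy and MeanDecreaseGini side by side
varImpPlot(rf.demo, main = "Variable Importance (demo data)")
```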

Appendix 7.1. Prediction

# Testing Error - Bagging
bag.pred_test <- predict(bag.cbb, test_data_19, type = "class")
bag.table <- table(bag.pred_test, as_vector(test_data_19[,"POSTSEASON"]))
bag.error <- numeric(dim(bag.table)[1])
for(i in 1:dim(bag.table)[1]){
  bag.error[i] = round(((1-(bag.table[i,i])/(sum(bag.table[,i])))*100), 4)
}
bag.error.table <- data.frame(names(bag.table[,1]), bag.error)
colnames(bag.error.table) <- c("Round", "% Error")

# Testing Error - Random Forest
rf.pred_test <- predict(rf.cbb, test_data_19, type = "class")
rf.table <- table(rf.pred_test, as_vector(test_data_19[,"POSTSEASON"]))
rf.error <- numeric(dim(rf.table)[1])
for(i in 1:dim(rf.table)[1]){
  rf.error[i] = round(((1-(rf.table[i,i])/(sum(rf.table[,i])))*100), 4)
}
rf.error.table <- data.frame(names(rf.table[,1]), rf.error)
colnames(rf.error.table) <- c("Round", "% Error")
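The per-class testing error computed by the loops above is one minus the diagonal of the confusion table divided by its column sums (the true-class totals), i.e. 1 - recall for each round. A tiny self-contained illustration of the same arithmetic, using toy labels rather than the tournament data:

```r
# Toy example: per-class error = 1 - diag/colSums on a table(pred, truth)
truth <- factor(c("R64", "R64", "R64", "R32", "R32", "S16"))
pred  <- factor(c("R64", "R64", "R32", "R32", "S16", "S16"),
                levels = levels(truth))
tab <- table(pred, truth)

# Vectorized equivalent of the for-loops used above
err <- round((1 - diag(tab) / colSums(tab)) * 100, 4)
err
# R32: 50% misclassified, R64: 33.3333%, S16: 0%
```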

Appendix 7.2. Error Tables (%)

knitr::kable(bag.error.table)
Round           % Error
No Tournament    1.4035
R68            100.0000
R64             65.6250
R32             81.2500
S16             87.5000
E8              50.0000
F4             100.0000
2ND            100.0000
Champions        0.0000
knitr::kable(rf.error.table)
Round           % Error
No Tournament    1.0526
R68            100.0000
R64             75.0000
R32             81.2500
S16            100.0000
E8              50.0000
F4             100.0000
2ND            100.0000
Champions      100.0000

Appendix 8. Model 5: Classification Tree - Round Selection Given Already in Tournament

# Bagging on tournament teams only (mtry = 18, all predictors considered at every split)
bag.cbb_trmnt <- randomForest(POSTSEASON ~ ADJOE + ADJDE + BARTHAG + EFG_O + EFG_D + TOR +
                   TORD + ORB + DRB + FTR + FTRD + ADJ_T + TWO_P_O + TWO_P_D +
                   THREE_P_O + THREE_P_D + WAB + CONF, train_given_trnmt, mtry=18, importance=T)
bag.cbb_trmnt
##
## Call:
##  randomForest(formula = POSTSEASON ~ ADJOE + ADJDE + BARTHAG +      EFG_O + EFG_D + TOR + TORD + ORB + DRB + FTR + FTRD + ADJ_T +      TWO_P_O + TWO_P_D + THREE_P_O + THREE_P_D + WAB + CONF, data = train_given_trnmt,      mtry = 18, importance = T)
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 18
##
##         OOB estimate of  error rate: 47.79%
## Confusion matrix:
##           2ND Champions E8 F4 R32 R64 R68 S16 class.error
## 2ND         0         1  2  0   1   0   0   0    1.000000
## Champions   0         0  1  0   3   0   0   0    1.000000
## E8          0         0  2  1   5   4   0   4    0.875000
## F4          0         0  1  0   4   1   0   2    1.000000
## R32         0         2  2  0  25  28   0   7    0.609375
## R64         0         0  0  0  12 110   1   5    0.140625
## R68         0         0  0  0   1  15   0   0    1.000000
## S16         0         0  3  0  15   9   0   5    0.843750
importance(bag.cbb_trmnt)
##                   2ND  Champions         E8         F4        R32        R64
## ADJOE     -1.76154909  3.8962000  0.6218487 -1.0865020 -1.8490426  5.9537284
## ADJDE     -0.61107528  2.7903499  0.4823857 -1.9792418  2.4257830  6.8499007
## BARTHAG    2.62098713 10.5164978  9.5398510  1.8129054 -0.8812557 34.3454209
## EFG_O     -0.37801848 -0.2144324  1.3066682 -1.4791500 -0.2706611  2.7474807
## EFG_D     -1.00100150  0.0000000 -1.0369149 -0.8546145 -0.7236456  2.1784028
## TOR        0.77506185  2.4282316  0.4901042 -0.1527105  4.8421345  4.7848683
## TORD      -2.19329605  0.2294278 -0.9493571 -0.2534856  1.7963168  2.5938395
## ORB       -0.81704146 -0.2760473  1.2761118  1.1622040 -0.9982349 -0.8652428
## DRB        1.73727046 -1.6713157  0.1401045  1.4699235 -1.8634389 -1.3863785
## FTR        0.35607964 -1.0010015  1.4384137  1.3518013  1.7406022  0.5130969
## FTRD       1.57532347 -0.3599027  1.0825839 -0.1541351  0.7430720  3.3912628
## ADJ_T     -2.53979942 -1.0240638 -0.5482276 -1.1482325 -2.3326360 -1.1190747
## TWO_P_O    0.00000000  3.7445511  2.7560859 -2.2570553 -2.5171149  2.9150121
## TWO_P_D   -1.00100150 -2.0080483  1.9808181 -2.1556531  0.2129333  2.4730054
## THREE_P_O -1.63736531 -2.0828173 -0.4326900  0.1235623  2.0929621  3.9321661
## THREE_P_D -0.51846036  0.9211386  0.1636098  1.6493015  5.8643234  6.4408576
## WAB       -0.09901573  3.6533577  6.5717247 -0.4581846 -6.2063789 11.7102947
## CONF       0.65493442 -0.2731359  1.8779570 -0.2241620  1.8893849  1.6723992
##                   R68        S16 MeanDecreaseAccuracy MeanDecreaseGini
## ADJOE     -0.03512634 -2.8437507           3.71974753         6.055142
## ADJDE      3.78575477 -2.3906729           6.59445674         8.071508
## BARTHAG    5.17445644  9.4018992          32.01883721        39.648707
## EFG_O     -1.24454845 -0.5641005           1.62062571         5.712309
## EFG_D     -2.29250102 -1.3904508           0.04543124         5.126351
## TOR        0.77579571 -5.3987485           4.10235212         9.666950
## TORD      -3.01446696  2.6874380           2.60708401         9.658453
## ORB       -0.65407709 -3.1681778          -2.38353726         8.804123
## DRB        2.75195067 -1.6945776          -1.41904265         6.565659
## FTR       -0.60650086 -1.6307059           0.83076517         7.944541
## FTRD       0.71652811 -0.5686724           2.71390069         9.460938
## ADJ_T      0.74415242 -1.5720448          -2.79797646         6.759103
## TWO_P_O   -0.31144054  1.0295151           1.83511565         7.505399
## TWO_P_D   -1.76817451 -1.9906925           0.77493038         5.342501
## THREE_P_O  0.28480341 -0.2274760           3.38052282         8.348564
## THREE_P_D  3.61867620 -2.8464933           7.74081983        11.918373
## WAB        5.12101132  0.6466023          10.91408080        12.469167
## CONF      -0.04941686 -1.5103443           2.42194686        21.304169

Appendix 8.1. Prediction

# Testing Error - Bagging
bag1.pred_test <- predict(bag.cbb_trmnt, test_given_trnmt_19, type = "class")
bag1.table <- table(bag1.pred_test, as_vector(test_given_trnmt_19[,"POSTSEASON"]))
bag1.error <- numeric(dim(bag1.table)[1])
for(i in 1:dim(bag1.table)[1]){
  bag1.error[i] = round(((1-(bag1.table[i,i])/(sum(bag1.table[,i])))*100), 4)
}
bag1.error.table <- data.frame(names(bag1.table[,1]), bag1.error)
colnames(bag1.error.table) <- c("Round", "% Error")

Appendix 8.2. Error Table (%)

knitr::kable(bag1.error.table)
Round       % Error
2ND          100.00
Champions      0.00
E8            50.00
F4           100.00
R32           81.25
R64           18.75
R68           75.00
S16           87.50

Appendix 9. Full Error Table (%)

# Model 1: Binomial
binom.table.error <- xtabs( ~ train_data$TRNMT + predout)
binom.error = round((1-(binom.table.error[1,1]+binom.table.error[2,2])/(sum(binom.table.error)))*100, 4)
binom.error.full <- c(binom.error, rep(NA, 8))
# Testing Error classification rate
binom.table.error.test <- xtabs( ~ test_data_19$TRNMT + predout_test)
binom.error.test = round((1-(binom.table.error.test[1,1]+binom.table.error.test[2,2])/(sum(binom.table.error.test)))*100, 4)
binom.error.test.full <- c(binom.error.test, rep(NA, 8))

# Model 2: Multi (the error vectors mmod1.error and mmod1.error.test were computed earlier)

# Model 3: Multi
mmod3.error.full <- c(NA, mmod3.error)
mmod3.error.test.full <- c(NA, mmod3.error.test)

# Model 4: RF
bag.error.train <- round(bag.cbb$confusion[,"class.error"]*100, 4)
rf.error.train <- round(rf.cbb$confusion[,"class.error"]*100, 4)


# Model 5: RF
bag1.error.train <- round(bag.cbb_trmnt$confusion[,"class.error"]*100, 4)
bag1.error.train.full <- c(NA, bag1.error.train)
bag1.error.full <- c(NA, bag1.error)

# Final Table
mini.error.table <- data.frame(mmod3.error.full, mmod3.error.test.full, bag1.error.train.full, bag1.error.full)
mini.error.table <- mini.error.table[c(1, 8, 7, 6, 9, 4, 5, 2, 3),]               # correcting order to match other data
full.error.table1 <- data.frame(binom.error.full, binom.error.test.full,
                                  mmod1.error, mmod1.error.test, bag.error.train,
                                  bag.error)
full.error.table2 <- data.frame(rf.error.train, rf.error, mini.error.table)
names(full.error.table1) <- c("Binom. Train", "Binom. Test", "Multi. Train", "Multi. Test",
                             "Bag Train", "Bag Test")
names(full.error.table2) <- c("RF Train", "RF Test", "Sp. Multi. Train",
                             "Sp. Multi. Test", "Sp. Bag Train", "Sp. Bag Test")

knitr::kable(full.error.table1)
               Binom. Train  Binom. Test  Multi. Train  Multi. Test  Bag Train  Bag Test
No Tournament       12.5356      12.7479        1.5901       1.7544     1.8551    1.4035
R68                      NA           NA      100.0000     100.0000   100.0000  100.0000
R64                      NA           NA       65.6250      75.0000    65.6250   65.6250
R32                      NA           NA       54.6875      75.0000    71.8750   81.2500
S16                      NA           NA       78.1250      75.0000    84.3750   87.5000
E8                       NA           NA       68.7500      75.0000    87.5000   50.0000
F4                       NA           NA      100.0000     100.0000   100.0000  100.0000
2ND                      NA           NA        0.0000     100.0000   100.0000  100.0000
Champions                NA           NA        0.0000     100.0000   100.0000    0.0000
knitr::kable(full.error.table2)
               RF Train   RF Test  Sp. Multi. Train  Sp. Multi. Test  Sp. Bag Train  Sp. Bag Test
No Tournament    1.5901    1.0526                NA               NA             NA            NA
R68            100.0000  100.0000           81.2500             50.0       100.0000         75.00
R64             69.5312   75.0000           13.2812             25.0        14.0625         18.75
R32             67.1875   81.2500           53.1250             50.0        60.9375         81.25
S16             87.5000  100.0000           75.0000             62.5        84.3750         87.50
E8              93.7500   50.0000           75.0000             75.0        87.5000         50.00
F4             100.0000  100.0000           75.0000            100.0       100.0000        100.00
2ND            100.0000  100.0000            0.0000            100.0       100.0000        100.00
Champions      100.0000  100.0000            0.0000              0.0       100.0000          0.00

Appendix 10. 2020 March Madness Predictions

bag.pred_test_20 <- predict(bag.cbb, test_data_20, type = "class")
mmod.pred_test_20 <- predict(mmod1, test_data_20, type = "class")
final_20 <- data.frame(test_data_20, bag.pred_test_20, mmod.pred_test_20)

summary(final_20[,c("bag.pred_test_20", "mmod.pred_test_20")])
##       bag.pred_test_20     mmod.pred_test_20
##  No Tournament:304     No Tournament:309
##  R64          : 26     R32          : 26
##  R32          : 20     R64          : 15
##  E8           :  2     E8           :  2
##  S16          :  1     2ND          :  1
##  R68          :  0     R68          :  0
##  (Other)      :  0     (Other)      :  0

Appendix 10.1. Late Round Predictions

final_20[which(final_20$bag.pred_test_20=="E8"), c("TEAM", "bag.pred_test_20")]
##      TEAM bag.pred_test_20
## 1  Kansas               E8
## 3 Gonzaga               E8
final_20[which(final_20$mmod.pred_test_20=="E8"), c("TEAM", "mmod.pred_test_20")]
##      TEAM mmod.pred_test_20
## 1  Kansas                E8
## 3 Gonzaga                E8
final_20[which(final_20$mmod.pred_test_20=="2ND"), c("TEAM", "mmod.pred_test_20")]
##     TEAM mmod.pred_test_20
## 4 Dayton               2ND

Appendix 10.2. Big Ten Predictions

final_20[which(final_20$CONF=="B10"), c("TEAM", "bag.pred_test_20", "mmod.pred_test_20")]
##             TEAM bag.pred_test_20 mmod.pred_test_20
## 5   Michigan St.              S16               R32
## 8       Ohio St.              R32               R64
## 14      Michigan              R32               R32
## 15      Penn St.              R32               R32
## 19     Wisconsin              R32               R32
## 23        Purdue    No Tournament     No Tournament
## 26      Maryland              R32               R32
## 27     Minnesota    No Tournament     No Tournament
## 29      Illinois              R32               R32
## 30       Rutgers              R32               R32
## 31          Iowa              R64               R64
## 36       Indiana              R64               R64
## 116 Northwestern    No Tournament     No Tournament
## 159     Nebraska    No Tournament     No Tournament
options(op)

Last Updated: 08/31/2020

Thanks for Reading!

I hope you enjoyed it. Have a great day.