By: Richard W. Sharp. Graphics: Patrick W. Zimmerman.
Before the nets come down, the brackets must go up!
And soon: the final play-in games have just ended, and tomorrow the big dance begins. More importantly, I only had until 10:00pm Pacific to put my money where my mouth is, submit a bracket, and hope to get lucky.1
Crunch the numbers, computer! Crunch like you’ve never crunched before!
The Bounty does not accept late submissions.
The Model: Polls + Hot Finish + Rando
And I will get lucky sooner or later, that’s the beauty of a randomized model, and it will be sooner, not later, if I’ve done it right. Now we tried this last year, adding some noise to the result of the polls in order to cause some (not too crazy) upsets, and the results were pretty good for rando. However, the reason for adding noise the model is that it can “cause” upsets, or rather, an upset from the model has no true cause other than luck.
This year we hope to improve on that by adding an actual causal feature: how did a team finish? We hypothesize that a team on a winning streak late in the season has strength going into the tournament: their stars are healthy, the bench is deep, and they click. The feature we used is each team’s win percentage over the last games before the tournament. Now this won’t help out any of the top seeds: they just won their conference tournaments, so of course they finished strong! But it does have potential to boost a lurking 11 seed that had a poor start to the season.
The finish feature is combined with the another part of our model: your poll of polls. Using historical weekly polling data from the second half of the seasons between 2010 and 2017, we trained a composite model to predict the poll position of each team next week. This is the consensus position we start from before adding in finish and random features. It’s our version of playing chalk.
Finally, we added an injury feature. In a hurry, last night. Following reports that No 1 Virginia’s guard De’Andre Hunter is out for the tourney with a broken wrist we decided we needed a quick fix. We kept it simple here and simply set Virginia’s finish score to 0. They’re still highly rated due to their position in the polls, but this opens the door for red hot Cincinnati which has won 8 of the last 10 games including 6 of the last 6.
The Brackets
Less talk, more brackets! Well here they are, published and timestamped on github. We ran several scenarios by choosing combinations of the following factors
- Noise strength: 0.05, 0.15
- Finish strength: 0.1
- Finish length: 6 games, 10 games
- UVA finish penalty: yes, no
For each scenario that includes the random feature, we ran 200 trials. We want a model that can generate great brackets some of the time. This means we accept the fact that it’s going to produce stinkers too. We’re here to win, not to play chalk.
Our 6 scenarios
Click on each image to go to the full bracket
Loud & small finish (0.15 noise, last 6 games)
Loud & big finish (0.15 noise, last 10 games)
Rando Calrissian (0.15 noise, finish not considered)
Quiet & small finish (0.05 noise, last 6 games)
Quiet & big finish (0.05 noise, last 10 games)
Poll of Polls (The Chalk, for our purposes)
What’s next?
Just like last year, we’ll be checking in after the opening round of games, the Sweet 16, and the Final Four to see how our various models performed vs reality and it’s maddeningly small sample size.
Notes:
1 That would be bracket_a01_b15_s10_pen_0003.csv, available here, for the curious.^
No Comments on "Applied Bracketeering, 2018: Streaky Clean"