Applied Bracketeering, 2018: Streaky Clean

By: Richard W. Sharp. Graphics: Patrick W. Zimmerman.

Before the nets come down, the brackets must go up!

And soon: the final play-in games have just ended, and tomorrow the big dance begins. More importantly, I only had until 10:00pm Pacific to put my money where my mouth is, submit a bracket, and hope to get lucky. [1]

Crunch the numbers, computer! Crunch like you’ve never crunched before!

[Image: Bracket deadline at The Bounty. Caption: The Bounty does not accept late submissions.]

The Model: Polls + Hot Finish + Rando

And I will get lucky sooner or later. That’s the beauty of a randomized model, and it will be sooner, not later, if I’ve done it right. We tried this last year, adding some noise to the result of the polls in order to cause some (not too crazy) upsets, and the results were pretty good for rando. However, noise only “causes” upsets in a loose sense: an upset produced by the model has no true cause other than luck.
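To make the noise idea concrete, here is a minimal sketch of a single noisy matchup. It is not our actual code: the Gaussian noise, the ratings of 0.9 and 0.7, and the function names `noisy_winner` and `upset_rate` are all assumptions made for illustration. It only shows how turning up the noise strength turns up the upset rate.

```python
import random

def noisy_winner(rating_a, rating_b, noise_strength, rng):
    """Return 'a' or 'b' after adding Gaussian noise to each rating.

    Gaussian noise is an assumption; any zero-mean noise would illustrate the point.
    """
    score_a = rating_a + rng.gauss(0.0, noise_strength)
    score_b = rating_b + rng.gauss(0.0, noise_strength)
    return "a" if score_a >= score_b else "b"

def upset_rate(rating_fav, rating_dog, noise_strength, trials=10_000, seed=0):
    """Fraction of simulated games in which the lower-rated team wins."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    upsets = sum(
        noisy_winner(rating_fav, rating_dog, noise_strength, rng) == "b"
        for _ in range(trials)
    )
    return upsets / trials

# Hypothetical ratings: favorite 0.9, underdog 0.7.
quiet = upset_rate(0.9, 0.7, noise_strength=0.05)
loud = upset_rate(0.9, 0.7, noise_strength=0.15)
no_noise = upset_rate(0.9, 0.7, noise_strength=0.0)
```

With no noise the favorite always wins, and the louder setting produces noticeably more upsets than the quiet one. None of those upsets has a cause beyond the dice.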

This year we hope to improve on that by adding an actual causal feature: how did a team finish? We hypothesize that a team on a winning streak late in the season has strength going into the tournament: their stars are healthy, the bench is deep, and they click. The feature we used is each team’s win percentage over its last several games before the tournament (we tried both the last 6 and the last 10). Now this won’t help out any of the top seeds: they just won their conference tournaments, so of course they finished strong! But it does have the potential to boost a lurking 11 seed that had a poor start to the season.
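The finish feature is simple enough to sketch in a few lines. This is an illustration rather than our production code: the function name `finish_score` and the game log below are made up, and results are assumed to be stored oldest first as booleans.

```python
def finish_score(results, n_games=10):
    """Win fraction over the last n_games results (True = win, oldest first)."""
    recent = results[-n_games:]
    if not recent:
        return 0.0  # no games on record: treat as no finish bonus
    return sum(recent) / len(recent)

# Hypothetical team: 8 wins in its last 10 games, including the last 6 in a row.
game_log = [True, False, True, False, True, True, True, True, True, True]

last_10 = finish_score(game_log, n_games=10)
last_6 = finish_score(game_log, n_games=6)
```

The same team looks hotter over the shorter window, which is why the window length is worth varying.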

The finish feature is combined with the other part of our model: our poll of polls. Using historical weekly polling data from the second half of each season between 2010 and 2017, we trained a composite model to predict each team’s poll position for the following week. This is the consensus position we start from before adding in the finish and random features. It’s our version of playing chalk.

Finally, we added an injury feature. In a hurry, last night. Following reports that No. 1 Virginia’s guard De’Andre Hunter is out for the tourney with a broken wrist, we decided we needed a quick fix. We kept it simple and set Virginia’s finish score to 0. They’re still highly rated due to their position in the polls, but this opens the door for red-hot Cincinnati, which has won 8 of its last 10 games, including all of its last 6.
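Here is one way the pieces could fit together, offered as a sketch under loud assumptions. The post does not spell out how the features are combined, so the additive formula, the Gaussian noise, the function name `team_score`, and every poll and finish number below are hypothetical. The only part taken from the text is the penalty itself: a penalized team has its finish score set to 0.

```python
import random

def team_score(poll_score, finish, finish_strength=0.1,
               noise_strength=0.15, penalized=False, rng=None):
    """Assumed additive score: polls + weighted finish + random noise."""
    rng = rng or random.Random()
    if penalized:
        finish = 0.0  # the quick injury fix: wipe out the finish bonus
    noise = rng.gauss(0.0, noise_strength) if noise_strength > 0 else 0.0
    return poll_score + finish_strength * finish + noise

# Made-up inputs, with noise switched off to isolate the penalty.
uva_healthy = team_score(1.00, 0.9, noise_strength=0.0)
uva_penalized = team_score(1.00, 0.9, noise_strength=0.0, penalized=True)
cincinnati = team_score(0.95, 1.0, noise_strength=0.0)
```

With these invented numbers, the penalty is enough to drop Virginia below Cincinnati while leaving it highly rated on polls alone.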

The Brackets

Less talk, more brackets! Well, here they are, published and timestamped on GitHub. We ran several scenarios by choosing combinations of the following factors:

  • Noise strength: 0.05, 0.15
  • Finish strength: 0.1
  • Finish length: 6 games, 10 games
  • UVA finish penalty: yes, no
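Enumerating combinations of the factors above is a one-liner with `itertools.product`. This is a sketch only: the dictionary keys are names invented for this example, not the ones used in our scripts.

```python
from itertools import product

# Factor values as listed above.
noise_strengths = [0.05, 0.15]
finish_strengths = [0.1]
finish_lengths = [6, 10]      # games in the finish window
uva_penalty = [True, False]

# One dict per combination; key names are hypothetical.
scenarios = [
    {"noise": n, "finish_strength": f, "finish_length": s, "uva_penalty": p}
    for n, f, s, p in product(noise_strengths, finish_strengths,
                              finish_lengths, uva_penalty)
]

for scenario in scenarios:
    print(scenario)
```

The full grid of these four factors has 8 combinations. The no-finish and pure poll-of-polls brackets sit outside this grid, since they drop the finish feature entirely.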

For each scenario that includes the random feature, we ran 200 trials. We want a model that can generate great brackets some of the time. This means we accept the fact that it’s going to produce stinkers too. We’re here to win, not to play chalk.
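Running many trials of a randomized bracket can be sketched as below. This is a toy, not our pipeline: the four-team field, its ratings, the per-game Gaussian noise, and the names `play_bracket` and `run_trials` are all assumptions. The field size must be a power of two for the pairing to work.

```python
import random
from collections import Counter

def play_bracket(ratings, noise_strength, rng):
    """Play a single-elimination bracket in listed order; return the champion."""
    field = list(ratings)
    while len(field) > 1:
        next_round = []
        # Pair off neighbors: (1st, 2nd), (3rd, 4th), ...
        for a, b in zip(field[0::2], field[1::2]):
            score_a = ratings[a] + rng.gauss(0.0, noise_strength)
            score_b = ratings[b] + rng.gauss(0.0, noise_strength)
            next_round.append(a if score_a >= score_b else b)
        field = next_round
    return field[0]

def run_trials(ratings, noise_strength, trials=200, seed=0):
    """Tally champions over many randomized brackets."""
    rng = random.Random(seed)
    return Counter(play_bracket(ratings, noise_strength, rng)
                   for _ in range(trials))

# Hypothetical four-team field.
ratings = {"A": 0.9, "B": 0.6, "C": 0.8, "D": 0.7}
champions = run_trials(ratings, noise_strength=0.15)
chalk = run_trials(ratings, noise_strength=0.0)
```

With the noise off, every trial is the same chalk bracket. With it on, the tally spreads across the field, stinkers included, which is the price of occasionally producing a great bracket.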

Our 6 scenarios

Click on each image to go to the full bracket

Loud & small finish (0.15 noise, last 6 games)

Loud & big finish (0.15 noise, last 10 games)

Rando Calrissian (0.15 noise, finish not considered)

Quiet & small finish (0.05 noise, last 6 games)

Quiet & big finish (0.05 noise, last 10 games)

Poll of Polls (The Chalk, for our purposes)

What’s next?

Just like last year, we’ll be checking in after the opening round of games, the Sweet 16, and the Final Four to see how our various models performed vs. reality and its maddeningly small sample size.

[1] That would be bracket_a01_b15_s10_pen_0003.csv, available here, for the curious.

About The Author

Richard is a Seattle area data scientist who builds predictive models and the services that deliver them. He earned a PhD in Applied and Computational Math from Princeton University, and left academia for the dark side of science (industry) in 2010, following his wife to the land of flannel. Fan of coffee, beer, backpacking and puns. Enjoys a day on the lake fishing, and, better, cooking up the catch for a crowd.
