Faculty voice:

Anjana Susarla: Analytics behind March Madness bracketology

March 14, 2018

Anjana Susarla is an associate professor in the Department of Accounting and Information Systems in the College of Business.

It has been estimated that at least a quarter of American adults fill in a bracket for March Madness. Yet given the range of possible match-ups over the weeks of the tournament, the odds of randomly filling out a perfect NCAA tournament bracket have been put at about one in nine quintillion.
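The nine-quintillion figure can be reproduced with simple arithmetic. The sketch below assumes a 64-team bracket with 63 games (play-in games ignored) and treats every pick as a coin flip; the function name and the 70 percent accuracy used for comparison are illustrative assumptions, not figures from any source cited here.

```python
def brackets_needed(num_games: int = 63, pick_accuracy: float = 0.5) -> float:
    """Return 1 / P(perfect bracket) when every pick is right with the same probability."""
    return 1.0 / (pick_accuracy ** num_games)


# Exact number of distinct ways to fill out 63 win-or-lose games.
COIN_FLIP_BRACKETS = 2 ** 63

if __name__ == "__main__":
    print(f"Coin-flip odds: 1 in {COIN_FLIP_BRACKETS:,}")
    # Assumed 70% accuracy per game, purely for comparison.
    print(f"At 70% per pick: 1 in {brackets_needed(63, 0.7):,.0f}")
```

Even a modest improvement in per-game accuracy shrinks the odds by many orders of magnitude, which is why informed picks matter.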

Every year, statisticians and mathematicians use quantitative insights to predict winning teams and brackets. These can draw on a variety of metrics, such as the relative frequency, or rarity, of upset victories. For instance, the popular website BracketOdds analyzed data and found that, until now, a team with a No. 16 seed has never beaten a No. 1 seed in the round of 64. Models like these help statisticians make predictions, but they are not perfect.

While there are a number of mathematical models, Masseyratings contends that three metrics matter most when filling out your bracket: the score, location and date of every game.

An alternative strategy bracketologists take is to look at key elements that correlate most closely to winning: shooting, turnovers, rebounding and free throw proficiency.
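These four elements are often quantified with the "Four Factors" popularized by basketball analyst Dean Oliver. The sketch below uses the commonly published formulas; the class, the field names and the sample numbers are hypothetical, and some sites define free throw rate with attempts rather than makes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TeamBoxScore:
    fgm: int          # field goals made
    fga: int          # field goals attempted
    three_pm: int     # three-pointers made
    ftm: int          # free throws made
    fta: int          # free throws attempted
    turnovers: int
    off_reb: int      # offensive rebounds
    opp_def_reb: int  # opponent's defensive rebounds


def four_factors(box: TeamBoxScore) -> Dict[str, float]:
    """Compute shooting, turnover, rebounding and free throw rates for one team."""
    plays = box.fga + 0.44 * box.fta + box.turnovers
    return {
        "efg_pct": (box.fgm + 0.5 * box.three_pm) / box.fga,       # shooting
        "tov_pct": box.turnovers / plays,                          # turnovers
        "orb_pct": box.off_reb / (box.off_reb + box.opp_def_reb),  # rebounding
        "ft_rate": box.ftm / box.fga,                              # free throws
    }


# Made-up box score, for illustration only.
SAMPLE = TeamBoxScore(fgm=25, fga=50, three_pm=10, ftm=15, fta=20,
                      turnovers=10, off_reb=10, opp_def_reb=30)

if __name__ == "__main__":
    for name, value in four_factors(SAMPLE).items():
        print(f"{name}: {value:.3f}")
```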

With a revolution underway in artificial intelligence and machine learning, and with computing power available through the cloud, anyone can build predictive models that adjust the metrics above. For instance, Google Cloud, in partnership with the NCAA, allows anyone to ask questions such as, "Do players dunk more if they have 50,000 followers?" and answer them using machine learning. Machine learning can also be used to classify "upsets."

In other words, it is possible to take a predictive model with a few key metrics and extend it to incorporate additional data sources and quantitative insights. Such predictions can be further refined by aggregating multiple forecasts, which is the approach chosen by Team Rankings.
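Aggregating forecasts can be as simple as a weighted average of win probabilities. The sketch below is one such approach, not a description of how Team Rankings actually combines its inputs; the source names, probabilities and weights are all hypothetical.

```python
from typing import Dict, Optional


def aggregate_forecasts(forecasts: Dict[str, float],
                        weights: Optional[Dict[str, float]] = None) -> float:
    """Combine several win-probability forecasts into one weighted average."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    if weights is None:
        # Default: trust every source equally.
        weights = {name: 1.0 for name in forecasts}
    total_weight = sum(weights[name] for name in forecasts)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(p * weights[name] for name, p in forecasts.items())
    return weighted_sum / total_weight


# Hypothetical probabilities that Team A beats Team B, one per source.
EXAMPLE = {"ratings_model": 0.62, "betting_market": 0.70, "public_picks": 0.78}

if __name__ == "__main__":
    print(f"Equal weights: {aggregate_forecasts(EXAMPLE):.2f}")
    trusted = {"ratings_model": 2.0, "betting_market": 1.0, "public_picks": 1.0}
    print(f"Ratings model counted twice: {aggregate_forecasts(EXAMPLE, trusted):.2f}")
```

Averaging tends to smooth out the quirks of any single model, though the choice of weights is itself a modeling decision.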

Once the NCAA announces the seeds, a ranking of the teams in the tournament, we can use historical data on seeded teams, along with probabilistic information from betting markets, and combine this with data from sites like Yahoo, where millions of people submit their bracket picks. Using genetic algorithms, Team Rankings tries to pick winners by pitting them against a series of simulated pools and tournament outcomes.
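To give a feel for what simulating tournament outcomes involves, here is a minimal Monte Carlo sketch of one 16-team region. It is a simpler stand-in for the simulation step, not the genetic algorithm that Team Rankings uses. The logistic win-probability function and its `scale` parameter are assumptions made for illustration and are not fitted to historical results.

```python
import math
import random
from collections import Counter
from typing import Dict, List


def win_probability(seed_a: int, seed_b: int, scale: float = 0.15) -> float:
    """Hypothetical model: the better (lower) seed is favored more as the gap grows."""
    return 1.0 / (1.0 + math.exp(-scale * (seed_b - seed_a)))


def simulate_region(seeds: List[int], rng: random.Random) -> int:
    """Play one single-elimination bracket; adjacent entries meet each round."""
    alive = list(seeds)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            next_round.append(a if rng.random() < win_probability(a, b) else b)
        alive = next_round
    return alive[0]


def champion_frequencies(seeds: List[int], trials: int = 10_000,
                         rng_seed: int = 0) -> Dict[int, float]:
    """Estimate how often each seed wins the region across many simulated runs."""
    if len(seeds) < 2 or len(seeds) & (len(seeds) - 1):
        raise ValueError("bracket size must be a power of two")
    rng = random.Random(rng_seed)
    wins = Counter(simulate_region(seeds, rng) for _ in range(trials))
    return {s: wins[s] / trials for s in sorted(wins)}


# Standard regional pairing order: 1v16, 8v9, 5v12, 4v13, 6v11, 3v14, 7v10, 2v15.
REGION = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]

if __name__ == "__main__":
    for seed, share in champion_frequencies(REGION).items():
        print(f"No. {seed} seed wins the region in {share:.1%} of simulations")
```

In a fuller version, the win probabilities would come from historical seed data, betting markets or the aggregated forecasts described above rather than from a fixed formula.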

The data suggest that Kansas is a highly beatable No. 1 seed in the 2018 NCAA March Madness tournament. Bracketologists from SportingNews say the Jayhawks look primed for an early exit yet again, ranking 34th in 4FS, doomed by virtually zero production from the free throw line and poor rebounding. The same data set projects that our MSU Spartans will get to the Elite 8 and fall to Duke, and that Cincinnati will end up taking the national championship.

You might have only a one-in-nine-quintillion chance of getting a perfect bracket, but following some data science and predictive modeling might get you a few steps closer.