We Made 10 Million Computer-Generated March Madness Brackets. Will Any of Them Be Perfect?

There’s a secret to winning your March Madness pool that Vegas doesn’t want you to know: All you have to do is enter 9,223,372,036,854,775,808 brackets, and you are certain to pick perfectly. Once.

There are 63 games in the NCAA tournament, not including the four play-ins. Filling out a bracket involves making an up-or-down decision for each one. Of course, after the first round of 32 games, not everyone guesses the outcome of the same remaining 31 matchups since it depends on who you’ve predicted will advance.

This means there are, in fact, 9 quintillion — hold on, let me count those digits again — yes, 9 quintillion possible outcomes for the tournament, or 2^63. By way of context, that’s 1.2 billion outcomes per human being on Earth, or 21 times the number of seconds since the Big Bang. Even if you reduced every bracket to just 63 bits, the size of a computer file containing every possible outcome would be about 72,500 petabytes, which is many times larger than the Internet itself, by most estimates.
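For anyone who wants to check that arithmetic, here is a quick Python sketch. The world population and the age of the universe are round-number assumptions supplied for illustration, not figures from this article, so the results should land near the numbers above rather than match them exactly.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumed inputs (not from the article): world population, age of the universe.
GAMES = 63
WORLD_POPULATION = 7.7e9        # assumption: roughly 7.7 billion people
UNIVERSE_AGE_YEARS = 13.8e9     # assumption: roughly 13.8 billion years
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# One up-or-down decision per game gives 2^63 distinct brackets.
outcomes = 2 ** GAMES
per_person = outcomes / WORLD_POPULATION
seconds_since_big_bang = UNIVERSE_AGE_YEARS * SECONDS_PER_YEAR
ratio_to_seconds = outcomes / seconds_since_big_bang

# 63 bits per bracket, 8 bits per byte, 10**15 bytes per petabyte.
petabytes = outcomes * GAMES / 8 / 1e15

print(f"{outcomes:,} possible brackets")
print(f"{per_person / 1e9:.1f} billion brackets per person")
print(f"{ratio_to_seconds:.0f} times the seconds since the Big Bang")
print(f"{petabytes:,.0f} petabytes to store them all")
```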

Which is not to say we can’t still try to crack March Madness with what computer scientists call a “brute-force attack,” in which one tries to solve a problem by testing every possible solution rather than gaming it out methodically. (Like, say, cracking a password by trying thousands and thousands of possibilities until one works.) So rather than making any attempt to fill out a bracket wisely, I stayed up one night writing a short program that generates about 1,000 brackets a second, weighted slightly toward plausibility. By noon the next day I had over 10 million entries, none of them the same.

As the tournament progresses, I’ll be scoring each of these March Madness brackets and comparing the best-performing entry to public data on how the nation is faring on popular bracket sites. I’m genuinely curious whether any of my entries will stand out or whether even a few million attempts are insufficient to break through the noise. (We’re not putting our money where our algorithmically generated mouth is, of course, seeing as $10 million in entry fees might raise a few questions with the expense department.)

I suspect the odds would tilt slightly more in my favor if I had carved out more time to bolster the code with data from the regular season. College basketball dilettantes like USA Today’s Jeff Sagarin have developed techniques over the years that take into account all sorts of factors, like the difficulty of a team’s schedule and how close each game is played to a team’s home campus.

As it stands, the only factor my code weighs is the difference in the seeds of the two teams. When two teams have the same seed, the algorithm forks, Sliding Doors style, and guesses that both teams will win in parallel brackets.
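Here is one way that forking could look in Python. It is a sketch of the idea rather than the program itself: the `(name, seed)` tuples, the function names `advance` and `final_four`, and the placeholder team names are all hypothetical, and the weighting uses an assumed step of 47/15 points per seed, chosen to agree with the roughly 3-points-per-seed rule described in the Methodology.

```python
import random
from typing import List, Tuple

Team = Tuple[str, int]  # (name, seed); a hypothetical representation

def advance(a: Team, b: Team, rng: random.Random) -> List[Team]:
    """Return the possible winners of one game.

    Equal seeds fork: both teams advance, each in its own parallel bracket.
    Otherwise a single winner is drawn, weighted toward the better seed.
    """
    if a[1] == b[1]:
        return [a, b]
    p_a = 0.5 + (0.47 / 15) * (b[1] - a[1])  # assumed weighting, not the author's code
    return [a] if rng.random() < p_a else [b]

def final_four(teams: List[Team], rng: random.Random) -> List[Tuple[Team, Team, Team]]:
    """Play two semifinals (0 v 1, 2 v 3) and a final, forking on equal seeds."""
    brackets = []
    for left in advance(teams[0], teams[1], rng):
        for right in advance(teams[2], teams[3], rng):
            for champ in advance(left, right, rng):
                brackets.append((left, right, champ))
    return brackets

rng = random.Random(2019)
all_ones = [("A", 1), ("B", 1), ("C", 1), ("D", 1)]  # placeholder teams
print(len(final_four(all_ones, rng)))  # four 1-seeds: 2 * 2 * 2 = 8 parallel brackets
```

When the four regional champions all carry different seeds, no game forks and the same call returns a single bracket.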

I definitely wouldn’t want to bet any money against Sagarin, but a computational experiment that TIME ran last year lends some confidence to relying on seeding. As much as people complain about the NCAA selection committee, a 1,000-trial live simulator we developed found that, if you just always choose the higher seed to win and flip a coin when the top seeds face off in the Final Four, over time — a long time — you generally either win or lose a small net amount when pitted against colleagues who rely more on historical data.

Whether this experiment succeeds will probably hinge on the Madness Quotient this year, and the more dramatic the better. Had I tried this last year, about 3 percent of my brackets would have chosen UMBC to defeat Virginia (my alma mater) in the first round. That’s not very many, but that’s the beauty of delegating the work to a computer: It can spend all night guessing and only has to be generally correct once.

Methodology

Every matchup begins with 50-50 odds. For any matchup between unevenly seeded teams, the odds are adjusted to favor the better-seeded team (the one with the lower seed number) by about 3 percentage points for every 1-point difference in the seeds. This means a team seeded one spot better than its opponent has 53% odds of winning, while a first-round faceoff between the top-seeded team and the 16-seed will go to the 1-seed 97% of the time.
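As a rough illustration, the rule can be written as a one-line function in Python. The exact step size is an assumption: a flat 3 points per seed would give the 1-seed only 95% against a 16-seed, so this sketch uses 47/15 (about 3.13 points), which reproduces both the 53% and 97% figures. The name `win_probability` is made up for this example, and the real code may compute the odds differently.

```python
# Assumption: the per-seed step is chosen so that a 15-seed gap gives 97%,
# i.e. 47/15 (about 3.13) percentage points. The text only says "about 3".
STEP = 0.47 / 15

def win_probability(seed_a: int, seed_b: int) -> float:
    """Chance that the team seeded `seed_a` beats the team seeded `seed_b`."""
    return 0.5 + STEP * (seed_b - seed_a)

print(round(win_probability(8, 9), 2))   # one seed apart
print(round(win_probability(1, 16), 2))  # 1-seed against a 16-seed
print(round(win_probability(16, 1), 2))  # the same game from the underdog's side
```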

This generates a sensible spread of championships based on seeds: 10% for the four top-seeds, 6.1% for the 2-seeds, all the way down to between 5 and 7 total victories each for the 16-seeds out of 10 million trials.
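A small Monte Carlo run shows how a tally like this can be produced. This Python sketch is a simplified stand-in, not the program behind the numbers above: it assumes the standard within-region bracket order, a step of 47/15 points per seed, and a coin flip (rather than a fork) when equal seeds meet. The shares it prints are for each seed line as a whole, and they will not necessarily match the figures quoted here.

```python
import random
from collections import Counter

STEP = 0.47 / 15  # assumed per-seed adjustment (about 3.13 points)
# Standard first-round bracket order within a region: 1 v 16, 8 v 9, and so on.
REGION_ORDER = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]

def play(seed_a, seed_b, rng):
    """Pick one winner, weighted toward the better (lower-numbered) seed."""
    p_a = 0.5 + STEP * (seed_b - seed_a)
    return seed_a if rng.random() < p_a else seed_b

def play_down(field, rng):
    """Pair adjacent entries round by round until one seed remains."""
    while len(field) > 1:
        field = [play(field[i], field[i + 1], rng) for i in range(0, len(field), 2)]
    return field[0]

def champion_seed(rng):
    """Simulate four regions, then the Final Four, and return the winner's seed."""
    regional_winners = [play_down(REGION_ORDER, rng) for _ in range(4)]
    return play_down(regional_winners, rng)

def champion_shares(trials, seed=0):
    """Fraction of simulated titles won by each seed line, 1 through 16."""
    rng = random.Random(seed)
    counts = Counter(champion_seed(rng) for _ in range(trials))
    return {s: counts[s] / trials for s in range(1, 17)}

shares = champion_shares(20_000)
for s in (1, 2, 3, 16):
    print(f"{s}-seeds: {shares[s]:.2%} of simulated titles")
```

Twenty thousand trials is enough to see the shape of the spread, but a 16-seed title is so rare that it takes millions of brackets before one is likely to turn up.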

Our source code is available on GitHub.
