Whether you're an avid fan or have never watched a game, it's possible to get a working sense of how Ivy League basketball teams stack up without detailed knowledge of the sport.
Looking solely at game data, we set out to forecast this year's Ivy League basketball season. With almost six years of game scores and our own Elo model, we've done just that.
At its roots, the Elo rating system is a simple way of ranking different teams and competitors. Physics professor Arpad Elo designed the system in the 1950s to rank chess players, and the Elo system has since been applied to everything from football to Scrabble.
In short, the model begins by assuming that all teams are average (in our case, a rating of 1,500). After every game, the winning team takes points from the losing team, proportional to the margin of victory. Upsets by weaker teams result in larger point gains than triumphs by favorites.
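The update step can be sketched in a few lines of Python. The K-factor and the margin-of-victory multiplier below are illustrative guesses, not the article's actual parameters:

```python
import math

def expected_score(rating_a, rating_b):
    """Probability that team A beats team B under the standard Elo formula."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_winner, rating_loser, margin, k=20):
    """Shift points from the loser to the winner after a game.

    An upset (low expected score for the winner) produces a larger
    shift than a win by the favorite. The log-of-margin multiplier
    is a hypothetical choice, not the model's stated adjustment.
    """
    expected = expected_score(rating_winner, rating_loser)
    mov_multiplier = math.log(abs(margin) + 1)
    shift = k * mov_multiplier * (1 - expected)
    return rating_winner + shift, rating_loser - shift
```

Because the winner gains exactly what the loser gives up, the league-wide average rating stays fixed at 1,500.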
Our model begins in 2011. Heavily inspired by FiveThirtyEight's forecasts of the NBA and NFL, our algorithm takes the scoreline of every basketball game involving an Ivy team since the 2011-2012 season and uses it to determine an updated Elo rating for each team.
One of the most notable aspects of the Elo rating system is its ability to predict the likelihood of one team or competitor defeating another, given their difference in rating.
For example, a team like Yale, with 1,614 points, would be expected to beat Columbia, a team with 1,557 points, 58.1 percent of the time on a neutral court. But if the game takes place in Morningside Heights, the model, which accounts for home court advantage, projects the Lions will overcome the Bulldogs 51.2 percent of the time.
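The standard Elo win-probability formula reproduces both numbers. The 65-point home bonus below is our own back-calculation from the 51.2 percent figure, not a value the article states:

```python
def win_prob(rating_a, rating_b, home_bonus=0.0):
    """Chance that team A beats team B; home_bonus is added to A's rating."""
    return 1 / (1 + 10 ** ((rating_b - (rating_a + home_bonus)) / 400))

# Yale (1,614) vs. Columbia (1,557) on a neutral court:
print(round(win_prob(1614, 1557) * 100, 1))  # 58.1

# Columbia hosting Yale; a home bonus of roughly 65 rating points
# (our inference, not the article's stated value) yields 51.2:
print(round(win_prob(1557, 1614, home_bonus=65) * 100, 1))  # 51.2
```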
We should acknowledge that our model is flawed—all models are (though some more than others). We will elaborate on our model's particular pitfalls in later updates, but the Elo system is inherently imperfect for basketball. Ratings don't account for injuries, debuting first-years, or anything else that hasn't made its way into the historical data. That said, the system's benefits are that it avoids biased analysis and relies on a consistent predictor: past performance.
For those who accept the flaws of our system, our ratings provide an estimation as to how the Ivy League season will go. Currently, our model ranks the Lions as the third-best team in the league, just above Harvard and two spots behind leader Princeton, which has gone 3-0 so far this season.
These grand, sweeping conclusions are based on individual game win expectancies, i.e., calculated probabilities.
What does this mean for our projections?
It means our model can identify a single most likely outcome. But that does not make the outcome likely in absolute terms: over the remaining 49 Ivy League games, even the most probable sequence of results has far less than a 50 percent chance of actually occurring.
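That distinction can be made concrete with a small Monte Carlo sketch: simulate the remaining schedule many times and count how often each team finishes with the best record. The ratings and schedule below are hypothetical, not the article's actual data:

```python
import random

def win_prob(rating_a, rating_b):
    """Chance that team A beats team B under the standard Elo formula."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def simulate_season(ratings, schedule, trials=10_000):
    """Estimate how often each team ends up with the league's best record.

    `ratings` maps team -> Elo rating; `schedule` is a list of
    (home, away) matchups. Ties go to whichever team max() finds
    first, a simplification for this sketch.
    """
    titles = {team: 0 for team in ratings}
    for _ in range(trials):
        wins = {team: 0 for team in ratings}
        for home, away in schedule:
            if random.random() < win_prob(ratings[home], ratings[away]):
                wins[home] += 1
            else:
                wins[away] += 1
        titles[max(wins, key=wins.get)] += 1
    return {team: count / trials for team, count in titles.items()}
```

Even a clear favorite usually finishes on top in well under 100 percent of simulated seasons, which is why the most likely outcome is not the same thing as a likely one.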
Additionally, if we say that Columbia has a 51.2 percent chance of beating Yale and the Lions win, that doesn't mean we "got it right"; it means only that the statistical favorite our model identified happened to win. Whenever we use that phrase, read it that way.
All stipulations aside, we'll use this system to generate weekly win expectancies and updated season projections. Stay tuned for our "Man versus Machine" updates, in which we will put our sports editor to the test against the algorithm. Then, at the end of the season, we'll be able to look back to see if our projections were accurate, or if we either missed the mark or ran into some luck.