As the 2013 season gets under way, it's a good time to look at how we fared in 2012 and release our latest modifications to the algorithm. A few things elicited strong comments during the year, but on the whole the feedback has been mostly positive.
For starters, we received congratulations after the WFTDA Championships for how well we got the final rankings right. Of course it makes us feel good to have everyone agree with us, but it's worth remembering that no mathematical system is ever exactly right. We just got lucky this year in how the championship teams finished up. For a more rigorous analysis of our performance, we look at how the algorithm's predictions fared over all the bouts this year. As I did last year, I constructed the following plot by comparing the actual DoS result of each of the 708 bouts in 2012 with the respective predictions from the algorithm (there were actually 743 bouts in 2012, but only 708 that did not involve an unranked team).
The x-axis shows the difference between the actual result and the prediction, from the perspective of the home team. A positive value means that the home team performed better than predicted and a negative value means that the home team performed worse. Of course a zero means that the prediction was exactly right. The height of each bar shows the number of bouts that had that result.
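The comparison described above can be sketched in a few lines. This is not the production code behind the plot, just a minimal illustration of the residual calculation and binning; the sample data and the `bin_width` value are made up for the example.

```python
def residuals(actual_dos, predicted_dos):
    """Prediction error from the home team's perspective:
    positive -> home team did better than predicted,
    negative -> home team did worse,
    zero     -> the prediction was exactly right."""
    return [a - p for a, p in zip(actual_dos, predicted_dos)]


def histogram(errors, bin_width=1.0):
    """Count bouts per error bin; the counts are the bar heights
    in a plot like the one above."""
    counts = {}
    for e in errors:
        b = round(e / bin_width) * bin_width
        counts[b] = counts.get(b, 0) + 1
    return counts


# Hypothetical data: actual and predicted DoS for three bouts.
errors = residuals([5, -2, 0], [3, 1, 0])   # -> [2, -3, 0]
bars = histogram(errors)
```

In the real analysis this runs over all 708 ranked bouts from 2012 rather than a toy list.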
A quick comparison to how we did in 2011 shows that the 2012 version of the algorithm appears to be keeping closer to zero for more of the bouts. So that's good: we're moving in the right direction. The dashed line shows the same distribution from all the previous bouts that were used to train the algorithm – that would be all bouts from 2005 through 2011 Championships. Only the height of the curve was scaled to match the number of bouts in 2012. The fact that it fits the data quite well – for those who geek out on this sort of thing, a chi-squared test gives an 87% probability that these two are describing the same distribution – means that our mathematical assumptions about the statistical variation are holding.
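For anyone curious what that goodness-of-fit check looks like, here is a rough sketch: rescale the training-era histogram so its total matches the 2012 bout count, then compute the Pearson chi-squared statistic against the observed 2012 counts. The bin counts below are invented for illustration; the 87% figure quoted above came from the real data.

```python
def scale_to(counts, target_total):
    """Rescale histogram heights so the totals match -- the only
    adjustment made to the training curve in the plot above."""
    s = target_total / sum(counts)
    return [c * s for c in counts]


def chi_squared(observed, expected):
    """Pearson chi-squared statistic. The probability quoted in the
    post then comes from the chi-squared distribution with
    len(observed) - 1 degrees of freedom (e.g. scipy.stats.chi2.sf)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical bin counts: training-era curve vs. 2012 observations.
expected = scale_to([10, 20, 10], 80)        # -> [20.0, 40.0, 20.0]
stat = chi_squared([22, 36, 22], expected)   # -> 0.8
```

A small statistic relative to the degrees of freedom is what tells you the two histograms are plausibly draws from the same distribution.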
Of course not everything was peaches and cream. A lot of people were concerned about a few glaring anomalies in which newly ranked teams appeared to be rated significantly higher than would seem credible. Internally we referred to this as the "VRDL effect", although I think we actually received more comments about Black-n-Bluegrass. I spent a while studying these and others in detail. Luckily there aren't too many of them, and the self-correcting nature of the algorithm eventually gets everything back in balance. That being said, I was able to identify what appears to be a systematic effect whenever a team's first bout is a severe blowout. This can leave a new team either under-rated or over-rated, depending on the direction of the blowout.
Needless to say, identifying an effect is much easier than fixing it. For the coming 2013 season, I've implemented a scaling parameter that appears to minimize the impact. There are still one or two anomalies, but on the whole the system looks to be more robust. At this point, I return to the statement that no mathematical system can ever be perfect. I will be watching along with everyone else to see how this latest version fares over the season.
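To give a flavor of what a fix like this could look like: the actual scaling parameter isn't published here, so the sketch below is purely illustrative. It shows one way a first bout's DoS could be compressed beyond a threshold before it feeds into a brand-new team's rating; the `threshold` and `factor` values are invented for the example.

```python
import math


def damp_first_bout(dos, threshold=0.6, factor=0.5):
    """Compress the part of a first bout's DoS beyond `threshold`,
    so a severe blowout (in either direction) moves a new team's
    rating less than a competitive result would. Symmetric, so it
    limits both under-rating and over-rating."""
    if abs(dos) <= threshold:
        return dos  # competitive results pass through unchanged
    excess = abs(dos) - threshold
    return math.copysign(threshold + factor * excess, dos)


# A 0.9 blowout is pulled back toward the threshold; a close
# 0.3 result is untouched, win or lose.
damped = damp_first_bout(0.9)    # -> 0.75
close = damp_first_bout(-0.3)    # -> -0.3
```

Whatever the real parameterization looks like, the key property is the symmetry: the same compression applies whether the new team wins or loses big.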
So that's it. The same algorithm is still going strong and I continue to be amazed at how well it's performing. The only modifications have been to how we handle new/unranked teams. The parameters have been re-tuned with all the latest adjustments. I tried to keep the ratings in the same range as before, but there will be some slight variations, so don't be alarmed if your team goes up or down a few places.
As always, we welcome your feedback and encourage you to keep coming back. We've got some big things building behind the scenes that we'll be bringing out in the next few months.