WFTDA Rankings and FTS Expectations

June 11th, 2013 by N8

Guest blog courtesy of N8 from Charm City.

The WFTDA has finally debuted its new Rankings system, complete with the corresponding Ratings to match. A lot has been said about the theory behind this new method and its potential consequences, but we can set that speculation aside, because now we have actual data.

I started playing around with the data and, with the help of Flat Track Stats, have found some interesting trends. Before I get into that, though, I want to mention that this analysis strictly looks at the math behind scheduling opponents, and that is obviously not the only factor that teams consider. In fact, it would appear that teams generally recognize that there is more to be gained from a bout than simply Average Ratings points. Playing against a good opponent is likely going to help a team improve their game, even if it means they might be hurting their potential seeding.

Part I: WFTDA only Trends

First, let's make sure we all understand what goes into the WFTDA Rankings. The mechanics of this system have been explored HERE and HERE, so I'll just talk about the pertinent parts. Teams are ranked, from highest to lowest, based on their Average Rating over the last 12 months. For each bout a team plays, they are awarded Ratings Points, and in general we can assume that if a team earns more Ratings Points than their current average, then their Average Rating will increase (not strictly true, since bouts older than 12 months will "fall off" and that can also affect the average). For regular season bouts, those points are proportional to only two variables: a team's Opponent's Strength and the Proportion of Points that team earned in the bout. The first element (Opponent's Strength) is determined strictly from their opponent's Ranking, so it is fixed before the bout starts. This means that in any given bout, the only variable is the proportion of the total points a team earns. For example, if your Opponent's Strength is 1.00 and you want to earn a rating of at least 200 points for the bout, then we can calculate that you need to score at least 2/3rds of the total points (300*1.00*(2/3)=200).
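The arithmetic in this section can be sketched in a few lines of Python. This is only a sketch, not the official WFTDA calculation: the multiplier of 300 is inferred from the worked numbers in this article, and the function names are made up for illustration.

```python
# Sketch only: the 300 multiplier is inferred from this article's worked
# examples, and these function names are hypothetical.
BASE_POINTS = 300.0


def bout_rating_points(opponent_strength, proportion_of_points):
    """Ratings Points earned in one regular season bout."""
    return BASE_POINTS * opponent_strength * proportion_of_points


def required_proportion(target_rating, opponent_strength):
    """Share of the total points needed to earn at least target_rating."""
    return target_rating / (BASE_POINTS * opponent_strength)


# Opponent's Strength of 1.00, aiming for a bout rating of 200.
print(round(required_proportion(200, 1.00), 4))
```

Running this for a Strength of 1.00 and a target of 200 gives the 2/3rds figure quoted above.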

Let's take a look at some examples. Carolina is currently ranked 31st (out of 155), giving them a Strength of 1.60, and they have an Average Rating of 215.43. Silicon Valley is currently ranked 62nd, giving them a Strength of 1.20, and they have an Average Rating of 153.24. If Carolina were to play Silicon Valley, Carolina would need to score 60% of the total points or more in order to maintain or raise their Average Rating (300*1.20*.60=216.0). Note that Carolina's number conversely means that they need to keep Silicon Valley to no more than 40% of the total points. Silicon Valley, being the underdog, would need to score 32% of the points or more (300*1.60*0.32=153.6) in order to maintain or raise their Average Rating. This means that if Silicon Valley scores between 32% and 40% of the points, then both teams will be able to gain points.

Similarly, North Star is currently ranked 93rd, giving them a Strength of 0.80, and they have an Average Rating of 109.44. If Carolina were to play North Star, Carolina would need to score 90% of the points or more in order to maintain or raise their Average Rating. North Star would only need to score 23% of the points or more in order to maintain or raise their Average Rating. Note here, however, that Carolina's number conversely means that they need to keep North Star to no more than 10% of the points. This means that if North Star scores between 10% and 23% of the points, then neither team reaches their minimum and both teams will lose points.
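The two examples above can be checked with a short Python sketch. It assumes, as the arithmetic in this article suggests, that a bout is worth 300 times the Opponent's Strength times the Proportion of Points; the function names are hypothetical. The key observation is that the two required shares either add up to less than 100% (leaving a window where both teams gain) or to more than 100% (leaving a window where both teams lose).

```python
# Sketch only: BASE_POINTS is inferred from the article's arithmetic and
# the function names are hypothetical.
BASE_POINTS = 300.0


def required_proportion(avg_rating, opponent_strength):
    """Share of total points a team needs to match its Average Rating."""
    return avg_rating / (BASE_POINTS * opponent_strength)


def classify_matchup(avg_a, strength_a, avg_b, strength_b):
    """Each team's requirement depends on the OTHER team's Strength."""
    need_a = required_proportion(avg_a, strength_b)
    need_b = required_proportion(avg_b, strength_a)
    total = need_a + need_b
    if total < 1.0:
        label = "both can gain"
    elif total > 1.0:
        label = "both can lose"
    else:
        label = "boundary"  # the two requirements meet exactly
    return need_a, need_b, label


# (Average Rating, Strength) pairs taken from the text above.
carolina = (215.43, 1.60)
silicon_valley = (153.24, 1.20)
north_star = (109.44, 0.80)

for name, other in [("Silicon Valley", silicon_valley), ("North Star", north_star)]:
    need_a, need_b, label = classify_matchup(*carolina, *other)
    print(f"Carolina vs {name}: {need_a:.0%} / {need_b:.0%} -> {label}")
```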

It turns out that every possible matchup falls into one of two categories: either it's possible for both teams to increase their Average Rating, or it's possible for both teams' Average Ratings to decrease (obviously it's always possible for one team to gain and the other to lose). We've saved you the effort and worked this out for you, and even set it to a nice color coded image.

Fig 1: Possibility of both teams increasing/decreasing their Average Rating depending on the outcome of the bout. Green means both can increase. Red means both can decrease. Higher ranks are top and left, lower ranks are bottom and right.

This image is a grid of all possible matchups between WFTDA teams as of the end of April. The upper left indicates the top ranked teams vs top ranked teams, and the lower right indicates the bottom ranked teams vs bottom ranked teams (the diagonal would be a team theoretically playing against themselves). In this image green represents bouts where it's possible for both teams to gain ratings points and red indicates where it's possible for both teams to lose ratings points.

One of WFTDA's stated goals with this system is to encourage teams to play opponents that they are close to in ranking. In general we can see this is true from the green areas being clumped along the diagonal of the graph, but there are some notable deviations from that pattern.

First of all, most of the matchups among the top 40 teams (Division I) fall in the red category. The reason for this is that most or all of those teams have earned some of their ratings points from tournament games, which are weighted more heavily. The result is that in order to keep that average in a regular season bout, they need to score a higher percentage of the points in the matchup. Thus instead of one team needing 55% and the other needing 40% to increase their rating, you get that one needs 55% and the other needs 53%.

Next we see that if a highly ranked team plays a lowly ranked team, then it's also in the red. This is because the lower ranked team's Strength is so low that the higher ranked team needs to score a very high percentage of the points. In fact, in many cases, it's impossible for the higher ranked team to increase their Average Rating even if they score 100% of the points.

Another obvious feature is the excess of green located in the bottom right corner of the graph. The green here extends well beyond the diagonal, and it is a result of the Opponent Strength having a floor of 0.50. So, playing the 155th team is worth the same as playing the 120th team.

We aren't 100% certain how to explain the red area around the 100 vs 100 region. We're pretty sure it has to do with the ratings being "bunched up", where a small change in rating can be a dramatic change in ranking; the ratings difference between 30 and 35 is about the same as the ratings difference between 80 and 105, but the Opponent Strength difference (which is based on Ranking) is about four times greater for the latter.
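To make the Strength floor and the "impossible even at 100%" case concrete, here is a rough Python sketch. Note the assumptions: the `opponent_strength` formula below is not taken from WFTDA documentation. It is simply a straight line that reproduces the three examples in this article (31st of 155 gives 1.60, 62nd gives 1.20, 93rd gives 0.80), clipped at the 0.50 minimum, and the real formula may differ in its details. The 300 multiplier is likewise inferred from the article's arithmetic.

```python
# Sketch only. ASSUMPTIONS: the linear Strength formula is a fit to this
# article's three examples, not the official WFTDA definition, and the
# 300 multiplier is inferred from the article's arithmetic.
BASE_POINTS = 300.0
MIN_STRENGTH = 0.50  # floor mentioned in the article


def opponent_strength(rank, n_teams):
    """Assumed linear Strength by rank, clipped at the 0.50 floor."""
    return max(MIN_STRENGTH, 2.0 - 2.0 * rank / n_teams)


def can_gain(avg_rating, opponent_rank, n_teams):
    """True if scoring 100% of the points would at least match avg_rating."""
    needed = avg_rating / (BASE_POINTS * opponent_strength(opponent_rank, n_teams))
    return needed <= 1.0


# Under this assumed formula, ranks 120 and 155 are worth the same.
print(opponent_strength(120, 155), opponent_strength(155, 155))

# Carolina (Average Rating 215.43) against the 62nd and 155th ranked teams.
print(can_gain(215.43, 62, 155), can_gain(215.43, 155, 155))
```

Against an opponent at the floor, a bout is worth at most 300*0.50=150 points, so any team whose Average Rating is above 150 cannot gain from that bout no matter the score.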

Part II: Adding Flat Track Stats' Predictions

Up to this point, all of this analysis is entirely from the WFTDA Rankings results. None of this has used any other prediction models. So, what can FTS add for us? Well, FTS has shown itself to be an excellent predictor of likely outcomes. In fact, you can give the model a pair of teams and a Score Proportion and it will tell you the odds that that team will earn that proportion against their opponent. For example, suppose Team A needs to earn 60% of the points against Team B in order to have their Average Rating increase. We can plug in each team's FTS Rating and find that FTS predicts the odds Team A scores at least 60% of the points is 22.4%. This isn't very likely. Meanwhile a different matchup that also needs 60% of the points might have a 79.9% chance. It all depends on which teams and what their relative FTS ratings are.

Similarly, we can also find the proportion Team B needs to earn against Team A in order to increase their Average Rating, and the odds that they earn it. If both teams' odds are greater than 50%, then we would conclude that it is likely for both teams to gain ratings at the same time (it turns out this can only happen in matchups where it is possible for both teams to increase in rating). Similarly, if both teams' odds are less than 50%, then it is likely for both teams to lose ratings at the same time (again, this can only happen in matchups where it is possible for both teams to decrease in rating).
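The coloring rule described above is simple enough to sketch in Python. The FTS prediction model itself is not reproduced here; the function below just takes each team's odds of reaching its required proportion as inputs, and both the function name and the second probability in the example are made up for illustration. As for why the parenthetical claims hold: if the two required shares add up to more than 100%, the two teams cannot both reach their targets in the same bout, so their odds can add up to at most 100% and cannot both exceed 50%. The mirror-image argument covers the red case.

```python
# Sketch only: the probabilities are inputs that would come from a
# prediction model such as FTS, which is not reproduced here.
def likely_outcome(p_a_gains, p_b_gains):
    """Color a matchup by whether each team is likely to gain rating."""
    if p_a_gains > 0.5 and p_b_gains > 0.5:
        return "green"  # both teams likely to increase
    if p_a_gains < 0.5 and p_b_gains < 0.5:
        return "red"    # both teams likely to decrease
    return "white"      # one likely to gain, the other likely to lose


# 22.4% is the Team A figure from the example above; 0.70 for Team B is
# a made-up value for illustration.
print(likely_outcome(0.224, 0.70))
```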

Now, armed with the FTS prediction model, we can modify our original graph and see which matchups are likely to have both teams gain (green), likely to have both teams lose (red) or likely to have one gain and one lose (white).

Fig 2: Flat Track Stats' prediction of what is likely to occur. Green is both teams increase. Red is both teams decrease. White is one team increases while the other decreases. Higher ranks are top and left, lower ranks are bottom and right.

What can we conclude from this? Well, if you accept FTS's predictions as a reasonable representation of what is likely to happen, then you can see there are some areas where matchups are green. These are where both teams are likely to gain, and if increasing their Average Rating is the only thing the teams are interested in, then they would be okay to play each other. For the top 40 potential matchups, this is true for between 5-10% of matchups. Similarly, the red matchups are where both teams are expected to decrease their Average Rating, and if that is their only interest, then they should not play that matchup. Note that this is true for about 20-25% of top 40 matchups. Any matchups that are white are ones where one team is likely to gain and the other is likely to lose. Since the bouts in the regular season are agreed upon by both teams, these should also probably not be scheduled if the only thing teams care about is increasing their Average Ratings (since one team is expected to lose ratings points).

In short, only the green pairs should theoretically play each other if the teams are only concerned with improving their Average Rating. As mentioned in the beginning of this article, this is not the only factor a league might consider when setting their schedule. Obviously all of the exact data will vary from month-to-month, but the overall trend would be generally expected to continue under the new Rankings System.

Comments

Posted by 3j0hn on 06/12/13, 08:25am

This is excellent analysis and visualization!

I would like to see additional shadings for the currently white matchups where the higher ranked team is mathematically certain to decrease no matter the score ratio. (I believe it is always possible for the lower ranked team to increase, and it is never the case that either team will be mathematically certain to increase)

Posted by N8 on 06/12/13, 08:42am

Thanks.

As to your question, I might not be understanding you correctly. The only mathematically certain region will be a subset of the red region. This would be cases where even if the higher ranked team scores 100% of the points (so the opponent scores 0 points), the higher ranked team still sees their Rating decrease. I've made a quick image to show this. Black indicates it is impossible for the higher ranked team to gain, so they are guaranteed to lose ratings points.

Posted by Breadman on 06/30/13, 03:14pm

Hey! A rocket! LOL

Posted by nocklebeast on 06/12/13, 09:32pm

I wish the axes were labeled on the graphs.

Posted by csmash on 06/13/13, 07:06am

Thank you for this article. I think it will help explain some things to the league!

Posted by JustAnotherRollerFan on 10/09/13, 01:30pm

What, am I missing something? The ranking calculation is done monthly. Although they are not posted, rankings are recalculated every month. It’s kind of a problem that rankings are only released 5 times a year. You posted this on June 11th using the April 30th rankings, correct? WFTDA did not release the June rankings until July 12. With regard to WFTDA points, we can’t, in June, accurately calculate the correct Opponent Power value for teams with the April 30 rankings data. Carolina may have been ranked 31st in the April 30 rankings, but they certainly were not at that spot at the time that you posted on June 11th. Going into any bout, for the teams involved, I don’t see how one will be able to calculate the Proportion of Points each team will need to score to raise the team’s average WFTDA rating when the bout is played, say, 5 or more days after the latest rankings are released. The data changes every week. When a team bout scheduler get a call requesting a bout, how does that person calculate the odds that the bouts would be a wise one to take?

Posted by N8 on 10/09/13, 01:55pm

"When a team bout scheduler get a call requesting a bout, how does that person calculate the odds that the bouts would be a wise one to take?"

How does a bettor decide which teams to bet on? The odds are given for how you expect a team to perform, and then you take into account all of the extra information that you have access to and attempt to make your best guess. If a team wants to play you, you look at their rankings, and you consider their skill, and you look at your ratings, and then you try to gauge where teams will be. Do you think they're underranked, and thus likely to go up? Or maybe they're overranked, and likely to go down? Maybe you think your team is really clicking this year, and that you'll be able to outperform what the rankings expect you to do?

This certainly isn't a magic 8-ball that can predict the future. There obviously isn't one. But this is an additional tool that can help you out more than simply going by intangibles alone.

Posted by JustAnotherRollerFan on 10/10/13, 07:36am

Yes, it’s a great tool, but the value of the tool is diminished the further we get from the latest release of the official WFTDA rankings. The currently posted rankings are June 30th – 102 days ago! WFTDA have calculated, but not posted, the rankings at the end of every month. I wish we had that information. Take the May 18th Charm vs. Carolina game for example. An excellent bet for Charm given that Carolina was over rated (due to retirements from the previous year). Charm won the bout 347 to 42, but how would we determine the effect of that bout on Charm’s WFTDA score? Prior to the bout, one would have had to use the February 28th rankings, because the April 30th rankings were not released until May 20th – Worthless. OK, on Monday, May 18th, you want to use the April 30th rankings to determine how Charm did given their wins over Carolina, Blue Ridge, and Killamazoo. Oh crap, the April 30th rankings do not take into account the Charm bout against Columbia on May 11th and the impact of every other bout that weekend. The WFTDA rankings are released just too infrequently. Wonder why that is?
Thanks for the article anyway. Going into Champs, I’ll be able to use the rankings data that will be released this month to know ahead of time what each team needs to do in each bout to advance their rankings. It helps me enjoy the game more – especially if it’s not a close bout.

Posted by JustAnotherRollerFan on 10/16/13, 11:32am

Rats - I was wrong! I won't be able to use the Sep 30th rankings to calculate the % of points a team needs to improve their average score. The problem is that the points from last year's Champs age out prior to the 2013 Champs.
By the way, the new rankings are posted:
http://wftda.com/rankings

Update: The monthly Rankings calculations are released to the WFTDA membership - So the teams have that information.