Guest blog courtesy of N8 from Charm City.
The WFTDA has finally debuted its new Rankings system, complete with the corresponding Ratings to match. A lot has been discussed about the theories and potential consequences behind this new method, but we can give up all of that speculation, because now we have actual data.
I started playing around with the data and, with the help of Flat Track Stats, have found some interesting trends. Before I get into that, though, I want to mention that this analysis strictly looks at the math behind scheduling opponents, and that is obviously not the only factor that teams consider. In fact, it would appear that teams generally recognize that there is more to be gained from a bout than simply Average Ratings points. Playing against a good opponent is likely going to help a team improve their game, even if it means they might be hurting their potential seeding.
Part I: WFTDA-Only Trends
First, let's make sure we all understand what goes into the WFTDA Rankings. The mechanics of this system have been explored HERE and HERE, so I'll just talk about the pertinent parts. Teams are ranked, from highest to lowest, based on their Average Rating over the last 12 months. For each bout a team plays, they are awarded Ratings Points, and in general we can assume that if a team earns more Ratings Points than their current average, then their Average Rating will increase (not strictly true, since bouts older than 12 months will "fall off" and that can also affect the average). For regular season bouts, those points are proportional to only two variables: a team's Opponent's Strength and the Proportion of Points that team earned in the bout (Ratings Points = 300 x Opponent's Strength x Proportion of Points). The first element (Opponent's Strength) is determined strictly from the opponent's Ranking, so it is fixed before the bout starts. This means that in any given bout, the only variable is the share of the total points a team earns. For example, if your Opponent's Strength is 1.00 and you want to earn at least 200 Ratings Points for the bout, then you need to score at least two-thirds of the total points (200 / (300 x 1.00)).
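The arithmetic in that last example can be written out as a short sketch. The function names here are my own, and the multiplier of 300 is taken from the worked examples in this post rather than from any official WFTDA document, so treat this as an illustration of the regular-season case only.

```python
# Regular-season Ratings Points, as described in this post.
# BASE_POINTS = 300 is the multiplier used in the post's worked examples.
BASE_POINTS = 300.0


def ratings_points(opponent_strength: float, point_share: float) -> float:
    """Ratings Points for one bout: 300 * Opponent's Strength * share of total points."""
    return BASE_POINTS * opponent_strength * point_share


def required_share(target_rating: float, opponent_strength: float) -> float:
    """Share of the total points needed to earn `target_rating` in one bout.

    A result above 1.0 means the target cannot be reached even by
    scoring 100% of the points.
    """
    return target_rating / (BASE_POINTS * opponent_strength)


# Opponent's Strength 1.00, target of 200 Ratings Points:
print(round(required_share(200, 1.00), 4))  # 0.6667, i.e. two-thirds
```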
Let's take a look at some examples. Carolina is currently ranked 31st (out of 155), giving them a Strength of 1.60, and they have an Average Rating of 215.43. Silicon Valley is currently ranked 62nd, giving them a Strength of 1.20, and they have an Average Rating of 153.24. If Carolina were to play Silicon Valley, Carolina would need to score 60% of the total points or more in order to maintain or raise their Average Rating (300*1.20*0.60=216.0). Conversely, this means Carolina needs to hold Silicon Valley to no more than 40% of the total points. Silicon Valley, being the underdog, would need to score 32% of the points or more (300*1.60*0.32=153.6) in order to maintain or raise their Average Rating. This means if Silicon Valley scores between 32% and 40%, then both teams will be able to gain points.
Similarly, North Star is currently ranked 93rd, giving them a Strength of 0.80, and they have an Average Rating of 109.44. If Carolina were to play North Star, Carolina would need to score 90% of the points or more in order to maintain or raise their Average Rating. North Star would only need to score 23% of the points or more in order to maintain or raise their Average Rating. Note here, however, that Carolina's number conversely means that they need to hold North Star to no more than 10% of the points. This means if North Star scores between 10% and 23%, then neither team reaches their minimum and both teams will lose points.
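Both worked examples reduce to one check: add up the two minimum shares. If they total 100% or less, there is a window of outcomes where both teams gain; if they total more than 100%, there is a window where both lose. Here is a minimal sketch of that check using the figures quoted above. The function names are mine, and counting a total of exactly 100% as "green" is my own choice for the boundary case.

```python
BASE_POINTS = 300.0  # multiplier used in the post's worked examples


def required_share(average_rating: float, opponent_strength: float) -> float:
    """Minimum share of total points needed to match the team's Average Rating."""
    return average_rating / (BASE_POINTS * opponent_strength)


def classify_matchup(rating_a, strength_a, rating_b, strength_b):
    """Return ("green" or "red", need_a, need_b) for a regular-season matchup.

    need_a is computed against B's Strength and need_b against A's Strength,
    because a team's points are scaled by its opponent's Strength.
    """
    need_a = required_share(rating_a, strength_b)
    need_b = required_share(rating_b, strength_a)
    # The two shares must fit inside 100% for both teams to gain at once.
    label = "green" if need_a + need_b <= 1.0 else "red"
    return label, need_a, need_b


# Carolina (215.43, Strength 1.60) vs Silicon Valley (153.24, Strength 1.20)
print(classify_matchup(215.43, 1.60, 153.24, 1.20)[0])  # green
# Carolina (215.43, Strength 1.60) vs North Star (109.44, Strength 0.80)
print(classify_matchup(215.43, 1.60, 109.44, 0.80)[0])  # red
```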
It turns out that every possible matchup falls into one of two categories: either it's possible for both teams to increase their Average Rating, or it's possible for both teams' Average Ratings to decrease (obviously it's always possible for one team to gain and the other to lose). We've saved you the effort and worked this out for you, and even set it to a nice color-coded image.
Fig 1: Possibility of both teams increasing/decreasing their Average Rating depending on the outcome of the bout. Green means both can increase. Red means both can decrease. Higher ranks are top and left, lower ranks are bottom and right.
This image is a grid of all possible matchups between WFTDA teams as of the end of April. The upper left indicates the top ranked teams vs top ranked teams, and the lower right indicates the bottom ranked teams vs bottom ranked teams (the diagonal would be a team theoretically playing against themselves). In this image green represents bouts where it's possible for both teams to gain ratings points and red indicates where it's possible for both teams to lose ratings points.
One of WFTDA's stated goals with this system is to encourage teams to play opponents that they are close to in ranking. In general we can see this is true by the green areas being clumped along the diagonal of the graph, but there are some notable deviations from that pattern.
First of all, most of the matchups in the top 40 teams (Division I) fall in the red category. The reason for this is that most or all of those teams have earned some of their ratings points from tournament games, which are weighted more heavily. The result is that in order to keep that average in a regular season bout, they need to score a higher percentage of the points in the matchup. Thus instead of one team needing 55% and the other needing 40% to increase their rating, you get that one needs 55% and the other needs 53%, which adds up to more than 100%.
Next we see that if a highly ranked team plays a lowly ranked team, then it's also in the red. This is because the lower ranked team's Strength is so low that the higher ranked team needs to score a very high percentage of the points. In fact, in many cases, it's impossible for the higher ranked team to increase their Average Rating even if they score 100% of the points.
Another obvious feature is the excess of green located in the bottom right corner of the graph. This extends beyond teams just playing their near neighbors, and is a result of the Opponent Strength having a floor of 0.50. So, playing the 155th team is worth the same as playing the 120th team.
We aren't 100% certain of the explanation for the red area around the 100 vs 100 region. We're pretty sure it has to do with the ratings being "bunched up", where a small change in rating can be a dramatic change in ranking; the ratings difference between the 30th and 35th teams is about the same as the ratings difference between the 80th and 105th teams, but the Opponent Strength difference (which is based on Ranking) is about four times greater for the latter.
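The "impossible even at 100%" case follows directly from the formula: the most a team can earn in a regular season bout is 300 times the Opponent's Strength, so against an opponent at the 0.50 floor the ceiling is 150 Ratings Points. A rough sketch, with function names of my own and the 300 multiplier and 0.50 floor taken from this post:

```python
BASE_POINTS = 300.0   # multiplier used in the post's worked examples
MIN_STRENGTH = 0.50   # floor on Opponent's Strength mentioned in the post


def max_ratings_points(opponent_strength: float) -> float:
    """Most a team can earn in one regular-season bout (a 100% point share)."""
    return BASE_POINTS * max(opponent_strength, MIN_STRENGTH)


def can_gain(average_rating: float, opponent_strength: float) -> bool:
    """True if some scoreline would earn more than the team's current average."""
    return max_ratings_points(opponent_strength) > average_rating


# Carolina's Average Rating of 215.43 against opponents of different Strength:
print(can_gain(215.43, 0.80))  # True: a shutout would be worth 240
print(can_gain(215.43, 0.50))  # False: a shutout would be worth only 150
```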
Part II: Adding Flat Track Stats' Predictions
Up to this point, all of this analysis is entirely from the WFTDA Rankings results. None of this has used any other prediction models. So, what can FTS add for us? Well, FTS has shown itself to be an excellent predictor of likely outcomes. In fact, you can give the model a pair of teams and a Score Proportion and it will tell you the odds that that team will earn that proportion against their opponent. For example, suppose Team A needs to earn 60% of the points against Team B in order to have their Average Rating increase. We can plug in each team's FTS Rating and find that FTS predicts the odds Team A scores at least 60% of the points is 22.4%. This isn't very likely. Meanwhile a different matchup that also needs 60% of the points might have a 79.9% chance. It all depends on which teams and what their relative FTS ratings are.
Similarly, we can also find the odds that Team B earns the proportion it needs against Team A in order to increase its Average Rating. If both teams' odds are greater than 50%, then we would conclude that it is likely for both teams to gain ratings at the same time (it turns out this can only happen in a matchup where it is possible for both teams to increase in rating). Likewise, if both teams' odds are less than 50%, then it is likely for both teams to lose ratings at the same time (again, this can only happen in a matchup where it is possible for both teams to decrease in rating).
Now, armed with the FTS prediction model, we can modify our original graph and see which matchups are likely to have both teams gain (green), likely to have both teams lose (red) or likely to have one gain and one lose (white).
Fig 2: Flat Track Stats' prediction of what is likely to occur. Green is both teams increase. Red is both teams decrease. White is one team increases while the other decreases. Higher ranks are top and left, lower ranks are bottom and right.
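The coloring rule for this figure is simple enough to sketch once you have the two odds in hand. The FTS model itself is not reproduced here; the function below just takes each team's odds of reaching its own threshold as inputs, and the example numbers are made up for illustration. Treating odds of exactly 50% as "white" is my own assumption.

```python
def likely_outcome(odds_a_gains: float, odds_b_gains: float) -> str:
    """Color a matchup from each team's odds (0.0 to 1.0) of raising its Average Rating.

    The odds would come from a prediction model such as FTS; here they are
    plain inputs.
    """
    if odds_a_gains > 0.5 and odds_b_gains > 0.5:
        return "green"   # both teams are likely to gain
    if odds_a_gains < 0.5 and odds_b_gains < 0.5:
        return "red"     # both teams are likely to lose
    return "white"       # one likely gains, the other likely loses


# Hypothetical odds, for illustration only:
print(likely_outcome(0.62, 0.71))  # green
print(likely_outcome(0.30, 0.41))  # red
print(likely_outcome(0.80, 0.22))  # white
```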
What can we conclude from this? Well, if you accept FTS's predictions as a reasonable representation of what is likely to happen, then you can see there are some areas where matchups are green. These are where both teams are likely to gain, and if increasing their Average Rating is the only thing the teams are interested in, then they would be okay to play each other. For the top 40 potential matchups, this is true for 5-10% of matchups. Similarly, the red matchups are where both teams are expected to decrease their Average Rating, and if that is their only interest, then they should not play that matchup. This is true for about 20-25% of top 40 matchups. Any that are white mean that one team is likely to gain and the other is likely to lose. Since the bouts in the regular season are agreed upon by both teams, these should also probably not be scheduled if the only thing teams care about is increasing their Average Ratings (since one team is expected to lose ratings points).
In short, only the green pairs should theoretically play each other if the teams are only concerned with improving their Average Rating. As mentioned in the beginning of this article, this is not the only factor a league might consider when setting their schedule. Obviously all of the exact data will vary from month-to-month, but the overall trend would be generally expected to continue under the new Rankings System.