Last week I examined whether the sum of Pro Football Focus’ (PFF) player grades correlates to a team’s number of wins over a season. It turns out it does, just not as well as other established metrics. This week I’m taking the examination a step further by testing whether, or to what extent, the sum of player grades correlates to single game outcomes. In other words, how often does the team with the higher sum of grades actually win? But first, let me address a few reader critiques from last week’s post.
Some readers mistook my analysis (or maybe I misled them) for a belief that this is what PFF grades are supposed to do: predict, project, or at least correlate to wins. This is not my belief; I simply wanted to test whether they could because I’m a curious cat. It’s why my friends call me "Whiskers". I talk more about this in the analysis below.
I was also poking a little fun at PFF (more mild LOLs than ROTFLMFAOs) for their claim that "subjective grading allows us to bring intelligence to raw numbers." I don’t know exactly what they mean here... it seems kind of silly. In my opinion the "intelligence" is in the analysis and synthesis of raw numbers, not just the collection of them. I’m not sure the extent to which PFF is doing this, but that’s OK. I enjoy doing this kind of stuff. This does not, however, make me an expert… rather, I’m just a guy who’s probably watched The Best of Will Ferrell too many times.
Lastly, I’d like to reiterate that I’m using PFF’s data from 2008, which anyone can obtain by signing up for their free 30-day trial subscription. Granted, it’s not an ideal dataset to use in 2014, but PFF admits to not changing their "normalization factors" from year-to-year, purposely, in an effort to demonstrate how the "standard of play in the league changes over time." For the record, this is not a practice I agree with (see The Signal and The Noise by Nate Silver), but it allows for a level of analysis that, from my perspective at least, can still be applicable today. And since I’m not paying money for raw data, this is relatively convenient.
"We all know the moon is not made of green cheese…"
Returning to the task at hand, do PFF grades correlate well to single game outcomes? Yes, actually, they do. I looked at this in two different ways. First, since the teams with the most points win, I used linear regression to determine how strong the relationship is between the PFF Grade margin and Points margin. Second, I used logistic regression to test how strongly the PFF Grade margins relate to wins and losses (regardless of points scored). Since time will eternally be an enemy of mine, my dataset includes a random sample of 68 games from 2008, four games from each week (rather than the entire season’s 256-game population).
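To make the sampling step concrete, here is a minimal sketch of drawing four games from each of 17 weeks. The schedule structure, seed, and game identifiers are all invented for illustration; this is not PFF's data or the actual code behind the sample.

```python
import random

# Hypothetical schedule: 17 weeks x 16 games, identified by (week, game_id).
# The identifiers are placeholders, not real 2008 matchups.
random.seed(2008)
season = {week: list(range(16)) for week in range(1, 18)}

# Draw four games from each week -> 17 * 4 = 68 games total.
sample = [(week, g) for week, games in season.items()
          for g in random.sample(games, 4)]

print(len(sample))  # 68
```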
For the linear regression test, PFF Grade margins correlate moderately well with point margins (r-squared = .434). This is weaker than the relationship PFF grades have with the number of wins over a season (.603), which makes sense: there is more variation within single games than across a season as a whole. To put these numbers in perspective, the correlation between time of possession per drive and point margins is an even weaker .325 (using data from 2009 through 2012).
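For anyone who wants to replicate the measure, r-squared here is just the squared Pearson correlation between the two margins. Here is a self-contained version; the data below is synthetic, generated purely for illustration, not the 68-game sample.

```python
import math
import random

# Synthetic (grade margin, point margin) pairs; illustrative only.
random.seed(0)
grade_margin = [random.gauss(0, 15) for _ in range(68)]
point_margin = [0.5 * g + random.gauss(0, 8) for g in grade_margin]

def r_squared(xs, ys):
    """Squared Pearson correlation: the r-squared of a simple
    one-variable linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return (cov / (sx * sy)) ** 2

print(round(r_squared(grade_margin, point_margin), 3))
```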
"It’s a simple question… Would you eat the moon if it were made of ribs?"
But what if we remove points from the equation? Do PFF grades correlate to a simpler, dichotomous win/loss outcome? Yes, they do. Teams with the higher sum of player grades win 73.5% of the time. To dig even deeper, we need to shift from linear to logistic regression, which produces probabilities of outcomes rather than r-squared values. As it turns out, for every one unit the sum of player grades is greater than an opponent’s, a team increases its chances of winning by 8.7%. (Again, for comparison’s sake we can look at time of possession. For every one minute of possession greater than an opponent’s, a team increases its chance of winning by 22%.) This may not seem like a whole lot, and it’s not. When talking about sums of grades though, one unit doesn’t really tell us much. In practice there is a wider range of PFF grades between teams. So if we create PFF grade margin "buckets" ten units in size, we can see a greater effect.
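The per-unit and per-bucket effects are tied together by the arithmetic of odds ratios: logistic-regression effects multiply, so an 8.7% increase in the odds per unit compounds over a ten-unit bucket. A quick check (the 1.087 figure simply restates the 8.7% from above):

```python
import math

# The log-odds coefficient implied by an 8.7% increase in the
# odds of winning per one-unit grade margin.
per_unit_odds_ratio = 1.087
b = math.log(per_unit_odds_ratio)

# Odds ratios multiply, so a ten-unit margin compounds the effect.
per_ten_units = math.exp(10 * b)  # equivalent to 1.087 ** 10
print(round(per_ten_units, 2))   # 2.3, i.e., "nearly two-and-a-half times"
```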
The results here are quite a bit different. For every ten units the sum of player grades is greater than an opponent’s, a team increases its chances of winning by nearly two-and-a-half times! This is an incredible result and shows how PFF grades could potentially be used beyond ranking players within position groups. It also seems to be an argument in favor of the sum of grades having more meaning.
"Hey… How about this mad cow disease?"
But there is something wrong here. Notice in the graph above where the PFF Grade margin is zero? The odds of winning are just 23%. Shouldn’t it be closer to fifty? This occurs because the 68-game sample contains 15 games in which the PFF Grade margin was within ten units, and in those games only six teams (40%) with the higher grades actually won (evidence, perhaps, that a team is indeed greater than the sum of its parts). In contrast, there are 39 games in which the grade margin was greater than 20, and in those games the teams with the higher grades won a whopping 92% of the time.
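The bucket check is simple to reproduce. Here is a sketch; the games list below is made-up placeholder data in the shape of (grade margin, did the higher-graded team win), not the real 68-game sample, so the printed rates are illustrative only.

```python
# Placeholder data: (grade_margin, higher_graded_team_won).
games = [(4, False), (7, True), (9, False), (25, True),
         (31, True), (42, True), (18, False), (12, True)]

def win_rate(games, lo, hi):
    """Share of games with a grade margin in [lo, hi) that the
    higher-graded team actually won."""
    subset = [won for margin, won in games if lo <= margin < hi]
    return sum(subset) / len(subset)

print(win_rate(games, 0, 10))   # close-margin games
print(win_rate(games, 20, 99))  # lopsided-margin games
```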
Since these values represent correlations and not causation, we need to consider "grader bias". Based on the Win Probability graph, it is possible that graders demonstrate a certain degree of bias (sub-consciously or otherwise), allowing the score of the game to influence their grade assignments. For example, graders may grade more conservatively in games where the winner is less certain, or they may grade players more favorably when their team is in a better position to win. It is also possible that PFF’s normalization factors, as currently used, do not properly account for team performance. Given the strength of these relationships, the PFF team is clearly doing something right, but their intention not to modify their normalization factors seems to be inhibiting the potential of their grades. Ideally, I think, we would want the Win Probability graph to look like this:
Another consideration is the inherent nature of the grades themselves. They are best used to compare players in like positions, not players across them. In other words, a quarterback with a +2 grade will most definitely have a greater impact on the outcome of a game than a right guard with a +2 grade. Yet, these sums don’t make that distinction. A +2 is a +2. In other words, there are no weights assigned.
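If one did want the sums to make that distinction, a position-weighted sum could look something like the sketch below. The weights are entirely hypothetical; nothing like this exists in PFF's published grades, and picking sensible weights would be its own project.

```python
# Hypothetical position weights, invented purely for illustration.
POSITION_WEIGHTS = {"QB": 3.0, "RG": 1.0, "WR": 1.5}

def weighted_team_grade(player_grades):
    """Sum player grades scaled by position importance, so a +2 QB
    counts for more than a +2 right guard."""
    return sum(POSITION_WEIGHTS[pos] * g for pos, g in player_grades)

team = [("QB", 2.0), ("RG", 2.0)]
print(weighted_team_grade(team))  # 3.0*2 + 1.0*2 = 8.0
```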
"We’ve covered a lot of ground, shared a few laughs…"
So what to make of all this? Potential is really the name of the game. Pro Football Focus grades are primarily used to rank players within position groups, but there is great potential for so much more. Sums of player grades describe the outcomes of individual games, as well as the success of a season, pretty well. However, "pretty well" is not "excellent" and certainly not "gospel". The rubrics used for grading are most likely not foolproof, and the normalization techniques need to be routinely revisited. But there is that potential…
Where most advanced NFL metrics fail is in projecting outcomes for future games and success in future seasons. Football Outsiders’ DVOA probably does this best, but it’s still not as good as baseball’s projection systems (sample sizes make a big difference). But in a statistical universe where PFF player grades could potentially be transferred from one team to another after free agency, where grades could be weighted by position and normalized based on team and opponent performance, we could be entering a new frontier of advanced NFL stats. "Could" being the operative word. I could also be the Harry Caray of Eagles cheerleaders...