What is the Big Data Bowl?
The NFL held its second Big Data Bowl during the 2019 NFL season. Contestants were given the NextGen Stats tracking data for all 22 players at the time of hand-off and asked to build a model that produces a probability distribution of how many yards we would expect the rusher to gain. More information on the competition is available at the competition homepage. After the competition, the data from the winning submission was made available for public use. This article combines that data with play-by-play data from the nflfastR package to get player and team information. We use it in an attempt to separate individual running back performance from offensive line play, situational usage, and scheme, in order to gain an edge in evaluating running backs for fantasy performance in 2020.
Some caveats to keep in mind while reading:
- Data is only from Weeks 1-12, so players who might have broken out over the last 5 weeks (e.g., Miles Sanders?) or who played a significant part of the season injured and got healthy later (Kamara, Barkley) might appear worse by some of these metrics.
- Running back speed is one of the variables used to predict yards gained at hand-off. So it is possible that a player who is moving faster at hand-off raises his own expected yards above what the situation around him provides, and then appears to underperform. However, we believe this effect has little impact on the results, particularly since ball carrier speed at hand-off is also largely a result of play design (speed at hand-off on a stretch run will be greater than on a dive up the middle).
- This says absolutely nothing about a running back’s receiving ability, which is also a very important part of football, both real and fantasy.
Now, with that out of the way, let’s dive into the results.
For evaluating running back performance, we follow a process similar to that of Michael Lopez (head of NFL Data & Analytics): we look at running back performance based on the probability of reaching a certain result, not on yards over/under expected. There are two key reasons for this. First, yards over expected can be easily skewed by one outlier play, and it is not stable from sample to sample. Second, not all yards over expected are created equal. For example, on a well-blocked play that already has a high expected gain, gaining an additional five yards over expected could be just as likely as gaining an additional two yards on a poorly blocked play. For a more detailed overview of the process and reasoning, see Lopez’s methodology linked above.
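To make the percentile idea concrete, here is a minimal Python sketch. It assumes each play's prediction is stored as a 199-value cumulative distribution over yard outcomes from -99 to 99 (the output format the Kaggle version of the competition asked for; treat it as an assumption here) and scores a carry by P(Y ≤ actual yards). The `toy_cdf` helper is hypothetical: a discretized bell curve standing in for real model output, with the well-blocked play given a wider spread of outcomes.

```python
import math
from itertools import accumulate

YARDS = list(range(-99, 100))  # assumed outcome grid: 199 yard values, -99..99


def toy_cdf(center: float, spread: float) -> list[float]:
    """Hypothetical stand-in for model output: a discretized bell curve as a CDF."""
    weights = [math.exp(-0.5 * ((y - center) / spread) ** 2) for y in YARDS]
    total = sum(weights)
    return list(accumulate(w / total for w in weights))


def outcome_percentile(cdf: list[float], actual_yards: int) -> float:
    """P(Y <= actual_yards) under a play's predicted CDF (index 0 = -99 yards)."""
    clipped = max(-99, min(99, actual_yards))
    return cdf[clipped + 99]


well_blocked = toy_cdf(center=6, spread=5)    # high expected gain, wide range
poorly_blocked = toy_cdf(center=2, spread=2)  # low expected gain, narrow range
p_good = outcome_percentile(well_blocked, 11)   # 5 yards over expected
p_poor = outcome_percentile(poorly_blocked, 4)  # 2 yards over expected
print(f"{p_good:.2f} vs {p_poor:.2f}")
```

With these made-up distributions, five yards over expectation on the well-blocked play and two yards over on the poorly blocked play land at similar percentiles, which is the point of the paragraph above. A mid-point convention, P(Y < y) + ½·P(Y = y), is a common alternative to the inclusive CDF used here.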
Expected Yards At Hand-Off
First, we look at expected yards gained at the time of hand-off. This measures how many yards we would expect the running back to gain given the position, speed, acceleration, and direction of all 22 players on the field at the time of hand-off. It does not measure what the running back actually did after the hand-off. We plot boxplots of each back’s expected yards, ordered from highest to lowest median.
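A minimal sketch of the two steps in this section, assuming the same kind of per-play CDF as before: expected yards is the mean of the predicted distribution, Σ y·P(Y = y), with each P(Y = y) taken as the difference of consecutive CDF values; backs are then ordered by the median of their per-play expected yards, which gives the order of the boxplots. Player names and numbers below are hypothetical.

```python
from collections import defaultdict
from statistics import median


def expected_yards(cdf: list[float], first_yard: int = -99) -> float:
    """Mean of a predicted yards distribution given as a CDF starting at first_yard."""
    total, previous = 0.0, 0.0
    for offset, cumulative in enumerate(cdf):
        total += (first_yard + offset) * (cumulative - previous)  # y * P(Y = y)
        previous = cumulative
    return total


def rank_by_median(plays):
    """plays: iterable of (rusher, expected_yards). Returns [(rusher, median)], high to low."""
    by_rusher = defaultdict(list)
    for rusher, xyds in plays:
        by_rusher[rusher].append(xyds)
    medians = {rusher: median(values) for rusher, values in by_rusher.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


print(expected_yards([0.2, 0.7, 1.0], first_yard=0))  # toy 3-outcome CDF
plays = [("RB A", 4.1), ("RB A", 5.0), ("RB A", 3.2),
         ("RB B", 6.0), ("RB B", 2.0), ("RB B", 5.5), ("RB B", 4.8)]
ranking = rank_by_median(plays)
print(ranking)
```

Ordering by the median rather than the mean keeps one or two unusually well-blocked plays from moving a back up the chart, which matches the boxplot view.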
No player in the AFC had higher expected yards at hand-off than Devin Singletary; however, his backfield mate Frank Gore was near the bottom of the list. Considering Gore was frequently used in short-yardage situations and faced more stacked boxes, this finding makes sense. I point this out first to highlight that expected yards is not just an offensive line stat, but also a measure of scheme and situation. It also makes for an interesting analysis of newcomer Zack Moss. If Moss is mostly used where Gore was used, we will unfortunately likely see a low YPC from him. But if he takes a lot of the early-down work that belonged to Singletary, he could be set for an efficient rookie season.
Melvin Gordon is walking into a great situation in Denver, as both Broncos rushers were in the top 5 in 2019. With the additions of Jeudy and Hamler, Denver could spread the defense out even more to create space for Gordon to run. Note, however, that he and Ekeler were 6th and 7th in 2019, so this doesn’t look to be a significant upgrade.
Going to the bottom of the list, no back was given less to work with than Le’Veon Bell in New York. This should come as a surprise to nobody, as the Jets had one of the worst OLs and had games where Trevor Siemian and Luke Falk were the starting QBs, so defenses could stack the box at will.
I was surprised to see Derrick Henry in the lower half of this list. With the addition of Roger Saffold alongside road grader Jack Conklin, I expected him to be near the top. It is worth noting that LT Taylor Lewan (8th in run blocking, PFF) was suspended for a third of this sample, and Marcus Mariota was starting through week 6. If we split Henry’s stats into two samples, weeks 1-6 and weeks 7-12, we see that in weeks 1-6 he had a bottom-4 expected yards at hand-off, but in weeks 7-12 he was in the top 10 in the conference. Note: his starting RT Jack Conklin, who was 7th in run blocking per PFF, is now in Cleveland blocking for Nick Chubb, while Tennessee replaces him with first-round pick Isaiah Wilson.
Both Matt Breida and Raheem Mostert had some of the highest expected yards at hand-off in the NFC. The 49ers clearly set up their RBs to succeed as well as anyone. While Kenyan Drake was in Arizona, he was also given some of the best rushing opportunities. And considering David Johnson is in the top half as well, it seems clear that Drake is set up for fantasy success in Arizona as long as he holds on to the starting job.
Miles Sanders and Jordan Howard also received great blocking in Philly, so Sanders should continue to see success on the ground. But note: Eagles guard Brandon Brooks is out for the season with a ruptured Achilles and is being replaced by 38-year-old Jason Peters (a future Hall of Famer, but one who has played tackle his whole career).
The Cowboys offensive line – overrated? Zeke is in the bottom half of the league in expected yards at hand-off. How much of this is due to blocking, and how much is due to Jason Garrett’s stale offense, remains to be seen. But blaming Zeke for lower rushing success, as we will see further below, seems misguided.
Gurley is stepping into a good situation in Atlanta, as Freeman had the second-best expected yards. But the situation in LAR wasn’t exactly bad last year either, as Gurley was still inside the top 7.
Now that we’ve analyzed the situations surrounding the backs, we will next look at what the backs actually did with the ball in their hands.
Looking at the AFC first, we see that no running back did more with what he was given than Le’Veon Bell. So perhaps he isn’t washed after all. We should note, though, that given Bell’s patient running style, he might lower his expected yards in this model by taking the hand-off at a slower speed, then accelerate and finish above his expectation. But given what we know about the Jets’ offensive situation in 2019, this seems to be overanalyzing the data.
It is interesting to see Marlon Mack at the top of this list after the Colts felt the need to replace him with second-round pick Jonathan Taylor. Given the strength of their offensive line, that backfield battle will be interesting to watch.
Not surprisingly, Nick Chubb and Derrick Henry, considered two of the best pure rushers in the NFL, are near the top of this list. Given the Browns’ investments in the OL this year and the run-heavy scheme of new HC Kevin Stefanski, Nick Chubb is set up perfectly to succeed in 2020. Even with little receiving work and competition from Kareem Hunt, he seems poised to be one of the league leaders in rushing, if not the rushing champ. As Derrick Henry showed last year, that is enough for a top-5 fantasy season.
Melvin Gordon was not particularly effective last season, but neither were his two new Bronco teammates. Gordon should easily hold off Royce Freeman, and the coaching staff seems to think of Lindsay as a change-of-pace/receiving back. As noted above, he is stepping into one of the best situations in the AFC.
I was a little surprised to see Joe Mixon at the bottom of this list, since I consider him to be one of the better backs in the league. But we should note again that this does not include receiving work or weeks 13-17, when he started to break out last year. With the offensive line and QB upgrades, I still feel safe drafting Mixon as a tier-2 RB in 2020.
Leonard Fournette is also not an effective rusher, and his expected yards were in the bottom half of the league last year. Caveat emptor.
As I alluded to earlier, Zeke Elliott was actually really good in 2019. He had the highest median result of anyone in the NFL; big plays were the only thing that eluded him. With improved blocking and/or scheme in 2020, he could be set up for his most efficient season since 2016.
Before you see Kamara and Saquon toward the bottom of this chart and conclude that it’s a garbage measure, consider: both players were fighting through notable injuries for most of 2019, and Saquon is a bit of a boom-or-bust rusher who tries to bounce many of his runs instead of taking what the line gives him. While this makes for some exciting big plays, it also leads to frequent below-average results.
Devonta Freeman is dust. He easily had the worst performance of any back in the NFL. This doesn’t really matter for fantasy purposes right now since he is unsigned, but should a team sign him, I still see no appeal to drafting him.
On a similar note, Gurley was not particularly effective in 2019, but at least he wasn’t complete dust. He is stepping into a good situation in ATL and will be an improvement over Freeman.
On the new Miami backfield: Matt Breida had the highest expected yards at hand-off in the NFC, but finished in the bottom five in production relative to expectation. Meanwhile, Jordan Howard was in the top five. While I am a fan of Breida, it looks as though the fantasy community has soured on Jordan Howard, despite him still being an above-average rusher. I think there is a good chance he commands most of the early-down work in MIA, while Breida becomes the third-down or change-of-pace back.
Kenyan Drake wasn’t anything special in Arizona, at least at the median level. But note this is a small sample for him: only 4 games in ARI are included. As noted above, Arizona is still an elite fantasy situation for running backs, which also makes Chase Edmonds a high-end handcuff who might have standalone RB3 value even if Drake is healthy.
Next we look at breakaway runs, where a breakaway is redefined to mean a result in the top 20% of a carry’s expected outcomes. This separates it from traditional "breakaway run" definitions, which are based on raw yards and can be highly dependent on scheme and OL. We plot expected yards gained on the x-axis and breakaway rate on the y-axis, and the size of each bubble represents the number of carries in the sample.
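The breakaway definition can be sketched in a few lines. This assumes each carry has already been converted to an outcome percentile (the probability-of-result measure from the methodology section) and treats a percentile of 0.80 or higher as a breakaway; whether the cutoff is inclusive is our assumption. The carry count it returns is what sets the bubble size. The data below is hypothetical.

```python
from collections import defaultdict

BREAKAWAY_PERCENTILE = 0.80  # "top 20% of expected outcomes" (inclusive cutoff assumed)


def breakaway_rates(carries, threshold=BREAKAWAY_PERCENTILE):
    """carries: iterable of (rusher, outcome_percentile).

    Returns {rusher: (breakaway_rate, n_carries)}; n_carries sizes the bubble.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rusher, pct in carries:
        totals[rusher] += 1
        if pct >= threshold:
            hits[rusher] += 1
    return {rusher: (hits[rusher] / totals[rusher], totals[rusher]) for rusher in totals}


carries = [("RB A", 0.95), ("RB A", 0.40), ("RB A", 0.81), ("RB A", 0.10),
           ("RB B", 0.55), ("RB B", 0.79)]
rates = breakaway_rates(carries)
print(rates)
```

Because the cutoff is a percentile of each carry's own predicted distribution, a 15-yard gain behind great blocking and a 6-yard gain into a stacked box can both count as breakaways.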
It is notable that Kenyan Drake had one of the highest breakaway run rates in Arizona but was relatively pedestrian in this measure in Miami. Devin Singletary had both a high breakaway rate and high expected yards at hand-off. The chart also highlights how poor a situation Bell was in last season. But what conclusions can we draw? If this new breakaway rate is stable, then we would want to buy players with higher breakaway rates, particularly if their situation has improved. But if it’s not stable, then fading the outlier breakaway backs might offer a slight edge.
So we break the Big Data Bowl results into two random samples and find very little correlation in breakaway rates between the samples, even with this new definition. This result holds whether we break the data into two random samples or into 1H/2H splits. This lines up with the traditional fantasy advice of fading players who depended on big plays for the bulk of their production. And this is using two samples from the same season, let alone comparing one season to the next. We should note that there is a bit of a denominator issue, given the small sample sizes available. But given previous research on the lack of year-over-year stability of big plays, we feel reasonably confident that similar findings would follow here as well.
But we need to find some measure that is stable; otherwise this would be nothing more than an exercise in descriptive statistics. We find that there might be some stability in median performance, depending on how we sample the data. If we simply split the data into two random samples, we see a moderately strong correlation between median results. If we split the data into 1H vs 2H, we see a weaker but still existent relationship. This suggests that performance on this metric is stable within the season, and we believe there should be some stability year over year. Injuries obviously play a significant role in a back’s effectiveness, and this analysis does not account for them whatsoever.
Lastly, we look at the stability of expected yards at the time of hand-off and find it to be stable whether the data is split into random samples or 1H vs 2H. This isn’t too shocking, since scheme shouldn’t change much within a season and expected yards do not depend on the performance of one player. So the takeaway is to buy effective offensive lines and schemes (I know, shocker).
Stability of Expected Yards gained at time of hand-off: