Batted Balls and Better Data Part II

Imagine a batter hits a long fly ball destined for the right field seats, only for the opposing outfielder to reach over the wall and rob him of his home run. On a traditional stat sheet, this is treated the same way as any other out, and there’s no real way of distinguishing it from a dribbler down the third base line. But intuitively we know that these are two very different things, and a batter who does more of the first is going to end up being more valuable than one who does more of the second. Thus, if we want to truly measure how well a player has performed, we need to separate the performance from the results.

The best way of doing that is to break down a batted ball in the most granular way possible and look at the average performance for similar batted balls. In the first post of this two-part series we showed how we can do that using quality-of-contact data from FanGraphs. In this post, I will show how StatCast data on the type and velocity of batted balls let us take things one step further and quantify a hitter’s performance more accurately than ever before. I’ll walk through some of the theory and methodology, then use Cameron Maybin as an example of how to use the method. Finally, I’ll provide the tools you need to do this work yourself on any player you’d like.

THEORY AND METHODOLOGY
Former Seattle Mariners executive Tony Blengino has been doing a terrific series at FanGraphs showing how batted ball data can be used to better measure a player’s performance. Here’s an excerpt from the introductory post:

    Now that we’ve got the basic categories separated, let’s see how major league batters performed on each batted-ball type during the 2014 season.

    – Popup = .015 AVG-.019 SLG (7.7%)
    – Fly Ball = .275 AVG-.703 SLG (28.0%)
    – Line Drive = .661 AVG-.869 SLG (20.9%)
    – Ground Ball = .245 AVG-.267 SLG (43.4%)
    – All = .323 AVG-.496 SLG (100.0%)

While this general information is good, we know that not all fly balls are the same, and thus they should not be treated as such. Blengino expands on this by showing how the velocity of the ball in play affects these expected results, and his second post provides an exceptional amount of detail on fly balls. He gives a chart that divides fly balls into buckets of speed and direction, then gives the average batting average and slugging percentage for the fly balls in each bucket. Using that info from Tony, along with the batted ball velocities provided by StatCast, we can then calculate a batter’s expected batting average, on-base percentage, and slugging percentage given normal results. All of that is simple enough, so let’s go ahead and apply this to one of the most recent additions to our Atlanta Braves.
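As a rough illustration of the lookup step, here is a minimal Python sketch that uses only the type-level 2014 league averages quoted in the excerpt above. Blengino’s velocity and direction buckets are not reproduced here, so this is the coarsest possible version of the idea; the function name and the sample batted balls are made up for illustration.

```python
# League-wide 2014 results by batted-ball type, from the FanGraphs excerpt above.
# Values are (AVG, SLG). Velocity and direction buckets are NOT included here.
LEAGUE_2014 = {
    "popup": (0.015, 0.019),
    "fly_ball": (0.275, 0.703),
    "line_drive": (0.661, 0.869),
    "ground_ball": (0.245, 0.267),
}


def expected_on_contact(batted_balls):
    """Average the league AVG and SLG over a list of batted-ball types."""
    if not batted_balls:
        raise ValueError("need at least one batted ball")
    n = len(batted_balls)
    x_avg = sum(LEAGUE_2014[bb][0] for bb in batted_balls) / n
    x_slg = sum(LEAGUE_2014[bb][1] for bb in batted_balls) / n
    return x_avg, x_slg


# Hypothetical sample: two grounders, one liner, one fly ball.
sample = ["ground_ball", "ground_ball", "line_drive", "fly_ball"]
x_avg, x_slg = expected_on_contact(sample)
print(f"expected AVG on contact: {x_avg:.3f}, expected SLG: {x_slg:.3f}")
```

Swapping the dictionary keys for (type, velocity bucket) pairs would be one way to move toward the finer-grained version described above.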

CAMERON MAYBIN: .300 HITTER
Cameron Maybin has been a pleasant surprise this season, hitting 10 percent better than a league-average batter, with four home runs and a 13.6% walk rate. I’d be more than happy with that as is from a center fielder, but I believe he’s been performing considerably better. In fact, were it not for some bad luck, Cameron Maybin would be the second most valuable hitter on our team and one of the top hitting center fielders in all of baseball.

According to StatCast, Maybin is hitting balls at an average of 92 MPH, third on our team behind Freeman and A.J. Pierzynski. That is really, really good. His grounders are averaging 89 MPH, his liners 98 MPH, and his fly balls 87 MPH. Not only have his results been terrific, the more in-depth data suggests they should be even better. To see just how much better, I applied the method referenced above, looking at all of his plate appearances for this year and calculating his expected AVG and SLG based on the batted ball type and the velocity. If that sounds a little confusing, here are the results for his first ten plate appearances of the season.

[Table: Maybin’s first 10 plate appearances, actual and expected results]

On the left is the result of the plate appearance, followed by a descriptor of the batted ball and its velocity. On the right we have two sections: one for actual results and one for expected results. The actual results are the AVG, OBP, and SLG the play produced, and the expected results are the average AVG, OBP, and SLG produced by that batted ball type at the given velocity.[1] At the bottom you see the average of all the events, giving you his actual and expected AVG, OBP, and SLG over those plate appearances. Doing this for a full season is just a matter of plugging in the rest of the data, yielding the following values. Note that I also converted these lines into wOBA and wRAA values for readers who are familiar with them.
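To make the averaging step concrete, here is a small Python sketch of how per-event rows could be rolled up into actual and expected slash lines. It is not the spreadsheet’s logic, just one plausible reading of it, and it rests on a few assumptions: a walk counts toward OBP only, the expected OBP of a batted ball is taken to equal its expected AVG, and the three sample rows are illustrative (the grounder uses the .532/.583 figures for 100 MPH ground balls mentioned later in this post, and the liner uses the type-level league average).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlateAppearance:
    label: str
    # None means the event does not count toward that stat
    # (for example, a walk is not an at-bat, so it has no AVG or SLG).
    actual_avg: Optional[float]
    actual_obp: float
    actual_slg: Optional[float]
    exp_avg: Optional[float]
    exp_obp: float
    exp_slg: Optional[float]


def mean_of(values):
    """Average the values that are present, skipping None."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else float("nan")


def slash_line(pas, prefix):
    """Return (AVG, OBP, SLG) for the 'actual' or 'exp' columns."""
    return tuple(
        mean_of([getattr(pa, f"{prefix}_{stat}") for pa in pas])
        for stat in ("avg", "obp", "slg")
    )


# Illustrative rows only, not Maybin's real game log.
pas = [
    PlateAppearance("100 MPH grounder, fielder's choice", 0.0, 0.0, 0.0, 0.532, 0.532, 0.583),
    PlateAppearance("line-drive single", 1.0, 1.0, 1.0, 0.661, 0.661, 0.869),
    PlateAppearance("walk", None, 1.0, None, None, 1.0, None),
]

actual = slash_line(pas, "actual")
expected = slash_line(pas, "exp")
print("actual  ", "/".join(f"{x:.3f}" for x in actual))
print("expected", "/".join(f"{x:.3f}" for x in expected))
```

Because each at-bat contributes its own 0-or-1 hit value and its own total bases, averaging the rows gives the same AVG and SLG as dividing season totals by at-bats.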

[Table: Maybin’s actual and expected results for the full season, with wOBA and wRAA]

It turns out Maybin is significantly under-performing relative to what his batted ball speeds would suggest, and most of it is being driven by his horrible luck on ground balls. He’s hit 23 ground balls this year, but only one has gone for a hit. If we assume average results given his batted ball velocities on those ground balls, he should have 7 hits so far. Combine that with his expected performances on fly balls and line drives, and Maybin has hit the ball well enough to justify an expected .370 wOBA. Mike Trout, Joc Pederson, Adam Jones, and Charlie Blackmon are the only other center fielders with a wOBA that high this season. Though he likely won’t hit this well all year, it’s safe to say a lot of folks, myself included, severely underestimated the talent we acquired with this player. Consider me a fan.
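For a sense of scale, the ground-ball arithmetic can be checked in a few lines. This sketch uses only figures already quoted in this post: the 23 grounders, the single hit, the 7 velocity-adjusted expected hits, and the .245 league average on ground balls from the 2014 excerpt. The variable names are my own.

```python
ground_balls = 23            # Maybin's ground balls so far this season
actual_hits = 1              # how many of them went for hits
velocity_adjusted_hits = 7   # expected hits given his ground-ball velocities
league_gb_avg = 0.245        # 2014 league AVG on ground balls (type only)

# Expected hits if we knew nothing but the batted-ball type.
type_only_hits = ground_balls * league_gb_avg

print(f"actual AVG on grounders:            {actual_hits / ground_balls:.3f}")
print(f"type-only expected hits:            {type_only_hits:.1f}")
print(f"velocity-adjusted AVG on grounders: {velocity_adjusted_hits / ground_balls:.3f}")
```

Even the type-only baseline comes out well above one hit, and the velocity adjustment pushes the expectation higher still because his grounders are hit harder than average.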

DATA LIMITATIONS
Now that we’ve seen how simple and useful this method is, we need to start talking about its limitations. First, the StatCast velocities may contain errors, and our current method assumes all 100 MPH ground balls are the same. Let’s take Maybin’s eighth plate appearance of the season as an example. Looking at the table above, we see that this was a ground ball that resulted in a fielder’s choice out. StatCast tells us it left Maybin’s bat at 100 MPH, and ground balls hit that fast usually result in a 0.532 AVG and 0.583 SLG. However, video of the play tells a slightly different story.

On the one hand, it did go directly to the first baseman. On the other hand, does that really look like 100 MPH off the bat? It sort of looks like it came off the end of his bat, and it went straight into the ground just ahead of the dirt. Are we comfortable saying that’s a ball that should be a hit over half the time? I’m not so sure.

Another issue is that the current data doesn’t include the direction of the batted balls. We can reference the table from earlier to see how much of a difference that makes on fly balls, and if you’ve kept up with the latest shifting fad you know that it makes a big difference on ground balls, as well. I was playing around with these numbers for Jason Heyward earlier, and based on his ground ball velocities we should expect him to be hitting close to .350 on grounders this season, rather than .242. However, when I looked at his spray chart I noticed a large portion of those ground balls were being pulled, which made me wonder how much stock to put into that expected average. The good news is that this shouldn’t be a huge issue for most players, and I believe there are ways to calculate direction using the BaseballSavant data. Blengino has been providing the information for how the direction affects our expected results for each type of batted ball, so in a future version of this tool I’ll incorporate these data.
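One way direction might be recovered is from hit-location coordinates, if the export includes them. The Python sketch below is only that, a sketch: it assumes an x/y hit location in an image-style coordinate system where y grows downward, and the home-plate constants are a commonly used community approximation rather than anything taken from this post or from official documentation.

```python
import math

# ASSUMPTION: approximate location of home plate in the hit-coordinate system.
# These constants are a community convention, not an official specification;
# adjust them if your data uses a different origin or scale.
HOME_X = 125.42
HOME_Y = 198.27


def spray_angle(hit_x, hit_y, home_x=HOME_X, home_y=HOME_Y):
    """Angle in degrees from straight-away center field.

    Negative values point toward left field and positive values toward
    right field, assuming the y axis grows downward as in image coordinates.
    """
    return math.degrees(math.atan2(hit_x - home_x, home_y - hit_y))


# Hypothetical points: one straight up the middle, one toward the right side.
up_the_middle = spray_angle(HOME_X, 100.0)
right_side = spray_angle(HOME_X + 50.0, HOME_Y - 50.0)
print(f"up the middle: {up_the_middle:.1f} degrees")
print(f"right side:    {right_side:.1f} degrees")
```

Whether a given angle counts as pulled depends on the batter’s handedness, so that would need to be joined in before bucketing ground balls by direction.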

The final issue is that the data also doesn’t consider a player’s speed. Guys like Dee Gordon and Billy Hamilton can turn a lot of would-be ground outs into singles, as well as would-be singles and doubles into doubles and triples. I remember another post from Blengino back in January of 2014 that discussed what the impact on Mike Trout’s value would be if he had average speed, finding that his speed contributed 16 more hits than expected based on his batted balls. Most guys aren’t as fast as Trout, and thus won’t see that large of an impact, but until I come up with a way to adjust the expected numbers for this we’re going to have to allow for that small amount of error.

So is this method perfect? Does it really give us a crystal-clear view of how well a player has performed? Of course not. But the thing is, a perfect method of determining that doesn’t exist. There are still some errors associated with doing this, but there are even larger errors when looking at results alone. Maybe Maybin hasn’t really performed like a .322/.407/.430 hitter this season, but he’s certainly performed better than a .229/.325/.429 hitter. We’re never going to nail down a perfect answer for some of these questions, but with better data we can continue to get closer and closer to it. Baseball has made a step in that direction this year, and I’m excited to see some of the other ways all of this data can be used.

DO IT YOURSELF
In the spirit of openness, I’d like to provide you, the reader, with the tools you need to create these expected slash lines for a player yourself. First, download my macro-enabled Excel sheet, then head on over to BaseballSavant. When you get to BaseballSavant, click “PITCHf/x Search” at the top. On the left-hand side of the screen you will see a drop-down menu next to “Player Type.” Make sure it is set to “Batter” rather than “Pitcher.” On the right-hand side, type in the name of the player you want to analyze. Once you’ve selected a player, click “Search PITCHf/x” at the bottom-left of the page. That will bring up the following results, and you will need to click the “Download CSV” button circled below to download the data.

[Screenshot: BaseballSavant search results with the “Download CSV” button circled]

Open both that file and the Excel sheet. In the Excel sheet, go to the “Buttons!” tab and click “Clear Everything.” Now, go to the data file from BaseballSavant, select all of the data (make sure you actually select all of it), and paste it into the “Raw Data” tab in the Excel sheet. Finally, go back to the “Buttons!” tab and click “Do it!!!!” to run the data and get your results. If you have any questions, drop a line on here or on Twitter!
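If you would rather script this than use Excel, the first step (grouping batted balls by type and velocity bucket) might look something like the Python sketch below. This is not a port of the spreadsheet’s macros. The column names `batted_ball_type` and `hit_speed`, the 5 MPH bucket width, and the inline sample rows are all assumptions for illustration; check the header row of your own download and rename accordingly.

```python
import csv
import io
from collections import Counter

# Stand-in for the downloaded file. The column names are ASSUMPTIONS and
# may not match the real BaseballSavant export.
SAMPLE_CSV = """player_name,batted_ball_type,hit_speed
Example Batter,ground_ball,100.2
Example Batter,line_drive,98.0
Example Batter,fly_ball,
Example Batter,ground_ball,87.5
"""

BUCKET_WIDTH = 5  # MPH per bucket (hypothetical choice)


def velocity_bucket(speed_text):
    """Return the lower edge of the MPH bucket, or None if velocity is missing."""
    if not speed_text.strip():
        return None
    return int(float(speed_text) // BUCKET_WIDTH) * BUCKET_WIDTH


def count_buckets(handle):
    """Count batted balls per (type, velocity bucket) from an open CSV handle."""
    counts = Counter()
    for row in csv.DictReader(handle):
        key = (row["batted_ball_type"], velocity_bucket(row["hit_speed"]))
        counts[key] += 1
    return counts


counts = count_buckets(io.StringIO(SAMPLE_CSV))
for (bb_type, bucket), n in counts.items():
    label = "no velocity" if bucket is None else f"{bucket}-{bucket + BUCKET_WIDTH} MPH"
    print(f"{bb_type:12s} {label:12s} {n}")
```

For a real download, replace the `io.StringIO(SAMPLE_CSV)` argument with a file opened via `open(path, newline="")`. Rows with no velocity land in their own bucket, which matches the footnote below about assuming average results for them.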

  1. For batted balls that are missing velocity data, I simply assumed average results.

Stephen came up with the idea for this blog shortly after graduating from Tech. Realizing that life is ephemeral, he decided to put (metaphorical) pen to paper and catalogue his thoughts. His thoughts are a series of numbers and spreadsheets, casually categorized as “research,” and said research is usually conducted on the margins of what is both relevant and socially acceptable.
