December 30, 2005
WasWatching.com Stat Glossary
One of the things that I've always wrestled with in doing this blog is the use of some of the sabermetric measures that I like to throw around. It's not my using them that's the issue - it's whether or not people understand the terms that concerns me.
Should I use the full term or are acronyms OK? Do I need to provide the definition each time that I use them? Stuff like that.
So, I've decided to create an entry here where I can list some of the terms that I use - and link to it at times (when I mention some of these sabermetric measures). It seems like a good Band-Aid now for this issue of mine.
Here are some of the terms that I use here frequently and the skinny on each:
Bases Per Plate Appearance [BPA]
The formula is (TB+BB+HBP+SB-CS-GIDP)/(AB+BB+HBP+SF).
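For anyone who wants to run the numbers themselves, here is a minimal Python sketch of the BPA formula above. The function name and the sample season line are made up for illustration; only the formula comes from the glossary.

```python
def bases_per_plate_appearance(tb, bb, hbp, sb, cs, gidp, ab, sf):
    """BPA = (TB+BB+HBP+SB-CS-GIDP) / (AB+BB+HBP+SF)."""
    plate_appearances = ab + bb + hbp + sf
    if plate_appearances == 0:
        raise ValueError("player has no plate appearances")
    return (tb + bb + hbp + sb - cs - gidp) / plate_appearances


# Hypothetical season line, invented for illustration only.
bpa = bases_per_plate_appearance(
    tb=300, bb=70, hbp=5, sb=10, cs=5, gidp=10, ab=550, sf=5
)
print(round(bpa, 3))  # 370 bases over 630 plate appearances
```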
Baserunners Per Nine Innings [BR/9]
The total number of batters reaching base against a pitcher divided by the number of innings pitched and multiplied by nine. It measures how many batters reach base on a per game basis against a pitcher.
A league best figure for this category is typically between 9 and 10.
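A rough sketch of the BR/9 calculation in Python follows. It assumes that "reaching base" means hits, walks and hit batters (the definition above does not spell that out), and it takes outs recorded rather than innings so that partial innings are handled cleanly. The names and sample numbers are hypothetical.

```python
def baserunners_per_nine(hits, walks, hit_batters, outs_recorded):
    """Baserunners allowed per nine innings.

    Assumption: baserunners = hits + walks + hit batters.
    """
    if outs_recorded == 0:
        raise ValueError("pitcher has recorded no outs")
    innings = outs_recorded / 3
    return 9 * (hits + walks + hit_batters) / innings


# Hypothetical pitcher: 220 innings (660 outs), 235 baserunners.
br9 = baserunners_per_nine(hits=180, walks=50, hit_batters=5, outs_recorded=660)
print(round(br9, 2))
```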
Blown Saves [BS]
When a relief pitcher enters a game, he may be said to have a save opportunity if his team currently has the lead and he would be awarded a save if he finished the game. If, while he is pitching, his team loses the lead, either by way of the score becoming tied or by falling behind, that pitcher is said to have "blown the save." He is charged with a blown save, even if his team should eventually win the game, because he was entrusted with the responsibility to preserve his team's lead, and he failed to accomplish that.
Command Ratio [K/BB]
(Strikeouts / Walks) - A measure of a pitcher's raw ability to get the ball over the plate. There is no more fundamental skill than this, and so it is accurately used as a leading indicator to project future rises and falls in other gauges, such as ERA. Command is one of the best gauges to use to evaluate minor league performance. It is a prime component of a pitcher's base performance value.
Benchmarks: Baseball's upper echelon of pitchers will have ratios in excess of 3.0. Pitchers with ratios under 1.0 -- indicating that they walk more batters than they strike out -- have low probability for long term success.
Defensive Efficiency Record [DER]
The rate at which balls put into play are converted into outs by a team's defense.
Game Score [G Sc]
A measure of pitching performance for starting pitchers. Developed by Bill James. The formula consists of eight parts:
1. Start with 50.
2. Add 1 point for each out recorded.
3. Add 2 points for each inning the pitcher completes after the fourth inning.
4. Add 1 point for each strikeout.
5. Subtract 2 points for each hit allowed.
6. Subtract 4 points for each earned run allowed.
7. Subtract 2 points for each unearned run allowed.
8. Subtract 1 point for each walk.
Consider this pitching line:
IP H R ER BB K
8.1 5 2 1 2 7
The game score for the performance shown would be 72 (50+25+8+7-10-4-2-2).
An average start would score 50. One start in 300 reaches a score of 90 or better, and an all-time great performance would reach 100.
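The eight steps above can be sketched in Python as follows. This is just one way to code it up; the function names are my own, and the helper assumes the usual box-score convention where "8.1" means eight and one-third innings.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings notation ("8.1" = 8 1/3 innings) to outs."""
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def game_score(outs, hits, runs, earned_runs, walks, strikeouts):
    """Apply the eight Game Score steps to a single start."""
    completed_innings = outs // 3
    unearned_runs = runs - earned_runs
    return (
        50                                    # step 1
        + outs                                # step 2
        + 2 * max(0, completed_innings - 4)   # step 3
        + strikeouts                          # step 4
        - 2 * hits                            # step 5
        - 4 * earned_runs                     # step 6
        - 2 * unearned_runs                   # step 7
        - walks                               # step 8
    )


# The pitching line from the example: 8.1 IP, 5 H, 2 R, 1 ER, 2 BB, 7 K.
score = game_score(ip_to_outs("8.1"), hits=5, runs=2, earned_runs=1,
                   walks=2, strikeouts=7)
print(score)  # 72
```

Note that step 3 only counts innings the pitcher finishes, which is why 8.1 innings earns credit for four innings (the fifth through the eighth) and not five.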
Isolated Power [ISO]
A player's slugging average minus his batting average. Bill James provided its current name. Branch Rickey championed the stat, calling it "Power Average." A measure of a player's ability to hit for power considered apart from his ability to hit singles.
ISO = SLG - AVG
For an individual, ISO under .080 means he can be considered a singles hitter; ISO over .200 is very good power.
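Since SLG is TB/AB and AVG is H/AB, ISO works out to (TB - H)/AB. Here is a small Python sketch using that form, along with the two benchmarks above. The function names, the "in between" label and the sample numbers are mine, for illustration only.

```python
def isolated_power(total_bases, hits, at_bats):
    """ISO = SLG - AVG, which simplifies to (TB - H) / AB."""
    if at_bats == 0:
        raise ValueError("player has no at bats")
    return (total_bases - hits) / at_bats


def describe_power(iso):
    """Label an ISO figure using the glossary's two benchmarks."""
    if iso < 0.080:
        return "singles hitter"
    if iso > 0.200:
        return "very good power"
    return "in between"  # the glossary gives no label for this range


# Hypothetical hitter: 275 total bases on 150 hits in 500 at bats.
iso = isolated_power(total_bases=275, hits=150, at_bats=500)
print(iso, describe_power(iso))  # 0.25 very good power
```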
Neutral Losses [NL]
It is a projection of how many losses a pitcher would have if he were given average run support, based on his number of actual decisions.
Neutral Wins [NW]
It is a projection of how many wins a pitcher would have if he were given average run support, based on his number of actual decisions.
Offensive Winning Percentage [OWP]
A player's Offensive Winning Percentage equals the percentage of games a team would win with nine of that player in its lineup, given average pitching and defense. The formula is the square of Runs Created per 27 Outs, divided by the sum of the square of Runs Created per 27 Outs and the square of the league average of runs per game.
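The OWP formula described above is short enough to sketch in a few lines of Python. The function name and the sample inputs are hypothetical; the formula is the one given in the definition.

```python
def offensive_winning_percentage(rc_per_27_outs, league_runs_per_game):
    """OWP = RC27^2 / (RC27^2 + league R/G^2)."""
    player_sq = rc_per_27_outs ** 2
    league_sq = league_runs_per_game ** 2
    return player_sq / (player_sq + league_sq)


# Hypothetical player creating 6.0 runs per 27 outs in a 4.5 R/G league.
owp = offensive_winning_percentage(6.0, 4.5)
print(round(owp, 3))  # 0.64
```

A player who creates runs at exactly the league rate comes out at .500, which is a handy sanity check.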
Park Factor [PF]
This is an estimate of a ballpark’s effects on batting and pitching and is expressed as either a decimal or a whole number. A neutral ballpark has a park factor of 1.00 or 100. The park factors used in many publications are three-year averages unless a ballpark was in use for fewer than three seasons. Park factors are also adjusted to reflect the fact that a batter or pitcher does not face his own team. Thus, different park factors are provided for a team’s batters and pitchers.
Production [OPS]
Sabermetricians (baseball statisticians) consider the ability to get on base (OBP) and the ability to hit for power (SLG) to be the two most valuable offensive abilities of a player. Thus one measure of a player's prime offensive talents, his "production" or PRO (also called OPS), is to simply combine OBP and SLG.
OPS = OBP+SLG
Pythagorean Winning Percentage [PW%]
Developed by Bill James, this is a team's predicted winning percentage based on runs scored and runs allowed. The formula is as follows: Runs^2/(Runs^2+Runs Allowed^2)
Here is the calculation for the 1999 Yankees. The Yankees scored 900 runs and allowed 731 runs:
900^2/(900^2+731^2) = 810,000/1,344,361 = .603
Thus, the Yankees would be predicted to have a .603 winning percentage. In actuality, the Yankees had a .605 winning percentage. A more precise calculation uses an exponent of 1.83, but an exponent of two works almost as well. From Pythagorean winning percentage it is possible to figure Pythagorean wins (PW) and Pythagorean losses (PL).
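Here is a minimal Python sketch of the Pythagorean calculation, with the exponent left as a parameter so that 2 and 1.83 can both be tried. The function names are my own, and turning the percentage into wins by rounding over a 162-game schedule is my assumption about how PW and PL would be figured.

```python
def pythagorean_pct(runs, runs_allowed, exponent=2.0):
    """Predicted winning percentage from runs scored and runs allowed."""
    scored = runs ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def pythagorean_record(runs, runs_allowed, games, exponent=2.0):
    """Pythagorean wins and losses, rounded to whole games (an assumption)."""
    wins = round(pythagorean_pct(runs, runs_allowed, exponent) * games)
    return wins, games - wins


# 1999 Yankees: 900 runs scored, 731 allowed.
pct = pythagorean_pct(900, 731)
pw, pl = pythagorean_record(900, 731, games=162)
print(round(pct, 3), pw, pl)  # 0.603 98 64

# The same team with the 1.83 exponent mentioned above.
pct_183 = pythagorean_pct(900, 731, exponent=1.83)
```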
Range Factor [RNG]
This is an unofficial measure of a defensive player's fielding ability. In effect, it indicates how many defensive chances a player is able to convert into outs on a per game basis. Range for 1B, C and pitchers is not a meaningful stat. It is calculated as:
RNG = 9*SC/INN
where SC is successful chances (putouts plus assists) and INN is innings played on defense.
Performance differs by position. Typical season range factors are: 2B-4.5 to 6.0; 3B-2.0 to 3.3; SS-4.0 to 5.3; RF- and LF-1.5 to 2.5; CF-2.3 to 3.2.
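A quick Python sketch of the range calculation follows. It assumes successful chances means putouts plus assists; the function name and the sample shortstop are made up for illustration.

```python
def range_factor(putouts, assists, innings):
    """RNG = 9 * SC / INN, assuming SC = putouts + assists."""
    if innings == 0:
        raise ValueError("player has no defensive innings")
    return 9 * (putouts + assists) / innings


# Hypothetical shortstop: 250 putouts and 450 assists in 1350 innings.
rng = range_factor(putouts=250, assists=450, innings=1350)
print(round(rng, 2))
```

That made-up shortstop lands inside the typical SS band listed above.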
Runs Created [RC]
A Bill James statistic. An estimate of the number of runs that a player would produce based on his offensive statistics. Runs created is an attempt to measure total offensive contribution in terms of runs (see also Runs Contributed). Divided by the runs required per win (in professional baseball, approximately 10), runs created becomes the total wins created by this player's offensive performance.
RC = ((H+BB+HBP-CS-GIDP) * (TB+ 0.26*(BB+HBP-IBB) + 0.52*(SB+SH+SF)))/(AB+BB+HBP+SH+SF)
Note: The formula shown here is the modern formula in current use by sabermetricians. Bill James created many variations of the basic formula to adjust for available data and other factors in bygone eras.
RC typically ranges from 0 to 120 in a 162-game season. Only players who play a lot can have a very high season total, since the number is dependent on total stats. For a team, runs created is a projected estimate of the runs the team should have scored given its number of hits (by type), walks, stolen bases, and times caught stealing. Comparing team runs created to actual runs scored gives an indication of other factors at work, factors that affect the efficiency of a team's offense. For instance, high efficiency -- consistently scoring more runs than projected -- could be explained by good clutch hitting, good baserunning, good managing, or good luck (or maybe cheating). The more consistent the two figures, the less luck is probably involved.
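The RC formula is a mouthful, so here is a Python sketch that breaks it into its three pieces. The on-base / advancement / opportunity labels are my own shorthand, and the season line is invented for illustration. The arithmetic follows the formula as written above.

```python
def runs_created(h, bb, hbp, cs, gidp, tb, ibb, sb, sh, sf, ab):
    """Runs Created using the formula given in the glossary."""
    on_base = h + bb + hbp - cs - gidp
    advancement = tb + 0.26 * (bb + hbp - ibb) + 0.52 * (sb + sh + sf)
    opportunities = ab + bb + hbp + sh + sf
    if opportunities == 0:
        raise ValueError("player has no plate appearances")
    return on_base * advancement / opportunities


# Hypothetical season line, invented for illustration only.
rc = runs_created(h=160, bb=60, hbp=5, cs=5, gidp=10, tb=260,
                  ibb=5, sb=10, sh=0, sf=5, ab=550)
print(round(rc, 2))
```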
Runs Created Above Average [RCAA]
This is a Lee Sinins creation. It's the difference between a player's runs created total and the total for an average player who used the same number of his team's outs. A negative RCAA indicates a below average player in this category.
Runs Created Per Game [RC/G]
Runs created is an accumulation stat; the more a player bats, the more runs he creates (assuming he makes some positive contribution). Converting runs created into runs created per game provides an indication of how valuable this player is to have in the lineup. RC/G is somewhat like ERA is for pitchers; it recasts the offensive contribution of the player in the context of a nine inning (in this case, 27 out) game. To calculate RC/G, multiply RC by 27 and divide by the number of outs the player is responsible for (OM), thus:
RC/G = 27*RC/OM
[Note: The formula shown here is the modern formula in current use by sabermetricians. Since data is available to account for all outs made, it is appropriate to use 27 outs as the context. In earlier periods, data on some kinds of outs (GIDP and CS are examples) are incomplete or unavailable. Consequently, applying the formula to other eras requires use of 25.5 or 26 outs per game.]
One way to look at RC/G is to imagine a lineup with the same player batting in every spot. A team made up of nine 1992 model Barry Bonds, for example, would be expected to score 11.34 runs per game on average. (Bonds had 147 runs created in 1992.)
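Here is the RC/G formula as a short Python sketch. The function name is mine. For the Bonds example, the outs figure of 350 is not quoted anywhere above; it is simply the number implied by 147 runs created and 11.34 RC/G, so treat it as a back-solved assumption.

```python
def runs_created_per_game(rc, outs_made, outs_per_game=27):
    """RC/G = outs_per_game * RC / OM."""
    if outs_made == 0:
        raise ValueError("player has made no outs")
    return outs_per_game * rc / outs_made


# 1992 Barry Bonds: 147 RC; 350 outs is back-solved from the 11.34 figure.
rc_g = runs_created_per_game(rc=147, outs_made=350)
print(round(rc_g, 2))  # 11.34
```

The outs_per_game parameter is there so that 25.5 or 26 can be passed in for older eras, per the note above.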
Runs Saved Against Average [RSAA]
This is a Lee Sinins creation. It is the number of runs that a pitcher saved versus what an average pitcher would have allowed. It is similar to the statistic Pitching Runs detailed in Total Baseball - except (1) the two use different park adjustments and (2) Total Baseball added a procedure to take into account the number of decisions the pitcher had while RSAA does not. A negative RSAA indicates a below average pitcher in this category.
Secondary Average [SEC]
Developed by Bill James to measure a player's offensive contributions beyond batting average. Secondary Averages of leagues are always very similar to the league batting average, but player secondary averages run from .100 (for truly inept offensive players) to upwards of .600. The formula is: (Total Bases-Hits+Walks+Stolen Bases)/(At Bats)
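A minimal Python sketch of the Secondary Average formula above. The function name and the sample season line are hypothetical.

```python
def secondary_average(total_bases, hits, walks, stolen_bases, at_bats):
    """SEC = (TB - H + BB + SB) / AB."""
    if at_bats == 0:
        raise ValueError("player has no at bats")
    return (total_bases - hits + walks + stolen_bases) / at_bats


# Hypothetical season line, invented for illustration only.
sec = secondary_average(total_bases=260, hits=160, walks=60,
                        stolen_bases=10, at_bats=550)
print(round(sec, 3))
```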
Total Average [TA]
A ratio of the bases a player accumulates for his team and the outs he costs his team. Total Average is a Thomas Boswell statistic included in his book "How Life Imitates the World Series."
TA = (TB+HBP+BB+SB)/(AB-H+CS+GIDP)
If a player has a TA over 1.000, that's very good.
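And a Python sketch of Total Average, using the formula as written above: bases gained on top, outs cost on the bottom. The function name and the sample season line are made up for illustration.

```python
def total_average(tb, hbp, bb, sb, ab, h, cs, gidp):
    """TA = (TB+HBP+BB+SB) / (AB-H+CS+GIDP)."""
    outs_cost = ab - h + cs + gidp
    if outs_cost == 0:
        raise ValueError("player has made no outs")
    return (tb + hbp + bb + sb) / outs_cost


# Hypothetical season line, invented for illustration only.
ta = total_average(tb=260, hbp=5, bb=60, sb=10, ab=550, h=160, cs=5, gidp=10)
print(round(ta, 3))
```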
Win Shares [WS]
A Bill James creation that aims to allow player evaluation across positions, teams and eras. It expresses the total sum of a player's contribution as one number.
Zone Rating [ZR]
STATS Inc. devised their own system of zones to track locations of batted balls. They use this data to measure a fielder's range in the field. Zone Rating areas of responsibility do not span the entire field -- some areas (for example, deep in the gap between CF and RF) are considered to be a "no man's land" that is ordinarily beyond the reach of fielders, and thus a ball hit there is not considered an opportunity.
Posted by Steve Lombardi at December 30, 2005 12:13 PM
Thanks Steve, this helps a lot. I will be printing these items out so I can refer to them and keep using them as I talk baseball.
Posted by: Scott Coulter at December 30, 2005 12:47 PM
Think you didn't include a definition of runs created.
Also, I'm feeling a little unclear about the usefulness of a linear comparison. One run created per game would be worth a lot more in the Gibson era than now.
Posted by: rilkefan at December 30, 2005 01:59 PM
That is, one should have a sense of how the distribution varies before using it.
Posted by: rilkefan at December 30, 2005 02:01 PM
Thanks for putting this up Steve. I was thinking of doing something like this myself but I'll just piggyback on yours and link over to this - hope you don't mind.
Posted by: James Varghese at December 30, 2005 02:17 PM
James, no problem at all.
Don't worry guys, I'll be adding to this as I go along.
rilkefan - you're right on RC. That's why RCAA is better.
Posted by: Steve Lombardi at December 30, 2005 03:48 PM
Steve, my point is valid I think for RCAA. The stat I'd like to see would be # of sigma(rcaa). It isn't hard to create 10 RCAA now, when many runs are created (I'm assuming the width is approximately proportional to the mean), but back in Gibson's day 10 was probably worth a lot, and a lot harder than now to achieve.
If I understand it right, it's equal. Since "above average" means there's a "base line" of "average" then it makes no difference that there's more R/G now than in other eras.
Posted by: Steve Lombardi at January 1, 2006 11:22 PM
DER and RC have just been added to the glossary.
Posted by: Steve Lombardi at March 26, 2006 10:39 PM