In the 1985 Baseball Abstract, Bill James introduced two very
important tools for predicting the future performance of baseball
players. The first was the Major League Equivalency, or MLE, which
demonstrated that it was possible to use AAA stats to predict
how a player is most likely to perform at the major-league level.
The second was the detailed description of his Brock2 system for
projecting future performance based on past performance and the
aging process (improving in the early years, peaking in the middle,
and declining thereafter).
Since then, many others have taken these ideas and implemented
projection systems of their own, most often for the purpose of
helping fantasy baseball players prepare for their fantasy drafts.
You can now buy projections from fantasy advisor services, and
you can find them online and in many books and periodicals.
When we began our work on our 1998 Projection Disk, we had
every intention of licensing projected stats from one of the established
providers. There didn't seem to be much point in reinventing the
wheel when we could instead focus on the game software, the team
rosters, and manager profiles. But we quickly learned that these
projections didn't meet the special needs of a full-season simulation,
for a variety of reasons:
- we needed projections for over 1100 players, including many
players who have yet to make their major-league debuts, and most
of the other sources don't cover that many
- we wanted to use stats from the majors, AAA, and AA to make
sure we had as much playing time as possible on which to base
our projections; some other projection systems ignore minor-league
stats or go down only as far as AAA
- to support a full-blown simulation, we needed to project
many more statistical categories than the others provide
- we needed to project left/right splits as well as overall
totals for all batters and pitchers
So we decided to build our own projection system. Actually,
we expanded on a system that we originally developed in 1994.
When that season came to a premature end, the Topps baseball card
company hired us to simulate the missing games so they could produce
CyberCards with full-season stats (real-life stats through August
11 and simulated stats for the rest of the season). To their credit,
they wanted to include prominent minor-leaguers (Derek Jeter was
one) who would have been called up in September had the season
continued. So we developed a method for projecting major-league
performance from minor-league statistics.
The Diamond Mind projection system is based on Bill James'
MLE and aging ideas, though it uses different and more advanced
formulas than those in the 1985 Baseball Abstract. We can
improve on them because of the explosion in available data at both
the major- and minor-league levels. Several companies now compile
play-by-play data for major-league games. And Howe Sportsdata
now compiles minor-league park factors in addition to the official
stats they've collected for decades. Here are the key elements
in our system:
- we use both minor-league and major-league statistics from
the past three seasons, ensuring that virtually all players have
a large amount of playing time on which to base the projections
- we use both AA and AAA statistics, and adjust both to their
major-league equivalents. Bill James' published formulas cover
AAA adjustments only, so we created our own AA adjustments.
- all stat lines are evaluated with respect to league averages.
This does two important things. First, it makes sure that stats
from hitter-friendly leagues such as the Pacific Coast League
are suitably deflated. Second, it ensures that pitchers who faced
the DH and those who didn't are evaluated properly. The DH adds
roughly half a run per game, and if one didn't take this into
account, NL pitchers would be rated better than their AL counterparts
of equal ability.
- all stat lines are adjusted for ballpark effects, including
the minor-league parks. The published Bill James methods do not
take minor-league park factors into account because they simply
weren't available at the time.
- recent performances are weighted more heavily. What a player
did in 1997 is much more important than what he did three years
ago.
- performances at higher levels are weighted more heavily.
What a player did in the majors is much more important than what
he did in AA ball.
- stat lines with more playing time are weighted more heavily.
If someone batted .375 in 24 at-bats, that doesn't matter nearly
as much as what he did in 400 at-bats at some other stop along
the way.
- the individual league- and park-adjusted stat lines are averaged
(using the weights just discussed), then age-adjusted to produce
a set of projected stats that are league- and park-neutral
- these neutral projections are then applied to the league
and park in which the player will compete in the coming season
- the overall stats are converted into left/right splits based
on each player's composite splits for the past three seasons
- the distribution of fly balls and ground balls is based on
actual ratios compiled by each player in the past three seasons
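
The weighting ideas in the list above (recency, level, and playing time) can be illustrated with a small Python sketch. This is not the actual Diamond Mind formula, which is not given here: the weight tables, the `StatLine` type, and the choice to multiply the three weights together are all assumptions made for illustration, and the age adjustment that follows the averaging step is left out.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights. The article says recent seasons, higher levels and
# larger samples count for more, but it does not publish actual values.
RECENCY_WEIGHT = {0: 1.0, 1: 0.6, 2: 0.4}   # keyed by seasons before the latest
LEVEL_WEIGHT = {"MLB": 1.0, "AAA": 0.7, "AA": 0.5}


@dataclass
class StatLine:
    seasons_ago: int     # 0 = most recent season
    level: str           # "MLB", "AAA" or "AA"
    at_bats: int
    neutral_avg: float   # batting average, already league- and park-adjusted


def line_weight(line: StatLine) -> float:
    """Combine the three weights; playing time enters linearly (an assumption)."""
    return RECENCY_WEIGHT[line.seasons_ago] * LEVEL_WEIGHT[line.level] * line.at_bats


def weighted_neutral_avg(lines: List[StatLine]) -> float:
    """Weighted average of the adjusted stat lines, before any age adjustment."""
    total_weight = sum(line_weight(line) for line in lines)
    if total_weight == 0:
        raise ValueError("no playing time to project from")
    return sum(line_weight(line) * line.neutral_avg for line in lines) / total_weight


# Made-up player: a hot 24 at-bat call-up barely moves the result.
lines = [
    StatLine(seasons_ago=0, level="MLB", at_bats=24, neutral_avg=0.375),
    StatLine(seasons_ago=0, level="AAA", at_bats=400, neutral_avg=0.280),
    StatLine(seasons_ago=1, level="AA", at_bats=450, neutral_avg=0.265),
]
print(round(weighted_neutral_avg(lines), 3))
```

With these made-up numbers, the .375 average in 24 at-bats carries far less weight than the 400 at-bat AAA line, so the blended figure stays close to the larger samples.
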
That's the essence of the system. The other projection systems
we looked at make some of these adjustments, but we're not aware
of any that make them all. And we think it's necessary to make
them all in order to evaluate past performance correctly and to
support a realistic simulation of a season.
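
To show what evaluating a stat line relative to league and park, and then re-applying it to a new league and park, could look like, here is a minimal Python sketch. It rests on assumptions that are not from the article: a rate stat is treated as a simple ratio to league average, the park factor is a multiplicative index where 1.00 is neutral, and a player is assumed to play half his games at home. The function names and all numbers are hypothetical.

```python
def effective_park(park_factor: float) -> float:
    """Average a home park factor with a neutral 1.00 for road games (assumption)."""
    return (1.0 + park_factor) / 2.0


def neutralize(rate: float, league_rate: float, park_factor: float) -> float:
    """Express a rate stat as a ratio to league average with the park effect removed."""
    return rate / effective_park(park_factor) / league_rate


def apply_context(relative: float, league_rate: float, park_factor: float) -> float:
    """Inverse of neutralize: place a neutral ratio into a target league and park."""
    return relative * league_rate * effective_park(park_factor)


# Made-up example: home runs per at-bat in a hitter-friendly minor league and park.
relative = neutralize(0.050, league_rate=0.035, park_factor=1.20)

# Re-express the same neutral ratio in a lower-offense league and pitcher-friendly park.
projected = apply_context(relative, league_rate=0.030, park_factor=0.95)
print(round(relative, 3), round(projected, 4))
```

Keeping the neutral ratio separate from the target context is what lets the same projection be dropped into whichever league and park the player ends up in.
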
Of course, only time will tell whether the added detail of
this approach will produce meaningful gains in predictive power.
We did pretty well in our first season, and we believe the added
depth and precision of our system are a step forward, but we can't
prove it until we have several years of experience with the system.
So we'll keep at it, monitoring the results and refining the system
over time, and we'll report back to you from time to time as we
go along.