tiger337

Winter of Sabermetrics

The traffic has been getting a little better the last couple of days.

Check out billfer's blog too. :happy:

You really do look like Rony Seikaly. Are you Lebanese like him?


You really do look like Rony Seikaly. Are you Lebanese like him?

No, I'm Irish! Can't you tell? Actually I'm half Irish and half Greek. Seikaly is also half Greek.

No, I'm Irish! Can't you tell? Actually I'm half Irish and half Greek. Seikaly is also half Greek.

Now that you mention it, the red hair and freckles do give it away.

Yes - check out his blog. Lee rocks, and if he gets discouraged because nobody is reading his blog he might quit - and that will suck.

We are lucky to have people like billfer and Lee on this site. Visiting their blogs is a real treat. Do it!

We are lucky to have people like billfer and Lee on this site. Visiting their blogs is a real treat. Do it!

I've visited and bookmarked Rony Seikaly's. It's good stuff.


Tiger337 - I really appreciate the ERC analysis. German is so much worse than his ERA really shows... I wonder what we do with this guy now? I have to think that he gets replaced this winter if we are serious about improving our bullpen. If we are lucky, maybe we can convince some team that he still has potential and get something in return, but I'm not counting on it.

Yes - check out his blog. Lee rocks, and if he gets discouraged because nobody is reading his blog he might quit - and that will suck.

I read Lee's blog all the time.

(But don't tell him it's because I think his picture is dreamy...)

Oops! Stupid writing out loud...

I read Lee's blog all the time.

(But don't tell him it's because I think his picture is dreamy...)

You know, the first few days after I put up my picture, I had a significant drop in hits.

In case you are wondering, they didn’t do similarity scores for Chris Shelton and Curtis Granderson because their careers are too short at this point.

Here are their BP PECOTA projections, which are a little different than the sim scores:

Note also that these are not updated so the analysis does not include their 2005 seasons.

Shelton:

1) Ed Sprague

2) Greg Goossen

3) Mo Vaughn

4) Adam Piatt

5) Bob Watson

6) Joe Cunningham

7) George Scott

8) Ken Singleton

9) Darrell Evans

10) Mike Macfarlane

Some nice names there including Mo Vaughn and Bob Watson. It's probably a little conservative, as Shelton's nice 2005 in the bigs is not included in the analysis.

Granderson's PECOTA comparables are nothing to write home about except for:

11) Bobby Abreu

15) Ray Lankford

These will also change after the update.

Your question could not be completely answered by stats but somebody could start by looking at results for batters with Podsednik (or some other runner) on base and Podsednik not on base. Do hitters have different results in those two situations?

I'm not sure if this strategy would work, because if a runner was on base already, more often than not the batters will be facing a subpar pitcher who let them get on base in the first place, thereby biasing the results. A better way to do this would be to compare only batters with runners on first base, and see if there is a difference between when the runner is a stolen base threat and when he is not. Of course, you would have to define which baserunners constitute a SB threat, then. But it is definitely doable.

I'm not sure if this strategy would work, because if a runner was on base already, more often than not the batters will be facing a subpar pitcher who let them get on base in the first place, thereby biasing the results. A better way to do this would be to compare only batters with runners on first base, and see if there is a difference between when the runner is a stolen base threat and when he is not. Of course, you would have to define which baserunners constitute a SB threat, then. But it is definitely doable.

That's a good point. You could do it your way, but I still think that you have to somehow make a comparison between what happens with a runner on and a runner not on. You could do it my way but make adjustments for the quality of the pitcher. I think I'd want to look at it your way and my way. There are different ways you could answer the question about what happens. I don't think there is any way to tell why things happen just by looking at play-by-play data.

Your question could not be completely answered by stats but somebody could start by looking at results for batters with Podsednik (or some other runner) on base and Podsednik not on base. Do hitters have different results in those two situations? It would be a lot of work but the data exists to look at all sorts of similar situations. This would not answer your specific question about "hitters hitting bad pitches" but it would give you some clue as to whether hitters hit better or worse with Podsednik on base. If you did find out that batters hit better with Podsednik on base, the next question would be why? That's a question that would require input from the players playing the game.

I think a third variable, "results for batters with someone other than Podsednik on base", is also necessary. If you just look at Podsednik on base and Podsednik not on base, the second group lumps together situations when NOBODY is on base and situations where other runners are on base.

There's a possibility that just having runners on base can have an effect on performance. I think to isolate the Podsednik effect, the third category should be added.
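A rough sketch of that three-way split, for anyone who wants to try it. The records below are made up purely for illustration (they are not real play-by-play data), and the names `SAMPLE_AT_BATS`, `situation` and `average_by_situation` are hypothetical. It also assumes at most one runner per at-bat, which real data would not respect.

```python
from collections import defaultdict

# Hypothetical at-bat records: (runner_on_base, got_hit).
# runner_on_base is a runner's name, or None when the bases are empty.
# Real play-by-play data would need extra handling for multiple runners.
SAMPLE_AT_BATS = [
    ("Podsednik", True), ("Podsednik", False),
    ("Podsednik", True), ("Podsednik", False),
    ("Someone Else", False), ("Someone Else", True),
    ("Someone Else", False), ("Someone Else", False),
    (None, False), (None, True), (None, False), (None, False), (None, False),
]


def situation(runner, target):
    """Classify an at-bat into one of the three categories."""
    if runner is None:
        return "bases empty"
    if runner == target:
        return "target on base"
    return "other runner on base"


def average_by_situation(at_bats, target="Podsednik"):
    """Return batting average (hits / at-bats) for each category."""
    totals = defaultdict(lambda: [0, 0])  # category -> [hits, at_bats]
    for runner, got_hit in at_bats:
        bucket = totals[situation(runner, target)]
        bucket[0] += int(got_hit)
        bucket[1] += 1
    return {category: hits / n for category, (hits, n) in totals.items()}


for category, avg in average_by_situation(SAMPLE_AT_BATS).items():
    print(f"{category}: {avg:.3f}")
```

Comparing "target on base" against "other runner on base" (rather than against everything else) is what separates a Podsednik effect from a plain runners-on effect. It still does nothing about the pitcher-quality bias raised earlier in the thread.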

So they are doing some small ball things and those things are standing out in the national media. However, they are a team that relies heavily on homeruns. They may be able to manufacture runs but that's not what they are about. Their offense is about homeruns more than anything else.

Nah, it's pitching... They play in a homer friendly park, but that goes for both teams. It's pitching. The Tigers aren't winning due to the increase in home runs alone.

That said, I just got into the whole Sabermetrics thing recently. I've been studying statistics at the graduate school level for a bit now and thought it would be fun to apply it to my favorite pastime. In any case, enough can't be said about the limitations of the data. Up to a point, it just doesn't have that much explanatory power. Perhaps it provided an advantage a few seasons ago, but now everyone has more or less the same expertise. But I still look forward to it, if only to improve my own intuition of the game.


That said, I just got into the whole Sabermetrics thing recently. I've been studying statistics at the graduate school level for a bit now and thought it would be fun to apply it to my favorite pastime. In any case, enough can't be said about the limitations of the data. Up to a point, it just doesn't have that much explanatory power. Perhaps it provided an advantage a few seasons ago, but now everyone has more or less the same expertise. But I still look forward to it, if only to improve my own intuition of the game.

I think sometimes too much is said about the limitations of data. It depends on who is doing the talking. Anyway, it is a great way to learn more about the game. I applied stats to baseball in grad school as well. It was at about the same time I learned about Bill James. It definitely changed the way I looked at the game.

Also, I know the White Sox did it with pitching last year just like the Tigers are this year. I was saying that the White Sox offense scored runs more because of homeruns than anything else.

I think sometimes too much is said about the limitations of data. It depends on who is doing the talking. Anyway, it is a great way to learn more about the game. I applied stats to baseball in grad school as well. It was at about the same time I learned about Bill James. It definitely changed the way I looked at the game.

Also, I know the White Sox did it with pitching last year just like the Tigers are this year. I was saying that the White Sox offense scored runs more because of homeruns than anything else.

I have noticed that the team with the better pitcher on the mound usually wins. I know that's not real scientific but it seems to play out like that. And you know my old 70% theory. But is there a way you could rank each phase of the game and its value to victories over a 162-game season? An interesting study would be: what is the perfect makeup of a 25-man roster going into it? Do you need this ERA, this fielding percentage, this number of home runs, this OPS, etc.?

Up to a point, it just doesn't have that much explanatory power.

This is probably fair, but it's pretty much true of any way you want to evaluate the game. When I read The Hardball Times, though, or some of the other "stats" publications, I sometimes get the feeling that they get away from the fact that it's still going to come back to your theory and your assumptions. And if your theory and your assumptions don't make sense (like their article ranking baseball GMs), then your data isn't going to fix it for you.

-Tony


The 36th annual SABR convention was just held in Seattle.

Here is a link to a presentation from there titled:

“Do Players Outperform In Their Free-Agent Year?”

http://www.philbirnbaum.com/

If there are others that are posted online, I will relay them.


Not sure if any of you all get it, but on Discovery Science, there was a one hour show on Sabermetrics. I don't remember the company name, but they showed a company that specializes in sports stats (I would believe mainly baseball) and trying to gather more relevant stats on players. Pretty interesting stuff, even for only watching about the last ten or fifteen minutes of it. Math just isn't my subject, but from the looks of it, the math involved doesn't seem that hard.


I was rooting around my Sinins encyclopedia when I ran across something I thought was rather interesting, if inconsequential.

Know how we always take for granted that the AL is the hitting league, and the NL is the pitching-and-defense league? Well, if you take pitchers out of the equation, you find that in the last several years, the NL has routinely outhit the AL.

Here is how the two leagues have fared against one another since WWII:

YEAR AL-OPS NL-OPS Advantage By
1946 .717 .706 AL .011
1947 .720 .756 NL .036
1948 .753 .741 AL .012
1949 .756 .749 AL .007
1950 .783 .765 AL .018
1951 .741 .747 NL .006
1952 .720 .724 NL .004
1953 .743 .773 NL .030
1954 .729 .769 NL .040
1955 .743 .757 NL .014
1956 .757 .748 AL .009
1957 .733 .748 NL .015
1958 .728 .759 NL .031
1959 .730 .755 NL .025
1960 .740 .733 AL .007
1961 .750 .757 NL .007
1962 .745 .747 NL .002
1963 .716 .696 AL .020
1964 .721 .712 AL .009
1965 .705 .712 NL .007
1966 .698 .722 NL .024
1967 .677 .700 NL .023
1968 .659 .666 NL .007
1969 .715 .714 AL .001
1970 .726 .748 NL .022
1971 .706 .706 AL .000
1972 .671 .704 NL .033
1973 .710 .722 NL .012
1974 .694 .714 NL .020
1975 .707 .720 NL .013
1976 .681 .703 NL .022
1977 .735 .747 NL .012
1978 .711 .715 NL .004
1979 .743 .734 AL .009
1980 .731 .716 AL .015
1981 .693 .704 NL .011
1982 .730 .715 AL .015
1983 .728 .722 AL .006
1984 .724 .711 AL .013
1985 .733 .715 AL .018
1986 .737 .726 AL .011
1987 .759 .758 AL .001
1988 .715 .696 AL .019
1989 .709 .699 AL .010
1990 .715 .728 NL .013
1991 .724 .711 AL .013
1992 .713 .706 AL .007
1993 .745 .749 NL .004
1994 .779 .771 AL .008
1995 .771 .761 AL .010
1996 .795 .761 AL .034
1997 .770 .767 AL .003
1998 .773 .764 AL .009
1999 .788 .795 NL .007
2000 .794 .797 NL .003
2001 .763 .780 NL .017
2002 .757 .763 NL .006
2003 .762 .771 NL .009
2004 .773 .778 NL .005
2005 .756 .767 NL .011
2006 .778 .785 NL .007
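For anyone who wants to check the runs of dominance in that table, here is a small sketch that tallies the OPS leader over a span of years. The OPS figures are copied from the table above (written in thousandths to avoid float comparisons). The era boundaries in the loop are my own choice for illustration, and the names `OPS`, `leader` and `tally` are hypothetical.

```python
# year: (AL OPS, NL OPS) in thousandths, copied from the table above.
OPS = {
    1946: (717, 706), 1947: (720, 756), 1948: (753, 741), 1949: (756, 749),
    1950: (783, 765), 1951: (741, 747), 1952: (720, 724), 1953: (743, 773),
    1954: (729, 769), 1955: (743, 757), 1956: (757, 748), 1957: (733, 748),
    1958: (728, 759), 1959: (730, 755), 1960: (740, 733), 1961: (750, 757),
    1962: (745, 747), 1963: (716, 696), 1964: (721, 712), 1965: (705, 712),
    1966: (698, 722), 1967: (677, 700), 1968: (659, 666), 1969: (715, 714),
    1970: (726, 748), 1971: (706, 706), 1972: (671, 704), 1973: (710, 722),
    1974: (694, 714), 1975: (707, 720), 1976: (681, 703), 1977: (735, 747),
    1978: (711, 715), 1979: (743, 734), 1980: (731, 716), 1981: (693, 704),
    1982: (730, 715), 1983: (728, 722), 1984: (724, 711), 1985: (733, 715),
    1986: (737, 726), 1987: (759, 758), 1988: (715, 696), 1989: (709, 699),
    1990: (715, 728), 1991: (724, 711), 1992: (713, 706), 1993: (745, 749),
    1994: (779, 771), 1995: (771, 761), 1996: (795, 761), 1997: (770, 767),
    1998: (773, 764), 1999: (788, 795), 2000: (794, 797), 2001: (763, 780),
    2002: (757, 763), 2003: (762, 771), 2004: (773, 778), 2005: (756, 767),
    2006: (778, 785),
}


def leader(al, nl):
    """Return which league had the higher OPS, or 'tie' if equal."""
    if al == nl:
        return "tie"
    return "AL" if al > nl else "NL"


def tally(start, end):
    """Count seasons led by each league from start to end, inclusive."""
    counts = {"AL": 0, "NL": 0, "tie": 0}
    for year in range(start, end + 1):
        counts[leader(*OPS[year])] += 1
    return counts


# Era boundaries are an assumption chosen for illustration.
for start, end in [(1951, 1978), (1979, 1998), (1999, 2006)]:
    print(start, end, tally(start, end))
```

Note that 1971 is listed as "AL .000" in the table; at three decimals the two leagues are level, so the sketch counts it as a tie.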

What I find interesting here are the patterns. The NL clearly dominated from the early 50s to the late 70s, and by frequently substantial margins. That all changed in what appears to have been 1979, when the AL began to dominate. I think I can sort of explain NL dominance in the earlier period -- they were the league that absorbed the great minority players quicker. But I'm not really as clear on why the AL dominated 17 of the next 20 years. When I look at the top 25 hitters of the period (minimum 5,000 PAs):

Rank Player OWP PA
1 Frank Thomas .787 5501
2 Barry Bonds .760 8100
3 Jeff Bagwell .741 5071
4 Mark McGwire .738 6314
5 Edgar Martinez .724 5259
6 Mike Schmidt .703 6231
7 Ken Griffey Jr. .703 5982
8 Rickey Henderson .702 11530
9 George Brett .673 8510
10 Albert Belle .673 5329
11 Will Clark .673 7482
12 Wade Boggs .671 10406
13 Pedro Guerrero .668 6107
14 Tim Raines .667 9972
15 Jack Clark .665 6966
16 Keith Hernandez .662 6599
17 Fred McGriff .662 7299
18 Tony Gwynn .661 9534
19 Darryl Strawberry .657 6260
20 Craig Biggio .651 6686
21 Dwight Evans .643 7785
22 Rafael Palmeiro .641 7590
23 Danny Tartabull .641 5842
24 Eric Davis .640 5460
25 Lenny Dykstra .639 5282

I don't look at this list and say, well, this is overwhelmingly American League, so that explains it. It looks evenly split among players who did most of their time in the AL, those who did most of their time in the NL, and a few who did a lot of time in both.

Then, suddenly in 1999, we see a turn toward the NL. At first I would knee-jerk attribute this to the Barry Bonds effect, but when you review Bonds' effect on the equation, you see that it is no better than .0025 to .0035 in his best years, and the NL advantage is greater than that just about every year.

As I think this through, two things occur to me:

- The balance of hitting shifted after expansion in each league. The AL expanded in 1977, and took over hitting dominance in 1979. The NL expanded in both 1993 and 1998, and they took over dominance in 1999. It's not a perfect theory, since Colorado started in 1993 and the AL still retained hitting supremacy. I suppose I could do more stats stuff to figure it out.

- There were a lot of bandbox ballparks opening up in the NL in the past several years. In addition to Colorado, new hitters' paradises opened up in Philadelphia, Houston, Cincinnati and Arizona. In the AL, only in Texas and Tampa Bay have hitters' parks opened. This might be a better explanation of the difference.

Another variable, of course, is the pitching and fielding. Has the combination of these gotten markedly worse in the NL, more so than the AL? That's certainly a possibility, maybe even a probability. It would take a whole lot of number crunching to figure that out. Or maybe there's a syndicated source where I can get this, I don't know. I'll need to look.

Just a tidbit to get Winter of Sabermetrics 2006-07 going.


In the new 2007 Bill James Handbook, there is a great article about Manufactured Runs. So many "baseball men" talk about its importance, but how important is it to scoring runs, anyway?

Bill James and Steve Moyer (from Baseball Info Solutions) discussed this at length and determined that a "manufactured run" (MR) is "at least one-half created by the offense doing something other than playing station-to-station baseball". By definition, it does not involve extra-base hits, although it can involve a string of singles.

There are two types of MR:

MR1: results from deliberate acts such as bunts and stolen bases;

MR2: results from other things like infield hits, taking advantage of the defense, advancing extra bases on hits, advancing on outs and throws, etc.

The key difference, as far as I can tell, is that MR1 generally result from manager (strategic) decisions, while MR2 result from player (tactical) decisions.

They also developed 12 rules as to what constitutes an MR, which I won't go into here. However, the book did list the 30 teams and how many manufactured runs they scored in 2006:

Lg Team MR MR1 MR2
AL MIN 224 84 140
AL LAA 190 80 110
AL KCA 186 57 129
AL BAL 184 74 110
AL SEA 176 46 130
AL TAM 166 63 103
AL NYA 163 59 104
AL CHA 160 49 111
AL BOS 147 39 108
AL OAK 142 37 105
AL TEX 142 36 106
AL TOR 139 42 97
AL CLE 139 43 96
AL DET 124 41 83

NL COL 198 81 117
NL WAS 185 85 100
NL CHN 175 99 76
NL FLA 175 69 106
NL LAN 172 72 100
NL ARI 169 56 113
NL PIT 167 58 109
NL STL 164 43 121
NL HOU 161 67 94
NL NYN 158 61 97
NL MIL 151 44 107
NL PHI 148 33 115
NL SFN 144 53 91
NL ATL 143 50 93
NL SDN 141 68 73
NL CIN 135 57 78

The Twins were the best in the AL at manufacturing runs; the Tigers were last. It's not that the Tigers didn't try their share of little ball -- they did -- but they also clearly didn't score on the player-oriented tactical stuff as much as the others.

OK, so beyond this basic table -- does MR correlate to scoring lots of runs? Let's find out.

Here is the correlation of MR, MR1 and MR2 to Runs Scored:

Lg Team Runs MR MR1 MR2
AL LAA 766 190 80 110
AL OAK 771 142 37 105
NL HOU 735 161 67 94
AL TOR 809 139 42 97
NL ATL 849 143 50 93
NL MIL 730 151 44 107
NL STL 781 164 43 121
NL CHN 716 175 99 76
AL TAM 689 166 63 103
NL ARI 773 169 56 113
NL LAN 820 172 72 100
NL SFN 746 144 53 91
AL CLE 870 139 43 96
AL SEA 756 176 46 130
NL FLA 758 175 69 106
NL NYN 834 158 61 97
NL WAS 746 185 85 100
AL BAL 768 184 74 110
NL SDN 731 141 68 73
NL PHI 865 148 33 115
NL PIT 691 167 58 109
AL TEX 835 142 36 106
AL BOS 820 147 39 108
NL CIN 749 135 57 78
NL COL 813 198 81 117
AL KCA 757 186 57 129
AL DET 822 124 41 83
AL MIN 801 224 84 140
AL CHA 868 160 49 111
AL NYA 930 163 59 104

Correlation to Runs (parentheses denote negative values)
          Runs     MR      MR1     MR2
Majors    1.00   (0.19)  (0.36)   0.12
AL Only   1.00   (0.42)  (0.38)  (0.02)
NL Only   1.00   (0.15)  (0.40)   0.17

Interesting -- looks like there is a decent negative correlation between Runs and MR1 -- the scoring from bunts and stolen bases. That is, the more runs a team scores from MR1, the fewer runs they score overall. That's likely because these managerial decisions generally trade outs, or risk outs, for single runs, and the more a team employs these tactics, the fewer runs they will score overall -- because the more outs they are making on purpose.
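For anyone who wants to reproduce the majors-wide figures, here is a minimal sketch of a plain Pearson correlation over the 30 team lines from the table above. I'm assuming Pearson is what was used (the post doesn't say which method); the `pearson` helper and the `TEAMS` list are names made up for this example.

```python
import math

# (team, runs, MR, MR1, MR2) for 2006, copied from the table above.
TEAMS = [
    ("LAA", 766, 190, 80, 110), ("OAK", 771, 142, 37, 105),
    ("HOU", 735, 161, 67, 94), ("TOR", 809, 139, 42, 97),
    ("ATL", 849, 143, 50, 93), ("MIL", 730, 151, 44, 107),
    ("STL", 781, 164, 43, 121), ("CHN", 716, 175, 99, 76),
    ("TAM", 689, 166, 63, 103), ("ARI", 773, 169, 56, 113),
    ("LAN", 820, 172, 72, 100), ("SFN", 746, 144, 53, 91),
    ("CLE", 870, 139, 43, 96), ("SEA", 756, 176, 46, 130),
    ("FLA", 758, 175, 69, 106), ("NYN", 834, 158, 61, 97),
    ("WAS", 746, 185, 85, 100), ("BAL", 768, 184, 74, 110),
    ("SDN", 731, 141, 68, 73), ("PHI", 865, 148, 33, 115),
    ("PIT", 691, 167, 58, 109), ("TEX", 835, 142, 36, 106),
    ("BOS", 820, 147, 39, 108), ("CIN", 749, 135, 57, 78),
    ("COL", 813, 198, 81, 117), ("KCA", 757, 186, 57, 129),
    ("DET", 822, 124, 41, 83), ("MIN", 801, 224, 84, 140),
    ("CHA", 868, 160, 49, 111), ("NYA", 930, 163, 59, 104),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


runs = [team[1] for team in TEAMS]
for label, column in (("MR", 2), ("MR1", 3), ("MR2", 4)):
    values = [team[column] for team in TEAMS]
    print(f"Runs vs {label}: {pearson(runs, values):+.2f}")
```

With only 30 data points, correlations of this size are fairly noisy, so the league-by-league splits (14 and 16 teams) deserve even more caution.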

There does not seem to be much if any correlation between MR2 and runs scored, since these are player decisions generally made on the fly, based on their judgment, and are risking outs much less than MR1 does.

While I broke out each league as well as looked at the majors in general, I did not see a significant difference between the two leagues.

The other key thing I wanted to try to determine was: do certain types of teams tend more toward MR, MR1 and MR2, based on their batting average, on base percentage and slugging percentage? I did the same correlation analysis for these attributes, and here are the overall results:

Correl    AVG     OBP     SLG     iOBP    ISO
MR       0.23   (0.01)  (0.36)  (0.26)  (0.60)
MR1     (0.02)  (0.20)  (0.37)  (0.22)  (0.45)
MR2      0.34    0.20   (0.11)  (0.12)  (0.36)

In addition to AVG/OBP/SLG, I also looked at isolated slugging percentage (ISO, or SLG-AVG), and what I am calling iOBP (no, it's not a new MP3 player -- it's a shorthand way of getting at isolated on-base percentage, or OBP-AVG).
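The two isolated stats are just subtractions, sketched below. The team line used here is hypothetical (not taken from any real 2006 club), and the function names are made up for this example.

```python
def isolated_power(slg, avg):
    """ISO: slugging percentage minus batting average."""
    return slg - avg


def isolated_on_base(obp, avg):
    """iOBP: on-base percentage minus batting average."""
    return obp - avg


# Hypothetical team line, for illustration only.
avg, obp, slg = 0.270, 0.335, 0.430
print(f"ISO  = {isolated_power(slg, avg):.3f}")
print(f"iOBP = {isolated_on_base(obp, avg):.3f}")
```

Subtracting AVG strips out the singles component, so ISO reflects extra-base power and iOBP reflects mostly walks and hit-by-pitches.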

What we see here is pretty clear, I think: a high batting average can correlate softly to MR, particularly MR2 -- but high on base and high slugging teams do not correlate highly to MR. And it is high OBP and high SLG that correlate to run scoring in general -- which is the opposite of the negative correlation of MR to runs scored.

Conclusion: teams that spend a lot of time trying to manufacture runs will not score a lot of them. Perhaps lack of ability to score many runs is the cause of small ball -- but then again, it could be the effect of small ball as well. Someone who knows regression analysis better than I could take that one on.


- There were a lot of bandbox ballparks opening up in the NL in the past several years. In addition to Colorado, new hitters' paradises opened up in Philadelphia, Houston, Cincinnati and Arizona. In the AL, only in Texas and Tampa Bay have hitters' parks opened. This might be a better explanation of the difference.

This was my first reaction when I saw the chart. I think ballparks are often a big reason for changes in run production throughout baseball history. As you say, they've been adding a lot of hitter-friendly parks to the NL in recent years.


As for the manufactured run analysis, I'd like to see them break it down by game situation. Manufacturing runs early in the game might not be a good idea whereas manufacturing runs late in a close game might be a good thing (maybe).

Just introducing the concept is a great start though. This has the potential for very interesting analysis. For example, which kinds of teams benefit more from manufacturing runs?


Okay, I am going to sound like a dummy in this thread, where I still only understand about half of the concepts you guys talk about. Someone last spring made a projection (on another thread) of how many hits the Tigers would have in 2006 and it blew everyone away. Of course that was including a few guys (Nook Logan for one) who were not on the team. I did not count up the hits for the 2006 season (but I am sure one of you has that with a click of a mouse). Where did we actually fall according to that projection that was made? Also, with the addition of Soriano, have you guys started making any projections for 2007? I love this stuff, just don't have a clue about much of it.
