I am a huge baseball fan. It is, far and away, my favorite sport. On some nights, I’ll watch an East Coast game followed by a West Coast game on MLBtv (which, by the way, is incredible; no, I do not work for them). I also listen to baseball podcasts: one on the way to work and one on the way back. In short, I’m a bit fanatical about the sport.
While I was listening to one of these podcasts the other day, I heard the host and his guest talking about regression towards the mean. At first, they were doing it the way pretty much everyone else does: “Player A is really hot right now, and is playing way over his head. We’re bound to see some regression towards the mean.” This is not wrong, per se, and I’ll get to more on that in a bit. Here’s what you never hear: “Player A is really underperforming right now, hitting way below his historical average. We’re bound to see some regression towards the mean.” Why don’t we ever hear that line? Simply put, it is because most people don’t understand the difference between “regression” and “getting worse when you’ve been playing really, really well.”
You might be saying to yourself: “Okay, smart guy, if that’s not regression, then what is?”
In a nutshell, regression is using one or more known variables (predictor variables) and their associated values (in addition to error terms) to create an equation that predicts one or more scores on an outcome variable (the criterion variable).
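To make that definition a little more concrete, here is a minimal sketch of the simplest case: one predictor, one outcome, fit by ordinary least squares. Everything in it is an assumption made for illustration. The function name `fit_line` is hypothetical, and the batting averages are made-up numbers, not real player data. The point is only to show what "an equation that predicts an outcome from a predictor, plus error" looks like.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of predictor and outcome, and variation of the predictor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up numbers for illustration only (not real player data).
# Predictor: last season's batting average. Outcome: this season's.
last_season = [0.250, 0.270, 0.290, 0.310, 0.330]
this_season = [0.262, 0.268, 0.284, 0.296, 0.310]

slope, intercept = fit_line(last_season, this_season)

# The fitted equation: predicted = intercept + slope * predictor.
predicted = [intercept + slope * x for x in last_season]

# The error terms (residuals): what the equation fails to explain.
residuals = [y - p for y, p in zip(this_season, predicted)]

print(f"predicted = {intercept:.4f} + {slope:.2f} * last_season")
print("residuals:", [round(r, 4) for r in residuals])
```

The output is an equation and a set of leftover errors, not a statement about whether anyone is about to get better or worse.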
By itself, regression as a standalone term doesn’t actually mean that much. That’s because there are many types of regression, including (but not limited to) multiple, ordinal, and binomial logistic regression. Generally speaking, when people talk about “regressing towards the mean” they are (sort of) referring to some form of the general linear model. I won’t go into the finer points of what that is, but what is usually happening is that someone is making a mostly heuristic judgment about a player’s future performance based on their past and present performance. There is nothing wrong with doing this. Without knowing it, most of us make this kind of evaluation all the time. It is not correct, though, to call our decisions or predictions about an outcome “regression” when what we’re really talking about is a decrease in performance (or whatever the case is).
I’m just sayin’. Okay, back to baseball.