After going through another agonizing application, interview, and match process to land myself a postdoctoral fellowship (got a great one!), I took a long, hard-earned hiatus from all non-essential academic activities. This included the ongoing process of getting my dissertation published. Now I’m back to it.
The depth and breadth of my dissertation is such that I divided it into two (maybe eventually three) documents, both of which will be submitted to a very competitive neuropsychology journal, which shall remain nameless for the time being. The manner in which I divided it meant that, unfortunately, I had to re-run many, many statistical analyses. Thankfully, I ALWAYS save my syntax, so most of it was a matter of merely adjusting some code and – Presto! – pivot tables galore. (As a quick aside, to say that I recommend saving your syntax is an understatement; it is far more accurate to call it a requirement. Those of you who have not done this know all too well the pain of trying to recreate a past analysis using nothing but drop-down menus, radio buttons, and the like.) Okay, back on track. On top of re-running certain analyses, I also had to run some entirely new ones. When I was finished with the primary stuff, I examined the outcomes and began writing them up. In short, one half of my hypothesis turned up ns (non-significant, for the uninitiated), with the other being significant at p < .05. No problem – this stuff happens all the time, and I’m not one of those risk-averse data fishermen given to “adjusting” my hypotheses so that the outcomes are significant. What I was concerned with, however, was whether either half of my findings was real. Sure, my p values said some of it was, but as Jacob Cohen would likely point out, ‘so what?’.
Enter Power Analyses.
If you’re scratching your head at that last sentence, fear not – you’re not alone. In fact, you’re in the majority. A fairly recent study examined dozens of publications from a very high-end, peer-reviewed neuropsychology journal and found that over 75% failed to report effect sizes (which are closely related to power statistics) of any kind [see: Schatz, P., Jay, K., McComb, J., & McLaughlin, J. (2005). Misuse of statistical tests in publications. Archives of Clinical Neuropsychology, 20(8), 1053-1059.]. So just what is “Power”? Mathematically, power is 1 − β, where 1 represents certainty (a probability of 1), and β is the probability of committing a Type II error. A Type II error occurs when your results tell you to retain the null hypothesis (H0) – that there is no statistically significant difference between your groups – when in fact the alternative hypothesis (H1) is true and a difference really does exist. It all makes sense now, right? No, of course it doesn’t. More simply put, power is the probability that your test will say “Yes, there is no question that I’ve found a difference between my groups (remember “1”?)” when a difference truly exists, while β is the probability that it will instead shrug and say “There is no difference between my groups. At least, I’m pretty sure there isn’t.” Even more simply put, power (and the standardized effect sizes that feed into it) levels the playing field for all the raw data you’re feeding into these analyses and gives you one nice, tidy, standardized metric. This allows you to compare outcomes from very different samples that are being tested along roughly the same dimensions. Put in the most uncomplicated of ways, it takes small cherries, medium-sized oranges, and titanic gourds, then makes them all into Granny Smith apples of measurably different masses. Now you can take your apples, put them on a scale one by one, weigh them, and know with real precision the differences between them.
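To make the 1 − β arithmetic concrete, here’s a minimal sketch (not from anything in this post – the effect size of d = 0.5 and the 64 cases per group are made-up illustration values) that approximates the power of a two-sided, two-sample z-test using only Python’s standard library:

```python
# Hypothetical example: power = 1 - beta for a two-sample z-test.
# Effect size d (Cohen's d) and n per group are illustrative values only.
from math import sqrt
from statistics import NormalDist

def two_sample_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    d: standardized effect size (Cohen's d)
    n_per_group: sample size in each group
    alpha: Type I error rate (two-sided)
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)        # critical value, e.g. ~1.96 at .05
    # Noncentrality: how far the true group difference sits from zero,
    # measured in standard-error units.
    noncentrality = d * sqrt(n_per_group / 2.0)
    # Probability of landing beyond the critical value (upper tail only;
    # the lower-tail contribution is negligible for positive d).
    return nd.cdf(noncentrality - z_crit)

print(round(two_sample_power(0.5, 64), 3))  # ~0.807
```

With a medium effect and 64 cases per group, this lands right at the textbook “roughly 80% power” benchmark; shrink the sample and β (the chance of a Type II error) balloons accordingly.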
From a statistical standpoint, power analysis affords you the ability to look at completely discrete samples, even when drawn from several different studies, and compare them in an extraordinarily simple, yet powerful way. This is exactly what happens when researchers conduct what is known as a meta-analysis.
Back to my dissertation stuff.
So, I was unsure whether the regressions I was looking at (one significant, the other not) were actually firing on all cylinders. Did the ns model come out the way it did because of an inadequate sample size? That is certainly a major cause of low power, and could have resulted in a poor predictive model. Incidentally, a small sample was my prime suspect, as I had 30 cases in the ns group and 105 in the significant group. So, I fired up G*Power and got to work. After five minutes and four analyses, I had my answer. The regression model for the ns group was indeed underpowered, and I needed 16 more cases to reach a more acceptable power of 0.90 (meaning a 90% chance of detecting an effect of the hypothesized size, if one truly exists). Given that my dissertation was an archival analysis, I couldn’t just go back and test 16 more people, so the results were as final as they were ever going to be. I’m fine with that. Non-significant findings are every bit as important as significant ones, since at the very least they tell other researchers where not to look and/or what not to do. That is a critical step in the scientific method.
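For flavor, here’s the same kind of calculation run in reverse – a hypothetical sketch, not my actual G*Power output – answering the question “how many cases per group do I need to hit power of 0.90?” The effect size d = 0.5 is a made-up example, and this uses a two-sample z-approximation; exact, test-specific answers (which is what G*Power computes) will run slightly higher:

```python
# Hypothetical example: invert the power formula to solve for sample size.
# Uses a two-sample z-approximation; d = 0.5 is an illustrative value.
from math import ceil
from statistics import NormalDist

def n_for_power(d: float, power: float = 0.90, alpha: float = 0.05) -> int:
    """Per-group n for a two-sided, two-sample z-test to reach `power`."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = nd.inv_cdf(power)          # how far past it the effect must sit
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_for_power(0.5))  # 85 per group under this approximation
```

Note how demanding 90% power (rather than the conventional 80%) pushes the required n up sharply – exactly the kind of gap my archival data set had no way of closing.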
At the end of the day, power analyses serve as the sharpest tool in the shed for comparing many types of analyses with a high degree of simplicity and certainty. I recently discovered that there’s no widely agreed-upon metric for comparing parametric and non-parametric analyses (at least none that I’ve found), so that’s kind of a problem. So far I’m not losing any sleep over it, though. In fact, I’m well rested, thanks to some…power naps. Hilarious, I know.