My take on the Decline Effect

So I’ve read Jonah Lehrer’s New Yorker piece several times now. I take it seriously. The policy implications, particularly with regard to the use of pharmaceuticals, are incredibly disturbing. I’m less concerned with Rhine’s ESP research in the 1930s.

I should point out that there are many areas of science, ranging from molecular biology to astrophysics, where I don’t believe there is any evidence at all for such a “decline effect”. The disciplines affected by the problem are generally those that depend to a greater extent on parametric statistics (t-tests and the like) rather than on categorical “yes-no” results (e.g. a gene sequence, the timing of an eclipse, a band in a gel).
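
To make the contrast concrete, here is a minimal sketch (mine, not from Lehrer’s piece; the effect size and sample size are assumptions) of how a parametric result drifts under resampling while a categorical result would simply repeat:

```python
# One simulated "true" effect, re-measured in five independent
# replications: the estimated effect and p-value drift with sampling
# noise, while a categorical result (a sequence either matches or it
# doesn't) would come out the same every time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.5   # assumed standardized effect size, for illustration
n = 30              # subjects per group, also an arbitrary choice

for rep in range(5):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    print(f"replication {rep}: effect = {treated.mean() - control.mean():+.2f}, p = {p:.3f}")
```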

So what about the causes? First, yes, there is experimenter bias. Experimenters are (still) human and hence imperfect.

But much more interesting to me is the problem of replicability. As a journal editor myself, I have to make difficult decisions about what to publish, and the reality of today’s scientific marketplace is that negative results have a hard time making it past editors and into print. So another real part of the problem is that when many studies are compared for replicability (meta-analysis), the comparison itself is inherently biased by the “dark matter” of unpublished negative results.
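
The “dark matter” effect is easy to demonstrate with a toy simulation (the effect size, sample size, and study count below are my assumptions, not estimates from any real literature): many underpowered studies of the same modest effect, with only the significant ones reaching print.

```python
# File-drawer sketch: when only "significant" studies are published, the
# average published effect overstates the true one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n, n_studies = 0.2, 20, 2000   # all assumed values

published = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05 and t > 0:                  # the editorial filter
        published.append(treated.mean() - control.mean())

print(f"true effect:           {true_effect:.2f}")
print(f"mean published effect: {np.mean(published):.2f}")
print(f"fraction published:    {len(published) / n_studies:.0%}")
```

When later, better-powered replications regress toward the true value, the published record looks exactly like a decline.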

Is something else spooky going on here? I don’t think so. Science, I’m pleased to say, has not yet been seriously targeted by deconstructive criticism.

3 thoughts on “My take on the Decline Effect”

  1. To illustrate, take a “simple” task: search. Say you are looking for a tank in an image. Manually, that amounts to marking an area on the screen corresponding to each tank that is found. Those marks all have locations that can be extracted and used for other things, but the search task is to find where the tanks are. If we have a computer do it (automatic target recognition), it will create a similar map that can be compared. Computers tend to have a false-positive problem, in that they identify far more tanks than are there, and it can take longer to sort out the false positives than to just do the search manually.

    What about assisting the human analyst with some kind of machine acting as a cognitive prosthetic? Assume we have a developer who claims to have a machine that can infer what the human analyst will do by some mechanism (generally EEG), but faster and with no loss in accuracy. How does one test that?

    To cut to the chase, when the experimenter can select the test images, the targets tend to be unambiguous, and novice performers almost invariably do better with the machine than without it. However, we also have cases where experts have manually outperformed the machines hooked up to other experienced human subjects. We have numerous examples where the differences across multiple subjects are not statistically significant (using paired t-tests in most cases), and others where they are.

    If we just looked at the positive results, it would be simple to build a case that we should buy one or more of these machines, grease people up every day, strap the gear on their heads, and do search that way. On the other hand, we also have results that most of these brain results fall off in accuracy with scale, image quality, obliquity, and clutter. Using just those results, it would be equally simple to lambast the entire class of technology as a waste of time and effort.

    In other words, I can prove anything I want by cherry-picking results that I have at hand. To me, this comes down to a situation where we have not yet identified all the variables that need to be controlled, and have not yet run the correct experiments (or operational tests) because we still do not have a complete idea of what cognitive results depend on which random factors among those variables (crucial to defining the proper null conditions).

    This is a case where I have knowledge of all the results, positive, negative, and in between. Imagine the case where we only have the positive results.

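A minimal sketch of the paired comparison the commenter describes, using invented per-subject hit counts (the design, not the numbers, is the point):

```python
# Paired design: each subject searches the same images manually and with
# the machine; the per-subject differences feed a paired t-test.
import numpy as np
from scipy import stats

manual  = np.array([14, 12, 15, 11, 13, 16, 12, 14])  # hits out of 20, invented
machine = np.array([15, 14, 15, 13, 12, 17, 14, 14])

t, p = stats.ttest_rel(machine, manual)
print(f"mean gain = {np.mean(machine - manual):+.2f} targets, p = {p:.3f}")
# ~ mean gain = +0.88, p ≈ 0.064: with eight subjects and an effect this
# small, which side of 0.05 a replication lands on is nearly a coin flip.
```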

  2. Greetings… I wonder if you might expand on setting up the null hypothesis. Are you finding a hidden bias there? Do experimenters tend to set up the null hypothesis to yield a “positive” result?

  3. My office has also been discussing this article with some interest and concern. We do a great deal of research and development with what amounts to advanced photogrammetry, which does not seem to be affected. However, we also do a great deal of research on “analytics”, where we try to infer what human analysts do as they work with intelligence problems. Parametric statistical tests depend directly on how one sets up the null hypothesis, and that is the piece we are looking at. The bias against publishing negative results doesn’t help; it leaves us basically on our own.

    David Cooper, GMU 2010
    Office of Basic and Applied Research
    National Geospatial-Intelligence Agency

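On the null-hypothesis question raised in the last two comments, a small worked example with invented per-analyst accuracy gains: the same data clear the 0.05 bar under a one-sided alternative (“the machine helps”) but not under the two-sided null (“the machine makes no difference either way”), so the framing alone can decide whether a result counts as positive.

```python
# Null-hypothesis framing: one invented sample, two conclusions.
import numpy as np
from scipy import stats

gain = np.array([0.8, -0.3, 1.1, 0.2, -0.5, 0.9, 0.4, -0.1, 0.7, 0.3])

t, p_two = stats.ttest_1samp(gain, 0.0)        # H0: mean gain is zero
p_one = p_two / 2 if t > 0 else 1 - p_two / 2  # H0: gain is not positive

print(f"t = {t:.2f}, two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
# ~ t = 2.07, two-sided p ≈ 0.068, one-sided p ≈ 0.034
```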
