But this one really takes the biscuit. And then it takes the tin. And relieves itself in it: A New Population-Enrichment Strategy to Improve Efficiency of Placebo-Controlled Clinical Trials of Antidepressant Drugs.
Don't worry - it's from a big pharmaceutical company (GlaxoSmithKline), so I don't have to worry about hurting feelings.
It's full to bursting with colourful graphs and pictures, but the basic idea is very simple. As in "simpleton".
Suppose you're testing a new drug against placebo. You decide to do a multicentre trial, i.e. you enlist lots of doctors to give the drug, or placebo, to their patients. Each clinic or hospital which takes part is a "centre". Multicentre trials are popular because they're an easy way of quickly testing a drug on a large number of patients.
Anyway, suppose that the results come in, and it turns out that the drug didn't work any better than placebo, which unfortunately is what happens rather often in modern trials of antidepressants. Oh dear. The drug's crap. That's the end of that chapter.
...or is it?!? say GSK. Maybe not. They have a clever trick. Look at the results from each centre individually. Placebo response rates will probably vary between centres: in some of them, the placebo people don't get better, in others, they get lots better.
Now, suppose that you just chucked out all of the data from centres where the people on placebo got much better, on the grounds that something weird must have been going on in those ones. That's exactly what GSK did. They reanalyzed the data from 1,837 patients given paroxetine or placebo, across 124 centres. In the dataset as a whole, paroxetine barely outperformed placebo. However, in the centres where people on placebo only improved a little, the drug was much better than placebo!
Well, of course it was. Imagine that the drug has no effect. Some people just get better and others don't. Let's assume that each person randomly gets between 0 and 25 better, with an equal chance of any outcome. Half are on drug and half are on placebo, but it makes no difference.
Let's further assume that there are 50 centres, with 20 people per centre (1000 people total). I knocked up a "simulation" of this in Excel (it took 10 minutes). Here's what you get:
The blue dots show, for each imaginary centre, drug improvement vs. placebo improvement. There's no correlation (it's random), and, on average, there is no difference: both average out at 12 points. The drug doesn't work.
The red dots show the "Treatment Effect" i.e. [drug improvement - placebo improvement]. The average is 0 - because the drug doesn't work. But there's a strong negative correlation between Treatment Effect and the placebo improvement - in centres where people improved lots on placebo, the drug worked worse.
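The Excel sheet isn't included here, but the same toy model is easy to reproduce. Here's a minimal Python sketch of it - 50 centres, 20 patients each, everyone improving by a uniform random 0-25 points regardless of arm (the variable names are mine, not GSK's):

```python
import random

random.seed(42)  # reproducible run

N_CENTRES, PER_ARM = 50, 10  # 50 centres, 10 patients per arm (20 per centre)

placebo_means, effects = [], []
for _ in range(N_CENTRES):
    # Null model: the drug does nothing, so both arms improve by a
    # uniform random 0-25 points, exactly as in the Excel version.
    placebo = sum(random.uniform(0, 25) for _ in range(PER_ARM)) / PER_ARM
    drug = sum(random.uniform(0, 25) for _ in range(PER_ARM)) / PER_ARM
    placebo_means.append(placebo)
    effects.append(drug - placebo)  # per-centre "Treatment Effect"

def pearson(xs, ys):
    """Plain Pearson correlation, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

mean_effect = sum(effects) / N_CENTRES
r = pearson(placebo_means, effects)
print(f"average treatment effect: {mean_effect:.2f}")
print(f"corr(placebo response, treatment effect): {r:.2f}")
```

The average effect comes out near zero, and the correlation between placebo response and "Treatment Effect" is strongly negative - around -0.7, in fact, which is no accident: when drug and placebo improvements are independent with equal variance, Cov(P, D-P) = -Var(P) and Var(D-P) = 2·Var(P), so the expected correlation is -1/√2 ≈ -0.71. No drug effect required.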
This is exactly what Glaxo show in Figure 1a (see above). They write:
"The analysis of the surface response indicated the predominant role of center specific placebo response as compared with the dose strength in determining the Treatment Effect of paroxetine."

But of course they correlate. You're correlating placebo improvement with itself: the "Treatment Effect" is a function of the placebo improvement. It's classic regression to the mean.
Of course if you chuck out the centres where people on placebo do well (the grey box in my picture), the drug seems to work pretty nicely. But this is cheating. It is cherry-picking. It is completely unscientific. (To give the authors their due, they also eliminated the centres where the placebo response was very low. This could, under some assumptions, make the analysis unbiased, but they don't show that this was their intention, let alone that it would eliminate all of the bias.)
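You can put a number on how big a fake "effect" this cherry-picking conjures up. Here's a sketch under the same null model; my cutoff rule (drop every centre whose placebo response is above the median) is a stand-in for the paper's grey-box criterion, not their exact rule:

```python
import random

random.seed(7)

def null_trial(n_centres=50, per_arm=10):
    """One multicentre trial of a drug with zero true effect."""
    pairs = []
    for _ in range(n_centres):
        p = sum(random.uniform(0, 25) for _ in range(per_arm)) / per_arm
        d = sum(random.uniform(0, 25) for _ in range(per_arm)) / per_arm
        pairs.append((p, d))
    full = sum(d - p for p, d in pairs) / n_centres
    # "Enrich": throw away centres whose placebo response is above the median
    cutoff = sorted(p for p, _ in pairs)[n_centres // 2]
    kept = [d - p for p, d in pairs if p < cutoff]
    return full, sum(kept) / len(kept)

runs = [null_trial() for _ in range(500)]
mean_full = sum(f for f, _ in runs) / len(runs)
mean_enriched = sum(e for _, e in runs) / len(runs)
print(f"full data: {mean_full:+.2f}   enriched: {mean_enriched:+.2f}")
```

In repeated runs, the full-data estimate hovers near zero while the "enriched" estimate settles around +1.8 points - a solid-looking drug effect manufactured purely by throwing away the inconvenient centres.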
The authors note that this could be a source of bias, but say that it wouldn't be one if it was planned out in advance: "in order to overcome the bias risk, the enrichment strategy should be accounted for and pre-planned in the study protocol." This is like saying that if you announce, before playing chess, that you are going to cheat, it's not cheating.
To be fair to the authors, assuming the drug does work, this method would improve your chances of correctly detecting the effect. Centres with very high placebo responses quite possibly are junk. Assuming the drug works.
But if we're assuming the drug works, why are we bothering to do a trial? The whole point of a trial is to discover something we don't know. The authors justify their approach by suggesting that it would be useful for drug companies who want to do a "proof-of-concept" trial to find out whether an experimental drug might work under the most favourable conditions, i.e. whether they should bother continuing to research it.
They say that such trials "are inherently exploratory in their conception, aimed at signal detection, open to innovation..." - in other words, that they're not meant to be as rigorous as late-stage trials.
Fair enough. But this method is not even suitable for proof-of-concept, because it would (as my 10-minute simulation shows) increase your chance of finding an "effect" from a drug that doesn't work.
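Here's that failure mode in numbers - a sketch using the same toy model, with a crude "signal detected" rule of my own invention (call it a hit if the estimated effect beats 1 point; the paper specifies no such criterion):

```python
import random

random.seed(99)

def null_trial(n_centres=50, per_arm=10):
    """One multicentre trial of a drug with zero true effect."""
    pairs = [(sum(random.uniform(0, 25) for _ in range(per_arm)) / per_arm,
              sum(random.uniform(0, 25) for _ in range(per_arm)) / per_arm)
             for _ in range(n_centres)]
    full = sum(d - p for p, d in pairs) / n_centres
    # Enrichment step: drop centres with above-median placebo response
    cutoff = sorted(p for p, _ in pairs)[n_centres // 2]
    kept = [d - p for p, d in pairs if p < cutoff]
    return full, sum(kept) / len(kept)

THRESHOLD = 1.0  # crude decision rule: "signal" if estimated effect > 1 point
trials = [null_trial() for _ in range(1000)]
full_hits = sum(f > THRESHOLD for f, _ in trials) / len(trials)
enriched_hits = sum(e > THRESHOLD for _, e in trials) / len(trials)
print(f"false 'signals' - full data: {full_hits:.0%}, enriched: {enriched_hits:.0%}")
```

The drug here is pure noise, yet the enriched analysis "detects a signal" in the large majority of simulated trials, where the full-data analysis almost never does.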
Whether the drug works or not, this method will find an "effect", so finding one isn't useful evidence. It's like saying "Heads I win, tails you lose": you've set things up so that you win either way, and the coin toss tells us nothing.
All of the authors' results are based on trials in which the drug "should have worked": they do not appear to have simulated what would happen if this method were used on trials of a drug that doesn't work, as I just did. So I'm doing Pharma a big favour by writing this post, because if they adopt this approach, they're more likely to waste money on drugs that don't work.
They should be paying me for this stuff.
Merlo-Pich E, Alexander RC, Fava M, & Gomeni R (2010). A New Population-Enrichment Strategy to Improve Efficiency of Placebo-Controlled Clinical Trials of Antidepressant Drugs. Clinical Pharmacology and Therapeutics PMID: 20861834