Robert L. Blum, MD, PhD
Does Drug X Really Work?
Evaluating Medical Evidence
The internet is filled with ads promoting various drugs, vitamins, and supplements.
How do you ever really know that they work? In this essay I will show you
how scientists answer that question.
I just spent the past week writing a review of TRANSCEND, a new health book
by Ray Kurzweil and Terry Grossman. The key question that it raises is
"should one believe the claims in it or not?" How does one arrive at pharmacologic truth?
My PhD thesis project at Stanford (the RX Project) was devoted to precisely this topic.
RX was an early experiment in automated data mining.
It took as input a huge collection of observations
that had been made on thousands of patients over a decade
and combed that database for possible causal links.
A causal link or relationship means that A causes B.
A could be a treatment, for example a drug.
B might be a side-effect or a desired effect (like longer life).
In designing the RX Project I led a small team of statisticians and computer scientists
to address precisely the question of this essay: how do we ever know that A causes B?
(One of Tom Jech's delightful cartoons on fallacious thought.)
How would one know that a given drug or food or habit or even exercise works?
The obvious answer is to try it. If it makes you feel better, stronger, faster, calmer,
more energetic then it works. If it doesn't or harms you, then dump it.
This evidence is direct, incontrovertible, and not lightly dismissed.
It is using your body as the original scientist.
Unfortunately, life is not that simple. How about drugs that make you feel great now
but are rapidly destructive: cocaine, amphetamines, or narcotics?
How about drugs that make you feel good now but are destructive long-term: nicotine or alcohol? And, of course, many drugs fall into the category "neutral to negative feeling now"
in exchange for possible long-term benefit. The negative feeling might be having to swallow
cod-liver oil or having to pay hundreds of dollars a year for a drug, herb, vitamin, or supplement. Also, the long-term benefit may be imperceptible: bones that fracture less readily,
arteries that are open, or less risk of cancer.
What is true of drugs also applies to foods, habits, and even exercise.
As I walk the aisles of Safeway I find less and less that is healthy,
although every product has been carefully designed to taste good. We spend billions of dollars for foods that are convenient and taste great, but ultimately contribute to the obesity
and health care crisis in the United States and elsewhere.
Even the seemingly incontrovertible habit of EXERCISE cannot automatically
be assumed to be beneficial. Some of my friends run or bike over a hundred miles a week.
While that is testimony to their glowing health, it is not clear that it promotes their longevity.
How about the "wear and tear" factor? How about free radicals wreaking havoc?
How about thousands of calories pouring through their arteries?
So, how do we find out whether something is beneficial?
The obvious answer is to do a study. Have a thousand people take vitamin C for ten years
and see whether they are healthier or have lived longer than a control group.
Or, look at folks who have run marathons for decades and compare them to a control group. Easy, huh?
No. It is not easy. The basic problem is that there is infinite variability that may "explain" why one person who is fat and smokes lives to be a hundred and another
who is a lean vegetarian dies at age 30. Each of our ten trillion cells is different
from everyone else's.
So, how do you arrive at medical truth? In this little essay I can just scratch the surface.
My aim is to show you the kind of evidence that health scientists require to evaluate
a drug or other treatment. Absent that level of evidence you must be SKEPTICAL
of every health claim you see or hear.
RATING MEDICAL EVIDENCE
How do you grade or rate medical evidence?
(How about: it works! Or, dump it!)
If you must reduce the evidence to a single expression,
my favorite scale is to assign it a letter grade like this.
A Strong Scientific Evidence that the drug works
B Good Scientific Evidence
C Unclear or conflicting scientific evidence
D Fair negative scientific evidence
F Strong negative scientific evidence (the drug does NOT work)
This is the grading scale used by the Natural Standard, and it has these merits:
1) It is widely used. 2) Its correspondence to school grades is easy to understand.
3) A five point scale is just enough (like Goldilocks). (My RX Project used a ten point validity scale, because, as you'll see, the "C Grade" is a huge grab bag.)
Note that "negative scientific evidence" (rating D or F) does NOT mean
proof that the drug is harmful. It DOES mean that we have fair to strong evidence
that the drug does NOT work. The question of harm is a separate matter.
Also, we may simply be unable to evaluate the effectiveness of a drug
because of a lack of human data. Note that a lack of evidence means we cannot make any claim whatsoever (although the internet pitchmen do it anyway).
This scale addresses the AMOUNT of scientific evidence, the QUALITY of that evidence, the EFFECT SIZE, and the CONSISTENCY of the evidence.
Note that the main focus is on FORMAL STUDIES on PEOPLE as opposed to
anecdotal reports, folklore, animal studies, or even what experts think.
Where are we at so far? Here's where. Forget anecdotal reports
(it worked for my friends, I saw it on tv or on the internet).
Even practice standards may be wrong (4 out of 5 doctors recommend Bayer).
Folklore, a testimonial, or even test tube verification is just the
first step in a thousand mile journey toward a scientific conclusion.
Note that even Grade A evidence may always be overturned,
and a Grade C drug may later be upgraded by more study.
Scientific evidence is not religious dogma. It is always capable
of being falsified or modified by further evidence.
One bunny rabbit skeleton in billion year old rock would be headline news, because it flies in the face of the theory of evolution.
Also the accumulated evidence may not be relevant to you.
You might be older or younger than the study group or different in other ways.
The gold standard for proof that a drug works is the double blind
randomized controlled trial (RCT).
The researchers assign patients to two groups: the study group, which gets the treatment,
and the control group, which gets a placebo (a look-alike treatment). Double blind means that neither the patients nor the researchers know which patients are in which group.
Randomized means the assignment is done using computer-generated random numbers.
Nothing can replace an RCT. Here's why. Unless people are randomly assigned in a clinical study, it is always possible that some outside factor may account for the difference in outcome between the study group (given the drug) versus the control group (taking the placebo).
In a sufficiently large randomized trial ALL extraneous factors - both known and unknown -
are automatically equalized between the groups (at least in theory).
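The balancing claim can be checked with a few lines of Python. This is only a sketch on a made-up population: the hidden "hardiness" score, the population size, and the function name `randomize` are all assumptions chosen for illustration, not data from any real trial.

```python
import random
import statistics

def randomize(patients, seed=0):
    """Shuffle the patients and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = patients[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical population: every patient carries a hidden "hardiness"
# score that the researchers never measure or even know about.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

treated, control = randomize(population, seed=1)

# Because assignment ignores hardiness, the two group averages end up
# nearly identical even though nobody tried to match on it.
gap = abs(statistics.mean(treated) - statistics.mean(control))
print(f"difference in mean hidden hardiness: {gap:.2f}")
```

With 5,000 patients per group the gap is a small fraction of the 10-point spread in the population. With only a few dozen patients the same code can show a noticeable imbalance, which is why the text says "sufficiently large."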
It is easiest to illustrate this by considering a much weaker kind of study: a cohort study.
Consider a study comparing marathon runners to couch potatoes to find out whether the marathoners live longer.
For every marathoner in the study include in the control group a person of the same age, sex, race, and health history. Now follow them for twenty years and count the number of deaths in each group or the number of heart attacks or whatever. Even if you were to show that the marathoners enjoyed longer life or fewer heart attacks, how could you ever refute the claim
that the difference was due to genetics or hardiness or cleaner living or special diet or supplements or occupation or any of an infinitude of variables?
That infinitude of confounding variables can only be controlled by random allocation.
By randomly allocating people to the marathon group you control for genetics and hardiness and all the other spurious variables. Randomization is the only means for demonstrating that the benefit was conferred by the study variable alone. (Good luck trying to randomly assign people to marathoning versus sitting on a couch, and doing it double blind. It can't be done.)
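The marathon example can be simulated to show how a confounder fools a cohort study. The sketch below is a toy model under loudly stated assumptions: lifespan depends only on an invented "hardiness" score, running has exactly zero effect, and hardier people are more likely to take up running. Every number in it is hypothetical.

```python
import random
import statistics

rng = random.Random(7)

def lifespan(hardiness):
    # Assumption: lifespan depends ONLY on hardiness. Running adds nothing.
    return 70 + 0.3 * (hardiness - 50) + rng.gauss(0, 5)

# Hypothetical population of hidden hardiness scores.
people = [rng.gauss(50, 10) for _ in range(20_000)]

# Cohort study: people choose for themselves, and the hardy tend to run.
self_selected = [(h, h + rng.gauss(0, 10) > 60) for h in people]
# Randomized trial: a coin flip decides who runs.
randomized = [(h, rng.random() < 0.5) for h in people]

def apparent_benefit(assignments):
    """Mean lifespan of runners minus mean lifespan of non-runners."""
    runners = [lifespan(h) for h, runs in assignments if runs]
    sitters = [lifespan(h) for h, runs in assignments if not runs]
    return statistics.mean(runners) - statistics.mean(sitters)

cohort_gap = apparent_benefit(self_selected)
rct_gap = apparent_benefit(randomized)
print(f"cohort study: runners appear to live {cohort_gap:.1f} years longer")
print(f"randomized:   runners appear to live {rct_gap:.1f} years longer")
```

In this toy world the cohort comparison credits running with several extra years of life that are really due to hardiness, while the randomized comparison comes out close to zero, which is the true effect. Matching on age, sex, race, and health history would not help here, because hardiness is not among the matched variables.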
Criteria for Assessing Strength of Scientific Evidence
And now here are the criteria for the various levels of evidence shown in the above table.
This table and that above were copied from the Natural Standard: Grading Explanation (more about them in a moment).
If you look carefully at the above table, you see that the quantity and quality
of the evidence is roughly U-shaped. It takes a lot of high quality evidence to demonstrate a benefit from a drug as well as to shoot it down. A weak study or lack of evidence
is not a reason to believe that a drug or other intervention does not work.
The low point in the U-shape is the Grade C evidence: unclear or conflicting evidence.
Look at the final two clauses in that giant OR statement that defines Grade C evidence:
evidence only from one or more non-randomized studies WITHOUT supporting evidence
from basic science (test tubes), animal studies, or theory …
OR evidence ONLY from basic science (test tubes), animal studies, or theory (with NO studies in human beings).
Unfortunately this category "Grade C evidence" is a huge scrap heap that includes
every worthless drug, vitamin, supplement, or other intervention that you read about on the internet.
In Category C are the snake oils that make up the kit bag of every medical charlatan
in history. (Scientific lingo and a white coat are not what distinguish scientists
from snake oil merchants. The key distinguishing feature is their use of evidence.)
A Natural Standard Review as an Example: The Reishi Mushroom
I've based this essay on the methodology of the Natural Standard because they do their reviews exactly right. It is not simply that they claim to be "The Authority on Integrative Medicine" (they ARE), rather it is that they understand how to evaluate alternative medicine health claims - the flakiest, most quack-ridden arena of medical practice.
(Unfortunately, their reviews are not freely available on the internet like those of the quacks.
The reviews by NCCAM are available and are equally authoritative; see the end of this essay.)
Fortunately for us one of Natural Standard's reviews is freely inspectable on the internet. Look at their review of Reishi mushrooms. I had never heard of these, by the way.
It just happens to be the one review that they chose to make public.
Perhaps it's because "Royalty considered it precious and used it in hopes
of obtaining immortality."
Note first: 80 references - this must be the entire medical literature
on this species of mushroom.
Next note: each of 20 reviewers read this literature, evaluated each study,
and wrote up this review. Then the entire review was itself reviewed by the overall
Editorial Board of The Natural Standard without knowing who had prepared the review.
If the review had deficiencies, the Board's editorial hand would not be restrained
because their buddies happened to be among the review's writers.
If you want to see the BOTTOM LINE for a drug or supplement (like Reishi mushroom) go right to the little table at the top of the review
"Scientific Evidence for Common/Studied Uses." Here, that table has 9 rows
(one for each disease treated with the mushroom) and 9 corresponding letter grades.
For each disease the mushroom gets a "C," weak or inconclusive evidence. Not great.
This definitely "throws a wet blanket" on immortality bestowed by the mushroom.
THE EVIDENCE TABLE
Make Everything as Simple as Possible, But Not Simpler -- Albert Einstein
The idea in the Einstein quote applies here.
Basically, each of the 80 studies in the citation list is already a summary of data
collected on each of the thousand or so patients that participated in these studies.
Rather than combining all of those studies into a single letter grade,
I prefer to look at the big EVIDENCE TABLE itself, located in the middle of the mushroom review.
SCROLL WAY DOWN UNTIL YOU GET TO THAT TABLE.
Note: This is the big table with about a dozen rows that start with disease conditions:
Cancer, Chronic Hepatitis, Coronary Artery Disease, etc.
The EVIDENCE TABLE has ten columns:
the condition or disease for which the treatment was given;
the study design (i.e., was it an RCT, case series, cohort study, etc.);
the literature citation;
the number of patients in the study;
whether the result was statistically significant (i.e., could it simply have been due to chance);
the overall QUALITY of the study;
the Magnitude of Benefit (i.e., was the effect size large, medium, or small);
the Absolute Risk Reduction (the difference in event rates between the control and treated groups: if 20% of controls but only 10% of treated patients got cancer, this would be 10 percentage points, even though the risk was cut in half);
the Number Needed to Treat (how many patients must be treated for one of them to benefit);
Comments (usually dosage and duration of the trial).
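Two of these columns, Absolute Risk Reduction and Number Needed to Treat, are simple arithmetic on the trial counts. The sketch below uses the standard textbook definitions; the function name `risk_measures` and the trial counts are hypothetical, chosen only to make the arithmetic easy to follow. It also computes the relative risk reduction, which is not a column in the table but is easily confused with the absolute one.

```python
import math

def risk_measures(events_control, n_control, events_treated, n_treated):
    """Return (absolute risk reduction, relative risk reduction, NNT)."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated
    rrr = arr / risk_control if risk_control > 0 else float("nan")
    # NNT = 1 / ARR, conventionally rounded up to a whole patient.
    # It is undefined (None here) when the treatment shows no benefit.
    # The inner round() guards against floating-point noise such as 20.000000001.
    nnt = math.ceil(round(1 / arr, 9)) if arr > 0 else None
    return arr, rrr, nnt

# Hypothetical trial: 20 of 200 controls and 10 of 200 treated patients
# have the bad outcome.
arr, rrr, nnt = risk_measures(20, 200, 10, 200)
print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt}")
```

In this made-up trial the treatment halves the risk (a relative reduction of 50%), yet the absolute reduction is only 5 percentage points, so 20 patients must be treated to spare one of them. A headline "cuts risk in half" and a Number Needed to Treat of 20 can describe the same result.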
Their complete explanation of this table appears here (Evidence Table Explanation).
Here note their standard method for calculating the Quality of each study (each row) in the evidence table.
Now, if you haven't already, take a look at that big evidence table itself.
Click open the Reishi Mushroom review and now look at the large table in the middle.
I'm immediately bothered by the fact that half of the studies had the same lead author - Dr. Gao -
and appeared in only one journal (a specialty journal with limited readership).
However, three of his group's studies were RCTs with moderate numbers of patients
(71, 170, and 90 patients) and two showed a medium effect size.
Here, despite this promising evidence, the Quality of These Studies (column 6)
rises only to a 2 (poor) for Dr. Gao's RCT for mushroom treatment of coronary heart disease
(row 5 in the table) and only a 1 (poor) for his group's RCT on treatment of diabetes.
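The patient counts matter because the same observed benefit can be statistically significant in a large trial and not in a small one. The sketch below illustrates this with a pooled two-proportion z-test (a normal approximation, one common choice among several). The counts are invented for illustration and are NOT taken from Dr. Gao's studies or from the Natural Standard review.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical outcome rates: 40% of controls vs 25% of treated patients
# do badly, observed once in a small trial and once in a large one.
small = two_proportion_p_value(16, 40, 10, 40)      # 80 patients in total
large = two_proportion_p_value(160, 400, 100, 400)  # 800 patients in total
print(f"small trial p = {small:.3f}, large trial p = {large:.6f}")
```

With 80 patients the difference does not reach the conventional 0.05 threshold; with 800 patients and identical rates it is far below it. A moderate-sized trial with a medium effect can therefore be promising without being conclusive.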
So, what should you as a patient do with this level of evidence?
I think the answer is clearly "wait. Hold off." The evidence simply does not justify
taking this treatment for the studied indications. For each of the listed diseases
there are treatments that work better. Also, we have not even begun
to address the side-effects or cost of this treatment.
For any given treatment what we're really hoping to see is a letter grade A or B
and repeated high quality evidence of large beneficial effects. Examples of these would be
insulin for diabetes or penicillin for pneumonia - large irrefutable effects in study after study.
Unfortunately, for each Nobel prize winning therapy like insulin or penicillin
there are thousands of drugs that fall into the Grade C "unclear or conflicting" scrap heap.
Most may gradually fall into disuse or disrepute; a very few may graduate to a Grade B
or even A as further evidence accumulates.
For readers looking for more on the topic of evidence-based medicine, I recommend the Wikipedia article on the subject. There I notice that Adrian Smith, President of the Royal Statistical Society, recommends evidence-based medicine as an exemplar for all public policy.
Before closing, I'd like to tip my hat to the National Center for Complementary and Alternative Medicine (NCCAM). NCCAM is a branch of the National Institutes of Health (NIH) whose most important function is to collect and disseminate information on complementary and alternative medicines. Occasionally they will also conduct large scale clinical trials of drugs
that have caught the attention of the public. For example, they conducted a large trial of
glucosamine and chondroitin (previously used to alleviate joint pain and preserve knee cartilage).
Their study showed that the combo drug did not work, although there may have been an effect
For anyone looking for unbiased, scientific information on specific alternative medicine therapies, I recommend NCCAM's website. Here is their Health Topics A-Z index.
Finally, in the next article - TRANSCEND DRUGS - I will show the evidence grades that
the Natural Standard assigned to some of the drugs, vitamins, and supplements mentioned
in the Kurzweil/Grossman health book, TRANSCEND.