Robert L. Blum, MD, PhD





Does Drug X Really Work?

Evaluating Medical Evidence


The internet is filled with ads promoting various drugs, vitamins, and supplements.

How do you ever really know that they work?  In this essay I will show you

how scientists answer that question.


            I just spent the past week writing a review of TRANSCEND, a new health book

by Ray Kurzweil and Terry Grossman.  The key question that it raises is

"should one believe the claims in it or not?" How does one arrive at pharmacologic truth?


            My PhD thesis project at Stanford (the RX Project) was devoted to precisely this topic.

RX was an early experiment in automated data mining.


            It took as input a huge collection of observations

 that had been made on thousands of patients over a decade

and combed that database for possible causal links.


            A causal link or relationship means that A causes B.

 A could be a treatment, for example a drug.

 B might be a side-effect or a desired effect (like longer life).


            In designing the RX Project I led a small team of statisticians and computer scientists

to address precisely the question of this essay: how do we ever know that A causes B?
Never mistake correlation (ie, association) for causation!

Causal-Bulb Cartoon by Tom Jech


 (One of Tom Jech's delightful cartoons on fallacious thought.)
(Also note: when trying to get into Pandora's Box, be sure to cover your fallacy with a conundrum!)


How would one know that a given drug or food or habit or even exercise works?

The obvious answer is to try it.   If it makes you feel better, stronger, faster, calmer,

more energetic then it works.  If it doesn't or harms you, then dump it.

This evidence is direct, incontrovertible, and not lightly dismissed.

 It is using your body as the original scientist. 

            Unfortunately, life is not that simple.  How about drugs that make you feel great now

but are rapidly destructive: cocaine, amphetamines, or narcotics?

How about drugs that make you feel good now but are destructive long-term: nicotine or alcohol?  And, of course, many drugs fall into the category "neutral to negative feeling now"

in exchange for possible long-term benefit.   The negative feeling might be having to swallow

cod-liver oil or having to pay hundreds of dollars a year for a drug, herb, vitamin, or supplement.  Also, the long-term benefit may be imperceptible:  bones that fracture less readily,

arteries that are open, or less risk of cancer.


            What is true of drugs also applies to foods, habits, and even exercise.

As I walk the aisles of Safeway I find less and less that is healthy,

although every product has been carefully designed to taste good.   We spend billions of dollars for foods that are convenient and taste great, but ultimately contribute to the obesity

and health care crisis in the United States and elsewhere.


            Even the seemingly incontrovertible habit of EXERCISE cannot automatically

be assumed to be beneficial. Some of my friends run or bike over a hundred miles a week.

While that is testimony to their glowing health, it is not clear that it promotes their longevity.

How about the "wear and tear" factor? How about free radicals wreaking havoc?

How about thousands of calories pouring through their arteries?


            So, how do we find out whether something is beneficial?

The obvious answer is to do a study. Have a thousand people take vitamin C for ten years

and see whether they are healthier or have lived longer than a control group.

Or, look at folks who have run marathons for decades and compare them to a control group.  Easy, huh?


            No.  It is not easy.  The basic problem is that there is infinite variability that may "explain" why one person who is fat and smokes lives to be a hundred and another

who is a lean vegetarian dies at age 30. Each of our ten trillion cells is different

from everyone else's.


            So, how do you arrive at medical truth?  In this little essay I can just scratch the surface.

My aim is to show you the kind of evidence that health scientists require to evaluate

a drug or other treatment. Absent that level of evidence you must be SKEPTICAL

of every health claim you see or hear.





            How do you grade or rate medical evidence?

(How about: it works! Or: dump it!)

If you must reduce the evidence to a single expression,

my favorite scale is to assign it a letter grade like this.


Letter Grade


A          Strong Scientific Evidence that the drug works

B          Good Scientific Evidence

C          Unclear or conflicting scientific evidence

D          Fair negative scientific evidence

F          Strong negative scientific evidence (the drug does NOT work)


            This scale is the Natural Standard evidence grading scale (not the Jadad score, which rates the quality of a single trial) and has these merits:

 1) It is widely used. 2) Its correspondence to school grades is easy to understand.

 3) A five point scale is just enough (like Goldilocks).  (My RX Project used a ten point validity scale, because, as you'll see, the "C Grade" is a huge grab bag.)


            Note that "negative scientific evidence" (rating D or F) does NOT mean

proof that the drug is harmful.   It DOES mean that we have fair (D) or strong (F) evidence

that the drug does NOT work.  The question of harm is a separate matter.


            Also, we may simply be unable to evaluate the effectiveness of a drug

 because of a lack of human data.  Note that a lack of evidence means we cannot make any claim whatsoever (although the internet pitchmen do it anyway).


            This scale addresses the AMOUNT of scientific evidence, the QUALITY of that evidence,  the EFFECT SIZE, and the CONSISTENCY of the evidence.


Note that the main focus is on FORMAL STUDIES on PEOPLE as opposed to

 anecdotal reports, folklore, animal studies, or even what experts think.


            Where are we at so far? Here's where.  Forget anecdotal reports

(it worked for my friends, I saw it on tv or on the internet).

 Even practice standards may be wrong (4 out of 5 doctors recommend Bayer).   

Folklore, a testimonial, or even test tube verification is just the

first step in a thousand mile journey toward a scientific conclusion.


            Note that even Grade A evidence may always be overturned,

and a Grade C drug may later be upgraded by more study.

Scientific evidence is not religious dogma.   It is always capable

of being falsified or modified by further evidence.

One bunny rabbit skeleton in billion year old rock would be headline news, because it flies in the face of the theory of evolution.


            Also the accumulated evidence may not be relevant to you.

You might be older or younger than the study group or different in other ways.




            The Randomized Controlled Trial (RCT)


            The gold standard for proof that a drug works is the double blind

randomized controlled trial (RCT).


The researchers assign patients to two groups: the study group, which gets the treatment,

 and the control group, which gets a placebo (a look-alike treatment).  Double blind means that neither the patients nor the researchers know which patients are in which group.

Randomized means the assignment is done using computer-generated random numbers.


            Nothing can replace an RCT.  Here's why.  Unless people are randomly assigned in a clinical study, it is always possible that some outside factor may account for the difference in outcome between the study group (given the drug) versus the control group (taking the placebo).

In a sufficiently large randomized trial ALL extraneous factors - both known and unknown -

are automatically equalized between the groups (at least in theory).
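
The balancing claim can be checked with a small simulation. This is only a sketch under made-up assumptions: the population, the hidden "hardiness" score, and the helper name randomize are all hypothetical and come from no real trial.

```python
import random
import statistics

def randomize(patient_ids, seed=0):
    """Shuffle the patients with a seeded generator and split them in half."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical population: every patient has a hidden "hardiness" score
# that the researchers never measure.
population_rng = random.Random(42)
hardiness = {pid: population_rng.gauss(50, 10) for pid in range(10_000)}

treatment, control = randomize(hardiness.keys(), seed=1)

mean_treatment = statistics.mean(hardiness[p] for p in treatment)
mean_control = statistics.mean(hardiness[p] for p in control)
print(f"mean hidden hardiness, treatment group: {mean_treatment:.2f}")
print(f"mean hidden hardiness, control group:   {mean_control:.2f}")
```

With thousands of patients the two group means come out nearly equal even though nobody measured hardiness. With only a few dozen patients the gap can be much larger, which is why the trial has to be sufficiently large.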


            It is easiest to illustrate this by considering a much weaker kind of study: a cohort study.

Consider a study comparing marathon runners to couch potatoes to find out whether the marathoners live longer.


For every marathoner in the study include in the control group a person of the same age, sex, race, and health history. Now follow them for twenty years and count the number of deaths in each group or the number of heart attacks or whatever. Even if you were to show that the marathoners enjoyed longer life or fewer heart attacks, how could you ever refute the claim

that the difference was due to genetics or hardiness or cleaner living or special diet or supplements or occupation or any of an infinitude of variables?


            That infinitude of confounding variables can only be controlled by random allocation.

 By randomly allocating people to the marathon group you control for genetics and hardiness and all the other spurious variables. Randomization is the only means for demonstrating that the benefit was conferred by the study variable alone. (Good luck trying to randomly assign people to marathoning versus sitting on a couch, and doing it double blind.  It can't be done.) 
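
Here is a toy simulation of the marathon example, offered as a sketch rather than as a model of any real study. Every number in it is invented, and the function name simulate is hypothetical. In this toy world running has no effect at all on survival; only a hidden trait (hardiness) does.

```python
import random

def simulate(n=20_000, randomized=False, seed=0):
    """Return (death rate among runners, death rate among non-runners).

    Running has NO true effect on survival in this toy model; only the
    hidden 'hardiness' trait does.
    """
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        hardy = rng.random() < 0.5            # hidden confounder
        if randomized:
            runs = rng.random() < 0.5         # a coin flip decides who runs
        else:
            # Observational world: hardy people are far more likely to run.
            runs = rng.random() < (0.8 if hardy else 0.2)
        p_death = 0.10 if hardy else 0.30     # depends on hardiness only
        counts[runs] += 1
        deaths[runs] += rng.random() < p_death
    return deaths[True] / counts[True], deaths[False] / counts[False]

cohort = simulate(randomized=False, seed=1)
trial = simulate(randomized=True, seed=1)
print(f"cohort study:     runners {cohort[0]:.3f}, non-runners {cohort[1]:.3f}")
print(f"randomized trial: runners {trial[0]:.3f}, non-runners {trial[1]:.3f}")
```

The cohort comparison makes running look protective because the runners were hardier to begin with. Under random allocation the two death rates come out nearly equal, which is the correct answer for this toy model.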



Criteria for Assessing Strength of Scientific Evidence


            And now here are the criteria for the various levels of evidence shown in the above table.

This table and that above were copied from the Natural Standard: Grading Explanation (more about them in a moment).


Level of Evidence Grade


A (Strong Scientific Evidence)

Statistically significant evidence of benefit from >2 properly randomized trials (RCTs), OR evidence from one properly conducted RCT AND one properly conducted meta-analysis, OR evidence from multiple RCTs with a clear majority of the properly conducted trials showing statistically significant evidence of benefit AND with supporting evidence in basic science, animal studies, or theory.

B (Good Scientific Evidence)

Statistically significant evidence of benefit from 1-2 properly randomized trials, OR evidence of benefit from >1 properly conducted meta-analysis OR evidence of benefit from >1 cohort/case-control/non-randomized trials AND with supporting evidence in basic science, animal studies, or theory. This grade applies to situations in which a well designed randomized controlled trial reports negative results but stands in contrast to the positive efficacy results of multiple other less well designed trials or a well designed meta-analysis, while awaiting confirmatory evidence from an additional well designed randomized controlled trial.

C (Unclear or conflicting scientific evidence)

Evidence of benefit from >1 small RCT(s) without adequate size, power, statistical significance, or quality of design by objective criteria,* OR conflicting evidence from multiple RCTs without a clear majority of the properly conducted trials showing evidence of benefit or ineffectiveness, OR evidence of benefit from >1 cohort/case-control/non-randomized trials AND without supporting evidence in basic science, animal studies, or theory, OR evidence of efficacy only from basic science, animal studies, or theory.

D (Fair Negative Scientific Evidence)

Statistically significant negative evidence (i.e., lack of evidence of benefit) from cohort/case-control/non-randomized trials, AND evidence in basic science, animal studies, or theory suggesting a lack of benefit. This grade also applies to situations in which >1 well designed randomized controlled trial reports negative results, notwithstanding the existence of positive efficacy results reported from other less well designed trials or a meta-analysis. (Note: if there are >1 negative randomized controlled trials that are well designed and highly compelling, this will result in a grade of "F" notwithstanding positive results from other less well designed studies.)

F (Strong Negative Scientific Evidence)

Statistically significant negative evidence (i.e. lack of evidence of benefit) from >1 properly randomized adequately powered trial(s) of high-quality design by objective criteria.*

Lack of Evidence

Unable to evaluate efficacy due to lack of adequate available human data.
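
To make the logic of the table easier to follow, here is a deliberately simplified sketch of it as a decision function. It is not the Natural Standard's own procedure: the EvidenceSummary fields and the grade function are hypothetical, the counts are read literally (">2" means three or more), and grade D, the "clear majority" clause, and any judgment of trial quality are left out.

```python
from dataclasses import dataclass

@dataclass
class EvidenceSummary:
    # All fields are hypothetical simplifications of the criteria above.
    positive_rcts: int = 0           # proper RCTs showing significant benefit
    negative_rcts: int = 0           # proper, adequately powered RCTs showing no benefit
    positive_meta_analyses: int = 0  # properly conducted meta-analyses showing benefit
    positive_nonrandomized: int = 0  # cohort / case-control / non-randomized studies
    basic_science_support: bool = False  # test tube, animal, or theoretical support

def grade(e: EvidenceSummary) -> str:
    """Return a letter grade for the simple cases this sketch covers."""
    if e.negative_rcts > 1 and e.positive_rcts == 0:
        return "F"
    if e.positive_rcts > 0 and e.negative_rcts > 0:
        return "C"  # conflicting RCTs; the "clear majority" clause is not modelled
    if e.positive_rcts > 2 or (e.positive_rcts >= 1 and e.positive_meta_analyses >= 1):
        return "A"
    if e.positive_rcts >= 1 or e.positive_meta_analyses > 1:
        return "B"
    if e.positive_nonrandomized > 1:
        return "B" if e.basic_science_support else "C"
    human_studies = e.negative_rcts + e.positive_nonrandomized + e.positive_meta_analyses
    if human_studies == 0:
        return "C" if e.basic_science_support else "Lack of Evidence"
    return "not modelled"  # e.g. grade D, or a single negative trial

print(grade(EvidenceSummary(positive_rcts=3)))
print(grade(EvidenceSummary(positive_nonrandomized=2)))
print(grade(EvidenceSummary(basic_science_support=True)))
```

Note how a supplement backed only by test tube work, or only by a couple of non-randomized studies, lands in grade C no matter how enthusiastic the advertising is.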


            If you look carefully at the above table, you see that the quantity and quality

 of the evidence is roughly U-shaped.  It takes a lot of high quality evidence to demonstrate a benefit from a drug as well as to shoot it down. A weak study or lack of evidence

is not a reason to believe that a drug or other intervention does not work.


            The low point in the U-shape is the Grade C evidence: unclear or conflicting evidence.

Look at the final two clauses in that giant OR statement that defines Grade C evidence:

evidence only from one or more non-randomized studies WITHOUT supporting evidence

from basic science (test tubes), animal studies, or theory …

OR evidence ONLY from basic science (test tubes), animal studies, or theory (with NO studies in human beings).


            Unfortunately this category "Grade C evidence" is a huge scrap heap that includes

every worthless drug, vitamin, supplement, or other intervention that you read about on the internet.


In Category C are the snake oils that make up the kit bag of every medical charlatan

 in history. (Scientific lingo and a white coat are not what distinguish scientists

from snake oil merchants. The key distinguishing feature is their use of evidence.)


            A Natural Standard Review as an Example: The Reishi Mushroom


                        I've based this essay on the methodology of the Natural Standard because they do their reviews exactly right. It is not simply that they claim to be "The Authority on Integrative Medicine" (they ARE), rather it is that they understand how to evaluate alternative medicine health claims - the flakiest, most quack-ridden arena of medical practice.

 (Unfortunately, their reviews are not freely available on the internet like those of the quacks.

The reviews by NCCAM are available and are equally authoritative; see the end of this essay.)


            Fortunately for us one of Natural Standard's reviews is freely inspectable on the internet.  Look at their review of Reishi mushrooms. I had never heard of these, by the way.

It just happens to be the one review that they chose to make public.

Perhaps it's because "Royalty considered it precious and used it in hopes

of obtaining immortality."


            Note first: 80 references - this must be the entire medical literature

on this species of mushroom.


Next note: each of 20 reviewers read this literature, evaluated each study,

 and wrote up this review. Then the entire review was itself reviewed by the overall

Editorial Board of The Natural Standard without knowing who had prepared the review.

If the review had deficiencies, the Board's editorial hand would not be restrained

because their buddies happened to be among the review's writers.


            If you want to see the BOTTOM LINE for a drug or supplement (like Reishi mushroom) go right to the little table at the top of the review

"Scientific Evidence for Common/Studied Uses."  Here, that table has 9 rows

(one for each disease treated with the mushroom) and 9 corresponding letter grades.

For each disease the mushroom gets a "C": unclear or conflicting evidence.  Not great.

This definitely "throws a wet blanket" on immortality bestowed by the mushroom.






Make Everything as Simple as Possible, But Not Simpler -- Albert Einstein


            The idea in the Einstein quote applies here.

Basically, each of the 80 studies in the citation list is already a summary of data

collected on each of the thousand or so patients that participated in these studies.

Rather than combining all of those studies into a single letter grade,

I prefer to look at the big EVIDENCE TABLE itself, located in the middle of the mushroom review.



Note: This is the big table with about a dozen rows that start with disease conditions:

Cancer, Chronic Hepatitis, Coronary Artery Disease, etc.


The EVIDENCE TABLE has ten columns: 


the condition or disease for which the treatment was given;

the study design (i.e., was it an RCT, case series, cohort study, etc.);

the literature citation;

the number of patients in the study;

whether the result was statistically significant (i.e., could it have simply been due to chance);

the overall QUALITY of the study;

the Magnitude of Benefit (i.e., was the effect size large, medium, or small);

the Absolute Risk Reduction (the difference in event rates between the control and treated groups: if cancer occurred in 20% of controls and 10% of treated patients, this would be 10 percentage points, even though the relative risk was cut in half);

Number Needed to Treat;

Comments (usually dosage and duration of the trial)
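
The last two numeric columns are simple arithmetic on event rates. The sketch below uses invented counts (no real trial is implied), and the function name risk_summary is hypothetical. Exact fractions are used so that the rounding in Number Needed to Treat is not thrown off by floating point error.

```python
from fractions import Fraction
import math

def risk_summary(events_treated, n_treated, events_control, n_control):
    """Compare event rates in a treated group and a control group."""
    risk_treated = Fraction(events_treated, n_treated)
    risk_control = Fraction(events_control, n_control)
    arr = risk_control - risk_treated                    # absolute risk reduction
    rrr = arr / risk_control if risk_control else None   # relative risk reduction
    nnt = math.ceil(1 / arr) if arr > 0 else None        # patients treated per event avoided
    return {"arr": arr, "rrr": rrr, "nnt": nnt}

# Two made-up trials in which the treatment halves the risk.
common = risk_summary(10, 100, 20, 100)    # 20% of controls affected
rare = risk_summary(1, 1000, 2, 1000)      # 0.2% of controls affected

for name, r in (("common disease", common), ("rare disease", rare)):
    print(f"{name}: ARR = {float(r['arr']):.1%}, "
          f"RRR = {float(r['rrr']):.0%}, NNT = {r['nnt']}")
```

Both made-up treatments cut the risk in half, yet one prevents an event for every 10 patients treated and the other for every 1000. That is why the table reports the absolute numbers and not just "reduces risk by 50%."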
















Study Design | Author, Year | Statistically Significant? | Quality of Study | Magnitude of Benefit | Absolute Risk Reduction | Number Needed to Treat




            Their complete explanation of this table appears here (Evidence Table Explanation).

Here note their standard method for calculating the Quality of each study (each row) in the evidence table.
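
A minimal sketch of how such a quality score can be computed, assuming the quality column is a Jadad-style score running from 0 to 5 (an assumption here; their explanation page is the authority). The function names are hypothetical, and the "good" and "excellent" bands are likewise assumed; only the "poor" label for scores of 1 and 2 appears in the review discussed in this essay.

```python
def jadad_score(randomized, randomization_method, double_blind,
                blinding_method, withdrawals_described):
    """Score one trial report from 0 to 5.

    The two *_method arguments are "appropriate", "inappropriate",
    or "not described".
    """
    adjust = {"appropriate": 1, "inappropriate": -1, "not described": 0}
    score = 0
    if randomized:                     # report says the trial was randomized
        score += 1 + adjust[randomization_method]
    if double_blind:                   # report says the trial was double blind
        score += 1 + adjust[blinding_method]
    if withdrawals_described:          # dropouts and withdrawals accounted for
        score += 1
    return score

def quality_label(score):
    """Assumed bands: 0-2 poor, 3-4 good, 5 excellent."""
    if score <= 2:
        return "poor"
    return "good" if score <= 4 else "excellent"

# A hypothetical report: randomized, method not described, not blinded,
# withdrawals accounted for.
example = jadad_score(True, "not described", False, "not described", True)
print(example, quality_label(example))
```

The point of the score is that the word "randomized" in a paper's title earns little credit by itself; the report has to describe how randomization and blinding were actually done.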


            Now, if you haven't already, take a look at that big evidence table itself.

Click open the Reishi mushroom review and now look at the large table in the middle.


I'm immediately bothered by the fact that half of the studies had the same lead author - Dr Gao -

and appeared in only one journal (a specialty journal with limited readership).

However, three of his group's studies were RCTs with moderate numbers of patients

 (71, 170, and 90 patients) and two showed a medium effect size.


            Here, despite this promising evidence, the Quality of These Studies (column 6)

rises only to a 2 (poor) for Dr. Gao's RCT for mushroom treatment of coronary heart disease

(row 5 in the table) and only a 1 (poor) for his group's RCT on treatment of diabetes.


So, what should you as a patient do with this level of evidence?

 I think the answer is clearly "wait. Hold off."  The evidence simply does not justify

taking this treatment for the studied indications. For each of the listed diseases

there are treatments that work better. Also, we have not even begun

to address the side-effects or cost of this treatment.


            For any given treatment what we're really hoping to see is a letter grade A or B

and repeated high quality evidence of large beneficial effects.  Examples of these would be

insulin for diabetes or penicillin for pneumonia - large irrefutable effects in study after study.



            Unfortunately, for each Nobel prize winning therapy like insulin or penicillin

there are thousands of drugs that fall into the Grade C "unclear or conflicting" scrap heap.

 Most may gradually fall into disuse or disrepute; a very few may graduate to a Grade B

or even A as further evidence accumulates.


            For readers looking for more on the topic of evidence-based medicine, I recommend the Wikipedia article on it. There I notice that Adrian Smith, President of the Royal Statistical Society, recommends evidence-based medicine as an exemplar for all public policy.


            Before closing,  I'd like to tip my hat to the National Center for Complementary and Alternative Medicine (NCCAM). NCCAM is a branch of the National Institutes of Health (NIH) whose most important function is to collect and disseminate information on complementary and alternative medicines.  Occasionally they will also  conduct large scale clinical trials of drugs

that have caught the attention of the public.  For example, they conducted a large trial of

glucosamine and chondroitin (previously used to alleviate joint pain and preserve knee cartilage).

Their study showed that the combo drug did not work (although there may have been an effect
in a small subset).


            For anyone looking for unbiased, scientific information on specific alternative medicine therapies, I recommend NCCAM's website.  Here is their Health Topics A-Z  index.


            Finally, in the next article - TRANSCEND DRUGS - I will show the evidence grades that

the Natural Standard assigned to some of the drugs, vitamins, and supplements mentioned

in the Kurzweil/Grossman health book, TRANSCEND.