Healthy surprises: how randomised trials can challenge conventional wisdom and debunk dogma
I acknowledge the Wurundjeri people of the Kulin nation, the traditional owners of these lands, and pay respects to all First Nations people present.
Professor Goldfeld, Murdoch Children’s Research Institute staff, and researchers across laboratory, clinical, public health, and health services: Thank you for inviting me here today.
My thanks to the Institute for what you do each day to help improve the lives of children through rigorous research. I also pay tribute to Professor David Danks (1931-2003), after whom today’s Seminar is named, and who established Victoria’s first genetics health service, which became this Institute. As one obituary described him, ‘David was a remarkable doctor, scientist, teacher, mentor, family man, friend, and champion of good causes’ (Choo, 2003).
My focus today is on randomised trials – a central tool in medicine, but underutilised in policy. In particular, I want to focus on the way in which randomised trials in medicine can upend conventional wisdom, producing results that improve patient outcomes and extend lifespans. You might call these ‘healthy surprises’.
Expect the Unexpected
Let’s start with Rapid Response Teams, set up to prevent cardiac arrests in hospitals. The hypothesis was that if a group of clinical professionals – usually nurses – are on standby to monitor vital signs of patients and intervene before things get bad, then cardiac arrests could be prevented, and survival chances increased.
It seemed like a sensible and simple idea. Single-centre before-and-after studies showed good outcomes. So, rapid response teams were rolled out across hospitals in the United States (Prasad et al, 2015).
A 2005 Australian randomised trial put the idea to the test. Twenty-three hospitals randomised the adoption of rapid response teams. The trial found that there were no discernible differences between the treatment and control groups in terms of “the number of cardiac arrests, unplanned ICU admissions, and unplanned deaths” (Prasad et al, 2015).
Interestingly, however, both treatment and control groups saw a reduction in cardiac arrests over the course of the experiment. So it seems that rapid response teams do not work, but that the medical system is steadily getting better at treating these patients (Prasad et al, 2015).
As with cardiac arrests, many people can now recognise the signs of a stroke. One morning, an American woman by the name of Anita Kramer woke up and could not move her left arm. Kramer called 911 and during her evaluation doctors discovered she had a narrowing in a major blood vessel in her brain. She had an intracranial stent inserted. Six days later, a second stroke left her more disabled than the first (Prasad et al, 2015).
Stenting was first approved by the US Food and Drug Administration in 2005, following a promising study that did not use a control group. The stroke rate was better than expected, so the procedure was approved. Thousands of patients received stents.
Six years later, in 2011, the New England Journal of Medicine published the results of a randomised trial. It found that those who got a stent were more than twice as likely to have a stroke in the next month than patients in the control group (who were assigned to medical therapy). Five patients in the treatment group died, compared to one in the control group. The results were so dramatic that the study was terminated early (Chimowitz et al 2011; see also Kolata, 2011).
Randomised trials have also shed light on how best to manage patients’ blood sugar levels. In 2002, a preliminary, unblinded, single-centre randomised trial found that using insulin to reduce blood sugar levels could improve patient survival in a surgical intensive care unit. Professional societies embraced the findings and endorsed strict blood-sugar targets, and hospitals implemented practices to meet and enforce the standard.
Then, in 2009, a large twenty-four site randomised trial with a sample of over 6,000 patients tested these intensive targets against a less restrictive strategy. It found that the intensive intervention increased death by 2.6 percentage points at 90 days. In practice, that meant that for roughly every 40 patients who received the intensive strategy, one additional patient died. Strict blood sugar targets sounded good in theory, but were harmful in practice. The result was surprising, and hospital practice changed accordingly.
Like many of you, I’ve had an appendectomy. In my own case, it was the most pleasant surgical operation I’ve ever had. I went under anaesthetic with a pain in my abdomen, and woke up pain-free.
Since the 1800s, an appendectomy has been the standard treatment for appendicitis. It’s the most common emergency surgical procedure in Australia, with around 30,000 appendectomies performed annually. Once upon a time, it was thought that going straight to surgery was the best option. Then four randomised trials compared surgery with a second intervention: antibiotics as the first option, with surgery reserved for those whose symptoms got worse. In the antibiotic arm, sixty-five percent of patients never got the surgery. To my surprise, the studies found that the rate of life-threatening outcomes and time in hospital were the same for the two groups. Moreover, antibiotics are less expensive and less invasive (Prasad et al, 2015). No-one has to cover up their antibiotic scar.
Knee and back pain are common conditions, and randomised trials have provided surprising insights into what works. In the case of treating osteoarthritis of the knee, it was previously common practice for surgeons to either clean out the knee using arthroscopic devices, or to inject saline into the knee to irrigate it, in a procedure known as ‘lavage’. In 2002, the results of a randomised study showed that those who had either of these procedures did no better than people who had sham surgery (Carroll, 2014).
Randomised trials have also shown that arthroscopic knee surgery for meniscus tears was no better in reducing pain or increasing functioning than physical therapy. Another trial pitted arthroscopic knee surgery for meniscus tears against sham surgery. Much to the surprise of many surgeons, sham surgery did equally well (Prasad et al, 2015).
Another popular surgery for a time involved injecting medical grade cement into osteoporotic fractures of the spine to treat chronic back pain. Alas, a randomised trial showed that it was ineffective (Prasad et al, 2015).
Randomisation has been used to test the effectiveness of countless other interventions. Along the way, randomised trials have disproved many clinical hypotheses.
Starting in the 1880s, US surgeon William Halsted pioneered the approach of treating women with breast cancer with a “radical mastectomy” where huge sections of their tissue were carved out in the hope of removing the cancer. Halsted convinced others not by the strength of his data, which was less than solid, but through his persuasive rhetoric and charismatic personality. His extreme self-assurance, possibly intensified by his addiction to cocaine, allowed him to dismiss his critics, criticising them for their lack of courage.
It was not until 1981 that a randomised trial in Canada and the United States showed that the “radical mastectomy” did not reduce mortality when compared to a lumpectomy, which involved removing only the cancerous tissue. From the 1880s to the 1980s, half a million women underwent the harm of disfiguring surgeries that did nothing to increase their odds of survival (Leigh 2024).
You don’t need to face major surgery or a life-threatening condition to be the beneficiary of insights from randomised trials.
Tamiflu is a widely used medication to treat influenza. In a 2014 Cochrane meta-analysis, it was found to only decrease flu symptoms from seven days to six days, while at the same time causing nausea and vomiting. It did not prevent virus transmission or reduce hospitalizations. There was no evidence that it reduced deaths. Over the past two decades, governments have spent around US$7 billion on Tamiflu stockpiles (Po, Farndon and Palmer 2009). The Internet coined the term “Scamiflu” to describe the whole affair (Prasad et al, 2015).
In complementary medicine, randomised trials have shown many widely used interventions to be ineffective. In a review of ten trials, glucosamine and chondroitin for joint health were found to have no effect on joint pain. Echinacea turned out not to reduce the duration of the common cold. Acupuncture, when tested against sham acupuncture, has been found ineffective in reducing pain. Multivitamin randomised trials have found no benefit on survival, heart disease, or cancer (Prasad et al, 2015).
In Praise of Surprises
Not everyone likes surprises. But there are five reasons why it can be beneficial for medical researchers to encounter surprising results.
First, surprises promote scientific inquiry. Surprising results often indicate that there is something about the current understanding that may be incomplete or incorrect. This prompts further investigation, leading to a deeper and more accurate understanding of the subject matter. This can include identifying new risk factors for diseases, uncovering unforeseen side effects of medications, or discovering novel therapeutic targets.
Second, surprises encourage open-mindedness. Being surprised by results reminds researchers to not become too entrenched in prevailing theories or expectations. This mindset is crucial for scientific progress, as it allows for the consideration of alternative explanations and hypotheses.
Third, surprises can improve research design. Unexpected outcomes can highlight flaws or limitations in the design of a study, leading researchers to develop more robust methodologies. This can improve the quality of future research, making findings more reliable and generalisable.
Fourth, surprises can improve problem-solving skills. Dealing with unexpected results requires creativity and critical thinking, as researchers must figure out why their results deviated from expectations. This can enhance problem-solving skills and lead to innovative solutions.
Fifth, surprises can stimulate outside interest. Surprising findings can attract attention from the scientific community, the public, and funding bodies. Increased interest can lead to additional resources being allocated to explore the new findings further, potentially accelerating scientific progress in that area. They remind us that science and medicine are never perfect, and can always be improved.
Randomised Trials Expand
The number of clinical trials has increased over time, along with the number of published authors in the field. More countries and institutions are involved in publishing results, too (Vinkers et al, 2021). Randomised clinical trials have been embraced across the world.
A 2017 analysis of cumulative randomised trials published in high-ranked medical journals shows that studies have been published by lead authors in 153 countries. Although most are in the United States, Australia ranks in the top ten (Catalá-López et al, 2020).
You know this all too well here at the Murdoch Children’s Research Institute, where randomised clinical trials are used regularly to figure out what works for children’s health.
The twenty-first century is one of improving outcomes through evidence-based medicine, rather than “eminence-based medicine”.
Hundreds of years ago, medicine relied merely on the hope of a good idea to find a cure and change the trajectory of a life.
Twenty-first century evidence-based medicine relies on hope and rigour in the form of the randomised trial to work out how we can improve outcomes.
Randomised Policy Trials
Across the social sciences, researchers have been using the same idea to test the effectiveness of social policy ideas. In the United States, the Perry Preschool early childhood randomised experiment began in 1962. The Negative Income Tax Experiments began in 1968. The RAND Health Insurance Experiments began in 1971. The Project STAR class size experiment began in 1985. The Moving to Opportunity housing experiment began in 1994 (see also Chelwa et al, 2019).
In 2019, Abhijit Banerjee, Esther Duflo and Michael Kremer received the Nobel Prize in Economics for their work applying the method in development economics. In her Nobel Prize Lecture, Esther Duflo discussed the example of randomised trials of microcredit programs (Duflo 2020). These randomised trials demonstrated only modest impacts of microcredit on business formation – largely driven by existing business owners starting more businesses, rather than new households becoming entrepreneurs. Contrary to the claims of microfinance advocates, the randomised trials showed no evidence that microcredit affected spending on education or health, or made women feel more empowered. To those who had claimed that microcredit would bring about a radical transformation, this was an unexpected finding.
In Australia, randomised trials have shown that drug courts are a cost-effective way of reducing recidivism, that intensive caseworker support for long-term homeless people does not increase short-term employment rates, and that nurse home visit programs for pregnant women experiencing adversity have a positive impact, but are not cost-effective (Leigh 2018; Mudiyanselage et al 2021).
Some of these findings may surprise you, and that’s a good thing.
To expand the quantity of randomised policy trials, the Australian Government last year created the Australian Centre for Evaluation. The Australian Centre for Evaluation was established to help put evaluation evidence at the heart of policy design and decision-making. We seek to improve the volume, quality, and use of evaluation evidence to support better policy and programs that improve the lives of Australians.
Based in Treasury, the Australian Centre for Evaluation works with agencies across the Commonwealth Government to design and implement trials to answer challenging social and economic policy questions.
The Centre’s first trials have been developed in collaboration with the Department of Employment and Workplace Relations. They are seeking to understand what works to help people find jobs.
The Centre’s ambition goes beyond individual trials, to embedding good evaluation principles and practices across government and fostering an evaluative culture that supports continuous learning about what works, why, and for whom.
We do it because we believe that all Australians should have the best chance in life. We believe that our social and economic programs, just like those in medicine, should change and evolve over time. Effective interventions should replace ineffective ones.
How can we improve outcomes if we don’t know what works?
Australians should have effective supports when they are faced with life’s challenges, whether it be unemployment or illness. For that, we need to know which programs work and for whom.
It requires us to ask hard questions of our own ideas – putting our favourite programs under the microscope and submitting them to rigorous evaluation.
It is not an easy thing to do. But it is especially vital so that we can serve the most vulnerable in our communities, who often rely on government services the most.
We can anticipate important results for children, as researchers at the University of Melbourne and this Institute know well. In 2011, I spoke in Parliament about Australia's first randomised trial of an early childhood program – the Early Years Education Program (Leigh, 2011). The program ran from 2011 to 2016.
If randomly assigned to the treatment group, children were offered three years of tailored education and care. In the control group, children “received ‘usual care’, a mix of parental and guardian care as well as care and education provided by other childcare centres or kindergartens” (Murray, 2024).
Children in the treatment group had IQ scores that were up to seven points higher than those in the control group, showed improved resilience, and were less likely than those in the control group to require clinical attention for social-emotional development (Murray, 2024).
Rigorous evaluation can help children from a disadvantaged background start school on an even level with their peers.
Policy Surprises are Healthy Too
A couple of years ago, I served as a Commissioner on the Global Commission on Evidence to Address Societal Challenges (Global Commission on Evidence to Address Societal Challenges 2022). Our report led to three implementation priorities.
‘First, formalise and strengthen domestic evidence-support systems → conduct or participate in a rapid evidence-support system assessment for your country and find ways to act on the lessons learned if one has already been conducted.
‘Second, enhance and leverage the global evidence architecture → encourage funders and donors – both in your own country and those operating globally – to be part of the solution and encourage impact-oriented evidence producers – especially those producing global public goods like living evidence syntheses – to work in more coordinated ways and to build connections to domestic evidence-support networks and units.
‘Third, put evidence at the centre of everyday life → support citizen-serving non-government organisations and citizen leaders to take action in your country.’
The point is that rigorous evaluation is a mindset for which we can all be ambassadors. Each of us can ask the question: how are we collectively using rigorous methods to answer the most important questions in public policy?
We know that there are many talented individuals trained in randomised methods across Australia. But we still need to take more opportunities to test big ideas rigorously and develop the evidence base which improves outcomes.
The Australian Centre for Evaluation is creating a platform in government for this, collaborating across the federal government.
In the United Kingdom, a network of twelve ‘What Works Centres’ serve to conduct randomised trials and distil evidence on areas ranging from school education to improving life for older people. The What Works Centres are funded by a combination of government and non-government sources including the Economic and Social Research Council (their equivalent of the Australian Research Council), the Dormant Assets Scheme, and the National Lottery Community Fund. At some point down the track, it is possible to imagine something similar evolving in Australia. But for now, the Australian Centre for Evaluation is the prime agency for conducting randomised policy trials in Australia.
Just as ethics review is at the heart of randomised medical trials, so too randomised policy trials conducted by the Australian Centre for Evaluation will be carried out within a rigorous ethical framework. Building trust is vital as we work to expand the quality and quantity of policy evaluation across Australia. It is vital that evaluations are conducted ethically, carefully and transparently.
Alongside the production of evidence, the Australian Centre for Evaluation is working to improve the use of evidence – encouraging the use of high-quality evaluations over low-quality evaluations, and meta-analyses over single studies.
Thank you for all that you do to improve the evidence base on what works. In policy, as in medicine, it is healthy to be surprised, and it helps shape a better world.
*My thanks to Frances Kitt for valuable drafting assistance, and Jon Baron, Julian Elliott, Harriet Hiscock and Frank Oberklaid for providing several of the examples used in this speech.
References
Carroll, Aaron (2014), ‘The Placebo Effect Doesn’t Apply Just to Pills’, The Upshot, New York Times, 6 October.
Catalá-López, F., Aleixandre-Benavent, R., Caulley, L., Hutton, B., Tabarés-Seisdedos, R., Moher, D., & Alonso-Arroyo, A. (2020). Global mapping of randomised trials related articles published in high-impact-factor medical journals: A cross-sectional analysis. Trials, 21(1), 34.
Chelwa, G., Hoffmann, N. and Muller, S.M. (2019) How randomised trials became big in development economics, The Conversation. 9 December.
Chimowitz, M.I., Lynn, M.J., Derdeyn, C.P., Turan, T.N., Fiorella, D., Lane, B.F., Janis, L.S., Lutsep, H.L., Barnwell, S.L., Waters, M.F. and Hoh, B.L., 2011. Stenting versus aggressive medical therapy for intracranial arterial stenosis. New England Journal of Medicine, 365(11), pp.993-1003.
Choo K. H. A. (2003). David M. Danks, M.D., A.O. (June 4, 1931–July 8, 2003): Founder, Murdoch Childrens Research Institute. American Journal of Human Genetics, 73(5), 981–985.
Dattani, S. (2022) Why randomized controlled trials matter and the procedures that strengthen them, Our World in Data. Available at: https://ourworldindata.org/randomized-controlled-trials.
Duflo, E., 2020. Field experiments and the practice of policy. American Economic Review, 110(7), pp.1952-1973.
Global Commission on Evidence to Address Societal Challenges. 2022. The Evidence Commission report: A wake-up call and path forward for decisionmakers, evidence intermediaries, and impact-oriented evidence producers. Hamilton: McMaster Health Forum.
Kolata, Gina (2011). ‘Study Is Ended As a Stent Fails To Stop Strokes’, New York Times, 8 September, p.A1.
Leigh, Andrew (2011). ‘Australia’s First Early Childhood Randomised Trial’, House of Representatives Hansard, 15 September.
Leigh, A. (2018) Randomistas: How Radical Researchers Changed Our World. Black Inc, Melbourne.
Leigh, A., 2024. The Best Medicine: Lessons From Health for Policy Randomistas. Evaluation Journal of Australasia, 24(1), pp.6-13.
Mudiyanselage, S.B., Price, A.M., Mensah, F.K., Bryson, H.E., Perlen, S., Orsini, F., Hiscock, H., Dakin, P., Harris, D., Noble, K. and Bruce, T., 2021. Economic evaluation of an Australian nurse home visiting programme: a randomised trial at 3 years. BMJ Open, 11(12), p.e052156.
Murray, A. (2024) Early years education program: 24 month results released, Faculty of Business and Economics. Available at: https://fbe.unimelb.edu.au/newsroom/early-years-education-program
Po, A.L.W., Farndon, P. and Palmer, N., 2009. Maximizing the value of drug stockpiles for pandemic influenza. Emerging Infectious Diseases, 15(10), p.1686.
Prasad, V.K. and Cifu, A.S. (2019) Ending Medical Reversal: Improving Outcomes, Saving Lives. Baltimore: Johns Hopkins University Press.
Vinkers CH, Lamberink HJ, Tijdink JK, Heus P, Bouter L, Glasziou P, et al. (2021) The methodological quality of 176,620 randomized controlled trials published between 1966 and 2018 reveals a positive trend but also an urgent need for improvement. PLoS Biology 19(4): e3001162.