Why RCTs Are Not a Promising Tool for Development
He was a bold man that first ate an oyster.
— Jonathan Swift
If there has been a “next big thing” in the field of economics during the past decade, it is the application of techniques from medical research (specifically, “randomized controlled trials,” or RCTs) to assess the effectiveness of development or other government-initiated projects.
The general thrust of the work is as simple as it is seemingly brilliant: rather than employ complex, and often unreliable, econometric methods to tease out the extent to which a project or policy actually had a beneficial impact on intended beneficiaries, why not follow the tried and true methods employed by pharmaceutical researchers to assess the efficacy of medical treatments? The steps are:
- Randomly divide the experimental population into a treatment group that receives “benefits” from the program, and a control group that doesn’t.
- Assess outcomes for both groups.
- Determine whether or not a significant difference exists between the two groups.
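The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any actual trial is analyzed: the population, the `hypothetical_outcome` function, and its built-in effect of 5 units are all invented for the example, and a simple permutation test stands in for whatever significance test a real study would use.

```python
import random


def mean(values):
    return sum(values) / len(values)


def run_trial(population, outcome_fn, n_permutations=2000, seed=0):
    """Run the three steps: randomize, measure outcomes, test the difference."""
    rng = random.Random(seed)

    # Step 1: randomly divide the population into treatment and control.
    units = list(population)
    rng.shuffle(units)
    half = len(units) // 2
    treated, control = units[:half], units[half:]

    # Step 2: assess outcomes for both groups.
    treated_outcomes = [outcome_fn(u, True, rng) for u in treated]
    control_outcomes = [outcome_fn(u, False, rng) for u in control]
    observed = mean(treated_outcomes) - mean(control_outcomes)

    # Step 3: permutation test. Reshuffle the group labels many times and
    # count how often chance alone gives a gap at least as large as observed.
    pooled = treated_outcomes + control_outcomes
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = mean(pooled[:half]) - mean(pooled[half:])
        if abs(gap) >= abs(observed):
            at_least_as_extreme += 1
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


def hypothetical_outcome(baseline, is_treated, rng):
    # Assumed data-generating process (invented for this sketch):
    # the program adds 5 units to the outcome, plus random noise.
    return baseline + (5.0 if is_treated else 0.0) + rng.gauss(0, 1)


# A made-up population of 200 people with baseline outcomes from 50 to 59.
population = [50.0 + (i % 10) for i in range(200)]
effect, p_value = run_trial(population, hypothetical_outcome)
print(f"estimated effect: {effect:.2f}, p-value: {p_value:.4f}")
```

A small p-value here says only that the gap between the two groups in this one simulated population is unlikely to be due to chance.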
This approach, championed by MIT economists Esther Duflo and Abhijit Banerjee (recent recipients of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, also known as the Nobel Prize in economics, and the coauthors of the book Poor Economics) among others, seems a self-evidently fabulous improvement on the status quo in development aid, which not infrequently in the past involved assessing program effectiveness by simply checking whether or not the money was spent. Clearly, a world in which aid money is allocated to projects that actually benefit people is preferable to one in which money goes to whoever most effectively maneuvers to get the money to start with and then most reliably manages to spend it. In this way, the use of RCTs in development arguably advances the goal of aid effectiveness, famously championed by NYU economist William Easterly in a series of books and a popular blog titled Aid Watch whose slogan is “Just Asking that Aid Benefit the Poor.”
Describe your favorite entrepreneurial initiative to one of the many “development” economists influenced by this line of thinking and the most probable response you’ll receive is: “Sounds wonderful, but where’s the evidence of effectiveness?” This question will be followed by this admonition: “In order to find out whether or not it worked, you really should do an RCT.”
So what is the problem with applying RCTs to development?
Aside from their expense (administering an RCT generally costs upwards of $100,000), the Achilles heel of RCTs is a little thing known to the statistically inclined as “external validity” — a phrase that translates informally to “Who cares?”
The concept of external validity is straightforward. For any assessment, “internal validity” refers to how soundly the trial was conducted and how reliable its results are in the original setting. A professionally conducted RCT that yields a high level of statistical significance is said to be “internally valid.” However, it is fairly obvious that an intervention rigorously proven to work in one setting may or may not work in another setting. This second criterion, the extent to which results apply outside the original research setting, is known as “external validity.” External validity may be low because the populations in the original and the new research setting are not really comparable; for example, results of a clinical trial conducted on adults may not apply to children. But external validity may also be low because the environment in the new study setting is different in some fundamental way, not accounted for by the researcher, from the original study setting. Econometric studies that seek to draw conclusions about effectiveness from data that span large geographical areas or highly varied populations, by contrast, typically have lower levels of internal validity but higher levels of external validity.
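A toy simulation can make the distinction concrete. The sketch below is purely illustrative: `context_multiplier` is a hypothetical stand-in for some feature of the environment that the researcher has not accounted for, and the effect size of 2.0 is invented. The same randomized design is run in two settings, and each run is internally valid, yet the estimate from the first setting is a poor guide to the second.

```python
import random


def estimate_effect(context_multiplier, n=2000, seed=1):
    """Simulate one randomized trial and return the estimated treatment effect.

    context_multiplier is a hypothetical, unobserved feature of the setting
    that scales how well the intervention works there.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        gets_treatment = rng.random() < 0.5  # coin-flip assignment
        outcome = rng.gauss(0, 1)            # background variation
        if gets_treatment:
            # Assumed effect: 2.0 units, scaled by the local context.
            outcome += 2.0 * context_multiplier
            treated.append(outcome)
        else:
            control.append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)


# Original setting: the context fully supports the intervention.
original_setting = estimate_effect(context_multiplier=1.0, seed=1)
# New setting: the context differs, and the intervention does nothing.
new_setting = estimate_effect(context_multiplier=0.0, seed=2)

print(f"effect estimated in the original setting: {original_setting:.2f}")
print(f"effect estimated in the new setting:      {new_setting:.2f}")
```

Nothing in the data from the first trial reveals the value of the multiplier elsewhere, which is the sense in which a rigorous result can still fail to travel.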
The fundamental issue is not the purity of the methodology employed (as exciting as such methodological purity is to the technically inclined) but rather the inherent complexity of the world being studied.
As (actual development economist) Ricardo Hausmann states it:
Another method that loses its appeal in a world of high dimensionality is the randomized trial approach. A typical program, whether a conditional cash transfer, a micro-finance program or a health intervention can easily have 15 relevant dimensions. Assume that each dimension can only take 2 values. Then the possible combinations are 2^15 or 32,768 possible combinations. But randomized trials can only distinguish between a control group and 1 to 3 treatment groups.
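The arithmetic in the quotation above is easy to check. The short Python sketch below takes the quoted figures as given (15 dimensions, 2 values each, and at most one control group plus three treatment groups); the variable names are illustrative and not drawn from any source.

```python
from itertools import product

dimensions = 15           # relevant design dimensions, per the quotation
values_per_dimension = 2  # each dimension takes one of two values

# Number of distinct program designs: 2^15.
design_variants = values_per_dimension ** dimensions

# A trial with one control group and up to three treatment groups
# compares at most four designs at a time.
max_arms = 1 + 3
share_covered = max_arms / design_variants

# Sanity check of the counting rule on a small case that can be enumerated.
small_case = list(product([0, 1], repeat=3))
assert len(small_case) == 2 ** 3

print(f"possible designs: {design_variants}")
print(f"share a single trial can compare: {share_covered:.4%}")
```

Under these assumptions a single trial compares roughly one design in every eight thousand, which is the gap the quotation points to.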
Or as Don Berwick states in the context of public health interventions:
How can accumulating local reports of effectiveness of improvement interventions, such as rapid response systems, be reconciled with contrary findings from formal trials with their own varying imperfections? The reasons for this apparent gap between science and experience lie deep in epistemology. The introduction of rapid response systems in hospitals is a complex, multicomponent intervention, essentially a process of social change. The effectiveness of these systems is sensitive to an array of influences: leadership, changing environments, details of implementation, organizational history, and much more. In such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect. (Emphasis added)
RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not “what works,” but “why things work”.
Those who most vociferously and naïvely advocate that we apply techniques from medical research to economics make a fundamental error: They fail to appreciate the fact that, when it comes to external validity, medical research is the exception that proves the rule. Indeed, in aid-led development in general, of the few real historical successes, nearly all are in public health. Outside of public health, few of the large-scale, top-down development programs have in fact succeeded.
Why is this? Multiple conjectures are possible. But one persuasive one is this: when it comes to biophysical function, people are people. For this reason, a carefully developed medical protocol proven to be effective for one population is highly likely to work for another population. The smallpox vaccine tested on one population tended to work on other populations; this made it possible to eradicate smallpox. Oral rehydration therapy tested on one group of children tended to work on other groups of children; millions of children have been spared preventable deaths because the technique has been adopted on a global basis. Indeed, medical protocols have such a high level of external validity that, in the United States alone, tens if not hundreds of thousands of lives could be saved every year through a more determined focus on adherence to their particulars.
These huge successes were achieved, and continue to be achievable, through bold action taken by public health officials. They are rightly celebrated and encouraged, but, outside of other public health applications, not easily replicated. Successes in medicine contrast sharply with failures in other domains. Decades of efforts to design and deploy improved cook-stoves (with the linked aims of reducing both deforestation and the illness and death due to indoor air pollution) have so far primarily yielded an accumulation of Western inventions maladapted to needs and realities in various parts of the world, along with locally developed innovations that cannot be expanded to meet the true scale of the challenge. For development programs in general, and RCTs in particular, public health is the exception that proves the rule.
What does work in areas outside of public health? How is it possible to design, test, and implement effective solutions in environments where complexity and volatility are dominant?
The general principle applies: Success requires adaptability and flexibility as well as structure, that is, a societal capacity to scale successful efforts combined with an engrained practice of entrepreneurial exploration. As the uniquely insightful Mancur Olson wrote in his classic Power and Prosperity:
Because uncertainties are so pervasive and unfathomable, the most dynamic and prosperous societies are those that try many, many things. They are societies with countless thousands of entrepreneurs who have relatively good access to credit and venture capital.
What works in development, according to Olson, is experimentation. Why? Because we don’t know what works.
To relieve the tedium created by repeated uses of the words “validity” and “economics,” let’s talk about food.
Now (bear with me for a moment here!) let’s say you’re back in the historical paradise of planning, namely the Soviet Union. Every day, you have to eat the same glorious institutional food provided in your communal cafeteria.
You’re sick of it, but you can’t find a way out.
Then one day, a leader arrives, with a banner that reads “Just asking that the food not suck!” You cheer! You hoist your comrade on your shoulders! At last, you are fighting back against the system. The battle for better cafeteria food is on!!
But what is the opportunity that is missed here? What is the thing you really need, that you’re not going to get from the “Just Asking that Aid Benefit the P…”, I mean, the “Just Asking that the Food not Suck” campaign?
What you’re not getting, and what you really need, is some new restaurants. Yes, that would be just the thing. Some options. You would like another place to go to eat.
There is a general rule here: What really drives change isn’t protest, but genuine competition driven by entrepreneurial entry. Think about food in airports 25 years ago, if you were alive then. All Sodexo monopoly. Uniformly terrible and expensive. Now, with entry and competition for licenses, the food in the airport is at least as good as what you get outside the airport.
To close the loop on the above discussion of assessment methodology, it is worth noting that development driven by entrepreneurship (also known as “development”) consists of randomized out-of-control trials. That would be (yes!) the opposite of randomized controlled trials.
Again, why does this matter? Stop and think. In what U.S. industry do clinical trials dominate? That would be pharmaceuticals. And in what industry are markups higher, and barriers to entry greater, than they are in the pharmaceuticals industry? The answer to that question is, of course, no other industry. When it comes to persistent oligopoly, the pharmaceutical industry beats them all.
A very big part of the reason for this is that large-scale clinical trials are expensive. But you can’t sell a drug without them. (For mostly good reasons, I might add, in the case of medicines.) So even successful biotech companies have had great difficulty breaking into the business of conducting their own clinical trials; instead they often partner with “Big Pharma” on the last mile of drug development.
Now I’m not saying we should abolish the Food and Drug Administration. I’m just asking this question: Is the increasingly widespread use of RCTs a move in the direction of a Food and Drug Administration for development, if not in a hardwired, institutional sense then instead in the sense of customs, standards, and expectations? Will “higher standards of evidence” not only distort resource allocation (if outcomes are improperly defined) but also create barriers to entry? Won’t this favor incumbents, or outside consultants flown in to do the work? Might not all of these “secondary” effects more than outweigh any benefit gained from “better” standards of evaluation?
Furthermore, might we not do better by studying the work of those exceptional entrepreneurs who do a particularly remarkable job in creating social value, and putting our resources into supporting the nascent efforts of others like them, using an approach to evaluation that is actually appropriate to entrepreneurship?
Instead of putting our faith in randomized controlled trials whose beneficial impacts are uncertain, shouldn’t we bet on the process of randomized out-of-control trials (a.k.a. entrepreneurship and innovation) that has been the very definition of development and growth pretty much everywhere in the world for five centuries?
A set of three principles applies:
- If solutions are known, people need money. When external validity is high and the potential for future gains is large, the most significant constraint is inadequacy of investments, easily proxied by cash dollars.
- If solutions are knowable, people need assessment. This applies in settings where effective general solutions have not yet been discovered but are in principle knowable, owing to stability within, and similarity among, the environments in which the particular problem presents itself.
- If solutions are evolving, people need entrepreneurship. Everywhere else, where complexity and volatility dominate and creative adaptation is a requirement, progress will be attainable through entrepreneurship.
Very good cooks may be able to profit from their skill by, for example, charging for the meals that they prepare. (Ergo, restaurants.) Alternately, they may painstakingly document their techniques so that others can duplicate their skill. (Ergo, recipes.) Yet, even when a recipe is well spelled out, dimensions of discretion that inevitably fall to the cook can lead to very different outcomes. As Joseph Schumpeter observed a century ago in The Theory of Economic Development,
The necessity of making decisions occurs in any work. No cobbler’s apprentice can repair a shoe without making some resolutions and without deciding independently some question, however small.
In economic life, managers are cooks. Some are better, some are worse. Good ones can make money from their skill. Bad ones botch even the easiest recipes. Variance among managers makes it difficult to sort out, when assessing a project (as when eating a meal), whether any shortcomings experienced are due to the chef’s lack of skill or to the recipe employed.
The remarkable persistence of markets for cookbooks suggests that nourishment depends on much more than ensuring that existing recipes are properly prepared — as important as that can be. It also, and perhaps more fundamentally, depends on the creation of new recipes.
Or, to get past the metaphor: sustained prosperity depends on more than capable managers. It depends on more than blueprints, manuals, and franchises. It depends on more than projects with goals, targets, and timetables.
It is obviously the case that society can subsist without both culinary and economic innovators. Indeed, depending on one’s approach to measuring human well-being, the sum total of organizational and technological advance since the time of the first writing — and the first recipes — has at some level been unnecessary. Humanity certainly lived in greater harmony with nature 6,000 years ago than we do today. Antibiotics, air travel, and modern methods of agriculture may be delightful along some dimensions, but they have, each in their own way, also contributed to increasing humanity’s vulnerability to climatic, viral, and ecological calamities.
Yet the fact of the matter is that, six millennia following the advent of writing, humans have more options on the menu than just Meat Assyrian Style and Sumerian Beer.
Human progress in the long run — by which I mean sustained and sustainable prosperity — is fundamentally not about routines, but about novelty. It depends on invention in the face of change. It depends on creativity with limited resources.
It depends on entrepreneurship.
Parts of this post were adapted from The Coming Prosperity (Oxford University Press, 2012).
Also recommended: Sanjay Reddy, “Economics’ Biggest Success Story Is a Cautionary Tale.”