Source type: Report
Document type: Report
Do randomized controlled trials meet the ‘gold standard’?
Alan Ginsburg; Marshall S. Smith
Publication date: 2016-03-15
Publication year: 2016
Language: English
Abstract:

Key Points

The What Works Clearinghouse (WWC) identifies studies that provide credible and reliable evidence of a given intervention’s effectiveness. Yet there is a high potential for serious estimation bias when a Randomized Control Trial (RCT) involves a complex intervention, such as the implementation of an education curriculum. Data from an analysis of the usefulness of the 27 RCT mathematics studies meeting minimum WWC standards show that 26 of the 27 RCTs have multiple serious threats to their usefulness. To address these threats, the Institute of Education Sciences should review analyses of the RCTs, examine other curriculum studies and RCTs in the WWC, improve evaluations of education materials and practices, and support a study by a panel of unbiased experts and users to examine the quality of RCT studies in noneducation sectors.

Executive Summary

The What Works Clearinghouse (WWC), which resides in the Institute of Education Sciences (IES), identifies studies that provide credible and reliable evidence of the effectiveness of a given intervention. The WWC gives its highest rating of confidence only to well-implemented Randomized Control Trial (RCT) designs. RCTs are clearly the “gold standard” for minimizing bias in outcomes caused by differences in unmeasured characteristics between treatment and comparison populations. Yet when the treatment is a complex intervention, such as the implementation of an education curriculum, there is a high potential for other sources of serious estimation bias. Our analysis of the usefulness of each of the 27 RCT mathematics studies (grades 1–12) meeting minimum WWC standards identifies 12 nonselection bias threats, many of which were identified in a 2004 National Research Council (NRC) report.
These nonselection bias threats are not neutralized by randomization of students between the intervention and comparison groups, and when present, studies yield unreliable and biased outcomes inconsistent with the “gold standard” designation. Threats to the usefulness of RCTs include:

- In 12 of the 27 RCT studies (44 percent), the authors had an association with the curriculum’s developer.
- In 23 of 27 studies (85 percent), implementation fidelity was threatened because the RCT occurred in the first year of curriculum implementation. The NRC study warns that it may take up to three years to implement a substantially different curricular change.
- In 15 of 27 studies (56 percent), the comparison curricula are either never identified or outcomes are reported for two or more comparison curricula combined. Without understanding the comparison’s characteristics, we cannot interpret the intervention’s effectiveness.
- In eight of nine studies for which the total time of the intervention was available, the treatment time differed substantially from that for the comparison group. In these studies we cannot separate the effects of the intervention curriculum from the effects of the differences in the time spent by the treatment and control groups.
- In 19 of 20 studies, a curriculum covering two or more grades does not have a longitudinal cohort and so cannot measure cumulative effects across grades.
- In 5 of 27 studies (19 percent), the assessment was designed by the curriculum’s developer and is likely aligned in favor of the treatment.
- In 19 of 27 studies (70 percent), the RCTs were carried out on outdated curricula.

Moreover, the magnitude of the error generated by even a single threat is frequently greater than the average effect size of an RCT treatment. Overall, the data show that 26 of the 27 RCTs in the WWC have multiple serious threats to their usefulness. One RCT has only a single threat, but we consider it serious.
We conclude that none of the RCTs provide sufficiently useful information for consumers wishing to make informed judgments about which mathematics curriculum to purchase. As a result of our findings, we make five recommendations. Note that all reports stemming from the five recommendations should be made public.

Recommendation 1: The IES should review our analyses of the 27 mathematics curriculum RCTs and remove those that, in its view, do not provide useful information for WWC users. The IES should make its judgments and rationale public.

Recommendation 2: The IES should examine the other curriculum studies and curriculum RCTs in the WWC. The review should be based on the same criteria as in recommendation 1, and the IES should remove those studies that, in its view, do not provide useful information.

Recommendation 3: The IES should review a representative sample of all the other noncurricula RCT intervention studies in the WWC. The review should use the same criteria and standards as in recommendations 1 and 2. Studies that do not meet the standards established for the reviews of the curriculum studies should be removed from the WWC.

Recommendation 4: Evaluations of education materials and practices should be improved. First, the IES should create an internal expert panel of evaluators, curriculum experts, and users (for example, teachers and administrators) to consider how, in the short term, to improve the current WWC criteria and standards for reviewing RCTs in education. Second, the IES and the Office of Management and Budget (OMB) should support an ongoing, five-year panel of experts at the NRC or the National Academy of Education to consider what would be an effective evaluation and improvement system for educational materials and practices in the future. It should also consider how this system might be developed and supported and what the appropriate role of the federal government should be in designing, creating, and administering this system.
Recommendation 5: The OMB should support a three-year study by a panel of unbiased experts and users convened by the NRC to look at the quality of RCT studies in noneducation sectors. We see no reason to expect that RCTs funded by the Labor Department, HUD, Human Services, Transportation, or USAID would be immune from many of the flaws we find in the mathematics curriculum RCTs in the WWC.

Introduction

The What Works Clearinghouse (WWC), instituted in 2002 as part of the Institute of Education Sciences (IES) within the US Department of Education, describes its mission thus: “The goal of the WWC is to be a resource for informed education decision-making. To reach this goal, the WWC reports on studies that provide credible and reliable evidence of the effectiveness of a given practice, program, or policy (referred to as ‘interventions’).”[1] The purpose of our review is to determine how useful randomized controlled trials (RCTs) in the WWC might be in helping teachers and school administrators make accurate, informed decisions about their choice of mathematics curricula. The WWC compiles high-quality evidence on curricula’s effectiveness and makes it available online, but for that evidence to be useful, it must present an accurate picture of each curriculum’s effectiveness.[2] To analyze this, we examine all intervention studies of mathematics curricula in elementary, middle, and high schools using RCT methodology that were reported on the WWC website on December 1, 2014.

We have no quarrel with the powerful logic and overall potential of the methodology of RCTs. A well-implemented RCT is an important tool for finding an unbiased estimate of the causal effect of a deliberate intervention. RCTs have been held in high esteem and actively used since the 1920s, when R. A.
Fisher used controlled experiments to improve farming, and they remain in wide use today, as the National Institutes of Health uses RCTs to evaluate the effectiveness of drugs and medical procedures.[3] Experimental psychologists and scientists also make good use of RCTs, which work especially well in highly controlled settings where the character of the intervention and the control groups are very clear. Over the past two decades, many organizations and US government officials have touted RCTs as the best way to produce serious evidence of the effects of various social and educational interventions, labeling RCTs as the “gold standard” for evaluating government programs.[4]

Yet not every statistician and scholar has unqualified faith in RCTs. The first concern is that while a single well-done RCT has internal validity, it is carried out with a particular intervention, for a particular sample, at a particular time, and in a particular place, and it produces a valid estimate of the intervention’s effect only in that setting. Therefore, most single RCTs do not have external validity.[5] To address this issue, William Shadish, Thomas Cook, and Donald Campbell propose that a meta-analysis of multiple trials could help establish external validity.[6] Others argue that the strength of the instructional design and learning theory embedded in the intervention can help guide which RCTs are carried out to establish external validity.[7] But no one argues that the results of a single RCT will necessarily generalize to different populations at different times and places.

Second, a single RCT establishes only one data point within the distribution of estimates of a true “effect” size. The single data point may be atypical, even in the most carefully designed studies. Advocates and skeptics of RCTs alike urge replication of RCT studies, as Donald Campbell commented in 1969: “Too many social scientists expect single experiments to settle issues once and for all.
This may be a mistaken generalization from the history of great crucial experiments. In actuality the significant experiments in the physical sciences are replicated thousands of times.”[8] A National Research Council (NRC) report on curricular effectiveness strongly recommends that RCTs have at least one replication before the evidence from the original RCT is used.[9] This caution is almost always followed in the health sciences but is all too often ignored in education and other social science areas.[10]

Third, ensuring that an RCT study has internal validity requires more than randomization, appropriate statistical methodology, and replication. Many scholars have identified problems with inferences drawn from RCTs that are used to determine the effectiveness of interventions in settings sensitive to a wide variety of design and implementation threats. Again returning to Campbell: “We social scientists have less ability to achieve ‘experimental isolation,’ because we have good reasons to expect our treatment effects to interact significantly with a wide variety of social factors many of which we have not yet mapped.”[11]

Implementing a curriculum is complex. Effectively using a curriculum requires deep understanding of its content and pedagogy and of the various instructional needs of the 20 to 30 different students in a classroom. Moreover, a school or classroom environment is complex and subject to minor and major disruptions. These characteristics produce special evaluation challenges. Complex interventions have interacting components within the intervention and the environment and require behaviors that are difficult to implement effectively. Because of this complexity, good experimental design requires rigorous attention to numerous internal validity threats to a study’s usefulness.
The British Medical Research Council’s evaluation recommendations for complex interventions include: “A good theoretical understanding is needed of how the intervention causes change” and that “lack of effect may reflect implementation failure (or teething problems) rather than genuine ineffectiveness; a thorough process evaluation is needed to identify implementation problems.”[12] This report examines 12 potential threats to the usefulness of the 27 RCT mathematics curriculum studies (grades 1–12) that were in the WWC on December 1, 2014. From our examinations of possible threats, we ask whether the RCTs offer credible and reliable evidence and useful knowledge for making decisions about state, school, or classroom curricula.[13] We conclude with some general observations and five recommendations to the IES and the Office of Management and Budget (OMB) in the federal government.
Topic: K-12 Schooling
Tags: education; K-12 education; School curriculum
URL: https://www.aei.org/research-products/report/do-randomized-controlled-trials-meet-the-gold-standard/
Source think tank: American Enterprise Institute (United States)
Resource type: Think tank publication
Item identifier: http://119.78.100.153/handle/2XGU8XDN/206228
Recommended citation (GB/T 7714):
Alan Ginsburg, Marshall S. Smith. Do randomized controlled trials meet the ‘gold standard’?. 2016.
Files in this item:
File name/size: Do-randomized-contro (467KB)
Resource type: Think tank publication
Open access type: Restricted
License: CC BY-NC-SA
File name: Do-randomized-controlled-trials-meet-the-gold-standard.pdf
Format: Adobe PDF
