Hendricks, & Fuchs, D. (2020). Are Individual Differences in Response to Intervention Influenced by the Methods and Measures Used to Define Response? Implications for Identifying Children With Learning Disabilities. Journal of Learning Disabilities, 53(6), 428–443. https://doi.org/10.1177/0022219420920379
Measuring response to intervention is important for capturing change in a student’s progress over time. However, there are numerous ways to measure change, which highlights the need to understand how different methods and measures are used to claim that a student has indeed changed as a result of an intervention. In this study, the authors compared three measures (near-transfer, mid-transfer, and far-transfer) and two methods (final status and growth) for evaluating response to their reading intervention.
Briefly, in their reading intervention, students with poor reading skills were tutored three times per week for 14–15 weeks. In each 45-minute session, students were taught strategies for understanding texts on social studies or science topics. Students were then tested on their knowledge by answering multiple-choice questions about the passage they had read.
Measures:
1. Near-transfer: Near-transfer refers to applying the knowledge learned in the intervention to a very closely related task. Near-transfer passages covered the same social studies or science topics as the intervention, but the passages themselves were new. After reading each passage, students answered multiple-choice questions, mirroring the format of the intervention.
2. Mid-transfer: Mid-transfer passages covered topics not addressed in the program (e.g., geography) but followed the same format. Students were assessed with both multiple-choice and fill-in-the-blank questions.
3. Far-transfer: Far-transfer indicates that learning generalized beyond the trained tasks. In this study, standardized reading comprehension tests served as the far-transfer measures: the Wechsler Individual Achievement Test and the Gates-MacGinitie Reading Test.
Methods:
1. Final status: The final status method (also called the “normalization method”) was operationalized differently for the standardized and experimental tasks. On the far-transfer tests, students were classified as having changed if their final, post-treatment score was above a standard score of 100 (the 50th percentile). On the near- and mid-transfer tests, students were classified as having changed if their final scores reached the 75% and 87.5% correct criteria used for these experimental tasks.
2. Growth method: The growth method was likewise operationalized in two ways. For the standardized tests, the authors calculated the reliable change index (RCI), which compares pre- and post-treatment scores; students were classified as having changed if their RCI exceeded 1.96, the standard normal cutoff for a two-tailed test at p < .05 (leaving 2.5% of the distribution in each tail). For the experimental tasks, the authors used a “limited norm criterion”: they computed the group’s average change score, and students were classified as having changed if their pre- to post-treatment gain was at or above that value. For example, on the near-transfer task, students had to improve their score by at least 3.5 to be classified as having changed.
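To make the two methods concrete, here is a minimal Python sketch of each classification rule. It assumes the widely used Jacobson and Truax form of the RCI (observed gain divided by the standard error of the difference, derived from a pretest SD and the test's reliability); the summary does not say which RCI variant the authors used. All function names, the SD of 15, the reliability of 0.90, and the example scores are hypothetical values chosen for illustration, not data from the study.

```python
import math

# Hypothetical settings for illustration (not taken from the study's data).
SD_PRE = 15.0        # assumed pretest SD on a standard-score scale (mean 100, SD 15)
RELIABILITY = 0.90   # assumed reliability of the standardized test
RCI_CUTOFF = 1.96    # two-tailed p < .05 on the standard normal curve


def final_status_standardized(post: float, cutoff: float = 100.0) -> bool:
    """Final status (normalization): post-test standard score above 100."""
    return post > cutoff


def final_status_experimental(post_prop_correct: float, criterion: float) -> bool:
    """Final status on an experimental task, e.g. criterion=0.75 or 0.875."""
    return post_prop_correct >= criterion


def reliable_change_index(pre: float, post: float,
                          sd_pre: float = SD_PRE,
                          reliability: float = RELIABILITY) -> float:
    """Jacobson-Truax style RCI: observed gain / standard error of the difference."""
    sem = sd_pre * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * sem                # SE of a pre-post difference score
    return (post - pre) / s_diff


def growth_standardized(pre: float, post: float) -> bool:
    """Growth method on a standardized test: RCI above the cutoff."""
    return reliable_change_index(pre, post) > RCI_CUTOFF


def limited_norm_criterion(pre: list[float], post: list[float]) -> tuple[list[bool], float]:
    """Limited norm criterion: flag students whose gain is at or above the group's mean gain."""
    gains = [b - a for a, b in zip(pre, post)]
    mean_gain = sum(gains) / len(gains)
    return [g >= mean_gain for g in gains], mean_gain


# A hypothetical student moving from 85 to 100 on a standardized test:
print(round(reliable_change_index(85, 100), 2))  # 2.24
print(growth_standardized(85, 100))              # True
print(final_status_standardized(100))            # False
```

Under these assumed values, the same student counts as changed by the growth method but not by the final-status method, a small-scale version of the disagreement the study reports.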
The results revealed that different measure-method combinations identified change at very different rates, from 19% (far-transfer using the RCI) to 80% (near-transfer using final status). Accordingly, agreement on who improved was low and inconsistent across the methods of identifying change. On most measures, students with higher pretreatment scores were more likely to be classified as changed. However, on some growth measures a different pattern emerged: children with lower pretreatment scores (e.g., on the transfer-related measures) were more likely to be classified as improved following the intervention.
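The agreement problem can also be quantified. Below is a minimal sketch, assuming each rule produces a yes/no "changed" flag per student: it computes percent agreement and Cohen's kappa, which corrects agreement for what two rules would share by chance. The summary does not specify which agreement statistic the authors used, and the ten students below are made up for illustration.

```python
from typing import Sequence


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Share of students that two rules classify the same way."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("classifications must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    p_a = sum(a) / n  # rate of "changed" under rule a
    p_b = sum(b) / n  # rate of "changed" under rule b
    # Agreement expected if the two rules flagged students independently.
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_exp == 1.0:
        # Both rules give every student the same single label; kappa is undefined.
        return float("nan")
    return (p_obs - p_exp) / (1 - p_exp)


# Ten hypothetical students: rule A flags 8 (80%), rule B flags 2 (20%).
rule_a = [True] * 8 + [False] * 2
rule_b = [True] * 2 + [False] * 8
print(percent_agreement(rule_a, rule_b))       # 0.4
print(round(cohens_kappa(rule_a, rule_b), 2))  # 0.12
```

Even though every student flagged by rule B is also flagged by rule A, the two rules agree on only 40% of students, and kappa sits barely above chance.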
The authors conclude that while capturing change is important for evaluating intervention effectiveness more broadly, the methods for doing so are currently arbitrary. Benchmarks for comparing responsiveness across measures would aid clinical decision-making, and we await future research that reaches a consensus on how change should be defined.