Cristóbal Ruiz-Tagle Coloma: Performance Gaps in High-Stakes Testing: The Role of Textual Context

Seminars - PhD JM Practice Talk - Applied Micro
Speakers
Cristobal Ruiz-Tagle Coloma, Bocconi University
12:30pm - 1:45pm
Alberto Alesina Seminar Room 5-E4-SR04 - Floor 5 - via Roentgen 1

Abstract

Standardized tests are crucial for determining educational and professional opportunities, but they are under increasing scrutiny for replicating existing educational inequalities. While much research has examined how testing environments contribute to performance gaps, the impact of the textual content of test questions has received less attention. This paper investigates how the contextual features of questions in Brazil's ENEM, the second-largest college admission test in the world, predict performance disparities related to socioeconomic status (SES), gender, and ethnicity. Using data on over 3.8 million senior high-school test-takers across 13 years (2010–2022), I analyze question-specific performance gaps and link them to the multidimensional space of words used in each question. Combining bag-of-words representations and topic modeling with penalized regressions, I identify the specific words and topics in the question text that most strongly predict these gaps. Hypotheses are then generated independently by having ChatGPT interpret the common patterns among these words and topics. This yields six hypotheses, two each for SES, gender, and ethnicity, covering channels that widen and channels that narrow performance gaps. These hypotheses are tested using a rich set of fixed effects at the individual-question level. The results show that SES gaps increase by 1.4 percentage points (23% of the overall SES gap) when questions feature financial concepts, especially among higher-ability test-takers. Gender gaps widen by 1.1 percentage points (30% of the average gender gap) when questions are framed in abstract scientific contexts, but this effect emerges only among high-ability female test-takers and is not driven by domains in which females generally underperform. The presence of female characters tends to offset this widening effect, whereas the presence of underprivileged characters amplifies the SES gap.
Additionally, practical problem-solving scenarios narrow gender gaps by 0.6 percentage points across all ability levels. No textual features were found to significantly affect ethnic gaps. These findings offer crucial insights for test design and suggest a data-driven approach to improving fairness in other testing contexts.
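The word-level step of this approach (representing each question as a bag of words and using a penalized regression to select the words that predict its performance gap) can be illustrated with a minimal sketch. The abstract does not say which penalized regression is used, so this example assumes a LASSO fitted by coordinate descent on raw word counts. The question texts and gap values are made-up toy inputs, not ENEM data: the gaps are generated from a known rule so that the selected word can be checked. The helper names `tokenize`, `bag_of_words`, and `lasso` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a question text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return the sorted vocabulary and a question-by-word count matrix."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([float(counts[w]) for w in vocab])
    return vocab, rows


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, lam, n_sweeps=1000):
    """Minimize (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent.

    The intercept b0 is left unpenalized by centering X and y first.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[X[i][j] - x_mean[j] for i in range(n)] for j in range(p)]
    resid = [v - y_mean for v in y]  # residual when all coefficients are 0
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            sq = sum(c * c for c in col) / n
            if sq == 0.0:  # word with no variation across questions
                continue
            # Covariance of word j with the partial residual (excluding word j).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / sq
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    b0 = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return b0, beta


# Hypothetical question texts (not taken from ENEM).
docs = [
    "a bank charges interest on a loan",
    "the cell divides during mitosis",
    "a family saves money and earns interest",
    "the poem describes a river at night",
    "compute the interest rate on savings",
    "rainfall maps show northern regions",
]
# Synthetic question-level gaps built from a known rule: only "interest" matters.
gaps = [2.0 + 1.5 * tokenize(d).count("interest") for d in docs]

vocab, X = bag_of_words(docs)
b0, beta = lasso(X, gaps, lam=0.1)
selected = {w: round(b, 3) for w, b in zip(vocab, beta) if b != 0.0}
print(selected)
```

With these toy inputs, only `interest` should keep a nonzero coefficient, shrunk below its generating value of 1.5 by the penalty, while every other word is set exactly to zero. The L1 penalty is what produces this sparsity, which is why a short list of predictive words can then be handed off for interpretation; the paper's actual specification (penalty type, choice of the tuning parameter, and how topic features enter) may differ from this sketch.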

For information, contact angela.baldassarre@unibocconi.it or giulia.zenoni@unibocconi.it.