Benjamin L. Castleman
Data science applications are increasingly entwined in students’ educational experiences. One prominent application of data science in education is to predict students’ risk of failing a course or dropping out of college. There is growing interest among higher education researchers and administrators in whether learning management system (LMS) data, which capture very detailed information on students’ engagement in and performance on course activities, can improve model performance. We systematically evaluate whether incorporating LMS data into course performance prediction models improves model performance. We conduct this analysis within an entire state community college system. Among students with prior academic history in college, administrative data-only models substantially outperform LMS data-only models and are quite accurate at predicting whether students will struggle in a course. Among first-time students, LMS data-only models outperform administrative data-only models. We achieve the highest performance for first-time students with models that include data from both sources. We also show that models achieve similar performance with a small and judiciously selected set of predictors, and that models trained on system-wide data perform similarly to models trained on individual courses.
Despite decades and hundreds of billions of dollars of federal and state investment in policies to promote postsecondary educational attainment as a key lever for increasing the economic mobility of lower-income populations, research continues to show large and meaningful differences in the mid-career earnings of students from families in the bottom and top income quintiles. Prior research has not disentangled whether these disparities are due to differential sorting into colleges and majors, or due to barriers lower socioeconomic status (SES) graduates encounter during the college-to-career transition. Using linked individual-level higher education and Unemployment Insurance (UI) records for nearly a decade of students from the Virginia Community College System (VCCS), we compare the labor market outcomes of higher- and lower-SES community college graduates within the same college, program, and academic performance level. Our analyses show that, conditional on employment, lower-SES graduates earn nearly $500/quarter less than their higher-SES peers one year after graduation, relative to a higher-SES graduate average of $10,846/quarter. The magnitude of this disparity persists through at least three years after graduation. Disparities are concentrated among non-Nursing programs, in which gaps persist seven years after graduation. Our results highlight the importance of greater focus on the college-to-career transition.
Non-traditional students disproportionately enroll in institutions with weaker graduation and earnings outcomes. One hypothesis is that these students would have made different choices had they been provided with better information or supports during the decision-making process. We conducted a large-scale, multi-arm field experiment with the U.S. Army to investigate whether personalized information and the offer of advising assistance affect postsecondary choices and attainment among non-traditional adult populations. We provided U.S. Army service members transitioning out of the military with a package of research-based information and prompts, including quality and cost information on a personalized set of matched colleges, messages targeted at addressing veteran-specific concerns or needs, and reminders about key stages in the college and financial aid application process. For a randomly selected subset of the experimental sample, we also provided service members with opportunities to connect with a college advisor. We find no overall impact of the intervention on whether service members enroll in college, on the quality of their college enrollment, or on their persistence in college. We find suggestive evidence of a modest increase in degree completion within the period of observation, with these impacts mainly driven by increased attainment at for-profit institutions. Our results suggest that influencing non-traditional populations’ educational decisions and outcomes will require substantially more intensive programs and significant resources.
The COVID-19 pandemic led to an abrupt shift from in-person to virtual instruction in Spring 2020. We use two complementary difference-in-differences frameworks, one that leverages within-instructor-by-course variation on whether students started their Spring 2020 courses in person or online and another that incorporates student fixed effects. We estimate the impact of this shift on the academic performance of Virginia’s community college students. With both approaches, we find modest negative impacts (three to six percent) on course completion. Our results suggest that faculty experience teaching a given course online does not mitigate the negative effects. In an exploratory analysis, we find minimal long-term impacts of the switch to online instruction.
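The difference-in-differences logic named in the abstract above can be illustrated with a minimal two-group, two-period sketch. This is not the authors' specification: it omits the instructor-by-course and student fixed effects the abstract describes, and the completion indicators, the group labels (`switched`, `online`), and the function name `did_estimate` are all hypothetical, chosen only to show how the estimate is formed.

```python
from statistics import mean

# Hypothetical course-completion indicators (1 = completed); NOT data from the paper.
# "switched": sections that began in person and moved online in Spring 2020.
# "online":   sections that were online from the start (comparison group).
# "pre" = an earlier term, "post" = Spring 2020.
completion = {
    ("switched", "pre"):  [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
    ("switched", "post"): [1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    ("online", "pre"):    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    ("online", "post"):   [1, 0, 1, 1, 1, 0, 1, 1, 0, 1],
}


def did_estimate(outcomes):
    """Return the 2x2 difference-in-differences estimate of the mean outcome."""
    # Change over time in the group exposed to the shift.
    change_switched = mean(outcomes[("switched", "post")]) - mean(outcomes[("switched", "pre")])
    # Change over time in the comparison group, which absorbs common term effects.
    change_online = mean(outcomes[("online", "post")]) - mean(outcomes[("online", "pre")])
    return change_switched - change_online


if __name__ == "__main__":
    print(round(did_estimate(completion), 3))
```

Subtracting the comparison group's change removes any shock that hit both groups in the same term, so what remains is attributed to the shift in modality, under the assumption that the two groups would otherwise have followed parallel trends.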
Recent state policy efforts have focused on increasing attainment among adults with some college but no degree (SCND). Yet little is actually known about the SCND population. Using data from the Virginia Community College System (VCCS), we provide the first detailed profile on the academic, employment, and earnings trajectories of the SCND population, and how these compare to VCCS graduates. We show that the share of SCND students who are academically ready to reenroll and would benefit from doing so may be substantially lower than policy makers anticipate. Specifically, we estimate that few SCND students (approximately three percent) could fairly easily re-enroll in fields of study from which they could reasonably expect a sizable earnings premium from completing their degree.
Nearly half of students who enter college do not graduate. The majority of efforts to increase college completion have focused on supporting students before or soon after they enter college, yet many students drop out after making significant progress towards their degree. In this paper, we report results from a multi-year, large-scale experimental intervention conducted across five states and 20 broad-access, public colleges and universities to support students who are late in their college career but still at risk of not graduating. The intervention provided these “near-completer” students with personalized text messages that encouraged them to connect with campus-based academic and financial resources, reminded them of upcoming and important deadlines, and invited them to engage (via text) with campus-based advisors. We find little evidence that the message campaign affected academic performance or attainment in either the full sample or within individual higher education systems or student subgroups. The findings suggest low-cost nudge interventions may be insufficient for addressing barriers to completion among students who have made considerable academic progress.
We combine a large multi-site randomized control trial with administrative and survey data to demonstrate that intensive advising during high school and college leads to large increases in bachelor's degree attainment. Novel causal forest methods suggest that these increases are driven primarily by improvements in the quality of initial enrollment. Program effects are consistent across sites, cohorts, advisors, and student characteristics, suggesting the model is scalable. While current and proposed investments in postsecondary education focus on cutting costs, our results suggest that investment in advising is likely to be a more efficient route to promote bachelor's degree attainment.
In-person college advising programs generate large improvements in college persistence and success for low-income students but face numerous barriers to scale. Remote advising models offer a promising strategy to address informational and assistance barriers facing the substantial majority of low-income students who do not have access to community-based advising, yet the existing evidence base on the efficacy of remote advising is limited. We present a comprehensive, multi-cohort experimental evaluation of CollegePoint, a national remote college advising program for high-achieving low- and moderate-income students. Students assigned to CollegePoint are modestly more likely (1.3 percentage points) to attend higher-quality institutions. Results from mechanism experiments we conducted within CollegePoint indicate that moderate changes to the program model, such as a longer duration of advising and modest expansions of the pool of students academically eligible to participate, do not lead to larger program effects. We also capitalize on across-cohort variation in whether students were affected by COVID-19 to investigate whether social distancing required by the pandemic increased the value of remote advising. CollegePoint increased attendance at higher-quality institutions by 3.2 percentage points for the COVID-19-affected cohort.
Colleges have increasingly turned to predictive analytics to target at-risk students for additional support. Most of the predictive analytic applications in higher education are proprietary, with private companies offering little transparency about their underlying models. We address this lack of transparency by systematically comparing two important dimensions: (1) different approaches to sample and variable construction and how these affect model accuracy; and (2) how the selection of predictive modeling approaches, ranging from methods many institutional researchers would be familiar with to more complex machine learning methods, impacts model performance and the stability of predicted scores. The relative ranking of students’ predicted probability of completing college varies substantially across modeling approaches. While we observe substantial gains in performance from models trained on a sample structured to represent the typical enrollment spells of students and with a robust set of predictors, we observe similar performance between the simplest and most complex models.
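One way to quantify how much students' relative ranking shifts between two models, as discussed in the abstract above, is a rank correlation of their predicted scores. The sketch below computes Spearman's rank correlation for untied scores. It is an illustration only, not the stability measure used in the paper; the predicted probabilities and the names `simple_model` and `complex_model` are hypothetical.

```python
def ranks(scores):
    """Return rank positions (1 = highest score); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation for untied scores: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    n = len(ranks_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical predicted completion probabilities for the same five students.
simple_model = [0.90, 0.75, 0.60, 0.40, 0.20]
complex_model = [0.85, 0.55, 0.70, 0.45, 0.10]

if __name__ == "__main__":
    print(ranks(complex_model))
    print(round(spearman(simple_model, complex_model), 3))
```

A value near 1 means the two models order students almost identically; lower values mean that which students get flagged for support depends on the model chosen, even when overall accuracy is similar.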