Andrew D. Ho

The Reliability of Classroom Observations and Student Surveys in Non-Research Settings: Evidence from a Middle-Income Country

Alejandro J. Ganimian, Andrew D. Ho, Alejandra Campos Quintero. April 2026

Topics: Teacher and Leader Development

Tags: Assessment, International and comparative, Teacher hiring and retention

We present one of the first Generalizability studies of non-test measures of teaching effectiveness administered by practitioners in a middle-income country. The reliability of observations varies widely (from 0 to 0.75 on a 0-1 scale) and depends upon their context (whether they are conducted…

The reliability of classroom observations and student surveys in non-research settings: Evidence from Argentina

Alejandro J. Ganimian, Andrew D. Ho, Alejandra Campos Quintero. November 2025

Topics: Teacher and Leader Development

Tags: Assessment, International and comparative, Teacher hiring and retention

There is a growing consensus on the need to measure teaching effectiveness using multiple instruments. Yet, guidance on how to achieve reliable ratings derives largely from formal research in high-income countries. We study the reliability of classroom observations and student surveys conducted…

Item-Level Heterogeneity in Value Added Models: Implications for Reliability, Cross-Study Comparability, and Effect Sizes

Joshua B. Gilbert, Zachary Himmelsbach, Luke W. Miratrix, Andrew D. Ho, Benjamin W. Domingue. August 2025

Topics: Methods

Tags: Assessment, Efficacy

Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the…

Heterogeneity of item-treatment interactions masks complexity and generalizability in randomized controlled trials

Ishita Ahmed, Masha Bertling, Lijin Zhang, Andrew D. Ho, Prashant Loyalka, Hao Xue, Scott Rozelle, Benjamin W. Domingue. April 2023

Topics: Methods

Tags: Assessment, Instructional practices, Learning environments

Researchers use test outcomes to evaluate the effectiveness of education interventions across numerous randomized controlled trials (RCTs). Aggregate test data (for example, simple measures like the sum of correct responses) are compared across treatment and control groups to determine whether an…

Ordinal Approaches to Decomposing Between-group Test Score Disparities

David M. Quinn, Andrew D. Ho. November 2020

Topics: Methods

Tags: Assessment, Equity

The estimation of test score “gaps” and gap trends plays an important role in monitoring educational inequality. Researchers decompose gaps and gap changes into within- and between-school portions to generate evidence on the role schools play in shaping these inequalities. However, existing…

How Can Released State Test Items Support Interim Assessment Purposes in an Educational Crisis?

Emma M. Klugman, Andrew D. Ho. September 2020

Topics: Standards, Assessment, and Curriculum

Tags: Covid-19 recovery, Assessment

State testing programs regularly release previously administered test items to the public. We provide an open-source recipe for state, district, and school assessment coordinators to combine these items flexibly to produce scores linked to established state score scales. These would enable…