Wilkerson, M. H., Erickson, T., Lee, H. S., & Finzer, W. (2026). How to be “Choosy”: Wrangling big datasets for the classroom. Teaching Statistics, 48(1), 76-96. doi: 10.1111/test.70022
Supplemental materials and code are available on GitHub.
Educators are being encouraged to teach with “big” datasets that have more cases and attributes than are typically used in the classroom. When introduced carefully, such datasets can allow students to engage in complex and self-directed reasoning, develop data management and inquiry skills, and experience data analysis in a way that is more authentic to professional practice. However, “big data” is also often unwieldy. It can overwhelm students, overload their software tools, and interfere with planful analysis. How can we make such datasets manageable? This paper presents pedagogical strategies and technical methods that educators, educational designers, and young investigators can use to reduce the number of cases or attributes in a large dataset without unduly compromising their analyses. We illustrate these strategies with interactive demonstrations in two freely available, open-source tools: the Choosy plugin in CODAP and a Python Jupyter notebook.
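
As a rough illustration of what "reducing the number of cases or attributes" can mean in practice, the sketch below takes a random sample of cases and keeps only a chosen subset of attributes. It is not the Choosy plugin's method or the paper's notebook. The function name `reduce_dataset`, its parameters, and the toy data are all hypothetical, and it uses only the Python standard library.

```python
import random

def reduce_dataset(rows, keep_attributes, n_cases, seed=0):
    """Return a smaller copy of `rows` (a list of dicts).

    Hypothetical helper, for illustration only:
    - samples at most `n_cases` cases at random, without replacement
    - keeps only the attributes named in `keep_attributes`
    """
    rng = random.Random(seed)          # fixed seed so the sample is repeatable
    n = min(n_cases, len(rows))        # never ask for more cases than exist
    sampled = rng.sample(rows, n)
    return [{name: row[name] for name in keep_attributes} for row in sampled]

# Made-up toy data standing in for a "big" dataset: 10,000 cases, 5 attributes.
big = [
    {
        "id": i,
        "age": 10 + i % 8,
        "height_cm": 130 + i % 40,
        "state": "CA" if i % 2 else "NY",
        "notes": "",
    }
    for i in range(10_000)
]

# Keep 200 randomly chosen cases and 3 of the 5 attributes.
small = reduce_dataset(big, ["age", "height_cm", "state"], n_cases=200, seed=42)
print(len(small), list(small[0].keys()))  # 200 ['age', 'height_cm', 'state']
```

A simple random sample is only one way to cut down cases. Whether it is appropriate depends on the question being asked, since sampling can thin out rare groups that an analysis may need.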
