How Prior Information from National Assessments can be used when Designing Experimental Studies without a Control Group

Don Van den Bergh; Nina Vandermeulen; Marije Lesterhuis; Sven De Maeyer; Elke Van Steendam; Gert Rijlaarsdam; Huub Van den Bergh

doi:10.17239/jowr-2023.14.03.05

Authors

Don Van den Bergh, University of Amsterdam
Nina Vandermeulen, University of Antwerp
Marije Lesterhuis, University of Antwerp
Sven De Maeyer, University of Antwerp
Elke Van Steendam, KU Leuven
Gert Rijlaarsdam, University of Amsterdam
Huub Van den Bergh, University of Utrecht

DOI:

https://doi.org/10.17239/jowr-2023.14.03.05

Keywords:

Prior information, Baseline comparison, Bayesian inference

Abstract

National assessments yield a description of the proficiency level in a domain while accounting for differences between tasks. For instance, in writing assessments the level of proficiency is typically evaluated with a variety of topics and multiple tasks. This enables generalizations from specific tasks to a domain. In (quasi-)experimental research, however, writing skills are often evaluated with a single task. Yet, conclusions about the effectiveness of the treatment are formulated on the level of the domain, which is, euphemistically put, quite a stretch. Although conclusions drawn about the effect of the treatment are specific to the task administered, they are often generalized to the domain without reservation. This raises the question of whether we can use the results of national assessments about differences between tasks in the analyses of experimental studies. In this paper, we demonstrate how information from a baseline data set can be used as a kind of control condition in the analysis of an experimental study.

