How Prior Information from National Assessments can be used when Designing Experimental Studies without a Control Group


  • Don Van den Bergh University of Amsterdam
  • Nina Vandermeulen University of Antwerp
  • Marije Lesterhuis University of Antwerp
  • Sven De Maeyer University of Antwerp
  • Elke Van Steendam KULeuven
  • Gert Rijlaarsdam University of Amsterdam
  • Huub Van den Bergh University of Utrecht



Keywords: Prior information, baseline comparison, Bayesian inference


National assessments describe the proficiency level in a domain while accounting for differences between tasks. In writing assessments, for instance, proficiency is typically evaluated with a variety of topics and multiple tasks, which enables generalization from specific tasks to the domain. In (quasi-)experimental research, however, writing skills are often evaluated with a single task. Conclusions about the effect of the treatment are therefore specific to the task administered, yet they are often formulated at the level of the domain without any reservation, which is, euphemistically put, quite a stretch. This raises the question of whether results from national assessments about differences between tasks can be used in the analysis of experimental studies. In this paper, we demonstrate how information from a baseline data set can serve as a kind of control condition in the analysis of an experimental study.





How to Cite

Van den Bergh, D., Vandermeulen, N., Lesterhuis, M., De Maeyer, S., Van Steendam, E., Rijlaarsdam, G., & Van den Bergh, H. (2023). How Prior Information from National Assessments can be used when Designing Experimental Studies without a Control Group. Journal of Writing Research, 14(3), 447–469.



