Phase to phase: Developing an automated procedure to identify and visualize phases in writing sessions using keystroke data

Authors

  • Rianne Conijn, Eindhoven University of Technology
  • Alessandra Rossetti, Vrije Universiteit Brussel
  • Nina Vandermeulen, University of Antwerp
  • Luuk Van Waes, University of Antwerp

DOI:

https://doi.org/10.17239/jowr-2025.17.02.06

Keywords:

Keystroke logging, writing process, change point detection, revision phase, drafting

Abstract

Understanding the temporal organization of writing is key to studying writing processes. Existing methods to segment writing into phases often rely on arbitrary rules or extensive manual annotation, or they focus on numerous transitions. This study aimed to develop an automated segmentation method that detects distinctive transitions in the dominant writing process, particularly the transition from first draft to revision. To this end, keystroke data from two tasks, source-based writing in L1 (N = 80) and text simplification in L2 (N = 88), were manually annotated. The BEAST algorithm (Bayesian Estimator of Abrupt change, Seasonality, and Trend) was applied for Bayesian change point detection, based on five features derived from the annotation criteria: (1) the percentage of the final text written so far, (2) the distance between typed and remaining characters, (3) the relative cursor position, (4) source use, and (5) pause timings. The first three features proved most effective in identifying change points. A rule-based approach was then applied to select one final change point, which yielded mediocre accuracy, ranging from 31% exact agreement to 49% agreement within 60 seconds. In conclusion, the BEAST algorithm is useful for detecting a variety of change points in writing processes, but connecting them to meaningful phases remains complex.
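The abstract combines two computational steps: deriving features for each keystroke event and locating a change point in the resulting series. The Python sketch that follows illustrates the idea under stated assumptions and is not the authors' pipeline. It uses a hypothetical event schema (`Event` with `time_s`, `cursor`, `doc_len`), implements only simplified versions of features (1) and (3), and swaps BEAST for a plain least-squares mean-shift search that returns a single change point. The hypothetical `agreement` helper shows one assumed way to score detections against manual annotations, either exactly or within a tolerance such as 60 seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One logged keystroke event (hypothetical minimal schema)."""
    time_s: float  # seconds since session start
    cursor: int    # cursor position after the event
    doc_len: int   # document length in characters after the event


def features(events: list[Event]) -> list[tuple[float, float]]:
    """Simplified (assumed) versions of two of the five features:
    (1) share of the final text length reached so far, as a proportion
        (0-1) rather than a percentage so both features share one scale;
    (3) relative cursor position: cursor / current document length."""
    final_len = events[-1].doc_len or 1
    rows = []
    for e in events:
        share_final = e.doc_len / final_len
        rel_cursor = e.cursor / e.doc_len if e.doc_len else 1.0
        rows.append((share_final, rel_cursor))
    return rows


def _sse(segment: list[tuple[float, ...]]) -> float:
    """Sum of squared deviations from the segment mean, summed over features."""
    total = 0.0
    for d in range(len(segment[0])):
        vals = [row[d] for row in segment]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total


def single_change_point(rows: list[tuple[float, ...]],
                        min_size: int = 2) -> Optional[int]:
    """Least-squares mean-shift search for ONE change point (not BEAST):
    return the index k minimising the error of fitting separate means to
    rows[:k] and rows[k:], or None if the series is too short."""
    best_k, best_cost = None, float("inf")
    for k in range(min_size, len(rows) - min_size + 1):
        cost = _sse(rows[:k]) + _sse(rows[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def agreement(predicted_s, annotated_s, tolerance_s: float = 0.0) -> float:
    """Share of sessions whose detected change point lies within tolerance_s
    seconds of the annotated one (0 = exact). Assumed scoring rule; the
    study's exact definition may differ."""
    hits = sum(abs(p - a) <= tolerance_s for p, a in zip(predicted_s, annotated_s))
    return hits / len(annotated_s)


# Synthetic session: linear drafting at the end of the text, then revisions
# scattered through a 100-character text whose length stays constant.
drafting = [Event(10.0 * i, 10 * i, 10 * i) for i in range(1, 11)]
revising = [Event(130.0 + 30.0 * j, c, 100)
            for j, c in enumerate([20, 55, 35, 70, 15])]
session = drafting + revising

k = single_change_point(features(session))
change_time_s = session[k].time_s  # first event assigned to the second phase
print(k, change_time_s)  # 10 130.0

exact = agreement([130.0, 400.0, 90.0], [130.0, 445.0, 200.0], tolerance_s=0)
within_60 = agreement([130.0, 400.0, 90.0], [130.0, 445.0, 200.0], tolerance_s=60)
```

In this toy session the split lands on the first revision event, where the text length has plateaued and the relative cursor position drops below 1. BEAST is considerably richer: it averages over many candidate models and reports probabilities for the number and timing of change points, which is why a separate rule-based step is needed to reduce its output to one final change point.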

Published

2025-10-07

Issue

Vol. 17 No. 2 (2025)

Section

Articles

How to Cite

Conijn, R., Rossetti, A., Vandermeulen, N., & Van Waes, L. (2025). Phase to phase: Developing an automated procedure to identify and visualize phases in writing sessions using keystroke data. Journal of Writing Research, 17(2), 339–369. https://doi.org/10.17239/jowr-2025.17.02.06
