A NLP-based stylometric approach for tracking the evolution of L1 written language competence


  • Alessio Miaschi
  • Dominique Brunato
  • Felice Dell'Orletta




Keywords: diachronic evolution of written language competence, natural language processing, Italian learner corpus, stylometry, learners' errors, machine learning


In this study we present a Natural Language Processing (NLP)-based stylometric approach for tracking the evolution of written language competence in Italian L1 learners. The approach relies on a wide set of linguistically motivated features capturing stylistic aspects of a text, which were extracted from students’ essays contained in CItA (Corpus Italiano di Apprendenti L1), the first longitudinal corpus of texts written by Italian L1 learners enrolled in the first and second year of lower secondary school. We model written language development as a supervised classification task: predicting the chronological order of essays written by the same student at different points in time. The promising results obtained in several classification scenarios allow us to conclude that it is possible to automatically model the highly relevant changes affecting written language over time, as well as to identify which features are most predictive of this process. In the last part of the article, we focus on the possible influence of background variables on language learning and present preliminary results of a pilot study aiming to understand how the observed developmental patterns are affected by information related to the student's school environment.
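To make the pairwise framing concrete, here is a minimal toy sketch of how a "which essay came first?" task can be set up. It is not the authors' pipeline: the three surface features (mean sentence length, type/token ratio, mean word length), the encoding of a pair as the difference of its two feature vectors, the perceptron classifier, and all function names (`features`, `pair_vector`, `make_pairs`, `train_perceptron`, `first_is_earlier`) are hypothetical stand-ins chosen for brevity, and the example essays are invented.

```python
"""Toy sketch: predicting which of two essays by the same student came first.

Hypothetical stand-ins (not the authors' pipeline): three surface features,
a pair encoded as the difference of its feature vectors, and a perceptron.
"""
import re
from itertools import combinations


def features(text):
    """Return [mean sentence length in tokens, type/token ratio, mean word length]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    if not sentences or not tokens:
        return [0.0, 0.0, 0.0]
    return [
        len(tokens) / len(sentences),
        len(set(tokens)) / len(tokens),
        sum(len(t) for t in tokens) / len(tokens),
    ]


def pair_vector(text_a, text_b):
    """Encode an ordered pair as features(a) - features(b)."""
    return [x - y for x, y in zip(features(text_a), features(text_b))]


def make_pairs(essays_by_student):
    """Build (vector, label) examples; label 1 means the first essay is the earlier one.

    essays_by_student maps a student id to that student's texts in chronological
    order. Each pair is emitted in both orders, so the two classes are balanced.
    """
    data = []
    for texts in essays_by_student.values():
        for i, j in combinations(range(len(texts)), 2):  # i < j: essay i is earlier
            vec = pair_vector(texts[i], texts[j])
            data.append((vec, 1))
            data.append(([-v for v in vec], 0))
    return data


def score(w, x):
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(data, epochs=20, lr=0.1):
    """Fit a bias-free linear model, so swapping the two essays flips the sign of the score."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            err = y - (1 if score(w, x) > 0 else 0)
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w


def first_is_earlier(w, text_a, text_b):
    """Predict whether text_a was written before text_b."""
    return score(w, pair_vector(text_a, text_b)) > 0


if __name__ == "__main__":
    # Invented toy essays, ordered from earlier to later for each student.
    toy = {
        "s1": ["Io gioco. Tu corri. Lui salta.",
               "Io gioco a calcio con i miei amici e poi torniamo a casa insieme."],
        "s2": ["Il cane abbaia. Il gatto dorme.",
               "Il cane del vicino abbaia spesso quando il gatto attraversa il giardino."],
    }
    w = train_perceptron(make_pairs(toy))
    print(first_is_earlier(w, "Io corro. Tu salti.",
                           "Io corro nel parco con il mio cane ogni mattina."))
```

Leaving out the bias term keeps the model antisymmetric: presenting the same two essays in the opposite order always yields the opposite prediction, which matches the nature of an ordering task. A realistic setup would replace the three toy features with a much richer, linguistically annotated feature set and a stronger classifier.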



How to Cite

Miaschi, A., Brunato, D., & Dell’Orletta, F. (2021). A NLP-based stylometric approach for tracking the evolution of L1 written language competence. Journal of Writing Research, 13(1), 71–105. https://doi.org/10.17239/jowr-2021.13.01.03