Use of nominals in A2–C1-level exam texts

Kais Allkivi-Metsoja


Automated assessment of language proficiency requires measurable features that allow language use at different levels to be identified reliably. This article focuses on the nominal features of Estonian A2–C1-level exam texts. The frequency and variation of nominal forms are analysed both in total and by comparing the nominal parts of speech. The article identifies the features that correlate with proficiency level and change in a consistently increasing or decreasing direction, thereby separating adjacent levels. The number of cases found in a text and the use of singular and plural stand out as features that distinguish all levels. Differences in nominal forms appear mainly at levels B1–C1, and the most important changes concern the translative, nominative and genitive cases.


Use of nominals in Estonian A2–C1-level exam writings

In this study, natural language processing (NLP) is used to analyse nominal inflection in Estonian proficiency examination writings representing the CEFR levels A2–C1. The aim is to define the nominal features that distinguish learner language production at each proficiency level. For this purpose, the frequency and variation of inflectional forms are measured in two ways: a) for the nominal parts of speech (PoSs) in total, i.e., considering the use of nouns, pronouns, adjectives and numerals; b) for nouns, pronouns and adjectives individually (numerals were discarded due to low frequency).
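The two measurement perspectives can be illustrated with a minimal Python sketch. It assumes each text is already tagged in the Universal Dependencies style that Stanza outputs (a UPOS tag plus a feature string such as `Case=Gen|Number=Sing`). The input format, the function names `parse_feats` and `case_frequencies`, and the choice to normalise counts by the number of case-marked nominals are illustrative assumptions, not the study's documented procedure.

```python
from collections import Counter

# Nominal parts of speech as UPOS tags (nouns, pronouns, adjectives, numerals).
NOMINAL_POS = {"NOUN", "PRON", "ADJ", "NUM"}


def parse_feats(feats):
    """Turn a UD feature string such as 'Case=Gen|Number=Sing' into a dict."""
    if not feats:
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def case_frequencies(tokens, pos_filter=None):
    """Relative frequency of each case among case-marked nominal tokens.

    tokens: (upos, feats) pairs; a hypothetical input format that could be
    filled from Stanza's word.upos and word.feats attributes.
    pos_filter=None pools all nominal PoSs (perspective a); a set such as
    {"NOUN"} restricts the count to one PoS (perspective b).
    """
    allowed = NOMINAL_POS if pos_filter is None else set(pos_filter)
    counts = Counter()
    for upos, feats in tokens:
        if upos not in allowed:
            continue
        case = parse_feats(feats).get("Case")
        if case is not None:
            counts[case] += 1
    total = sum(counts.values())
    return {case: n / total for case, n in counts.items()} if total else {}


tokens = [
    ("NOUN", "Case=Nom|Number=Sing"),
    ("VERB", "Mood=Ind|Tense=Pres"),
    ("ADJ", "Case=Gen|Degree=Pos|Number=Sing"),
    ("NOUN", "Case=Gen|Number=Sing"),
    ("PRON", "Case=Ela|Number=Plur|PronType=Prs"),
]
print(case_frequencies(tokens))            # {'Nom': 0.25, 'Gen': 0.5, 'Ela': 0.25}
print(case_frequencies(tokens, {"NOUN"}))  # {'Nom': 0.5, 'Gen': 0.5}
```

Normalising by case-marked nominals is only one option; a rate per 1,000 words would be an equally plausible way to express frequency, and the choice affects how pooled and per-PoS figures compare.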

The analysed corpus contains 480 texts, 120 for each level. Nominal features based on the grammatical categories of number, case and degree of comparison are extracted from the morphologically tagged and manually corrected output of the Stanza NLP toolkit. Relevant features are selected according to the following criteria: they correlate with the proficiency level, their values change monotonically, and there are statistically significant differences between (some) adjacent levels.
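The three selection criteria can be sketched as a filter applied to one feature's per-text values grouped by level. The abstract does not name the statistics used, so this sketch assumes Spearman's rank correlation for the first criterion, strictly increasing or decreasing level means for monotonicity, and a two-sided Mann-Whitney U test (normal approximation, no tie correction) for adjacent levels. The function names, the threshold `min_rho=0.3` and `alpha=0.05` are hypothetical.

```python
import math
from statistics import mean

LEVELS = ["A2", "B1", "B2", "C1"]


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def is_criterial(values_by_level, min_rho=0.3, alpha=0.05):
    """Apply the three criteria to one feature (values_by_level: level -> per-text values)."""
    xs, ys = [], []
    for rank, level in enumerate(LEVELS):
        xs.extend([rank] * len(values_by_level[level]))
        ys.extend(values_by_level[level])
    correlated = abs(spearman(xs, ys)) >= min_rho

    means = [mean(values_by_level[level]) for level in LEVELS]
    steps = [b - a for a, b in zip(means, means[1:])]
    monotonic = all(s > 0 for s in steps) or all(s < 0 for s in steps)

    adjacent_diff = any(
        mann_whitney_p(values_by_level[lo], values_by_level[hi]) < alpha
        for lo, hi in zip(LEVELS, LEVELS[1:])
    )
    return correlated and monotonic and adjacent_diff


feature = {
    "A2": [1, 2, 3, 2, 1],
    "B1": [3, 4, 5, 4, 3],
    "B2": [6, 7, 8, 7, 6],
    "C1": [9, 10, 11, 10, 9],
}
print(is_criterial(feature))  # True
```

The statistics are written out with the standard library only so the sketch stays self-contained; in practice `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu` provide the same tests with proper tie handling.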

A2–C1-level texts are consistently distinguished by the number of cases used in the text as well as the ratio of singular and plural forms. The changes in the frequency of nominal inflectional forms mainly occur from level B1 to C1. The use of the translative, nominative and genitive cases is more strongly related to the text level, while the partitive, inessive, elative and comitative cases and comparative adjectives also differentiate some levels.
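The two features that separated all adjacent levels can be computed per text. This sketch assumes UD-style `(upos, feats)` input such as Stanza produces, counts the distinct `Case` values among nominal tokens, and expresses the singular/plural relation as the share of singular forms among number-marked nominals. The study's exact operationalisation is not given in the abstract, so both definitions and the function name `text_level_features` are assumptions.

```python
# Nominal parts of speech as UPOS tags (nouns, pronouns, adjectives, numerals).
NOMINAL_POS = {"NOUN", "PRON", "ADJ", "NUM"}


def text_level_features(tokens):
    """Return (number of distinct cases, share of singular forms) for one text.

    tokens: iterable of (upos, feats) pairs with UD feature strings, e.g.
    ("NOUN", "Case=Ill|Number=Plur"); a hypothetical input format.
    The singular share is None if no nominal is marked for number.
    """
    cases = set()
    sing = plur = 0
    for upos, feats in tokens:
        if upos not in NOMINAL_POS or not feats:
            continue
        fields = dict(item.split("=", 1) for item in feats.split("|"))
        if "Case" in fields:
            cases.add(fields["Case"])
        if fields.get("Number") == "Sing":
            sing += 1
        elif fields.get("Number") == "Plur":
            plur += 1
    marked = sing + plur
    singular_share = sing / marked if marked else None
    return len(cases), singular_share


tokens = [
    ("NOUN", "Case=Nom|Number=Sing"),
    ("NOUN", "Case=Tra|Number=Sing"),
    ("ADJ", "Case=Gen|Degree=Cmp|Number=Plur"),
    ("NOUN", "Case=Gen|Number=Plur"),
    ("PUNCT", None),
]
print(text_level_features(tokens))  # (3, 0.5)
```

Counting distinct cases is sensitive to text length, so texts of very different lengths may need a length-controlled variant before comparison across levels.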

Furthermore, the study indicates that it is beneficial to observe inflection-based features separately for each PoS when analysing L2 development. Firstly, the PoS-specific frequencies of some grammatical categories increase at different stages of proficiency. Secondly, changes may emerge for certain PoSs only.

The identified criterial features could be used for automated assessment of Estonian L2 writings alongside lexical, syntactic and other linguistic features. The results can also help to specify the CEFR level descriptions for Estonian.


Keywords: natural language processing, morphology, CEFR levels, written learner language, Estonian



DOI: http://dx.doi.org/10.5128/533



Copyright (c) 2022 Kais Allkivi-Metsoja

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

ISSN 1736-2563 (print)
ISSN 2228-0677 (online)
DOI 10.5128/ERYa.1736-2563