“A linguist shakes valency patterns out of the corpus into a list”: first steps in the automatic detection of valency patterns
Abstract
https://doi.org/10.5128/ERYa21.16
The article describes a computational linguistics experiment whose aim is to create an initial workflow for the corpus-based automatic detection of Estonian valency patterns. The material consists of corpora annotated automatically and manually according to the Universal Dependencies annotation scheme. The valency patterns of 28 caused-motion verbs serve as the test dataset. The proposed method forms pairs of a verb and its direct dependents, removes the pairs that fall below a 5% frequency threshold, and combines the remaining pairs into valency patterns. The method proved effective at detecting verb-dependent pairs, but the quality of detecting complete valency patterns was poorer. The article analyzes the types and causes of detection errors in detail, proposes a solution for each specific cause of error, and sets out directions for further development. The method also proved useful for detecting cross-verb patterns and patterns that had not been described before.
***
"Extracting valency patterns from a syntactically annotated corpus"
The article presents a computational linguistics experiment on automating the detection of Estonian valency patterns, using caused-motion verbs as a case study. The goal is to develop an initial workflow for corpus-based automatic detection of valency patterns in Estonian, building on the Universal Dependencies annotation system (de Marneffe et al. 2021). A dataset of 28 caused-motion verbs was used to test the proposed method, which pairs each verb with its direct dependents, discards pairs that fall below a 5% frequency threshold, and combines the remaining pairs into valency patterns. The method performed well in identifying verb-dependent pairs (81.2% recall, 69.9% precision) but was less effective at detecting complete valency patterns (33.7% recall, 33% precision). The analysis addresses the various detection errors and offers targeted solutions to improve performance. Key obstacles included the misidentification of adverbial modifiers as arguments, the failure to detect certain oblique cases (notably the illative), and the exclusion of some arguments. Despite these challenges, the method succeeded in identifying previously undocumented valency patterns, such as a unique structure for the verb liigutama (‘to move/to touch’) with an emotional context and another for pistma (‘to put’) denoting specific syntactic roles (Goal and Recipient). The authors' evaluation highlights methodological limitations, such as filtering biases and insufficient corpus size, which affected precision and recall. Proposed improvements include adjusting frequency thresholds, refining filters for phraseological verbs, and improving the distinction between arguments and adjuncts. These advancements aim to refine automatic valency pattern detection and to improve the presentation of grammatical information in Estonian lexicographical resources. The study contributes insights for the computational processing of Estonian syntax and suggests further directions for the efficient representation of valency patterns.
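The three steps of the method (pairing a verb with its direct dependents, dropping pairs below the 5% frequency threshold, and combining what remains into patterns) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy data, the label format (`relation:Case`), the function name `extract_patterns`, the choice to measure a pair's frequency as its share of the verb's occurrences, and the way surviving pairs are combined per clause are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: one entry per verb occurrence, holding the verb
# lemma and the labels of its direct dependents (UD relation plus case).
CLAUSES = [("pistma", ["nsubj:Nom", "obj:Gen", "obl:Ill"])] * 29
CLAUSES = CLAUSES + [("pistma", ["nsubj:Nom", "obj:Gen", "obl:Ill", "advmod"])]


def extract_patterns(clauses, threshold=0.05):
    """Return {verb: Counter({pattern: count})} after frequency filtering."""
    verb_totals = Counter()
    pair_counts = defaultdict(Counter)

    # Step 1: count verb-dependent pairs (once per clause).
    for verb, deps in clauses:
        verb_totals[verb] += 1
        for dep in set(deps):
            pair_counts[verb][dep] += 1

    # Steps 2 and 3: keep only dependents whose pair reaches the threshold
    # (assumed here to be a share of the verb's occurrences), then treat
    # the surviving dependents of each clause as one pattern.
    patterns = defaultdict(Counter)
    for verb, deps in clauses:
        kept = {
            dep for dep in deps
            if pair_counts[verb][dep] / verb_totals[verb] >= threshold
        }
        if kept:
            patterns[verb][tuple(sorted(kept))] += 1
    return patterns


if __name__ == "__main__":
    for verb, counts in extract_patterns(CLAUSES).items():
        for pattern, n in counts.most_common():
            print(verb, pattern, n)
```

In this toy data the `advmod` dependent occurs in 1 of 30 clauses, which is below the 5% threshold, so it is dropped and all 30 clauses collapse into a single pattern. This kind of frequency filtering loosely mirrors the filtering step named in the abstract, but it cannot by itself separate frequent adjuncts from arguments, which is one of the error sources the abstract reports.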
Copyright (c) 2025 Kertu Saul, Kadri Muischnek, Jelena Kallas

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ISSN 1736-2563 (print)
ISSN 2228-0677 (online)
DOI 10.5128/ERYa.1736-2563