Sõnarõhu märkimine sõnastikes ja selle mõju kõnesünteesile

Liisi Piits, Heete Sahkai, Meelis Mihkla, Indrek Hein, Hille Pajupuu, Liis Ermus

Abstract


https://doi.org/10.5128/ERYa22.07

The article discusses problems in marking Estonian word stress in dictionaries and the effect of that marking on the quality of speech synthesis that takes pronunciation marks into account. The pronunciation marking system used in the contemporary dictionaries of the Institute of the Estonian Language does not allow distinguishing between loanwords with initial and non-initial stress when they contain a third quantity degree mark or a long vowel in a non-initial syllable beyond the second syllable. By default, primary stress is then assigned to the non-initial syllable, although nearly a quarter of such words may in fact be pronounced with primary stress on the initial syllable. In this study we propose a marking system that would allow the primary stress of such words to be described adequately in dictionaries and rendered by a speech synthesizer, and we trained a new version of the isolated-word synthesizer, which takes pronunciation marks into account and can synthesize words of this type with the desired stress.

***

"Lexicographic marking of Estonian primary word stress and its influence on text-to-speech synthesis" 

Estonian has traditionally been described as a language with fixed primary word stress on the first syllable. However, there is a relatively large group of words that are described as having non-initial or variable primary stress but have not been studied in detail. These are loanwords that are described as containing a third quantity degree or a long vowel in a non-initial syllable.

The pronunciation marking system currently used in the dictionaries of the Institute of the Estonian Language does not allow distinguishing between words with initial, non-initial and variable primary stress within this group. This gap in the description of Estonian word stress creates problems for language learners as well as for the Estonian isolated-word text-to-speech synthesizer, which serves pedagogical and speech therapy purposes. The synthesizer was trained on sound files recorded to exemplify the pronunciation of the headwords of the Combined Dictionary of the Institute of the Estonian Language, and on the corresponding text files, which used the dictionary’s pronunciation marking system. Because of this gap in the dictionary’s stress marking system, the training data contained mismatches between the text and sound files, and these mismatches became evident in the synthesizer’s output. In this study, we propose a new stress marking system that would allow this type of word stress to be described adequately in dictionaries and synthesized correctly by the isolated-word speech synthesizer. Using the new stress marking system, we trained a new version of the synthesizer, which can generate words of this type with the desired stress placement.

In order to implement the new labeling system in dictionaries, the stress patterns of this particular group of words need to be studied in more detail.


Keywords


primary stress; speech synthesizer; dictionaries; stress marking symbols; Estonian






Copyright (c) 2026 Liisi Piits, Heete Sahkai, Meelis Mihkla, Indrek Hein, Hille Pajupuu, Liis Ermus

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

ISSN 1736-2563 (print)
ISSN 2228-0677 (online)
DOI 10.5128/ERYa.1736-2563