Compiling the Dictionary of Word Associations in Estonian: From scratch to the database

Ene Vainik

Abstract


The present paper describes the project “The Dictionary of Word Associations in Estonian”, undertaken by the author at the Institute of the Estonian Language. The general aim of the Dictionary is to provide insight into the common-sense mind of Estonians. It is meant as a tool for self-reflection for native speakers of Estonian and as a guide for foreigners who wish to familiarise themselves with Estonian cultural patterns of thought. The Dictionary will be published online. The number of keywords was initially limited to approximately 800. Particular emphasis is given to the data collection stage, which was carried out according to the principles of citizen science.

***

Compiling the Dictionary of Word Associations in Estonian: From scratch to the database (summary translated from Estonian)

The article describes the first stages of compiling the “Dictionary of Word Associations in Estonian”, from planning up to a database containing the raw data. It first gives an overview of the basic concepts (association, word association, dictionary of associations vs. association norms) and of the terminology used in the literature. This is followed by an overview of the history of word association research and a statement of the theoretical premises of the dictionary project: a) words are characterised by their connections to other words; b) bringing out these connections is an important lexicographic task; c) associations can be identified only by testing people.

The next part describes the work done and justifies the practical choices made. It explains the principles behind compiling the keyword list and the tests, the need to launch a citizen science campaign to recruit participants, and how that campaign proceeded. The last part of the article justifies the choice of data storage (a relational database) and describes the structure of the database and the data import procedures. Table 2 gives a numerical overview of the database on which the dictionary is based.

At the end of the article, the advantages and weaknesses of the choices made are discussed. Collecting data through citizen science was judged a successful undertaking, both in terms of steadily growing participation and in terms of successful test completion (see Table 1). Since citizen science partners tend to be female and more highly educated, the effect of these factors was checked by statistical analysis. The results showed that sex, age and occupation did not affect the stereotypy of the responses, whereas a higher level of education did. The collected dataset is therefore likely to contain more stereotypical associations than the general population would produce. The author regards this as an advantage rather than a drawback, since the aim of the dictionary is precisely to collect the most typical associations, and, given the large volume of data, one-off responses are excluded from the dictionary in any case. All responses, together with data on the respondents’ sex, age, education, etc., remain in the database for future research.
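The summary notes that, given the volume of data, one-off responses are left out of the dictionary, while all raw responses stay in the database. A minimal Python sketch of such a filtering step is given below, assuming the raw data can be read as (keyword, response) pairs. The function name `build_entries`, the threshold `min_count=2`, the case and whitespace normalisation, and the toy Estonian responses (kodu ‘home’, maja ‘house’, pere ‘family’, soojus ‘warmth’) are all hypothetical illustrations, not the project’s actual import or selection procedure.

```python
from collections import Counter, defaultdict


def build_entries(pairs, min_count=2):
    """Group (keyword, response) pairs into per-keyword response lists.

    Hypothetical rule: a response must occur at least `min_count` times for
    the same keyword to be kept, so one-off responses are dropped.
    """
    counts = defaultdict(Counter)
    for keyword, response in pairs:
        # Assumed normalisation: trim whitespace and ignore letter case.
        counts[keyword][response.strip().lower()] += 1

    entries = {}
    for keyword, counter in counts.items():
        # most_common() lists responses from most to least frequent.
        entries[keyword] = [
            (response, n) for response, n in counter.most_common() if n >= min_count
        ]
    return entries


# Invented toy data for illustration only (not from the project's database).
pairs = [
    ("kodu", "maja"), ("kodu", "pere"), ("kodu", "Maja"),
    ("kodu", "soojus"), ("kodu", "pere "), ("kodu", "maja"),
]
print(build_entries(pairs))  # {'kodu': [('maja', 3), ('pere', 2)]}
```

The filter only shapes the dictionary view: the input pairs are not modified, which fits the summary’s point that every response remains available in the database for later research.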


Keywords


word association, mental lexicon, lexicography, e-dictionary, citizen science, crowdsourcing, Estonian






DOI: http://dx.doi.org/10.5128/ERYa14.14



Copyright (c) 2018 Ene Vainik

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

ISSN 1736-2563 (print)
ISSN 2228-0677 (online)
DOI 10.5128/ERYa.1736-2563