Please use this identifier to cite or link to this item: https://hdl.handle.net/10321/5570
DC Field | Value | Language
dc.contributor.advisor | Ojo, Sunday O. | -
dc.contributor.advisor | Olugbara, Oludayo O. | -
dc.contributor.author | Moape, Tebatso Gorgina | en_US
dc.date.accessioned | 2024-10-07T12:55:12Z | -
dc.date.available | 2024-10-07T12:55:12Z | -
dc.date.issued | 2024 | -
dc.identifier.uri | https://hdl.handle.net/10321/5570 | -
dc.description | Submitted in Fulfilment of the requirements of the Degree of Doctor of Philosophy in Information Technology, Durban University of Technology, Durban, South Africa, 2024. | en_US
dc.description.abstract | There are several challenges that hinder the development of Setswana-to-English machine translation systems. A key obstacle is the absence of machine-readable knowledge resources, which has forced reliance on the only accessible data, originating from the government domain. While training machine-translation systems on government-domain data can provide specialized language knowledge, it also introduces problems such as limited vocabulary, style variation, bias, and domain specificity. Furthermore, the literature notes that polysemy remains a persistent problem that reduces the overall accuracy of machine-translation systems. Polysemy is a linguistic phenomenon in which a single word or phrase has multiple senses, resulting in ambiguity. The task of resolving such ambiguity in natural language processing (NLP) is known as word sense disambiguation (WSD). WSD serves as an intermediate task for improving text understanding in NLP applications, including machine translation, information retrieval, and text summarization. Its central role is to make these applications more effective and efficient by ensuring that the appropriate sense of a polysemous word is selected in each context. This study addresses these challenges by proposing three components: a diversity-aware, machine-readable knowledge resource for Setswana-English, the Setswana universal knowledge core (SUKC); a WSD approach for resolving lexical ambiguity; and a corresponding machine-translation model with an embedded WSD capability. To this end, Setswana-English data was first collected from existing paper-based bilingual dictionaries. Second, professional translators were employed to translate space-domain concepts from English into Setswana. The collected lexicon was then integrated into the universal knowledge core (UKC).
The Lesk algorithm, which researchers have adapted for different languages over the years, was employed to address the inherent polysemy challenges. Specifically, the study used a simplified Lesk-based algorithm to resolve polysemy in Setswana, using a bidirectional encoder representations from transformers (BERT) model for Setswana to embed Setswana glosses and the cosine similarity measure to compute semantic similarity, thereby determining the correct sense. For machine translation, the study employed a rule-based method with the WSD algorithm embedded in it. The translation accuracy of the machine-readable dictionary was assessed with the developed machine-translation model and evaluated using the BLEU score. The proposed model was tested on a mixture of sentences with and without ambiguous words and achieved a BLEU score of 34.89. | en_US
dc.format.extent | 265 p | en_US
dc.language.iso | en | en_US
dc.subject | Setswana-to-English | en_US
dc.subject | Machine-readable knowledge resources | en_US
dc.subject | Machine-translation system | en_US
dc.subject.lcsh | Tswana language | en_US
dc.subject.lcsh | Translating and interpreting | en_US
dc.subject.lcsh | Machine translating | en_US
dc.subject.lcsh | English language | en_US
dc.subject.lcsh | Tswana language--Translating into English | en_US
dc.title | Knowledge-based word sense disambiguation for Setswana-English machine translation | en_US
dc.type | Thesis | en_US
dc.description.level | D | en_US
dc.identifier.doi | https://doi.org/10.51415/10321/5570 | -
local.sdg | SDG04 | en_US
local.sdg | SDG10 | en_US
local.sdg | SDG16 | en_US
item.grantfulltext | open | -
item.cerifentitytype | Publications | -
item.fulltext | With Fulltext | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.openairetype | Thesis | -
item.languageiso639-1 | en | -
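The disambiguation step described in the abstract (a simplified Lesk algorithm in which glosses are embedded and the sense whose gloss is most similar to the context is chosen by cosine similarity) can be illustrated with a minimal sketch. It assumes toy English data for readability, assumes the context sentence is embedded the same way as the glosses, and replaces the Setswana BERT embeddings used in the study with bag-of-words count vectors so it runs on the Python standard library alone. The sense inventory, stop-word list, and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list so function words do not dominate similarity
# (an assumption of this sketch, not taken from the study).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "that", "is", "at", "for", "by", "from"}


def embed(text: str) -> Counter:
    """Bag-of-words count vector; a stand-in for a BERT sentence embedding."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    """Return the sense id whose gloss vector is most similar to the context vector.

    On a tie (e.g. no overlap with any gloss), max() keeps the first sense listed.
    """
    context_vec = embed(context)
    return max(senses, key=lambda sense_id: cosine(context_vec, embed(senses[sense_id])))


# Hypothetical two-sense inventory for the English word "bank".
BANK_SENSES = {
    "bank.finance": "a financial institution that accepts deposits of money and lends money",
    "bank.river": "the sloping land beside a river or lake",
}

if __name__ == "__main__":
    print(simplified_lesk("she deposited her money at the bank", BANK_SENSES))  # → bank.finance
    print(simplified_lesk("they fished from the bank of the river", BANK_SENSES))  # → bank.river
```

Swapping `embed` for a contextual sentence encoder would bring the sketch closer to the study's approach; the selection rule (pick the sense with the highest gloss-context cosine similarity) stays the same.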
Appears in Collections: Theses and dissertations (Accounting and Informatics)
Files in This Item:
File | Description | Size | Format
Moape_TG_2024.pdf |  | 4.6 MB | Adobe PDF

Page view(s): 83 (checked on Dec 13, 2024)
Download(s): 98 (checked on Dec 13, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.