Before this project, two spellcheckers were available for the Nguni language isiXhosa (Spellchecker.net and an OpenOffice plugin), both of which are limited in functionality, scope and accuracy. The main objective of this project was to investigate the feasibility of developing an error detector for isiXhosa using a rule-based approach, and then to compare its accuracy with that of the statistical approach (presented in Nthabiseng Moshiane's paper).
Since a rule-based spellchecker uses the grammar rules of the language to check for spelling errors, we had to define these grammar rules as finite state transducers. Due to time constraints, we only developed rules for specific part-of-speech (POS) categories of the language: nouns, verbs, adjectives, pronouns and possessives. These were chosen based on the morphology books we consulted, and the effect of each rule on the entire system was evaluated.
The error detector was implemented as a finite state transducer network written in SFST-PL (the programming language of the SFST tool), which supports many regular-expression formats, such as those used in grep, sed or Perl.
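To illustrate the idea of detecting errors with a finite state transducer, here is a minimal Python sketch. It is not the project's SFST-PL code: the state numbers, the tag names (`NounClass1+`, `[stem]`) and the four-entry toy lexicon are assumptions made purely for illustration. A word is accepted only if some path through the network consumes it completely, and any word with no analysis is flagged as a possible error.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network (an assumption, not the project's rule set):
# state 0 --noun prefix--> state 1 --noun stem--> state 2 (final).
# Each transition is (surface morpheme, output tag, next state).
TRANSITIONS: Dict[int, List[Tuple[str, str, int]]] = {
    0: [("um", "NounClass1+", 1), ("aba", "NounClass2+", 1)],
    1: [("ntu", "ntu[stem]", 2), ("fazi", "fazi[stem]", 2)],
}
FINAL_STATES = {2}


def analyse(word: str) -> List[str]:
    """Return every analysis the toy transducer produces for `word`."""
    results: List[str] = []
    # Each stack entry is (current state, position in word, output so far).
    stack = [(0, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in FINAL_STATES:
            results.append(out)
            continue
        for surface, tag, nxt in TRANSITIONS.get(state, []):
            # Follow a transition only if its morpheme matches at `pos`.
            if word.startswith(surface, pos):
                stack.append((nxt, pos + len(surface), out + tag))
    return results


def is_flagged(word: str) -> bool:
    """A word is flagged as a possible error if it has no analysis."""
    return not analyse(word.lower())


if __name__ == "__main__":
    for w in ["umntu", "abantu", "abafazi", "umtnu"]:
        print(w, "flagged" if is_flagged(w) else analyse(w))
```

In this sketch the analysis string doubles as a rough POS tag, which mirrors how a rule-based checker can both tag and validate a word in one pass. A real network would need far more prefixes, stems and morphophonological rules than this toy lexicon covers.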
We used Java Swing to implement the interface shown on the left. The interface was built for testing the rules, and no user evaluations were conducted for it. To test the system we used a corpus of 21852 words received from the African Language section in the University of Cape Town and a text file of 20826 nouns downloaded from the RMA website. We first tested each transducer for POS tagging and then combined the transducers to form the complete system. In terms of POS tagging, the accuracy was 88.26% for the noun rules, 94.58% for the verb rules, 97.91% for the adjective rules, 98.12% for the pronoun rules and 100% for the possessive rules. The overall spellchecking accuracy of the system was 80.08%.
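As a rough illustration of how an accuracy figure of this kind can be computed, the sketch below assumes accuracy means the fraction of test words for which the detector's verdict matches a reference label. That definition, the function name and the sample data are assumptions for illustration only; the project report should be consulted for the metric actually used.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], expected: Sequence[bool]) -> float:
    """Fraction of words where the detector's verdict matches the reference.

    Each element is True if the word is judged correctly spelled.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


if __name__ == "__main__":
    # Hypothetical verdicts for four test words (not the project's data).
    detector_says_correct = [True, False, True, True]
    reference_is_correct = [True, False, False, True]
    print(f"{accuracy(detector_says_correct, reference_is_correct):.2%}")
```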
The project has shown that it is feasible to develop an error detector for isiXhosa using the rule-based approach, and it resulted in the successful implementation of a morphological analyser with noun, verb, adjective, pronoun and possessive rules. The rule-based approach (presented here) had a higher spellchecking accuracy than the statistical approach (presented in Nthabiseng Moshiane's paper).