How can an NLG system that correctly verbalises numbers in all contexts while generating understandable isiZulu sentences be built?
The task of verbalising numbers in an agglutinating language such as isiZulu depends on the object (noun) being qualified. Numbers behave as adjectives, and isiZulu adjectives have the form concord+stem [6]. The noun being qualified determines the concord of the adjective. Each noun belongs to a noun class, and information derived from that noun class is used to form the correct concord. The information derivable from noun classes that is useful for verbalising numbers includes, but is not limited to, prefixes, possessive particles and pronouns.
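To make the concord+stem rule concrete, here is a minimal Java sketch. It is not the authors' implementation. It assumes a small hand-written table of adjectival concords for four plural noun classes (2, 4, 6 and 10) and the adjectival stems for the numbers two to five. The class name `AdjectivalNumber`, its methods and the sound-change handling in `join` are hypothetical, and `join` covers only the three class 10 nasal cases noted in its comments.

```java
import java.util.Map;

/**
 * Hypothetical sketch: forms an isiZulu number adjective as concord + stem
 * for the numbers 2-5 and places it after the noun it qualifies.
 */
class AdjectivalNumber {

    // Assumed subset: adjectival concords for a few plural noun classes.
    static final Map<Integer, String> CONCORD = Map.of(
            2, "aba",   // e.g. abantu (people)
            4, "emi",   // e.g. imizuzu (minutes)
            6, "ama",   // e.g. amahora (hours)
            10, "ezin"  // e.g. izinsuku (days)
    );

    // Adjectival number stems for two to five.
    static final Map<Integer, String> STEM = Map.of(
            2, "bili", 3, "thathu", 4, "ne", 5, "hlanu");

    /** Joins concord and stem, applying only the class 10 nasal changes listed here. */
    static String join(String concord, String stem) {
        if (concord.endsWith("n")) {
            String base = concord.substring(0, concord.length() - 1);
            if (stem.startsWith("b")) return base + "m" + stem;             // ezin + bili   -> ezimbili
            if (stem.startsWith("th")) return concord + "t" + stem.substring(2); // ezin + thathu -> ezintathu
            if (stem.startsWith("n")) return base + stem;                   // ezin + ne     -> ezine
        }
        return concord + stem; // no sound change needed
    }

    /** Noun first, then the number agreeing with the noun's class. */
    static String verbalise(String noun, int nounClass, int number) {
        String concord = CONCORD.get(nounClass);
        String stem = STEM.get(number);
        if (concord == null || stem == null) {
            throw new IllegalArgumentException("unsupported noun class or number");
        }
        return noun + " " + join(concord, stem);
    }

    public static void main(String[] args) {
        System.out.println(verbalise("abantu", 2, 2));    // abantu ababili
        System.out.println(verbalise("amahora", 6, 3));   // amahora amathathu
        System.out.println(verbalise("izinsuku", 10, 2)); // izinsuku ezimbili
        System.out.println(verbalise("izinsuku", 10, 3)); // izinsuku ezintathu
    }
}
```

A fuller module would derive the concord from the noun's class prefix instead of a fixed table, and it would need many more phonological rules than the three shown.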
The example on the right demonstrates the verbalisation differences between isiZulu and English. In isiZulu, the verbalisation of the number is affected by the noun being qualified, which results in the concord "eziy"; in English this is not the case. A second, subtler difference is word order: in isiZulu the noun is followed by the number, whereas in English the number precedes the noun [6].
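The "eziy" concord and the word-order difference can be sketched in the same way. This hypothetical Java sketch assumes that numbers from six upwards are nouns (for example isithupha, "six"). Each such noun is attached to the qualified noun through a relative concord followed by the copulative "y", which gives "eziy" for a class 10 noun. The class name `NounNumeral`, the concord table and the example nouns are illustrative assumptions. The `english` method exists only to contrast the word order.

```java
import java.util.Map;

/**
 * Hypothetical sketch: numbers 6-10 as nouns, joined to the qualified noun
 * by a relative concord plus the copulative "y" (class 10: ezi + y -> "eziy").
 */
class NounNumeral {

    // Assumed subset: relative concords for a few plural noun classes.
    static final Map<Integer, String> RELATIVE_CONCORD = Map.of(
            2, "aba", 4, "e", 6, "a", 10, "ezi");

    // Number nouns for six to ten.
    static final Map<Integer, String> NUMERAL = Map.of(
            6, "isithupha", 7, "isikhombisa", 8, "isishiyagalombili",
            9, "isishiyagalolunye", 10, "ishumi");

    /** isiZulu order: noun first, then the number agreeing with it. */
    static String isiZulu(String noun, int nounClass, int number) {
        String concord = RELATIVE_CONCORD.get(nounClass);
        String numeral = NUMERAL.get(number);
        if (concord == null || numeral == null) {
            throw new IllegalArgumentException("unsupported noun class or number");
        }
        return noun + " " + concord + "y" + numeral;
    }

    /** English order: number first, then the noun; no agreement is needed. */
    static String english(String noun, int number) {
        return number + " " + noun;
    }

    public static void main(String[] args) {
        System.out.println(isiZulu("izinkomo", 10, 6)); // izinkomo eziyisithupha
        System.out.println(english("cows", 6));         // 6 cows
    }
}
```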
See the research paper for the comprehensive set of rules used. The Java algorithms were built following these and other grammar rules.
We followed the three-module pipeline architecture [7], the de facto NLG architecture. The text planner takes a user's financial transaction history as input and performs content selection, determining which transactions to report on. It creates a message, which is forwarded to the sentence planner. The sentence planner is responsible for deciding on the overall structure of the message [7]. It opens an external Excel file of templates and, depending on the arguments in the message, selects the template most appropriate for conveying that message. The resulting sentence plan consists of an unfilled template and the message arguments that will be used to fill its slots. The linguistic realiser fills those slots with the message arguments and applies the appropriate grammar rules to ensure that the generated sentence is grammatically correct. Linguistic realisation has been a stumbling block for previous isiZulu NLG research because of the language's structure and the lack of computational resources; we are therefore limited to grammar-infused templates as the method of realisation. The linguistic realiser calls the number verbalising algorithms and uses their output to fill some of the template slots.
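The flow through the three modules can be outlined as follows. This is a hypothetical Java sketch, not the system described here. An in-memory map stands in for the Excel template file, and the content-selection rule (report transactions above a threshold) is an assumption. The template text is a language-neutral placeholder rather than a grammar-infused isiZulu template, and the number verbaliser is passed in as a stub function. All type and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical outline: text planner -> sentence planner -> linguistic realiser. */
class Pipeline {

    record Transaction(String payee, int amount) {}
    record Message(String type, Map<String, String> args) {}
    record SentencePlan(String template, Map<String, String> args) {}

    // Text planner: content selection (assumed rule: keep transactions above a threshold).
    static List<Message> planText(List<Transaction> history, int threshold) {
        List<Message> messages = new ArrayList<>();
        for (Transaction t : history) {
            if (t.amount() > threshold) {
                messages.add(new Message("payment",
                        Map.of("payee", t.payee(), "amount", Integer.toString(t.amount()))));
            }
        }
        return messages;
    }

    // Sentence planner: choose the template for the message type
    // (an in-memory map stands in for the external Excel template file).
    static SentencePlan planSentence(Message m, Map<String, String> templates) {
        String template = templates.get(m.type());
        if (template == null) {
            throw new IllegalArgumentException("no template for " + m.type());
        }
        return new SentencePlan(template, m.args());
    }

    // Linguistic realiser: fill each {slot}; the "amount" slot goes through the number verbaliser.
    static String realise(SentencePlan plan, UnaryOperator<String> numberVerbaliser) {
        String text = plan.template();
        for (Map.Entry<String, String> slot : plan.args().entrySet()) {
            String value = slot.getKey().equals("amount")
                    ? numberVerbaliser.apply(slot.getValue())
                    : slot.getValue();
            text = text.replace("{" + slot.getKey() + "}", value);
        }
        return text;
    }

    public static void main(String[] args) {
        List<Transaction> history = List.of(
                new Transaction("PayeeA", 250), new Transaction("PayeeB", 20));
        Map<String, String> templates = Map.of("payment", "[payment] payee={payee} amount={amount}");
        UnaryOperator<String> stub = digits -> "<" + digits + ">"; // stand-in for the isiZulu verbaliser
        for (Message m : planText(history, 100)) {
            System.out.println(realise(planSentence(m, templates), stub));
            // prints "[payment] payee=PayeeA amount=<250>"
        }
    }
}
```

Passing the verbaliser in as a function keeps the realiser independent of the grammar rules. The specialised numbers module can then be developed and tested separately from the templates.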
The research question this work aimed to answer was: How can an NLG system that appropriately verbalises numbers in all contexts while generating understandable isiZulu sentences be built? It was previously shown that grammar-infused templates are required to produce grammatically correct and understandable isiZulu sentences [2]. This work has shown that building an NLG system that verbalises numbers correctly in all contexts while generating understandable isiZulu sentences requires two things: a specialised numbers module in the natural language generator, and grammar-infused templates as the method of linguistic realisation. This specialised numbers module must be developed following the grammar rules of the particular language.
[1] Aditi Sharma Grover, Gerhard B Van Huyssteen, and Marthinus W Pretorius. 2011. The South African human language technology audit. Language Resources and Evaluation 45, 3 (2011), 271–288.
[2] Zola Mahlaza and C Maria Keet. 2019. A classification of grammar-infused templates for ontology and model verbalisation. In Proceedings of the Research Conference on Metadata and Semantics Research. Springer, 64–76.
[3] Brigitte van Schouwenburg and Marné Pienaar. 2005. Taalbeleid aan finansiële instellings. Southern African Linguistics and Applied Language Studies 23, 4 (2005), 335–347. https://doi.org/10.2989/16073610509486394
[4] Margie J Probyn. 2005. Learning science through two languages in South Africa. In Proceedings of the International Symposium on Bilingualism. 1855–1873.
[5] Maria Keet and Langa Khumalo. 2014. Toward verbalizing ontologies in isiZulu. In Proceedings of the International Workshop on Controlled Natural Language. Springer, 78–89.
[6] Arnett Wilkes and Nikolias Nkosi. 2012. Complete Zulu Beginner to Intermediate Book and Audio Course: Learn to read, write, speak and understand a new language with Teach Yourself. Hachette UK.
[7] Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Natural Language Engineering 3, 1 (1997), 57–87.