DAFT

Research Question

How can an NLG system that correctly verbalises numbers in all contexts while generating understandable isiZulu sentences be built?

Verbalising of Numbers

The task of verbalising numbers in an agglutinating language such as isiZulu depends on the object (noun) being qualified. Numbers are adjectives, and isiZulu adjectives have the form concord+stem [6]. The noun being qualified determines the concord of the adjective. Each noun belongs to a noun class, and information derived from that noun class is used to form the correct concord. Information that can be derived from the noun classes and is useful for verbalising numbers includes, but is not limited to, prefixes, possessive particles and pronouns.

The example on the right demonstrates the verbalisation differences between isiZulu and English. In the example, the verbalisation of the number in isiZulu is affected by the noun being qualified, resulting in the formation of the concord "eziy", while in English this is not the case. Another subtle difference is the order of verbalisation: the noun is followed by the number in isiZulu, and vice versa in English [6].
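The concord+stem pattern can be illustrated with a minimal Java sketch. This is not the rule set from the research paper: the class name, method names and the two small lookup tables are assumptions made for illustration. It covers only three plural noun classes and the number stems for two to five, and it does not attempt sound changes or other concord patterns such as the "eziy" form in the example.

```java
import java.util.Map;

final class NumberConcordSketch {

    // Adjective concords for a few plural noun classes (illustrative subset, an assumption).
    private static final Map<Integer, String> ADJECTIVE_CONCORD = Map.of(
            2, "aba",   // e.g. abantu (people)
            4, "emi",   // e.g. imithi (trees)
            6, "ama");  // e.g. amadoda (men)

    // Adjectival number stems for two to five (illustrative subset, an assumption).
    private static final Map<Integer, String> NUMBER_STEM = Map.of(
            2, "bili",
            3, "thathu",
            4, "ne",
            5, "hlanu");

    /** Builds concord + stem for a number qualifying a noun of the given class. */
    static String verbaliseNumber(int number, int nounClass) {
        String concord = ADJECTIVE_CONCORD.get(nounClass);
        String stem = NUMBER_STEM.get(number);
        if (concord == null || stem == null) {
            throw new IllegalArgumentException(
                    "Not covered by this sketch: number " + number
                            + ", noun class " + nounClass);
        }
        return concord + stem;
    }

    /** Uses the isiZulu order: the noun first, then the number. */
    static String qualify(String noun, int nounClass, int number) {
        return noun + " " + verbaliseNumber(number, nounClass);
    }

    public static void main(String[] args) {
        System.out.println(qualify("abantu", 2, 2));   // abantu ababili
        System.out.println(qualify("imithi", 4, 3));   // imithi emithathu
        System.out.println(qualify("amadoda", 6, 5));  // amadoda amahlanu
    }
}
```

The same stem takes a different concord for each noun class, which is why the noun class has to be known before a number can be verbalised.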

See the research paper for the comprehensive set of rules used. The Java algorithms were built following these and other grammar rules.

NLG Architecture

We followed the three-module pipeline architecture [7], the de facto NLG architecture, for our work. The text planner takes a user's financial transaction history as input and performs content selection, determining which transactions to report on. It creates a message and forwards it to the sentence planner, which is responsible for deciding on the overall structure of the message [7]. The sentence planner opens an external Excel file of templates and, depending on the arguments of the message, selects the most appropriate template to convey that message. The resulting sentence plan consists of an unfilled template and the message arguments that will be used to fill its slots. The linguistic realiser fills those unfilled template slots with the message arguments and applies the appropriate grammar rules to ensure that the generated sentence is grammatically correct. The linguistic realisation phase has been a stumbling block for previous isiZulu NLG research because of the language's structure and the lack of computational resources, so we are limited to grammar-infused templates as the method of realisation. The linguistic realiser calls the number verbalising algorithms and uses their output to fill some of the template slots.
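The flow through the three modules can be sketched in Java as follows. This is a rough sketch under stated assumptions, not the system's actual code: every class, field and method name is hypothetical, the templates are in-memory placeholders rather than rows read from the Excel file, the content selection rule (a minimum amount) is invented for illustration, and the number verbaliser is a stub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NlgPipelineSketch {

    /** Hypothetical stand-in for the number verbalising algorithms. */
    interface NumberVerbaliser {
        String verbalise(int number, int nounClass);
    }

    /** One entry of the transaction history (fields are assumptions). */
    static final class Transaction {
        final String type; final String noun; final int nounClass; final int amount;
        Transaction(String type, String noun, int nounClass, int amount) {
            this.type = type; this.noun = noun; this.nounClass = nounClass; this.amount = amount;
        }
    }

    /** Message created by the text planner: a type plus the arguments for the slots. */
    static final class Message {
        final String type; final String noun; final int nounClass; final int number;
        Message(String type, String noun, int nounClass, int number) {
            this.type = type; this.noun = noun; this.nounClass = nounClass; this.number = number;
        }
    }

    /** Sentence plan: an unfilled template plus the message that will fill it. */
    static final class SentencePlan {
        final String template; final Message message;
        SentencePlan(String template, Message message) {
            this.template = template; this.message = message;
        }
    }

    // Placeholder templates (not real isiZulu templates); noun slot precedes number slot.
    static final Map<String, String> TEMPLATES = Map.of(
            "DEPOSIT", "DEPOSIT: {noun} {number}",
            "WITHDRAWAL", "WITHDRAWAL: {noun} {number}");

    /** Text planner: content selection. The minimum-amount rule is an assumption. */
    static List<Message> planText(List<Transaction> history, int minAmount) {
        List<Message> messages = new ArrayList<>();
        for (Transaction t : history) {
            if (t.amount >= minAmount) {
                messages.add(new Message(t.type, t.noun, t.nounClass, t.amount));
            }
        }
        return messages;
    }

    /** Sentence planner: picks the template that matches the message type. */
    static SentencePlan planSentence(Message message) {
        String template = TEMPLATES.get(message.type);
        if (template == null) {
            throw new IllegalArgumentException("No template for " + message.type);
        }
        return new SentencePlan(template, message);
    }

    /** Linguistic realiser: fills the slots, delegating the number slot to the verbaliser. */
    static String realise(SentencePlan plan, NumberVerbaliser verbaliser) {
        Message m = plan.message;
        return plan.template
                .replace("{noun}", m.noun)
                .replace("{number}", verbaliser.verbalise(m.number, m.nounClass));
    }

    public static void main(String[] args) {
        // Stub verbaliser: only marks where the verbalised number would appear.
        NumberVerbaliser stub = (number, nounClass) -> "<" + number + "/class" + nounClass + ">";
        List<Transaction> history = List.of(
                new Transaction("DEPOSIT", "amarandi", 6, 5),
                new Transaction("WITHDRAWAL", "amarandi", 6, 1));
        for (Message m : planText(history, 2)) {
            System.out.println(realise(planSentence(m), stub));
        }
    }
}
```

Passing the verbaliser into the realiser as a parameter mirrors the description above, where the linguistic realiser calls the number verbalising algorithms rather than containing them.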

Evaluation and Results

Overall verbalisation accuracy of numbers

A human evaluation with isiZulu speakers was conducted. IsiZulu speakers strongly agree (83%) that the algorithms accurately verbalise numbers in any given context.

Verbalisation accuracy for each type of number

Both types of numbers achieved a verbalisation accuracy score greater than 75%. This shows that verbalisation accuracy is independent of the type of number (cardinal or ordinal) and that isiZulu speakers strongly agree that both verbalisations are accurate.

Verbalisation accuracy for each noun class

All noun classes achieved a verbalisation accuracy greater than 75%, showing that isiZulu speakers strongly agree that the verbalisation is accurate. The scores for all noun classes fall within the same range, showing that verbalisation accuracy is independent of the noun class.

Overall understandability of generated summaries

A human evaluation with isiZulu speakers was conducted. IsiZulu speakers strongly agree (83%) that the algorithms accurately verbalise numbers in any given context.

Understandability of each type of message

Both types of numbers achieved a verbalisation accuracy score greater than 75%, showing that verbalisation accuracy is independent of number type and that isiZulu speakers strongly agree that the verbalisations are accurate.

Conclusion

The research question this work aimed to answer was: How can an NLG system that appropriately verbalises numbers in all contexts while generating understandable isiZulu sentences be built? It was previously shown that grammar-infused templates are required to produce grammatically correct and understandable isiZulu sentences [2]. This work has shown that building an NLG system that verbalises numbers correctly in all contexts while generating understandable isiZulu sentences requires incorporating a specialised numbers module in the natural language generator and using grammar-infused templates as the method of linguistic realisation. This specialised numbers module must be developed following the grammar rules of the particular language.

References

[1] Aditi Sharma Grover, Gerhard B Van Huyssteen, and Marthinus W Pretorius. 2011. The South African human language technology audit. Language Resources and Evaluation 45, 3 (2011), 271–288.

[2] Zola Mahlaza and C Maria Keet. 2019. A classification of grammar-infused templates for ontology and model verbalisation. In Proceedings of the Research Conference on Metadata and Semantics Research. Springer, 64–76.

[3] Brigitte van Schouwenburg and Marné Pienaar. 2005. Taalbeleid aan finansiële instellings. Southern African Linguistics and Applied Language Studies 23, 4 (2005), 335–347. https://doi.org/10.2989/16073610509486394

[4] Margie J Probyn. 2005. Learning science through two languages in South Africa. In Proceedings of the International Symposium on Bilingualism. 1855–1873.

[5] Maria Keet and Langa Khumalo. 2014. Toward verbalizing ontologies in isiZulu. In Proceedings of the International Workshop on Controlled Natural Language. Springer, 78–89.

[6] Arnett Wilkes and Nikolias Nkosi. 2012. Complete Zulu Beginner to Intermediate Book and Audio Course: Learn to read, write, speak and understand a new language with Teach Yourself. Hachette UK.

[7] Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Natural Language Engineering 3, 1 (1997), 57–87.