This study focussed on developing two systems: a rule-based semantic system and a template-based question generation (QG) system. The main differentiating feature of the two systems is how their rules and templates are devised. The semantic system makes use of 8 manually created rules, while the template-based system combines semantic role labelling (SRL) with coded logic to automatically extract templates from sample input questions. The two systems are then compared in terms of question quality and dataset coverage. The template-based system is built with the goal of maintaining the question quality produced by manually created rules while increasing coverage relative to the semantic QG system. This aims to alleviate the tedious work associated with creating templates. In analysing the performance of the systems, no statistically significant difference in question quality was found. This means the template-based system achieved its aim of matching the output quality of manually created rules. In addition, the template-based system was found to possess far greater coverage, due to its larger number of templates/rules.
The template-based system is split into three modules that combine to produce questions: the Template Extraction, Content Extraction and Template Filling modules.
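To make the division of labour between the three modules concrete, the following is a minimal sketch and not the implementation used in this study. It assumes that semantic roles are already available as hand-labelled dictionaries with PropBank-style names (a real system would obtain them from an SRL model), and the function names `extract_template`, `extract_content` and `fill_template` are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical hand-labelled semantic roles (PropBank-style names).
# A real system would obtain these from an SRL model, not hard-coded dicts.
Roles = Dict[str, str]


def extract_template(question: str, roles: Roles) -> str:
    """Template extraction: replace each role-labelled span with a [ROLE] slot."""
    template = question
    for role, span in roles.items():
        template = template.replace(span, f"[{role}]")
    return template


def extract_content(sentence_roles: Roles) -> Roles:
    """Content extraction: keep the non-empty role spans of a source sentence."""
    return {role: span.strip() for role, span in sentence_roles.items() if span.strip()}


def fill_template(template: str, content: Roles) -> Optional[str]:
    """Template filling: substitute slots; return None if a slot cannot be filled."""
    slots = re.findall(r"\[([A-Z0-9-]+)\]", template)
    if any(slot not in content for slot in slots):
        return None
    question = template
    for slot in slots:
        question = question.replace(f"[{slot}]", content[slot])
    return question


# Illustrative sample question and source sentence (not taken from the dataset).
template = extract_template(
    "Who invented the telephone?",
    {"V": "invented", "ARG1": "the telephone"},
)
content = extract_content(
    {"ARG0": "Marie Curie", "V": "discovered", "ARG1": "radium"}
)
print(template)
print(fill_template(template, content))
```

In this toy setting a template is rejected whenever the source sentence lacks one of its roles, which is one simple way to decide whether a template and a sentence are compatible.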
The systems were compared in terms of question quality through an online survey. This involved presenting 15 participants with 10 contexts and the questions the systems generated from these contexts. The participants scored the questions from 1 to 5 in terms of Grammatical Correctness, Logical Sense and Relevance. Coverage over the dataset was also assessed through internal evaluation.
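As an illustration of how survey responses of this kind can be summarised, the sketch below averages 1 to 5 ratings per system and criterion. It is only an assumed representation: the tuple layout, the function name `mean_scores` and the example ratings are hypothetical and are not the data or analysis code from this study.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# One survey response: (system, criterion, score on a 1-5 scale).
Rating = Tuple[str, str, int]


def mean_scores(ratings: List[Rating]) -> Dict[str, Dict[str, float]]:
    """Average the scores for every (system, criterion) pair."""
    grouped = defaultdict(list)
    for system, criterion, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        grouped[(system, criterion)].append(score)

    summary: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (system, criterion), scores in grouped.items():
        summary[system][criterion] = mean(scores)
    return dict(summary)


# Made-up ratings purely for illustration; these are not results of the survey.
example = [
    ("semantic", "relevance", 4),
    ("semantic", "relevance", 2),
    ("template", "relevance", 5),
    ("template", "logical_sense", 3),
]
summary = mean_scores(example)
print(summary)
```

Per-system means like these are the kind of quantity that would then be passed to a significance test when comparing the two systems.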
The template-based system was able to match the performance of the semantic system, indicating the potential strength of the template-extraction module in matching manually created rules/templates. This suggests that the tedious process traditionally associated with creating templates could be alleviated through automatic systems. In addition, the automatic template-based system achieved much greater coverage over the dataset than the semantic system. The template-based system also matched the performance of the worst-performing neural system but was noticeably surpassed by a neural system trained on a larger corpus. These results highlight the continued potential for template-based systems to match more modern neural systems. Further improvements to the template-filling module, such as introducing linking techniques between templates and sentences that go beyond k-means clustering, would likely enhance performance further. In addition, the incorporation of a question filtering system using automatic metrics could improve the average quality of the output questions.