Template-Based Question Generation

Overview

This study focussed on developing two systems: a Rule-Based Semantic Question Generation (QG) System and a Template-Based QG System. The main differentiating feature of these two systems is how their rules and templates are devised. The semantic system makes use of 8 manually created rules, while the template-based system combines semantic role labelling (SRL) with coded logic to automatically extract templates from sample input questions. The two systems are then compared in terms of question quality and dataset coverage. The template-based system was built with the goal of maintaining the question quality produced by the manually created rules while increasing coverage relative to the semantic QG system. This aims to alleviate the tedious work associated with creating templates. In analysing the performance of the systems, no statistically significant difference in question quality was found. This indicates that the template-based system achieved its aim of matching the output quality of manually created rules. In addition, the template-based system was found to possess far greater coverage, due to its larger number of templates/rules.

GitHub Repository

Methods

Semantic System

Diagram Showing the Architecture of the Semantic Question Generation System

Description of Semantic QG System

The transcript is pre-processed by expanding contractions.
The transcript is then segmented into sentences using the spaCy library in Python.
Each valid sentence then undergoes SRL using a pretrained model from AllenNLP, the NLP library of the Allen Institute for AI.
The identified tags are then matched to the 8 manually created rules.
Upon detecting a match, the sentence is re-arranged to form a question.
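The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the contraction table, the single rule in `RULES`, and the function names are all hypothetical, and the SRL output is a hand-written dictionary standing in for what an SRL model would return, so that the example runs with the standard library alone. Sentence segmentation is left out for the same reason.

```python
import re

# Illustrative contraction table (assumption): the project's actual list is not shown.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "doesn't": "does not",
    "it's": "it is",
    "we're": "we are",
}
_CONTRACTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

# Placeholder syntax for rule patterns, e.g. {ARG0} or {ARGM-MOD}.
SLOT = re.compile(r"\{([A-Z0-9-]+)\}")

# One hypothetical rule (assumption): the tags it needs and how to re-arrange them.
RULES = [
    {
        "required": {"ARG0", "ARGM-MOD", "V", "ARG1"},
        "pattern": "What {ARGM-MOD} {ARG0} {V}?",
    },
]


def expand_contractions(text):
    """Replace each known contraction with its expanded (lower-case) form."""
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def apply_rules(tags):
    """Return one question for every rule whose required tags are all present."""
    questions = []
    for rule in RULES:
        if rule["required"].issubset(tags):
            questions.append(SLOT.sub(lambda m: tags[m.group(1)], rule["pattern"]))
    return questions


# Hand-written stand-in for SRL output on "The model can predict the next word."
example_tags = {
    "ARG0": "the model",
    "ARGM-MOD": "can",
    "V": "predict",
    "ARG1": "the next word",
}
print(expand_contractions("it's clear we can't stop"))
print(apply_rules(example_tags))
```

A sentence whose tags do not cover a rule's required set produces no question for that rule, which is why a small rule set limits coverage.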

Template System

Diagram Showing the Architecture of the Template-Based Question Generation System

Description of Template QG System

The Template-Based System is split into 3 modules which combine to produce questions. These modules are the Template Extraction, Content Extraction and Template Filling modules.

Template-Extraction Module

Description of Template-Extraction Module

The sample input questions are clustered into 12 clusters using k-means clustering.
Each question, in every cluster, undergoes SRL.
The semantic tags detected, relative to the individual verbs, are then combined to produce a single set of semantic tags per question.
The words/phrases in the question are then replaced with their identified semantic tags.
The templates are then filtered, with valid ones being stored back in their original cluster.
This process led to the creation of 81 templates.
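The tag-merging and replacement steps above might look something like the sketch below. It makes several assumptions that the description does not confirm: when two verb frames disagree on a tag, the first frame wins; question words are kept verbatim rather than replaced; and slots are written as `{TAG}`. The frames are hand-written stand-ins for SRL output, and the k-means clustering and template filtering steps are omitted.

```python
import re

# Assumption: spans that are just a question word stay in the template as text.
WH_WORDS = {"what", "why", "how", "when", "where", "who", "which"}


def merge_frames(frames):
    """Combine per-verb SRL frames into a single tag -> phrase mapping.

    Assumption: on a tag clash, the phrase from the earliest frame is kept.
    """
    merged = {}
    for frame in frames:
        for tag, phrase in frame.items():
            merged.setdefault(tag, phrase)
    return merged


def extract_template(question, frames):
    """Replace each tagged phrase in the question with a {TAG} slot."""
    tags = merge_frames(frames)
    template = question
    # Longest phrases first, so a short phrase cannot split a longer one.
    for tag, phrase in sorted(tags.items(), key=lambda kv: len(kv[1]), reverse=True):
        if phrase.lower() in WH_WORDS:
            continue
        template = re.sub(
            r"\b" + re.escape(phrase) + r"\b", "{" + tag + "}", template, count=1
        )
    return template


# Hand-written stand-in for the SRL frames of one sample question.
sample_frames = [
    {"ARG1": "What", "ARGM-MOD": "can", "ARG0": "the model", "V": "predict"},
]
print(extract_template("What can the model predict?", sample_frames))
```

Replacing the longest phrases first is a design choice for the sketch: it avoids a short argument such as a single word being substituted inside a longer argument that contains it.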

Content-Extraction Module

Description of Content-Extraction Module

This module functions in the same way as the Semantic System above.
However, instead of being matched against pre-defined rules, the identified semantic tags are matched against the extracted templates.
This matching is carried out in the Template-Filling module.

Template-Filling Module

Description of Template-Filling Module

This module accepts each sentence in the transcript as well as its identified semantic tags as input.
It then predicts into what cluster the sentence should be placed.
The templates from that cluster are then retrieved.
Each template is then checked to see if all its slots can be filled by the semantic tags identified in the sentence.
If they can, the template is filled.
The generated question then undergoes filtering before being output.
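As a rough sketch of the slot check and filling steps, assuming templates use `{TAG}` slots and are stored per cluster: the cluster id is passed in directly because the prediction step is not detailed here, the final question filtering is left out, and `templates_by_cluster`, `fill_templates` and the example data are all hypothetical.

```python
import re

# Assumed slot syntax: {ARG0}, {V}, {ARGM-MOD}, ...
SLOT = re.compile(r"\{([A-Z0-9-]+)\}")


def template_slots(template):
    """Return the set of semantic tags a template needs."""
    return set(SLOT.findall(template))


def fill_templates(templates_by_cluster, cluster_id, tags):
    """Fill every template in the given cluster whose slots are all covered."""
    questions = []
    for template in templates_by_cluster.get(cluster_id, []):
        if template_slots(template).issubset(tags):
            questions.append(SLOT.sub(lambda m: tags[m.group(1)], template))
    return questions


# Hypothetical templates for one cluster; the second needs a tag the sentence lacks.
templates_by_cluster = {
    3: [
        "What {ARGM-MOD} {ARG0} {V}?",
        "What {ARGM-MOD} {ARG0} {V} {ARGM-TMP}?",
    ],
}
# Hand-written stand-in for SRL output on "The model can predict the next word."
sentence_tags = {
    "ARG0": "the model",
    "ARGM-MOD": "can",
    "V": "predict",
    "ARG1": "the next word",
}
print(fill_templates(templates_by_cluster, 3, sentence_tags))
```

Only the first template is filled here, since the sentence provides no `ARGM-TMP` span for the second.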

Results

The systems were compared in terms of question quality through an online survey, in which 15 participants were presented with 10 contexts and the questions generated by the systems from those contexts. The participants scored the questions from 1 to 5 for Grammatical Correctness, Logical Sense and Relevance. Coverage over the dataset was also assessed through internal evaluation.

Graph Showing the Mean Scores Achieved by Each System in Each Assessed Category Over All 10 Contexts

Table Showing a Comparison of the Coverage of the Semantic and Template-Based Systems

Table Showing Statistical Comparisons of the Systems' Output Quality

Results Summary:

The results showed no statistically significant difference in output quality between the Semantic and Template-Based systems in any of the 3 assessed categories.
The Template-Based system was statistically significantly better than the worst-performing neural system in terms of Logical Sense and Relevance.
The Template-Based system achieved far superior coverage over the dataset compared to the semantic system.

Conclusions and Future Work

The template-based system was able to match the performance of the semantic system, indicating the potential strength of the template-extraction module in matching manually created rules/templates. This suggests that the tedious process traditionally associated with creating templates could be alleviated through automatic systems. In addition, the automatic template-based system achieved much greater coverage over the dataset than the semantic system. The template-based system also matched the performance of the worst-performing neural system but was noticeably surpassed by a neural system trained on a larger corpus. These results highlight the continued potential for template-based systems to match more modern neural systems. Further improvements to the template-filling module, such as introducing linking techniques between templates and sentences that go beyond k-means clustering, would likely enhance performance further. In addition, incorporating a question-filtering system based on automatic metrics could improve the average quality of the output questions.

Resources

Literature Review

An Analysis of Using Templates to Generate Questions for Inquiry-Based Learning

Download

Paper

Investigating Rule and Template-Based Methods for Automatic Question Generation from Lecture Transcripts

Download

Project Code

Download