Project Aim
- Determine the characteristics that are required of an ontology to generate instances of a question type.
- Build a program that generates questions from a set of question templates and an ontology.
- Evaluate and determine the quality of the generated questions.
Background
The automated question generator uses concepts from two major disciplines: ontologies and Natural Language Generation (NLG).
Ontology
An ontology can be regarded as equivalent to a description logic (DL) knowledge base. Essentially, it consists of classes (concepts) in a subject domain that are linked to one another by relations, while the underlying DL gives the ontology a more precise meaning and places restrictions on it.
The Web Ontology Language (OWL) is based on the DL representation formalism. It is commonly used to serialise an ontology into a file.
In this project, Protégé, an Ontology Development Environment (ODE), is used to design and manage the ontology. The ontology used to generate questions is the African Wildlife Ontology. Details about this ontology can be found at https://keet.wordpress.com/2010/08/20/african-wildlife-ontology-tutorial-ontologies/. Some changes were made and axioms were added to this ontology so that it can generate instances of all question types in the project.
Natural Language Generation
In this project, template-based NLG is used to generate questions. Template-based NLG is a form of NLG that maps non-linguistic words into “gaps” in a linguistic structure. Such linguistic structures with “gaps” in them are known as templates. In this project, templates for the different question types are identified, and the “gaps” in each template are replaced by appropriate classes from an ontology.
For example:
Template - Does a <animal> eat <plant>?
Sentence - Does an impala eat grass?
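The substitution in the example above can be sketched in Java, the language the generator is built in. This is only a minimal illustration and not the project's actual code: the class TemplateFiller, its method names and the naive a/an rule are assumptions introduced here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the project's code base.
class TemplateFiller {
    // A "gap" is any text enclosed by '<' and '>'.
    private static final Pattern GAP = Pattern.compile("<([^<>]+)>");

    // Replaces each gap with the value mapped to its token;
    // gaps without a mapping are left unchanged.
    static String fill(String template, Map<String, String> values) {
        Matcher m = GAP.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Naive heuristic (an assumption): turn "a" into "an" before a word
    // that starts with a vowel letter. Real article selection needs more.
    static String fixArticles(String sentence) {
        return sentence.replaceAll("\\b([Aa])(?= [aeiouAEIOU])", "$1n");
    }

    public static void main(String[] args) {
        String raw = fill("Does a <animal> eat <plant>?",
                Map.of("animal", "impala", "plant", "grass"));
        System.out.println(fixArticles(raw)); // Does an impala eat grass?
    }
}
```

The vowel-letter rule is deliberately rough: it would miss cases such as "an hour", which is one reason a proper linguistic library is preferable.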
Question Generator
The question generator is made up of two essential components: the question templates and the generation algorithms.
The question generator is built using Java and is available on GitHub.
Question Templates
A set of question types was identified along with their templates. The “gaps” in a template are enclosed by ‘<’ and ‘>’, and the word in between is a token that represents a class in the ontology. Such “gaps” can be replaced with any subclass of the class specified in the template.
To replace a “gap” with a specific type of object property, <ObjectProperty:Verb> is used. This allows only object properties that are verbs to replace the token.
Besides replacing each token with one of its subclasses, the replacement needs to conform to the underlying DL so that the question is answerable by the ontology. How this is checked differs across question types.
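As a rough illustration of the token notation described above, the following Java sketch extracts the tokens from a template and separates an optional qualifier such as Verb. The class TemplateTokens, the Token type and the sample template are hypothetical, and the sketch assumes that a qualifier follows the token name after a colon, as in <ObjectProperty:Verb>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for template tokens; not the project's own code.
class TemplateTokens {
    static final class Token {
        final String name;       // e.g. "Animal" or "ObjectProperty"
        final String qualifier;  // e.g. "Verb", or null when absent

        Token(String name, String qualifier) {
            this.name = name;
            this.qualifier = qualifier;
        }
    }

    // <Name> or <Name:Qualifier>
    private static final Pattern GAP =
            Pattern.compile("<([^<>:]+)(?::([^<>]+))?>");

    // Returns the tokens in the order they appear in the template.
    static List<Token> parse(String template) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = GAP.matcher(template);
        while (m.find()) {
            tokens.add(new Token(m.group(1), m.group(2)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative template only; the real templates may differ.
        for (Token t : parse("Does a <Animal> <ObjectProperty:Verb> <Plant>?")) {
            System.out.println(t.name + " / " + t.qualifier);
        }
    }
}
```

A generator could then look up candidate subclasses for each plain token and, for a qualified token, restrict the candidates to object properties of the stated kind.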
Generation Algorithms
Question generation algorithms are defined for each question type. This ensures that the generated question is answerable by the ontology.
There are 9 question types that can be generated using 8 different question generation algorithms. These 8 generation algorithms can be classified into 3 groups; algorithms in the same group differ slightly according to their templates.
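The general idea of generating only answerable questions can be sketched with a toy, in-memory stand-in for an ontology. Everything in this sketch is an assumption made for illustration: the class ToyQuestionGenerator, the sample classes and axioms, and the rule that a yes/no question is generated only when a matching axiom exists. The project's own algorithms work on an OWL ontology and vary by question type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: a class hierarchy plus subject|property|object triples.
class ToyQuestionGenerator {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Set<String> axioms = new HashSet<>();

    void addSubclass(String child, String parent) { parentOf.put(child, parent); }

    void addAxiom(String subject, String property, String object) {
        axioms.add(subject + "|" + property + "|" + object);
    }

    // True if cls is a strict (direct or indirect) subclass of ancestor.
    // Assumes the hierarchy has no cycles.
    boolean isSubclassOf(String cls, String ancestor) {
        for (String c = parentOf.get(cls); c != null; c = parentOf.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    List<String> subclassesOf(String ancestor) {
        List<String> result = new ArrayList<>();
        for (String cls : parentOf.keySet()) {
            if (isSubclassOf(cls, ancestor)) result.add(cls);
        }
        Collections.sort(result); // deterministic order for the example
        return result;
    }

    // Fills "Does a <subject> <property> <object>?" only with
    // combinations that are backed by an axiom.
    List<String> generate(String subjectClass, String property, String objectClass) {
        List<String> questions = new ArrayList<>();
        for (String s : subclassesOf(subjectClass)) {
            for (String o : subclassesOf(objectClass)) {
                if (axioms.contains(s + "|" + property + "|" + o)) {
                    questions.add("Does a " + s + " " + property + " " + o + "?");
                }
            }
        }
        return questions;
    }

    public static void main(String[] args) {
        ToyQuestionGenerator g = new ToyQuestionGenerator();
        g.addSubclass("herbivore", "animal");
        g.addSubclass("zebra", "herbivore");
        g.addSubclass("lion", "animal");
        g.addSubclass("grass", "plant");
        g.addAxiom("zebra", "eat", "grass");
        g.addAxiom("lion", "eat", "zebra");
        System.out.println(g.generate("animal", "eat", "plant"));
        // [Does a zebra eat grass?]
    }
}
```

Here "lion" is a valid subclass of "animal", but no question about a lion eating grass is produced because the toy data holds no such axiom.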
Experiment
The main aim of this experiment is to evaluate the quality of the generated questions. A second aim is to gather further details through feedback provided by the participants. This feedback can be used to improve the next iteration of the question generator.
In this project, the quality of a question consists of three aspects. The first is syntax: the question must follow the rules of English grammar for the use of words, punctuation, phrases, clauses and sentence structure. The second is semantics: a question with a clear and unambiguous meaning is considered to have good semantics. The last is that the question must be answerable by the ontology.
The survey used to evaluate the set of generated questions can be found here.
Results
From the feedback provided by the participants, the following findings can be drawn:
- A linguistic library is required to manage the articles, the plural and singular forms, and the tenses of the questions.
- There were errors in defining the templates; for example, certain “what” questions should have been “which” questions.
- The generator should use the domain and range of an object property rather than the actual axioms in the ontology.
- The ontology used is too small, which causes the random generation to repeat questions.
Final Project Paper
The final project paper can be found here. This paper explains the above sections in greater detail.
Other Documents
The literature review, proposal and poster are also available online.