Template-based System Design
Overview
The Template-based System was developed in Java using a pipeline architecture. Templates were hand-crafted and are read by the system as a series of JSON files. For each input, the system identifies the slot types it contains, chooses a suitable template, and produces the output text.
Architecture
The architecture of the template-based NLG system loosely follows the three-module pipeline proposed by Reiter and Dale and shown in the diagram here. The Document Planner module evaluates the input by listing the slot types it contains and creating a Document Plan from them. This Document Plan is passed to the Microplanner module, which randomly chooses one of the interchangeable minor templates available for those slot types and creates a Sentence Plan from it. The Realiser module takes the Sentence Plan and generates human-readable sentences.
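The flow through the three modules can be sketched in Java as below. This is a minimal illustration, not the system's actual code: the class and method names (`PipelineSketch`, `planDocument`, `planSentence`, `realise`) are hypothetical, the templates are assumed to be held in a map keyed by the set of slot types, and the realisation step is reduced to plain string substitution of simple `<Slot>` placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the three-stage pipeline; all names are assumptions.
final class PipelineSketch {

    // Document Planner: reduce the input to the set of slot types it contains.
    static Set<String> planDocument(Map<String, String> input) {
        return new TreeSet<>(input.keySet());
    }

    // Microplanner: find the minor templates registered for this slot set
    // and choose one of them at random.
    static String planSentence(Set<String> slotTypes,
                               Map<Set<String>, List<String>> templatesBySlots,
                               Random random) {
        List<String> minorTemplates = templatesBySlots.get(slotTypes);
        if (minorTemplates == null || minorTemplates.isEmpty()) {
            throw new IllegalArgumentException("No template for slots " + slotTypes);
        }
        return minorTemplates.get(random.nextInt(minorTemplates.size()));
    }

    // Realiser stand-in: replace each simple <Slot> placeholder with its value.
    static String realise(String sentencePlan, Map<String, String> input) {
        String text = sentencePlan;
        for (Map.Entry<String, String> slot : input.entrySet()) {
            text = text.replace("<" + slot.getKey() + ">", slot.getValue());
        }
        return text;
    }

    // Run the three stages in order.
    static String generate(Map<String, String> input,
                           Map<Set<String>, List<String>> templatesBySlots,
                           Random random) {
        Set<String> documentPlan = planDocument(input);
        String sentencePlan = planSentence(documentPlan, templatesBySlots, random);
        return realise(sentencePlan, input);
    }
}
```

Passing the `Random` instance in as a parameter keeps the random template choice reproducible when a fixed seed is supplied, which is convenient for testing.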
The templates
Because the system is template-based, its templates had to be created before it could generate any text. They were hand-crafted from the training data: entries in the dataset were categorised according to the data available in their slots and the way that data was conveyed in the reference text.
From this categorisation, major and minor template categories were formed: minor templates that require the same slot types were grouped together into a major template. The major templates were therefore indexed by these required slots. An example of a major template with corresponding minor templates is shown below:
Slots: Name; Sex or Gender; Date of Birth; Member of a Sports Team; Country of Citizenship; Position
Minor template 1: <Name> (born <Date of Birth>) is from <Country of Citizenship>. <o:Sex or Gender:He:She> was a <Position> for many teams, including <l:Member of a Sports Team:and>.
Minor template 2: <Name> (born <Date of Birth>) was a <Position> for <l:Member of a Sports Team:and>. <o:Sex or Gender:He:She> is from <Country of Citizenship>.
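One way the placeholders in such a template might be expanded is sketched below. The semantics of the `o:` and `l:` prefixes are assumptions inferred from the example rather than a stated specification: `<o:Slot:A:B>` is taken to emit A when the slot value is "male" and B otherwise, and `<l:Slot:and>` is taken to join the slot's values into a list with the given conjunction before the last item. The class `TemplateFiller` and its methods are hypothetical names, and the slot values in the usage note are invented placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of placeholder expansion; prefix semantics are assumed.
final class TemplateFiller {

    // Matches anything between angle brackets, e.g. <Name> or <o:Sex or Gender:He:She>.
    private static final Pattern PLACEHOLDER = Pattern.compile("<([^<>]+)>");

    static String fill(String template, Map<String, List<String>> slots) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String[] parts = matcher.group(1).split(":");
            String replacement;
            if (parts.length == 4 && parts[0].equals("o")) {
                // Option placeholder (assumed): pick a word based on the slot value.
                String value = values(slots, parts[1]).get(0);
                replacement = value.equalsIgnoreCase("male") ? parts[2] : parts[3];
            } else if (parts.length == 3 && parts[0].equals("l")) {
                // List placeholder (assumed): join all values with the conjunction.
                replacement = joinList(values(slots, parts[1]), parts[2]);
            } else {
                // Plain placeholder: insert the slot's single value.
                replacement = values(slots, matcher.group(1)).get(0);
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // Look up a slot's values, failing loudly if the slot is missing or empty.
    private static List<String> values(Map<String, List<String>> slots, String slot) {
        List<String> found = slots.get(slot);
        if (found == null || found.isEmpty()) {
            throw new IllegalArgumentException("Missing slot: " + slot);
        }
        return found;
    }

    // Join items as "A", "A and B", or "A, B and C".
    static String joinList(List<String> items, String conjunction) {
        if (items.size() == 1) {
            return items.get(0);
        }
        String head = String.join(", ", items.subList(0, items.size() - 1));
        return head + " " + conjunction + " " + items.get(items.size() - 1);
    }
}
```

Under these assumptions, filling minor template 1 with Name "Alex Example", Date of Birth "1 January 1980", Country of Citizenship "Exampleland", Sex or Gender "female", Position "goalkeeper", and the teams "Team A", "Team B", and "Team C" yields "Alex Example (born 1 January 1980) is from Exampleland. She was a goalkeeper for many teams, including Team A, Team B and Team C."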
Sentence realising
To realise the final sentences, the system used the SimpleNLG library. SimpleNLG can create well-formed, grammatically correct sentences from a set of input components.