Data-driven System Design
Overview
The data-driven NLG system consists of two stages: the sentence planner and the linguistic realiser. The sentence planner takes the tokens from the input Meaning Representation (MR) and orders them into sentences to form a sentence plan. This sentence plan is then passed to the linguistic realiser, which produces a natural language sentence. Before the sentence planner and linguistic realiser can be trained, however, the dataset must be delexicalised.
Dataset Preprocessing
The dataset consists of a number of entries, each containing a set of slot-type and slot-value pairs (e.g. Name_ID: Mel Clark), tokens, and a reference text. For example:
Name_ID: Mel Clark
instance of: Human
date of birth: July 7 1926
date of death: May 1 2014
sex or gender: male
member of sports team: Philadelphia Phillies, Detroit Tigers
country of citizenship: United States
The reference text for this entry was:
Mel Clark (July 7 1926 -May 1 2014) was an United States Major League Baseball outfielder. He attended college at Ohio University and was signed by the Philadelphia Phillies in 1947. Clark played with the Phillies from to and with the Detroit Tigers in.
Delexicalising the dataset involves replacing all the slot-values in the reference text with the corresponding slot-types. Once delexicalised, the reference text for the example above becomes:
Name_ID ( date_of_birthdate_of_death ) was an country_of_citizenship Major League Baseball outfielder. he attended college at Ohio University and was signed by the member_of_sports_team in 1947. clark played with the Phillies from to and with the member_of_sports_team in.
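A minimal sketch of this delexicalisation step, assuming a simple longest-value-first string replacement; the helper name and the slot dictionary below are illustrative, not the exact implementation:

```python
def delexicalise(reference, slots):
    """Replace every slot-value in the reference text with its slot-type."""
    pairs = []
    for slot_type, values in slots.items():
        for value in (values if isinstance(values, list) else [values]):
            pairs.append((value, slot_type.replace(" ", "_")))
    # Replace longer values first so e.g. "Mel Clark" is handled before "Clark".
    for value, slot_type in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        reference = reference.replace(value, slot_type)
    return reference

slots = {
    "Name_ID": "Mel Clark",
    "date of birth": "July 7 1926",
    "country of citizenship": "United States",
    "member of sports team": ["Philadelphia Phillies", "Detroit Tigers"],
}
print(delexicalise("Mel Clark was born July 7 1926 in the United States.", slots))
# Name_ID was born date_of_birth in the country_of_citizenship.
```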
Sentence Planner
The sentence planner produces an ordered sequence of slot-types, the sentence plan, from the collection of input tokens. The aim is to order all the tokens so that the linguistic realiser is able to produce a coherent utterance.
Training
The sentence planner works in a similar way to a Markov Decision Process (MDP): it learns the probabilities of the next slot-type given the current slot-type or pair of slot-types. These probabilities are learnt using the tokens extracted from the delexicalised reference texts.
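This training step can be sketched as counting slot-type trigrams over the extracted token sequences and normalising the counts into probabilities. The padding tokens and the toy training sequences below are illustrative assumptions:

```python
from collections import Counter, defaultdict

def train_transitions(plans):
    """Count how often each slot-type follows each pair of slot-types,
    then normalise the counts into probabilities (a second-order
    Markov model over slot-types)."""
    counts = defaultdict(Counter)
    for plan in plans:
        padded = ["<start>", "<start>"] + plan + ["<end>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            counts[(a, b)][c] += 1
    return {
        context: {nxt: n / sum(ctr.values()) for nxt, n in ctr.items()}
        for context, ctr in counts.items()
    }

# Toy token sequences extracted from delexicalised reference texts.
plans = [
    ["Name_ID", "date_of_birth", "date_of_death", "country_of_citizenship"],
    ["Name_ID", "date_of_birth", "member_of_sports_team"],
]
probs = train_transitions(plans)
print(probs[("Name_ID", "date_of_birth")])
# {'date_of_death': 0.5, 'member_of_sports_team': 0.5}
```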
Generation
To generate a sentence plan, the most likely slot-type to start an utterance is found and added to the sentence plan. The plan is then extended iteratively by adding the slot-type that is most likely to follow the two most recently added slot-types. The sentence planner maintains an array of slot-types from the input MR that must still be included in the sentence plan, the remaining array. A slot-type is only added to the sentence plan if it is in the remaining array, with the exception of the Name_ID and <end> slot-types. Once the remaining array is empty, the sentence plan is complete. This process is shown in the figure below for the following input and remaining arrays:
Input = [Name_ID, date_of_birth, country_of_citizenship, sport]
Remaining = [Name_ID, date_of_birth, country_of_citizenship, sport, <end>]
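The generation loop described above can be sketched as follows; the transition table is hypothetical (hand-written probabilities rather than learnt ones), and the greedy tie-breaking is an assumption:

```python
def generate_plan(probs, remaining):
    """Greedily build a sentence plan: start from the most likely
    opening slot-type, then repeatedly append the slot-type most
    likely to follow the two most recently added ones.  A slot-type
    is only emitted if it is still in `remaining` (Name_ID and <end>
    are exempt); generation stops once `remaining` is empty."""
    exempt = {"Name_ID", "<end>"}
    plan = []
    while remaining:
        context = tuple((["<start>", "<start>"] + plan)[-2:])
        candidates = sorted(probs.get(context, {}).items(),
                            key=lambda kv: kv[1], reverse=True)
        for slot, _ in candidates:
            if slot in remaining or slot in exempt:
                plan.append(slot)
                remaining.discard(slot)
                break
        else:
            break  # no learnt continuation for this context
    return plan

# Hypothetical transition probabilities for the example arrays above.
probs = {
    ("<start>", "<start>"): {"Name_ID": 1.0},
    ("<start>", "Name_ID"): {"date_of_birth": 0.9, "sport": 0.1},
    ("Name_ID", "date_of_birth"): {"country_of_citizenship": 0.8},
    ("date_of_birth", "country_of_citizenship"): {"sport": 0.7},
    ("country_of_citizenship", "sport"): {"<end>": 1.0},
}
remaining = {"Name_ID", "date_of_birth", "country_of_citizenship",
             "sport", "<end>"}
print(generate_plan(probs, remaining))
# ['Name_ID', 'date_of_birth', 'country_of_citizenship', 'sport', '<end>']
```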
Linguistic Realiser
The linguistic realiser takes a sentence plan as input and uses an encoder-decoder network to produce a natural language utterance. The encoder-decoder network is composed of two Recurrent Neural Networks (RNNs) using Long Short-Term Memory (LSTM) cells, such that the last hidden state of the first network, the encoder, is used as the first hidden state of the second network, the decoder. At a high level, the encoder encodes the input sequence (in this case the sentence plan) into an abstract vector representation. The decoder then maps this representation to a natural language utterance.
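The wiring of the two networks can be illustrated with a small numpy sketch: a single-layer LSTM cell, an encoder loop over the input token ids, and a greedy decoder seeded with the encoder's final state. The dimensions, random (untrained) weights, and vocabulary are all illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM cell step: the four gates are computed from [x; h]."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

emb, hid, vocab = 8, 16, 5
W_enc = rng.normal(0, 0.1, (4 * hid, emb + hid))   # encoder LSTM weights
W_dec = rng.normal(0, 0.1, (4 * hid, emb + hid))   # decoder LSTM weights
E = rng.normal(0, 0.1, (vocab, emb))               # toy token embeddings
W_out = rng.normal(0, 0.1, (vocab, hid))           # hidden -> vocab logits

def encode(token_ids):
    """Run the encoder over the sentence plan; return its final state."""
    h = c = np.zeros(hid)
    for t in token_ids:
        h, c = lstm_step(E[t], h, c, W_enc)
    return h, c

def decode(h, c, start_id, end_id, max_len=10):
    """Greedily decode tokens, starting from the encoder's final state."""
    out, t = [], start_id
    for _ in range(max_len):
        h, c = lstm_step(E[t], h, c, W_dec)
        t = int(np.argmax(W_out @ h))   # pick the most likely next token
        if t == end_id:
            break
        out.append(t)
    return out

# Encode a toy sentence plan, then decode from the final encoder state.
h, c = encode([0, 1, 2])
print(decode(h, c, start_id=3, end_id=4))
```

With trained weights the decoder's output ids would map back to (delexicalised) words; here the point is only the state hand-off from encoder to decoder.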
Training
To create the training dataset for the encoder-decoder network, pairs of token sequences and natural language sentences are required. These were generated by extracting the tokens from each sentence in the reference texts for each entry in the dataset. The token sequences form the training inputs and the natural language sentences make up the training targets. For example, one input-target pair might be:
Input Name_ID date_of_birth date_of_death country_of_citizenship <end>
Target Name_ID ( date_of_birthdate_of_death ) was an country_of_citizenship Major League Baseball outfielder.
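This pair extraction can be sketched as splitting each delexicalised reference text into sentences and keeping the slot-type tokens from each. The SLOT_TYPES set and the regex-based sentence splitter are simplifying assumptions:

```python
import re

SLOT_TYPES = {"Name_ID", "date_of_birth", "date_of_death",
              "country_of_citizenship", "member_of_sports_team"}

def make_training_pairs(delex_text):
    """Split a delexicalised reference text into sentences and pair
    each sentence (the target) with the slot-types it contains plus
    <end> (the input)."""
    pairs = []
    for sentence in re.split(r"(?<=\.)\s+", delex_text.strip()):
        if not sentence:
            continue
        tokens = [w for w in re.findall(r"\w+", sentence) if w in SLOT_TYPES]
        pairs.append((tokens + ["<end>"], sentence))
    return pairs

text = ("Name_ID ( date_of_birth date_of_death ) was an "
        "country_of_citizenship Major League Baseball outfielder. "
        "he was signed by the member_of_sports_team in 1947.")
for inp, target in make_training_pairs(text):
    print(inp, "->", target)
```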
Generation
Once the utterance has been generated by the encoder-decoder network, it must be relexicalised by replacing the slot-types with suitable slot-values. A few minor rules are then applied, such as ensuring full stops at the end of sentences and a capital letter for the first word of each sentence. The figure below shows an example of how a sentence plan is converted into a natural language utterance.
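A minimal sketch of relexicalisation plus the minor surface rules; the in-order handling of multi-valued slots (such as member_of_sports_team) is an assumption:

```python
def relexicalise(utterance, slots):
    """Replace slot-type placeholders with slot-values, then apply the
    surface rules: a trailing full stop and an initial capital."""
    for slot_type, values in slots.items():
        for value in (values if isinstance(values, list) else [values]):
            # Replace one placeholder per value, in order of appearance.
            utterance = utterance.replace(slot_type, value, 1)
    utterance = utterance.strip()
    if not utterance.endswith("."):
        utterance += "."
    return utterance[0].upper() + utterance[1:]

print(relexicalise(
    "Name_ID played with the member_of_sports_team and the member_of_sports_team",
    {"Name_ID": "Mel Clark",
     "member_of_sports_team": ["Philadelphia Phillies", "Detroit Tigers"]},
))
# Mel Clark played with the Philadelphia Phillies and the Detroit Tigers.
```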