Does a data-driven approach, rather than a template-based approach, produce more natural text?
What is Natural Language Generation?
Data-to-text Natural Language Generation (NLG) is the process of generating human-readable text from structured, non-linguistic data. There are many approaches to NLG; this investigation compares two of them: a traditional template-based approach and a data-driven approach.
How does NLG work?
Meaning Representation (Name_ID: Zola; Occupation: supervisor) -> Sentence Planner -> Linguistic Realiser -> Natural Text (Zola works as a supervisor.)
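To make the pipeline above concrete, here is a minimal sketch of how a template-based system might turn the example meaning representation into the example sentence. It is an illustration only, not the system built for this investigation: the function names `plan_sentence` and `realise`, the template wording, and the article rule are all assumptions introduced for this example.

```python
# Illustrative sketch only: function names, the template string and the
# a/an rule are assumptions, not the investigation's actual implementation.

def plan_sentence(meaning):
    """Sentence planner: choose a template and map facts to its slots."""
    return {
        "template": "{subject} works as {article} {occupation}.",
        "slots": {
            "subject": meaning["Name_ID"],
            "occupation": meaning["Occupation"],
        },
    }


def realise(plan):
    """Linguistic realiser: pick the indefinite article and fill the template."""
    slots = dict(plan["slots"])
    # Naive rule based on the first letter of the occupation.
    starts_with_vowel = slots["occupation"][0].lower() in "aeiou"
    slots["article"] = "an" if starts_with_vowel else "a"
    return plan["template"].format(**slots)


meaning_representation = {"Name_ID": "Zola", "Occupation": "supervisor"}
text = realise(plan_sentence(meaning_representation))
print(text)  # Zola works as a supervisor.
```

A data-driven system would replace the hand-written template and article rule with patterns learned from example texts, which is the contrast this investigation examines.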
Where does the data come from?
The data and reference texts were collected from Wikipedia pages about people. Both systems were then developed to generate a description of a person from a set of facts about them.
How did we compare the systems?
We surveyed 98 participants. Each participant was shown 10 pairs of descriptions, drawn from the two systems or a human reference text, and was asked to rate the clarity and fluency of each description and to select which description in the pair was more natural.
What did we find?
The results of the human comparison indicate that participants rated the texts generated by the template-based system higher for clarity and fluency, and judged them more natural than those produced by the data-driven system.
Are there downloadable resources?
The two papers and the results of our survey can be found on the Downloads page.