Does a data-driven approach, rather than a template-based approach,
produce more natural text?
What is Natural Language Generation?
Data-to-text Natural Language Generation (NLG) is the process of
generating human-readable text from non-linguistic and structured
data. While there are many different approaches to NLG, this
investigation focuses on two of them: a more traditional
template-based approach and a data-driven approach.
The data and references used were collected from Wikipedia pages for
people. The two systems were then developed to generate a description
of a person given a set of facts about them.
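As a rough illustration of the template-based idea, the sketch below fills a fixed sentence pattern from a small set of facts. It is not the system built for this investigation: the function name describe_person, the fact keys (name, birth_year, nationality, occupation) and the sentence pattern are all assumptions made for the example.

```python
# Minimal sketch of template-based data-to-text generation.
# All names and fact keys here are hypothetical, chosen for illustration.

def describe_person(facts: dict) -> str:
    """Fill a fixed sentence template, leaving out clauses whose facts are missing."""
    name = facts["name"]

    # Optional clause: only mention the birth year if we have it.
    born = f" (born {facts['birth_year']})" if "birth_year" in facts else ""

    # Fall back to a generic noun when no occupation is given.
    occupation = facts.get("occupation", "person")
    nationality = facts.get("nationality")
    descriptor = f"{nationality} {occupation}" if nationality else occupation

    # Pick "a" or "an" from the first letter of the descriptor (a simple heuristic).
    article = "an" if descriptor[0].lower() in "aeiou" else "a"

    return f"{name}{born} is {article} {descriptor}."


# Example facts, for illustration only.
facts = {
    "name": "Ada Lovelace",
    "birth_year": 1815,
    "nationality": "English",
    "occupation": "mathematician",
}

print(describe_person(facts))
# prints: Ada Lovelace (born 1815) is an English mathematician.
```

A template like this gives predictable, grammatical output but only covers the sentence shapes written by hand. A data-driven system instead learns the mapping from facts to text from example pairs.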
How did we compare the systems?
We conducted a survey of 98 participants, each of whom was presented
with 10 pairs of descriptions, each description drawn from one of the
two systems or from a human reference text. Participants were asked to
rate the clarity and fluency of each description and to select which
description in a pair was more natural.
The results from the human comparison indicate that participants found
the texts generated by the template-based system to have higher clarity
and fluency, and to be more natural, than those produced by the
data-driven system.
Are there downloadable resources?
The two papers and the results of our survey can be found on the Downloads page.