Yuan et al.
The QAit (Question Answering with Interactive Text) task proposes a novel text-based question answering problem in which an agent must interact with a partially observable text-based environment to gather the declarative knowledge required to answer questions. QAit poses questions about the location, existence, and attributes of objects distributed throughout the environment. The QAit authors produced and evaluated a set of baseline models on a constructed test set of unseen environments and questions. This test set is intended as a benchmark for future research to evaluate an agent's ability to comprehend language and generalise its action policy.
QAit aims to test an agent's language comprehension abilities using tasks that require an understanding of location, existence, and attributes. All environments are generated using TextWorld by sampling from the world setting distribution (tabulated below), with configurations divided into fixed map and random map categories. Fixed map environments always contain 6 unique rooms. In contrast, random map games draw the number of rooms from a uniform distribution.
|                 | Fixed Map            | Random Map           |
|-----------------|----------------------|----------------------|
| # Locations, Nr | 6                    | Uniform(2, 12)       |
| # Entities, Ne  | Uniform(3*Nr, 6*Nr)  | Uniform(3*Nr, 6*Nr)  |
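The sampling scheme in the table can be sketched as follows. This is an illustrative reconstruction, not the actual TextWorld generation API; the function and dictionary key names are hypothetical.

```python
import random

def sample_world_setting(map_type):
    """Sample a world configuration from the QAit setting distribution.

    map_type: "fixed" or "random". Names here are illustrative only.
    """
    if map_type == "fixed":
        n_rooms = 6                              # fixed map: always 6 rooms
    else:
        n_rooms = random.randint(2, 12)          # random map: Uniform(2, 12)
    # Entity count scales with the number of rooms: Uniform(3*Nr, 6*Nr)
    n_entities = random.randint(3 * n_rooms, 6 * n_rooms)
    return {"n_rooms": n_rooms, "n_entities": n_entities}
```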
Questions based on each environment are created on the fly as an agent plays a game, but the number of different games an agent trains on is set as an experimental parameter. Agents are trained on datasets consisting of 1, 2, 10, 100, and 500 created environments, as well as an unlimited setting where a new environment is created for each question, so that an agent theoretically never sees the same environment–question pair twice. In this setting, more than 10^40 different games can be created, making it unlikely that an agent ever encounters the same game again.
Since language generation can become intractable in a reinforcement learning setting, all text commands are triplets of the form {action, modifier, object} (e.g., open metallic gate). When there is no ambiguity (for instance, only one key in the room rather than two different keys), the environment understands commands without modifiers: pick key will pick up the "copper key" provided it is the only key in the room. At each game step, three lexicons divide the vocabulary into actions, modifiers, and objects. This reduces the size of the action space for each word in the command triplet compared to a sequential, free-form setting. The wait command indicates that the agent wants to stop interacting and answer the question.
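The effect of the triplet structure on the action space can be illustrated with a toy example. The lexicons below are hypothetical and much smaller than QAit's actual vocabularies; the point is that the command space is the product of three small lexicons rather than vocabulary_size ** sequence_length as in free-form generation.

```python
# Hypothetical, heavily truncated lexicons for illustration only.
actions   = ["open", "take", "go", "wait"]
modifiers = ["", "metallic", "copper", "wooden"]
objects   = ["", "gate", "key", "north"]

def command_space_size(actions, modifiers, objects):
    # One (action, modifier, object) triplet per step, so the space is
    # the product of the three lexicon sizes.
    return len(actions) * len(modifiers) * len(objects)

def to_command(action, modifier, obj):
    # Empty modifier/object slots are dropped when rendering the command,
    # mirroring modifier-free commands like "pick key".
    return " ".join(w for w in (action, modifier, obj) if w)
```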
The QAit test set provides 500 held-out games for each map type and each of the three question types. This test set is used to benchmark the generalisation abilities of agents across all experimental configurations. It allows models to be assessed in a reproducible manner and is analogous to a supervised learning test set.
Accuracy refers to the proportion of correctly answered questions and is deemed the most important metric since, ultimately, the goal of IQA is to answer a question.
Sufficient information is a metric that evaluates the amount of information gathered by the agent and whether that information was sufficient to answer the question; it is also used as part of the reward function. It measures the performance of the navigation and interaction required to answer a given question.
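The two metrics can be sketched as below. The accuracy computation is standard; the sufficient-information check is an illustrative simplification (the paper defines it per question type), here shown for a location-style question by testing whether any observation the agent collected mentions the answer string.

```python
def accuracy(predictions, answers):
    """Fraction of questions answered correctly."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def sufficient_information(observations, answer):
    """Illustrative check, not the paper's exact definition: information
    is deemed sufficient if any collected observation mentions the answer."""
    return any(answer in obs for obs in observations)
```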
QAit provides five baselines: human, random, and three popular value-based reinforcement learning methods. The human baseline consists of results achieved by 21 human participants. The random baseline performs no interaction with the environment and simply samples answers from the potential answer pool: yes or no for existence-type questions, and all possible object names for location-type questions. The reinforcement learning baselines are DQN, DDQN, and Rainbow.
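The random baseline's answer sampling can be sketched as follows; the function name and the candidate_objects parameter are hypothetical, standing in for the vocabulary of possible answer names.

```python
import random

def random_baseline(question_type, candidate_objects):
    """Answer a question without interacting with the environment,
    sampling uniformly from the relevant answer pool."""
    if question_type == "existence":
        return random.choice(["yes", "no"])
    if question_type == "location":
        return random.choice(candidate_objects)
    raise ValueError(f"unknown question type: {question_type}")
```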