Lodestar

Tuesday, Oct 1, 2019

Overview

Lodestar is an ontology-based text annotation platform. It provides a reusable digital textbook platform that will annotate uploaded textbooks. These annotations consist of definitions and related concepts extracted from the ontology. They are presented to the user in parallel with the textbook.

Background

Textbooks are used to convey knowledge. As the digital age has progressed, textbooks have increasingly become digital. However, these digital textbooks often take the form of a scanned or typed copy of the physical book and thus do not take advantage of the new techniques that digital textbooks make possible. These techniques include text annotation, context-based navigation and automatic question and answer generation.

Text annotation is a key feature of a smart textbook, as it allows users to interact with key parts of the text and gain a deeper understanding of what the textbook is covering. Lodestar therefore focuses on text annotation, which involves linking the textbook to an ontology of the subject domain. The ontology contains the concepts that are referenced in the text, along with additional information such as definitions and relationships with other concepts. This additional information needs to be extracted from the ontology and used to annotate terms in the textbook.
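To make the idea of extracting definitions and related concepts concrete, here is a minimal sketch in Python. It is not Lodestar's code: the `Concept` class, the `ONTOLOGY` dictionary and the `annotation_for` function are hypothetical names, and a real ontology would normally be loaded from a file rather than written inline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One ontology entry (hypothetical structure, for illustration only)."""
    label: str
    definition: str
    related: list[str] = field(default_factory=list)


# A toy ontology keyed by lower-cased concept label (invented example data).
ONTOLOGY = {
    "cell": Concept("cell", "The basic structural unit of living organisms.", ["nucleus"]),
    "nucleus": Concept("nucleus", "The organelle that holds a cell's DNA.", ["cell"]),
}


def annotation_for(term: str, ontology: dict[str, Concept]) -> Optional[dict]:
    """Return the annotation payload for a term, or None if it is not a known concept."""
    concept = ontology.get(term.lower())
    if concept is None:
        return None
    return {
        "term": concept.label,
        "definition": concept.definition,
        "related": list(concept.related),
    }
```

Calling `annotation_for("Cell", ONTOLOGY)` returns the definition and related concepts for "cell", which is the kind of information shown alongside the text; a term that is absent from the ontology yields `None` and stays unannotated.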

Operation

Pipeline

Textbook processing pipeline

Textbooks can be uploaded as a single HTML file or as a PDF document. The Source Parser then converts the upload into a standard HTML format, which the annotation system uses to detect terms found in the ontology. Once these terms are annotated, the completed HTML document is passed to the renderer, where users can view it.
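The three stages described above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: `parse_source`, `annotate` and `run_pipeline` are hypothetical names, the parser here handles plain text rather than real HTML or PDF input, and the annotation markup (a `span` with a `title` attribute) is invented for the example.

```python
import html
import re


def parse_source(raw: str) -> str:
    """Stand-in for the Source Parser: normalise whitespace and emit one <p> element."""
    text = " ".join(raw.split())
    return f"<p>{html.escape(text)}</p>"


def annotate(document: str, definitions: dict[str, str]) -> str:
    """Wrap each known term in a span carrying its definition.

    `definitions` maps lower-cased terms to definition text. This toy version
    scans the whole string, so it would also match terms inside tag names;
    a real annotator would walk the parsed HTML instead.
    """
    if not definitions:
        return document
    # Longest terms first so a phrase wins over a shorter term it contains.
    terms = sorted(definitions, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )

    def wrap(match: re.Match) -> str:
        definition = html.escape(definitions[match.group(1).lower()], quote=True)
        return f'<span class="annotation" title="{definition}">{match.group(1)}</span>'

    return pattern.sub(wrap, document)


def run_pipeline(raw: str, definitions: dict[str, str]) -> str:
    """Parse, then annotate; the result is what would be handed to the renderer."""
    return annotate(parse_source(raw), definitions)
```

All terms are matched in a single regular-expression pass, which keeps a later term from matching inside the definition text of an annotation that was already inserted.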

An example Lodestar site can be viewed at https://lodestar.paddatrapper.com. The demo is reset every 2 hours and does not persist changes. Log in using the username and password “admin”. This site is not guaranteed to be available after 31 December 2019. The code is available on GitLab.

Screenshot

Lodestar screenshot

Final Project Paper

The final paper can be viewed online here. This paper gives a technical overview of Lodestar and how it performs text annotation using ontologies. It also compares the accuracy and recall for several versions of the text annotation matching system.

The best version achieved an accuracy of 89% and a recall of 70%. This is potentially better than existing text annotation systems. Further, Lodestar is not tied to a single ontology or piece of text; it is generic. Of the 4 versions tested, the best matched the text against phrases, words and plurals of the terms in the ontology. However, adding synonyms retrieved from WordNet resulted in much worse performance. This drop in accuracy largely reflects the limitations of WordNet, which does not have many synonyms for technical terms.
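As a rough illustration of what matching phrases, words and plurals might look like, the sketch below builds one regular expression from the ontology terms and their plural forms. It is one plausible reading, not the method from the paper: `plural_forms`, `build_matcher` and `find_terms` are hypothetical names, and the pluralisation rules are a deliberately naive assumption.

```python
import re


def plural_forms(term: str) -> set[str]:
    """Naive English plurals (an assumption, not the paper's rule set)."""
    if term.endswith(("s", "x", "z", "ch", "sh")):
        return {term + "es"}
    if term.endswith("y") and len(term) > 1 and term[-2] not in "aeiou":
        return {term[:-1] + "ies"}
    return {term + "s"}


def build_matcher(terms: list[str]) -> tuple[re.Pattern, dict[str, str]]:
    """Compile one regex and map every surface form back to its ontology term."""
    surface_to_term: dict[str, str] = {}
    for term in terms:
        base = term.lower()
        surface_to_term[base] = term
        for plural in plural_forms(base):
            # Do not overwrite a form that is itself an ontology term.
            surface_to_term.setdefault(plural, term)
    # Longest forms first so multi-word phrases win over the words inside them.
    forms = sorted(surface_to_term, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
    )
    return pattern, surface_to_term


def find_terms(text: str, terms: list[str]) -> list[tuple[int, str]]:
    """Return (character offset, ontology term) for each match, in text order."""
    if not terms:
        return []
    pattern, surface_to_term = build_matcher(terms)
    return [
        (m.start(), surface_to_term[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

For example, with the terms "cell" and "cell membrane", the text "Cell membranes surround every cell." yields one match for "cell membrane" (through its plural) and one for "cell". Synonym matching is left out of this sketch.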

Other Documents

The literature review, proposal and poster are also available online.

This project was completed in conjunction with Automated Question Generation.