Toolkit for an Abstract Wikipedia

Year: 2024
Students: Jordy Kafwe, Matthew Craig
Supervisor: Associate Prof. Maria Keet
Second Reader: Associate Prof. Sonia Berman
Tools: CRAFT, TempTing

This website covers two software tools, CRAFT and TempTing, that aim to streamline contribution to Abstract Wikipedia. These tools were developed by Jordy Kafwe and Matthew Craig, honours students at the University of Cape Town. This page gives an overview and context relevant to both projects. For detailed information about each project, see the CRAFT and TempTing pages.

Context and Background

[Figure: Abstract Wikipedia NLG Pipeline]

Abstract Wikipedia

Abstract Wikipedia is a much broader project, of which this project forms only a small part. Abstract Wikipedia envisions a system of open collaboration between people of diverse, multilingual backgrounds. Its goal is to leverage Wikidata and Wikifunctions to facilitate the creation of language-agnostic representations of content. Natural Language Generation (NLG) techniques will be utilised to produce Wikipedia articles from these abstract representations in a vast array of languages.

Wikidata

[Figure: Example constructor]

Wikidata is an open, collaborative knowledge base. It houses a large collection of labelled entities and the relationships they have with other entities. Its goal is to provide a diverse collection of machine-readable knowledge that anyone may contribute to and benefit from. The content is, however, largely inaccessible to the broader public because of the technical barrier to interacting with it. Abstract Wikipedia intends to solve this problem by producing human-readable natural language from the knowledge housed in Wikidata.

Constructors

The abstract representations of content are given the term constructors. Constructors are declarative statements of content to be selected from Wikidata. The declarations can be conceptualised as expressive arrangements of language-independent Wikidata identifiers. They are modular representations of content that can be composed to form an article.
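
To make the idea of constructors as composable arrangements of Wikidata identifiers more concrete, here is a minimal sketch in Python. The `Constructor` class, its fields, and the `identifiers` helper are hypothetical names chosen for illustration; they are not the constructor syntax used by Abstract Wikipedia or by CRAFT. The identifiers themselves are real: Q42 is Douglas Adams, P569 is date of birth, P19 is place of birth, and P106 is occupation.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

ITEM_ID = re.compile(r"^Q[1-9]\d*$")      # Wikidata item IDs, e.g. Q42
PROPERTY_ID = re.compile(r"^P[1-9]\d*$")  # Wikidata property IDs, e.g. P569


@dataclass(frozen=True)
class Constructor:
    """Hypothetical, simplified constructor: a named, language-independent
    arrangement of Wikidata identifiers."""
    name: str
    subject: str                 # the item the content is about
    properties: Tuple[str, ...]  # the properties to select for that item

    def __post_init__(self) -> None:
        # Reject anything that is not shaped like a Wikidata identifier.
        if not ITEM_ID.match(self.subject):
            raise ValueError(f"not a Wikidata item ID: {self.subject!r}")
        for prop in self.properties:
            if not PROPERTY_ID.match(prop):
                raise ValueError(f"not a Wikidata property ID: {prop!r}")


def identifiers(article: List[Constructor]) -> List[str]:
    """Collect every identifier an article refers to, in order, without duplicates."""
    seen: List[str] = []
    for constructor in article:
        for identifier in (constructor.subject, *constructor.properties):
            if identifier not in seen:
                seen.append(identifier)
    return seen


# Constructors are modular: an article is modelled here as a list of them.
birth = Constructor("Birth", "Q42", ("P569", "P19"))
occupation = Constructor("Occupation", "Q42", ("P106",))
article = [birth, occupation]
```

Nothing in this sketch mentions a natural language, which is the point: the same `article` value could in principle be realised in any language for which templates exist.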

[Figure: Example Template]

Templates

Constructors are inherently multilingual and thus require an intermediary language-specific representation before they can be realised as natural language. These representations, called templates, are specific to a constructor and language. Each template describes how constructor-specified content is to be arranged in a particular language. Notably, elements of a template can be given dependency labels which identify their grammatical role. Natural language, such as Wikipedia articles, is to be generated from the realisation of these templates.
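
The relationship between one constructor and its per-language templates can be sketched as follows. This is an illustrative simplification under stated assumptions, not the Abstract Wikipedia template syntax: the `Element` class, the `TEMPLATES` table, and the `realise` function are hypothetical, and the dependency labels (`nsubj`, `root`, `obl`) are borrowed from Universal Dependencies purely as examples. The labels are carried along but not used in this sketch; a fuller realiser would consult them for agreement and word order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Element:
    """One element of a hypothetical template: literal text or a slot."""
    text: Optional[str] = None   # literal, language-specific text
    slot: Optional[str] = None   # name of the constructor field to insert
    label: Optional[str] = None  # dependency label giving the grammatical role


# One template per (constructor, language) pair. Only the arrangement and
# the literal text differ between languages; the slots stay the same.
TEMPLATES: Dict[str, List[Element]] = {
    "en": [
        Element(slot="person", label="nsubj"),
        Element(text="was born in", label="root"),
        Element(slot="place", label="obl"),
    ],
    "fr": [
        Element(slot="person", label="nsubj"),
        Element(text="est né à", label="root"),
        Element(slot="place", label="obl"),
    ],
}


def realise(template: List[Element], values: Dict[str, str]) -> str:
    """Fill each slot with its value and join the elements into a sentence."""
    words = []
    for element in template:
        if element.slot is None:
            words.append(element.text or "")
        else:
            words.append(values[element.slot])  # KeyError if a slot is unfilled
    return " ".join(words) + "."


values = {"person": "Douglas Adams", "place": "Cambridge"}
english = realise(TEMPLATES["en"], values)
french = realise(TEMPLATES["fr"], values)
```

Here the content (`values`) is selected once, while each language contributes only its own template.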

Problem Statement

This project seeks to address some of Abstract Wikipedia's current shortcomings, particularly those pertaining to constructors and templates. These fall loosely into two categories: functionality and accessibility.

Problem: Functionality

The constructor and template functionality required by Abstract Wikipedia is largely absent. Progress towards a production-ready system has not yet begun. Constructors need to be processed to extract the necessary content from Wikidata. Templates need to be parsed and validated as a precursor to realisation.

Problem: Accessibility

Significant technical barriers are preventing the general public from contributing to and benefiting from Abstract Wikipedia. There is a need for tools that improve the accessibility and comprehensibility of the project. Constructors suffer from a lack of discoverability of Wikidata content. Templates require a tool that can give real-time feedback in the creation process.

Projects

To solve the aforementioned problems, two software tools were developed:
  • CRAFT - An API for processing constructors and retrieving Wikidata content.
  • TempTing - A webapp tool for creating and managing templates.

CRAFT

Jordy Kafwe

View a detailed explanation on the CRAFT Page.

CRAFT is an HTTP API for multilingual content selection that processes constructors and converts them into SPARQL queries to retrieve content from Wikidata.
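
As a rough illustration of turning a constructor into a SPARQL query, the sketch below builds a query string for one Wikidata item and a list of properties. It is not CRAFT's implementation: the `build_query` function, its parameters, and the `?v0`, `?v1` variable naming are assumptions made for this example. The `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes and the label service are features of the public Wikidata Query Service, which is what lets the labels come back in the requested language.

```python
import re
from typing import List

ITEM_ID = re.compile(r"^Q[1-9]\d*$")
PROPERTY_ID = re.compile(r"^P[1-9]\d*$")
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]+)*$")


def build_query(subject: str, properties: List[str], language: str = "en") -> str:
    """Build a SPARQL query selecting the given properties of one item.

    Hypothetical helper: inputs are validated so that only well-formed
    identifiers and language codes are interpolated into the query text.
    """
    if not ITEM_ID.match(subject):
        raise ValueError(f"not a Wikidata item ID: {subject!r}")
    if not properties:
        raise ValueError("at least one property is required")
    for prop in properties:
        if not PROPERTY_ID.match(prop):
            raise ValueError(f"not a Wikidata property ID: {prop!r}")
    if not LANGUAGE_CODE.match(language):
        raise ValueError(f"not a language code: {language!r}")

    # One value variable per property; the label service binds ?vNLabel.
    select_vars = " ".join(f"?v{i} ?v{i}Label" for i in range(len(properties)))
    # OPTIONAL keeps the row even when a property has no value.
    patterns = "\n".join(
        f"  OPTIONAL {{ wd:{subject} wdt:{prop} ?v{i} . }}"
        for i, prop in enumerate(properties)
    )
    return (
        f"SELECT {select_vars} WHERE {{\n"
        f"{patterns}\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}\n'
        "}"
    )


# Date and place of birth (P569, P19) of Douglas Adams (Q42), labelled in French.
query = build_query("Q42", ["P569", "P19"], language="fr")
```

The resulting string could be sent to the Wikidata Query Service endpoint at https://query.wikidata.org/sparql with a request for JSON results; that network step is left out so the sketch stays self-contained.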

Resulting Features
  • Parser for processing constructors
  • Handles multilingual input
  • Returns content in the input language
  • Query expansion for related content suggestions
  • JSON response containing selected content

[Figure: Example API Request]

Resources

TempTing

Matthew Craig

View a detailed explanation on the TempTing Page.

TempTing is a webapp that aids in the creation and management of templates. TempTing provides an integrated development environment for template editing. The tool’s features are supported by a custom-built parser for the template syntax. The webapp can be accessed at https://tempting-frontend.fly.dev/, although it will not be hosted indefinitely.

Resulting Features
  • Syntax Highlighting
  • Auto-completion
  • Linting/Errors
  • Template Management
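
To give a feel for what linting a template involves, the following sketch checks a toy template format in which slots are written as `{name}`. This brace syntax, the `Diagnostic` class, and the `lint` function are assumptions made for illustration only; TempTing's own parser and the real template syntax are more involved. The sketch reports unbalanced braces and slots that the constructor does not define, each with a character position that an editor could use to underline the problem.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Diagnostic:
    position: int  # character offset in the template text
    message: str


def lint(template: str, known_slots: Set[str]) -> List[Diagnostic]:
    """Report brace errors and unknown slot names in a toy `{slot}` template."""
    diagnostics: List[Diagnostic] = []
    open_at: Optional[int] = None  # offset of the currently open '{', if any
    for i, ch in enumerate(template):
        if ch == "{":
            if open_at is not None:
                diagnostics.append(Diagnostic(i, "nested '{' is not allowed"))
            open_at = i
        elif ch == "}":
            if open_at is None:
                diagnostics.append(Diagnostic(i, "'}' without matching '{'"))
            else:
                name = template[open_at + 1:i].strip()
                if name not in known_slots:
                    diagnostics.append(Diagnostic(open_at, f"unknown slot '{name}'"))
                open_at = None
    if open_at is not None:
        diagnostics.append(Diagnostic(open_at, "'{' is never closed"))
    return diagnostics


slots = {"person", "place"}
clean = lint("{person} was born in {place}", slots)
typo = lint("{persn} was born in {place}", slots)
```

Because `lint` is a pure function of the template text, it could be re-run on every keystroke to provide the kind of real-time feedback described above.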

[Figure: User Interface]

Resources

Shared Resources