AfriLex Database Application

Overview

AfriLex is a specialized lexicographical database designed to enhance Wikidata's coverage of Niger-Congo B languages, including the Bantu languages, by facilitating the collection, refinement, and batch upload of linguistic data. Its design is tailored to the linguistic complexity of this language family, so that grammatical structures, phonology, and vocabulary can be represented in detail. The aim is a platform that is functional as well as culturally sensitive and linguistically accurate.

AfriLex also integrates tools that extend its functionality. A SPARQL query endpoint lets users run complex searches and extract specific linguistic data efficiently, which is particularly useful for researchers and linguists who need precise, targeted information. A verb form generator automatically produces verb forms based on the linguistic rules of the Niger-Congo B languages, giving language learners, educators, and linguists a quick way to explore verb conjugations in these languages.

Objectives

AfriLex aims to enhance Wikidata's coverage of Niger-Congo B languages by providing a comprehensive and linguistically detailed lexicographical database.

Niger-Congo B Database Model

Design a unique database model that reflects the extensive linguistic features and detail of Niger-Congo B languages.

AfriLex Database Application

Develop a database application that facilitates the meticulous collection, refinement, and batch upload of linguistic data to Wikidata.

Verb Form Generation

Automatically generate various forms of verbs based on the linguistic rules of the Niger-Congo B languages, and facilitate their upload to Wikidata.
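As a rough illustration of rule-based verb form generation, the sketch below concatenates morpheme slots (subject concord, tense marker, root, optional extension, final vowel) for a Zulu-style verb. The slot inventory, the function name generate_verb_forms, and the small set of morphemes are assumptions chosen for illustration. They are not AfriLex's actual rules, and the sketch ignores tone and sound changes that a real generator must handle.

```python
from itertools import product
from typing import Dict, List

# Illustrative morpheme inventories (assumed for this sketch, not taken from AfriLex).
SUBJECT_CONCORDS = {"1sg": "ngi", "2sg": "u", "3pl": "ba"}
TENSE_MARKERS = {"present": "ya"}
EXTENSIONS = {"none": "", "applicative": "el"}
FINAL_VOWEL = "a"


def generate_verb_forms(root: str) -> List[Dict[str, str]]:
    """Build every slot combination for a verb root by plain concatenation."""
    forms = []
    for (person, sc), (tense, tm), (ext_name, ext) in product(
        SUBJECT_CONCORDS.items(), TENSE_MARKERS.items(), EXTENSIONS.items()
    ):
        forms.append(
            {
                "form": sc + tm + root + ext + FINAL_VOWEL,
                "person": person,
                "tense": tense,
                "extension": ext_name,
            }
        )
    return forms


forms = generate_verb_forms("hamb")
for item in forms:
    print(item["form"], item["person"], item["tense"], item["extension"])
```

Plain concatenation of this kind can over-generate, so generated forms would still need validation before upload.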

Database Design

We designed our Niger-Congo B languages database through an iterative cycle, starting with a review of relevant resources and projects. This informed the creation of an initial ORM prototype, which encapsulated key linguistic features. The prototype was then refined based on additional analyses, ensuring accuracy and comprehensiveness while reducing the risk of failure and simplifying development.
   
The database captures the complexity of Niger-Congo B languages through interconnected entities. LanguageFamily and Language categorize languages and connect to linguistic details such as Morpheme and Phoneme. Word entries are represented by the Word entity, which links to a central LexicalEntry. Tonal intricacies are represented by the TonalPattern and Tone entities. The User entity handles user management, while VerbForm, VerbalMorphology, and VerbExtension, among others, capture grammatical complexity.
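To make the relationships between some of these entities concrete, here is a minimal sketch using Python dataclasses. Only the entity names come from the description above; every field name and relationship shown is an assumption for illustration and does not reproduce the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageFamily:
    name: str


@dataclass
class Language:
    name: str
    family: LanguageFamily  # assumed: each language belongs to one family


@dataclass
class Tone:
    label: str  # assumed encoding, e.g. "H" or "L"


@dataclass
class TonalPattern:
    tones: List[Tone]


@dataclass
class LexicalEntry:
    lemma: str
    language: Language
    lexical_category: str
    tonal_pattern: Optional[TonalPattern] = None  # optional in this sketch


@dataclass
class VerbExtension:
    name: str
    suffix: str


@dataclass
class VerbForm:
    entry: LexicalEntry
    surface_form: str
    extensions: List[VerbExtension] = field(default_factory=list)


# Hypothetical usage: one verb entry with one extended form.
family = LanguageFamily("Niger-Congo B")
zulu = Language("isiZulu", family)
entry = LexicalEntry("hamba", zulu, "verb")
form = VerbForm(entry, "hambela", [VerbExtension("applicative", "el")])
print(form.entry.language.family.name, form.surface_form)
```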

Full ORM Database Model

Database Application Development

The AfriLex Database Application is a modular and scalable platform specifically designed for storing and managing lexicographical data of Niger-Congo B languages. It features a comprehensive database that encapsulates the linguistic intricacies and grammatical features of these languages. The application leverages MySQL for efficient data management, and employs a lightweight Python Flask backend coupled with a dynamic JavaScript frontend to provide a responsive user interface. Key features include a custom bot, WingUCTBOT, for batch uploading data to Wikidata, a SPARQL endpoint for enhanced data querying capabilities, and a unique Verb Form Generator that automates the generation of diverse verb forms, ensuring the linguistic diversity of Niger-Congo B languages is accurately represented and easily accessible.
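The batch upload step ultimately has to turn each database entry into data that Wikidata accepts as a lexeme. The sketch below builds such a payload as a plain dictionary, modeled on the general shape of the Wikibase Lexeme JSON format (lemmas, language, lexicalCategory, forms). The function name build_lexeme_payload and the placeholder item IDs are assumptions. This is not WingUCTBOT's code, and the key names should be checked against the current Wikibase documentation before use.

```python
import json
from typing import Dict, List


def build_lexeme_payload(
    lemma: str,
    language_code: str,
    language_qid: str,
    category_qid: str,
    forms: List[str],
) -> Dict:
    """Assemble a lexeme description as a JSON-serializable dictionary."""
    return {
        "type": "lexeme",
        "lemmas": {language_code: {"language": language_code, "value": lemma}},
        "language": language_qid,
        "lexicalCategory": category_qid,
        "forms": [
            {
                "add": "",  # marks a new form rather than an edit of an existing one
                "representations": {
                    language_code: {"language": language_code, "value": form}
                },
                "grammaticalFeatures": [],
                "claims": [],
            }
            for form in forms
        ],
    }


# Placeholder IDs: replace with the real Wikidata items for the language and category.
payload = build_lexeme_payload(
    "hamba", "zu", "Q_LANGUAGE_PLACEHOLDER", "Q_CATEGORY_PLACEHOLDER", ["hamba", "hambela"]
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call makes it possible to inspect and validate each lexeme before any batch is sent.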

Niger-Congo B Lexicographical Database
Interactive Database Interface
WingUCTBOT for Wikidata Lexeme Upload
SPARQL Query Point
Verb Form Generator
Verb Form Upload

Lexemes Generated and Uploaded

The project improved the representation of Bantu languages on Wikidata, achieving an upload success rate of 99.26%.
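The success rate can be related to the upload count with a short calculation. The sketch below assumes that the rate is successful uploads divided by attempted uploads and that the 536 uploaded lexemes are the successes. The page does not state the number of attempts, so the code searches for totals that are consistent with the reported figures. The function names are hypothetical.

```python
from typing import List


def success_rate(successes: int, attempts: int) -> float:
    """Upload success rate as a percentage, rounded to two decimals."""
    return round(100 * successes / attempts, 2)


def consistent_attempt_counts(
    successes: int, reported_rate: float, max_extra: int = 100
) -> List[int]:
    """List attempt totals whose rounded success rate equals the reported rate."""
    return [
        attempts
        for attempts in range(successes, successes + max_extra + 1)
        if success_rate(successes, attempts) == reported_rate
    ]


candidates = consistent_attempt_counts(536, 99.26)
print(candidates)
```

Under these assumptions, 540 attempted uploads is the only total in the searched range that reproduces 99.26%, which would correspond to four failed uploads.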

536 Lexemes Uploaded to Wikidata

67% Niger-Congo B Features Included

76% Generated Verb Form Validity

AfriLex Database Application Evaluation Results

Below are the results measured against our project objectives to improve Wikidata's lexicographic repository for Niger-Congo B languages. The evaluations provide insights into data compatibility, linguistic representation, and the performance of the Verb Form Generator. The findings highlight the project's achievements and point towards areas for further improvement.

The AfriLex database evaluation shows uneven coverage of Niger-Congo linguistic features. With 68% coverage of the Noun class system and 89% of Nominal morphology, it is strongest in noun morphology. Syntax at 56% and Writing Systems and Analysis at 40% leave room for improvement. The Verb Extension System is fully captured at 100%.
The database correctly captured 67.7% of Niger-Congo B language features. Coverage was limited by the scope of the project, as going further would not have added functionality to lexeme or verb generation. Important features such as Noun Classes and Subject Verb Order were modeled first, and further iterations added more language features.

AfriLex Database Application

Batch Upload Lexemes to Wikidata
   
SPARQL Query Point for the Database
   
Generate Niger-Congo B Verb Forms and Upload to Wikidata


Documents

Literature Review

Review of the academic literature relevant to lexicographic database tools and their implementations.


Project Proposal

Assesses the feasibility of the project and outlines the project plan.


Final Paper

Final Paper containing more details about the project and sections mentioned above, specifically for AfriLex.


Project Poster

Poster illustrating the overview of components developed within this project.
