About

What is SANCTUM?

SANCTUM is a web interface used to index, search and analyze a large Twitter dataset in a cluster environment.

SANCTUM combines modern and classic approaches to create an environment in which the analysis of large datasets (not only Twitter data) is simple and accessible.

Introduction and Background

Social media platforms produce vast quantities of data on a daily basis. These platforms let users share their thoughts and opinions on various topics easily and conveniently. Twitter is one of the world’s leading social media platforms and is used by millions of people. Analyzing Twitter data can reveal interesting insights into the dynamics of online communities.

However, Twitter data cannot be processed easily because of the sheer amount of data its users generate every day. Identifying meaningful tweets is a further problem: user-generated content is sometimes nonsensical, and it can be difficult to extract meaning from certain tweets without context.

Problem Statement

The project focused on building a system that could help researchers analyze large amounts of Twitter data in a reasonable amount of time. The system should find frequent relationships in tweets relatively quickly and visualize the results so that meaningful information can be extracted from the dataset provided.
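As a rough illustration of what "frequent relationships" could mean in practice, the sketch below counts word pairs that appear together in the same tweet and keeps the pairs seen in at least a minimum number of tweets. It is an assumption made for illustration only: the function name `frequent_pairs`, the whitespace tokenization and the `min_support` threshold are hypothetical and do not describe SANCTUM's actual implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(tweets, min_support=2):
    """Return word pairs that co-occur in at least `min_support` tweets.

    Hypothetical helper for illustration; not part of SANCTUM.
    """
    counts = Counter()
    for text in tweets:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        # Sorting the unique words gives each pair one canonical order.
        words = sorted(set(text.lower().split()))
        # Count each unordered pair at most once per tweet.
        counts.update(combinations(words, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    sample = [
        "load shedding again tonight",
        "load shedding schedule updated",
        "traffic is bad tonight",
    ]
    print(frequent_pairs(sample))
```

A single-machine loop like this only conveys the idea; on a dataset of the size described here, the counting would have to be distributed across the cluster.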

Project Goals and Significance

Social media has become an integral part of society and generates massive amounts of data on a daily basis. By applying data mining techniques to these datasets, researchers and businesses can gain valuable insight into the dynamics of online communities and individual users. One such insight is the detection of interesting associations between topics and words. Unfortunately, analyzing large amounts of social media data requires significant processing power, which can be costly for both researchers and businesses. Maintaining a cluster of your own, or renting one on a permanent basis from a provider such as Amazon Web Services (AWS), is an expensive exercise. Additionally, extracting meaningful information from the results of the analysis can be a time-consuming and tedious process.

The goal of this project was to build a system that can use a cluster to perform a scalable analysis of large amounts of Twitter data and present the results in a useful manner.

Project Breakdown

[Figure: system diagram]

The project was divided into three subsystems that together make up SANCTUM (Scalable Analysis on a Cluster of Twitter-Data Using Mining).

Team members:

Matt Young: mattyoung305@gmail.com

Eric Dai: ericshangpingdai@gmail.com

Pieter van der Walt: pieter072@gmail.com

Supervisors:

Jivashi Nagar: onirjivashi@gmail.com

Professor Hussein Suleman: hussein@cs.uct.ac.za