Non-spatial Component


Requirements

A means of integrating non-spatial data from Microsoft Excel and Access files into the single repository was required. Datasets were typically non-static, with records likely to be added and the types and range of data subject to change. New members of the Institute were also to collect and integrate new datasets over the course of time. The data initially provided was riddled with inconsistencies and omissions on a per-file basis, and so facility to update and partially clean data was also necessary. Finally, certain data, such as visitor details and park finances, was to be accessible by only allowed members.


A method of posing queries across the unified collection of datasets was required. In particular, users needed facility to link datasets, request data from fields across those datasets, and filter the results by means of applying constraints. The intended users had little to no prior database experience, and so the querying process needed to abstract away the technicalities of SQL.
Raw query results would typically prove insufficient for analysis purposes and so results had to be available in other formats, such as pivot tables.



Design and Implementation

This section will be described according to the two reasonably disparate topics of user-driven data integration and the general query interface.


User-driven Data Integration

A Java tool was developed to export Microsoft Excel sheets containing non-spatial data into tables of a specified PostgreSQL database. The Apache POI library was used for .xlsx file processing. Sheet-defined configuration parameters guide the export process, where users record these parameters in each sheet on initial integration and can make alterations as desired over the course of a dataset's lifetime. Access database tables can be easily exported to Excel sheets from which point configuration specification takes place as no cell-based formatting is necessary. The various configuration parameters include:


  1. Field name: the column's name in the resulting database table
  2. Data type: the data type of the field in the database. Simple parse-based cleaning is performed here as, for example, text values will be ignored in an integer column
  3. Fixed value constraints: values in this column must be equal to one of the values specified here, else a warning will be printed
  4. Cross-reference constraints: a table and field name are specified, and the values of the column in the sheet are checked for a match against values therein
  5. Confidential field: two fields are created in the database - one with the actual values [visible to only the dataset owner and users with elevated privileges] and another with anonymised values [visible to all]
  6. Field comment: a description or note on the field that will be readable from the query interface

An example of a sheet formatted with these constraints can be seen below:


Excel file with export configuration specified

A screenshot of the tool, following the export of an Excel file, can be seen below:


Non-spatial exporter tool in action



General Query Interface

The non-spatial component of the query interface covers the majority of the layout of graphical query construction and results viewing. Extensive use was made of custom jQueryUI widgets for graphical elements. For information on the spatial aspects and plugins refer to the spatial component page. Users tick the fields that they want to see in the results.


Selection of fields to be displayed in the results

Constraints can then be placed as necessary any field. Tables are linked by placing an appropriate constraint, and errors are highlighted if any errors are found. Boolean operator brackets can be used to group constraints for enhanced expressibility.


Examples of constraints used to construct a query

Query results were made available in multiple formats which facilitated different forms of analysis. The DataTables plugin for jQuery was used for rendering the data, as well as requesting pages of data using the server-side processing option. The various tables of results are listed below:


Raw results
Raw results table


Group by and one-field aggregates
Group by and one-field aggregate table


Multiple fields, one aggregate function
Multiple fields, one aggregate function table


Pivot table
Pivot table


No results
KITTY!


Results

Highly flexible tools for both the integration of arbitrary (albeit formatted) non-spatial Excel datasets and the querying and results viewing of the target database were developed. User testing, performed by the only three current future users of the system, suggested that non-spatial-related aspects of query building and result analysis were easy enough to grasp and use. A subsequent SUS questionnaire suggested that the components of the querying and analysis interface were well integrated and revealed very little inconsistency. They also responded positively towards wanting to use the system frequently in the future.