Improving Text-Based Programming Error Messages for South African Students

A 2023 Computer Science Honours Project at UCT


Overview

Problem
  • (1) Programming error messages are a vital source of feedback but are often cryptic.
  • (2) No work has quantified the effect of presenting error messages in a programmer's first language (L1), here Afrikaans or isiXhosa, instead of English when the programmer prefers it.
Aims
  • (1) Improve error message readability using established guidelines.
  • (2) Present the improved messages in isiXhosa and Afrikaans.
  • (3) Investigate the impact of the improved and translated error messages on debugging performance and user preference.


Improving Python Error Messages
(By Mandisa Tunzi)

This study investigates the impact of improving Python error messages using previously researched readability guidelines, and the impact of translating them into isiXhosa, for English- and isiXhosa-speaking novice programmers.


Improving Java Error Messages
(By Danny Guttmann)

This study investigates the impact of improving Java error messages using previously researched readability guidelines, and the impact of translating them into Afrikaans, for English- and Afrikaans-speaking novice programmers.


Introduction

Learning to program is often challenging, and a critical part of it is identifying and correcting errors in code. However, the error messages provided by programming tools are notorious for being cryptic and confusing, which frustrates developers. Research efforts have aimed to enhance these error messages by formulating readability guidelines. Recent work by Becker et al. [2021] reports on an investigation into defining readability and assessing its impact. They introduced four readability guidelines: economy of words, simple vocabulary, removal of jargon, and the use of complete sentences. This project applies these guidelines to Python and Java error messages using a template-based approach, in which predefined text structures are filled with data values. It investigates the impact of the guidelines on the readability and comprehension of error messages for first-year Computer Science students, and explores the effect of presenting error messages in isiXhosa and Afrikaans to novice programmers who do not speak English as their first language.


Improving Python Error Messages

Mandisa Tunzi

Overview

This completed research focuses on enhancing the readability and understandability of error messages in the widely used Python programming language for novice programmers. The study introduces a template-based approach based on natural language generation and readability guidelines. The templates are designed by experts to generate improved error messages for common Python errors. A user study with novice Computer Science students was conducted to evaluate the approach. The results showed that the template-based approach improved the readability and comprehensibility of Python error messages, as perceived by users. However, these improvements did not lead to faster debugging times when the messages were enhanced in English. Translating the improved messages into isiXhosa, another language spoken by some participants, had limited impact on overall debugging speed, although it proved more effective on certain complex tasks for isiXhosa-speaking participants. This suggests the potential benefits of localized error messages for programmers who are not native English speakers.


Research Questions

RQ1: Will applying a set of readability guidelines to transform text-based Python error messages into natural language, using a template-based approach, improve the readability and comprehension of Python error messages?
RQ2: Will these enhanced error messages allow first-year Computer Science students to debug code faster than with default error messages?
RQ3: Will first-year Computer Science students debug code more efficiently when presented with enhanced error messages in isiXhosa, another language they speak, compared to using the default error messages presented in English?


Background

The study introduces a template-based Natural Language Generation (NLG) approach, which uses predefined sentence structures or templates to create coherent and grammatically correct sentences. These templates are tailored to the specific domain and purpose of the NLG system. In this research, the template-based approach is applied to enhance the readability of Python error messages. Predefined templates, designed by programming experts, follow established readability standards and include designated slots filled with relevant data values from encountered error instances. This method aims to make error messages more comprehensible and user-friendly.
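As a rough illustration of slot filling, the sketch below uses Python's standard string.Template class. The template wording and the slot names ($name, $line) are hypothetical and are not taken from the templates used in the study.

```python
from string import Template

# Hypothetical improved template with two named slots.
# The wording is illustrative only, not one of the study's templates.
IMPROVED_TEMPLATE = Template(
    "The name '$name' is used on line $line, but it has not been defined yet."
)


def fill_template(template: Template, slots: dict) -> str:
    """Substitute extracted data values into the template's named slots."""
    # substitute() raises KeyError if a slot has no value, which makes
    # incomplete extractions visible instead of silently printing "$name".
    return template.substitute(slots)


message = fill_template(IMPROVED_TEMPLATE, {"name": "total", "line": 3})
print(message)
```

Keeping the fixed wording in the template and only the data values in the slots means the same extracted values can be reused with a translated template.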

Methods

- Research initiated with the assembly of a dataset containing 92 frequently encountered Python error messages, focusing on those common among novice programmers.
- Systematic categorization of error messages and generation of erroneous code examples provided insights into their structure.
- 62 unique error message templates were crafted, retaining core structure while integrating designated placeholders for variable information.
- An online survey was conducted, seeking input from programming instructors and tutors to enhance the templates based on readability guidelines.
- Survey responses led to the improvement of 44 templates while maintaining a consistent four-line structure.
- The final phase involved translating the templates into isiXhosa, ensuring linguistic coherence through a native speaker's review.
- The research aimed to enhance the comprehensibility of Python error messages and explore the benefits of localized error messages for non-English-speaking programmers.

System: PyPly: The Template-based Python Error Message Generation System

Figure: PyPly: System Components
Figure: PyPly: Input and Output

PyPly is a purpose-built system designed to generate new, improved, and translated error messages. It accomplishes this by extracting error-specific details from standard error messages and seamlessly integrating them into enhanced or translated templates. The system comprises three core components: the lexing mechanism, the parsing mechanism, and the message builder. It takes standard Python error messages as input, extracts error-specific data, retrieves suitable enhanced or translated templates from a database, incorporates the extracted details into appropriate slots within the templates, and finally, generates newly formulated error messages in either English or isiXhosa.
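The flow described above can be sketched as follows. This is a minimal illustration and not PyPly's actual code: the function names (lex, parse, improve), the in-memory TEMPLATES dictionary standing in for the database, the language key "en", and the template wording are all assumptions made for the example.

```python
import re

# Hypothetical template store keyed by (error type, language code).
# A real system would load these from a database; translated templates
# would sit under another language key.
TEMPLATES = {
    ("NameError", "en"): (
        "Line {line}: the name '{name}' is used but has not been defined. "
        "Define it before this line or check its spelling."
    ),
}

# Patterns that pull error-specific values out of the standard message text.
DETAIL_PATTERNS = {
    "NameError": re.compile(r"name '(?P<name>\w+)' is not defined"),
}


def lex(traceback_text):
    """Lexing step: split the raw traceback into stripped, non-empty lines."""
    return [line.strip() for line in traceback_text.splitlines() if line.strip()]


def parse(lines):
    """Parsing step: extract the error type, line number, and slot values."""
    final = re.match(r"(?P<type>\w+): (?P<text>.*)", lines[-1])
    if final is None:
        return None
    details = {"type": final.group("type")}
    for line in lines:
        frame = re.match(r'File "[^"]*", line (?P<line>\d+)', line)
        if frame:
            # Later frames overwrite earlier ones, so the innermost frame wins.
            details["line"] = frame.group("line")
    pattern = DETAIL_PATTERNS.get(details["type"])
    if pattern:
        found = pattern.search(final.group("text"))
        if found:
            details.update(found.groupdict())
    return details


def improve(traceback_text, language="en"):
    """Message builder: fill the matching template, else return the input."""
    lines = lex(traceback_text)
    details = parse(lines) if lines else None
    if details is None:
        return traceback_text
    template = TEMPLATES.get((details["type"], language))
    if template is None:
        return traceback_text
    try:
        return template.format(**details)
    except KeyError:
        # A slot could not be filled; fall back to the standard message.
        return traceback_text


SAMPLE = """Traceback (most recent call last):
  File "task1.py", line 3, in <module>
    print(total)
NameError: name 'total' is not defined
"""

print(improve(SAMPLE))
```

Falling back to the standard message when no template or slot value is found is a design choice of this sketch, so that a learner always sees some message.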

User Study

  1. Pre-Survey and Quiz: Prior to the study, participants completed a pre-survey and a quiz to assess their programming knowledge. The quiz focused on Python concepts and was designed to gauge their proficiency.
  2. Group Assignments: The three participants were randomly assigned to different groups. One was the control group, exposed to standard error messages, while the other two were part of intervention groups. Intervention group A received improved-English error messages, and intervention group B received improved-Xhosa error messages.
  3. Debugging Task: The participants were given a debugging task involving 15 Python programs with various errors categorized by difficulty. They were asked to identify and correct the errors using a text editor. The error messages provided were based on their group assignments.
  4. Automarker Integration: An Automarker was used to evaluate the participants' solutions. It was integrated with PyPly, allowing it to generate and display the appropriate improved or translated error messages based on the participants' groups.
  5. Measuring Debugging Time: The primary measure of the study was the time taken by participants to debug each task. This was calculated as the time difference between the first successful execution of the current task and the previous one.
  6. Error Message Templates: The study used 15 distinct error message templates, both in English and isiXhosa versions. These templates were designed to improve the clarity and readability of error messages.
  7. Post-Survey: After completing the debugging tasks or when the 80-minute time limit was reached, participants filled out a post-survey. This survey collected their feedback on task difficulty, the use of external debugging tools, preference between standard and improved error messages, and their perception of the effectiveness of the error messages they received.
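The debugging-time measure in step 5 can be sketched as below. The log format (task id, timestamp, pass/fail) and the use of the session start as the baseline for the first task are assumptions made for this example; the study does not specify either.

```python
from datetime import datetime


def debugging_times(session_start, runs):
    """Return seconds spent per task, keyed by task id.

    runs is a chronological list of (task_id, timestamp, passed) tuples,
    a hypothetical log format. The time for a task is the gap between its
    first successful run and the first successful run of the task solved
    before it (or the session start for the first task).
    """
    first_success = {}
    for task_id, timestamp, passed in runs:
        if passed and task_id not in first_success:
            first_success[task_id] = timestamp

    times = {}
    previous = session_start
    for task_id, solved_at in sorted(first_success.items(), key=lambda item: item[1]):
        times[task_id] = (solved_at - previous).total_seconds()
        previous = solved_at
    return times


start = datetime(2023, 9, 1, 10, 0, 0)
log = [
    (1, datetime(2023, 9, 1, 10, 2, 0), False),
    (1, datetime(2023, 9, 1, 10, 5, 0), True),
    (1, datetime(2023, 9, 1, 10, 6, 0), True),   # repeat success is ignored
    (2, datetime(2023, 9, 1, 10, 9, 30), False),
    (2, datetime(2023, 9, 1, 10, 12, 0), True),
]
print(debugging_times(start, log))
```

Note that a measure of this kind counts everything between two successes, including any time the participant was idle.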

Results

Figure: Debugging-time: Standard vs. Improved-English
Figure: Debugging-time: Standard vs. Improved-Xhosa

Conclusions

Enhanced Readability and Comprehension

Applying readability guidelines improved error message clarity and understanding.
Consistent positive feedback for both English and isiXhosa messages.
Enhanced messages positively impacted novice programming students' debugging experience.


Debugging Time Impact in English

Participants receiving improved messages took significantly more time than the control group.
Improved readability did not equate to faster debugging for this group.


Debugging Time Impact in isiXhosa

Enhancing isiXhosa messages did not significantly accelerate overall debugging.
Nuanced analysis showed improved effectiveness for medium and hard tasks.
Participants preferred first-language messages for clarity and comprehension, even when overall debugging time didn't decrease.

Limitations

Small Sample Size: The study's limited sample size (three participants) reduces result generalizability.

Quality and Quantity of Improved Messages: The improved error messages were primarily based on one person's input, potentially missing diverse user perspectives.

Measurement of Debugging Time: The study's method for measuring debugging time lacks consideration of potential influential factors, such as idle time and adaptation periods.


Improving Java Error Messages

Danny Guttmann

Overview

Programming error messages are an important source of feedback for programmers. Due to their lack of detail and use of complicated jargon, many novices do not find them useful, and ultimately find programming inaccessible. To make Java programming more accessible to South African students, Java error messages were improved by applying guidelines for writing effective error messages from previous research in this area, and then translated into Afrikaans, the most commonly spoken language in the Western Cape province.

Methods

COLLECTION OF SUBSET OF JAVA ERROR MESSAGES

A subset of commonly encountered Java error messages was extracted from the BlueJ Blackbox database.


TEMPLATE FORMATION

Error message templates were formulated for the 100 most commonly encountered error messages.


IMPROVED TEMPLATE FORMATION

(1) The suggestions from the tutors were used to formulate improved error message templates.
(2) The improved templates were translated into Afrikaans by an Afrikaans-speaking third-year Computer Science student at UCT.

PARSER GENERATION

Python's re (regular expressions) module was used to write a parser program that takes a Java stack trace as input and outputs the name of the Java Exception class, the error message, and any unique data contained in the error message (data that is not part of the template). This output is passed to a second program, which produces an improved message by matching the standard template with its improved counterpart.
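A minimal sketch of such a parser is shown below. It is not the project's code: the regular expressions, the function names (parse_stack_trace, improve_message), the two template dictionaries, and the improved wording are assumptions made for illustration, and only one exception type is covered.

```python
import re

# First line of a Java stack trace: optional thread prefix, exception class,
# and an optional message after ": ".
HEADER = re.compile(
    r'^(?:Exception in thread "[^"]*" )?'
    r"(?P<exception>[\w.$]+)(?:: (?P<message>.*))?$"
)

# Hypothetical standard templates, written as regexes whose named groups
# capture the unique data (the parts that are not fixed template text).
STANDARD_TEMPLATES = {
    "java.lang.ArrayIndexOutOfBoundsException": re.compile(
        r"Index (?P<index>-?\d+) out of bounds for length (?P<length>\d+)"
    ),
}

# Hypothetical improved templates; the wording is illustrative only.
IMPROVED_TEMPLATES = {
    "java.lang.ArrayIndexOutOfBoundsException": (
        "The array has {length} elements, so position {index} does not exist."
    ),
}


def parse_stack_trace(trace):
    """Return the exception class, message, and unique data, or None."""
    lines = trace.strip().splitlines()
    if not lines:
        return None
    header = HEADER.match(lines[0].strip())
    if header is None:
        return None
    exception = header.group("exception")
    message = header.group("message") or ""
    data = {}
    standard = STANDARD_TEMPLATES.get(exception)
    if standard:
        found = standard.fullmatch(message)
        if found:
            data = found.groupdict()
    return {"exception": exception, "message": message, "data": data}


def improve_message(parsed):
    """Fill the improved template, or fall back to the standard message."""
    template = IMPROVED_TEMPLATES.get(parsed["exception"])
    if template is None or not parsed["data"]:
        return parsed["message"]
    return template.format(**parsed["data"])


SAMPLE_TRACE = (
    'Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: '
    "Index 5 out of bounds for length 3\n"
    "\tat Marks.main(Marks.java:7)\n"
)

parsed = parse_stack_trace(SAMPLE_TRACE)
print(parsed)
print(improve_message(parsed))
```

Writing each standard template as a regex with named groups lets the same match both identify the template and extract the unique data in one step.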


Figure: Architectural Diagram of the Java System used for Lab Evaluation

User Study

EVALUATION PREPARATION

(1) Seven programs were written. Each raised a particular Java Exception when run.
(2) An automatic marker script was written to run the seven Java programs and then display either the improved English or the improved Afrikaans error messages.
(3) A pre-survey was prepared with Microsoft Forms, which contained questions relating to basic Java programming constructs.


EVALUATION

(1) First year computer science students at UCT were invited to a lab session to participate in the study.
(2) Two students arrived and were asked to complete the pre-survey and then debug the seven Java programs.
(3) One student was given the improved Afrikaans error messages, and the other was given the standard error messages (as a control).
(4) Due to the low participation rate, a Microsoft Form was prepared. Each question contained Java code, and the error that was given when the code was run, with specific inputs, as well as the expected output. The participants were asked to rewrite the code so that the program produced the expected output.
(5) The Microsoft Form task was completed remotely by five participants.

Results

No significant difference was found between the debugging performance of participants who received the standard error messages and that of participants who received the improved error messages (in English or Afrikaans). Because of the small number of participants and the limited time available, insufficient data was gathered to draw any meaningful conclusions about the efficacy of the improved error messages.

Figure 1: Java Evaluation Results (GREEN: correctly answered question; GREY: no question asked; WHITE: incorrectly answered question)
Figure 2: Improved Java Error Messages Evaluated
Figure 3: Evaluation Results

Meet the Team

Improving Python Error Messages

Mandisa Tunzi

tnzman002@myuct.ac.za
Improving Java Error Messages

Danny Guttmann

gttdan002@myuct.ac.za
Project Supervisor

Dr Zola Mahlaza

zmahlaza@cs.uct.ac.za
Second Reader

Aslam Safla

asafla@cs.uct.ac.za