Tue 8 Aug 2023 13:25 - 13:50 - Large Language Models

\textbf{Background and Context:} Over the past year, \emph{large language models} (LLMs) have taken the world by storm. In computing education, as in other walks of life, many opportunities and threats have emerged as a consequence.

\noindent \textbf{Objectives:} In this article, we explore such opportunities and threats in a specific area: responding to student programmers’ help requests. More specifically, we assess how good LLMs are at identifying issues in problematic code that students request help on.

\noindent \textbf{Method:} We collected a sample of help requests and code from an online programming course; we then prompted two different LLMs (OpenAI Codex and GPT-3.5) to identify and explain the issues in the students’ code and assessed the LLM-generated answers both quantitatively and qualitatively.
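(The abstract does not include the study's prompting code; the sketch below is only a rough illustration of how a student submission might be sent to GPT-3.5 for an issue report via the OpenAI Python API. The model name, prompt wording, and the \texttt{ask\_for\_issue\_report} helper are assumptions for illustration, not the authors' actual pipeline.)

\begin{verbatim}
# Illustrative sketch only: one way to prompt an LLM to identify issues in
# beginner code. Prompt wording, model choice, and helper name are
# assumptions; they are not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable


def ask_for_issue_report(exercise_statement: str, student_code: str,
                         help_request: str) -> str:
    """Ask the model to list and explain issues in the student's code,
    without providing a model solution (which, per the paper's findings,
    it may do anyway)."""
    prompt = (
        "A student on a beginner programming course asks for help.\n\n"
        f"Exercise:\n{exercise_statement}\n\n"
        f"Student's code:\n{student_code}\n\n"
        f"Student's help request:\n{help_request}\n\n"
        "Identify and explain the issues in the code. "
        "Do not write a corrected or complete solution."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # near-deterministic output eases quantitative scoring
    )
    return response.choices[0].message.content
\end{verbatim}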

\noindent \textbf{Findings:} Both LLMs frequently find at least one actual issue in each student program. Neither LLM excels at finding all the issues, and false positives are common. The advice that the LLMs provide on the issues is often sensible. The LLMs perform better on issues involving program logic than on issues involving output formatting. Model solutions are frequently provided even when the LLM is prompted not to. Prompts in English produce somewhat better results than prompts in a non-English language, but not massively so. GPT-3.5 outperforms Codex in most respects.

\noindent \textbf{Implications:} Our results continue to highlight the utility of LLMs in programming education. At the same time, they underscore the unreliability of LLMs: LLMs make some of the same mistakes that students do, perhaps especially when formatting output as required by automated assessment systems. Our study informs teachers interested in using LLMs as well as future efforts to customize LLMs for the needs of programming education.

Tue 8 Aug

Displayed time zone: Central Time (US & Canada)

13:00 - 14:15: Large Language Models (Research Papers)

Session Chair: James Prather

13:00 | 25m | Talk | Research Papers
Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses
Jaromir Savelka, Arav Agarwal, Marshall An, Christopher Bogart, Majd Sakr (Carnegie Mellon University)
13:25 | 25m | Talk | Research Papers
Exploring the Responses of Large Language Models to Beginner Programmers’ Help Requests
Arto Hellas (Aalto University), Juho Leinonen (The University of Auckland), Sami Sarsa (Aalto University), Charles Koutcheme (Aalto University), Lilja Kujanpää (Aalto University), Juha Sorva (Aalto University)
13:50 | 25m | Talk | Research Papers
From "Ban It Till We Understand It" to "Resistance is Futile": How University Programming Instructors Plan to Adapt as More Students Use AI Code Generation and Explanation Tools such as ChatGPT and GitHub Copilot
Sam Lau (University of California at San Diego), Philip Guo (University of California at San Diego)