Exploring the Responses of Large Language Models to Beginner Programmers’ Help Requests
\textbf{Background and Context:} Over the past year, \emph{large language models} (LLMs) have taken the world by storm. In computing education, like in other walks of life, many opportunities and threats have emerged as a consequence.
\noindent \textbf{Objectives:} In this article, we explore such opportunities and threats in a specific area: responding to student programmers’ help requests. More specifically, we assess how good LLMs are at identifying issues in problematic code that students request help on.
\noindent \textbf{Method:} We collected a sample of help requests and code from an online programming course; we then prompted two different LLMs (OpenAI Codex and GPT-3.5) to identify and explain the issues in the students’ code and assessed the LLM-generated answers both quantitatively and qualitatively.
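The prompting pipeline described in the Method can be sketched roughly as follows. This is an illustrative assumption, not the authors' actual prompt or code: the prompt wording and the \texttt{build\_prompt} helper are hypothetical, and the commented-out API call is only indicative of how GPT-3.5 could be queried.

```python
# Hypothetical sketch of the study's prompting setup: pair each student's
# help request with their code and ask an LLM to identify and explain the
# issues without giving a model solution. The wording below is an assumption.

def build_prompt(help_request: str, student_code: str) -> str:
    """Combine a student's help request and code into a single LLM prompt."""
    return (
        "A student asked for help with the following program.\n"
        f"Help request: {help_request}\n"
        "Code:\n"
        f"{student_code}\n"
        "Identify the issues in the code and explain each one briefly. "
        "Do not provide a corrected or model solution."
    )

if __name__ == "__main__":
    prompt = build_prompt(
        "My program prints the wrong total.",
        "total = 0\nfor i in range(1, 10):\n    total += i\nprint(total)",
    )
    print(prompt)
    # With an API key configured, this prompt could then be sent to GPT-3.5,
    # e.g. via the (legacy) openai.ChatCompletion.create(
    #     model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}])
```

The LLM's free-text answer would then be assessed, as the Method states, both quantitatively (e.g., counting actual issues found versus false positives) and qualitatively.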
\noindent \textbf{Findings:} Both LLMs frequently find at least one actual issue in each student program. Neither LLM is strong at finding all the issues, and false positives are common. The advice that the LLMs provide on the issues is often sensible. The LLMs perform better on issues involving program logic than on output formatting. Model solutions are frequently provided even when the LLM is prompted not to. Prompts in English produce somewhat better results than prompts in a non-English language, but the difference is modest. GPT-3.5 outperforms Codex in most respects.
\noindent \textbf{Implications:} Our results continue to highlight the utility of LLMs in programming education. At the same time, the results highlight LLMs’ unreliability: LLMs make some of the same mistakes that students do, perhaps especially when formatting output as required by automated assessment systems. Our study informs teachers interested in using LLMs as well as future efforts to customize LLMs for the needs of programming education.
Tue 8 Aug (Central Time, US \& Canada)
13:25, 25-minute talk (Research Papers): Exploring the Responses of Large Language Models to Beginner Programmers’ Help Requests. Arto Hellas (Aalto University), Juho Leinonen (The University of Auckland), Sami Sarsa (Aalto University), Charles Koutcheme (Aalto University), Lilja Kujanpää (Aalto University), Juha Sorva (Aalto University)