Using the R Statistical Software to Analyze Data in a Computing-Education Context
ICER 2023
Instructor: Stephen R. Piccolo, Associate Professor, Brigham Young University
Where: John Crerar Library Building at 5730 S. Ellis Ave, Chicago, IL 60637
Intended audience: Researchers at any career stage
Eligibility: Anyone who has a programming background
Description: The R statistical software is used widely in research. It is open source, and a myriad of packages are available for diverse kinds of analytical tasks, including data preparation, statistical analyses, machine learning, and data visualization. However, many in the CSEd community are less familiar with R than with alternative tools (such as proprietary statistical software).
The goals of this workshop are to help CSEd researchers 1) gain a conceptual understanding of the data types, structures, libraries, and programming strategies commonly used in R, 2) gain hands-on experience with programming in R, and 3) reach a point where they can begin using R independently in their own research. The workshop will focus on core skills on which participants can build to accomplish more advanced tasks.
The first part of the workshop will alternate between didactic instruction and hands-on programming exercises. During the didactic portions, the instructor will use traditional lecture slides, complemented by active-learning exercises that help participants assess their understanding of the concepts. The exercises will be delivered via a Web-based automatic grader so that participants receive immediate feedback on their code. The grader will remain available for at least three months after the workshop ends so that participants can continue to access their code and hone their skills as desired.
In the second part of the workshop, participants will analyze a “real world” CSEd dataset. This dataset comes from a recent study in which programming exercises were submitted to ChatGPT and its ability to generate functional code was assessed. Workshop participants will analyze the data to replicate the study’s findings and, if they wish, to explore their own questions.
Below is an outline of the topics that will be covered (subject to change):
- Working with vectors, lists, and data frames
- Visualizing data (using the ggplot2 package)
- Transforming data (using the dplyr package)
- Manipulating strings and factors (using the stringr and forcats packages)
- Tidying data (using the tidyr package)
- Joining data (using the dplyr package)
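To give a flavor of how several of these topics fit together, here is a minimal sketch that tidies, joins, and summarizes a small data frame with tidyr and dplyr. The data, column names, and variable names below are invented purely for illustration. They are not taken from the workshop dataset or its exercises.

```r
library(dplyr)
library(tidyr)

# Hypothetical data (an assumption for illustration only): whether each
# exercise passed on two attempts, stored in "wide" form.
submissions <- data.frame(
  exercise_id = c(1, 2, 3, 4),
  attempt_1 = c(TRUE, FALSE, TRUE, FALSE),
  attempt_2 = c(TRUE, TRUE, TRUE, FALSE)
)

# Hypothetical lookup table describing each exercise.
exercises <- data.frame(
  exercise_id = c(1, 2, 3, 4),
  topic = c("loops", "loops", "strings", "strings")
)

# Tidying: reshape to one row per exercise and attempt.
long <- pivot_longer(
  submissions,
  cols = starts_with("attempt_"),
  names_to = "attempt",
  values_to = "passed"
)

# Joining: attach the topic of each exercise.
joined <- inner_join(long, exercises, by = "exercise_id")

# Transforming: compute the proportion of passing attempts per topic.
summary_by_topic <- joined %>%
  group_by(topic) %>%
  summarize(pass_rate = mean(passed))

print(summary_by_topic)
```

Taking the mean of a logical column works because R treats TRUE as 1 and FALSE as 0, so `pass_rate` is the fraction of attempts that passed within each topic.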