Appendix F: Courses in programming

Back to the Report: Generative Artificial Intelligence for Education and Pedagogy

Generative AI has shown an impressive ability to solve coding tasks, particularly when given a clear specification of the goals. Researchers have shown that the current generation of models can already solve most of the programming assignments given in introductory classes. As such, programming is one of the areas where it is most critical for instructors both to be aware of the abilities of current models and to incorporate these tools into their future planning.

Current LLMs are trained extensively on program code repositories, so their impact on programming education is of particular concern. Programming is a challenging and game-changing skill. But learning to program requires understanding many abstract concepts, and an extensive knowledge of languages, package APIs, and system environments. Generative AI can both help and hinder these learning objectives.

Two factors are significant in how students relate to programming courses and what support they need. First, courses vary in the expected programming skill: a student just learning about variables, functions, and control structures has different needs than a student who is comfortable with programming. Second, students vary in motivation: some are intrinsically motivated and aim to increase programming skill, while others are extrinsically motivated and aim to build tools or products that happen to require programming.

In an introductory programming course, the goal is to help emerging programmers gain core skills such as reading code, mentally tracing execution and variable states, and identifying bugs. For such a course, GAI code generation is actively harmful to student learning, because students can generate code that meets specifications without understanding it. Rather, the better use of GAI would be as an "assistant", offering suggestions and helping students get unstuck. In a more application-oriented programming course, such as web development or introductory statistics, direct code generation may be a way to get students to a high level of performance quickly, creating curiosity and empowerment rather than boredom and confusion. Finally, in more advanced classes, students face complicated APIs and frameworks and alternative programming languages. In such cases, generative AI can be a "turbocharger" that decreases time spent consulting tutorials and documentation.

Opportunities

Three primary opportunities for LLMs in programming education are:

  • generation of code from specifications (text to code)
  • generation of ancillary tools such as test cases (code to code)
  • generation of explanations or suggestions (code to text)

The relative value of these interventions depends on course objectives. There are ample opportunities for teaching the use of LLMs as a programming tool, both to facilitate more efficient programming and to provide guidance to students learning the discipline. It is clear that generative tools will be a valuable and inescapable part of programming practice.

The first area of opportunity is to use generative tools to help students build more complex artifacts as part of their training. Building realistic software systems is challenging and often requires writing significant amounts of "boilerplate" code to interact with external libraries and tools. As such, even in programming-intensive courses, it is common to simplify systems or provide large amounts of scaffolding code. When combined with careful instruction, the ability of LLMs to efficiently generate code for specific use cases may make it easier to move students closer to realistic environments. For example, current systems can generate code for basic user interfaces from user descriptions. While this may not be critical for the material in the course itself, improving the appearance of the final output can increase student motivation.

A related area of opportunity is supporting software engineering and encouraging best practices for software development. For example, students are often taught to produce detailed specifications and provide unit tests for their systems, but many non-intro courses do not require students to carry out this activity as part of assignments. LLMs can be effective at generating test cases from specifications with much less manual effort from students. This ability provides opportunities to incorporate these best practices into courses directly.
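As a minimal sketch of what this could look like in practice, the example below pairs a short specification with the kind of unit tests a student might ask an LLM to draft and then review by hand. The function letter_grade, its grading thresholds, and the helper run_tests are hypothetical and chosen only for illustration; they are not taken from any particular course.

```python
import unittest


def letter_grade(score: float) -> str:
    """Hypothetical spec: scores run from 0 to 100 inclusive.

    90 and above is "A", 80 and above is "B", 70 and above is "C",
    anything lower is "F". Scores outside 0-100 raise ValueError.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "F"


class LetterGradeTests(unittest.TestCase):
    # Cases derived from the spec: grade boundaries, range limits,
    # and invalid input. A student would check each against the spec.
    def test_boundaries(self):
        self.assertEqual(letter_grade(90), "A")
        self.assertEqual(letter_grade(89.9), "B")
        self.assertEqual(letter_grade(70), "C")
        self.assertEqual(letter_grade(69.9), "F")

    def test_range_limits(self):
        self.assertEqual(letter_grade(0), "F")
        self.assertEqual(letter_grade(100), "A")

    def test_invalid_scores_rejected(self):
        for bad in (-1, 100.5):
            with self.assertRaises(ValueError):
                letter_grade(bad)


def run_tests() -> unittest.TestResult:
    """Run the suite and return the result object for inspection."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LetterGradeTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The value for students lies less in the generated tests themselves than in checking them against the specification, for example by asking which requirement each test enforces and which requirements have no test at all.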

Finally, while it is not yet possible with current systems, future generative systems may open up programming and data science to those without knowledge of a programming language. This presents a remarkable opportunity to teach students for whom programming presents a large barrier to entry, but who nonetheless have an interest in using computer science tools for their applications. Students will be able to build useful systems for application-oriented projects without studying introductory programming.

Generative systems are also potentially a valuable tool for supplementing the teaching process of programming classes. It has already been reported that Harvard's CS50 course is planning to add a tutor bot based on ChatGPT. Students benefit greatly from having teaching assistants answer questions to get them unstuck. However, they can find it intimidating to ask questions or feel too behind to know what to ask. Their reticence leads to widespread use of online resources that are poorly curated and often misleading. GAI can potentially provide better targeted answers to specific issues in code than online forums. In addition to being able to retrieve specific issues that others have experienced, generative models can provide targeted responses to users' code and errors. Current models already provide explanations of code snippets, corrections of mistakes, and guidance for where to look in codebases. Additionally, there are numerous tools to integrate this guidance into code editors and search engines. With the caveat that advice as currently provided is poorly targeted to the specific level and learning objectives of students, this capability could improve the learning and debugging process.

Concerns

At its best, programming is empowering: a few simple primitives can be combined to create an infinite variety of customized solutions. Predictions of the demise of programming due to GAI seem premature. GAI does not replace the core abilities necessary to generate working, understandable, and maintainable code. The primary risk of GAI is that it may prevent students from engaging with the full expressive power of coding. Students may be able to achieve powerful results with little skill, but be unable to move beyond the solutions favored by an AI system, to identify and fix problems, or at worst, to even recognize that alternatives exist.

While intro courses emphasize small, well-structured problems, a large part of software engineering jobs is understanding and managing very large complex code bases written by others. We see this skill becoming even more important in a world where some fraction of code is written by AI systems. These heterogeneous code bases, written both by humans and AI, will require extra care to understand the limitations of each type of coder. Cornell will need to continue to train students to manage these systems, and it would be a major negative outcome if the existence of AI coders led to an assumption from students that it is no longer important to navigate and reshape large code bases.

Currently, there is a high likelihood that LLMs are widely being used as a shortcut for students in programming-heavy classes. LLMs are very good at many aspects of programming, and are particularly adept at the type of self-contained, short programs in intro programming classes, because they have been trained on many similar programs. They are also adept at translating program specifications directly into working code. For this reason, it is important that instructors convey the learning objectives of their courses in a way that motivates students to learn the underlying languages themselves.

We caution instructors against aiming to outwit the ability of models to solve problems by underspecifying details of assignments; these techniques are likely only beneficial in the short term, and future models will be able to work around these issues. In some cases, it may be possible to have students specify assignments themselves, which at least requires understanding the request and being able to check that the code is correct.

Even in instances when a system is able to generate effective code for an assignment, there is concern that systems generate "Frankenstein" code by gluing together different techniques for solving a problem. This may lead to a short-term answer, but the result will be hard to maintain and work with in a large-scale code library. This is a good opportunity for teaching elements of high-level engineering, either by critiquing the inconsistencies of functions output by a model or by learning to reprompt a system to produce cleaner and more consistent outputs.
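To make this concern concrete, the sketch below contrasts two hypothetical helpers written in unrelated styles with a more consistent rewrite. All function names are invented for illustration, and the "glued" versions are a constructed example rather than actual model output.

```python
from collections import Counter


# "Glued" style: two related helpers with unrelated conventions.
# One uses a manual index loop and a plain dict; the other uses
# Counter and signals empty input by returning None.
def word_counts_glued(text):
    counts = {}
    words = text.split()
    for i in range(len(words)):
        w = words[i].lower()
        if w in counts:
            counts[w] = counts[w] + 1
        else:
            counts[w] = 1
    return counts


def top_word_glued(text):
    try:
        return Counter(text.lower().split()).most_common(1)[0][0]
    except IndexError:
        return None  # empty input is reported silently


# Consistent rewrite: one tokenizer, one counting idiom,
# and one explicit policy for empty input.
def tokenize(text: str) -> list[str]:
    return text.lower().split()


def word_counts(text: str) -> Counter:
    return Counter(tokenize(text))


def top_word(text: str) -> str:
    counts = word_counts(text)
    if not counts:
        raise ValueError("text contains no words")
    return counts.most_common(1)[0][0]
```

Both versions compute the same counts on ordinary input. The differences a code review would surface are in duplicated tokenizing logic and in how each version handles empty text.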

Recommendations

As with other areas, the key goal is to convey to the student why it is critical to learn how to read, understand, and write code, as opposed to the goal of simply producing it. If LLMs are viewed as an assistant in the process of learning to write and edit code, they can be powerful assistants to learning and improving as a programmer. However, if they are treated as a black-box oracle, they can hinder the early developmental phase of coding. In this sense, they are like a friend with experience in the subject. Used as a tutor, they can provide guidance; used as an answer key, they can be a dead end.

The second recommendation is to accept that there is unlikely to be any long-term method to prevent use of GAI. The conflict is that homework assignments in programming must be specified in sufficient detail for students to be sure they have done the work correctly and for course staff to grade their work, but any specification that is clear enough for these purposes will necessarily also be solvable by GAI tools. While it may be possible to craft specifications that thread this needle in the short term, the variety of programming chatbots available in the future will make this a time-consuming and ultimately futile pursuit. Course staff will need to know how current bots interact with existing and future assignments, and understand the capabilities they provide to students using them.

Sample assignments

Debugging skills – preliminary to coding assignments in intro classes

Provide students with snippets of code that contain clear errors or produce strange error messages. Teach them how to use GAI as a debugging tool in these situations.

Ask students to:

  1. Run the snippets of code through an interpreter or compiler.
  2. Diagnose the issue themselves based on the output and write a short description of the problem.
  3. Construct a prompt that gives the code snippet to a GAI system and asks for its description of the problem.
  4. Compare the output of the model with the written description: was it sufficient, and does it lead to a different code change than the original student suggestion?
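A snippet for this exercise might look like the following sketch. The functions average_buggy and average_fixed are hypothetical examples, not drawn from an existing assignment; the bug is a deliberate off-by-one error of the kind introductory students often meet.

```python
def average_buggy(values):
    total = 0
    # Bug: the range runs one past the last valid index,
    # so the final iteration raises IndexError.
    for i in range(len(values) + 1):
        total += values[i]
    return total / len(values)


def average_fixed(values):
    # Iterating over the list directly removes the index arithmetic.
    total = 0
    for v in values:
        total += v
    return total / len(values)


if __name__ == "__main__":
    try:
        average_buggy([2, 4, 6])
    except IndexError as err:
        # This message is what a student would diagnose in step 2
        # and include in the prompt in step 3.
        print("IndexError:", err)
    print(average_fixed([2, 4, 6]))
```

Students would be given only the buggy function; the fixed version is shown here as one possible outcome to compare against the model's suggestion in step 4.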

Specification skills – assignment in a class emphasizing software engineering

In a class where students are developing their own design and specifications for systems, assignments can explore the use of code generation as a way of translating specifications to code.

Ask students to:

  1. Develop a human-readable specification for a software system in written language, making clear user requirements and expectations.
  2. Write a prompt that communicates to a GAI system these specifications and examples of test cases that would enforce a subset of them.
  3. Run the prompt through a GAI system to produce test cases for the underlying specification.
  4. Assess the output code to argue whether or not it was able to provide full coverage of the expected test cases. Produce examples that either incorrectly pass or fail the tests.
  5. If necessary, iterate on step 2.
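The coverage argument in step 4 can be sketched as follows. Everything here is hypothetical: the username rules stand in for a student's specification, generated_tests stands in for tests drafted by a GAI system, and incomplete is an implementation that passes those tests while violating the specification.

```python
# Hypothetical spec: a username is valid if it is 3 to 12 characters long,
# contains only ASCII letters and digits, and starts with a letter.

def generated_tests(is_valid_username) -> bool:
    """Stand-in for model-drafted tests; returns True if every case passes.

    Note that no case exercises the "starts with a letter" rule.
    """
    cases = [
        ("alice", True),
        ("ab", False),          # too short
        ("a" * 13, False),      # too long
        ("bob_smith", False),   # illegal character
        ("carol42", True),
    ]
    return all(is_valid_username(name) == expected for name, expected in cases)


def incomplete(name: str) -> bool:
    # Ignores the "starts with a letter" rule, yet passes the tests above.
    return name.isascii() and name.isalnum() and 3 <= len(name) <= 12


def correct(name: str) -> bool:
    return incomplete(name) and name[0].isalpha()


if __name__ == "__main__":
    print(generated_tests(incomplete), generated_tests(correct))
    # A counterexample the tests fail to catch:
    print(incomplete("9lives"), correct("9lives"))
```

An input such as "9lives" is the kind of example step 4 asks for: it shows that the test suite accepts an implementation the specification rejects, which motivates revising the prompt.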

Simulation and visualization – assignment in a non-programming class

In this assignment students will utilize code generation and graphing as a way to run simulations based on natural language descriptions that would have traditionally required more advanced programming knowledge.

Ask students to:

  1. Describe a scenario relevant to the class, for example:
    • In a statistics class, the results of summing up weighted coin flips
    • In a linguistics class, the co-occurrence of words in a document
    • In a biology class, the result of random mutations
  2. Write a prompt that describes the mechanism of the event in detailed language and ask for a graph describing the results of the simulation.
  3. Run the prompt through GAI and then run the resulting code to see the output graph of the system.
  4. Write a description of the result, analyzing whether the graph seems to correctly represent the underlying phenomenon and what aspects of the code either agree or disagree with expectations.

This assignment has potential pitfalls, as it asks students to trust code they may not understand. However, the use of graphing and knowledge of the underlying subject should equip them with tools to at least notice and assess errors. We note as well that even in classes where students do know programming, they often convince themselves that incorrect implementations are correct.
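For the statistics scenario, the generated code might resemble the sketch below. It is an assumption-laden illustration: the function names, the default seed, and the parameter values are invented, and a text histogram is used in place of a plotting library so that the example depends only on the Python standard library.

```python
import random
from collections import Counter


def simulate_heads(n_flips: int, p_heads: float, n_trials: int, seed: int = 0) -> Counter:
    """Count how often each total number of heads occurs across trials."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    totals = Counter()
    for _ in range(n_trials):
        # Each flip lands heads with probability p_heads.
        heads = sum(rng.random() < p_heads for _ in range(n_flips))
        totals[heads] += 1
    return totals


def text_histogram(totals: Counter, width: int = 40) -> str:
    """Render the counts as rows of '#' characters scaled to the peak."""
    peak = max(totals.values())
    lines = []
    for heads in range(min(totals), max(totals) + 1):
        bar = "#" * round(width * totals[heads] / peak)
        lines.append(f"{heads:3d} | {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    totals = simulate_heads(n_flips=20, p_heads=0.7, n_trials=2000)
    print(text_histogram(totals))
    mean = sum(k * v for k, v in totals.items()) / sum(totals.values())
    print("mean heads:", mean)
```

A student who knows the subject can check the output without reading the code closely: with 20 flips and a 0.7 chance of heads, the histogram should peak near 14, and a peak elsewhere would suggest an error in the generated simulation.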

High-level LLM-assisted design – assignment in an upper level class

Provide students with an extremely detailed assignment specification describing the full requirements of the system. The goal of the assignment is to focus on system design with the use of GAI.

Ask students to:

  1. First run the specification as is through a GAI system to produce an output code base. This can be done through the API to produce a full longform output.
  2. Perform a code review of the AI output. Which parts are good, and which are poorly designed, inefficient, or confused?
  3. Construct a high-level design plan for the codebase: for example, giving the module structure and describing high-level implementation choices.
  4. Produce a detailed prompting plan describing this structure to the GAI system.
  5. Regenerate the code base, and submit the assignment.
  6. Note that the submission will be evaluated as if students had written it directly. The goal is to convey responsibility for systems even when AI tools are used.
