Appendix E: Mathematics, physical sciences, and engineering

Back to the Report: Generative Artificial Intelligence for Education and Pedagogy


Technical and mathematical courses have adapted well in the past to incorporate new technologies. For example, the widespread adoption of computing and visualization tools such as MATLAB over the past two decades has allowed instructors to ask more of students and to assign exercises that help students gain deeper understanding. As a complement to solving a differential equation, for instance, students can plot the solution and visualize how results depend on key model parameters. While GAI is not precisely analogous to these tools, there are similar opportunities to enhance education in mathematics, physical sciences, and engineering. These opportunities include using LLMs in an explanatory capacity for technical topics and using their ability to synthesize code to support data analysis and visualization.

While current GAI systems struggle with producing novel mathematical reasoning, they are adept at retrieving and summarizing complex arguments on technical topics. Although they still often hallucinate or skip steps, a careful and curious student can use them as a supplement to explore a topic. As an example, we consider open-ended physics questions. An instructor might create an assignment that asks a student to have a dialogue with an LLM about a specific question, e.g., “would I weigh more at the equator or the north pole?”. The student could then turn in the transcript plus a reflection on what they have learned. LLMs can also be used in the reverse direction, as a devil's advocate that assesses the student's reasoning. If a student is asked to make an argument with technical justification, they could be asked to have an LLM critique their reasoning and then to explain why the LLM’s critique of their technical argument is or is not valid.

In their current form, LLMs present a good opportunity for data analysis and visualization. Current models are excellent at generating code for complex graphs and visualizations from user specifications. For example, in an assignment that involves collecting experimental data, an instructor could ask for results to be presented graphically to a precise specification. Students could then use an LLM in conjunction with a coding environment like Jupyter to produce and iterate on their output. This assignment is possible even in courses where students do not have advanced programming knowledge, as they can judge the generated code by observing its output. Similar approaches are also possible for data science and analysis assignments, where LLMs allow students to query and filter data without having to fully understand the underlying code.


While the current generation of LLM systems is not yet capable of solving unseen mathematics problems, due to limits on reasoning and computational power, these systems are extremely capable of replicating and combining previously seen solutions to commonly assigned problems. As such, it is critical that instructors are aware not only of the capabilities of current systems but also of their behavior on specific problems.

The most obvious short-term concern for technical classes is student use of LLMs to shortcut learning objectives on homework assignments. Specifically, if students become accustomed to using LLMs in intro classes, where they return correct or even partially correct answers, they may become reliant on them in more challenging classes or in the workforce. We caution that the capabilities of these systems, specifically in terms of mathematical reasoning, are increasing rapidly, and that just in the last several months systems have improved greatly in this area. For classes where material is solvable by these systems, and the instructor believes this is problematic, we recommend both openly discussing the benefits of manual computation and the course learning objectives with the class and establishing clear guidelines about LLM use in the course. Additionally, instructors may consider altering assignment specifications or including additional requirements that deviate from standard versions of the problems.

Current LLM systems are more than willing to provide answers for questions that they are nowhere close to being able to solve. For example, when asked to solve a quadratic equation, the current generation of ChatGPT will proceed through the steps of solving by producing hallucinated intermediate values, despite not having access to a calculator. A striking aspect of these responses is that they are written in a tone that mimics the confidence of a textbook. The mismatch between the tone and the technical ability can lead to overconfidence from the reader. For this concern, we recommend an assignment in intro classes that requires students to go through the exercise of understanding these issues. For early proof-based courses, we suggest that students be asked to "provide a plausible proof generated by a language model and to find gaps or mistakes in the proof."
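One way for students to test the intermediate values in an LLM's worked solution is to substitute each claimed root back into the equation. The following is a minimal sketch of that check in Python, not a prescribed method; the function names and the sample coefficients are hypothetical and chosen only for illustration.

```python
import cmath


def solve_quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 (possibly complex)."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


def residual(a, b, c, x):
    """Magnitude of a*x**2 + b*x + c at x; close to zero for a true root."""
    return abs(a * x * x + b * x + c)


def is_root(a, b, c, x, tol=1e-9):
    """Check a claimed root by substitution rather than by trusting the prose."""
    return residual(a, b, c, x) <= tol


if __name__ == "__main__":
    # Hypothetical example: 2x^2 - 7x + 3 = 0.
    a, b, c = 2, -7, 3
    print("computed roots:", solve_quadratic(a, b, c))
    # Values an LLM might report; only some of them survive substitution.
    for claimed in (3, 0.5, 1.5):
        print(claimed, "is a root:", is_root(a, b, c, claimed))
```

The substitution check does not depend on how the answer was produced, so it applies equally to a textbook, a classmate, or an LLM transcript.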


Our overall recommendation is to encourage productive engagement with GAI while highlighting potential pitfalls.

For productive student use, we encourage communicating regularly and clearly to students about when these tools can and should be used. Just as we ask students to solve differential equations by hand even though Wolfram Alpha can solve them, instructors should make clear the need to practice and apply concepts, principles, and methods.

Even when LLMs can be used, it should be made explicitly clear to students that they are responsible for their own work, even if they used AI to generate all or part of a solution. If the AI answer is wrong and the student turns it in, the student is still responsible for the submitted answer. Provide examples where students can see how these tools fail in practice.

For instructors, we encourage understanding the precise abilities and limitations of these models. Submit problems to an LLM before assigning them to students to see what it can solve. What mistakes does it make? What grade would it get? Is this an assignment you still want to give and, if so, do you ban the use of LLMs or allow their use with attribution? For example, in an engineering mathematics course where learning to solve certain types of equations is a learning outcome that will be assessed by in-person exams, you may wish to ban the use of GAI. But if the assignment is to use solutions to probe or to understand the physical problem at hand, you might ask students to attribute the use of AI to solve the problem.

Sample assignments

We present some examples of prompts that can be posed to GAI (ChatGPT in this case), together with its responses, to help illustrate educational opportunities and issues. The ChatGPT responses have been edited for brevity. These examples could be used as written homework assignments or as points of discussion in recitation sections.

1. Conceptual question in physics

Ask students to:

A. Prompt GAI with a question such as: “If I weigh myself at the equator will I weigh more, less or the same as at the North Pole?”

ChatGPT (for example) will return:

Yes, your weight would be slightly different at the equator compared to the North Pole, but the difference is quite small and may not be noticeable without highly precise equipment. This weight difference is primarily due to the variation in Earth's shape and centrifugal force caused by its rotation. …

B. Then ask students to perform the analysis underlying the qualitative answer above and to compute numerically their weight at the pole and at the equator as a way to gain understanding into the magnitude of this effect.
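As a rough check on the magnitude students should find in part B, the comparison can be sketched in a few lines of Python. This is an illustrative calculation under stated assumptions, not a model solution: it uses WGS 84 reference values for sea-level gravity, ignores altitude and local geology, and assumes a hypothetical 70 kg person. All function names are made up for this sketch.

```python
# Reference constants from the WGS 84 ellipsoid model (sea level).
G_EQUATOR = 9.7803253359  # m/s^2, effective gravity at the equator
G_POLE = 9.8321849378     # m/s^2, effective gravity at the poles
OMEGA = 7.292115e-5       # rad/s, Earth's rotation rate
R_EQUATOR = 6378137.0     # m, equatorial radius


def scale_reading_newtons(mass_kg, g):
    """Force a scale registers for a body at rest where effective gravity is g."""
    return mass_kg * g


def relative_weight_difference():
    """Fractional increase in weight going from the equator to a pole."""
    return (G_POLE - G_EQUATOR) / G_EQUATOR


def centrifugal_at_equator():
    """Centrifugal acceleration omega^2 * R at the equator, in m/s^2."""
    return OMEGA ** 2 * R_EQUATOR


if __name__ == "__main__":
    mass = 70.0  # kg, hypothetical person
    w_eq = scale_reading_newtons(mass, G_EQUATOR)
    w_pole = scale_reading_newtons(mass, G_POLE)
    print(f"equator: {w_eq:.1f} N, pole: {w_pole:.1f} N")
    print(f"relative difference: {100 * relative_weight_difference():.2f} %")
    share = centrifugal_at_equator() / (G_POLE - G_EQUATOR)
    print(f"share of the gap from rotation alone: {100 * share:.0f} %")
```

With these constants the pole reading is roughly 0.5% larger than the equator reading, and the centrifugal term accounts for about two thirds of the gap in effective gravity. The remainder reflects Earth's flattened shape, which matches the two causes named in the ChatGPT response.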

2. Mathematical proofs

Ask students to:

A. Prompt GAI to “Prove that the solution to the 2D steady-state heat equation is unique”.

ChatGPT (for example) will return:

To prove the uniqueness of the solution to the 2D steady-state heat equation, we can use the method of contradiction.
Consider the 2D steady-state heat equation with homogeneous Dirichlet boundary conditions:
∇²u = 0,
where u(x, y) represents the temperature distribution in the xy-plane, and ∇² is the Laplacian operator. Suppose there are two solutions to this equation, denoted as u1(x, y) and u2(x, y), that satisfy the same boundary conditions.

B. Ask students to assess the accuracy of the proof (in the above example, the solution is not unique in general, but ChatGPT nonetheless “proves” that it is by restricting attention to a particular choice of boundary conditions).

C. As a follow-up, and in keeping with teaching students to write effective GAI prompts, students could be asked to edit the prompt to produce a correct response. For example, given the prompt “Is the solution to the 2D steady-state heat equation unique?”,

ChatGPT (for example) will return:

No, the solution to the 2D steady-state heat equation is not necessarily unique. The uniqueness of solutions for the 2D steady-state heat equation depends on the specific boundary conditions and the geometry of the problem.

D. Then ask the student to provide examples of problems with and without unique solutions.
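For instructors reviewing answers to part D, the standard energy argument that separates the two cases can be sketched as follows. The notation is an assumption introduced here and does not come from the ChatGPT transcript: Ω is a bounded region of the plane with boundary ∂Ω, g and h are prescribed boundary data, and w = u1 − u2 is the difference of two candidate solutions.

```latex
% Difference of two solutions: w = u_1 - u_2 satisfies \nabla^2 w = 0 in \Omega.
% Green's first identity applied to w:
\begin{align*}
\int_\Omega |\nabla w|^2 \, dA
  &= \oint_{\partial\Omega} w \, \frac{\partial w}{\partial n} \, ds
     - \int_\Omega w \, \nabla^2 w \, dA
   = \oint_{\partial\Omega} w \, \frac{\partial w}{\partial n} \, ds .
\end{align*}

% Dirichlet data (u = g on \partial\Omega): w = 0 on the boundary, so
\begin{align*}
\int_\Omega |\nabla w|^2 \, dA = 0
  \;\Rightarrow\; \nabla w = 0
  \;\Rightarrow\; w = \text{const} = 0
  \;\Rightarrow\; u_1 = u_2 \quad \text{(unique)} .
\end{align*}

% Neumann data (\partial u / \partial n = h on \partial\Omega):
% \partial w / \partial n = 0 on the boundary, so again \nabla w = 0,
% but the constant is no longer fixed:
\begin{align*}
u_2 = u_1 + C \quad \text{for any constant } C \quad \text{(not unique)} .
\end{align*}
```

A Dirichlet problem therefore serves as an example with a unique solution, while a pure Neumann problem is determined only up to an additive constant.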

3. Understanding assumptions baked into engineering analyses

In engineering problems, there are often many underlying assumptions that students may or may not be aware of. Ask students to:

A. Prompt GAI with a question such as “What are the underlying assumptions in beam theory?”

ChatGPT (for example) will return:

  1. Linear Elasticity: Beam theory assumes that the material of the beam behaves elastically and follows Hooke's law. It assumes that the beam material is isotropic and that it deforms linearly under applied loads.
  2. Small Deflection: Beam theory assumes that the deflection of the beam is small compared to the dimensions of the beam itself. This assumption ensures that the geometry of the beam remains relatively unchanged during deformation.

B. Ask students to reflect on and assess the accuracy of the ChatGPT response (in this case the full ChatGPT response is mostly, but not fully correct), or to provide examples of applications where the underlying assumptions (at least those that are correct) are no longer valid.
