Chapter 4: Starting with AI

In this chapter, we turn to the practical question that lies behind and has driven the more theoretical parts of this book: how does one actually go about teaching humanities courses in the era of AI, utilizing it when appropriate, and giving assignments that make its use unlikely when we need students to do the work themselves? We invite readers to join us on the journey, learn from the early explorations of a passionate humanities colleague, and forge new paths in their own classrooms.

In introductory courses, the content is often at a level that is well represented online, and thus likely to be handled well by AI. It has been educators who teach such courses who have seen the biggest uptick in AI-generated submissions from students, and who have felt the most disheartened. Fear not! Even when the skills you need to teach are simpler or more mechanical in nature, there are solutions.

In many cases, introducing AI explicitly early in a course (or more generally in a curriculum) is the best way to help students understand what this technology does well and where it falls short. This establishes a healthy relationship with (and open conversation about) AI. It empowers the educator to define the arena for the course, rather than reacting to the students’ assumed ruleset. Furthermore, it sends students the clear message that the faculty member is aware both of state-of-the-art tools and of how they interact with the educator’s specific area of content expertise. This is a conversation that is better to lead than to follow. In the next few sections, we discuss several ways you can initiate this conversation.

Showing What AI Can Do

Most courses are scaffolded, in that earlier assignments are easier, more manageable, or designed to build fundamental skills that are later improved or practiced until students master them. In such scenarios, there will likely be assignments that an LLM can answer credibly or perhaps even well. One technique to avoid the inevitable arms race is to flip the assignment: to get full points, students must produce a better answer than the LLM generates. You provide a decent answer, generated by an LLM, perhaps even generated during class if you are either bold or familiar enough with the technology to do so. And now, you’ve taken an otherwise potentially disastrous assignment and turned it into a competition between the student and a computer.[1]

The best part is, a carefully organized assignment will not require policing students, because there is nothing left to police. And, as an ancillary benefit, students will have a concrete, objective standard by which to measure themselves and the works of others.[2] For such assignments, it will be important to specify (or at least provide examples of) ways in which students’ work can or must be better. These will likely include the following:

Style: LLM answers tend to be bland, since they follow the dominant patterns in human speech. Students need to find their voice and figure out what they can offer not only in terms of content but also in terms of style, so that reading their work is preferable to reading AI-generated content. Challenge students to write in an engaging style and to distinguish that style from that of an LLM’s output. Care must be taken here not to be outlandish just for the sake of it. Human-written text carries with it the weight of intentionality.

Detail: LLM answers tend to leave out important details that are relevant or necessary to a proper discussion. As we noted in the chapter on “Understanding Generative AI,” an LLM determines “importance” by the frequency of recurring patterns in its training data, which means, of course, that it has no true idea of importance. To best the AI example, students must either do their own research or fact-check the responses they obtain from their own detailed query chains with an LLM.

Accuracy: An LLM can fabricate details or facts, and such inaccuracies and imprecisions are likely. Even if an error is easy to miss or relatively minor, students must take ownership of every word, sentence, and paragraph that they submit.

Presentation: AI is notoriously bad at generating images and at connecting them with text. Students must cohesively integrate graphs, charts, and images so that the visual form benefits the reader.

Analysis: LLMs have limited ability to reason, deduce, or analyze. With sufficient guided queries and direction from humans, they can serve as insightful thought partners. Students must provide deeper analysis in their work as appropriate.

Interaction with sources: This is a subset of the “accuracy” metric, as it relates specifically to cited sources. LLMs are notorious for inventing quotes that do not exist, referring to books that have never been written, or incorrectly navigating the relationships between different sources or points of view. Students should engage with sources actively, identifying and correcting these issues. This will almost certainly be an area where students can improve on AI-generated content, by quoting and discussing specific passages from primary and secondary sources.

By providing clear guidance of the sort we’ve sketched out above, you shift the learning focus to the human part of the process and its strengths. There are invariably things that an LLM does better than a human being, at least at a surface level. The key to framing an activity in this way is to highlight the LLM’s shortcomings, not as an easily ignored laundry list of weaknesses and fearmongering, but as an object lesson drawn from real LLM interactions. You are essentially leading by example.

Students need clear guidance about how to exceed the product of an LLM. Ideally, what it means to outperform an AI should be defined by them, or at least with their input, in an assignment given to the group, while you add to the list anything they miss. The process here is important, not just the resulting list. Having students look closely at the LLM-generated text and evaluate its shortcomings is important to their understanding of this technology, what it can do, and what they can do better than it can. One of the things that makes this type of assignment especially useful is that it provides an opportunity to focus attention directly on the human abilities that will, at least for the foreseeable future and perhaps always, outpace AI. These skills will be in most demand in the coming years. Using AI to generate text quickly may seem advantageous to many companies, but they will nevertheless also want their content to stand out, grab interest, and persuade. There will be many jobs that look for this sort of collaboration with AI: using it to get a draft done fast, then editing, fact-checking, and livening up that draft in the manner that only a human being can.

The next step in this assignment might be for students to share an example of their completed coursework and engage in peer evaluation. Who made the best improvements? How did they do so? What did they focus on? This type of assignment has another subtle but important benefit: honing the craft of constructive criticism. Students, at least in some cultural contexts, are loath to criticize the work of others because it could be perceived as impolite, inappropriate, or hurtful. When an educator models for students how to disagree with someone and still hold them in high esteem, and trains them to do likewise, students can acquire and practice this important skill.[3]

Even if students are willing to engage in that activity, they are often unable to articulate why they like or dislike something, or why a particular writing sample or argument is weak or strong. As preparation for evaluating the work of their fellow human beings and cultivating this important skill, you can require students, as part of their grade, to offer an initial evaluation of LLM-generated content. Indeed, an assignment that challenges them to outperform AI may, in many instances, be timed to come after students have first submitted and shared their evaluations of LLM-generated content, so that their own assessments, discussed in class, serve as the basis for the next stage of seeking to improve on it.

In a course that is focused not on students developing their own authorial skills but on the content of their writing, you might tell students that they start out with a perfect grade in the class and will lose points if they fail to submit work, or if what they submit contains factual errors or is inadequately focused. They may then be allowed to use AI freely to generate text quickly, but they will need to examine the results carefully and rewrite where necessary. If they do not do so, once they see their grade begin to drop as a result, they will likely change their approach. In such an activity, the instructor might decide to have students share and evaluate each other’s work, which would provide an opportunity to compare work done without AI, with some AI use, and with heavy reliance on AI. The result will be students appreciating how AI might enhance productivity, while also recognizing the impossibility of relying on it to produce content without a human in the loop.

Showing What AI Cannot Do Well

An educator can instead approach AI from the opposite direction, namely, highlighting what AI does poorly. The intent of such an approach is to motivate students to use or develop their own skills, rather than lean on AI. After all, if they believe that their newly acquired skills are soon to be obsolete, they will be less motivated to work on them. It’s never bad to be reminded that what students (and educators) do matters.

ChatGPT was released just before I taught a new course on women in early Christianity. I decided to jump right in and have students experiment with it. It was already my custom to use a points-based rather than percentile grading system (as described in the previous chapter). I encouraged students to share with me the results of their experiments. They were not required to do so, but I offered this as one possible way to earn points. The one rule was that they needed to tell me what they did. Thus, typing in a question and getting an answer could earn you a couple of points. Evaluating what ChatGPT produced was worth many.

One student asked the AI chatbot why Mary Magdalene is widely thought to be a prostitute.[4] It correctly responded that this viewpoint is widely held but incorrect. It even added a solid cautionary conclusion: “It’s important to be cautious of accepting long-standing beliefs or traditions without evidence, especially when they perpetuate harmful stereotypes or misconceptions.” The student then asked for sources and was directed to the Bible, historical documents, and biblical scholarship. ChatGPT provided four examples of the latter. The student asked where to find those sources and the LLM replied with the recommendation to try university libraries, bookstores, academic databases (mentioning JSTOR and Project MUSE as examples), and open access repositories. The student also asked for links to the specific sources the LLM had mentioned, and it happily complied. When the student tried them, none of them worked.

The student told ChatGPT this, and it apologized profusely in its characteristic manner, explaining that things sometimes move to different locations on the internet. It was at this point that the student tried Googling the authors and titles the chatbot had provided and discovered that none of them were real books. In fact, one of the recommended non-books was supposedly written by an author who had made a guest appearance in the class, and so this student had the opportunity to meet them in person. When confronted about this, ChatGPT replied as follows:

I apologize for any confusion. It appears that some of the books I listed in my previous response may not exist or have been misidentified. This can sometimes happen when a model like me, which has been trained on a large corpus of text, makes an error.

If you’re still interested in learning more about the topic of Mary Magdalene, I would recommend starting with a reputable biblical commentary or a scholarly work in religious studies or biblical studies. You may also want to consider consulting a biblical scholar or a theologian for additional insights and perspectives.

If you have experimented with an LLM, this interaction should not surprise you. At least, I hope it won’t. Yet the truth is that I have seen academics online who have neither learned about LLMs nor fact-checked what they produce. They have not discovered for themselves how LLMs operate, because they have never probed the content these tools generate the way the student in the Mary Magdalene example did. They are, in this sense, no different from our aspiring students. Recall that an LLM simply reproduces patterns in its training data and does not truly understand anything, so it does not recognize, in the patterns of text on which it was trained, that the component parts of a citation must always stick together. Applying that logic, it is not a stretch to understand why it might generate plausible-sounding citations that appear to “verify” the statements the LLM has made, unless the human using it refuses to simply assume that the output of an LLM is factual and trustworthy and does their due diligence to investigate whether that is the case.[5] In the Mary Magdalene example we have been discussing, ChatGPT mixed and matched the authors, titles, and subtitles of real citations. Due largely to the media attention it received, ChatGPT has since improved its results on this specific type of fabrication error. (It did so by retraining with better examples and becoming more cognizant of when it has gaps in knowledge.)
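
To see how plausible-but-fake references can arise from recombining familiar parts, here is a toy sketch of our own, for illustration only. The authors, titles, and publishers are invented, and a real LLM does not literally run code like this; the sketch simply shows how output can have the right citation shape while referring to nothing that exists.

    # A toy illustration of how mixing and matching citation components yields
    # plausible-looking but nonexistent references. (Invented data; an LLM does
    # not literally run code like this, but its pattern completion can have a
    # similar effect.)
    import random

    authors = ["A. Scholar", "B. Historian", "C. Theologian"]
    titles = ["Mary Magdalene Reconsidered", "Women in Early Christianity"]
    subtitles = ["A New History", "Texts and Traditions"]
    publishers = ["University Press", "Academic Books"]

    random.seed(0)  # make the example reproducible
    for _ in range(3):
        citation = (f"{random.choice(authors)}, "
                    f"{random.choice(titles)}: {random.choice(subtitles)} "
                    f"({random.choice(publishers)}, {random.randint(1995, 2022)})")
        print(citation)  # looks like a real citation; no such book need exist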

The true value of this exchange for the student was seeing for themselves what an LLM does and what its limitations are. It is advisable either to have students undertake their own experiments in fact-checking an LLM or to present them with an example of this sort. If you share examples, it is best to generate your own that are relevant to the course. In a course I teach on the Bible and music, I have long had a scavenger hunt activity that requires students to find biblical allusions in popular secular music. As I was considering whether I could still keep this assignment without students being able to simply get answers from ChatGPT, I decided to ask the chatbot for such examples to see what it came up with. The first batch was a mixture of songs, the majority of which did not contain biblical allusions. I pointed this out and asked it to do better. Its second attempt was better, but it also included one of my favorite ChatGPT fabrications to date. On the list was “The Ballad of John and Yoko,” which the chatbot confidently told me was about the biblical story of John the Baptist and Yoko. Seriously.[6]

Since there is no way students can find large numbers of biblical allusions without doing research to direct them to examples, ChatGPT is a tool that they may use just as they would an internet search. In both cases, fact-checking is necessary. Just as an LLM may fabricate a biblical connection, fans of songs have also done so in their efforts to make greater sense of song lyrics. Warn students about the propensity of both music fans and AI tools to make things up, and then you can grade an activity like this without having to police AI use. If incorrect or unsubstantiated information is included, there can be a consequence, without any need for the educator to spend time trying to determine whether a human or an AI produced it. Once a student includes something, they are responsible for that content.

In the same course, the final project is to create a song. Since this is a core curriculum course, it is not normally taken by music majors, but many students who do not major in music may nonetheless have a significant musical background. Other students have little or none. Thus, while students with the ability are free to write an original song, the default form of the final project is that students write new lyrics adapted from the Bible to a recent song. This is something they all can do. The advent of AI song generators has me more concerned about determining whether an original song is a student’s own work, although it is unlikely that any students in this particular course would have the wherewithal to not only write but record such a polished song. When I first heard the songs that the AI song generator Suno created in seconds in response to my prompts, I initially thought perhaps I had misjudged what AI had become and what was going on “under the hood.” However, the language of music is even more constrained and formulaic than the language of speech. It is no surprise that, just as an LLM can create new texts based on patterns discerned in existing ones, an AI system can generate new songs in the same way.[7]

When it comes to the students who are just creating new lyrics, requiring them to choose a recent song helps minimize the potential for them to simply ask an LLM to produce the lyrics. If you ask ChatGPT to paraphrase a particular text in common meter, so that it can be sung to a tune such as the one associated with “Amazing Grace,” it will do so reasonably well in a matter of seconds. If you ask it to do the same except that the words need to be singable to “Hotel California” by the Eagles, it will fail miserably. Here, the key weakness you are exploiting to encourage students to write their own lyrics is a facet of how LLM output relates to its training data. Although those lyrics (like all others) are represented online, they do not follow a recognizable pattern that it can easily emulate. Common meter, on the other hand, is everywhere in music and poetry. That’s why one can sing the lyrics to “Amazing Grace” to other tunes, such as “House of the Rising Sun” and the theme songs from Gilligan’s Island and Pokémon.

As educators, we are typically already designing and redesigning our courses and assignments on a regular basis. In the next wave of revisions, we just need to incorporate our response to AI as well. Teaching students about the shortcomings of AI is a key component of both their education and ours. In chapter 2, we emphasized that, as educators, it is not enough to rely solely on the limitations of AI. That would be like saying that Wikipedia is sometimes wrong, and so we don’t need to worry about student use of it. If a student submits work based on information in Wikipedia that is incorrect, receives a poor grade as a result, and in the process learns the limits of Wikipedia, that is to their benefit. As true as that may be, as educators we should be working to help students understand the character of sources and their implications, so that we are encouraging them to make good choices, rather than leaving them on their own to eventually learn from their mistakes.

Wikipedia tends to be largely correct, but you don’t know where the information comes from or what expertise, if any, lies behind it. The same epistemic problem applies to LLMs, compounded by their propensity to make things up. We do not have any mechanism that can automate finding reliable sources of information. Google Scholar finds material with footnotes that looks scholarly, but purveyors of pseudo-scholarship have long had their own ecosystem of journals, periodicals, books, and even museums that imitate these features. Discernment cannot be automated.

This has been true all along, in the finding and use of sources and in the determination of which ones are reliable. The arrival of LLMs has not fundamentally changed this aspect of scholarship. We have to understand, and help our students understand, that LLMs producing output that is factually wrong, at least on occasion, is a feature, not a bug. Only if we teach this, and students take it to heart, will humanity be charting a path towards using AI wisely. It is not enough to let students make poor arguments or write weak essays because their reliance on AI leads them to include fabricated claims and false sources, and then fail to learn the necessary skills as a result. Ideally, we want them to avoid this disastrous outcome. We need to educate with their understanding and best interest in mind. If we have done that and they still fabricate despite our best efforts, then consequences will follow, as has always been true in the realm of education.

When teaching students about the limitations of AI systems, I often use the Kentucky Fried Chicken Twitter account as an example.[8] In 2022, the company was criticized for a social media post recommending that people commemorate Kristallnacht with cheesy chicken.[9] The reason for such an inappropriate tweet? Reliance on a bot that mentioned whatever was on the national calendar and invited people to commemorate it with chicken. Note that the concept is not entirely bad, and the issue is not that one cannot legitimately automate aspects of advertising. Rather, the issue is that it requires a human mind to put together an appropriate calendar and then have the automated system act accordingly. Suggesting that people celebrate Yom Kippur with your chain’s special variety of pork tenderloin sandwich or bacon cheeseburger might seem perfectly fine to some human beings, but even if not quite as bad as the KFC flub, it is still a terrible and easily avoidable gaffe.

You might think that it’s easy to automate such an ad bot to avoid this gaffe. You would be right. The challenge with AI is not that you can’t develop an AI system to handle a particular gaffe, a group of explicitly listed gaffes, or even a category of gaffes that fit some sort of pattern. It’s that you can’t anticipate all possible gaffes. Said another way, computer scientists can write programs that handle specific issues with an advertising bot (by not suggesting a Philly cheesesteak during Diwali) but cannot write a generalized program that can anticipate any perceived slight. That ability describes artificial general intelligence (AGI) in a nutshell. (And speaking of which, nutshells can also be triggering because of peanut allergies.) It doesn’t take long to reach the edge of technology and realize that the humanities help you navigate these computational edges. As long as the task is straightforward, rote, and mechanistic, computers can handle it. But the moment you step outside of those bounds, you can no longer depend on them. So, in some sense, when you teach a student about AI, you are implicitly teaching them what it means to be human.
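
To make the distinction concrete, here is a minimal sketch of a calendar-driven promo bot with a hand-maintained blocklist. It is our own illustration, not any real company’s code, and the observance names are chosen only for the example. The point is that a human must anticipate and add every sensitive entry; the program itself has no way to judge appropriateness.

    # A toy calendar-driven promo bot (a hypothetical sketch, not real code from
    # any company). A hand-maintained blocklist can catch known gaffes, but only
    # the gaffes a human thought to list in advance.

    SENSITIVE_OBSERVANCES = {
        "Kristallnacht",
        "Yom Kippur",
        "Holocaust Remembrance Day",
    }

    def promo_for(observance: str) -> str | None:
        """Return a promotional message, or None if the observance is blocked."""
        if observance in SENSITIVE_OBSERVANCES:
            return None  # a gaffe someone anticipated and explicitly filtered
        # Anything not on the list sails through, appropriate or not.
        return f"Celebrate {observance} with our cheesy chicken!"

    print(promo_for("National Fried Chicken Day"))  # harmless suggestion
    print(promo_for("Kristallnacht"))               # blocked: returns None
    print(promo_for("An observance nobody thought to list"))  # slips through

No matter how long the blocklist grows, the judgment about what belongs on it remains a human task.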

That’s pretty cool. We can take these lessons learned a step further. Students who can articulate these sorts of subtle points are themselves a walking advertisement of the value of a humanities-informed education. They will get more out of their classes and be poised to succeed after graduation. They will be prepared for whatever AI-shaped future we may eventually find ourselves in.

One last area to draw students’ attention to is the Google Search AI knowledge panel, which draws text directly from websites using AI. Whereas traditional search engine results send you to source links, this panel synthesizes the information from those websites and provides a direct answer to your query on the search page itself. It does so through layers of abstraction such as keyword matching and word density analysis. An example from my own experience shows what can go wrong with this. At one point, when searching for articles on a topic related to my field, Google provided me with a statement in the knowledge panel that clearly did not reflect the views of contemporary scholarship. Being interested in AI, I clicked through to the article to figure out why and how this had happened. Google’s AI system had found the most relevant keywords, the clearest statement on the topic, in a particular paragraph. What it had no capacity to understand was that its result was drawn from a section describing a viewpoint held in the nineteenth century. The article went on to explain the compelling reasons why this older view is now rejected. A human reading the article would immediately understand the context, but the AI system has no way of doing so. All it is doing is matching the text to the query word by word. This is precisely the sort of area where trusting AI as though it were capable of thinking, including discernment and evaluation, is clearly wrong. AI is extremely helpful as a tool to connect us with relevant resources. However, it needs to be used wisely by humans who understand what our minds can do that AI, however sophisticated, cannot.
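
As a simple illustration, consider the toy scoring function below. It is our simplification for teaching purposes, not Google’s actual algorithm, and the query and paragraphs are invented. Scoring paragraphs purely by how many words they share with the query surfaces the passage that states the outdated view most plainly, while the correction, phrased in different words, scores lower.

    # A toy sketch of keyword-overlap scoring (our simplification, not Google's
    # actual system). It picks the paragraph sharing the most words with the
    # query, with no awareness that the "winning" paragraph describes a view
    # the article goes on to reject.

    def keyword_score(query: str, paragraph: str) -> int:
        """Count how many words the paragraph shares with the query."""
        query_words = set(query.lower().split())
        return sum(1 for word in paragraph.lower().split() if word in query_words)

    query = "who was the author of the letter"

    paragraphs = [
        # States the nineteenth-century view plainly, so it shares many words
        # with the query.
        "Nineteenth-century scholars held that the author of the letter was Paul.",
        # The correction is phrased differently, so it scores lower.
        "For compelling reasons this attribution is now rejected by most scholars.",
    ]

    best = max(paragraphs, key=lambda p: keyword_score(query, p))
    print(best)  # the outdated claim wins; word matching ignores context

Retrieval of this kind connects readers with relevant pages quickly, but deciding which paragraph actually answers the question remains a human responsibility.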

Evaluating What AI Has Done

Yet another type of assignment involves giving students an AI-generated treatment of a topic or answer to a question and asking them to fact-check and evaluate it. The intent of this style of assignment is to foster critical reading skills and the ability to identify credible sources. In one of my courses focused on information literacy, I have long had an assignment that requires students to find reliable sources and annotate their bibliography to explain how they knew the sources were reliable. This was a useful exercise for students then, and it is even more important now, given the ease with which one can disseminate misinformation in today’s digital environment. Asking students to evaluate an LLM’s answer by reference to reliable sources, and then to investigate the existence and quality of sources provided by an LLM, helps students learn and cultivate the same skills. You can also either provide an LLM-generated list of sources, or make one that deliberately combines acceptable and unacceptable sources for a particular purpose, and tell students to evaluate them.

Students are rarely clear on what constitutes an acceptable source. Unless explicitly instructed to reverse the process, students may find sources they deem relevant, use them, and only at the end try to provide justification for their choices. Educators need to tell students explicitly that finding and evaluating sources needs to happen at the beginning of the research process, with any inappropriate sources set aside at the outset. Often, students will justify their use of a source by pointing out that they found lots of other sources that agree with it. Content from Wikipedia is widely reproduced online, but if the original content is not authoritative, its wide distribution does not in any way change its accuracy. Students will all have used Wikipedia, but few of them have ever clicked on its tabs. Show them the edit history on a Wikipedia page relevant to your course. Show them the many edits made by people whose IP addresses and nicknames reveal nothing about what expertise, if any, they have in that subject matter. Emphasize that, while many Wikipedia editors have at least one academic degree, even among those who do, we cannot be sure that they are making edits in areas with which they are familiar.[10] Furthermore, text from Wikipedia is duplicated in many places, which shows that statements whose perspective and accuracy cannot be ascertained regularly propagate throughout the internet.[11]

In the present context, the point that can be made through the analogy between an LLM and Wikipedia pertains to epistemology. If you don’t know where a claim is coming from, what evidence and expertise stands behind it, why would you trust it? Of course, Wikipedia and LLMs cite sources. Showing students examples of made-up sources both in LLM-generated content and in a Wikipedia article can help them grasp the core issue here.

This phenomenon is further complicated by the fact that many organizations or individuals have a common underlying agenda, and so can (and do) create networks of web pages and articles in a parallel ecosystem of knowledge that is internally consistent, but is not consistent with the global understanding of fact. One of the authors is particularly well-known for joining flat-earth groups and annoying them with such global facts until they ban him.[12] In doing so, he learned that there is an awful lot of self-referential “evidence” that creates echo chambers of this kind. An LLM or a wayward student can get sucked into these types of rabbit holes. One possible assignment that may help students understand this is to have them write without AI in response to a prompt that you provide, and then and only then give the same prompt to a chatbot. The final part of the assignment is for the student to compare the two products and evaluate the quality of sources used, the accuracy and veracity of information included, and the clarity of expression. Even if the LLM does better in the last area, most students will be able to grasp that information verified by a human, even if expressed in a less polished way, is preferable to a polished statement that cannot be trusted.

You can also take time to explore and evaluate with students what an AI does to their own work. In this assignment style, each student completes an assignment and then gives it to an LLM with a request to improve it. The request may be specific and detailed or vague and general. You may decide to require all students to give the LLM the same instructions or allow them to decide for themselves. Either way, once the LLM takes their work and modifies it, the fun can really begin. Students should then be tasked with evaluating the differences between their original and the modified version. Did the AI make it better? If so, in what way? Could they themselves have achieved the same result? If so, how? If students were free to use different prompts, they can compare how they went about the process, especially if some had more satisfactory results than others.

The intent of this assignment is to emphasize to students that, no matter what their instincts tell them, they can produce better writing. Doing anything better is simply a matter of time and effort. If they do not write well, it is because they have not invested the time and effort to become better writers. They still can. The same point applies equally to other areas. We tend to think that we are “not good at math” or “not good at languages” when we may not have been taught these subjects well or may not have been motivated to work on these skills. Other people may have been exposed to music or languages or whatever else early enough that it now looks like they are simply good at these things, rather than merely having had a head start. Point out to students that they are all reasonably fluent in at least one language. All it took was time and practice.

A variation of this assignment in a writing class is to have students ask an LLM to improve an excerpt from their favorite novel rather than from their own writing. (This approach also offers a low-stakes progression for practicing constructive criticism, which was a theme in an earlier assignment.) I did this with a very famous work of young adult fiction. I won’t say which one, just in case the author were to read this while in a litigious mood. There were a lot of really interesting things to be gleaned from the differences between the human-created version and the LLM revision. In at least one instance, I thought there was a genuine improvement over the original. In another instance, the change led to the text giving a significantly different impression than the original, and not in a good way. For the most part, the changes were completely unnecessary, merely different. That makes it very easy to compare an original and a revised document and know that AI has been at work.

An LLM makes changes simply because it has been told to do so. It has no idea what constitutes an “improvement” and no ability to recognize that a word change that might be fabulous in one particular genre might be entirely out of place in another. We recommend that you not tell students all this when they first start the assignment. The best way for them to learn about AI is to closely examine and evaluate what it does for themselves. Ask them to report their aesthetic judgment of what an AI did to the writing of their favorite author. Also require them to discuss what precisely made the changes better, worse, or neutral. This will help them develop their skills both as reviewers of the work of others and as revisers of their own compositions.

One other skill that can be honed through this and several other assignments in this chapter is recognizing AI writing. LLMs imitate human speech and have a lot of examples to guide them, which is why they do it so well. Nevertheless, there are features that at least suggest AI involvement. Having students engage with and evaluate a range of sources by a mix of human and AI authors will help them be appropriately impressed with what AI can do, and at the same time develop caution, because when they read things online, they may assume they are hearing a human perspective when they are not. Some of us were initially fooled quite often by AI-generated fake photos that showed up on social media. Many of us are now becoming quite adept at spotting their telltale characteristics. It is rather like special effects: things that seemed impressive a couple of decades ago now seem dated and unrealistic. Recognizing LLM-generated text is something that is mathematically impossible to automate, but it is a skill that human beings most likely can develop. Working on this skill helps highlight yet another element of human superiority.

An assignment that asks students to conduct internet searches, quote the knowledge panel, and evaluate it is another type of activity that can be a helpful way to get students to learn content and research skills while engaging critically with an AI system. There are countless variations on this assignment that you can undoubtedly come up with to fit your own specific course needs. What they share in common is asking students to exercise the skill known as information literacy. This should involve more than mechanically applying the CRAP test, which has its limitations.[13] Students need to understand how intuition plays a role and how it can mislead. They need to develop the habit of asking about an individual’s credentials and profession, and which ones indicate expertise in a specific area. Provide students with examples of individuals who are well qualified in their area of expertise making off-target comments in an area in which they lack expertise. Students also need to be introduced to concepts like peer review and consensus, so that they understand why the very latest article on a topic may not turn out to be persuasive, and how to judge when most experts agree on something. You can focus on evaluating these skills, while also ensuring that students are absorbing the relevant content-area knowledge they need.

Stump the Chatbot

If all students can be presumed to have ready access to an LLM, then you can incorporate it as a conversation partner. In this assignment, you will require students to interact with the chatbot on a particular topic, with a hidden challenge that they know about but the LLM does not. Their real mission is to get the chatbot to slip up, to say something that isn’t correct or is inconsistent with what it has told the student previously.[14] You can offer bonus points if they can persuade the LLM to admit that it was wrong, though ChatGPT readily offers profuse apologies, so that might be too easy. The learning aim behind this assignment is that students need to know the content well enough, and be able to look things up from reliable sources well enough, to be able to tell when the AI’s output is correct and when it is not. What is unique about this assignment’s construction is its interactive nature, not unlike a conversation that a student might have with a peer. Depending on the topic, the LLM may possess knowledge that surpasses that of the student. If peer-group learning works among humans, it is fair to try it with an LLM as well. Simply talking about content has been shown to have significant learning benefits.

Ironically, this assignment introduces the possibility that a student might fabricate text themselves and attribute it to ChatGPT in order to fulfill the requirements of the assignment quickly. I bet you didn’t see that one coming when you started reading this book: students submitting their own work and pretending it is AI-generated! Pedagogically, this activity is very similar to the “Evaluating What AI Has Done” assignment, in which students evaluate LLM-generated content. The difference here is that we have introduced yet another gamification construct, that of competitive fun. In fact, an LLM is a relatively difficult opponent. A human will admit that they do not know the answer when they are stumped (or at least they ought to). An LLM will happily fabricate away until it is caught. For students to know when they have prompted an LLM into making an error, they will need to be, or become, familiar with the content area of the course.

However, we caution the educator about using this technique in more introductory courses, or for assignments where the skills being learned are of a more mechanistic character. As you may recall, LLMs perform best on introductory-level content. It might take significant time, and considerable digging into details, for students to push the LLM to the limits of its ability on such topics. The LLM could turn into an end-game raid boss that takes the work of an entire semester (or an entire class of students, or both) to break. Most likely, the underlying neural network will lose its contextual focus over time, slip up, and make a mistake. There is no way for us to generalize about how long such an activity might last, nor how well any group of students may rise to the challenge. But as you experiment with implementing this activity, you will find out, keep your students engaged, see evidence of their learning, and lead them to a better understanding of LLMs.

Reverse-Engineering LLM Output (or Seeing How the Sausage is Made)

Another assignment is to ask students to determine what content led to the output an LLM provided. The Google Search AI knowledge panel provides links to where it obtained the information. However, an LLM does not do that since it reflects patterns in a much larger text base. That does not mean, however, that students cannot search for and find the kinds of sources that were likely referenced to get the AI-generated answer. Students can search for those sources directly or even query the LLM for the list of sources it used to get those answers. Either way, they should be required to evaluate the sources that they find or are directed to.

The point is not to identify the precise sources the LLM used. After all, it likely used all relevant online sources and perhaps more. Rather, the point is to have students articulate what is involved in creating content of this sort. What would a human being need to know? How would you get from the kinds of sources available to this summary? Students may also usefully be asked what they would have done differently with the sources they found, and how they would have presented the information in a more useful, interesting, and engaging way. In the process, students are engaged in both the evaluation of sources and their own original research.

Grading Student Prompts

Educator Mike Kentz uses the interesting activity of requiring students to get an LLM to generate a suitable output within a specified, limited number of prompts. He then grades their prompts.[15] What he found is that students faced with these constraints will think long and hard about their prompts, will revise and edit them carefully, and in the process will invest time and effort into their writing. This activity encourages students to hone the same skills as if they were writing the short essay that the LLM will be made to produce. This sort of assignment can also be valuable because it shows students just how difficult it can be to get an AI to do precisely what you want it to do. Certainly, if you ask it to write about George Washington, it will. However, if you try to get it to revise its original output to conform to precise specifications, your results will vary. The point of this assignment is that you get students to understand the technology and learn to use it effectively, while seeing and grading what they do.[16]


  1. This structure is a callback to the gamification conversation we had earlier, and how such methods could succeed by adding “fun.”
  2. This latter point is particularly applicable when a humanities professor is working with a student for whom the course is in some way a “reach,” but the technique applies even to students majoring in the field.
  3. Consider how difficult it was for Haley in Modern Family (the "Queer Eyes, Full Hearts" episode) to criticize the fashion designer’s assistant’s style. She eventually did it (and was wonderfully savage as well), and this event gave her confidence for the rest of the show.
  4. You can read the full exchange on the humanities author’s Patheos blog. See James McGrath, “ChatGPT on Mary Magdalene,” ReligionProf (blog), March 8, 2023, https://www.patheos.com/blogs/religionprof/2023/02/chatgpt-on-mary-magdalene.html.
  5. Interestingly, if you’re looking for inspiration for your next book, look no further than ChatGPT: the creator of compelling titles, where the content is just waiting for you to write it!
  6. On brand for this book, and a story we would definitely read if anyone were to write it.
  7. Indeed, if one were to believe Ed Sheeran, any pop song can be played with just four chords. You can hear him actually do it to defend himself during his copyright lawsuit here: https://www.youtube.com/watch?v=NcCKlsTgjeM. A more fun version of his defense is here: https://www.unilad.com/music/news/ed-sheeran-four-chords-pop-song-theory-824786-20240513
  8. The hunger theme continues. Writing a book takes longer than the time between consecutive meals, it seems.
  9. See, for instance, the BBC reporting on the topic: Alex Binley, “KFC Apologises After German Kristallnacht Promotion,” BBC, November 10, 2022, https://www.bbc.com/news/world-europe-63499057.
  10. The Wikipedia article about Wikipedians provides some indication of who the editors claim to be, but the fact that their precise identity and expertise in relation to the pages they edit cannot be verified is the point. Explaining this will help students use Wikipedia wisely, as a first stop that can lead them on to verified sources. (Though we should have expected this, it did not occur to at least one of the authors that Wikipedia had a Wikipedia page about itself.)
  11. For this reason, I encourage students to become Wikipedia contributors rather than mere consumers, and I sometimes assign this as an activity for which they can earn points. Note as well that, whereas editors constantly edit and update Wikipedia in an effort to improve it, snapshots of its earlier content are reproduced in many places online, and those are not updated when the main Wikipedia site is.
  12. Said author is amused that the word global serves double duty in this case. As he often likes to say, flat earthers have nothing to fear but sphere itself.
  13. In case you thought that was just us emphasizing with all caps and not an acronym, let us specify that this is a real test: CRAP stands for currentness, relevance, authority, and perspective (or purpose or point of view). The original CRAP test was developed by Molly Beestrum at Dominican University, which now has updated recommendations for fact-checking and evaluating sources. Check out their libguide here: https://research.dom.edu/evaluatinginfo. Gigerenzer, How to Stay Smart in a Smart World, pp. 219-223, also points out that modern fact-checking practices, especially leaving a site to research its funding and affiliation, are crucial to the process.
  14. It may not be necessary to clarify for humanities educators, but being wrong and being inconsistent are not the same kind of error.
  15. Mike Kentz, “The AI Era Demands Curriculum Redesign: Stories from the Frontlines of Change,” AI EduPathways (blog), January 5, 2025, https://mikekentz.substack.com/p/the-ai-era-demands-curriculum-redesign. I am grateful to Stephen Garner for drawing it to my attention.
  16. For some additional writing-focused assignments, see the open-access online book Annette Vee, Tim Laquintano, and Carly Schnitzler, eds., TextGenEd: Teaching with Text Generation Technologies (The WAC Clearinghouse, 2023), https://doi.org/10.37514/TWR-J.2023.1.1.02. Another useful book containing assignments that integrate use of ChatGPT is Kevin Lee, Kirby Whittington, Erin Doggette, and Laurie Uttich, ChatGPT Assignments to Use in Your Classroom Today (UCF Created OER Works 8, 2023), https://stars.library.ucf.edu/oer/8.

License


Real Intelligence: Teaching in the Era of Generative AI by James F. McGrath and Ankur Gupta is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License, except where otherwise noted.