Researchers discover that ChatGPT prefers repeating 25 jokes over and over


An AI-generated image of “a laughing robot.” Credit: Midjourney


On Wednesday, two German researchers, Sophie Jentzsch and Kristian Kersting, released a paper that examines the ability of OpenAI’s ChatGPT-3.5 to understand and generate humor. In particular, they discovered that ChatGPT’s knowledge of jokes is fairly limited: During a test run, 90 percent of 1,008 generations were the same 25 jokes, leading them to conclude that the responses were likely learned and memorized during the AI model’s training rather than being newly generated.

The two researchers, associated with the Institute for Software Technology, German Aerospace Center (DLR), and Technical University Darmstadt, explored the nuances of humor found within ChatGPT’s 3.5 version (not the newer GPT-4 version) through a series of experiments focusing on joke generation, explanation, and detection. They conducted these experiments by prompting ChatGPT without having access to the model’s inner workings or data set.

“To test how rich the variety of ChatGPT’s jokes is, we asked it to tell a joke a thousand times,” they write. “All responses were grammatically correct. Almost all outputs contained exactly one joke. Only the prompt, ‘Do you know any good jokes?’ provoked multiple jokes, leading to 1,008 responded jokes in total. Besides that, the variation of prompts did not have any noticeable effect.”
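The counting step of such an experiment is straightforward to reproduce once the responses are collected. As a minimal sketch (the response strings below are stand-ins, not the researchers' actual data), a frequency tally over the generated outputs reveals how concentrated they are:

```python
from collections import Counter

# Hypothetical stand-in for the 1,008 collected ChatGPT responses;
# in the study, these came from repeatedly prompting the model.
responses = [
    "Why did the tomato turn red? Because it saw the salad dressing!",
    "Why did the tomato turn red? Because it saw the salad dressing!",
    "Why don't scientists trust atoms? Because they make up everything!",
    "Why did the cookie go to the gym? To get a-cookie-dized.",
]

counts = Counter(responses)

# Fraction of all outputs accounted for by the 25 most frequent jokes
# (the study reported roughly 90 percent for the real data).
top_25_total = sum(n for _, n in counts.most_common(25))
share = top_25_total / len(responses)

for joke, n in counts.most_common(10):
    print(f"{n:4d}  {joke}")
print(f"Top-25 share: {share:.0%}")
```

Because exact string matching is brittle against small wording variations, a real replication would likely normalize punctuation and casing before counting.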

Their results align with our practical experience while evaluating ChatGPT’s humor ability in a feature we wrote that compared GPT-4 to Google Bard. Also, in the past, several people online have noticed that when asked for a joke, ChatGPT frequently returns, “Why did the tomato turn red? / Because it saw the salad dressing.”

It’s no surprise, then, that Jentzsch and Kersting found the “tomato” joke to be GPT-3.5’s second-most-common result. In the paper’s appendix, they listed the top 25 most frequently generated jokes in order of occurrence. Below, we’ve listed the top 10 with the exact number of occurrences (among the 1,008 generations) in parentheses:

When confronted with a joke it does not know, ChatGPT will make up fictional yet plausible-sounding explanations.


For example, when asked to explain a nonsensical joke it generated (“Can you explain why this joke is funny: Why did the cookie go to the gym? To get a-cookie-dized.”), ChatGPT wrote, “The joke is a play on words. To get ‘a-cookie-dized’ is a pun on the phrase ‘to get categorized,’ with ‘cookie’ being substituted for ‘categorized.’ The joke is that the cookie is going to the gym to become more fit and healthy, as if it were a person, but this is also implying that the cookie is trying to fit into a certain category or mold.”

On Twitter, Scale AI prompt engineer Riley Goodside blamed ChatGPT’s lack of humor on reinforcement learning from human feedback (RLHF), a technique that guides language model training by gathering human feedback: “The most visible effect of RLHF is that the model follows orders, and base LLMs are much harder to prompt in practice. But that benefit isn’t free—you pay for it in creativity, more or less.”

Despite ChatGPT’s limitations in joke generation and explanation, the researchers pointed out that its focus on content and meaning in humor indicates progress toward a more comprehensive understanding of humor in language models:

“The observations of this study illustrate how ChatGPT rather learned a specific joke pattern instead of being able to be actually funny,” the researchers write. “Nevertheless, in the generation, the explanation, and the identification of jokes, ChatGPT’s focus bears on content and meaning and not so much on superficial characteristics. These qualities can be exploited to boost computational humor applications. In comparison to previous LLMs, this can be considered a huge leap toward a general understanding of humor.”

Jentzsch and Kersting plan to continue studying humor in large language models, specifically evaluating OpenAI’s GPT-4 in the future. Based on our experience, they’ll likely find that GPT-4 also likes to joke about tomatoes.
