OpenAI Explains the Strange 'Goblin' Quirk in Its AI Coding Tool: A Q&A
Recently, OpenAI's Codex CLI made headlines for an unusual built-in rule: never talk about goblins, gremlins, or similar creatures unless absolutely necessary. The company later published an official blog post titled "Where the goblins came from" to explain the origins. This Q&A breaks down what happened, why it happened, and what it reveals about AI training.
What was the mysterious anti-goblin instruction in Codex CLI?
On Tuesday, Wired reported that Codex CLI, OpenAI's AI coding tool, contained a peculiar hard-coded directive: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." The rule seemed bizarre because AI models typically don't need explicit instructions to avoid discussing fictional creatures. It was added to curb a strange habit: the model frequently referred to bugs and coding issues as "goblins" or "gremlins," even after earlier attempts to stop it. Social media users had noticed the quirk, and one X post highlighted how the model kept using these terms despite updates meant to eliminate them.

Why did OpenAI include an anti-goblin rule in the first place?
According to OpenAI's official blog post, the root cause was a training reward signal. While developing Codex's personality customization feature, and the Nerdy personality in particular, OpenAI unknowingly gave the model especially high rewards for metaphors involving creatures. The persona was meant to sound like a stereotypical "nerdy" enthusiast, the kind who might compare coding challenges to ogres or pigeons. However, reinforcement learning doesn't guarantee that a learned behavior stays confined to the condition that produced it. As a result, the goblin-heavy language leaked into other interactions, including those with the Nerdy personality disabled. The anti-goblin rule was a quick fix to suppress this unintended behavior.
How did the goblin references spread beyond the Nerdy personality?
The blog post explains that "reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them." In practice, once the model learned to associate positive rewards with creature metaphors during Nerdy personality training, it generalized that behavior to other contexts. Even in standard conversations where the Nerdy personality wasn't active, the model started injecting goblins, gremlins, and similar terms into its responses. This kind of reward spillover is a known challenge in AI training: a behavior optimized for one scenario can unexpectedly show up in others. As a result, OpenAI had to suppress the quirk with an explicit instruction, which itself became a topic of public curiosity.
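To make the spillover idea concrete, here is a deliberately tiny sketch. It is not OpenAI's training setup; the two-weight policy, the reward rule, and every name in it (`creature_prob`, `train_on_nerdy_only`, `w_shared`, `w_nerdy`) are assumptions invented for illustration. The toy model decides whether to use a creature metaphor from one weight shared by all contexts plus one weight that applies only when a "nerdy" flag is on. It is rewarded for creature metaphors only in the nerdy context, yet the shared weight moves too.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def creature_prob(weights, is_nerdy):
    """Probability that the toy policy uses a creature metaphor."""
    w_shared, w_nerdy = weights
    nerdy_feature = 1.0 if is_nerdy else 0.0
    return sigmoid(w_shared + w_nerdy * nerdy_feature)


def train_on_nerdy_only(weights, steps=200, lr=0.5):
    """Policy-gradient updates using only nerdy-context episodes.

    Assumed reward: 1 for a creature metaphor, 0 otherwise. For a
    Bernoulli policy the expected REINFORCE gradient with respect to
    each weight is p * (1 - p) * feature, so we apply that expectation
    directly instead of sampling (this keeps the sketch deterministic).
    """
    w_shared, w_nerdy = weights
    for _ in range(steps):
        p = creature_prob((w_shared, w_nerdy), is_nerdy=True)
        grad = p * (1.0 - p)
        w_shared += lr * grad  # shared feature is always 1, so it learns too
        w_nerdy += lr * grad   # nerdy feature is 1 in these episodes
    return w_shared, w_nerdy


before = (-3.0, 0.0)  # creature metaphors start out rare in every context
after = train_on_nerdy_only(before)

print("default context, before:", round(creature_prob(before, False), 3))
print("default context, after: ", round(creature_prob(after, False), 3))
print("nerdy context, after:   ", round(creature_prob(after, True), 3))
```

The default-context probability rises even though no default-context episode was ever rewarded, because part of what the model learned lives in parameters that every context shares. Real language models share far more than one weight across contexts, which is one plausible reading of why a persona-specific reward would not stay scoped to that persona.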
What does OpenAI's blog 'Where the goblins came from' reveal about the incident?
The blog post, published Thursday, directly addresses the speculation. It states: "Model behavior is shaped by many small incentives. In this case, one of those incentives came from training the model for the personality customization feature, in particular the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread." OpenAI frames the event as a powerful example of how reward signals can shape model behavior in unexpected ways. Although the reward was tied to Nerdy personality training, reinforcement learning carried the habit well beyond that scope. The company also provided a command to lift the anti-goblin restriction for users who enjoy the peculiarity.

What broader implications does this incident have for AI training?
This episode highlights the unpredictable nature of reinforcement learning. Even a well-intentioned tweak to a model's personality can lead to widespread, unintended behavioral changes. For developers and researchers, it underscores the importance of scoping customizations carefully and monitoring whether a rewarded behavior shows up outside the context it was trained for. The goblin quirk, while amusing, is a reminder that AI systems often learn patterns their creators didn't explicitly intend. Similar aberrations have surfaced in other AI tools, such as ChatGPT describing gastrointestinal distress as "lo-fi" with a "DIY texture," which suggests these quirks are not isolated incidents but a recurring challenge in the field.
Can users still enable the goblin quirk if they like it?
Yes. OpenAI's blog post notes that the company offers a command to lift the anti-goblin restriction for users who find the quirk charming. The command overrides the hard-coded instruction, letting the model discuss goblins, gremlins, and other creatures freely, whether or not they are relevant. This reflects OpenAI's recognition that the behavior, while a problem for some users, may appeal to others, especially those who enjoyed the Nerdy personality's distinctive voice. Making the restriction optional gives users control over the model's tone and acknowledges that an AI quirk can sometimes be a feature rather than a bug.