OpenAI added a hard-coded instruction to ChatGPT models on Monday to prevent the AI from mentioning goblins or similar fantasy creatures [1].
This move highlights the unpredictable nature of large language models and the struggle developers face when trying to scrub unintended behavioral patterns from an AI's personality.
The company said the fix was necessary after a training quirk caused the model to frequently reference goblins, gremlins, and trolls [2]. According to an OpenAI technical lead, the references were a side effect of a retired “nerdy” personality instruction [3].
While some reports suggest the quirk originated in the retired personality of GPT-5 [3], other accounts indicate the issue became noticeable after the GPT-5.5 upgrade [4]. The fixation had reportedly been observed for approximately one year before the fix was implemented [5].
To resolve the issue, OpenAI engineers took a direct approach to the production code. "We added a line to the code that says ‘never mention goblins,’" an OpenAI engineer said in a blog post [6].
This specific bug differed from previous technical failures. An OpenAI spokesperson said that, unlike earlier model bugs, the issue "crept in subtly" [7]. The restriction now applies globally across the ChatGPT service [8].
The incident underscores the complexity of "personality" layers in AI. Because the behavior was tied to a retired instruction, the model continued to produce goblin references even after that instruction had been removed, showing how trained-in associations can outlive the prompts that created them [3].
The use of a hard-coded "negative constraint" to fix a behavioral quirk suggests that fine-tuning and reinforcement learning are sometimes insufficient to remove deeply embedded patterns. By manually forbidding specific terms rather than retraining the model to forget the association, OpenAI is opting for a surgical override, an approach that reveals the fragility of AI personality management.
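OpenAI has not published how the constraint is implemented, so the details are speculative. Below is a minimal sketch, assuming the rule is injected as a system-level instruction ahead of the model's personality prompt at every request; the names BANNED_TOPIC_RULE and build_messages are hypothetical illustrations, not OpenAI internals.

```python
# Hypothetical sketch of a hard-coded negative constraint.
# BANNED_TOPIC_RULE and build_messages are illustrative names;
# OpenAI's actual implementation has not been disclosed.

BANNED_TOPIC_RULE = (
    "Never mention goblins, gremlins, trolls, or similar fantasy creatures."
)

def build_messages(personality_prompt: str, user_input: str) -> list[dict]:
    """Prepend the hard-coded rule so it takes precedence over the
    personality layer for every single request."""
    return [
        {"role": "system", "content": f"{BANNED_TOPIC_RULE}\n\n{personality_prompt}"},
        {"role": "user", "content": user_input},
    ]

if __name__ == "__main__":
    messages = build_messages("You are a helpful assistant.", "Tell me a story.")
    print(messages[0]["content"])
```

Note what this kind of override does and does not accomplish: the model never "unlearns" the goblin association; it is simply instructed on every turn not to surface it, which is consistent with the fragility described above.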