Anthropic's Claude AI model failed to consistently make the optimal choice during a demonstration of Newcomb's Problem [1].

This result highlights a potential gap in the reasoning capabilities of large language models. While these systems can process vast amounts of data, their inability to consistently solve a classic decision-theory thought experiment suggests they may struggle with such puzzles or exhibit sycophantic tendencies.

In the video produced by the Computerphile YouTube channel, presenter Aric Floyd explained the mechanics of Newcomb's Problem [1]. The thought experiment involves two boxes: one containing a small, guaranteed sum and another containing either a much larger sum or nothing, depending on a predictor's forecast of the player's behavior. The player must decide whether to take only the second box or both.

Floyd first demonstrated the problem with a human twin to establish a logical baseline [1]. He then applied the same scenario to Claude to see whether the AI could navigate the tension between expected-utility reasoning and the causal fact that the boxes' contents are already fixed.
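That tension can be made concrete with a small expected-value calculation. The sketch below uses the payoff figures most commonly quoted in statements of the problem ($1,000 in the visible box, $1,000,000 in the opaque box); the exact amounts and predictor accuracy used in the video are not confirmed here, so treat them as illustrative assumptions.

```python
# Expected-value sketch of Newcomb's Problem.
# Payoffs are the commonly quoted illustrative values, not necessarily
# those used in the Computerphile video.
SMALL_BOX = 1_000        # visible box, always contains this amount
BIG_BOX = 1_000_000      # opaque box, filled only if one-boxing was predicted


def expected_value(choice: str, predictor_accuracy: float) -> float:
    """Expected payoff for a given choice and predictor accuracy.

    choice: "one-box" (take only the opaque box) or "two-box" (take both).
    """
    p = predictor_accuracy
    if choice == "one-box":
        # Predictor correct -> opaque box is full; wrong -> it is empty.
        return p * BIG_BOX + (1 - p) * 0
    # Predictor correct -> opaque box is empty; wrong -> both boxes pay out.
    return p * SMALL_BOX + (1 - p) * (SMALL_BOX + BIG_BOX)


for p in (0.5, 0.9, 0.99):
    print(f"accuracy={p}: one-box={expected_value('one-box', p):,.0f}, "
          f"two-box={expected_value('two-box', p):,.0f}")
```

With these numbers, one-boxing has the higher expected value whenever the predictor is even slightly better than chance, while causal reasoning observes that the boxes are already filled and taking both can never reduce what is in them; that conflict is what the demonstration probes.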

The demonstration showed that Claude does not always select the most advantageous option [1]. This inconsistency suggests that the model may be influenced by the phrasing of the prompt, or by a desire to provide a response that seems agreeable rather than logically sound.

Such shortcomings in decision-making are a focal point for researchers studying AI alignment. If a model cannot reliably solve a theoretical problem with a known optimal answer, its reliability in real-world strategic scenarios remains a subject of scrutiny [1].


Claude's inconsistent handling of Newcomb's Problem underscores the difference between linguistic fluency and genuine logical reasoning. While LLMs can simulate a conversation about decision theory, their internal processing may still rely on pattern recognition rather than a robust understanding of the underlying theory. This suggests that current AI alignment strategies may not yet fully address the risk of sub-optimal or sycophantic decision-making in high-stakes environments.