An artificial intelligence system reportedly achieved either a passing score or the top rank on the University of Tokyo's Science III entrance examination [1, 2].

This milestone suggests that AI is reaching a level of proficiency in complex scientific and mathematical reasoning that was previously the sole domain of the most elite human students. Because the Science III track is widely considered the most difficult entry point into the university, the result challenges traditional metrics of academic achievement.

Reports on the AI's specific performance vary between sources: a Nikkei report said the AI ranked as the top scorer on the exam [1], while Livedoor News reported that the system reached a passing level [2]. Despite these achievements in technical subjects, the AI's reading-comprehension abilities remain a primary concern for developers and researchers [1].

The result has sparked debate regarding the future of traditional education in Japan. Takafumi Horie, an entrepreneur, said the implications of the AI's success for students are significant. "Exam study will become 100% meaningless. A waste," Horie said [2].

This event occurred in July 2024, marking a significant step in the evolution of large-scale models applied to standardized testing [2]. The University of Tokyo's Science III exam typically requires mastery of advanced physics, chemistry, and mathematics, subjects where AI has shown rapid improvement in the last year.

While the AI can solve the technical problems required for admission, the gap in reading comprehension suggests that the system still struggles with the nuanced linguistic context often required for higher-level humanities and social sciences. This indicates that while the AI can replicate the output of a top student, it may not yet possess the holistic understanding of a human scholar.


The AI's ability to pass the Science III exam demonstrates that machine systems can now master the rigorous, rule-based logic of elite scientific testing. However, the lingering deficit in reading comprehension highlights a critical boundary between computational problem-solving and genuine linguistic understanding, suggesting that specialized academic testing may no longer be a reliable proxy for human intelligence.