An Ontario audit found that AI note-taking tools used by doctors frequently generate incorrect, incomplete, or fabricated information [1].

The findings raise concerns about patient safety and the reliability of clinical records. If AI-generated hallucinations enter a patient's permanent medical file, they could lead to incorrect diagnoses or treatments.

Auditor General Shelley Spence said artificial intelligence note-taking tools intended for use by Ontario doctors provided incorrect and incomplete information or demonstrated "hallucinations," and were not evaluated adequately [3]. The report noted that all AI scribe systems from the 20 approved vendors showed one or more inaccuracies at the procurement testing phase [2].

The audit concluded that these transcription tools were not adequately evaluated during the procurement process [1]. This lack of oversight allowed tools with known errors to be deployed in clinical settings [4].

Despite the audit's warnings, some medical professionals view the technology as a net positive. Ontario doctors said AI scribes are helpful and save time [1]. This creates a tension between the operational efficiency praised by clinicians and the data integrity demanded by the auditor general.

Some doctors have pushed back against the audit's conclusions. They said the auditor general missed a crucial quality-control step in the assessment of how these tools are used in real-world practice [1].

The report was presented at 11 a.m. on Saturday [3]. It highlights a systemic gap in how the provincial government vets emerging technology before it reaches the bedside.

This conflict illustrates the growing gap between the rapid adoption of generative AI for administrative efficiency and the rigorous validation required for medical accuracy. While doctors prioritize reducing burnout through time-saving tools, the Auditor General's findings suggest that the procurement process failed to ensure these tools meet clinical safety standards, potentially shifting the burden of error-correction entirely onto the physician.