Stanford researchers have introduced VeriFact-BHC, an open, clinician-annotated benchmark that enables transparent verification of LLM-generated clinical records, addressing growing concerns over AI hallucinations in healthcare documentation.
Stanford University researchers have developed VeriFact, an AI system that verifies the factual accuracy of clinical documentation generated by large language models (LLMs) by cross-checking it against the patient’s electronic health record (EHR). The study, published in NEJM AI, evaluates how faithfully LLM-generated clinical text reflects real patient records.
VeriFact uses an ‘LLM-as-a-judge’ framework that pulls patient-specific data from the EHR to assess whether each statement in an AI-generated document is factually supported. The system performs reference-based, patient-level fact verification rather than relying on general medical knowledge.
“VeriFact is an AI system that checks the veracity of statements within an LLM-generated document… VeriFact performs patient-specific fact verification by comparing statements in an LLM-generated document against a patient’s EHR facts, localises errors, and describes their underlying causes,” the study’s authors wrote.
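The statement-by-statement workflow described above can be illustrated with a minimal Python sketch. It is not VeriFact's implementation: the function names, the word-overlap retrieval, and the `toy_judge` stand-in for the LLM judge are all hypothetical, chosen only to show the shape of reference-based verification, where each statement is checked against facts from that patient's own record.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Verdict:
    statement: str
    supported: bool
    evidence: List[str]  # EHR facts the judge was shown for this statement


def split_statements(document: str) -> List[str]:
    # Naive split on periods (assumption); a real system would decompose text more carefully.
    return [s.strip() for s in document.split(".") if s.strip()]


def tokens(text: str) -> Set[str]:
    # Lowercased words with surrounding punctuation stripped.
    return {w.strip(",;:()").lower() for w in text.split() if w.strip(",;:()")}


def retrieve_facts(statement: str, ehr_facts: List[str], top_k: int = 2) -> List[str]:
    # Hypothetical retrieval step: rank the patient's facts by word overlap with the statement.
    stmt = tokens(statement)
    ranked = sorted(ehr_facts, key=lambda f: len(stmt & tokens(f)), reverse=True)
    return ranked[:top_k]


def toy_judge(statement: str, evidence: List[str]) -> bool:
    # Stand-in for the LLM judge (assumption): a statement counts as supported
    # only if every one of its words appears in a single retrieved fact.
    stmt = tokens(statement)
    return any(stmt <= tokens(fact) for fact in evidence)


def verify_document(
    document: str,
    ehr_facts: List[str],
    judge: Callable[[str, List[str]], bool] = toy_judge,
) -> List[Verdict]:
    # Check each statement against this patient's facts, not general medical knowledge.
    verdicts = []
    for statement in split_statements(document):
        evidence = retrieve_facts(statement, ehr_facts)
        verdicts.append(Verdict(statement, judge(statement, evidence), evidence))
    return verdicts


# Invented example data, for illustration only.
ehr_facts = [
    "Patient was admitted with pneumonia",
    "Patient received ceftriaxone on day 1",
    "Patient has no known drug allergies",
]
draft = "Patient was admitted with pneumonia. Patient received vancomycin on day 1."

verdicts = verify_document(draft, ehr_facts)
for v in verdicts:
    print(("SUPPORTED    " if v.supported else "NOT SUPPORTED"), v.statement)
```

In this toy run the second statement is flagged because the record mentions a different antibiotic, and the retrieved evidence shows where the draft and the record disagree. Swapping `toy_judge` for a call to an actual language model would leave the rest of the loop unchanged.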
As a key contribution to the open research ecosystem, the researchers introduced VeriFact-Brief Hospital Course (VeriFact-BHC), a clinician-annotated benchmark dataset designed to enable transparent and reproducible evaluation of fact-verification methods in clinical AI.
“VeriFact-BHC contains 100 patients with 13,070 statements derived from brief hospital courses, each annotated by three or more clinicians,” the authors noted.
VeriFact achieved 93.2% agreement with clinicians, exceeding the highest interrater agreement among the clinicians themselves (88.5%). This suggests the system's verdicts were at least as consistent as those of the human reviewers.
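To make figures like these concrete, the sketch below computes simple percent agreement between a system's labels and a clinician majority vote. The article does not say exactly how agreement was calculated, so both the metric and the majority-vote consensus are assumptions, and the labels are invented for illustration.

```python
from collections import Counter
from typing import List


def majority_label(labels: List[str]) -> str:
    # Most common label among annotators; with three annotators and two labels there are no ties.
    return Counter(labels).most_common(1)[0][0]


def percent_agreement(a: List[str], b: List[str]) -> float:
    # Share of items, as a percentage, on which the two label lists match.
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Invented labels for four statements, three clinicians each.
clinician_labels = [
    ["supported", "supported", "not_supported"],
    ["not_supported", "not_supported", "not_supported"],
    ["supported", "supported", "supported"],
    ["not_supported", "supported", "not_supported"],
]
system_labels = ["supported", "not_supported", "supported", "supported"]

consensus = [majority_label(labels) for labels in clinician_labels]
print(percent_agreement(system_labels, consensus))  # 75.0
```

The same function can compare one clinician's labels against another's, which is how an interrater figure of this simple kind would be obtained.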
“VeriFact can help clinicians verify facts in documents drafted by LLMs prior to committing them to the patient’s EHR… VeriFact-BHC can be used to develop and benchmark new methodologies for verifying facts in patient care documents,” the authors added.


