Eight algorithms tested on 1.2M retinal images in large-scale evaluation.
Artificial intelligence (AI) may be built on code, but this time it was put through a very human trial: proving its fairness, accuracy and trustworthiness inside the world’s first real-world, head-to-head testing system for diabetic eye screening.
A landmark study published in The Lancet Digital Health shows that eight commercial AI algorithms performed as well as, and in some cases better than, human graders on 1.2 million retinal images from the UK’s National Health Service (NHS) diabetic eye screening program.* The evaluation marks one of the most ambitious attempts yet to test whether AI can safely support national screening at scale, particularly across diverse populations.
A team from City St George’s, University of London, Moorfields Eye Hospital, Kingston University and Homerton Healthcare NHS Trust built the platform with an unusually simple goal: leveling the playing field. Every vendor faced the same images, the same rules and no behind-the-scenes tuning.
An arena for competing algorithms
Twenty-five companies with CE-marked systems were invited to participate, and eight accepted. Their AI systems were connected to a trusted research environment where they could analyze images but never access human grading data. It was a clean test environment, designed to prevent shortcuts and keep every algorithm honest.
The dataset spanned 202,886 screening visits from the North East London Diabetic Eye Screening Program, with participants who were 32% white, 17% Black and 39% South Asian.* In ophthalmology, diversity is not decoration. It is the only way to know whether an algorithm will behave consistently across a population as varied as the one served by the NHS.
“Our revolutionary platform delivers the world’s first fair, equitable and transparent evaluation of AI systems to detect sight-threatening diabetic eye disease,” said Prof. Alicja Rudnicka in a press release. “We have shown that these AI systems are safe for use in the NHS by using enormous data sets, and most importantly, showing that they work well across different ethnicities and age groups.”
Accuracy that keeps pace with busy clinics
Across the eight algorithms, accuracy for detecting diabetic eye disease that may need clinical intervention ranged from 83.7% to 98.7%. Accuracy reached 96.7% to 99.8% for moderate-to-severe disease and 95.8% to 99.5% for proliferative disease.*
In comparison, previously published NHS work reported human accuracy ranging from 75% to 98%. The algorithms landed squarely within that range (sometimes above it) and did so at a pace no human could match.
Processing time per patient ranged from 240 milliseconds to 45 seconds, while human graders may spend up to 20 minutes reviewing a full screening case. At the scale of millions of images a year, that time savings stops being a convenience and starts becoming extra clinical bandwidth.
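To make that bandwidth argument concrete, the back-of-the-envelope sketch below uses only the per-case times quoted above (240 milliseconds to 45 seconds for the algorithms, up to 20 minutes for a full human review). The annual visit volume is an assumption chosen for illustration, not a figure from the study.

```python
# Rough estimate of grading time at screening scale.
# Per-case times are the figures quoted in the article; the annual
# visit volume is an illustrative assumption, not a study number.

ANNUAL_VISITS = 2_000_000          # assumed yearly screening visits (illustrative)
HUMAN_MINUTES_PER_CASE = 20        # upper bound for a full human review
AI_SECONDS_PER_CASE = (0.24, 45)   # fastest and slowest algorithm, per patient

human_hours = ANNUAL_VISITS * HUMAN_MINUTES_PER_CASE / 60
ai_hours_fast = ANNUAL_VISITS * AI_SECONDS_PER_CASE[0] / 3600
ai_hours_slow = ANNUAL_VISITS * AI_SECONDS_PER_CASE[1] / 3600

print(f"Human grading (upper bound): {human_hours:,.0f} hours/year")
print(f"AI triage (fastest system):  {ai_hours_fast:,.0f} hours/year")
print(f"AI triage (slowest system):  {ai_hours_slow:,.0f} hours/year")
```

Even at the slow end of the algorithmic range, the gap runs to hundreds of thousands of grader-hours a year under these assumptions, which is the "extra clinical bandwidth" the researchers describe.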
The platform also measured false-positive rates, confirming that accuracy alone was not the only metric that mattered. Performance stayed consistent across ethnic groups, a key test for any medical AI system in a diverse country like the UK.
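The study's analysis pipeline is not reproduced here, but the minimal sketch below illustrates the kind of subgroup check described: computing sensitivity and false-positive rate separately for each ethnic group against a human-grader reference standard. The column names and toy data are hypothetical, not study outputs.

```python
import pandas as pd

# Illustrative subgroup check: sensitivity and false-positive rate per group.
# The DataFrame, its columns and values are hypothetical examples only.
results = pd.DataFrame({
    "ethnicity":   ["white", "white", "Black", "Black", "South Asian", "South Asian"],
    "reference":   [1, 0, 1, 0, 1, 0],   # human-grader reference standard (1 = disease)
    "ai_referral": [1, 0, 1, 1, 1, 0],   # algorithm's referral decision
})

def subgroup_metrics(df: pd.DataFrame) -> pd.Series:
    tp = ((df.reference == 1) & (df.ai_referral == 1)).sum()
    fn = ((df.reference == 1) & (df.ai_referral == 0)).sum()
    fp = ((df.reference == 0) & (df.ai_referral == 1)).sum()
    tn = ((df.reference == 0) & (df.ai_referral == 0)).sum()
    return pd.Series({
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
    })

print(results.groupby("ethnicity").apply(subgroup_metrics))
```

Comparing these per-group figures, rather than a single pooled accuracy, is what allows an evaluation to show that performance does not drift when the patient changes.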
“This groundbreaking study sets a new benchmark by rigorously testing AI systems to detect sight-threatening diabetic eye disease before potential mass rollout,” said co-principal investigator Adnan Tufail of Moorfields Eye Hospital.
Equity is not automatic, so the study measured it
The UK has learned the hard way that technology can behave differently depending on who is sitting in front of it. Pulse oximeters, for example, have shown lower accuracy in people with darker skin, prompting national scrutiny of medical device equity.
More than four million people in the UK live with diabetes, and each requires regular retinal imaging.* Any tool placed in that workflow carries real consequences, especially if accuracy varies by skin tone, age or disease severity. The multiethnic nature of the North East London dataset allowed the researchers to evaluate equity directly rather than assume it.
The algorithms showed consistent performance across demographic groups. In a field where technology can drift when the patient changes, this result matters as much as any accuracy percentage.
A blueprint for national AI infrastructure
These researchers envision a future where the NHS moves from local pilots to a central AI backbone. In their proposal, approved algorithms would be hosted on a common platform that screening sites could access through secure uploads. AI-generated results would integrate directly into electronic health records, removing the need for individual trusts to build their own infrastructure.
Centralization could standardize quality, reduce implementation costs, and allow new algorithms to be evaluated and deployed consistently as the evidence base evolves. Beyond diabetic eye screening, the same backbone could serve as a template for other high-burden imaging specialties, including cancer screening and cardiology.
A turning point for diabetic eye screening
The evaluation shows that AI can support the NHS diabetic eye screening program at the scale needed to keep pace with rising demand. A transparent platform that tests accuracy, equity and safety in one place gives the health system a rare chance to adopt AI with both confidence and accountability.
If expanded nationally, this approach could reduce workloads, shorten diagnostic timelines and help ensure that sight-threatening disease is detected earlier, whether a patient lives in Birmingham or Barking. For a program that screens millions and carries heavy consequences for late detection, the implications stretch well beyond efficiency.
*Rudnicka AR, Shakespeare R, Chambers R, et al. Automated retinal image analysis systems to triage for grading of diabetic retinopathy: a large-scale, open-label, national screening program in England. Lancet Digit Health. 2025:100914.
Editor’s Note: This newsroom synopsis is based on the study and press release from City St George’s, University of London, and the related publication in The Lancet Digital Health. This content is intended exclusively for healthcare professionals. It is not intended for the general public. Products or therapies discussed may not be registered or approved in all jurisdictions, including Singapore.