Developing and Validating AI Algorithms in Prostate MRI – An Evolving Journey 


Prostate cancer remains one of the most challenging diseases for radiologists to detect and diagnose accurately. The task is made even more daunting by the disease's multifaceted nature and the intricacies of interpreting multiparametric MRI (mpMRI) images.

This blog post examines the complexities of developing and validating AI algorithms that detect prostate cancer on MRI, exploring the challenges and best practices along this evolving journey.


Understanding the complexity of prostate cancer detection 

Prostate cancer detection poses unique challenges due to its variable presentation and the diverse anatomical structures within the prostate gland. Radiologists must navigate multi-sequence MRI images and distinguish between benign and malignant lesions, particularly in the transition zone (TZ), where cancers can be elusive and easily confused with benign hyperplastic nodules. This confusion produces false positives that lead to unnecessary biopsies.

This challenge is reflected in radiologists' clinical performance when detecting clinically significant prostate cancer (csPCa): up to 20% of patients with csPCa may go undetected, and false positive rates can reach 70%. Equally important is the high inter-reader variability, illustrated by the wide range of positive predictive values (PPV) for lesions scored PI-RADS ≥ 3 when compared against biopsy outcomes.
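To make the PPV comparison concrete: for each reader, PPV is the fraction of lesions they called positive (here, PI-RADS ≥ 3) that biopsy confirmed as csPCa. The following is a minimal sketch that assumes a hypothetical `Lesion` record. The readers and values are invented for illustration and do not reflect any study's results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Lesion:
    reader: str           # hypothetical reader identifier
    pirads: int           # PI-RADS score assigned by that reader (1-5)
    biopsy_cspca: bool    # True if biopsy confirmed csPCa for this lesion


def ppv_by_reader(lesions: List[Lesion], threshold: int = 3) -> Dict[str, float]:
    """PPV per reader = biopsy-confirmed csPCa / lesions scored >= threshold."""
    called = defaultdict(int)     # lesions each reader called positive
    confirmed = defaultdict(int)  # of those, lesions confirmed as csPCa
    for lesion in lesions:
        if lesion.pirads >= threshold:
            called[lesion.reader] += 1
            if lesion.biopsy_cspca:
                confirmed[lesion.reader] += 1
    return {reader: confirmed[reader] / called[reader] for reader in called}


# Toy data (made up): reader B calls more benign lesions positive than reader A.
lesions = [
    Lesion("A", 4, True), Lesion("A", 3, False), Lesion("A", 5, True), Lesion("A", 3, True),
    Lesion("B", 3, False), Lesion("B", 3, False), Lesion("B", 4, True), Lesion("B", 2, False),
]
print(ppv_by_reader(lesions))  # {'A': 0.75, 'B': 0.3333333333333333}
```

Even on the same lesions, two readers with different calling thresholds in practice end up with very different PPVs. This is the spread that makes a single reader's PI-RADS scores a noisy reference.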

A key factor contributing to this challenge is the heterogeneity of acquisition protocols in prostate MRI. Before the PI-RADS (Prostate Imaging Reporting and Data System) guidelines were created, there was no consensus on which sequences and parameters a prostate MRI protocol should include for an efficient read. PI-RADS was created to provide a single reference describing best practices for prostate MRI acquisition. T2-weighted (T2w) sequences acquired in multiple planes can be considered the core of the protocol. They provide crucial anatomical information that is particularly relevant in the TZ. However, assessment of the peripheral zone (PZ) relies heavily on findings from the diffusion-weighted imaging (DWI) sequence. DWI captures the movement of water molecules, which can indicate various pathologies, including prostate cancer (PCa).

According to PI-RADS 2.1, T2w and DWI sequences take precedence in the evaluation process, while the dynamic contrast-enhanced (DCE) sequence plays an important role in specific scenarios. Supplementary series are also often acquired to ease diagnosis. These may include a T1-weighted sequence, used to detect metastases or biopsy-induced bleeding, and a full-pelvis DWI, used to uncover incidental findings in tissues adjacent to the prostate.


Developing AI-based CAD software in prostate MRI  

The shortage of body radiologists qualified to report prostate MRI with confidence, the high inter-reader variability, and the room for improving radiologists' sensitivity and specificity have together created an opportunity for computer-aided detection and diagnosis (CAD) devices to support radiologists in their clinical routine.

The development of these devices, especially when leveraging AI-based models, is equally challenging and requires careful consideration by their manufacturers.  


Sources of ground truth

There are three main sources of ground truth for developing and validating AI-based models for csPCa detection: PI-RADS annotations, biopsy outcomes, and radical prostatectomy.


PI-RADS annotations 

PI-RADS scores are the most readily available ground truth, because a radiologist assigns them manually when reporting a prostate MRI study. However, using PI-RADS scores as ground truth for training and validation has two key shortcomings:

    • PI-RADS scores suffer from high inter-reader variability. Inexperienced radiologists tend to overcall benign findings as PI-RADS > 3, so the ground truth shifts heavily depending on the reporting radiologist. Independent Review Panels (IRPs) have been suggested as a control measure, usually following a 2+1 approach (i.e., two radiologists independently evaluate the study, and a third reviews it only if the first two disagree). Even with such measures in place, the process is often time-consuming and can still fail to reduce the biases present within the cohort of reporting radiologists.
    • PI-RADS scores are not confirmed csPCa outcomes. A radiologist's assessment only indicates their confidence that a given lesion is csPCa. Some PI-RADS 5 lesions turn out not to be csPCa, while some lesions the radiologist did not identify are found to be csPCa on systematic biopsy. Using PI-RADS scores to determine patient outcomes is therefore a suboptimal strategy.
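The 2+1 review described above can be sketched as a small adjudication rule. This version assumes that "agreement" means identical PI-RADS scores. It requests the third read lazily, through a callable, so the third radiologist is consulted only when the first two disagree. The function name and signature are hypothetical.

```python
from typing import Callable, Tuple


def adjudicate_2plus1(
    score_reader1: int,
    score_reader2: int,
    third_reader: Callable[[], int],
) -> Tuple[int, bool]:
    """Return (consensus PI-RADS score, whether the third reader was consulted).

    Assumption: "agreement" means identical PI-RADS scores. The third read is
    only requested (by calling third_reader) when the first two disagree.
    """
    if score_reader1 == score_reader2:
        return score_reader1, False
    return third_reader(), True


print(adjudicate_2plus1(4, 4, third_reader=lambda: 3))  # (4, False)
print(adjudicate_2plus1(2, 4, third_reader=lambda: 3))  # (3, True)
```

Panels may define agreement differently, for example by whether both readers fall on the same side of the PI-RADS ≥ 3 threshold. The `==` check is a placeholder for whatever rule a given review protocol specifies.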

These factors collectively present a formidable challenge when relying solely on radiology as the ground truth. 


Biopsy outcomes 

Using biopsy outcomes helps overcome the limitations of PI-RADS scores outlined above. Defining csPCa as a lesion with a Gleason score of 7 or higher greatly reduces both the inter-reader variability and the lack of a definitive diagnosis.
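As a minimal sketch of that labeling rule: a Gleason score is the sum of the primary and secondary Gleason patterns, and a lesion is labeled csPCa when the score reaches 7. This code assumes that each lesion may have several biopsy cores and that the highest-scoring core determines the lesion label. That aggregation choice and the helper names are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple

# (primary, secondary) Gleason patterns for one core, or None if no cancer was found.
Core = Optional[Tuple[int, int]]


def gleason_score(primary: int, secondary: int) -> int:
    """Gleason score is the sum of the primary and secondary patterns."""
    return primary + secondary


def is_cspca(cores: Sequence[Core], threshold: int = 7) -> bool:
    """True if any core sampled from the lesion has a Gleason score >= threshold."""
    scores = [gleason_score(*core) for core in cores if core is not None]
    return bool(scores) and max(scores) >= threshold


print(is_cspca([(3, 3), None, (3, 4)]))  # True: 3 + 4 = 7
print(is_cspca([(3, 3), None]))          # False: highest score is 6
```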

However, prostate biopsies sample only a very small amount of tissue. The samples come from evenly distributed regions of the prostate (systematic biopsy) and from regions where a radiologist identified a suspicious lesion (targeted biopsy).

How well the extracted tissue matches the radiological finding depends heavily on the urologist's ability to locate the lesion accurately on the real-time ultrasound image, and the target region is often undersampled. Newer techniques such as MRI-ultrasound fusion biopsy have greatly helped overcome this challenge. However, their adoption remains limited, and they are usually reserved for patients undergoing targeted biopsy.

Another challenge of using biopsy outcomes as ground truth is the lack of biopsy data for patients in whom a biopsy was not clinically warranted after radiological review. Automatically assuming these cases to be negative risks introducing verification bias. Further clinical follow-up is needed before they can be considered true negatives.
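One way to avoid silently treating unbiopsied patients as negatives is to give them an explicit "unverified" label. These cases are then set aside during evaluation until follow-up resolves them. The sketch below assumes hypothetical `Case` fields, including a `negative_follow_up` flag. What counts as adequate negative follow-up is left to the study protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class GroundTruth(Enum):
    POSITIVE = "csPCa"
    NEGATIVE = "no csPCa"
    UNVERIFIED = "not biopsied, follow-up pending"


@dataclass
class Case:
    case_id: str
    biopsied: bool
    biopsy_cspca: Optional[bool] = None  # biopsy result; None when no biopsy was done
    negative_follow_up: bool = False     # hypothetical flag: follow-up has ruled out csPCa


def ground_truth(case: Case) -> GroundTruth:
    if case.biopsied:
        if case.biopsy_cspca is None:
            raise ValueError(f"{case.case_id}: biopsied but no result recorded")
        return GroundTruth.POSITIVE if case.biopsy_cspca else GroundTruth.NEGATIVE
    # No biopsy: do not assume negative; accept it only once follow-up supports it.
    return GroundTruth.NEGATIVE if case.negative_follow_up else GroundTruth.UNVERIFIED


def evaluable(cases: List[Case]) -> List[Case]:
    """Cases with a verified label; unverified ones are set aside, not counted as negatives."""
    return [c for c in cases if ground_truth(c) is not GroundTruth.UNVERIFIED]


cases = [
    Case("p1", biopsied=True, biopsy_cspca=True),
    Case("p2", biopsied=True, biopsy_cspca=False),
    Case("p3", biopsied=False),                          # no biopsy, no follow-up yet
    Case("p4", biopsied=False, negative_follow_up=True),
]
print([c.case_id for c in evaluable(cases)])  # ['p1', 'p2', 'p4']
```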


Radical prostatectomy 

Radical prostatectomy overcomes the limited spatial coverage of biopsy outcomes. When the whole prostate gland is removed and assessed histologically, a full 3D picture is obtained at the pathological level. This allows csPCa regions to be matched directly for training and validation.

As with biopsy outcomes, the main challenge of this approach is that far fewer radical prostatectomies are performed than prostate MRIs. Although radical prostatectomy is the best scientific standard, it would be unethical to perform it on every patient solely for a validation study. Its use therefore remains limited by the small available sample size and the inherent selection bias in this population.



Clinical input  

Demonstrating safety and effectiveness through robust training and clinical validation is critical to the success of these products. However, two elements are often overlooked despite deserving equal importance: how the radiologist interacts with the device output, and how the device integrates into the clinical workflow.

Clinical input plays a pivotal role in developing and validating AI algorithms. Collaborating with healthcare professionals ensures that algorithmic design choices align with clinical workflows and decision-making processes.  

Usability testing and human factors engineering ensure that the device’s output is correctly understood by the user, intuitive to use, and clear about the device’s capabilities and limitations.  

Moreover, involving clinicians in algorithm development and validation facilitates relevant subgroup analyses, allowing for tailored approaches to patient care. 

Developing and validating AI algorithms for prostate cancer MRI is a multifaceted journey. By acknowledging the challenges inherent in prostate cancer diagnosis, addressing the limitations of current methodologies, and incorporating clinical expertise into algorithm development and validation, we can pave the way for more accurate and reliable diagnostic tools. As technology continues to evolve, so too will our ability to combat prostate cancer, ultimately improving patient outcomes and quality of life.  


Quibim’s decision 

At Quibim we like to embrace the hard stuff, by handling hard better. That is precisely why we decided to use biopsy outcomes as ground truth. Our aim is to have a positive impact on patient care by detecting clinically significant cancer. AI outputs need to be actionable to be useful and to avoid overburdening the healthcare system. An AI model for csPCa detection trained on pathology allows for high sensitivity, a low false positive rate (FPR) per case, and a high negative predictive value (NPV). These tools empower radiologists to standardize their reporting criteria, reducing dependency on individual experience levels. Over time, they will become as integral to diagnosis as adaptive cruise control has become to driving, perhaps to the point where practicing without them is seen as unsafe.
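To illustrate how the three metrics above relate to one another, here is a minimal sketch computing them from per-case results. Several assumptions apply. Sensitivity is computed at the patient level, where a csPCa patient counts as detected if at least one model detection matched a csPCa lesion. False positives per case are averaged over all cases. NPV is computed among cases where the model made no detections at all. Lesion matching is assumed to have been done upstream. The `CaseResult` structure is hypothetical, the toy data is invented, and this is not a description of Quibim's evaluation method.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class CaseResult:
    has_cspca: bool       # reference standard for the case (e.g. biopsy-confirmed csPCa)
    true_positives: int   # model detections matched to a csPCa lesion
    false_positives: int  # model detections not matched to any csPCa lesion


def _ratio(numerator: float, denominator: float) -> float:
    # Return NaN instead of raising when a subgroup is empty.
    return numerator / denominator if denominator else math.nan


def summarize(results: Sequence[CaseResult]) -> Dict[str, float]:
    positives = [r for r in results if r.has_cspca]
    # Patient-level sensitivity: csPCa patients with at least one matched detection.
    sensitivity = _ratio(sum(r.true_positives > 0 for r in positives), len(positives))
    # Average number of false-positive detections per case, across all cases.
    fp_per_case = _ratio(sum(r.false_positives for r in results), len(results))
    # NPV: among cases with no detections at all, the fraction without csPCa.
    called_negative = [r for r in results if r.true_positives + r.false_positives == 0]
    npv = _ratio(sum(not r.has_cspca for r in called_negative), len(called_negative))
    return {"sensitivity": sensitivity, "fp_per_case": fp_per_case, "npv": npv}


# Toy data (made up) for five cases.
results = [
    CaseResult(has_cspca=True, true_positives=1, false_positives=0),
    CaseResult(has_cspca=True, true_positives=0, false_positives=1),
    CaseResult(has_cspca=False, true_positives=0, false_positives=0),
    CaseResult(has_cspca=False, true_positives=0, false_positives=2),
    CaseResult(has_cspca=True, true_positives=0, false_positives=0),
]
print(summarize(results))
```

Reporting false positives per case rather than a single specificity value reflects the detection setting: a model can produce several findings per study, and each spurious finding is a potential unnecessary biopsy.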