Wilmer Glaucoma ML
Exploring the intersection of glaucoma and machine learning to improve patient care.

Our lab focuses on developing and assessing machine learning (ML) techniques that strive to reduce the risk of glaucoma-related vision loss. We are an interdisciplinary group of faculty, research associates, and students from the Wilmer Eye Institute and the Malone Center for Engineering in Healthcare at Johns Hopkins.
Selected Publications
- Chris Bradley, Alex Pham, and Jithin Yohannan. AJO International, Jul 2025
OBJECTIVE: Determine how sensitivities below the measurement floor of the Humphrey Field Analyzer change when transitioning from Swedish Interactive Thresholding Algorithm (SITA) Standard to SITA-Fast and SITA-Faster strategies. DESIGN: Retrospective descriptive study. PARTICIPANTS: A total of 21,468 24-2 SITA-Standard, 4,872 SITA-Fast, and 3,468 SITA-Faster visual fields (VFs) from 7,917 glaucoma and glaucoma suspect eyes with at least 5 VFs between 1997 and 2023 at the Wilmer Eye Institute. METHODS: At each test location of the 24-2 test pattern, we measured the probability that <0 dB at a given test location on two baseline SITA-Standard VFs was M dB or higher on the first SITA-Fast or SITA-Faster post-baseline VF for different values of M > 0. Results were compared to using the same test strategy for both baseline and post-baseline VFs. MAIN OUTCOME MEASURES: Probability of <0 dB at baseline being measured as M dB or higher (M > 0) on the first post-baseline VF. RESULTS: At M = 7 dB, which was approximately one standard deviation above the mean for post-baseline SITA-Standard sensitivities, average percent change from <0 dB across all test locations was 10.3% for SITA-Standard, 15.8% for SITA-Fast, and 25.5% for SITA-Faster. Percent change from <0 dB for all M tested (up to M = 20) was consistently higher near the macula compared to overall averages: on average 1.3% higher for SITA-Standard, 1.5% higher for SITA-Fast, and 6.3% higher for SITA-Faster. CONCLUSIONS: Increased caution is advised when following the progression of <0 dB defects during a transition from SITA-Standard to SITA-Fast or SITA-Faster.
- Alex T. Pham, Annabelle A. Pan, Chris Bradley, and 5 more authors. Translational Vision Science & Technology, Aug 2024
PURPOSE: Compare the use of optic disc and macular optical coherence tomography measurements to predict glaucomatous visual field (VF) worsening. METHODS: Machine learning and statistical models were trained on 924 eyes (924 patients) with circumpapillary retinal nerve fiber layer (cp-RNFL) or ganglion cell inner plexiform layer (GC-IPL) thickness measurements. The probability of 24-2 VF worsening was predicted using both trend-based and event-based progression definitions of VF worsening. Additionally, the cp-RNFL and GC-IPL predictions were combined to produce a combined prediction. A held-out test set of 617 eyes was used to calculate the area under the curve (AUC) to compare cp-RNFL, GC-IPL, and combined predictions. RESULTS: The AUCs for cp-RNFL, GC-IPL, and combined predictions with the statistical and machine learning models were 0.72, 0.69, 0.73, and 0.78, 0.75, 0.81, respectively, when using trend-based analysis as ground truth. The differences in performance between the cp-RNFL, GC-IPL, and combined predictions were not statistically significant. AUCs were highest in glaucoma suspects using cp-RNFL predictions and highest in moderate/advanced glaucoma using GC-IPL predictions. The AUCs for the statistical and machine learning models were 0.63, 0.68, 0.69, and 0.72, 0.69, 0.73, respectively, when using event-based analysis. AUCs decreased with increasing disease severity for all predictions. CONCLUSIONS: cp-RNFL and GC-IPL similarly predicted VF worsening overall, but cp-RNFL performed best in early glaucoma stages and GC-IPL in later stages. Combining both did not enhance detection significantly. TRANSLATIONAL RELEVANCE: cp-RNFL best predicted trend-based 24-2 VF progression in early-stage disease, while GC-IPL best predicted progression in late-stage disease. Combining both features led to minimal improvement in predicting progression.
- Chris Bradley, Kaihua Hou, Patrick Herbert, and 5 more authors. PLOS ONE, Aug 2024
Linear regression of optical coherence tomography measurements of peripapillary retinal nerve fiber layer thickness is often used to detect glaucoma progression and forecast future disease course. However, current measurement frequencies suggest that clinicians often apply linear regression to a relatively small number of measurements (e.g., fewer than a handful). In this study, we estimate the accuracy of linear regression in predicting the next reliable measurement of average retinal nerve fiber layer thickness using Zeiss Cirrus optical coherence tomography measurements from a sample of 6,471 eyes with glaucoma or glaucoma-suspect status. Linear regression is compared to two null models: no glaucoma worsening, and worsening due to aging. Linear regression on the first M ≥ 2 measurements was significantly worse at predicting a reliable (M+1)st measurement for 2 ≤ M ≤ 6. This range was reduced to 2 ≤ M ≤ 5 when retinal nerve fiber layer thickness measurements were first "corrected" for scan quality. Simulations based on measurement frequencies in our sample (on average 393 ± 190 days between consecutive measurements) show that linear regression outperforms both null models when M ≥ 5 and the goal is to forecast moderate (75th percentile) worsening, and when M ≥ 3 for rapid (90th percentile) worsening. If linear regression is used to assess disease trajectory with a small number of measurements over short time periods (e.g., 1-2 years), as is often the case in clinical practice, the number of optical coherence tomography examinations needs to be increased.
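As a rough illustration of the comparison described in this abstract, the sketch below forecasts the next average RNFL thickness from the first M measurements in three ways: an ordinary least-squares line, a "no worsening" null, and an "aging" null. This is not the paper's code. The function names, the choice to anchor both null models at the mean of the first M measurements, and the aging slope of -0.5 µm/year are all assumptions made only for this example.

```python
from statistics import mean


def ols_fit(times, values):
    """Ordinary least-squares slope and intercept for value = a + b * time."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    slope = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values)) / sxx
    return slope, v_bar - slope * t_bar


def forecast_linear(times, values, t_next):
    """Extrapolate the fitted line to the time of the next measurement."""
    slope, intercept = ols_fit(times, values)
    return intercept + slope * t_next


def forecast_no_change(values):
    """Null model (assumed form): no worsening, so predict the mean so far."""
    return mean(values)


def forecast_aging(times, values, t_next, aging_slope=-0.5):
    """Null model (assumed form): a fixed slope through the centroid.

    aging_slope is a hypothetical value in micrometers per year.
    """
    return mean(values) + aging_slope * (t_next - mean(times))


# Hypothetical eye: three yearly measurements (years, micrometers).
times = [0.0, 1.0, 2.0]
rnfl = [90.0, 88.0, 86.0]
t_next = 3.0

print(forecast_linear(times, rnfl, t_next))
print(forecast_no_change(rnfl))
print(forecast_aging(times, rnfl, t_next))
```

With only a few noisy measurements the fitted slope is poorly constrained, which is why a fixed-slope null model can give a smaller forecast error than the regression line.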
- Patrick Herbert, Kaihua Hou, Chris Bradley, and 5 more authors. Ophthalmol. Glaucoma, Mar 2023
PURPOSE: Assess whether we can forecast future rapid visual field (VF) worsening using deep learning models (DLMs) trained on early VF, OCT, and clinical data. DESIGN: Retrospective cohort study. SUBJECTS: 4,536 eyes from 2,962 patients. Of these, 263 eyes (5.80%) underwent rapid VF worsening (MD slope < -1 dB/yr across all VFs). METHODS: We included eyes that met the following criteria: 1) followed for glaucoma or suspect status; 2) had at least five longitudinal reliable VFs (VF1, VF2, VF3, VF4, VF5); 3) had one reliable baseline optical coherence tomography (OCT) scan (OCT1) and one set of baseline clinical measurements (Clinical1) at the time of VF1. We designed a DLM to forecast future rapid VF worsening. The input consisted of spatially oriented total deviation values from VF1 (with VF2 and VF3 also included in some models) and retinal nerve fiber layer thickness values from the baseline OCT. We passed this VF/OCT stack into a vision transformer feature extractor, the output of which was concatenated with baseline clinical data before being passed through a linear classifier to predict that eye's risk of rapid VF worsening across the five VFs. We compared the performance of models with differing inputs by computing the area under the receiver operating characteristic curve (AUC) in the test set. Specifically, we trained models with the following inputs: Model V: VF1; VC: VF1 + Clinical1; VO: VF1 + OCT1; VOC: VF1 + Clinical1 + OCT1; V2: VF1 + VF2; V2OC: VF1 + VF2 + Clinical1 + OCT1; V3: VF1 + VF2 + VF3; V3OC: VF1 + VF2 + VF3 + Clinical1 + OCT1. MAIN OUTCOME MEASURES: AUC of DLMs when forecasting rapidly worsening eyes. RESULTS: Model V3OC best forecasted rapid worsening with an AUC [95% CI] of 0.87 [0.77, 0.97]. Remaining models in descending order of performance and their respective AUC [95% CI] were: Model V3 (0.84 [0.74, 0.95]), Model V2OC (0.81 [0.70, 0.92]), Model V2 (0.81 [0.70, 0.82]), Model VOC (0.77 [0.65, 0.88]), Model VO (0.75 [0.64, 0.88]), Model VC (0.75 [0.63, 0.87]), Model V (0.74 [0.62, 0.86]). CONCLUSION: DLMs can forecast future rapid glaucoma worsening with modest to high performance when trained using data from early in the disease course. Including baseline data from multiple modalities and subsequent visits improves performance beyond using VF data alone.
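The outcome label in this abstract, rapid VF worsening defined as an MD slope below -1 dB/yr across all VFs, can be sketched as a least-squares slope followed by a threshold. This is a minimal illustration rather than the study's pipeline. The function names and example values are hypothetical, and the sketch assumes test dates have already been converted to years since the first VF.

```python
from statistics import mean


def md_slope(years, md_values):
    """Least-squares slope of mean deviation (dB) against time (years)."""
    if len(years) != len(md_values) or len(years) < 2:
        raise ValueError("need at least two paired (time, MD) observations")
    t_bar, m_bar = mean(years), mean(md_values)
    sxx = sum((t - t_bar) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_bar) * (m - m_bar) for t, m in zip(years, md_values))
    return sxy / sxx


def is_rapid_worsener(years, md_values, threshold=-1.0):
    """Label an eye as rapidly worsening when its MD slope is below threshold."""
    return md_slope(years, md_values) < threshold


# Two hypothetical eyes, each with five yearly VFs.
years = [0.0, 1.0, 2.0, 3.0, 4.0]
fast_eye = [-2.0, -3.5, -4.5, -6.0, -7.5]    # slope is about -1.35 dB/yr
stable_eye = [-2.0, -2.2, -2.1, -2.4, -2.3]  # slope is about -0.08 dB/yr

print(is_rapid_worsener(years, fast_eye))
print(is_rapid_worsener(years, stable_eye))
```

Because only 5.80% of eyes met this definition, the resulting labels are heavily imbalanced, which is one reason a threshold-free metric such as AUC suits model comparison here.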