Aim: Discover combinations (i.e. bins) of HLA amino acid mismatches (AA-MMs) that are predictive of kidney graft failure (GF) risk using the Feature Inclusion Bin Evolver for Risk Stratification (FIBERS 2.0) machine learning (ML) algorithm.
Methods: Data on 166,932 adult deceased donor kidney transplants from 2010-2023 were obtained from the Scientific Registry of Transplant Recipients. We assigned AA polymorphisms for ten replicate high-resolution imputations of HLA-DRB1, -DQA1, -DQB1, and -DRB3,4,5 alleles from serologic antigen specificities using haplotype frequencies. After removing MMs with <1% frequency, 171 AA-MMs remained. ML analyses were run with and without DRB345 AA-MMs, employing FIBERS 2.0 (Fig. 1) to discover AA-MM bins that stratify donor/recipient pairs into high/low GF risk based on the bin’s burden threshold (i.e. threshold of AA-MM count). FIBERS 2.0 was extended by (1) discovering the optimal AA-MM burden threshold for each bin, rather than assuming that ‘any’ AA-MM in the bin leads to high-risk assignment, and (2) using deviance residuals to uniquely take estimated covariate effects into account during algorithm training (to focus on novel risk factor discovery). Covariates included donor/recipient characteristics (e.g. demographics, comorbidities) and HLA-A, B, C, HLA-DRB1, DQA1, DQB1, and DRB345 antigen MMs.
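The two extensions described in Methods can be illustrated with a minimal Python sketch. It is not the FIBERS 2.0 implementation: the function names, the dictionary encoding of mismatches, and the position labels are hypothetical. It assumes a pair is high-risk when its AA-MM count in the bin exceeds the threshold, so a threshold of 0 reproduces the ‘any’ AA-MM rule. The residual is the standard deviance residual derived from a martingale residual, taking as input a cumulative hazard that a covariate-only survival model is assumed to have estimated already.

```python
import math

def bin_burden(mismatches, bin_positions):
    """Count the AA-MMs from the bin that are present in one donor/recipient pair.

    `mismatches` maps a position label to 1 (mismatched) or 0 (matched);
    positions missing from the mapping are treated as matched.
    """
    return sum(mismatches.get(pos, 0) for pos in bin_positions)

def assign_risk(mismatches, bin_positions, threshold):
    """Assumed rule: high risk when the bin burden exceeds the threshold."""
    return "high" if bin_burden(mismatches, bin_positions) > threshold else "low"

def deviance_residual(event, cumulative_hazard):
    """Deviance residual from a covariate-only survival model.

    `event` is 1 for graft failure and 0 for censoring; `cumulative_hazard`
    is the model's estimated cumulative hazard at the observed time and
    must be > 0 when event == 1.
    """
    martingale = event - cumulative_hazard
    log_term = event * math.log(cumulative_hazard) if event else 0.0
    # Clamp at zero to guard against floating-point round-off near martingale == 0.
    inner = max(0.0, -2.0 * (martingale + log_term))
    return math.copysign(math.sqrt(inner), martingale)

# Hypothetical bin and donor/recipient pair, for illustration only.
example_bin = ["DRB1_pos13", "DQB1_pos57", "DRB345_pos74"]
example_pair = {"DRB1_pos13": 1, "DQB1_pos57": 1, "DRB345_pos74": 0}

risk_any = assign_risk(example_pair, example_bin, threshold=0)
risk_strict = assign_risk(example_pair, example_bin, threshold=2)
```

In this sketch the same pair (burden of 2) is high-risk under threshold 0 but low-risk under threshold 2, which is why the threshold is worth searching over alongside the bin's positions. Training against deviance residuals rather than raw outcomes means a bin is rewarded only for risk that the covariates have not already explained.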
Results: Across 10 imputed datasets, top FIBERS 2.0 bins yielded similar adjusted HRs (1.087-1.096 with and 1.078-1.1 without DRB345), supporting the efficacy of deviance residuals. Estimated AA-MMs in antigen recognition sites of HLA-DRB1, DQB1, and DRB3,4,5 account for a significant incremental risk of GF. Notably, several of these AA-MMs separate DRB3, DRB4, and DRB5 haplotypes and DRB1 serologic specificities (Fig. 2). AA-MM position variability was observed between analyses with and without DRB345, likely due to linkage disequilibrium between DRB345 and DRB1 and/or DQB1. Top bins were identified with AA-MM burden thresholds ranging from 0 to 5 (without DRB345) and 0 to 4 (with DRB345), supporting the utility of this extension in conjunction with discovering AA-MM positions.
Conclusion: FIBERS 2.0 effectively automates the discovery of a diverse set of AA-MM bins predicting GF risk while taking covariate effects into account during training and not assuming a pre-defined AA-MM burden threshold for assigning risk groups.