Aim: The IPD-IMGT/HLA Database curates ~40,000 allelic variants for 46 genes in the HLA region. Protein and DNA alignment files for these genes are publicly available on the ANHIG/IMGTHLA GitHub repository, alongside descriptions of the genes and their allele names. Since 2010, these resources have grown significantly in size and number, and have become central to the application of HLA genotype data in stem-cell and organ transplantation therapy, as well as in disease association, evolutionary biology, and population genetics research. To foster standardized, local use of these key resources, we developed HLAtools, a free software package that enables computation on IPD-IMGT/HLA Database resources.
Methods: HLAtools is written in R, a language for statistical computing. We developed functions that consume the static reference files in the ANHIG/IMGTHLA GitHub repository (those defining gene and allele names, and the sequence alignments), making their contents computable. We built reference objects that compile this information in searchable form, implemented the capacity to consume files from all version 3.*.* releases, and developed additional reference objects that summarize these data for each release, together with functions that use this information to support data analyses.
Results: HLAtools includes reference objects that organize HLA-region genes into useful categories, identify the names of all alleles in all releases, define gene-feature boundaries in all genes, and annotate the unusual gene features of pseudogenes and gene fragments.
Package functions allow installation of computable versions of any or all protein, codon, coding nucleotide or genomic alignments in IPD-IMGT/HLA Database releases 3.0.0 to 3.56.0; identification of sequence differences between any pair of alleles at a gene; construction of user-defined, multi-gene amino-acid, codon, coding nucleotide or genomic nucleotide alignments; translation of GL String Codes across release versions; conversion between GL String and UNIFORMAT datasets; multi-allele, multi-locus stratification of BIGDAWG-formatted datasets for stratified Odds Ratio or Relative Risk analyses; conversion of BIGDAWG-formatted datasets into PyPop version 3 datasets; and calculation of Relative Risk association values in BIGDAWG-formatted datasets.
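As an illustration of what "identification of sequence differences between any pair of alleles" involves, the following minimal sketch compares two aligned sequences position by position. It is written in Python for illustration only and does not use or reproduce HLAtools' own R functions, whose names and signatures are not given here. The function name `allele_differences`, the allele labels, and the toy sequences are all hypothetical, and positions are simple 1-based alignment columns rather than IPD-IMGT/HLA numbering.

```python
def allele_differences(seq_a, seq_b):
    """Return (position, residue_a, residue_b) for every aligned column that differs.

    Assumes both sequences come from the same alignment and therefore
    have equal length; positions are 1-based alignment columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Hypothetical, made-up aligned fragments (not real HLA sequences).
alignment = {
    "allele_1": "ACDEFGHK",
    "allele_2": "ACDEYGHR",
}

diffs = allele_differences(alignment["allele_1"], alignment["allele_2"])
for pos, a, b in diffs:
    print(f"position {pos}: {a} -> {b}")
```

A real alignment would also need a policy for gap and unsequenced-position characters, which this sketch treats as ordinary symbols.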
Conclusion: HLAtools integrates IPD-IMGT/HLA Database resources with extant R functionalities, making these data locally available for immunogenetic research applications.