Neoantigen × AI
Daily research signal
Glossary · concept

Pan-allele model

An HLA predictor that generalizes across alleles — including rare ones never measured — by learning from the HLA molecule's own sequence rather than training a separate model per allele.

There are thousands of HLA alleles, and only a minority have enough experimental binding data to train a dedicated, allele-specific model. A pan-allele (or pan-specific) model sidesteps this by feeding the predictor a representation of the HLA molecule itself alongside the peptide. NetMHCpan, for example, encodes each class I allele as a 34-residue "pseudosequence" drawn from the polymorphic positions lining the peptide-binding groove; MHCflurry's pan-allele models use a similar 37-residue representation. The model learns the general relationship between HLA sequence, peptide, and binding, so it can make predictions for an allele it has never seen, as long as related alleles were in training.
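To make the input concrete, here is a minimal sketch of how an (HLA pseudosequence, peptide) pair could be turned into a single feature vector. It is an illustration only, not the encoding used by NetMHCpan or MHCflurry: the one-hot scheme, the function names, the 12-residue toy pseudosequences, and the fixed peptide length are all assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence: str) -> list[int]:
    """Flatten a sequence into a one-hot vector, 20 values per residue."""
    vector: list[int] = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1  # raises ValueError on unknown residues
        vector.extend(row)
    return vector


def encode_pair(pseudosequence: str, peptide: str) -> list[int]:
    """Concatenate the HLA and peptide encodings into one model input."""
    return one_hot(pseudosequence) + one_hot(peptide)


# Hypothetical placeholders: these are NOT real HLA pseudosequences,
# and real ones are longer than 12 residues.
TOY_ALLELES = {
    "toy-allele-A": "YFAMYGEKVAHT",
    "toy-allele-B": "YFAMYQENVAHT",
}
PEPTIDE = "SIINFEKL"  # example 8-mer; real tools also handle other lengths

# The same peptide yields a different input vector for each allele,
# which is what lets one model serve many alleles.
features = {name: encode_pair(pseq, PEPTIDE) for name, pseq in TOY_ALLELES.items()}
```

Because the allele enters the model as data rather than as a choice of model, scoring a new allele only requires its sequence, not new training.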

This is the design philosophy behind NetMHCpan and the pan-allele models in tools like MHCflurry. The payoff is coverage: predictions for any HLA of known sequence, including alleles that are sparsely measured yet common in non-European populations, which allele-specific methods simply can't serve. The caveat is that this generalization degrades gracefully but is not free: accuracy is highest when the query allele resembles well-characterized training alleles and drops for sequence-distant ones, and independent work has flagged reduced performance on alleles from underrepresented ancestries.
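One rough way to reason about "sequence-distant" is to measure how far a query allele's pseudosequence sits from the closest allele in the training set. The sketch below uses plain Hamming distance as that measure. This is an assumed, simplified proxy chosen for illustration, and the allele names and sequences are hypothetical toy values rather than real HLA data.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("pseudosequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_training_allele(query: str, training: dict[str, str]) -> tuple[str, int]:
    """Return (name, distance) of the training allele closest to the query."""
    distances = ((name, hamming(query, pseq)) for name, pseq in training.items())
    return min(distances, key=lambda item: item[1])


# Hypothetical toy pseudosequences, not real alleles.
TRAINING = {
    "toy-train-1": "YFAMYGEKVAHT",
    "toy-train-2": "YYAEYRNIYDTT",
}
QUERIES = {
    "toy-query-near": "YFAMYQEKVAHT",  # one substitution away from toy-train-1
    "toy-query-far": "HWSDLQRNCKPG",   # differs from both at every position
}

nearest = {name: nearest_training_allele(pseq, TRAINING) for name, pseq in QUERIES.items()}
```

Under this proxy, a prediction for "toy-query-near" would deserve more trust than one for "toy-query-far"; a real assessment would also weigh how much binding data each neighboring allele contributed.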

So what for investors: pan-allele prediction is what makes population-scale and equitable neoantigen design feasible — a personalized vaccine has to work across patients' diverse HLA types, not just the handful of common alleles. It is also a fairness and addressable-market issue: methods that falter on non-European alleles narrow who a therapy can serve. When evaluating a platform, ask how it handles rare and non-European HLA alleles and whether it has validated performance there, rather than assuming uniform accuracy across the allele space.

Learn more
What makes a model "pan-allele"?

Instead of training a separate predictor for each HLA allele, a pan-allele model learns from a sequence representation of the HLA molecule itself (for NetMHCpan, a pseudosequence of the binding-groove residues). This lets it predict binding for any HLA of known sequence, including alleles with little or no experimental data.

Why does pan-allele prediction matter for HLA diversity and equity?

HLA is extremely polymorphic, and allele frequencies vary by ancestry. Allele-specific models only cover well-studied (often European-skewed) alleles, so pan-allele models are essential for predicting binding to rare and non-European alleles. That is what lets neoantigen design work across diverse patient populations rather than a narrow subset.

Are pan-allele predictions equally accurate for all alleles?

No. Accuracy is best when the query allele is similar to well-characterized alleles in the training data and degrades for sequence-distant or poorly represented alleles. Independent evaluations have reported reduced performance on non-European alleles absent from training, so per-allele validation matters.
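Per-allele validation can be sketched as computing a ranking metric separately for each allele instead of pooling all measurements. The example below computes ROC AUC per allele in plain Python. The records are made-up toy numbers used only to show the mechanics, not results from any real predictor, and the allele names are hypothetical.

```python
from collections import defaultdict


def roc_auc(labels: list[int], scores: list[float]):
    """AUC as the fraction of (binder, non-binder) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        return None  # undefined without both classes
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def per_allele_auc(records):
    """Group (allele, label, score) records by allele and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for allele, label, score in records:
        grouped[allele][0].append(label)
        grouped[allele][1].append(score)
    return {a: roc_auc(labels, scores) for a, (labels, scores) in grouped.items()}


# Fabricated toy data: (allele, is_binder, predicted_score).
RECORDS = [
    ("toy-common", 1, 0.9), ("toy-common", 1, 0.8),
    ("toy-common", 0, 0.2), ("toy-common", 0, 0.1),
    ("toy-rare", 1, 0.6), ("toy-rare", 1, 0.3),
    ("toy-rare", 0, 0.5), ("toy-rare", 0, 0.2),
]

by_allele = per_allele_auc(RECORDS)
pooled = roc_auc([r[1] for r in RECORDS], [r[2] for r in RECORDS])
```

In this toy data the pooled AUC is 0.9375 while the "toy-rare" allele alone scores 0.75, which illustrates how an aggregate figure can hide weaker performance on a poorly represented allele.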