Predicting whether a given T-cell receptor (TCR) will recognize a given peptide–HLA complex is one of the central open problems in computational immunology, and it sits directly upstream of neoantigen vaccine design and TCR-based cell therapy. If you can rank, in silico, which receptors in a repertoire are likely to engage a tumor neoantigen, you can prioritize candidates, design panels, and interpret single-cell data without running every pairing through an assay. Over the past five years a dense thicket of models has grown up to attempt exactly this, spanning convolutional networks, recurrent autoencoders, attention models, protein language models, and, most recently, structure prediction. The field is genuinely moving — but it is also widely misread, because the headline accuracy numbers in individual papers describe very different tasks under very different conditions.
This page is a plain-language map of the major TCR–pMHC prediction models, written for people deciding whether and how to use them. It is deliberately not a leaderboard. Published metrics are not comparable across papers: each model is trained and tested on different data, uses a different strategy for generating the negative (non-binding) examples that dominate any such dataset, and splits its data differently. A model reporting an AUROC of 0.98 and one reporting 0.62 may both be honest — they are simply answering different questions. What follows is a description of what each model does, what it ingests, where it is strong, and where the whole field still falls down.
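To make the non-comparability concrete, here is a minimal Python sketch of how the same model scores can yield very different AUROC values depending only on which negatives they are evaluated against. Everything in it is illustrative: the `auroc` helper is our own pairwise implementation, and the scores and both negative sets are made-up numbers, not results from any published model.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as half a win (pairwise definition)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores from one hypothetical model for four true binders.
binders = [0.9, 0.8, 0.7, 0.6]

# Two hypothetical negative sets built by different strategies.
negatives_a = [0.1, 0.2, 0.15, 0.3]
negatives_b = [0.85, 0.5, 0.75, 0.65]

print(auroc(binders, negatives_a))  # 1.0
print(auroc(binders, negatives_b))  # 0.625
```

The model and its scores for the binders are identical in both calls; only the negative set changes, and the metric moves from perfect to barely better than chance.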
It helps to separate two problems that are routinely conflated. The first is HLA-binding prediction: will a peptide be presented on a given HLA molecule? This problem is, by the standards of the field, largely solved. Tools such as NetMHCpan are trained on hundreds of thousands of measured binding and mass-spectrometry-eluted-ligand data points and generalize well, even to HLA alleles they were not explicitly trained on. When people say “the HLA part is the easy part,” this is what they mean.
The second problem — TCR specificity — is the hard one. Given that a peptide is presented, will a specific TCR recognize it? Three structural features of the problem make this brutal. First, scale and diversity: the theoretical TCR repertoire is astronomically large, and the hypervariable CDR3 loops that dominate recognition differ from person to person. Second, data scarcity and imbalance: high-confidence, experimentally validated TCR–epitope pairs number in the low tens of thousands across all public databases, heavily skewed toward a handful of well-studied viral epitopes, and real binders are a tiny fraction of all possible pairings. Third, degeneracy: one epitope can be recognized by structurally diverse TCRs, and one TCR can engage multiple epitopes, so there is no clean one-to-one mapping to learn.
The consequence is a sharp split between two regimes. In the “seen epitope” regime, where the task is predicting binders for an epitope with many known examples in training, several models do well. In the “unseen epitope” regime, where the model must generalize to a target it has never been trained on (exactly the neoantigen case), performance collapses. This is not a quirk of one model; it has been documented repeatedly, both in dedicated analyses of predictors failing to generalize to unseen peptides and in the IMMREP23 community benchmark, which reported prediction for unseen peptide–HLA targets as an unsolved problem, with even the strongest entry scoring well below seen-epitope levels. Any honest reading of the field treats unseen-epitope prediction as an open research question, not a deployable capability.
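The difference between the two regimes comes down to how the evaluation data are split. A minimal sketch of an unseen-epitope split, assuming your data are simple `(cdr3, epitope)` tuples: whole epitopes are held out, so no test epitope ever appears in training. The function name `split_by_epitope`, the `holdout_fraction` parameter, and the placeholder sequences are all hypothetical and are not taken from any of the models discussed here.

```python
import random
from collections import defaultdict


def split_by_epitope(pairs, holdout_fraction=0.2, seed=0):
    """Hold out whole epitopes so that no test epitope appears in training."""
    by_epitope = defaultdict(list)
    for cdr3, epitope in pairs:
        by_epitope[epitope].append((cdr3, epitope))

    # Shuffle epitopes (not pairs) so the split is made at the epitope level.
    epitopes = sorted(by_epitope)
    random.Random(seed).shuffle(epitopes)
    n_holdout = max(1, round(len(epitopes) * holdout_fraction))
    held_out = set(epitopes[:n_holdout])

    train = [p for e in epitopes if e not in held_out for p in by_epitope[e]]
    test = [p for e in epitopes if e in held_out for p in by_epitope[e]]
    return train, test


# Placeholder identifiers, not real sequences or real binding pairs.
pairs = [
    ("CDR3_1", "EPI_A"), ("CDR3_2", "EPI_A"), ("CDR3_3", "EPI_B"),
    ("CDR3_4", "EPI_C"), ("CDR3_5", "EPI_D"), ("CDR3_6", "EPI_E"),
]
train, test = split_by_epitope(pairs)
print(sorted({e for _, e in test}))  # the held-out epitope(s)
```

A random split over pairs would instead scatter each epitope across both sets, which measures the seen-epitope regime even when the held-out TCRs are new.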
The table below groups the most-cited models by their core approach. “Input” notes what the model consumes — critically, whether it uses only the TCR’s beta chain (CDR3β) or paired alpha and beta chains, since paired data consistently carries more specificity signal. “Strength” describes the regime each model was built for, not a claim that it beats the others; remember that the numbers in the source papers are not cross-comparable.
| Model | Approach | Input | Built-for strength | Availability |
|---|---|---|---|---|
| NetTCR-2.x | Convolutional neural network; pan- and peptide-specific training modes | Paired CDR3α/β + peptide | Peptide-specific prediction; paired-chain data; well-documented tooling | Open source (GitHub) |
| ERGO / ERGO-II | LSTM- or autoencoder-based TCR encoders | CDR3β (α, V/J, MHC, CD4/CD8 optional) + peptide | Flexible feature sets; classic baseline | Open source (GitHub) |
| TITAN | Bimodal context-attention network; epitopes as SMILES | CDR3β + epitope (atomic-level encoding) | Separating generalization to unseen TCRs vs. unseen epitopes | Open source (GitHub) |
| STAPLER | BERT-style transformer; pre-train then fine-tune | Full-length paired TCRαβ + peptide | Antigens with little related data; flags a negative-sampling data-leakage trap | Open source (preprint) |
| MixTCRpred | Transformer encoder + classifier | Paired CDR3α/β (per-epitope models) | Epitopes with ≥50 training TCRs; QC for single-cell TCR-seq | Open source (GitHub) |
| TABR-BERT | BERT transfer learning; separate TCR and pMHC embeddings | CDR3β + peptide + MHC | Reported robustness toward unseen epitopes | Open source (GitHub) |
| TULIP | Transformer; unsupervised, no negatives required | TCR + peptide (handles incomplete data) | Avoiding negative-sampling bias; unseen-peptide generalization | Open source (PNAS) |
| tcrLM | Lightweight masked protein language model | TCR CDR3 + epitope | Large self-supervised TCR pre-training; neoantigen scoring | Open source (GitHub) |
| pMTnet | Transfer-learning network | CDR3β + antigen + class I HLA | Neoantigen/antigen TCR pairing from sequence | Open source (GitHub) |
| epiTCR | Random forest on BLOSUM62 encodings | CDR3β + peptide (±MHC) | High sensitivity; simple, fast baseline | Open source (GitHub) |
| TCR-ESM | Protein-language-model (ESM) embeddings + classifier | Paired CDR3α/β + peptide + MHC | Leveraging pretrained protein embeddings; paired chains | Open source (GitHub) |
| AF3 / structure-based | MSA-based structure prediction + interface scoring | Sequences → predicted 3D complex | Modeling the actual TCR–pMHC interface; reranking by CDR3 confidence | Varies (AlphaFold3, TCRmodel2, NetTCR-struc) |
Two trends define the 2023–2026 cohort. The first is the move from bespoke architectures to protein language models. Approaches such as TABR-BERT, STAPLER, TULIP, tcrLM and TCR-ESM all lean on transformer-style representations — either pretrained general protein models like ESM, or domain models pretrained on very large unlabeled TCR repertoires (tcrLM was pretrained on a corpus of over 100 million distinct CDR3 sequences). The appeal is that self-supervised pretraining can extract structure from the abundant unlabeled sequence data, partly sidestepping the scarcity of labeled pairs. TULIP additionally addresses a subtler problem: because real datasets contain almost no verified non-binders, the standard practice of fabricating negatives can leak information and inflate scores. STAPLER’s authors made the same point sharply, documenting a data-leakage failure mode in common negative-generation strategies. Anyone reading benchmark tables should keep this in mind: how the negatives were made can matter as much as the model.
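For readers who have not seen how negatives are fabricated, the following is a minimal sketch of one common strategy, shuffling: each TCR from a known positive pair is re-paired with epitopes it is not recorded as binding. It is an illustration under stated assumptions, not the procedure of any particular paper; the function name `shuffled_negatives`, its parameters, and the placeholder identifiers are hypothetical.

```python
import random


def shuffled_negatives(positives, n_per_positive=1, seed=0):
    """Build assumed non-binders by re-pairing TCRs with other epitopes.

    ASSUMPTION: a (tcr, epitope) pair that is merely absent from the
    positive set is treated as a non-binder. Absence of evidence is not
    a verified negative.
    """
    rng = random.Random(seed)
    known = set(positives)
    epitopes = sorted({e for _, e in positives})

    negatives = []
    for tcr, _ in positives:
        # Only epitopes this TCR is not already recorded as binding.
        candidates = [e for e in epitopes if (tcr, e) not in known]
        rng.shuffle(candidates)
        negatives.extend((tcr, e) for e in candidates[:n_per_positive])
    return negatives


# Placeholder identifiers, not real sequences or real binding pairs.
positives = [("TCR_1", "EPI_A"), ("TCR_2", "EPI_A"), ("TCR_3", "EPI_B")]
print(shuffled_negatives(positives))
```

Every negative produced this way reuses a TCR and an epitope that also occur in positive pairs, so the details (for example, whether shuffling happens before or after the train/test split) determine what a model can pick up from the negatives alone.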
The second trend is structure. Rather than learning recognition from sequence statistics, structure-based methods predict the 3D TCR–pMHC complex and score the interface. A 2025 benchmark of TCR–pMHC structure prediction across MSA-based, language-model-based, and docking-based methods found that MSA-based approaches — AlphaFold3 in particular — produced the best docking quality, with median DockQ scores above 0.6 for both class I and class II complexes, while faster AlphaFold2 variants such as TCRmodel2 and ColabFold retained competitive accuracy at far lower cost. Notably, the confidence of the predicted CDR3 region (its pLDDT) carried functional signal and could be used to rerank candidates. Structure does not yet solve the unseen-epitope generalization problem either, but it offers a mechanistically grounded, complementary signal to the sequence models.
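The reranking idea can be sketched in a few lines of Python. This assumes you have already run a structure predictor and extracted, for each candidate, a per-residue pLDDT list plus the indices of the CDR3 residues; both the input layout and the use of a plain mean are our assumptions, and the benchmark's exact scoring may differ. The numbers are made up.

```python
from statistics import mean


def cdr3_confidence(plddt, cdr3_positions):
    """Mean pLDDT over the CDR3 residues only (assumed summary statistic)."""
    return mean(plddt[i] for i in cdr3_positions)


def rerank(candidates):
    """Sort candidates by CDR3 confidence, most confident first."""
    return sorted(
        candidates,
        key=lambda c: cdr3_confidence(c["plddt"], c["cdr3_positions"]),
        reverse=True,
    )


# Hypothetical candidates with made-up per-residue pLDDT values.
candidates = [
    {"name": "tcr_1", "plddt": [90, 85, 60, 55, 88], "cdr3_positions": [2, 3]},
    {"name": "tcr_2", "plddt": [80, 82, 78, 75, 81], "cdr3_positions": [2, 3]},
    {"name": "tcr_3", "plddt": [95, 92, 40, 45, 90], "cdr3_positions": [2, 3]},
]
print([c["name"] for c in rerank(candidates)])  # ['tcr_2', 'tcr_1', 'tcr_3']
```

Note that `tcr_3` has the highest confidence outside the CDR3 loop but ranks last, which is the point of restricting the score to the region that dominates recognition.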
Start by identifying your regime, because it determines whether any of these tools can help. If your target epitope already has many validated binders in public databases, you are in the favorable seen-epitope setting; epitope-specific models such as MixTCRpred (designed for epitopes with at least ~50 training TCRs) or NetTCR’s peptide-specific mode are reasonable starting points, and you should expect usable ranking. If your target is a private neoantigen with no known binders, you are in the unseen-epitope setting where the entire field is weak — use predictions only to triage, never to conclude, and lean on models explicitly built for generalization (TULIP, TABR-BERT) while keeping expectations modest.
Second, match the model to your data. If you have paired TCRαβ sequences, prefer models that exploit both chains (NetTCR-2.x, MixTCRpred, STAPLER, TCR-ESM), since the alpha chain carries specificity signal that beta-only models discard. If you only have CDR3β, several models (ERGO-II, pMTnet, epiTCR) are designed for that constraint. Third, run more than one model and look for agreement: because each was trained on different data with different biases, concordant predictions across architectures are a stronger signal than any single score. Fourth, calibrate against your own positive controls where possible — a model’s absolute score is rarely meaningful out of the box, but its ranking of your candidates relative to known binders is informative. Finally, where the stakes justify it, layer in structure: use a fast AlphaFold2 variant for high-throughput screening and reserve AlphaFold3 for confirmatory interface modeling of top candidates, using CDR3 confidence as a reranking signal. In every case, the output is a prioritized hypothesis list for the wet lab — not a substitute for it.
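Two of the steps above, looking for agreement across models and calibrating against positive controls, can be sketched as follows. This is a minimal illustration, assuming each model's output has been collected into a dict of candidate scores; the helper names, the mean-rank aggregation, and all numbers are hypothetical choices of ours rather than a method from any of the cited tools.

```python
def ranks(scores):
    """Map each candidate to its rank under one model (1 = highest score).
    Ties are broken by input order; a real pipeline may want average ranks."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def mean_rank(model_scores):
    """Average each candidate's rank across models (lower = more agreement
    that it is a top candidate). Ranks sidestep incompatible score scales."""
    per_model = [ranks(s) for s in model_scores.values()]
    names = list(per_model[0])
    return {n: sum(r[n] for r in per_model) / len(per_model) for n in names}


def fraction_of_controls_outscored(score, control_scores):
    """Where a candidate's score falls relative to known binders' scores."""
    return sum(score >= c for c in control_scores) / len(control_scores)


# Made-up scores from two hypothetical models on deliberately different scales.
model_scores = {
    "model_a": {"tcr_1": 0.9, "tcr_2": 0.4, "tcr_3": 0.7},
    "model_b": {"tcr_1": 12.0, "tcr_2": 3.0, "tcr_3": 15.0},
}
print(mean_rank(model_scores))  # {'tcr_1': 1.5, 'tcr_2': 3.0, 'tcr_3': 1.5}

# Made-up scores that model_a assigned to four known binders.
controls = [0.5, 0.6, 0.8, 0.9]
print(fraction_of_controls_outscored(0.7, controls))  # 0.5
```

Here the two models disagree on the top candidate but agree that `tcr_2` is last, and a raw score of 0.7 becomes interpretable only once it is placed among the controls.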
- NetTCR-2.2 (GitHub) — CNN, paired-chain, pan/peptide/pretrained modes
- NetTCR-2.0 paper (Communications Biology)
- ERGO-II (GitHub) — LSTM + autoencoder; flexible features
- TITAN (GitHub) — Bimodal attention; SMILES epitope encoding
- STAPLER preprint (bioRxiv) — Transformer; documents negative-sampling leakage
- MixTCRpred (GitHub) — Per-epitope transformer; Nature Comms 2024
- TABR-BERT (GitHub) — BERT transfer learning; Briefings in Bioinformatics
- TULIP paper (PNAS) — Unsupervised; no negatives required
- tcrLM (GitHub) — Lightweight masked LM; 100M+ CDR3 pretraining
- pMTnet (GitHub) — Transfer learning; Nature Machine Intelligence 2021
- epiTCR paper (Bioinformatics) — Random forest; high-sensitivity baseline
- TCR-ESM paper (CSBJ) — ESM embeddings; paired chains + MHC
- IMMREP23 benchmark (ImmunoInformatics) — Community benchmark; unseen pMHC remains unsolved
- On TCR predictors failing to generalize (Front. Immunol.)
- NetMHCpan-4.1 (DTU Health Tech) — Reference HLA-binding predictor
This is a living comparison. The TCR–pMHC field publishes new models and benchmarks at a steady clip, and we revise this map as the evidence changes — adding entrants, correcting scope, and updating the solved-versus-open framing. Last reviewed 2026-05-30. If a model’s scope or availability has changed, or a new benchmark shifts the picture, the table above is where we record it.