What is a TCR–pMHC prediction model?

It is a computational model that takes a T-cell receptor sequence and a peptide–HLA target and estimates whether that receptor will recognize that target. Most published models work from sequence alone (the TCR’s CDR3β region, the peptide, and sometimes the HLA allele and the alpha chain’s CDR3α); a newer class uses predicted 3D structure. The output is usually a binding score or a yes/no recognition call, not a measured affinity.

What is the difference between TCR–pMHC prediction and HLA-binding prediction?

HLA-binding prediction asks whether a peptide is displayed on a given HLA molecule — a comparatively well-characterized problem with large, clean training data and mature tools such as NetMHCpan. TCR–pMHC prediction asks the next, much harder question: given that a peptide is displayed, will a particular T-cell receptor engage it? TCR diversity is enormous, paired binding data is scarce, and the same epitope can be recognized by very different receptors, so accuracy lags far behind HLA-binding prediction.

Can AI reliably predict TCR specificity for a new epitope?

Not yet, in the general case. Models can be strong when a target has many known binders in the training set (the “seen epitope” regime), but performance drops sharply on epitopes the model was never trained on. The IMMREP23 community benchmark found that prediction for unseen peptide–HLA targets remained an unsolved problem, with the best method capped well below the accuracy seen on familiar targets. Treat predictions on novel epitopes as hypotheses to test, not answers.

Which TCR–pMHC model is the best?

There is no single “best” model, and published accuracy numbers are not comparable across papers because each uses different datasets, negative-sampling strategies, and train/test splits. The right question is which model fits your regime: an epitope with abundant training data, a paired-chain dataset, a fully novel neoantigen, or a structure-first workflow. This page is a map, not a leaderboard.

What inputs do TCR–pMHC prediction models need?

Most models need at least a TCR CDR3 sequence and a peptide; some also use paired alpha/beta chains, the HLA allele, peptide–MHC context, or predicted 3D structure. The best input set depends on whether you have bulk repertoire data, single-cell paired TCRs, known epitopes, or a private neoantigen with no known binders.

TCR–pMHC models compared: tools, inputs, and limits (2026)

Compare TCR–pMHC prediction models — NetTCR, ERGO-II, TITAN, STAPLER, TULIP, tcrLM, pMTnet, TCR-ESM, and structure-based workflows — by inputs, use case, availability, and where each still fails on unseen neoantigens.

Predicting whether a given T-cell receptor (TCR) will recognize a given peptide–HLA complex is one of the central open problems in computational immunology, and it sits directly upstream of neoantigen vaccine design and TCR-based cell therapy. If you can rank, in silico, which receptors in a repertoire are likely to engage a tumor neoantigen, you can prioritize candidates, design panels, and interpret single-cell data without running every pairing through an assay. Over the past five years a dense thicket of models has grown up to attempt exactly this, spanning convolutional networks, recurrent autoencoders, attention models, protein language models, and, most recently, structure prediction. The field is genuinely moving — but it is also widely misread, because the headline accuracy numbers in individual papers describe very different tasks under very different conditions.

This page is a plain-language map of the major TCR–pMHC prediction models, written for people deciding whether and how to use them. It is deliberately not a leaderboard. Published metrics are not comparable across papers: each model is trained and tested on different data, uses a different strategy for generating the negative (non-binding) examples that dominate any such dataset, and splits its data differently. A model reporting an AUROC of 0.98 and one reporting 0.62 may both be honest — they are simply answering different questions. What follows is a description of what each model does, what it ingests, where it is strong, and where the whole field still falls down.
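To make the comparability point concrete, here is a minimal sketch in which one hypothetical scorer is evaluated against two different negative sets. The scores are invented for illustration and do not come from any published model; `auroc` is a plain pairwise implementation written for this example, not a library call.

```python
from itertools import product


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise definition of AUROC.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented scores from ONE hypothetical model for the same three true binders.
positives = [0.9, 0.8, 0.7]
# Benchmark A (assumed): negatives drawn from unrelated background TCRs.
easy_negatives = [0.1, 0.2, 0.3]
# Benchmark B (assumed): negatives are TCRs known to bind other epitopes.
hard_negatives = [0.85, 0.75, 0.6]

auroc_easy = auroc(positives, easy_negatives)
auroc_hard = auroc(positives, hard_negatives)
print(f"easy negatives: {auroc_easy:.2f}, hard negatives: {auroc_hard:.2f}")
```

The model and its scores for the true binders are identical in both runs; only the choice of negatives moves the metric.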

Quick chooser: start with your data, not the leaderboard

If you have	Start with	Why
Paired TCRαβ + peptide data	NetTCR, MixTCRpred, STAPLER, TCR-ESM	Paired chains carry specificity signal beta-only models discard
Only CDR3β + peptide	ERGO-II, pMTnet, epiTCR	These models tolerate beta-only inputs and are useful baselines
A novel private neoantigen	TULIP, TABR-BERT, structure-based reranking	Unseen-epitope generalization is the weak regime; use multiple weak signals
A top-candidate shortlist	AlphaFold3 / TCRmodel2 structure workflows	Interface geometry can add mechanistic evidence after sequence screening

Use this as a triage guide. The model list below explains the trade-offs and caveats in detail.
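For readers who want the table above in script form, the following is a small sketch of the same triage logic. The dictionary keys and the function name `suggest_starting_points` are hypothetical labels chosen for this example; the tool names are copied from the table, and the mapping is a starting point rather than a recommendation engine.

```python
# Rows copied from the quick-chooser table; the keys are hypothetical labels.
STARTING_POINTS = {
    "paired_chains": ["NetTCR", "MixTCRpred", "STAPLER", "TCR-ESM"],
    "beta_only": ["ERGO-II", "pMTnet", "epiTCR"],
    "novel_neoantigen": ["TULIP", "TABR-BERT", "structure-based reranking"],
    "shortlist": ["AlphaFold3 / TCRmodel2 structure workflows"],
}


def suggest_starting_points(has_paired_chains, epitope_is_novel, is_shortlist):
    """Return every table row that matches the described dataset."""
    rows = ["paired_chains" if has_paired_chains else "beta_only"]
    if epitope_is_novel:
        rows.append("novel_neoantigen")
    if is_shortlist:
        rows.append("shortlist")
    return {row: STARTING_POINTS[row] for row in rows}


# Example: bulk CDR3-beta data against a private neoantigen, no shortlist yet.
plan = suggest_starting_points(
    has_paired_chains=False, epitope_is_novel=True, is_shortlist=False
)
for row, tools in plan.items():
    print(f"{row}: {', '.join(tools)}")
```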

What makes this hard — and what is actually solved

It helps to separate two problems that are routinely conflated. The first is HLA-binding prediction: will a peptide be presented on a given HLA molecule? This problem is, by the standards of the field, largely solved. Tools such as NetMHCpan are trained on hundreds of thousands of binding-affinity measurements and mass-spectrometry eluted-ligand data points and generalize well, even to HLA alleles they were not explicitly trained on. When people say “the HLA part is the easy part,” this is what they mean.

The second problem — TCR specificity — is the hard one. Given that a peptide is presented, will a specific TCR recognize it? Three structural features of the problem make this brutal. First, scale and diversity: the theoretical TCR repertoire is astronomically large, and the hypervariable CDR3 loops that dominate recognition differ from person to person. Second, data scarcity and imbalance: high-confidence, experimentally validated TCR–epitope pairs number in the low tens of thousands across all public databases, heavily skewed toward a handful of well-studied viral epitopes, and real binders are a tiny fraction of all possible pairings. Third, degeneracy: one epitope can be recognized by structurally diverse TCRs, and one TCR can engage multiple epitopes, so there is no clean one-to-one mapping to learn.

The consequence is a sharp split between two regimes. In the “seen epitope” regime — predicting binders for an epitope with many known examples in training — several models do well. In the “unseen epitope” regime — generalizing to a target the model has never been trained on, which is exactly the neoantigen case — performance collapses. This is not a quirk of one model; it has been documented repeatedly, including in dedicated analyses of predictors failing to generalize to unseen peptides and in the IMMREP23 community benchmark, where prediction for unseen peptide–HLA targets was reported as an unsolved problem with the strongest entry capped well below seen-epitope levels. Any honest reading of the field treats unseen-epitope prediction as an open research question, not a deployable capability.
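The difference between the two regimes comes down to how the evaluation split is made. The sketch below holds out whole epitopes, so that every test pair involves a target absent from training. The function name `split_by_epitope`, its parameters, and the placeholder identifiers are assumptions made for this example, not a procedure taken from any of the cited benchmarks.

```python
import random


def split_by_epitope(pairs, test_fraction=0.2, seed=0):
    """Split (tcr, epitope) pairs so that test epitopes never appear in train.

    A random split over pairs would instead leave most epitopes on both
    sides, which measures the seen-epitope regime.
    """
    epitopes = sorted({epitope for _, epitope in pairs})
    rng = random.Random(seed)
    rng.shuffle(epitopes)
    n_test = max(1, round(len(epitopes) * test_fraction))
    held_out = set(epitopes[:n_test])
    train = [pair for pair in pairs if pair[1] not in held_out]
    test = [pair for pair in pairs if pair[1] in held_out]
    return train, test


# Placeholder identifiers, not real sequences: two TCRs for each of 5 epitopes.
pairs = [
    (f"TCR_{i:02d}", f"EPITOPE_{'ABCDE'[i % 5]}") for i in range(10)
]

train, test = split_by_epitope(pairs)
train_epitopes = {epitope for _, epitope in train}
test_epitopes = {epitope for _, epitope in test}
print(f"train epitopes: {sorted(train_epitopes)}")
print(f"held-out epitopes: {sorted(test_epitopes)}")
```

Scores reported on a split like this one describe the neoantigen-like case; scores reported on a random pair-level split do not.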

The models, side by side

The table below groups the most-cited models by their core approach. “Input” notes what the model consumes — critically, whether it uses only the TCR’s beta chain (CDR3β) or paired alpha and beta chains, since paired data consistently carries more specificity signal. “Strength” describes the regime each model was built for, not a claim that it beats the others; remember that the numbers in the source papers are not cross-comparable.

Model	Approach	Input	Built-for strength	Availability
NetTCR-2.x	Convolutional neural network; pan- and peptide-specific training modes	Paired CDR3α/β + peptide	Peptide-specific prediction; paired-chain data; well-documented tooling	Open source (GitHub)
ERGO / ERGO-II	LSTM + autoencoder encoders	CDR3β (α, V/J, MHC, CD4/CD8 optional) + peptide	Flexible feature sets; classic baseline	Open source (GitHub)
TITAN	Bimodal context-attention network; epitopes as SMILES	CDR3β + epitope (atomic-level encoding)	Separating generalization to unseen TCRs vs. unseen epitopes	Open source (GitHub)
STAPLER	BERT-style transformer; pre-train then fine-tune	Full-length paired TCRαβ + peptide	Antigens with little related data; flags a negative-sampling data-leakage trap	Open source (preprint)
MixTCRpred	Transformer encoder + classifier	Paired CDR3α/β (per-epitope models)	Epitopes with ≥50 training TCRs; QC for single-cell TCR-seq	Open source (GitHub)
TABR-BERT	BERT transfer learning; separate TCR and pMHC embeddings	CDR3β + peptide + MHC	Reported robustness toward unseen epitopes	Open source (GitHub)
TULIP	Transformer; unsupervised, no negatives required	TCR + peptide (handles incomplete data)	Avoiding negative-sampling bias; unseen-peptide generalization	Open source (PNAS)
tcrLM	Lightweight masked protein language model	TCR CDR3 + epitope	Large self-supervised TCR pre-training; neoantigen scoring	Open source (GitHub)
pMTnet	Transfer-learning network	CDR3β + antigen + class I HLA	Neoantigen/antigen TCR pairing from sequence	Open source (GitHub)
epiTCR	Random forest on BLOSUM62 encodings	CDR3β + peptide (±MHC)	High sensitivity; simple, fast baseline	Open source (GitHub)
TCR-ESM	Protein-language-model (ESM) embeddings + classifier	Paired CDR3α/β + peptide + MHC	Leveraging pretrained protein embeddings; paired chains	Open source (GitHub)
AF3 / structure-based	MSA-based structure prediction + interface scoring	Sequences → predicted 3D complex	Modeling the actual TCR–pMHC interface; reranking by CDR3 confidence	Varies (AlphaFold3, TCRmodel2, NetTCR-struc)

A map, not a leaderboard. Accuracy figures from each model’s paper are not comparable across rows because datasets, negative-sampling strategies, and train/test splits differ.

The shift toward language models and structure

Two trends define the 2023–2026 cohort. The first is the move from bespoke architectures to protein language models. Approaches such as TABR-BERT, STAPLER, TULIP, tcrLM and TCR-ESM all lean on transformer-style representations — either pretrained general protein models like ESM, or domain models pretrained on very large unlabeled TCR repertoires (tcrLM was pretrained on a corpus of over 100 million distinct CDR3 sequences). The appeal is that self-supervised pretraining can extract structure from the abundant unlabeled sequence data, partly sidestepping the scarcity of labeled pairs. TULIP additionally addresses a subtler problem: because real datasets contain almost no verified non-binders, the standard practice of fabricating negatives can leak information and inflate scores. STAPLER’s authors made the same point sharply, documenting a data-leakage failure mode in common negative-generation strategies. Anyone reading benchmark tables should keep this in mind: how the negatives were made can matter as much as the model.
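To show what fabricating negatives means in practice, here is a minimal sketch of one common scheme: re-pairing each TCR with an epitope it is not known to bind. It is an illustration of the general practice only, not the procedure used by STAPLER, TULIP, or any other model named here, and the function name and placeholder identifiers are assumptions. The single guard shown (skipping re-pairings that are known positives) only avoids mislabeling a true binder; it does not address the leakage concern described above.

```python
import random


def shuffled_negatives(positives, seed=0, max_tries=100):
    """Create one assumed non-binder per positive by swapping in another epitope."""
    known = set(positives)
    epitopes = sorted({epitope for _, epitope in positives})
    rng = random.Random(seed)
    negatives = []
    for tcr, _ in positives:
        for _attempt in range(max_tries):
            candidate = (tcr, rng.choice(epitopes))
            # Skip true binders and duplicates; everything else is only
            # ASSUMED to be a non-binder, which is itself a source of noise.
            if candidate not in known and candidate not in negatives:
                negatives.append(candidate)
                break
    return negatives


# Placeholder identifiers, not real sequences.
positives = [
    ("TCR_1", "EPITOPE_A"),
    ("TCR_2", "EPITOPE_A"),
    ("TCR_3", "EPITOPE_B"),
    ("TCR_4", "EPITOPE_C"),
]

negatives = shuffled_negatives(positives)
for tcr, epitope in negatives:
    print(f"{tcr} x {epitope}: labeled negative (unverified)")
```

Because the negatives reuse the same TCRs and epitopes as the positives, how they are generated relative to the train/test split shapes what a model can learn from them.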

The second trend is structure. Rather than learning recognition from sequence statistics, structure-based methods predict the 3D TCR–pMHC complex and score the interface. A 2025 benchmark of TCR–pMHC structure prediction across MSA-based, language-model-based, and docking-based methods found that MSA-based approaches — AlphaFold3 in particular — produced the best docking quality, with median DockQ scores above 0.6 for both class I and class II complexes, while faster AlphaFold2 variants such as TCRmodel2 and ColabFold retained competitive accuracy at far lower cost. Notably, the confidence of the predicted CDR3 region (its pLDDT) carried functional signal and could be used to rerank candidates. Structure does not yet solve the unseen-epitope generalization problem either, but it offers a mechanistically grounded, complementary signal to the sequence models.
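The reranking idea can be sketched in a few lines, assuming you already have per-residue pLDDT values for each predicted complex and know which residue indices make up the CDR3 loops. The data layout, the function names, and the choice to rank higher mean CDR3 pLDDT first are all assumptions made for this example; the numbers are invented, and parsing real structure-prediction output is left out.

```python
from statistics import mean


def cdr3_confidence(plddt, cdr3_spans):
    """Mean pLDDT over residues in half-open (start, end) index spans."""
    values = [v for start, end in cdr3_spans for v in plddt[start:end]]
    return mean(values)


def rerank_by_cdr3_confidence(candidates):
    """Sort candidates so the most confident predicted CDR3 loops come first."""
    return sorted(
        candidates,
        key=lambda c: cdr3_confidence(c["plddt"], c["cdr3_spans"]),
        reverse=True,
    )


# Invented toy values: 5 residues per model, CDR3 at indices 2 and 3.
candidates = [
    {"name": "candidate_A", "plddt": [90, 90, 50, 50, 90], "cdr3_spans": [(2, 4)]},
    {"name": "candidate_B", "plddt": [70, 70, 80, 80, 70], "cdr3_spans": [(2, 4)]},
]

ranked = rerank_by_cdr3_confidence(candidates)
for c in ranked:
    score = cdr3_confidence(c["plddt"], c["cdr3_spans"])
    print(f"{c['name']}: mean CDR3 pLDDT = {score}")
```

Note that candidate_A has the higher whole-chain confidence but the lower CDR3 confidence, so a global average would rank the two the other way around.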

How to choose, and how to use them

Start by identifying your regime, because it determines whether any of these tools can help. If your target epitope already has many validated binders in public databases, you are in the favorable seen-epitope setting; epitope-specific models such as MixTCRpred (designed for epitopes with at least ~50 training TCRs) or NetTCR’s peptide-specific mode are reasonable starting points, and you should expect usable ranking. If your target is a private neoantigen with no known binders, you are in the unseen-epitope setting where the entire field is weak — use predictions only to triage, never to conclude, and lean on models explicitly built for generalization (TULIP, TABR-BERT) while keeping expectations modest.

Second, match the model to your data. If you have paired TCRαβ sequences, prefer models that exploit both chains (NetTCR-2.x, MixTCRpred, STAPLER, TCR-ESM), since the alpha chain carries specificity signal that beta-only models discard. If you only have CDR3β, several models (ERGO-II, pMTnet, epiTCR) are designed for that constraint. Third, run more than one model and look for agreement: because each was trained on different data with different biases, concordant predictions across architectures are a stronger signal than any single score. Fourth, calibrate against your own positive controls where possible — a model’s absolute score is rarely meaningful out of the box, but its ranking of your candidates relative to known binders is informative. Finally, where the stakes justify it, layer in structure: use a fast AlphaFold2 variant for high-throughput screening and reserve AlphaFold3 for confirmatory interface modeling of top candidates, using CDR3 confidence as a reranking signal. In every case, the output is a prioritized hypothesis list for the wet lab — not a substitute for it.
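Two of the steps above, looking for agreement across models and calibrating against positive controls, can be sketched without any model-specific code. The example assumes each model has already produced a score per candidate; the model labels, candidate names, and scores are invented, and mean rank is just one simple way to combine outputs that sit on different scales.

```python
def ranks(scores):
    """Map candidate -> rank within one model (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def mean_rank(model_scores):
    """Average each candidate's rank across models; lower means more agreement."""
    per_model = [ranks(scores) for scores in model_scores.values()]
    candidates = per_model[0].keys()
    return {c: sum(r[c] for r in per_model) / len(per_model) for c in candidates}


def fraction_of_controls_beaten(score, control_scores):
    """Share of known binders (positive controls) that a candidate outscores."""
    return sum(score > c for c in control_scores) / len(control_scores)


# Invented outputs from two hypothetical models with incompatible score scales.
model_scores = {
    "model_a": {"tcr1": 0.9, "tcr2": 0.4, "tcr3": 0.1},
    "model_b": {"tcr1": 12.0, "tcr2": 3.0, "tcr3": 7.0},
}
consensus = mean_rank(model_scores)
best = min(consensus, key=consensus.get)
print(f"consensus mean ranks: {consensus}, top candidate: {best}")

# Calibrate model_a's raw score for tcr1 against its scores for known binders.
control_scores = [0.5, 0.7, 0.95]
relative = fraction_of_controls_beaten(model_scores["model_a"]["tcr1"], control_scores)
print(f"tcr1 outscores {relative:.0%} of positive controls under model_a")
```

Working in ranks rather than raw scores sidesteps the fact that the two models' outputs are not on a shared scale, which is the same reason the absolute score of a single model is rarely meaningful on its own.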

Repositories and primary sources

NetTCR-2.2 (GitHub) — CNN, paired-chain, pan/peptide/pretrained modes
NetTCR-2.0 paper (Communications Biology)
ERGO-II (GitHub) — LSTM + autoencoder; flexible features
TITAN (GitHub) — Bimodal attention; SMILES epitope encoding
STAPLER preprint (bioRxiv) — Transformer; documents negative-sampling leakage
MixTCRpred (GitHub) — Per-epitope transformer; Nature Comms 2024
TABR-BERT (GitHub) — BERT transfer learning; Briefings in Bioinformatics
TULIP paper (PNAS) — Unsupervised; no negatives required
tcrLM (GitHub) — Lightweight masked LM; 100M+ CDR3 pretraining
pMTnet (GitHub) — Transfer learning; Nature Machine Intelligence 2021
epiTCR paper (Bioinformatics) — Random forest; high-sensitivity baseline
TCR-ESM paper (CSBJ) — ESM embeddings; paired chains + MHC
IMMREP23 benchmark (ImmunoInformatics) — Community benchmark; unseen pMHC remains unsolved
On TCR predictors failing to generalize (Front. Immunol.)
NetMHCpan-4.1 (DTU Health Tech) — Reference HLA-binding predictor

Maintained reference

This is a living comparison. The TCR–pMHC field publishes new models and benchmarks at a steady clip, and we revise this map as the evidence changes — adding entrants, correcting scope, and updating the solved-versus-open framing. Last reviewed 2026-05-30. If a model’s scope or availability has changed, or a new benchmark shifts the picture, the table above is where we record it.