Predicting whether a given T-cell receptor will recognize a specific peptide–HLA complex — the step that decides whether a presented neoantigen actually triggers a T-cell response.
HLA-binding prediction answers "will this peptide be displayed?" TCR–pMHC prediction answers the harder downstream question: "will a T cell with a given receptor actually recognize that peptide–HLA complex and respond?" Recognition is the event that turns a presented peptide into an immune attack, so it sits one rung above presentation in the neoantigen-selection funnel. Models are typically trained on paired datasets of TCR sequences (often just the CDR3 loops) and their cognate epitopes, drawn from resources like VDJdb, IEDB, and McPAS-TCR.
The defining difficulty is generalization to unseen epitopes. Many published models score well when an epitope appears in both training and test data, but performance collapses for peptides the model has never seen — the realistic setting for a novel tumor neoantigen. The literature is unusually candid about this: independent benchmarks have shown that claimed out-of-distribution performance often evaporates under stricter splits, partly because training data is scarce and heavily skewed toward a handful of well-studied viral epitopes. Approaches range from epitope-specific models (one classifier per epitope) to pan-epitope models that attempt true zero-shot prediction, increasingly built on protein-language-model embeddings and attention.
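The difference between a lenient split and a strict one can be made concrete with a small sketch. It is illustrative only and is not how any particular benchmark builds its splits. The function names (`random_split`, `epitope_held_out_split`, `shared_epitopes`) and the toy records are hypothetical, and the CDR3 and epitope strings are placeholders, not real sequences. A random split shuffles individual TCR–epitope pairs, so the same epitope can land on both sides. An epitope-held-out split assigns whole epitopes to one side, which is closer to the unseen-neoantigen setting.

```python
import random
from collections import defaultdict

# Toy paired records (cdr3, epitope). Placeholder strings, not real sequences.
# The counts are deliberately skewed toward one epitope, mimicking the skew
# described above.
pairs = (
    [(f"CDR3_A{i}", "EPI_A") for i in range(6)]
    + [(f"CDR3_B{i}", "EPI_B") for i in range(3)]
    + [(f"CDR3_C{i}", "EPI_C") for i in range(2)]
    + [("CDR3_D0", "EPI_D"), ("CDR3_E0", "EPI_E")]
)

def random_split(pairs, test_frac=0.2, seed=0):
    """Shuffle individual pairs; an epitope may appear in both train and test."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

def epitope_held_out_split(pairs, test_frac=0.2, seed=0):
    """Hold out whole epitopes so that no test epitope is seen in training."""
    by_epitope = defaultdict(list)
    for cdr3, epitope in pairs:
        by_epitope[epitope].append((cdr3, epitope))
    epitopes = sorted(by_epitope)  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(epitopes)
    n_test = max(1, int(len(epitopes) * test_frac))
    train = [p for e in epitopes[n_test:] for p in by_epitope[e]]
    test = [p for e in epitopes[:n_test] for p in by_epitope[e]]
    return train, test

def shared_epitopes(train, test):
    """Epitopes present on both sides of a split (a simple leakage check)."""
    return {e for _, e in train} & {e for _, e in test}

if __name__ == "__main__":
    r_train, r_test = random_split(pairs)
    h_train, h_test = epitope_held_out_split(pairs)
    print("random split, shared epitopes:  ", sorted(shared_epitopes(r_train, r_test)))
    print("held-out split, shared epitopes:", sorted(shared_epitopes(h_train, h_test)))
```

By construction, the held-out split always reports no shared epitopes, while the random split may or may not leak depending on the shuffle. A score reported on a split with shared epitopes says little about zero-shot performance.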
So what for investors: presentation prediction (NetMHCpan and similar tools) is comparatively mature; TCR recognition is the open frontier and a current bottleneck on neoantigen-vaccine and TCR-therapy pipelines. A model that genuinely generalizes to unseen epitopes would let developers rank candidates by likely immunogenicity rather than mere presentation, cutting wet-lab validation cost and de-risking target selection. Treat any vendor claim of solved TCR–pMHC prediction with skepticism, and ask specifically for results on unseen-epitope (zero-shot) benchmarks in which no test epitope appears in the training data.
HLA-binding prediction estimates whether a peptide will be presented on an HLA molecule at all. TCR–pMHC prediction goes a step further and asks whether a particular T-cell receptor will recognize that peptide–HLA complex and mount a response. Presentation is necessary but not sufficient for immunogenicity, which is why the two tasks are distinct — and why TCR recognition is the harder, less-solved problem.
Training data is scarce and dominated by a few well-studied (mostly viral) epitopes, and the rules governing which TCRs recognize which peptides are complex and degenerate: many different TCRs can recognize the same peptide, and a single TCR can cross-react with several peptides. Models often memorize epitope-specific patterns rather than learning transferable recognition logic, so accuracy drops sharply on epitopes absent from training, which is exactly the case for a novel tumor neoantigen.
A vaccine only works if the chosen neoantigens actually provoke T cells. Reliable TCR–pMHC prediction would let developers prioritize candidates by likely immunogenicity instead of presentation alone, reducing expensive wet-lab screening and improving the odds that a personalized vaccine hits immunogenic targets.