Glossary · concept

Protein language model (PLM)

A transformer pretrained on millions of protein sequences (e.g. ESM) whose learned embeddings, once fine-tuned for immunology tasks, are increasingly the backbone of neoantigen prediction.

It's the same idea as a large language model, applied to biology. A transformer is pretrained on millions of unlabeled protein sequences (ESM, ProtT5, etc.) until it captures structural and functional regularities, then adapted ("fine-tuned") on a smaller labeled set to predict binding, presentation, or immunogenicity.
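
The pretrain-then-adapt pattern can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `frozen_embed` is a hypothetical stand-in for a pretrained PLM (it returns amino-acid frequencies rather than real ESM or ProtT5 embeddings), the peptides and labels are synthetic, and training a small head on frozen features is the simplest form of adaptation, not full fine-tuning of the transformer.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frozen_embed(seq):
    """Hypothetical stand-in for a pretrained PLM encoder.

    Assumption: a real pipeline would run a pretrained model (e.g. ESM) and
    pool its per-residue embeddings. Here we return amino-acid frequencies
    so the sketch runs without downloading any weights.
    """
    total = max(len(seq), 1)
    return [seq.count(a) / total for a in AMINO_ACIDS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, epochs=500, lr=1.0):
    """Fit a logistic-regression head on frozen features by gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, seq):
    """Score a sequence with the trained head on top of the frozen embedder."""
    x = frozen_embed(seq)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic 9-mers with a made-up labeling rule (NOT real immunology data):
# "positives" are drawn from hydrophobic residues, "negatives" from polar ones.
rng = random.Random(0)
positives = ["".join(rng.choice("AILMFVW") for _ in range(9)) for _ in range(20)]
negatives = ["".join(rng.choice("DEKRNQST") for _ in range(9)) for _ in range(20)]
peptides = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

# The embedder stays fixed; only the small head is trained on the labeled set.
features = [frozen_embed(p) for p in peptides]
w, b = train_head(features, labels)

# On this toy data the first score should come out higher than the second.
print(predict(w, b, "AILMFVWAI"))
print(predict(w, b, "DEKRNQSTD"))
```

Freezing the embedder keeps the number of trainable parameters tiny, which is one reason a small labeled set can be enough. Updating the pretrained weights themselves would be the heavier alternative.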

The commercial bet is transfer learning: high-quality immunogenicity data is scarce and expensive to generate, so a model that already "understands" proteins can learn from far fewer labeled examples than one trained from scratch. This is the most active AI frontier in the field: most novel-method papers in the digest are some variant of a pretrained sequence model fine-tuned on an immune dataset. It is also where a defensible data + model advantage is most likely to emerge.
