Neoantigen × AI
Daily research signal
Methods · 2026-05-30

NetMHCpan vs MHCflurry: which peptide–HLA predictor should you use?

A neutral, current head-to-head of the two dominant MHC class I prediction tools — how they're built, where they agree, where they diverge, and how to pick one for your pipeline.

If you are building a neoantigen pipeline, picking an MHC class I predictor is one of the first forks in the road — and the public guidance is thin. Most search results still point to a 2017 preprint or a years-old GitHub issue. This is a current, neutral comparison of the two tools that dominate the field: NetMHCpan-4.1 from DTU and MHCflurry 2.0 from OpenVax.

The short version: neither is uniformly better. They are trained on overlapping data, agree closely on strong binders, and diverge at the edges — non-9-mers, rare alleles, and the boundary between binding and presentation. The right choice depends on your licensing constraints, whether you need MHC class II, and how deeply you need to script the tool into a reproducible pipeline. We will also name the caveat that outranks the choice itself: presentation is not immunogenicity.

Both tools are pan-allele neural-network predictors: they take an HLA sequence and a peptide and score the pairing, generalizing across alleles rather than training one model per allele. The substantive differences are in training data, outputs, and packaging.

NetMHCpan-4.1 is trained on a combined set of more than 850,000 peptides spanning both quantitative binding-affinity (BA) measurements and mass-spectrometry eluted-ligand (EL) data, drawn from single-allele and multi-allele sources. It outputs both a BA score and an EL (presentation-likelihood) score, and it covers a broad set of MHC molecules across human and several non-human species. Its sibling NetMHCIIpan handles MHC class II — a capability MHCflurry does not provide.

MHCflurry 2.0 is structured as two stacked models. A binding predictor scores peptide–MHC affinity; an allele-independent antigen-processing predictor models effects such as proteasomal cleavage from the peptide's flanking sequence. A small logistic-regression "presentation" model combines the two into a composite presentation score. The binding predictor is trained on affinity measurements together with mass-spec ligand data, and the processing predictor on mass-spec ligands. The whole thing is an Apache-licensed Python package: pip-installable, fast over large peptide sets, and easy to call from code.
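To make the stacking concrete, here is a minimal sketch of how a logistic combiner can fold a binding score and a processing score into one presentation score. The function name, the assumption that both inputs are already scaled to [0, 1], and every coefficient are illustrative placeholders, not MHCflurry's fitted values.

```python
import math


def presentation_score(binding_score, processing_score,
                       w_binding=8.0, w_processing=3.0, intercept=-6.0):
    """Combine two component scores with a logistic model.

    Both inputs are assumed to lie in [0, 1], with higher meaning
    "more likely". The weights and intercept are made-up placeholders.
    """
    z = intercept + w_binding * binding_score + w_processing * processing_score
    return 1.0 / (1.0 + math.exp(-z))
```

Because the combiner is a plain weighted sum passed through a sigmoid, you can read off how much each component moved the final score, which is the practical payoff of keeping processing as a separate module.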

So the philosophies converge more than they differ: both now lean on mass-spec immunopeptidomics to predict presentation rather than affinity alone. NetMHCpan folds EL data directly into one pan-allele network; MHCflurry factors processing out as an explicit, separately inspectable module.

| Dimension | NetMHCpan-4.1 | MHCflurry 2.0 |
| --- | --- | --- |
| Approach | Single pan-allele neural net over BA + EL data | Stacked binding + antigen-processing models, combined into a presentation score |
| Training data | >850k peptides: in-vitro binding affinity and MS eluted ligands | Binding model: affinity measurements plus MS eluted ligands; processing model: MS eluted ligands; logistic combiner |
| MHC class | Class I; class II via companion NetMHCIIpan | Class I only |
| Rare-allele coverage | Broad; predicts any MHC with a known sequence, incl. many non-human species | Pan-allele over human HLA; narrower species/allele scope |
| Speed / scriptability | Standalone binary or web server; usable in pipelines | Pip-installable Python; fast on large sets, easy to script |
| Output | Both BA and EL (presentation) scores, with %rank | Binding affinity, processing, and composite presentation score |
| License | Free for academic/non-commercial; commercial license required; closed source | Apache 2.0 open source; commercial use allowed |
| Best for | Class II needs, rare/non-human alleles, established academic workflows | Open-source/commercial pipelines, reproducibility, programmatic batch scoring |
NetMHCpan-4.1 vs MHCflurry 2.0 at a glance. Performance is benchmark-dependent; treat both as state-of-the-art for MHC class I.

Reach for NetMHCpan-4.1 when you need MHC class II — its companion NetMHCIIpan is the standard, and MHCflurry simply does not cover class II. It is also the safer default for rare or non-human alleles, since it predicts for any MHC molecule with a known sequence, and for fitting into established academic workflows where reviewers and collaborators already expect NetMHCpan outputs and %rank thresholds.

Reach for MHCflurry 2.0 when licensing and engineering matter. It is Apache-licensed, so a startup can put it in a commercial product without a separate agreement. It installs with pip, runs fast over large peptide libraries, and exposes its binding, processing, and presentation scores as separate, inspectable values — which makes it easy to drop into a reproducible, version-pinned pipeline and to reason about why a peptide scored the way it did.
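As a rough illustration of what "easy to call from code" looks like, the sketch below wraps MHCflurry's presentation predictor in two small functions. It assumes the package is installed and its models have been downloaded (for example with `mhcflurry-downloads fetch`); the wrapper names are ours, and the column names in the comment follow the MHCflurry README at the time of writing, so check them against the version you pin.

```python
def load_presentation_predictor():
    """Load MHCflurry's presentation predictor (needs downloaded models)."""
    # Imported lazily so this file can be imported without MHCflurry installed.
    from mhcflurry import Class1PresentationPredictor
    return Class1PresentationPredictor.load()


def score_peptides(predictor, peptides, alleles):
    """Return the predictor's table of scores for one sample's alleles."""
    return predictor.predict(peptides=peptides, alleles=alleles, verbose=0)


# Typical use once the package and models are in place:
#   predictor = load_presentation_predictor()
#   table = score_peptides(predictor, ["SIINFEKL"], ["HLA-A0201", "HLA-A0301"])
#   print(table[["peptide", "best_allele", "affinity",
#                "processing_score", "presentation_score"]])
```

Passing the predictor in as an argument keeps model loading, which is slow, separate from scoring, and lets you substitute a stub in unit tests.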

In practice, many teams run both and look for agreement. The tools concur closely on strong binders, so consensus calls are robust. Disagreement clusters at the edges: peptides longer or shorter than 9 residues, rare alleles with sparse training data, and peptides near the presentation threshold. Those are exactly the cases worth flagging for wet-lab follow-up rather than trusting either score alone.
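One way to operationalize "run both and look for agreement" is to compare the two tools' percentile ranks at a shared cutoff. The sketch below assumes you have already parsed each tool's output into a dictionary of peptide to percentile rank (lower is stronger); the function name, labels, and the 2.0 cutoff are our illustrative choices, not a standard.

```python
def classify_calls(rank_a, rank_b, threshold=2.0):
    """Label each peptide by whether two predictors agree at a rank cutoff.

    rank_a, rank_b: dicts mapping peptide -> percentile rank
    (lower is stronger). Returns a dict of peptide -> label.
    """
    labels = {}
    for peptide in sorted(set(rank_a) | set(rank_b)):
        a, b = rank_a.get(peptide), rank_b.get(peptide)
        if a is None or b is None:
            # Only one tool produced a score for this peptide.
            labels[peptide] = "scored_by_one_tool"
        elif (a <= threshold) == (b <= threshold):
            labels[peptide] = (
                "consensus_binder" if a <= threshold else "consensus_nonbinder"
            )
        else:
            # The tools fall on opposite sides of the cutoff.
            labels[peptide] = "discordant"
    return labels
```

The "discordant" and "scored_by_one_tool" groups are the ones to route to manual review or wet-lab follow-up.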

Two adjacent tools are worth knowing. MixMHCpred (GfellerLab) is a motif-based presentation predictor trained on eluted-ligand data, free for academic use with a separate commercial license; it is a useful third opinion. NetCTL bundles MHC binding with proteasomal cleavage and TAP transport for an older-style integrated CTL-epitope score. Neither displaces NetMHCpan or MHCflurry as the primary class I predictor, but both are reasonable cross-checks.

Here is the caveat that matters more than the NetMHCpan-vs-MHCflurry decision: a high binding or presentation score predicts that a peptide will be displayed on the cell surface. It does not predict that a T cell will recognize it. These are different questions, and conflating them is the most common way neoantigen pipelines overpromise.

The literature is blunt about the gap. Most peptides predicted to be presented never trigger a T-cell response, because immunogenicity also depends on TCR recognition of the peptide–MHC complex, central tolerance, and physicochemical peptide features that presentation models do not capture. A mutation may raise binding affinity, or it may leave affinity unchanged while still producing a peptide different enough from self to be seen by T cells, so affinity and immunogenicity are only loosely coupled.

The practical implication: use NetMHCpan or MHCflurry as a presentation filter, then layer dedicated immunogenicity or TCR-recognition models (for example PRIME, the immunogenicity-oriented companion to MixMHCpred, and TCR-pMHC binding predictors) on top before committing wet-lab resources. The binding predictor narrows the field from thousands of candidates to a tractable shortlist; it does not pick the winners.
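The filter-then-rank pattern can be sketched in a few lines. Everything here is hypothetical scaffolding: `Candidate`, `shortlist`, the 2.0 rank cutoff, and `top_n` are our names and defaults, and `immunogenicity_fn` stands in for whatever downstream model you choose to plug in.

```python
from typing import Callable, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    peptide: str
    presentation_rank: float  # percentile rank; lower is stronger


def shortlist(candidates: Iterable[Candidate],
              immunogenicity_fn: Callable[[str], float],
              max_rank: float = 2.0,
              top_n: int = 20) -> List[Candidate]:
    """Stage 1: keep likely-presented peptides. Stage 2: rank survivors."""
    presented = [c for c in candidates if c.presentation_rank <= max_rank]
    # Higher immunogenicity score sorts first; ties keep input order.
    ranked = sorted(presented,
                    key=lambda c: immunogenicity_fn(c.peptide),
                    reverse=True)
    return ranked[:top_n]
```

Keeping the two stages separate makes the division of labor explicit: the presentation score only decides who enters the pool, and a different model decides the order.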

Treat both NetMHCpan-4.1 and MHCflurry 2.0 as state-of-the-art MHC class I predictors and choose on constraints, not on a leaderboard. Need class II, rare alleles, or a familiar academic workflow: NetMHCpan. Need open-source licensing, easy scripting, and inspectable component scores for a reproducible or commercial pipeline: MHCflurry. Running both and trusting the consensus is a defensible default. And whichever you pick, remember it answers "will this be presented?" — not "will this be immunogenic?"

Maintained note: this comparison was last reviewed on 2026-05-30 against NetMHCpan-4.1 and MHCflurry 2.0. If a new version or benchmark result appears to shift the guidance, re-validate on your own alleles and peptides rather than assuming the ranking holds.

Is MHCflurry more accurate than NetMHCpan?

Not in a way that generalizes. On the held-out mass-spectrometry benchmarks in the MHCflurry 2.0 paper, the integrated MHCflurry presentation predictor outperformed NetMHCpan-4.0 and MixMHCpred 2.0.2; NetMHCpan-4.1 (released around the same time) reports similarly strong presentation performance. Head-to-head rankings depend heavily on the benchmark, the alleles, and which output mode you compare. Treat both as state-of-the-art and validate on your own data.

What is the difference between binding-affinity and eluted-ligand prediction?

Binding-affinity (BA) models are trained on in-vitro assays that measure how tightly a peptide binds an MHC molecule. Eluted-ligand (EL) models are trained on peptides actually recovered from cell surfaces by mass spectrometry, so they capture presentation — the combined effect of processing, transport, and binding — not just affinity. EL/presentation scores generally track real immunopeptidomes better; BA values are still useful when you want an interpretable nanomolar-style number.
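For readers who want to move between the "nanomolar-style number" and the 0-to-1 score that BA models typically regress on, here is a small sketch of the log transform commonly used in this tool family, 1 - log(IC50) / log(50000). The function names and the clipping to the 1 to 50,000 nM range are our assumptions; confirm the exact convention in the documentation of the tool you use.

```python
import math

MAX_IC50_NM = 50000.0  # conventional upper cap for the log scaling


def ic50_to_score(ic50_nm):
    """Map an IC50 in nM to [0, 1]; higher means tighter binding."""
    clipped = min(max(ic50_nm, 1.0), MAX_IC50_NM)
    return 1.0 - math.log(clipped) / math.log(MAX_IC50_NM)


def score_to_ic50(score):
    """Inverse transform: recover an IC50 in nM from a [0, 1] score."""
    return MAX_IC50_NM ** (1.0 - score)
```

Under this transform the widely used 500 nM binder cutoff corresponds to a score of roughly 0.43, which is why BA outputs are often reported both ways.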

Is NetMHCpan free?

NetMHCpan-4.1 is free for academic and non-commercial use via the DTU Health Tech web server, and the standalone binary is available through a license request form. Commercial use requires a separate license from DTU. MHCflurry, by contrast, is Apache-licensed open source, so commercial use needs no permission.

Does a high binding or presentation score mean a peptide is immunogenic?

No. A high score means the peptide is likely to be presented on the cell surface — a necessary condition, not a sufficient one. Most strongly predicted binders are never recognized by T cells, because immunogenicity also depends on TCR repertoire, central tolerance, and peptide features neither tool models. Binding and presentation are filters, not verdicts.