Multi-Token Prediction (MTP) support has been merged into llama.cpp, as shown by the pull request and confirmed in r/LocalLLaMA threads.

MTP enables a form of speculative decoding: instead of producing one token per forward pass, the model drafts several upcoming tokens, and the drafts that pass verification are kept. For compatible models, each accepted draft token removes one sequential inference step, which lowers token generation latency without any change to the underlying hardware.
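To make the step-count argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not llama.cpp code. The function names are hypothetical, and the model behind it is a deliberate simplification: every drafted token is assumed to be accepted independently with the same probability, drafting stops at the first rejection, and each verification pass always contributes one token of its own. Real acceptance rates vary by model and prompt.

```python
def expected_tokens_per_pass(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass (illustrative model).

    Assumption: each drafted token is accepted independently with
    probability `accept_prob`, and the first rejection discards the rest
    of the draft. The pass itself always yields one token, so the result
    is the geometric sum 1 + p + p^2 + ... + p^draft_len.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(accept_prob ** i for i in range(draft_len + 1))


def expected_passes(num_tokens: int, accept_prob: float, draft_len: int) -> float:
    """Rough number of sequential passes needed to emit `num_tokens`."""
    return num_tokens / expected_tokens_per_pass(accept_prob, draft_len)


if __name__ == "__main__":
    # Hypothetical inputs: 3 drafted tokens, 50% acceptance per token.
    per_pass = expected_tokens_per_pass(0.5, 3)
    print(f"tokens per pass: {per_pass}")
    print(f"passes for 300 tokens: {expected_passes(300, 0.5, 3)}")
```

Under these assumptions, a draft length of 3 with a 0.5 acceptance probability yields 1.875 tokens per pass on average, so 300 tokens take about 160 sequential passes instead of 300. With an acceptance probability of 0 the estimate falls back to one token per pass, which is why a model without usable MTP heads sees no gain.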

The feature is now in the main llama.cpp codebase. Whether a given model can use it depends on whether that model was trained with MTP heads, so not all models will benefit.

Operators running llama.cpp-based inference stacks should verify model compatibility before expecting any speedup. Where supported, MTP can reduce generation latency on existing hardware, which makes it a relevant update for local deployments where inference speed is a constraint.