King's College London NLP
SPECTRA Seminar Details
SPECTRA: Faster Large Language Model Inference with Optimized Internal and External Speculation

👨‍🏫 Speaker: Le-Minh Nguyen

📅 Time: 2025/07/16

🎥 Recording: will be available after the talk (2025/07/16)

📄 Abstract:
Inference with modern Large Language Models (LLMs) is both computationally intensive and time-consuming. While speculative decoding has emerged as a promising solution, existing approaches face key limitations. Training-based methods require the development of a draft model, which is often difficult to obtain and lacks generalizability. On the other hand, training-free methods provide only modest speedup improvements.

In this work, we introduce SPECTRA — a novel framework designed to accelerate LLM inference without requiring any additional training or modifications to the original LLM. SPECTRA incorporates two new techniques that efficiently leverage both internal and external speculation, each independently outperforming corresponding state-of-the-art (SOTA) methods. When combined, these techniques deliver up to a 4.08× speedup across a variety of benchmarks and LLM architectures, significantly surpassing existing training-free approaches. The implementation of SPECTRA is publicly available.
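For readers new to the topic, the generic draft-then-verify loop behind speculative decoding can be sketched as follows. This is a minimal illustration under stated assumptions, not SPECTRA's algorithm: `toy_target` and `toy_draft` are hypothetical stand-ins for a large model and a cheap proposal source, and decoding is greedy over integer tokens.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n: int) -> List[Token]:
    """Plain autoregressive decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(next_fn(tokens))
    return tokens


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy draft-then-verify decoding; returns (tokens, verification rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The cheap drafter guesses k tokens ahead.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))

        # 2) The target checks the guesses. A real system scores all k
        #    positions in one batched forward pass; this sketch calls the
        #    target position by position for clarity.
        rounds += 1
        accepted: List[Token] = []
        for guess in draft:
            expected = target_next(tokens + accepted)
            if guess == expected:
                accepted.append(guess)
            else:
                accepted.append(expected)  # replace the first wrong guess
                break
        else:
            # Every guess matched, so the same pass yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], rounds


# Hypothetical toy models: the target counts modulo 10; the drafter agrees
# except right after the token 5, where it guesses wrong.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=12)
    print(out, rounds)
```

Because every accepted token is checked against the target, the output matches what plain greedy decoding with the target alone would produce; the potential saving comes from needing fewer target passes than generated tokens whenever the drafter guesses well.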


👨‍🎓 Biography:
Le-Minh Nguyen is a Professor in the School of Information Science and the director of the Interpretable AI Center at JAIST, where he leads the Machine Learning and Natural Language Understanding Laboratory. He is currently on sabbatical at Imperial College London, UK (until April 2026). His research interests include machine learning and deep learning, natural language processing, legal text processing, and explainable AI. He serves as an action editor of TACL (a leading journal in NLP), a board member of VLSP (Vietnamese Language and Speech Processing), and an editorial board member of AI & Law and the Journal of Natural Language Processing (Cambridge). He is also a steering committee member of Juris-informatics (Jurisin) in Japan, a research area that studies legal issues from an informatics perspective.




© 2025 Copyright: KCL NLP Group