vLLM
software
A high-throughput and memory-efficient inference and serving engine for large language models.
Also mentioned (1): casual references without a clear endorsement
Sebastian Raschka mentioned:
"even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds..."
(at 2:01:11)