vLLM

A high-throughput and memory-efficient inference and serving engine for large language models.
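For context, a minimal sketch of how vLLM is typically used for offline batch inference via its Python API (the model name and sampling parameters here are illustrative, not from the original page):

```python
# Minimal vLLM usage sketch: batched text generation with the offline LLM API.
# Assumes vLLM is installed (pip install vllm); model choice is illustrative.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=32)

llm = LLM(model="facebook/opt-125m")              # loads the model and allocates the KV cache
outputs = llm.generate(prompts, sampling_params)  # batched, high-throughput generation

for output in outputs:
    print(output.outputs[0].text)                 # first completion for each prompt
```

The same engine can also be exposed as an OpenAI-compatible HTTP server, which is how it is commonly deployed in production serving setups.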

Also mentioned (1)

Casual references without a clear endorsement

Sebastian Raschka mentioned: "even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds..." (2:01:11)