vLLM

A high-throughput and memory-efficient inference and serving engine for large language models.
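For context, a minimal sketch of how vLLM is typically used for offline batch inference via its Python API (the model name and sampling parameters here are illustrative, not from the original page):

```python
# Minimal vLLM usage sketch: batched text generation with the offline LLM API.
# Assumes vLLM is installed (pip install vllm); model choice is illustrative.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=32)

llm = LLM(model="facebook/opt-125m")              # loads the model and allocates the KV cache
outputs = llm.generate(prompts, sampling_params)  # batched, high-throughput generation

for output in outputs:
    print(output.outputs[0].text)                 # first completion for each prompt
```

The same engine can also be exposed as an OpenAI-compatible HTTP server, which is how it is commonly deployed in production serving setups.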

Also mentioned (1)

Casual references without a clear endorsement

Sebastian Raschka mentioned: "even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds..." (2:01:11)