sglang
by
sgl-project

Description: SGLang is a high-performance serving framework for large language models and multimodal models.

View sgl-project/sglang on GitHub ↗

Summary Information

Updated 50 minutes ago
Added to GitGenius on February 25th, 2026
Created on January 8th, 2024
Open Issues/Pull Requests: 2,756 (-7)
Number of forks: 5,275
Total Stargazers: 25,636 (+4)
Total Subscribers: 141 (+0)

Detailed Description

SGLang is a high-performance serving framework designed to accelerate the deployment and inference of large language models (LLMs) and multimodal models. Its primary purpose is to provide a robust and efficient infrastructure for serving these complex models, enabling low-latency and high-throughput performance across various hardware configurations, from single GPUs to large-scale distributed clusters. The project aims to be a leading solution for production LLM serving, as evidenced by its widespread adoption and continuous development.

The core functionality of SGLang revolves around its "Fast Runtime," which incorporates several key features to optimize performance. These include RadixAttention for efficient prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation to run the compute-bound prefill phase and the memory-bound decode phase on separate workers, speculative decoding to accelerate generation, continuous batching for improved throughput, paged attention for efficient memory management, and multiple parallelism strategies (tensor, pipeline, expert, and data parallelism) for scaling across devices. It also supports structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ) to reduce memory footprint and improve speed, and multi-LoRA batching for efficiently serving multiple model adaptations.
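To make the prefix-caching idea behind RadixAttention concrete, here is a toy Python sketch (an illustration of the general technique, not SGLang's actual implementation): cached KV state for token sequences is stored in a trie keyed by token ids, so a new request can skip prefill for the longest prefix it shares with earlier requests.

```python
# Toy sketch of radix-style prefix caching (illustration only, not SGLang's code).
# KV state cached for one request is reused by any later request sharing a prefix.

class CacheNode:
    def __init__(self):
        self.children = {}   # token id -> child node
        self.has_kv = False  # True if KV state is cached up to this token

class PrefixCache:
    def __init__(self):
        self.root = CacheNode()

    def insert(self, tokens):
        """Record that KV state for `tokens` is now cached."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, CacheNode())
            node.has_kv = True

    def longest_prefix(self, tokens):
        """Return how many leading tokens already have cached KV state."""
        node, matched = self.root, 0
        for i, t in enumerate(tokens):
            if t not in node.children:
                break
            node = node.children[t]
            if node.has_kv:
                matched = i + 1
        return matched

cache = PrefixCache()
cache.insert([1, 2, 3, 4])                    # e.g. a shared system prompt
hit = cache.longest_prefix([1, 2, 3, 9, 9])   # new request with a shared prefix
print(hit)  # 3 cached tokens: prefill only needs to process the remaining two
```

A real serving runtime additionally handles eviction, reference counting, and mapping trie nodes to GPU memory pages, but the prefix-matching logic is the core of the idea.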

SGLang boasts broad model support, making it compatible with a wide array of LLMs, including popular families like Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, and Mistral. It also supports various embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with an emphasis on easy extensibility to accommodate new models. This broad compatibility is further enhanced by its ability to work with most Hugging Face models and OpenAI APIs, simplifying integration with existing model ecosystems.
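Because SGLang exposes an OpenAI-compatible API, existing client code can target it by pointing at the local server. The sketch below builds such a request using only the standard library; the launch command, port, and model name are illustrative assumptions, and the request is constructed but not sent.

```python
# Sketch: building a request for an SGLang server's OpenAI-compatible endpoint,
# using only the Python standard library. Assumes a server was started locally,
# e.g. with: python -m sglang.launch_server --model-path <model> --port 30000
# (the port and model name below are illustrative, not prescriptive).
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:30000"):
    """Construct an OpenAI-style chat completion request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(req.full_url)  # http://localhost:30000/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` (or any OpenAI client pointed at the same base URL) returns a standard chat completion response, which is what makes drop-in integration with existing OpenAI-based tooling possible.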

A significant strength of SGLang is its extensive hardware support. It is designed to run efficiently on a diverse range of hardware platforms, including NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, and Ascend NPUs. This versatility allows users to leverage their existing infrastructure and choose the most suitable hardware for their specific needs and budget.

The repository highlights SGLang's active community and widespread industry adoption. As an open-source project, it fosters collaboration and continuous improvement, and it provides comprehensive documentation (installation guides, quick-start tutorials, and developer guides) to ease user onboarding and contribution. It also links to a blog, a roadmap, a Slack channel for community interaction, and slides and recordings from meetups and talks.

The project's news section showcases its rapid development, including day-0 support for new models and hardware. SGLang has been recognized with an Open Source AI Grant by a16z, further validating its impact and potential. It also integrates with reinforcement learning and post-training frameworks such as AReaL, Miles, slime, Tunix, and verl, demonstrating versatility beyond basic inference. Deployed at large scale and generating trillions of tokens daily, SGLang is trusted by a wide range of leading enterprises and institutions, solidifying its position as a key player in the LLM serving landscape.
