sglang
by
sgl-project

Description: SGLang is a high-performance serving framework for large language models and multimodal models.

View sgl-project/sglang on GitHub ↗

Summary Information

Updated 50 minutes ago
Added to GitGenius on February 25th, 2026
Created on January 8th, 2024
Open Issues/Pull Requests: 2,756 (-7)
Number of forks: 5,275
Total Stargazers: 25,636 (+4)
Total Subscribers: 141 (+0)

Detailed Description

SGLang is a high-performance serving framework designed to accelerate the deployment and inference of large language models (LLMs) and multimodal models. Its primary purpose is to provide a robust and efficient infrastructure for serving these complex models, enabling low-latency and high-throughput performance across various hardware configurations, from single GPUs to large-scale distributed clusters. The project aims to be a leading solution for production LLM serving, as evidenced by its widespread adoption and continuous development.

The core functionality of SGLang revolves around its "Fast Runtime," which incorporates several key features to optimize performance. These include RadixAttention for efficient prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation to run the compute-bound prefill phase and the memory-bound decode phase on separate workers, speculative decoding to accelerate generation, continuous batching for improved throughput, paged attention for efficient memory management, and multiple parallelism strategies (tensor, pipeline, expert, and data parallelism) for scaling across devices. It also supports structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ) to reduce memory footprint and improve speed, and multi-LoRA batching for efficiently serving multiple model adaptations.
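To make the prefix-caching idea behind RadixAttention concrete, here is a toy Python sketch (an illustration of the general technique, not SGLang's actual implementation): cached KV state for token sequences is stored in a trie keyed by token ids, so a new request can skip prefill for the longest prefix it shares with earlier requests.

```python
# Toy sketch of radix-style prefix caching (illustration only, not SGLang's code).
# KV state cached for one request is reused by any later request sharing a prefix.

class CacheNode:
    def __init__(self):
        self.children = {}   # token id -> child node
        self.has_kv = False  # True if KV state is cached up to this token

class PrefixCache:
    def __init__(self):
        self.root = CacheNode()

    def insert(self, tokens):
        """Record that KV state for `tokens` is now cached."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, CacheNode())
            node.has_kv = True

    def longest_prefix(self, tokens):
        """Return how many leading tokens already have cached KV state."""
        node, matched = self.root, 0
        for i, t in enumerate(tokens):
            if t not in node.children:
                break
            node = node.children[t]
            if node.has_kv:
                matched = i + 1
        return matched

cache = PrefixCache()
cache.insert([1, 2, 3, 4])                    # e.g. a shared system prompt
hit = cache.longest_prefix([1, 2, 3, 9, 9])   # new request with a shared prefix
print(hit)  # 3 cached tokens: prefill only needs to process the remaining two
```

A real serving runtime additionally handles eviction, reference counting, and mapping trie nodes to GPU memory pages, but the prefix-matching logic is the core of the idea.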

SGLang boasts broad model support, making it compatible with a wide array of LLMs, including popular families like Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, and Mistral. It also supports various embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with an emphasis on easy extensibility to accommodate new models. This broad compatibility is further enhanced by its ability to work with most Hugging Face models and OpenAI APIs, simplifying integration with existing model ecosystems.
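Because SGLang exposes an OpenAI-compatible API, existing client code can target it by pointing at the local server. The sketch below builds such a request using only the standard library; the launch command, port, and model name are illustrative assumptions, and the request is constructed but not sent.

```python
# Sketch: building a request for an SGLang server's OpenAI-compatible endpoint,
# using only the Python standard library. Assumes a server was started locally,
# e.g. with: python -m sglang.launch_server --model-path <model> --port 30000
# (the port and model name below are illustrative, not prescriptive).
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:30000"):
    """Construct an OpenAI-style chat completion request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(req.full_url)  # http://localhost:30000/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` (or any OpenAI client pointed at the same base URL) returns a standard chat completion response, which is what makes drop-in integration with existing OpenAI-based tooling possible.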

A significant strength of SGLang is its extensive hardware support. It is designed to run efficiently on a diverse range of hardware platforms, including NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, and Ascend NPUs. This versatility allows users to leverage their existing infrastructure and choose the most suitable hardware for their specific needs and budget.

The repository highlights SGLang's active community and widespread industry adoption. As an open-source project, it fosters collaboration and continuous improvement, and it provides comprehensive documentation (installation guides, quick-start tutorials, and developer guides) to ease user onboarding and contribution. It also links to a blog, a roadmap, a Slack channel for community interaction, and slides and recordings from meetups and talks.

The project's news section showcases its rapid development, including day-0 support for new models and hardware. SGLang has been recognized with an Open Source AI Grant by a16z, further validating its impact and potential. It also integrates with reinforcement learning and post-training frameworks such as AReaL, Miles, slime, Tunix, and verl, demonstrating versatility beyond basic inference. Deployed at large scale and generating trillions of tokens daily, SGLang is trusted by a wide range of leading enterprises and institutions, solidifying its position as a key player in the LLM serving landscape.
