Description: A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
View pytorch/helion on GitHub ↗
The PyTorch Helion repository, hosted on GitHub, provides a Python-embedded domain-specific language (DSL) for authoring fast, scalable machine learning kernels. Rather than writing low-level GPU code by hand, users express kernels as Python functions over PyTorch tensors; Helion compiles this high-level description down to Triton, taking over the tiling, indexing, and masking details that kernel authors would otherwise manage manually. The aim is to keep boilerplate minimal while still producing performant code.
One of the core pieces of Helion is its programming model. A kernel is an ordinary Python function marked with the `@helion.kernel` decorator, and its device-side parallelism is expressed with tile loops from the `helion.language` module (conventionally imported as `hl`), such as `hl.tile`. Inside a tile loop, the body reads and writes tensor slices using familiar PyTorch-style indexing and operations, and the concrete tile sizes are deliberately left unspecified in the source. This lets common kernels, such as an elementwise add or a matrix multiplication, be written in a handful of lines that closely resemble the eager PyTorch code they replace.
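The tile-loop style described above can be sketched in plain Python so it runs without a GPU. The `tile_ranges` helper below is an illustrative name, not a Helion API; it stands in for the role that Helion's documented `hl.tile` construct plays, looping over tiles whose size the compiler would otherwise choose.

```python
# Plain-Python sketch of the tiled-kernel style a kernel DSL automates.
# In Helion the tile loop would be written with hl.tile and the tile size
# left to the compiler/autotuner; here it is fixed for illustration.

def tile_ranges(n, tile_size):
    """Yield (start, stop) index ranges covering 0..n in tiles.
    Illustrative helper, not part of any library."""
    for start in range(0, n, tile_size):
        yield start, min(start + tile_size, n)  # final tile may be ragged

def add(x, y, tile_size=4):
    """Elementwise add computed tile-by-tile, mirroring a DSL kernel body."""
    assert len(x) == len(y)
    out = [0.0] * len(x)
    for start, stop in tile_ranges(len(x), tile_size):
        # Inside a tile, operate on a whole slice at once -- analogous to
        # tensor indexing on a tile inside a Helion kernel body.
        out[start:stop] = [a + b for a, b in zip(x[start:stop], y[start:stop])]
    return out

print(add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # → [11, 22, 33, 44, 55]
```

Note that the ragged final tile (one element here) is handled by the range helper; in a real kernel DSL that boundary masking is generated for you.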
The repository also emphasizes ease of use and automation. Because tile sizes and other low-level choices are not hard-coded, an autotuner can search over candidate configurations and select one that performs well on the target hardware; a known-good configuration can also be supplied explicitly to skip the search. This reduces the manual effort required to get a kernel running fast and lets users focus on the kernel's logic rather than on per-device tuning.
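The autotuning idea can be illustrated with a small, self-contained sketch: time each candidate tile size on the actual workload and keep the fastest. This is a toy search loop for illustration only; Helion's real autotuner and its configuration space are more sophisticated, and the function names here are hypothetical.

```python
import time

def run_with_tile_size(data, tile_size):
    """Toy 'kernel': sum the data tile by tile (stand-in for a real kernel)."""
    total = 0.0
    for start in range(0, len(data), tile_size):
        total += sum(data[start:start + tile_size])
    return total

def autotune(data, candidates):
    """Pick the fastest tile size by timing each candidate once.
    Illustrative only -- real autotuners repeat runs and prune the space."""
    best_size, best_time = None, float("inf")
    for tile_size in candidates:
        t0 = time.perf_counter()
        run_with_tile_size(data, tile_size)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_size, best_time = tile_size, elapsed
    return best_size

data = list(range(10_000))
best = autotune(data, candidates=[64, 256, 1024])
print("selected tile size:", best)
```

The key property mirrored here is that the kernel's *result* is identical for every configuration; only its speed differs, so the search can be fully automatic.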
A key consequence of this design is portability. A single Helion kernel source is not tied to one device's tile sizes or launch parameters, so the same code can be retuned for different GPUs rather than rewritten. Compared with authoring Triton or CUDA directly, where indexing, masking, and launch configuration are spelled out by hand, this substantially shrinks both the initial implementation and the maintenance burden when hardware changes.
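To make concrete the bookkeeping such a DSL removes, here is a tiled matrix multiply in plain Python. Every `min()` boundary guard and index computation below is exactly the kind of detail the compiler can take over; the code is purely illustrative and uses no Helion APIs.

```python
def matmul_tiled(a, b, tile=2):
    """C = A @ B computed tile-by-tile over plain lists of lists.
    All the index/boundary bookkeeping here is what a kernel DSL automates."""
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    assert k == k2, "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile over rows of A
        for j0 in range(0, n, tile):      # tile over columns of B
            for p0 in range(0, k, tile):  # tile over the reduction dim
                # min() guards are the manual 'masking' for ragged edges
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        for p in range(p0, min(p0 + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c

a = [[1, 2], [3, 4], [5, 6]]        # 3x2
b = [[7, 8, 9], [10, 11, 12]]       # 2x3
print(matmul_tiled(a, b))           # → [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
```

Six nested loops and four boundary guards for one of the most common kernels in ML is the boilerplate that a tile-based DSL is designed to eliminate.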
Finally, the repository ships documentation and worked examples of common kernels, which serve both as references and as starting points for custom work. Overall, the PyTorch Helion repository is a valuable resource for PyTorch users who need custom, high-performance kernels but want to avoid dropping down to raw Triton or CUDA, providing a concise authoring model, autotuning, and examples that simplify the kernel development lifecycle.