burn
by
tracel-ai

Description: Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.

View tracel-ai/burn on GitHub ↗

Summary Information

Updated 2 hours ago
Added to GitGenius on July 23rd, 2025
Created on July 18th, 2022
Open Issues/Pull Requests: 226 (+0)
Number of forks: 827
Total Stargazers: 14,426 (+1)
Total Subscribers: 95 (+0)
Detailed Description

Burn is a modular and flexible deep learning framework written in Rust, focused on simplifying the training and deployment of machine learning models, including large language models (LLMs), across a diverse range of hardware backends. It aims to bridge the gap between research and production by providing a unified interface for model definition, optimization, and execution, while supporting multiple precision levels (e.g., FP32, FP16, BF16) and quantization. The core philosophy revolves around composability: users can mix and match components to tailor the pipeline to their specific needs and hardware constraints.
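This precision-generic style can be sketched in miniature. The following is a self-contained illustration, not Burn's actual API: a single kernel is written against a small numeric trait, so the same code can be instantiated at different floating-point widths.

```rust
// Illustrative sketch only (not Burn's API): one kernel serving
// several precision levels through a generic element type.
use std::ops::{Add, Mul};

/// Minimal element trait covering exactly what the kernel needs.
trait Element: Copy + Default + Add<Output = Self> + Mul<Output = Self> {}
impl Element for f32 {}
impl Element for f64 {}

/// Dot product written once, usable at any supported precision.
fn dot<E: Element>(a: &[E], b: &[E]) -> E {
    a.iter().zip(b).fold(E::default(), |acc, (&x, &y)| acc + x * y)
}

fn main() {
    let a: [f32; 3] = [1.0, 2.0, 3.0];
    let b: [f32; 3] = [4.0, 5.0, 6.0];
    // Same kernel at f32 precision...
    println!("f32: {}", dot(&a[..], &b[..])); // 32
    // ...and at f64 precision, without duplicating the code.
    let a64: Vec<f64> = a.iter().map(|&x| x as f64).collect();
    let b64: Vec<f64> = b.iter().map(|&x| x as f64).collect();
    println!("f64: {}", dot(&a64, &b64)); // 32
}
```

A real framework extends the element trait with conversions, comparison, and hardware-specific types such as FP16, but the mechanism is the same: the precision is a type parameter, not a code path.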

At its heart, Burn decouples model definition from the underlying hardware through a generic `Backend` trait: tensors and modules are parameterized by a backend type, so the same model code runs unchanged on any supported target. Supported backends include CUDA, ROCm, WGPU (covering Vulkan, Metal, and WebGPU), LibTorch, Candle, and a pure-Rust NdArray backend for the CPU, with ongoing work to expand this list. Models from other ecosystems can also be brought in, for example via ONNX or PyTorch weight files. This backend-centric design allows new hardware support to be added without modifying the core Burn library.
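The backend-centric decoupling can be shown with a toy version of the pattern (illustrative only; Burn's real `Backend` trait is far richer): model code depends only on a trait, so adding a new hardware target means adding an implementation, not touching the model.

```rust
// Toy backend-trait design (not Burn's actual API). Model code is
// generic over `Backend`, so every target runs the same model.

/// The operations a backend must provide.
trait Backend {
    fn name(&self) -> &'static str;
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32>;
}

/// A plain CPU reference backend.
struct Cpu;
impl Backend for Cpu {
    fn name(&self) -> &'static str { "cpu" }
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// A second target; a real impl would dispatch device kernels here.
struct FakeGpu;
impl Backend for FakeGpu {
    fn name(&self) -> &'static str { "fake-gpu" }
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// "Model" code: written once, generic over the backend.
fn run_model<B: Backend>(backend: &B) -> Vec<f32> {
    backend.add(&[1.0, 2.0], &[3.0, 4.0])
}

fn main() {
    println!("{}: {:?}", Cpu.name(), run_model(&Cpu));         // [4.0, 6.0]
    println!("{}: {:?}", FakeGpu.name(), run_model(&FakeGpu)); // [4.0, 6.0]
}
```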

A key feature of Burn is its support for quantization, which is under active development. Post-training quantization (PTQ) compresses an already-trained model quickly without retraining, while quantization-aware training (QAT) preserves more accuracy at low precision by simulating quantization inside the training loop. Burn's quantization support is designed to be configurable, letting users tune quantization parameters to balance accuracy against memory use and performance.
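At its core, post-training quantization maps floating-point values onto a small integer grid with a scale and zero point. A minimal, self-contained affine int8 sketch (not Burn's quantization API):

```rust
// Minimal affine int8 post-training quantization (illustrative; not
// Burn's quantization API). Calibrates a scale and zero point from the
// data range, quantizes to i8, then dequantizes to check the error.

/// Compute scale and zero point covering [min, max] (zero included).
fn calibrate(data: &[f32]) -> (f32, i32) {
    let min = data.iter().cloned().fold(f32::INFINITY, f32::min).min(0.0);
    let max = data.iter().cloned().fold(f32::NEG_INFINITY, f32::max).max(0.0);
    let scale = (max - min) / 255.0; // 256 i8 levels, 255 steps
    let zero_point = (-128.0 - min / scale).round() as i32;
    (scale, zero_point)
}

fn quantize(x: f32, scale: f32, zp: i32) -> i8 {
    ((x / scale).round() as i32 + zp).clamp(-128, 127) as i8
}

fn dequantize(q: i8, scale: f32, zp: i32) -> f32 {
    (q as i32 - zp) as f32 * scale
}

fn main() {
    let weights = [0.5, -1.25, 3.0, 0.0, -2.75];
    let (scale, zp) = calibrate(&weights);
    for &w in &weights {
        let q = quantize(w, scale, zp);
        let back = dequantize(q, scale, zp);
        // In-range values round-trip within half a quantization step.
        assert!((w - back).abs() <= scale / 2.0 + 1e-6);
        println!("{w:>6} -> {q:>4} -> {back:.4}");
    }
}
```

The accuracy/performance trade-off the paragraph describes lives in exactly these parameters: a narrower calibration range shrinks the step size (better accuracy for common values) at the cost of clipping outliers.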

Burn’s modularity extends to its compilation and optimization stages. GPU kernels are compiled just-in-time for the target device, and Burn applies optimizations such as automatic operator fusion, autotuning (benchmarking candidate kernels and selecting the fastest for the hardware at hand), and memory-management strategies that reuse buffers to minimize allocations. These optimizations are applied automatically, without changes to user model code.
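Operator fusion, for instance, replaces a chain of elementwise kernels, each reading and writing a full buffer, with a single pass. A toy sketch of the idea (not Burn's actual fusion machinery):

```rust
// Toy operator-fusion sketch (illustrative; not Burn's fusion engine).
// Unfused: two passes over memory plus an intermediate buffer.
// Fused: one pass, no intermediate allocation.

fn unfused(x: &[f32]) -> Vec<f32> {
    // Pass 1: scale. Materializes an intermediate buffer.
    let tmp: Vec<f32> = x.iter().map(|v| v * 2.0).collect();
    // Pass 2: ReLU. Reads the intermediate back from memory.
    tmp.iter().map(|v| v.max(0.0)).collect()
}

fn fused(x: &[f32]) -> Vec<f32> {
    // Both ops in a single loop body: one read, one write per element.
    x.iter().map(|v| (v * 2.0).max(0.0)).collect()
}

fn main() {
    let x = [-1.0, 0.5, 2.0];
    // Fusion must not change the result, only the memory traffic.
    assert_eq!(unfused(&x), fused(&x));
    println!("{:?}", fused(&x)); // [0.0, 1.0, 4.0]
}
```

On a GPU the savings are larger still: each unfused op is a separate kernel launch, so fusion cuts launch overhead as well as memory bandwidth.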

The repository includes examples demonstrating how to build, train, and deploy models on different backends. It also provides benchmarking and profiling tools, allowing users to evaluate the performance of their models and identify bottlenecks. Burn is actively developed and maintained, with a strong focus on community contributions and expanding its capabilities. Its goal is to become a leading solution for deploying LLMs and other machine learning models efficiently across a wide range of hardware platforms, making powerful AI accessible to more users and applications.
