Description: Official inference framework for 1-bit LLMs
View microsoft/bitnet on GitHub ↗
The Microsoft BitNet repository introduces a groundbreaking approach to Large Language Models (LLMs) by implementing BitNet b1.58, a class of 1-bit LLMs designed to drastically reduce memory footprint and computational cost while maintaining competitive performance. Traditional LLMs rely on high-precision floating-point numbers (e.g., FP16 or FP32) for their weights and activations, producing models that are prohibitively large and resource-intensive for many applications, especially on edge devices or limited hardware. BitNet addresses this challenge by quantizing model weights to ternary values (-1, 0, or +1), which require log2(3) ≈ 1.58 bits per weight, hence the "b1.58" designation, and by quantizing activations to low-bit (8-bit) integers.
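The weight-quantization scheme described for BitNet b1.58 can be sketched as an "absmean" round-and-clip: scale each weight matrix by its mean absolute value, then round to the nearest of {-1, 0, +1}. This is a minimal illustration of that idea, not the repository's reference code; exact details (e.g. per-tensor vs. per-group scales, epsilon handling) may differ in the official implementation.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to ternary {-1, 0, +1} with an absmean scale.

    Sketch of the b1.58-style scheme: divide by the mean absolute weight,
    round, and clip to the ternary set. Dequantize as w_q * gamma.
    """
    gamma = np.abs(w).mean() + eps           # per-tensor absmean scale
    w_q = np.clip(np.round(w / gamma), -1, 1)
    return w_q, gamma

w = np.array([[0.4, -0.05, -1.2],
              [0.9,  0.0,  -0.3]])
w_q, gamma = absmean_ternary_quantize(w)
# w_q contains only -1, 0, and +1; gamma carries the magnitude information
```

Large weights saturate at ±1 while small ones snap to 0, so the ternary matrix preserves sign and sparsity structure, and the single full-precision scale `gamma` restores the overall magnitude.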
The core innovation lies in the `BitLinear` layer, which replaces the standard linear layers in transformer architectures. This layer performs matrix multiplications using ternary weights, cutting the memory needed to store model parameters by roughly a factor of ten versus FP16 (about twenty versus FP32). Beyond storage, ternary arithmetic is inherently faster and more energy-efficient: multiplying by a weight of -1, 0, or +1 reduces to a subtraction, a skip, or an addition, eliminating floating-point multiplications from the dominant matmul cost. The repository provides a PyTorch implementation of this layer, along with the necessary infrastructure to build, train, and run inference with 1-bit transformer models.
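A simplified forward pass of a `BitLinear`-style layer might look as follows. This is an illustrative sketch under the assumptions above (absmean ternary weights, per-token absmax 8-bit activations); the class name `BitLinearSketch` and the exact quantization details are not taken from the official repository.

```python
import torch
import torch.nn as nn

class BitLinearSketch(nn.Module):
    """Simplified BitLinear-style layer (illustration only).

    Weights: ternary {-1, 0, +1} with a per-tensor absmean scale gamma.
    Activations: 8-bit integers with a per-token absmax scale beta.
    The matmul runs on quantized values; both scales are folded back in
    afterwards.
    """
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Weight quantization: absmean scale, round to {-1, 0, +1}.
        gamma = self.weight.abs().mean().clamp(min=1e-8)
        w_q = (self.weight / gamma).round().clamp(-1, 1)
        # Activation quantization: per-token absmax scale into int8 range.
        beta = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
        x_q = (x * 127.0 / beta).round().clamp(-128, 127)
        # Low-precision matmul, then rescale by both quantization scales.
        y = x_q @ w_q.t()
        return y * (gamma * beta / 127.0)

layer = BitLinearSketch(16, 8)
out = layer(torch.randn(4, 16))   # shape: (4, 8)
```

In a real kernel the ternary matmul would be implemented with integer adds and subtracts rather than float operations; this sketch only mimics the numerics.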
Training these highly quantized models presents unique challenges. The rounding step that converts full-precision weights to ternary values (the sign function, in the original binary BitNet) has zero gradient almost everywhere, making direct gradient-based optimization impossible. BitNet overcomes this by employing the Straight-Through Estimator (STE) during the backward pass: the quantizer is treated as the identity function when computing gradients, allowing them to flow through the quantized weights and enabling end-to-end training with standard optimizers such as Adam. Additionally, the `BitLinear` layer applies full-precision scaling factors to both weights and activations, computed from tensor statistics such as the mean absolute weight; these scales are critical for preserving information and achieving high accuracy despite the extreme quantization.
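In PyTorch, the STE is commonly expressed with the `detach` trick: the forward pass sees the quantized value, while the backward pass sees an identity function. A minimal sketch for the binary (sign) case, assuming this common idiom rather than the repository's actual training code:

```python
import torch

def ste_sign(w: torch.Tensor) -> torch.Tensor:
    """Binarize with a straight-through estimator.

    Forward: sign(w). Backward: the (w_b - w) term is detached from the
    autograd graph, so the gradient of the whole expression w.r.t. w is 1,
    i.e. gradients pass straight through the non-differentiable sign.
    """
    w_b = torch.sign(w)
    return w + (w_b - w).detach()

w = torch.tensor([0.3, -0.7, 1.2], requires_grad=True)
y = ste_sign(w).sum()
y.backward()
# w.grad is all ones: the quantizer was treated as identity in backward
```

The same trick applies unchanged to the ternary round-and-clip quantizer: replace `torch.sign` with the rounding step and gradients still flow to the latent full-precision weights that the optimizer updates.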
The repository demonstrates that BitNet b1.58 models can achieve performance comparable to their full-precision counterparts, such as LLaMA models, across various benchmarks. This near-state-of-the-art performance with vastly reduced resource requirements opens up new possibilities for deploying powerful LLMs in settings previously considered impractical. For instance, a 3-billion-parameter BitNet model could fit into the memory typically allocated for a much smaller, less capable full-precision model, making LLMs accessible on mobile devices, embedded systems, or resource-constrained cloud environments.
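The memory claim is easy to make concrete with back-of-the-envelope arithmetic (weights only; activations, KV cache, and the full-precision scale factors are excluded, so real footprints are somewhat larger):

```python
def model_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

params = 3e9                        # the 3B-parameter example above
fp16 = model_size_gb(params, 16)    # ~6.0 GB at FP16
b158 = model_size_gb(params, 1.58)  # ~0.59 GB at 1.58 bits per weight
```

At roughly a tenth of the FP16 footprint, a 3B ternary model fits comfortably within the RAM budget of many phones and embedded boards.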
The GitHub repository is well-structured, offering a comprehensive toolkit for researchers and developers. It includes the `bitlinear.py` module, which defines the core 1-bit linear layer, along with examples of how to integrate it into a transformer architecture. It also provides scripts for training BitNet models from scratch, pre-trained model checkpoints, and instructions for inference. The project emphasizes reproducibility and ease of use, encouraging wider adoption and further research into extreme quantization for LLMs. By pushing the boundaries of model compression, BitNet represents a significant step towards democratizing access to advanced AI capabilities, making LLMs more efficient, sustainable, and universally deployable.