compressed-tensors
by
vllm-project

Description: A safetensors extension to efficiently store sparse quantized tensors on disk

View vllm-project/compressed-tensors on GitHub ↗

Summary Information

Updated 1 hour ago
Added to GitGenius on November 12th, 2024
Created on April 2nd, 2024
Open Issues/Pull Requests: 19 (-2)
Number of forks: 59
Total Stargazers: 255 (+0)
Total Subscribers: 12 (+0)
Detailed Description

The GitHub repository `compressed-tensors`, maintained under the vllm-project organization (originally developed by Neural Magic), provides efficient storage and computation mechanisms for large-scale tensor data, primarily aimed at deep learning models. The tool focuses on reducing the memory footprint of tensors without significant loss in precision or performance, which is crucial when working with complex neural networks that require substantial computational resources.

The main motivation behind compressed-tensors is to address the challenge of deploying large machine learning models, particularly those used in AI research and production environments where resource constraints are a concern. By compressing tensor data, this library facilitates faster model inference, reduced storage costs, and better utilization of available hardware capabilities. The repository achieves these goals through various compression algorithms that can be selected based on the specific needs of different applications.

The key features of `compressed-tensors` include multiple strategies for quantization, the process of reducing the precision of tensor elements from floating-point to lower-bit integer representations. These methods range from simple uniform quantization to more sophisticated non-uniform and learned quantization schemes. The library supports both static and dynamic quantization, letting users choose between fixed compression parameters and ones that adapt to the data distribution at runtime.
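To make the core idea concrete, here is a minimal sketch of uniform symmetric quantization in plain Python. This is an illustration of the general technique, not the library's actual API: the function names and the per-tensor scaling choice are assumptions made for this example.

```python
def quantize_symmetric(values, num_bits=8):
    # Uniform symmetric quantization to signed integers (illustrative sketch,
    # not the compressed-tensors API). One scale is shared by the whole tensor.
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0   # avoid zero scale
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # Map the integers back to approximate floating-point values.
    return [x * scale for x in q]

vals = [0.5, -1.0, 0.25, 0.8]
q, s = quantize_symmetric(vals)      # q == [64, -127, 32, 102]
approx = dequantize(q, s)            # close to vals, within one scale step
```

A "static" scheme would fix `scale` ahead of time from calibration data, while a "dynamic" scheme recomputes it from the values seen at runtime, as in the sketch above.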

Additionally, the repository provides tools for decompression and inference within compressed formats, making it possible to run models directly on compressed data without needing full decompression, thus saving time and computational overhead. This is particularly beneficial in environments with limited processing power or when quick model inference is necessary, such as edge computing scenarios.

The `compressed-tensors` library integrates seamlessly with popular deep learning frameworks like PyTorch, ensuring that users can easily incorporate tensor compression into their existing workflows without significant modifications to their codebase. This integration supports a wide range of operations, including arithmetic computations and model training, directly on compressed data formats.

Moreover, the repository emphasizes ease of use and flexibility, providing extensive documentation and examples to help developers understand how to implement tensor compression in various scenarios effectively. The library is open-source, encouraging community contributions and continuous improvement through feedback and collaboration from users worldwide.

Overall, `compressed-tensors` offers a comprehensive solution for optimizing large-scale machine learning models by reducing their memory requirements while maintaining performance. It plays an essential role in advancing the deployment of AI technologies across diverse platforms, facilitating more efficient use of computational resources.
