megablocks by databricks

Description: MegaBlocks is a lightweight Python library developed by Databricks for mixture-of-experts (MoE) training, with a focus on efficient distributed machine...


Summary Information

Updated 46 minutes ago

Added to GitGenius on April 11th, 2024

Created on January 26th, 2023

Open Issues & Pull Requests: 48 (+0)

Number of forks: 229

Total Stargazers: 1,578 (+0)

Total Subscribers: 15 (+0)

Issue Activity (beta)

Open issues: 38

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 618 days

Stale 30+ days: 38

Stale 90+ days: 38

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

question (6)
enhancement (2)
help wanted (2)
bug (1)

Most active issues this week

No issue events were indexed in the last 7 days.


Repository Insights (GitGenius)

Median issue/PR response: 18.5 hours

Mean response time: 67.6 days

90th percentile: 205.1 days

Tracked items: 31

Most active contributors

mvpatel2000 - 24 events, 15 issues
Guodanding - 11 events, 1 issue
rtmadduri - 9 events, 2 issues
Muennighoff - 4 events, 4 issues
Venkatesh3132003 - 3 events, 2 issues


Detailed Description

MegaBlocks is a lightweight Python library developed by Databricks for mixture-of-experts (MoE) training, with a focus on efficient distributed machine learning. The repository's core contribution is the implementation of dropless-MoE (dMoE), a reformulation of mixture-of-experts layers using block-sparse operations that eliminates the need for token dropping while maintaining hardware efficiency. The library also provides standard MoE layer implementations alongside the dMoE variants, giving users flexibility in choosing their preferred approach.
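
To make the dropless idea concrete, here is a toy sketch in plain Python. It is not MegaBlocks code and uses none of its API; the function names (`expert_group_sizes`, `dropless_moe`, `scaling_expert`) and the top-1 routing are illustrative assumptions. Every token is kept: tokens are grouped by their assigned expert, and each expert runs once over a group whose size is whatever the router produced. According to the description above, MegaBlocks expresses this kind of variable-sized, per-expert computation with block-sparse GPU operations rather than fixed-capacity buffers; the loop here only mirrors the bookkeeping.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# An "expert" here is any function mapping a batch of vectors to a batch of outputs.
Expert = Callable[[List[Vector]], List[Vector]]


def expert_group_sizes(assignments: Sequence[int], num_experts: int) -> List[int]:
    """Number of tokens routed to each expert (sizes vary; nothing is capped)."""
    sizes = [0] * num_experts
    for expert_id in assignments:
        sizes[expert_id] += 1
    return sizes


def dropless_moe(
    tokens: Sequence[Vector],
    assignments: Sequence[int],
    experts: Sequence[Expert],
) -> List[Vector]:
    """Toy top-1 MoE layer that processes every token (no capacity, no dropping)."""
    # Gather token indices per expert, preserving original order within each group.
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, expert_id in enumerate(assignments):
        groups[expert_id].append(idx)

    outputs: List[Vector] = [[] for _ in tokens]
    for expert_id, indices in groups.items():
        # One call per expert over a variable-sized batch.
        batch = [list(tokens[i]) for i in indices]
        results = experts[expert_id](batch)
        # Scatter results back to the tokens' original positions.
        for i, out in zip(indices, results):
            outputs[i] = out
    return outputs


def scaling_expert(factor: float) -> Expert:
    """Hypothetical stand-in for an expert FFN: multiplies each vector by `factor`."""
    return lambda batch: [[factor * x for x in vec] for vec in batch]


# Imbalanced routing: expert 0 receives 4 of the 5 tokens, and none are dropped.
tokens = [[1.0], [2.0], [3.0], [4.0], [5.0]]
assignments = [0, 0, 0, 0, 1]
print(expert_group_sizes(assignments, num_experts=2))  # [4, 1]
print(dropless_moe(tokens, assignments, [scaling_expert(2.0), scaling_expert(10.0)]))
# [[2.0], [4.0], [6.0], [8.0], [50.0]]
```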

The library reports significant performance improvements over existing MoE training solutions. According to the repository documentation, MegaBlocks dMoEs outperform MoEs trained with Tutel by up to 40 percent compared with Tutel's best-performing capacity_factor configuration. Compared with dense Transformers trained with Megatron-LM, MegaBlocks dMoEs can accelerate training by as much as 2.4 times. These gains come from the block-sparse reformulation, which also simplifies training: because dMoEs never drop tokens, there is no per-expert token budget to tune, so the capacity_factor hyperparameter is removed entirely and model configuration becomes simpler.
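
The capacity_factor that dMoEs eliminate is easiest to see with a little arithmetic. In the common Switch Transformer/GShard-style formulation, each expert accepts at most capacity_factor × (tokens ÷ experts) tokens, and any overflow is dropped. The sketch below is a hypothetical illustration, not Tutel's or MegaBlocks' code; the helper names are invented for this example, and rounding the capacity up with `math.ceil` is an assumption (real implementations may round differently).

```python
import math
from collections import Counter
from typing import Sequence


def expert_capacity(num_tokens: int, num_experts: int, capacity_factor: float) -> int:
    """Per-expert token budget: capacity_factor * tokens / experts, rounded up (assumed)."""
    return math.ceil(capacity_factor * num_tokens / num_experts)


def dropped_tokens(assignments: Sequence[int], num_experts: int, capacity_factor: float) -> int:
    """Tokens that overflow their expert's budget and would be dropped (top-1 routing)."""
    capacity = expert_capacity(len(assignments), num_experts, capacity_factor)
    per_expert = Counter(assignments)
    return sum(max(0, count - capacity) for count in per_expert.values())


def padded_slots(num_tokens: int, num_experts: int, capacity_factor: float) -> int:
    """Buffer slots computed per layer, whether or not they hold real tokens."""
    return num_experts * expert_capacity(num_tokens, num_experts, capacity_factor)


# 8 tokens, 4 experts, and a router that sends 5 tokens to expert 0.
routing = [0, 0, 0, 0, 0, 1, 2, 3]
for cf in (1.0, 2.0, 2.5):
    print(cf, dropped_tokens(routing, 4, cf), padded_slots(8, 4, cf))
# 1.0 3 8
# 2.0 1 16
# 2.5 0 20
```

Raising the factor trades dropped tokens for compute spent on padding (20 slots for 8 real tokens in the last row). A dropless layer sidesteps that trade-off by sizing each expert's work to the tokens it actually receives.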

MegaBlocks is tightly integrated with Megatron-LM, NVIDIA's large-scale language model training framework, and supports data, expert, and pipeline parallel training of MoEs across distributed systems. The repository indicates that tighter integration with Databricks libraries and tools is planned for future releases. The library can also be installed via pip for use in other frameworks and packages, so it is not limited to Megatron-LM environments; for example, this installation method is enough to run Mixtral-8x7B models with vLLM using MegaBlocks.

The installation process offers several configurations for different use cases. The standard pip installation assumes that numpy and torch are already installed. For Megatron-LM training, the project recommends NVIDIA's NGC PyTorch container and provides a Dockerfile for building a dedicated development environment. Optional dependencies enable additional features, such as grouped GEMM computation for dMoE, which is currently recommended for Hopper-generation GPUs. Separate development and testing installations are available for contributors and continuous-integration workflows.
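
Before choosing an installation variant, it can help to check what is already present. The helpers below are a hypothetical sketch, not part of MegaBlocks: they report which modules are importable without importing them, and they treat CUDA compute capability 9.x (Hopper, e.g. H100) as the signal for enabling grouped GEMM. The module name `grouped_gemm` for the optional grouped-GEMM dependency is an assumption; consult the repository's install instructions (which, as we understand it, expose this as a pip extra such as `megablocks[gg]`) for the exact names.

```python
import importlib.util
from typing import Dict, Iterable, Optional, Tuple

# Assumed module names: "grouped_gemm" is our guess for the optional grouped-GEMM package.
DEFAULT_MODULES = ("numpy", "torch", "megablocks", "grouped_gemm")


def available_modules(names: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be imported (nothing is imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def is_hopper(capability: Optional[Tuple[int, int]]) -> bool:
    """Hopper-generation GPUs report CUDA compute capability 9.x."""
    return capability is not None and capability[0] == 9


def current_capability() -> Optional[Tuple[int, int]]:
    """Compute capability of the current CUDA device, or None without torch/CUDA."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


def should_use_grouped_gemm(
    capability: Optional[Tuple[int, int]], modules: Dict[str, bool]
) -> bool:
    """Hypothetical policy: use grouped GEMM only on Hopper with the package installed."""
    return is_hopper(capability) and modules.get("grouped_gemm", False)


modules = available_modules()
print(modules)
print("grouped GEMM:", should_use_grouped_gemm(current_capability(), modules))
```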

According to GitGenius activity tracking, the repository's median issue and pull request response time is 18.5 hours across 31 tracked items, while the mean is 1622.7 hours (about 67.6 days) and the 90th percentile is 205.1 days, reflecting a small number of long-running discussions that pull the average up. The most active contributors include mvpatel2000 with 24 tracked events, Guodanding with 11 events, and rtmadduri with 9 events. The most frequent issue label is question (6), followed by enhancement and help wanted (2 each). The repository shares contributors with other major machine learning projects, including PyTorch, vLLM, and DeepSpeed, which reflects its place within the broader ecosystem of distributed training frameworks.
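
The gap between an 18.5-hour median and a 67.6-day mean is what a right-skewed distribution looks like: most items get a quick reply, while a few wait for months and drag the mean upward. The sketch below uses made-up latencies (not GitGenius data) to show the effect with Python's `statistics` module; the helper names are hypothetical, and GitGenius's own percentile method is not documented here, so its figures may be computed differently.

```python
import statistics
from typing import Dict, Sequence

HOURS_PER_DAY = 24


def hours_to_days(hours: float) -> float:
    return hours / HOURS_PER_DAY


def latency_summary(hours: Sequence[float]) -> Dict[str, float]:
    """Median, mean, and 90th percentile of response latencies, in hours."""
    # quantiles(n=10) returns the nine decile cut points; the last is the 90th percentile.
    deciles = statistics.quantiles(hours, n=10)
    return {
        "median": statistics.median(hours),
        "mean": statistics.fmean(hours),
        "p90": deciles[-1],
    }


# Hypothetical latencies: eight quick replies and two items that waited months.
sample = [2, 5, 10, 12, 18, 19, 30, 40, 3000, 6000]
print(latency_summary(sample))  # {'median': 18.5, 'mean': 913.6, 'p90': 5700.0}
print(round(hours_to_days(1622.7), 1))  # 67.6, the reported mean expressed in days
```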
