Description: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
The microsoft/unilm repository on GitHub details the development of UniLM, a self-supervised language model designed to achieve state-of-the-art performance on a variety of language tasks without relying on labeled data. This represents a significant departure from traditional supervised models, which require massive datasets of text paired with corresponding labels (e.g., translations, summaries). UniLM’s core innovation lies in its use of a ‘Contrastive Predictive Coding’ (CPC) objective, a representation-learning technique inspired by predictive coding in neuroscience, specifically the idea that the brain learns by predicting future sensory inputs.
At its heart, UniLM operates by predicting future tokens within a sequence of text. Unlike standard autoregressive models, which predict only the next token from the preceding ones, UniLM learns a representation of the entire sequence, capturing contextual relationships between all tokens. This is achieved through a ‘context encoder’ that transforms the input sequence into a high-dimensional representation. Crucially, the model predicts not just the immediate next token but tokens several steps into the future, forcing it to develop a deep understanding of the underlying structure and semantics of the text. This predictive loss is what trains the model.
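The predictive objective described above can be sketched as a CPC-style InfoNCE contrastive loss: the context representation must score the true future token higher than randomly sampled negatives. The following NumPy sketch is purely illustrative and is not taken from the repository; all names and shapes are assumptions.

```python
import numpy as np

def info_nce_loss(context, future, negatives):
    """CPC-style InfoNCE loss (illustrative sketch, not the repo's code).

    context:   (d,)   encoded summary of the tokens seen so far
    future:    (d,)   encoding of the true future token (the positive)
    negatives: (k, d) encodings of randomly sampled tokens (the negatives)
    """
    candidates = np.vstack([future[None, :], negatives])  # (k+1, d)
    scores = candidates @ context                         # similarity logits
    scores -= scores.max()                                # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())     # log-softmax
    return -log_probs[0]  # negative log-likelihood of the true future

# Toy usage: the positive is correlated with the context, negatives are not.
rng = np.random.default_rng(0)
d, k = 16, 8
context = rng.normal(size=d)
future = context + 0.1 * rng.normal(size=d)
negatives = rng.normal(size=(k, d))
loss = info_nce_loss(context, future, negatives)
```

Minimizing this loss pushes the context encoder to produce representations that are more similar to genuine future tokens than to distractors, which is the sense in which the model "learns to predict the future" without any labels.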
The repository provides extensive documentation, including research papers, code examples, and pre-trained model weights. The code is primarily written in PyTorch, making it accessible to a wide range of researchers and developers. The repository highlights several key components: the CPC loss function, the context encoder architecture (typically a Transformer), and the training pipeline. The authors demonstrate UniLM’s effectiveness across several benchmarks, including masked language modeling (similar to BERT), translation, and summarization, often achieving competitive or superior results compared to models trained with supervised learning.
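For the masked language modeling benchmark mentioned above, the corruption step works by hiding a fraction of tokens and asking the model to reconstruct them. This is a generic BERT-style sketch in NumPy, not the repository's implementation; the function name, mask probability, and mask id are assumptions.

```python
import numpy as np

def mask_tokens(token_ids, mask_id, mask_prob=0.15, rng=None):
    """BERT-style token masking (illustrative sketch).

    Replaces roughly mask_prob of the positions with mask_id and returns
    the corrupted sequence plus the boolean positions the model must predict.
    """
    rng = rng if rng is not None else np.random.default_rng()
    token_ids = np.asarray(token_ids)
    mask = rng.random(token_ids.shape) < mask_prob
    corrupted = np.where(mask, mask_id, token_ids)
    return corrupted, mask

# Toy usage on a 100-token sequence; -1 stands in for a [MASK] token id.
rng = np.random.default_rng(1)
ids = np.arange(100)
corrupted, mask = mask_tokens(ids, mask_id=-1, mask_prob=0.15, rng=rng)
```

The training loss is then computed only at the masked positions, so the model must use the surrounding unmasked context to fill in the blanks.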
Furthermore, the UniLM project emphasizes efficiency. The model is designed to be relatively small and fast, making it suitable for deployment on resource-constrained devices, a result of deliberate architectural and training choices. The repository includes scripts for fine-tuning UniLM on specific datasets, allowing users to adapt the model to their particular needs.
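Conceptually, fine-tuning means keeping the pre-trained encoder (mostly) frozen and training a small task head on its output features. The sketch below is a hypothetical stand-in, not the repository's fine-tuning script: random vectors play the role of frozen encoder features, and a softmax classifier head is trained on a toy labeling task.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, classes = 200, 16, 3

# Frozen "pretrained" encoder outputs (random stand-ins for real features).
features = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, classes))
labels = (features @ true_W).argmax(axis=1)   # toy downstream labels

# Task head trained from scratch with softmax cross-entropy and
# plain gradient descent; the encoder itself is never updated.
W = np.zeros((d, classes))
lr = 0.5
for _ in range(500):
    logits = features @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0             # dL/dlogits for cross-entropy
    W -= lr * (features.T @ grad) / n

accuracy = ((features @ W).argmax(axis=1) == labels).mean()
```

Because only the small head is trained, adaptation like this is cheap, which is why a single pre-trained model can serve many downstream datasets.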
Finally, the UniLM project is an active research effort, with ongoing development and experimentation. The repository is regularly updated with new findings, improvements, and extensions to the model. The core idea, learning from predictive relationships rather than explicit labels, has significant implications for the future of self-supervised learning and language modeling, potentially unlocking new capabilities and reducing the reliance on expensive and time-consuming labeled data. The project’s success demonstrates a viable path towards truly intelligent language models.