bert
by
google-research

Description: TensorFlow code and pre-trained models for BERT

Summary Information

Updated 1 hour ago
Added to GitGenius on November 21st, 2023
Created on October 25th, 2018
Open Issues/Pull Requests: 881 (+0)
Number of forks: 9,706
Total Stargazers: 39,874 (+0)
Total Subscribers: 994 (+0)
Detailed Description

The GitHub repository [google-research/bert](https://github.com/google-research/bert) contains the original implementation of Google's BERT (Bidirectional Encoder Representations from Transformers) model. BERT represented a significant advance in natural language processing, achieving state-of-the-art results on a wide range of tasks, including question answering, text classification, and named entity recognition. The repository provides the code, pre-trained models, and documentation necessary to use and extend BERT.

At its core, BERT is based on the Transformer architecture, specifically the encoder portion. Unlike previous language models that processed text sequentially, BERT utilizes a bidirectional approach, meaning it considers the context of a word from both directions – left and right – simultaneously. This is achieved through a novel pre-training strategy involving two unsupervised tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). MLM randomly masks some of the words in a sentence and trains the model to predict the masked words based on the surrounding context. NSP trains the model to predict whether two sentences are consecutive in the original document. This dual pre-training approach allows BERT to learn deep contextual representations of words and sentences.
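The MLM corruption rule can be sketched in a few lines. The 80%/10%/10% replacement split below comes from the BERT paper; the toy vocabulary and function name are illustrative, not the repository's actual code:

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]  # toy vocabulary

def mask_tokens(tokens, rng, mask_prob=0.15):
    """BERT-style MLM corruption: select ~15% of positions; of those,
    replace 80% with [MASK], 10% with a random token, and leave 10%
    unchanged. The model is trained to predict the original token at
    every selected position."""
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok              # prediction target at this position
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK      # 80%: mask out
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)  # 10%: random replacement
            # else 10%: keep the original token
    return corrupted, labels
```

The 10% "random" and 10% "unchanged" cases exist so the model cannot simply learn that `[MASK]` positions are the only ones worth predicting, which would hurt fine-tuning, where no `[MASK]` tokens appear.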

The repository's implementation is in TensorFlow; its README links to a separately maintained PyTorch port from Hugging Face rather than shipping one itself. Pre-trained checkpoints are released in two main sizes – BERT-Base (110 million parameters) and BERT-Large (340 million parameters) – with cased, uncased, and multilingual variants, allowing users to choose the model that best suits their computational resources and performance requirements. The code provides utilities for loading pre-trained checkpoints, fine-tuning them on downstream tasks, and evaluating their performance.
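The quoted parameter counts can be reproduced from the published hyperparameters (Base: 12 layers, hidden size 768; Large: 24 layers, hidden size 1024). A rough tally, assuming the standard feed-forward size of 4× the hidden size and ignoring task-specific output heads:

```python
def bert_param_count(vocab=30522, layers=12, hidden=768,
                     max_pos=512, type_vocab=2):
    """Approximate parameter count for a BERT encoder
    (defaults are BERT-Base; vocab size is the uncased WordPiece vocab)."""
    ffn = 4 * hidden
    # Embeddings: token + position + segment tables, plus one LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Per layer: Q, K, V, and output projections (weights + biases) ...
    attn = 4 * (hidden * hidden + hidden)
    # ... a two-layer feed-forward block ...
    ffn_params = hidden * ffn + ffn + ffn * hidden + hidden
    # ... and two LayerNorms (gamma, beta each)
    layer = attn + ffn_params + 2 * (2 * hidden)
    # Pooler: one dense layer applied to the [CLS] vector
    pooler = hidden * hidden + hidden
    return embeddings + layers * layer + pooler
```

With the defaults this lands at roughly 109.5M parameters, matching the "110 million" figure for BERT-Base; `layers=24, hidden=1024` gives roughly 335M, in line with the ~340M quoted for BERT-Large.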

Crucially, the repository also contains detailed documentation: the README links to the research paper outlining the model's architecture and training procedure, and walks through examples of using BERT for different NLP tasks. The documentation explains the key components of the model, such as the multi-layer Transformer encoder, the attention mechanism, and the pre-training objectives, and it emphasizes that fine-tuning BERT on task-specific datasets is essential for optimal performance. The code is well structured and documented, making it relatively accessible for researchers and developers to experiment with and build upon.
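The scaled dot-product attention at the heart of the encoder can be sketched in plain Python. This is an illustrative single-head version only; the repository's actual implementation is a multi-head TensorFlow version with learned projection matrices:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V are lists of row vectors; each query attends over all keys
    and returns a weighted average of the value vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)          # one weight per key, sums to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights sum to 1, each output row is a convex combination of the value rows; bidirectionality falls out of the fact that every position's query attends over keys from the entire sequence, both left and right.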

Furthermore, the repository has accumulated a large body of community contributions and discussion, reflecting the widespread adoption and impact of BERT. It is a foundational resource for anyone working with transformer-based language models and remains a vital reference point in the NLP landscape. The project's success has spurred countless derivative models and techniques, cementing BERT's legacy as a pivotal innovation.
