sebastianruder/NLP-progress

Description: Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.


Summary Information

Updated 21 minutes ago

Added to GitGenius on May 25th, 2026

Created on June 22nd, 2018

Open Issues & Pull Requests: 40 (+0)

Number of forks: 3,598

Total Stargazers: 22,956 (+0)

Total Subscribers: 1,245 (+0)

Issue Activity (beta)

Open issues: 3

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 1,380 days

Stale 30+ days: 2

Stale 90+ days: 2

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.


Repository Insights (GitGenius)

Median issue/PR response: 22.0 hours

Mean response time: 7.6 days

90th percentile: 14.3 days

Tracked items: 2

Most active contributors

Siddharth-Latthe-07 - 1 event, 1 issue
sebastianruder - 1 event, 1 issue

Related by overlapping contributors

Detailed Description

The NLP-progress repository serves as a comprehensive tracking system for state-of-the-art results and benchmark datasets across Natural Language Processing tasks. Maintained by Sebastian Ruder, the repository provides researchers and practitioners with a centralized reference point for understanding current performance levels on standard NLP benchmarks. The project is accessible both through its GitHub repository and via a dedicated website at nlpprogress.com, making it a widely referenced resource in the NLP research community.

The repository's scope is extensive, covering both foundational NLP tasks and contemporary challenges. For English, it tracks over forty distinct tasks including automatic speech recognition, constituency parsing, coreference resolution, dialogue systems, machine translation, named entity recognition, question answering, sentiment analysis, and summarization, among many others. Beyond English, the repository extends coverage to multiple languages including Vietnamese, Hindi, Chinese, French, Russian, Spanish, Portuguese, Korean, Nepali, Bengali, Persian, Turkish, German, and Arabic, with language-specific task documentation for each.

The organizational structure centers on detailed markdown files for each task, where researchers can find descriptions of benchmark datasets, evaluation metrics, annotated examples, and tables of state-of-the-art results sorted by performance. The repository emphasizes published results from peer-reviewed papers, with exceptions made for influential preprints. Contributors are encouraged to add implementation links, distinguishing between official and unofficial code implementations when available.
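To illustrate how one of these results tables can be consumed programmatically, here is a minimal Python sketch that parses a pipe-delimited markdown table and re-sorts it by a metric. The column names (`Model`, `F1`, `Paper / Source`, `Code`) and every row are hypothetical placeholders, not real results; actual task files use their own columns and metrics. Some metrics, such as error rates, are better when lower, which is why the sketch exposes a `higher_is_better` flag.

```python
def split_row(line: str) -> list[str]:
    """Split one markdown table row into trimmed cell strings."""
    line = line.strip().removeprefix("|").removesuffix("|")
    return [cell.strip() for cell in line.split("|")]


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited markdown table into one dict per data row."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = split_row(lines[0])
    # lines[1] is the |---|---| separator row, so data rows start at lines[2].
    return [dict(zip(header, split_row(ln))) for ln in lines[2:]]


def sort_by_metric(rows, metric, higher_is_better=True):
    """Return rows ordered best-first by a numeric metric column."""
    return sorted(rows, key=lambda r: float(r[metric]), reverse=higher_is_better)


# Hypothetical table: model names and scores are placeholders, not real results.
TABLE = """
| Model   | F1   | Paper / Source | Code       |
| ------- | ---- | -------------- | ---------- |
| Model A | 88.5 | placeholder    | Official   |
| Model B | 91.2 | placeholder    |            |
| Model C | 90.0 | placeholder    | Unofficial |
"""

rows = parse_markdown_table(TABLE)
for rank, row in enumerate(sort_by_metric(rows, "F1"), start=1):
    # An empty Code cell means no implementation link was recorded.
    print(rank, row["Model"], row["F1"], row["Code"] or "no code link")
```

Keeping the official/unofficial distinction as a plain column value means it survives parsing unchanged and can be filtered on later.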

The repository documents a straightforward contribution process. Users can edit task files directly through GitHub's web interface to add new results, or submit pull requests for entirely new tasks and datasets. The contribution guidelines specify that results should come from published papers, that datasets should have been evaluated in at least one published work beyond the paper that introduced them, and that code links should be provided when available. The repository also maintains a wish list of missing tasks, including bilingual dictionary induction, discourse parsing, and knowledge base population, which marks areas where community contributions would be particularly valuable.

The repository provides structured data export capabilities, allowing users to extract all tracked information into machine-readable JSON format with parsed tasks, descriptions, and state-of-the-art tables. This functionality supports programmatic access to the data for researchers building tools or conducting meta-analyses of NLP progress.
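A minimal sketch of that kind of programmatic access follows. It assumes the export is a JSON list of task objects, where each task has `task`, `datasets`, and optional `subtasks` keys, and each dataset keeps its results under `sota` and then `rows`. These field names are an assumption for illustration only; check the repository's export tooling for the actual schema. All values in the sample are hypothetical placeholders.

```python
import json
from collections import Counter
from typing import Iterator

# Hypothetical export sample: the schema (task/datasets/sota/rows/subtasks)
# is an assumption, and every value is a placeholder rather than a real result.
SAMPLE = """
[
  {
    "task": "Task X",
    "description": "placeholder",
    "datasets": [
      {
        "dataset": "Dataset 1",
        "sota": {
          "metrics": ["Accuracy"],
          "rows": [
            {"model_name": "Model A", "metrics": {"Accuracy": "90.0"}},
            {"model_name": "Model B", "metrics": {"Accuracy": "85.0"}}
          ]
        }
      }
    ],
    "subtasks": [
      {
        "task": "Subtask Y",
        "datasets": [
          {
            "dataset": "Dataset 2",
            "sota": {
              "metrics": ["F1"],
              "rows": [{"model_name": "Model C", "metrics": {"F1": "70.0"}}]
            }
          }
        ]
      }
    ]
  }
]
"""


def iter_sota_rows(tasks: list[dict]) -> Iterator[tuple[str, str, dict]]:
    """Yield (task name, dataset name, result row), recursing into subtasks."""
    for task in tasks:
        for dataset in task.get("datasets", []):
            for row in dataset.get("sota", {}).get("rows", []):
                yield task["task"], dataset["dataset"], row
        yield from iter_sota_rows(task.get("subtasks", []))


def rows_per_task(tasks: list[dict]) -> Counter:
    """Count how many result rows are recorded under each task name."""
    return Counter(task for task, _dataset, _row in iter_sota_rows(tasks))


tasks = json.loads(SAMPLE)
for task, dataset, row in iter_sota_rows(tasks):
    print(task, "|", dataset, "|", row["model_name"], row["metrics"])
print(rows_per_task(tasks))
```

With a real export, the inline sample would be replaced by `json.load` on the exported file. Walking `subtasks` recursively matters under the assumed schema because some results are nested below their parent task rather than listed at the top level.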

According to GitGenius activity tracking, the repository's median issue and pull request response latency is 22 hours across its two tracked items, with a mean of 182.3 hours (about 7.6 days). The gap between median and mean points to uneven response times rather than uniformly slow ones, although two items is too small a sample to support firm conclusions. The most active contributors tracked are Siddharth-Latthe-07 and sebastianruder. The repository shares contributors with other significant projects, including kaldi-asr/kaldi, streamlit/streamlit, and tensorflow/models, reflecting its position within a broader ecosystem of machine learning and NLP infrastructure projects. GitGenius classifies the repository under multiple domains: natural language processing, benchmarking, state-of-the-art tracking, research progress documentation, datasets, evaluation metrics, and performance comparison.
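The latency figures appear in different units on this page (hours in the description, days in the insights summary). This short Python check uses only the reported numbers to confirm that 182.3 hours matches the 7.6-day mean and to quantify how far the mean sits above the median.

```python
HOURS_PER_DAY = 24

median_hours = 22.0  # median response latency reported by GitGenius
mean_hours = 182.3   # mean response latency reported by GitGenius

mean_days = mean_hours / HOURS_PER_DAY
skew_ratio = mean_hours / median_hours  # how many medians fit in the mean

print(f"mean = {mean_days:.1f} days")       # mean = 7.6 days
print(f"mean/median = {skew_ratio:.1f}x")   # mean/median = 8.3x
```

A mean several times the median is the usual signature of a right-skewed distribution, where a small number of slow responses pull the average upward while the median stays low.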
