Repository Issue Activity (beta feature)

bigcode-project/bigcode-dataset

Current issue state, recent activity, and per-issue timelines from the indexed issue data.

Open Issues: 10
New in 7 Days: 0
Closed in 7 Days: 0
Average Open Age: 862 days
Stale 30+ Days: 10
Stale 90+ Days: 9
Last 14 days
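Date | Opened | Closed | Comments | Events | Open Backlog
2026-04-09 | 0 | 0 | 0 | 0 | 0
2026-04-10 | 0 | 0 | 0 | 0 | 0
2026-04-11 | 0 | 0 | 0 | 0 | 0
2026-04-12 | 0 | 0 | 0 | 0 | 0
2026-04-13 | 0 | 0 | 0 | 0 | 0
2026-04-14 | 0 | 0 | 0 | 0 | 0
2026-04-15 | 0 | 0 | 0 | 0 | 0
2026-04-16 | 0 | 0 | 0 | 0 | 0
2026-04-17 | 0 | 0 | 0 | 0 | 0
2026-04-18 | 0 | 0 | 0 | 0 | 0
2026-04-19 | 0 | 0 | 0 | 0 | 0
2026-04-20 | 0 | 0 | 0 | 0 | 0
2026-04-21 | 0 | 0 | 0 | 0 | 0
2026-04-22 | 0 | 0 | 0 | 0 | 0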
This week

Opened: 0

Closed: 0

Comments: 0

Events: 0

Top labels
help wanted (8)
TF: Dataset Curation and Filtering (7)
TF: PII redaction (3)
TF: The Stack 1.1 (2)
good first issue (2)
question (2)
TF: Dataset index (1)
TF: StackOverflow (1)
Issue explorer
Issue | State | Labels | Comments | Reactions | Updated

#70 [Security Alert] Exposed API key(s) detected: AWS Access Key

Opened by hdhdn 1 month ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 1 month ago

#68 During your processing, have you ever encountered the need to extract part of the code? How was it handled?

Opened by cistinej 2 years ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 2 years ago

#65 Most CMake files missed when categorizing by extension

Opened by mdewing 2 years ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 2 years ago

#62 Baidu Cloud link to the cloud cleaned database?

Opened by willshion 2 years ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 2 years ago

#59 When I do pii_inference, cannot load bigcode/bigcode-encoder-pii-ner-v2

Opened by RuochenLowes 3 years ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#55 Some file extensions excluded from the published dataset (Racket)

Opened by flobbit1 3 years ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#54 HuggingFace Need Data Access Approval

Opened by heoun 3 years ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#53 From GH Archive to bigcode/the-stack-github-issues

Opened by yunzheng-r 3 years ago
open
No labels
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#44 Question: File Counts and Dataset Size

Opened by darien-schettler 3 years ago
open
No labels
Comments: 1 | Reactions: 0 | Updated: 3 years ago

#35 Deduplication also removes data < ngram_size

Opened by cceyda 3 years ago
closed - completed
No labels
Comments: 3 | Reactions: 0 | Updated: 3 years ago

#13 Build StackerFlow datasets

Opened by lvwerra 3 years ago
closed - completed
help wanted
TF: StackOverflow
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#33 Create text-code pairs from Jupyter Notebooks

Opened by loubnabnl 3 years ago
closed - completed
No labels
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#32 Define filters for git commits

Opened by lvwerra 3 years ago
closed - completed
No labels
Comments: 1 | Reactions: 0 | Updated: 3 years ago

#31 Define filters for cleaning GitHub issues

Opened by lvwerra 3 years ago
closed - completed
No labels
Comments: 1 | Reactions: 0 | Updated: 3 years ago

#30 Run language detection GitHub issues

Opened by lvwerra 3 years ago
closed - completed
No labels
Comments: 5 | Reactions: 0 | Updated: 3 years ago

#28 NER models for PII

Opened by loubnabnl 3 years ago
closed - completed
No labels
Comments: 0 | Reactions: 2 | Updated: 3 years ago

#27 Refactor PII Code

Opened by loubnabnl 3 years ago
closed - completed
No labels
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#16 Decontaminate pretraining dataset from evaluation benchmarks

Opened by lvwerra 3 years ago
closed - completed
help wanted
TF: Dataset Curation and Filtering
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#15 Build dataset index

Opened by lvwerra 3 years ago
closed - completed
help wanted
TF: Dataset index
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#12 Create dataset with GitHub metadata

Opened by lvwerra 3 years ago
closed - completed
help wanted
TF: Dataset Curation and Filtering
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#3 Suggest datasets for Code Dataset Catalogue

Opened by lvwerra 4 years ago
closed - completed
good first issue
help wanted
Comments: 7 | Reactions: 0 | Updated: 3 years ago

#2 Which languages to include?

Opened by lvwerra 4 years ago
closed - completed
question
Comments: 20 | Reactions: 0 | Updated: 3 years ago

#6 Parse code dataset into AST

Opened by harm-devries 4 years ago
closed - completed
No labels
Comments: 3 | Reactions: 0 | Updated: 3 years ago

#19 Create dataset with git commits

Opened by lvwerra 3 years ago
closed - completed
TF: Dataset Curation and Filtering
Comments: 0 | Reactions: 0 | Updated: 3 years ago

#34 Convert Jupyter Notebooks to scripts

Opened by loubnabnl 3 years ago
closed - completed
No labels
Comments: 0 | Reactions: 1 | Updated: 3 years ago

Showing issues 1–25 of 40