Repository Issue Activity (beta)

deepseek-ai/FlashMLA

Current issue state, recent activity, and per-issue timelines from the indexed issue data.

Open Issues
66
New in 7 Days
0
Closed in 7 Days
0
Average Open Age
266 days
Stale 30+ Days
64
Stale 90+ Days
60
Last 2 Weeks
Date        Opened  Closed  Comments  Events  Open Backlog
2026-06-11  0       0       0         0       0
2026-06-10  0       0       0         0       0
2026-06-09  0       0       0         0       0
2026-06-08  0       0       0         0       0
2026-06-07  0       0       0         0       0
2026-06-06  0       0       0         0       0
2026-06-05  0       0       0         0       0
2026-06-04  0       0       0         0       0
2026-06-03  0       0       0         0       0
2026-06-02  0       0       0         0       0
2026-06-01  0       0       1         1       1
2026-05-31  0       0       0         0       0
2026-05-30  0       0       0         0       0
2026-05-29  0       0       0         0       0
This Week

Opened: 0

Closed: 0

Comments: 0

Events: 0

Top Labels

No label distribution is available yet.

Issue Explorer
Issue | Author | State | Labels | Comments | Reactions | Updated

#66 Ampere architecture FlashMLA bring-up

Opened 1 year ago
pzhao-eng
open
No labels
Comments: 2, Reactions: 9, Updated: 10 days ago

#169 dual gemm

Opened 3 months ago
WalrusWTQ
open
No labels
Comments: 1, Reactions: 0, Updated: 21 days ago

#180 Make FlashMLA a libtorch and cpython stable extension

Opened 1 month ago
janeyx99
open
No labels
Comments: 0, Reactions: 0, Updated: 1 month ago

#179 Do you open to add SM80 support

Opened 1 month ago
haosdent
open
No labels
Comments: 0, Reactions: 0, Updated: 1 month ago

#149 [Question] DSA VS MLA Prefill Benchmark On H100

Opened 5 months ago
ZavierXing
open
No labels
Comments: 1, Reactions: 0, Updated: 2 months ago

#171 CUTLASS Internal Error during run on SM100 with long sequences (seq_len=1M)

Opened 3 months ago
For-rest2005
closed - completed
No labels
Comments: 0, Reactions: 0, Updated: 2 months ago

#172 Use `pyproject.toml`

Opened 3 months ago
sakgoyal
open
No labels
Comments: 0, Reactions: 0, Updated: 3 months ago

#168 flash_mla.flash_mla_with_kvcache have plans to support FP8 for the q data in the future or FP8 insufficient to support model accuracy?

Opened 3 months ago
zhangfengwei
open
No labels
Comments: 0, Reactions: 0, Updated: 3 months ago

#166 [Question]Questions regarding INTERLEAVE vs. SW128 Layouts for SM90 Sparse Attention Decode

Opened 4 months ago
pengwubj
open
No labels
Comments: 0, Reactions: 0, Updated: 4 months ago

#125 Can MHA be used in the DSA prefill stage?

Opened 6 months ago
starwang1024
open
No labels
Comments: 1, Reactions: 1, Updated: 4 months ago

#165 [Question] why not using FP8 rope in sparse FP8 decoding?

Opened 4 months ago
lyppg
open
No labels
Comments: 0, Reactions: 0, Updated: 4 months ago

#164 mla

Opened 4 months ago
TZWX-0
open
No labels
Comments: 0, Reactions: 0, Updated: 4 months ago

#161 [CUDA/CUTLASS] Improvements for varlen option derivation, input validation, can_implement errors, and workspace handling

Opened 4 months ago
red1239109-cmd
open
No labels
Comments: 0, Reactions: 0, Updated: 4 months ago

#159 RuntimeError: CUBLAS_STATUS_INVALID_VALUE in ref_sparse_attn_decode on H800 (Hopper)

Opened 5 months ago
Socratesa
closed - completed
No labels
Comments: 1, Reactions: 0, Updated: 4 months ago

#155 [Question] What is MODEL1?

Opened 5 months ago
gary-wjc
open
No labels
7274 months ago

#158 [Bug/Correctness] Hardcoded device_id=0 + missing CUDAGuard can break multi-GPU correctness (wrong hw_info / stream mismatch)

Opened 5 months ago
red1239109-cmd
open
No labels
Comments: 2, Reactions: 0, Updated: 5 months ago

#153 Compilation Error: exceeds maximum register limit.

Opened 5 months ago
GrateVoyage
closed - completed
No labels
Comments: 2, Reactions: 0, Updated: 5 months ago

#154 Missing the included header file: #include <span>

Opened 5 months ago
GrateVoyage
closed - completed
No labels
Comments: 2, Reactions: 0, Updated: 5 months ago

#148 FlashMLA for Blackwell architecture (B200)

Opened 5 months ago
abdul7mohsen
open
No labels
Comments: 0, Reactions: 1, Updated: 5 months ago

#126 does flash_mla_with_kvcache work only in paged mode?

Opened 6 months ago
vince62s
open
No labels
Comments: 0, Reactions: 0, Updated: 6 months ago

#119 Built FlashMLA for Windows and Nvidia sm_120 (workstation/50s) cards

Opened 7 months ago
IISuperluminaLII
open
No labels
Comments: 4, Reactions: 0, Updated: 6 months ago

#124 support for rtx 6000 sm120

Opened 6 months ago
fernandaspets
open
No labels
Comments: 0, Reactions: 1, Updated: 6 months ago

#121 Build error on cuda13 arm64

Opened 7 months ago
icavanyu
open
No labels
Comments: 1, Reactions: 0, Updated: 7 months ago

#118 Why is it much slower than Tilelang?

Opened 8 months ago
zzyplaybasketball
closed - completed
No labels
Comments: 2, Reactions: 0, Updated: 7 months ago

#48 Why warp specialization is faster than the older traditional style?

Opened 1 year ago
sleepwalker2017
open
No labels
Comments: 2, Reactions: 0, Updated: 7 months ago

1–25 of 100