MiniCPM-V by OpenBMB

Description: A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone

Summary Information

Updated 1 hour ago

Added to GitGenius on May 21st, 2026

Created on January 29th, 2024

Open Issues & Pull Requests: 53 (-1)

Number of forks: 2,016

Total Stargazers: 25,792 (+1)

Total Subscribers: 165 (+0)

Issue Activity (beta)

Open issues: 33

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 64 days

Stale 30+ days: 31

Stale 90+ days: 17

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

question (158)
feature (21)
Finetune (18)
inference (10)
documentation (5)
duplicate (3)
llamacpp (3)
SPECIAL ATTENTION (1)

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 3.7 hours

Mean response time: 6.0 days

90th percentile: 12.5 days

Tracked items: 919

Most active contributors

tc-mb - 655 events, 264 issues
Cuiunbo - 571 events, 343 issues
LDLINGLINGLING - 253 events, 172 issues
iceflame89 - 240 events, 138 issues
qyc-98 - 225 events, 128 issues

Detailed Description

MiniCPM-V is a series of multimodal large language models designed for efficient deployment on mobile and edge devices, enabling strong performance in image, video, and text understanding on phones and other resource-constrained platforms. The repository, maintained by OpenBMB, represents a significant effort to bring advanced vision-language capabilities to consumer devices without requiring cloud infrastructure.

The current flagship model, MiniCPM-V 4.6, contains only 1.3 billion parameters yet surpasses larger models like Gemma4-E2B-it in performance while achieving approximately 1.5 times higher token throughput than Qwen3.5-0.8B. This efficiency is achieved through an intra-ViT early compression technique derived from LLaVA-UHD v4, which reduces visual encoding computation costs by more than 50 percent. The model supports mixed 4x and 16x visual token compression rates, allowing flexible performance-efficiency trade-offs depending on task requirements. MiniCPM-V 4.6 can be deployed across iOS, Android, and HarmonyOS platforms with open-sourced edge adaptation code.
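The trade-off between the 4x and 16x compression rates can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions: the patch-token count of 1024 and the helper name `compressed_token_count` are hypothetical, and the real model performs its compression inside the ViT, so actual token counts depend on image resolution and the model's own configuration.

```python
import math


def compressed_token_count(num_patch_tokens: int, rate: int) -> int:
    """Tokens remaining after compressing by an integer rate, rounded up."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(num_patch_tokens / rate)


# Hypothetical patch-token count for one image (an assumption for
# illustration; not taken from the MiniCPM-V configuration).
patch_tokens = 1024

for rate in (4, 16):
    remaining = compressed_token_count(patch_tokens, rate)
    print(f"{rate}x compression: {remaining} visual tokens passed to the LLM")
```

Under these assumptions, the higher rate leaves a quarter as many visual tokens as the lower one, which reduces the language model's workload at the cost of visual detail.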

The repository also maintains MiniCPM-o 4.5, a 9-billion-parameter omnimodal model that extends capabilities toward real-time end-to-end interaction. This model approaches Gemini 2.5 Flash performance in vision and speech tasks while supporting full-duplex multimodal live streaming, meaning the input streams (video and audio) and output streams (speech and text) do not block each other. This architecture enables simultaneous seeing, listening, and speaking in real-time conversations, as well as proactive interactions such as automated reminders.

Community engagement around the repository is substantial. GitGenius tracks 919 issues and pull requests, with a median response latency of 3.7 hours, a mean latency of 143.8 hours (about 6.0 days), and a 90th-percentile latency of 12.5 days. The most common issue label is question with 158 tracked items, followed by feature with 21 items and Finetune with 18 items. The top contributors tc-mb, Cuiunbo, and LDLINGLINGLING account for 655, 571, and 253 events respectively. According to GitGenius, the repository's contributor base overlaps with major projects including Microsoft's VSCode and TypeScript repositories and the Rust language repository, which suggests involvement from experienced systems developers.
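A median of a few hours alongside a mean of several days is typical of a skewed distribution, where most items get a fast reply and a few wait a very long time. The sketch below shows the effect with made-up numbers; the `response_hours` values and the `summarize` helper are hypothetical and are not GitGenius data.

```python
import statistics

# Hypothetical response times in hours (illustrative only, not GitGenius data).
# Most items are answered quickly; two outliers wait for weeks.
response_hours = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 12.0, 300.0, 900.0]


def summarize(hours):
    """Return (median, mean) of a list of response times in hours."""
    return statistics.median(hours), statistics.mean(hours)


median_h, mean_h = summarize(response_hours)
print(f"median: {median_h:.1f} h")
print(f"mean:   {mean_h:.1f} h ({mean_h / 24:.1f} days)")
```

In this toy data set the two slow items dominate the mean while leaving the median untouched, which is why the median is usually the better indicator of a typical response.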

The project has achieved significant recognition and integration milestones. MiniCPM-V 4.6 was merged into Ollama's official model library, and both MiniCPM-V 4.6 and MiniCPM-o 4.5 now have API services available with public free API keys. The models have topped GitHub Trending and Hugging Face Trending multiple times. Integration support extends across major inference frameworks including llama.cpp, vLLM, and LLaMA-Factory, with ongoing work for additional frameworks like SGLang.

The repository is classified across multiple domains including multimodal models, vision-language systems, large language models, image understanding, text generation, AI assistants, deep learning, natural language processing, visual reasoning, and machine learning. Documentation includes technical reports, API specifications, and a comprehensive cookbook for diverse user scenarios. The project maintains bilingual support with both English and Chinese documentation, and provides community channels through Discord and Feishu for user support and collaboration.
