minicpm-o
by
openbmb

Description: A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone

View openbmb/minicpm-o on GitHub ↗

Summary Information

Updated 2 hours ago
Added to GitGenius on September 5th, 2025
Created on January 29th, 2024
Open Issues/Pull Requests: 81 (+0)
Number of forks: 1,845
Total Stargazers: 23,901 (+1)
Total Subscribers: 164 (+0)
Detailed Description

The `minicpm-o` repository, developed by OpenBMB, hosts the MiniCPM-o and MiniCPM-V series of efficient multimodal large language models (MLLMs) designed for end-side deployment. The models take images, video, and text as input, and the omni-modal MiniCPM-o line additionally accepts and produces speech. The project's aim is to bring strong multimodal capabilities, including vision understanding, real-time speech conversation, and multimodal live streaming, to consumer hardware such as phones and laptops rather than restricting them to datacenter-scale GPUs.

At the heart of the project is MiniCPM-o 2.6, the series' flagship omni-modal model with 8 billion total parameters. It is built in an end-to-end fashion from SigLip-400M (vision encoder), Whisper-medium-300M (audio encoder), ChatTTS-200M (speech decoder), and Qwen2.5-7B (language backbone), so all modalities are handled by a single model rather than stitched together through separate pipelines. According to the project's reported benchmarks, it reaches GPT-4o-level performance across vision, speech, and multimodal live-streaming evaluations. Importantly, the repository emphasizes the model's relatively small size, which makes it feasible to run with quantization on consumer-grade hardware, unlike many larger, more demanding MLLMs.
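
As a concrete starting point, the sketch below shows single-image chat through Hugging Face `transformers`, following the chat-style interface shown on the model card. The model ID, dtype, and keyword arguments are illustrative and may differ between releases (the omni-modal checkpoints, for instance, also expose initialization flags such as `init_audio` and `init_tts`):

```python
# Minimal sketch: single-image chat with MiniCPM-o 2.6 via Hugging Face
# transformers. Argument names follow the model card but may vary by release.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-o-2_6"
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,      # the checkpoint ships custom modeling code
    attn_implementation="sdpa",  # or "flash_attention_2" if installed
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # any local test image
msgs = [{"role": "user", "content": [image, "What is in this image?"]}]

# model.chat handles image preprocessing, prompt templating, and decoding.
answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)
```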

The repository documents a broad capability set. For vision, the models handle single images, multi-image inputs, and video, with strong OCR; images with arbitrary aspect ratios and up to 1.8 million pixels are supported through LLaVA-UHD style high-resolution encoding. For speech, MiniCPM-o 2.6 offers bilingual (Chinese and English) real-time conversation with configurable voices, along with voice cloning and control over emotion, speed, and style. For live streaming, the model accepts continuous video and audio streams and responds to spoken queries in real time. The series also applies RLAIF-V alignment to reduce hallucination and encourage trustworthy behavior.
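
To illustrate the speech side, the fragment below extends the same `chat` interface to spoken output, reusing `model` and `tokenizer` from the previous sketch. The `use_tts_template`, `generate_audio`, and `output_audio_path` parameters follow the MiniCPM-o 2.6 model card; treat them as assumptions to verify against the release you are running:

```python
# Sketch: text-in, speech-out with MiniCPM-o 2.6. Parameter names are taken
# from the model card and may differ across releases.
model.init_tts()  # load the TTS head before requesting audio output

msgs = [{"role": "user", "content": ["Give me a one-sentence weather report."]}]
answer = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    use_tts_template=True,         # apply the speech-oriented prompt template
    generate_audio=True,           # produce a waveform alongside the text
    output_audio_path="reply.wav", # where the spoken reply is written
)
print(answer)  # text transcript of the spoken reply
```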

The `minicpm-o` repository isn't just about the models; it also covers the surrounding tooling. The README documents multiple inference paths: Hugging Face `transformers` for GPU inference, llama.cpp and Ollama for efficient CPU and on-device inference with int4 and GGUF quantized weights, and vLLM for high-throughput serving. Fine-tuning is supported both through the project's own scripts and through frameworks such as LLaMA-Factory, and Gradio-based demos let users stand up a local WebUI. The repository also includes detailed instructions and documentation to guide users through inference, fine-tuning, and deployment.
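
For the serving path, a vLLM deployment exposes an OpenAI-compatible endpoint, so a client can be as small as the sketch below. The endpoint URL, model name, and image URL are placeholders, and it assumes a server started along the lines of `vllm serve openbmb/MiniCPM-o-2_6 --trust-remote-code`:

```python
# Sketch: querying a vLLM server that hosts a MiniCPM-o/V checkpoint through
# the OpenAI-compatible chat API. URL and model name are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="openbmb/MiniCPM-o-2_6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/demo.jpg"}},  # placeholder
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }],
)
print(resp.choices[0].message.content)
```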

Finally, the project actively encourages community contributions and collaboration. The code is open source, allowing researchers to build on and extend the models, and the weights are released on Hugging Face for experimentation. The developers maintain a changelog of releases and a roadmap for future development, with the stated goal of making powerful multimodal AI accessible beyond the datacenter and fostering an ecosystem around efficient, end-side MLLMs.
