dolphin by bytedance

Description: The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

View bytedance/dolphin on GitHub ↗

Summary Information

Updated 6 minutes ago
Added to GitGenius on September 26th, 2025
Created on May 13th, 2025
Open Issues/Pull Requests: 73 (+0)
Number of forks: 742
Total Stargazers: 8,830 (+0)
Total Subscribers: 69 (+0)

Detailed Description

Dolphin, developed by ByteDance, is a multimodal document image parsing model. It accompanies the ACL 2025 paper "Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting". Document images are hard to parse because text paragraphs, figures, formulas, and tables are intertwined on the same page. According to the paper, existing approaches either assemble several specialized expert models or generate the content of a whole page autoregressively, and they suffer from integration overhead, efficiency bottlenecks, and degraded layout structure.

Dolphin follows an analyze-then-parse paradigm. In the first stage, the model analyzes the page and generates a sequence of layout elements in reading order. In the second stage, these heterogeneous elements serve as anchors: each one is paired with a task-specific prompt and fed back to the same model, so that the content of all elements can be parsed in parallel rather than one token stream for the whole page.

To train the model, the authors built a dataset of over 30 million samples covering parsing tasks at multiple granularities. The paper reports state-of-the-art results on both page-level and element-level benchmarks, and attributes the model's efficiency to its lightweight architecture and its parallel parsing mechanism.
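
The "heterogeneous anchor prompting" named in the paper title can be pictured as a two-stage pipeline: find the layout elements of a page in reading order, then parse each element with a prompt chosen by its type. The sketch below is only an illustration under stated assumptions. The names `Anchor`, `PROMPTS`, `analyze_layout`, `parse_element`, and `parse_page` are hypothetical and are not taken from the repository, the prompt strings are made up, and both stages are stubs standing in for model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """One layout element found on a page (hypothetical structure)."""
    kind: str    # element type, e.g. "text", "table", "formula"
    bbox: tuple  # (x0, y0, x1, y1) region on the page


# Hypothetical task-specific prompts, one per element type.
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse the table in this region.",
    "formula": "Transcribe the formula in this region.",
}


def analyze_layout(page):
    """Stage 1 stub: return layout elements in reading order.

    A real system would run a model on the page image; fixed
    anchors are returned here so the sketch is runnable.
    """
    return [
        Anchor("text", (0, 0, 100, 20)),
        Anchor("table", (0, 30, 100, 80)),
        Anchor("formula", (0, 90, 100, 110)),
    ]


def parse_element(page, anchor):
    """Stage 2 stub: pair the anchor with the prompt for its type.

    A real system would send the anchor's region and the prompt
    to the model and return the parsed content.
    """
    prompt = PROMPTS[anchor.kind]
    return f"[{anchor.kind}] {prompt}"


def parse_page(page):
    """Analyze the page, then parse all anchors concurrently."""
    anchors = analyze_layout(page)
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in input order, so the
        # output keeps the reading order produced by stage 1.
        return list(pool.map(lambda a: parse_element(page, a), anchors))
```

Calling `parse_page("page.png")` returns one result per anchor, in reading order. The thread pool here only stands in for the idea that elements can be processed independently once the layout is known; it is not a claim about how the repository parallelizes its second stage.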
