pinot by apache

Description: Apache Pinot - A realtime distributed OLAP datastore

Summary Information

Updated 36 minutes ago

Added to GitGenius on January 3rd, 2025

Created on May 19th, 2014

Open Issues & Pull Requests: 1,477 (+6)

Number of forks: 1,485

Total Stargazers: 6,104 (+0)

Total Subscribers: 224 (+0)

Issue Activity (beta)

Open issues: 1,064

New in 7 days: 0

Closed in 7 days: 2

Avg open age: 1,264 days

Stale 30+ days: 982

Stale 90+ days: 904

Recent activity

Opened in 7 days: 0

Closed in 7 days: 1

Comments in 7 days: 3

Events in 7 days: 6

Top labels

bug (323)
stale (290)
feature (273)
multi-stage (173)
good first issue (165)
help wanted (133)
ingestion (120)
enhancement (114)

Most active issues this week

#18867 [Bug] Possible Realtime consumer semaphore leak: orphaned `RealtimeSegmentDataManager` can block the successor segment indefinitely - 3 events / 1 comments
#13038 Fix the resource leak within the tests - 2 events / 1 comments
#13055 Do not automatically execute queries in the URL - 2 events / 1 comments
#13061 Row level TTLs in Pinot - 2 events / 1 comments
#13076 Automated deployment of schema and table configuration changes - 2 events / 1 comments

Repository Insights (GitGenius)

Median issue/PR response: 25.4 hours

Mean response time: 566.6 days

90th percentile: 2352.8 days

Tracked items: 1,298

Most active contributors

Jackie-Jiang - 1,657 events, 731 issues
xiangfu0 - 701 events, 353 issues
yashmayya - 257 events, 120 issues
ankitsultana - 194 events, 88 issues
gortiz - 169 events, 81 issues

Related by overlapping contributors

Detailed Description

Apache Pinot is a real-time distributed OLAP datastore engineered to deliver scalable, low-latency analytics on massive datasets. Written in Java, it was originally built by engineers at LinkedIn and Uber to power interactive real-time analytic applications. At LinkedIn, Pinot powers over 50 user-facing products, ingesting millions of events per second while serving more than 100,000 queries per second at millisecond latency. The system is designed to scale out horizontally with no fixed upper bound; performance is intended to stay constant as long as the cluster is sized for the data volume and the expected query throughput.

The platform supports dual ingestion modes, accepting batch data from sources like Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage, as well as streaming data from Apache Kafka, Apache Pulsar, and AWS Kinesis. This hybrid capability allows organizations to combine batch and streaming sources into unified tables for querying. Pinot also supports upsert operations during real-time ingestion, enabling at-scale data updates with consistency guarantees.
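The upsert behavior described above can be pictured with a small sketch. This is not Pinot's implementation; it only illustrates the general idea that, for each primary key, the record with the greatest comparison value is the one queries see. The `Event` class, its column names, and the tie-breaking rule (a later arrival wins on equal timestamps) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    order_id: str       # hypothetical primary key column
    status: str         # hypothetical payload column
    event_time_ms: int  # hypothetical comparison column


def apply_upserts(events: Iterable[Event]) -> Dict[str, Event]:
    """Keep, for each primary key, the record with the greatest comparison value.

    Assumption of this sketch: on equal comparison values the later arrival wins.
    """
    latest: Dict[str, Event] = {}
    for event in events:
        current = latest.get(event.order_id)
        if current is None or event.event_time_ms >= current.event_time_ms:
            latest[event.order_id] = event
    return latest


stream = [
    Event("o-1", "CREATED", 1_000),
    Event("o-2", "CREATED", 1_500),
    Event("o-1", "SHIPPED", 2_000),
    Event("o-1", "PACKED", 1_800),  # late arrival, older than SHIPPED: ignored
]

visible = apply_upserts(stream)
for key in sorted(visible):
    print(key, visible[key].status)
```

The out-of-order `PACKED` event does not overwrite `SHIPPED`, because its comparison value is lower. A real deployment would also have to handle partitioning and segment boundaries, which this sketch leaves out.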

Pinot's query capabilities center on a standard SQL interface, accessible through a built-in query editor and a REST API. The system can filter and aggregate petabyte-scale datasets with P90 latencies in the tens of milliseconds, making it suitable for interactive UI applications. It supports versatile joins, including arbitrary fact-to-dimension and fact-to-fact joins on petabyte-scale datasets. The storage architecture is column-oriented and uses compression schemes such as run-length and fixed-bit-length encoding.
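As a rough illustration of the REST interface, the sketch below posts a SQL statement to a broker's `/query/sql` endpoint and turns the returned result table into dictionaries. The broker address (`http://localhost:8099`) is an assumption for a local setup, and the helper names are hypothetical, not part of any Pinot client library.

```python
import json
import urllib.request
from typing import Any, Dict, List

BROKER_URL = "http://localhost:8099"  # assumption: a broker running locally


def build_query_request(sql: str, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Build a POST request carrying the SQL statement as a JSON body."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each row in the response's resultTable with its column names."""
    table = response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_query(sql: str, broker_url: str = BROKER_URL) -> List[Dict[str, Any]]:
    """Send the query to the broker and return the rows as dictionaries."""
    request = build_query_request(sql, broker_url)
    with urllib.request.urlopen(request, timeout=10) as reply:
        return rows_as_dicts(json.load(reply))
```

With a broker running, a call such as `run_query("SELECT country, COUNT(*) FROM pageviews GROUP BY country LIMIT 10")` would return one dictionary per result row; `pageviews` is a hypothetical table name. Error handling for failed queries is omitted to keep the sketch short.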

The indexing system is pluggable, offering multiple technologies including timestamp, inverted, StarTree, Bloom filter, range, text search, JSON, and geospatial indexes. Built-in multitenancy enables data management and security across isolated logical namespaces, supporting cloud-friendly resource allocation. The platform is cloud-native on Kubernetes with Helm charts providing horizontally scalable and fault-tolerant clustered deployments.
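To make one of these index types concrete, here is a minimal sketch of the idea behind an inverted index: each distinct column value maps to the row (document) IDs that contain it, so an equality filter becomes a lookup rather than a scan. This is a toy illustration, not Pinot's on-disk format, and the column names and data are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_inverted_index(column: Sequence[str]) -> Dict[str, List[int]]:
    """Map each distinct value to the sorted list of row IDs where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, value in enumerate(column):
        index[value].append(doc_id)
    return dict(index)


def filter_equals(index: Dict[str, List[int]], value: str) -> List[int]:
    """Row IDs matching `column = value`; empty if the value never occurs."""
    return index.get(value, [])


def filter_and(left: List[int], right: List[int]) -> List[int]:
    """Row IDs that satisfy both predicates."""
    return sorted(set(left) & set(right))


# Hypothetical columns of a six-row table.
country = ["US", "DE", "US", "IN", "DE", "US"]
device = ["ios", "ios", "android", "ios", "android", "ios"]

country_index = build_inverted_index(country)
device_index = build_inverted_index(device)

matches = filter_and(
    filter_equals(country_index, "US"),
    filter_equals(device_index, "ios"),
)
print(matches)  # [0, 5]
```

Production systems typically store the row-ID lists as compressed bitmaps so that intersections stay cheap on large segments; plain Python lists are used here only for readability.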

Community activity around the repository shows a median issue and pull request response time of 25.4 hours across 1,298 tracked items, although the mean (566.6 days) and the 90th percentile (2,352.8 days) point to a long tail of items that waited far longer. The most active contributor, Jackie-Jiang, has logged 1,657 events, followed by xiangfu0 with 701 events and yashmayya with 257 events. The most common issue labels are bug (323), stale (290), and feature (273). The repository shares overlapping contributors with major projects including microsoft/vscode, microsoft/typescript, and rust-lang/rust, suggesting some cross-pollination with the broader open-source ecosystem.

Pinot is particularly well suited to real-time OLAP queries on immutable data that need fast aggregations and analytics. It excels at querying time-series data with many dimensions and metrics. The system is designed for workloads that require both real-time stream ingestion and batch processing while maintaining consistently low query latency. Pinot is built with Maven; the pinot-fastdev profile provides an optimized development build for faster iteration.
