iceberg
by
apache

Description: Apache Iceberg


Summary Information

Updated 36 minutes ago

Added to GitGenius on January 5th, 2025

Created on November 19th, 2018

Open Issues & Pull Requests: 808 (+1)

Number of forks: 3,390

Total Stargazers: 9,033 (+0)

Total Subscribers: 179 (+0)

Issue Activity (beta)

Open issues: 391

New in 7 days: 25

Closed in 7 days: 7

Avg open age: 139 days

Stale 30+ days: 304

Stale 90+ days: 150

Recent activity

Opened in 7 days: 2

Closed in 7 days: 4

Comments in 7 days: 6

Events in 7 days: 15

Top labels

stale (1,803)
bug (767)
improvement (506)
question (321)
good first issue (135)
docs (90)
proposal (79)
python (68)

Most active issues this week

#16418 Spark: rewrite_table_path fails on second run with FileAlreadyExistsException for position delete files - 3 events / 1 comment
#14754 When spark writes to iceberg, only one executor is working, resulting in an oom - 2 events / 1 comment
#14790 ParallelIterable performance issues caused by O(N) queue size complexity - 2 events / 1 comment
#14857 Concurrency security issue during the process of rename metadata for metajson file - 2 events / 1 comment
#14864 Is there any way to quickly query count(*) and the min/max of a specified column? - 2 events / 1 comment


Repository Insights (GitGenius)

Median issue/PR response: 0.0 hours

Mean response time: 112.5 days

90th percentile: 382.6 days

Tracked items: 2,069

Most active contributors

nastra - 545 events, 270 issues
RussellSpitzer - 480 events, 284 issues
manuzhang - 472 events, 209 issues
pvary - 338 events, 159 issues
kevinjqliu - 250 events, 101 issues


Detailed Description

Apache Iceberg is a high-performance table format for massive analytic datasets, developed under the Apache Software Foundation. It provides a single, shared way to manage large-scale tables so that multiple query engines, including Spark, Trino, Flink, Presto, Hive, and Impala, can safely read and modify the same tables concurrently. The core Java library in this repository is the reference implementation of Iceberg and the basis for its engine integrations.
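Concurrent writes are safe because Iceberg commits use optimistic concurrency: a writer builds a new metadata file from the version it read, and the catalog atomically swaps the table's pointer to the current metadata file, rejecting the swap if another writer committed first. The Java sketch below models only that swap-and-retry loop with an in-memory `AtomicReference`. `MetadataPointer`, `commit`, and `currentLocation` are hypothetical names, not Iceberg API; real catalogs (for example the Hive metastore) implement the atomic swap in their own ways.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical, simplified model of a catalog entry: a single atomically
// swappable pointer to the table's current metadata file location.
final class MetadataPointer {
    private final AtomicReference<String> current;

    MetadataPointer(String initialMetadataLocation) {
        this.current = new AtomicReference<>(initialMetadataLocation);
    }

    String currentLocation() {
        return current.get();
    }

    // Optimistic commit: read the base metadata, derive new metadata from it,
    // then swap the pointer only if no other writer committed in the meantime.
    String commit(UnaryOperator<String> writeNewMetadata, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String base = current.get();
            String next = writeNewMetadata.apply(base);
            if (current.compareAndSet(base, next)) {
                return next; // our metadata is now the table's current version
            }
            // Another writer won the race: loop and re-apply on top of its commit.
        }
        throw new IllegalStateException(
            "commit failed after " + (maxRetries + 1) + " attempts");
    }
}
```

Because readers always follow the pointer to one complete metadata file, they see either the old table state or the new one, never a mix. The sketch omits the conflict validation that some real Iceberg operations perform before retrying.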

The repository is organized into modules that handle different aspects of table management and engine integration. The foundational modules are iceberg-common for shared utilities, iceberg-api for the public Iceberg API, and iceberg-core, which contains the implementations of that API along with support for Avro data files. Optional modules add support for other file formats: iceberg-parquet handles Parquet-backed tables, iceberg-arrow reads Parquet data into Arrow memory, and iceberg-orc provides ORC file support. The iceberg-hive-metastore module implements tables backed by the Hive metastore through its Thrift client, while iceberg-data lets JVM applications work with tables directly.

Engine integrations live in dedicated modules. The iceberg-spark module implements Spark's DataSource V2 API, with a submodule for each supported Spark version; iceberg-flink contains the Apache Flink integration classes; and iceberg-mr provides InputFormats and related classes for integrating with Apache Hive. The project supports multiple versions of each engine, and the Multi-Engine Support documentation lists which versions are compatible.

The repository shows substantial community engagement and active maintenance. GitGenius tracking reports a median issue and pull request response time of zero hours but a mean of approximately 2,703 hours (about 112.6 days). A median that low next to a mean that high points to a heavily skewed distribution: most items receive a first response almost immediately, while a long tail waits for months, which the 90th percentile of 382.6 days also reflects. Among tracked items, the most common labels are stale (878), bug (732), and improvement (494), reflecting ongoing maintenance alongside feature work. The most active contributors, nastra, RussellSpitzer, and manuzhang, account for 544, 480, and 472 tracked events respectively, making them the most visible participants in the tracked activity.
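How a zero-hour median and a multi-month mean can coexist is easiest to see with a few numbers. The Java sketch below computes both statistics and converts hours to days; the sample values are hypothetical toy data, not GitGenius measurements, and `ResponseStats` is an illustrative name. GitGenius's exact methodology is not documented here, so this only shows the arithmetic, not how the reported figures were produced.

```java
import java.util.Arrays;

// Illustrative helpers for summarizing response times measured in hours.
final class ResponseStats {
    // Median: middle of the sorted values (average of the two middles when n is even).
    static double median(double[] hours) {
        if (hours.length == 0) throw new IllegalArgumentException("no data");
        double[] sorted = hours.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Mean: pulled upward by a few very large values, unlike the median.
    static double mean(double[] hours) {
        if (hours.length == 0) throw new IllegalArgumentException("no data");
        return Arrays.stream(hours).average().getAsDouble();
    }

    static double hoursToDays(double hours) {
        return hours / 24.0;
    }
}
```

With ten hypothetical response times of which seven are 0 and the rest are 500, 9,000, and 18,000 hours, `median` returns 0 while `mean` returns 2,750 hours. Likewise, `hoursToDays(2703)` gives 112.625, matching the roughly 112.5-day mean shown in the summary above.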

The project builds with Gradle and requires Java 17 or 21. The build supports the usual operations: a full build with tests, a faster build that skips tests, and automatic code-style fixes through the spotlessApply task. Running the tests requires Docker, and the documentation notes extra configuration for Docker Desktop users on macOS and for systems with SELinux enabled.

Iceberg's format specification is documented as stable; each version adds new features while maintaining backward compatibility. Issues and pull requests are tracked on GitHub, contributions are welcome as pull requests, and community discussion takes place on the dev mailing list. Beyond the Java reference implementation, the Iceberg ecosystem includes implementations in Go, Python, Rust, and C++, each maintained in its own repository under the Apache Iceberg project. The project's listed topics include schema evolution, Hadoop compatibility, schema management, ACID transactions, partitioning, metadata management, and data lakes, which together cover the core needs of scalable analytics on large shared tables.
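Schema evolution in Iceberg works because columns are tracked by unique field IDs rather than by name or position: a rename only changes metadata, and a column that is dropped and later re-added under the same name gets a new ID, so it does not pick up the old column's values. The Java sketch below models that idea with plain maps; `IdSchema` and its methods are hypothetical and greatly simplified compared with Iceberg's real `Schema` and schema-update API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical schema that maps unique field IDs to current column names.
final class IdSchema {
    private final Map<Integer, String> namesById = new LinkedHashMap<>();
    private int lastId = 0;

    // New columns always get a fresh ID; IDs are never reused.
    int addColumn(String name) {
        int id = ++lastId;
        namesById.put(id, name);
        return id;
    }

    // Renaming touches only the schema; data files keep referring to the ID.
    void renameColumn(int id, String newName) {
        if (!namesById.containsKey(id)) {
            throw new IllegalArgumentException("unknown field id " + id);
        }
        namesById.put(id, newName);
    }

    void dropColumn(int id) {
        namesById.remove(id);
    }

    // Project a data-file row (keyed by field ID) onto the current column names.
    // IDs missing from the file read as null.
    Map<String, Object> read(Map<Integer, Object> fileRow) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : namesById.entrySet()) {
            out.put(e.getValue(), fileRow.get(e.getKey()));
        }
        return out;
    }
}
```

For example, after renaming `ts` to `event_time`, an old row still resolves under `event_time`. If `ts` is then dropped and added again, the new column reads as null for that old row instead of returning stale values.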
