delta by delta-io

Description: An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Scala, Java, Rust, Ruby, and Python

Summary Information

Updated 35 minutes ago

Added to GitGenius on May 5th, 2024

Created on April 22nd, 2019

Open Issues & Pull Requests: 1,535 (-1)

Number of forks: 2,134

Total Stargazers: 8,900 (+0)

Total Subscribers: 237 (+0)

Issue Activity (beta)

Open issues: 707

New in 7 days: 2

Closed in 7 days: 0

Avg open age: 638 days

Stale 30+ days: 679

Stale 90+ days: 629

Recent activity

Opened in 7 days: 1

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 3

Top labels

enhancement (512)
bug (432)
kernel (89)
good first issue (73)
acknowledged (66)
question (28)
delta-kernel (16)
kernel-spark (14)

Most active issues this week

#7140 [Feature Request] Implement kernel-based dsv2 delta streaming sink - 3 events / 0 comments
#7136 [BUG][Spark] Filter pruning (data skipping) - 1 event / 0 comments

Repository Insights (GitGenius)

Median issue/PR response: 0.0 hours

Mean response time: 149.5 days

90th percentile: 671.2 days

Tracked items: 769

Most active contributors

AnudeepKonaboina - 225 events, 108 issues
felipepessoto - 146 events, 83 issues
vkorukanti - 136 events, 80 issues
raveeram-db - 106 events, 55 issues
allisonport-db - 100 events, 52 issues

Related by overlapping contributors

Detailed Description

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture by combining data lake and data warehouse capabilities. Written primarily in Scala, it provides ACID transaction support and integrates with multiple compute engines including Apache Spark, PrestoDB, Flink, Trino, and Hive. The project offers APIs for Scala, Java, Rust, Ruby, and Python, making it accessible to a wide range of development environments and use cases.

The repository implements a transaction protocol that guarantees serializability for concurrent reads and writes. Delta Lake achieves ACID guarantees through specific requirements on underlying storage systems, including atomic visibility of files, mutual exclusion for writers, and consistent directory listings. The framework maintains backward compatibility with all Delta Lake tables, ensuring that newer versions can always read tables written by older versions, though forward compatibility is not guaranteed as new protocol features are introduced.
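The mutual-exclusion requirement can be illustrated with a toy model. The sketch below assumes a local filesystem and uses exclusive file creation so that only one writer can claim a given commit version; the helper names (`commit_path`, `try_commit`, `commit_with_retry`) are hypothetical and are not part of any Delta Lake API. It omits what a real writer also needs, such as atomic visibility of the full file contents and logical conflict checks against the winning commit.

```python
import json
import os
import tempfile


def commit_path(log_dir: str, version: int) -> str:
    # Commit files in a Delta log are named by zero-padded 20-digit versions.
    return os.path.join(log_dir, f"{version:020d}.json")


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to claim `version`; return False if another writer already did."""
    try:
        # O_EXCL makes creation fail when the file exists (put-if-absent).
        # Note: this toy does not make the contents atomically visible.
        fd = os.open(commit_path(log_dir, version),
                     os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


def commit_with_retry(log_dir: str, actions: list, max_attempts: int = 10) -> int:
    """Pick the next free version and commit, retrying if a rival wins."""
    for _ in range(max_attempts):
        versions = [int(n[:-5]) for n in os.listdir(log_dir) if n.endswith(".json")]
        version = max(versions, default=-1) + 1
        if try_commit(log_dir, version, actions):
            return version
    raise RuntimeError("too many concurrent commits")


# Two writers race for version 0; the loser retries and lands on version 1.
log_dir = os.path.join(tempfile.mkdtemp(), "_delta_log")
os.makedirs(log_dir)
first = try_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
second = try_commit(log_dir, 0, [{"add": {"path": "part-1.parquet"}}])
next_version = commit_with_retry(log_dir, [{"add": {"path": "part-1.parquet"}}])
```

Object stores that lack an equivalent of exclusive creation need some other coordination mechanism to provide the same guarantee, which is why the storage requirements are stated explicitly.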

Delta Lake provides multiple integration pathways for different systems. The Apache Spark connector allows reading from and writing to Delta Lake tables. The Delta Standalone library enables Scala and Java-based projects, including Apache Flink, Apache Hive, Apache Beam, and PrestoDB, to interact with Delta tables without requiring Spark. The Delta Rust API provides low-level access to Delta tables with Python and Ruby bindings for use with data processing frameworks. Trino and PrestoDB connectors support reading and writing capabilities, while the Apache Flink connector focuses on write operations.
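Reading a Delta table without Spark comes down to replaying the transaction log to find which data files are currently live. The sketch below is a simplified illustration of that idea, not the Delta Standalone or Delta Rust API: the `replay` function is hypothetical, and the actions carry only a `path` field, whereas real `add` and `remove` actions include more fields such as size and partition values.

```python
import json


def replay(commits):
    """Return the set of data file paths that are live after all commits.

    `commits` is a list of newline-delimited JSON strings, one per commit,
    ordered by version.
    """
    live = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other actions (protocol, metaData, commitInfo) leave the
            # file set unchanged in this simplified model.
    return live


def as_commit(actions):
    # Serialize a list of actions the way a commit file stores them.
    return "\n".join(json.dumps(a) for a in actions)


commits = [
    as_commit([
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        {"add": {"path": "part-0.parquet"}},
    ]),
    as_commit([{"add": {"path": "part-1.parquet"}}]),
    # A rewrite replaces part-0 with part-2.
    as_commit([
        {"remove": {"path": "part-0.parquet"}},
        {"add": {"path": "part-2.parquet"}},
    ]),
]

live_files = replay(commits)
```

A reader would then open only the files in `live_files`; a removed file may still exist in storage, but it is no longer part of the table.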

The codebase is built using SBT and requires Java 17 or later. Development setup is supported through IntelliJ, which is the recommended IDE. The project includes comprehensive test suites and provides both Scala and Python testing environments, with Python tests managed through Conda.

GitGenius activity tracking reports a median issue and pull request response latency of 0.0 hours against a mean of 3593.7 hours (about 149.7 days) across 768 tracked items. A median near zero combined with a mean this large indicates a strongly right-skewed distribution: at least half of the tracked items received a near-immediate first response, while a long tail waited months or years. The most active issue labels are enhancement with 347 occurrences, bug with 284 occurrences, and kernel with 76 occurrences. These figures differ from the summary panels on this page (mean 149.5 days, 769 tracked items, and label counts of 512, 432, and 89), which suggests the two were computed from different snapshots. Top contributors tracked by GitGenius include AnudeepKonaboina with 225 events, felipepessoto with 146 events, and vkorukanti with 136 events. The repository shares overlapping contributors with microsoft/vscode, trinodb/trino, and microsoft/typescript, which may reflect contributors who are active across data processing and developer tooling projects.
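A median of 0.0 hours can coexist with a mean of several thousand hours when most items are answered immediately and a few wait a very long time. The sketch below shows this with a small hypothetical sample (the values in `sample_hours` are invented for illustration and are not GitGenius data), and converts the quoted mean from hours to days.

```python
import statistics

HOURS_PER_DAY = 24

# Hypothetical response times in hours: five immediate responses and a
# long tail of slow ones. Not real repository data.
sample_hours = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 48.0, 8760.0, 17520.0]

median_hours = statistics.median(sample_hours)  # middle value of the sorted list
mean_hours = statistics.mean(sample_hours)      # pulled upward by the tail

# Convert the mean quoted in the description from hours to days.
reported_mean_days = 3593.7 / HOURS_PER_DAY

print(f"median: {median_hours} h, mean: {mean_hours:.1f} h")
print(f"3593.7 h is about {reported_mean_days:.1f} days")
```

For a distribution shaped like this, the 90th percentile says more about the slow tail than the mean does.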

The project is part of a larger Delta Lake ecosystem within the delta-io organization, which includes related repositories such as delta-rs, delta-sharing, kafka-delta-ingest, and the project website. Delta Lake is licensed under Apache License 2.0 and maintains community engagement through public Slack channels, a mailing list, LinkedIn, and YouTube presence.
