Description: Apache Flink
Apache Flink is an open-source distributed stream processing engine designed for both real-time and batch data processing. Developed under the Apache Software Foundation, it is known for handling high-volume, high-velocity data streams with low latency, making it a cornerstone technology for applications that require immediate insights and actions. Unlike traditional batch processing systems that process data in discrete cycles, Flink operates continuously, processing data as it arrives, offering a fundamentally different approach to data analytics.
At its core, Flink utilizes a dataflow programming model. Developers define data transformations as a directed acyclic graph (DAG), where nodes represent operations and edges represent the flow of data between them. This allows for complex data pipelines to be constructed with relative ease. Flink supports a wide range of operators, including filtering, mapping, joining, aggregating, and windowing functions, all optimized for stream processing. Crucially, Flink’s architecture is built around the concept of state, enabling it to maintain context and perform sophisticated calculations over time windows.
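The dataflow idea can be sketched in plain Python (this is an illustrative toy, not Flink's API): a stream of events flows through map, filter, and keyed tumbling-window aggregation stages, with the window state keyed by user and window start.

```python
from collections import defaultdict

def run_pipeline(events, window_size=10):
    """Sum clicks per user within fixed (tumbling) time windows.

    Each event is a (user, clicks, timestamp) tuple. The keyed state
    maps (user, window_start) -> running sum, mimicking how a stream
    processor maintains per-key, per-window state.
    """
    windows = defaultdict(int)
    for user, clicks, ts in events:
        user = user.lower()           # map: normalize the key
        if clicks <= 0:               # filter: drop non-positive counts
            continue
        window_start = (ts // window_size) * window_size
        windows[(user, window_start)] += clicks  # keyed windowed aggregation
    return dict(windows)

events = [
    ("alice", 3, 1), ("bob", 2, 4), ("alice", 1, 9),
    ("alice", 5, 12), ("bob", -1, 13), ("bob", 4, 15),
]
print(run_pipeline(events))
# {('alice', 0): 4, ('bob', 0): 2, ('alice', 10): 5, ('bob', 10): 4}
```

In Flink the same shape is expressed declaratively (`map`, `filter`, `keyBy`, `window`) and the framework, not the user code, manages the state and distributes the DAG across the cluster.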
One of Flink’s key differentiators is its support for both stream and batch processing within a single framework. This ‘streaming-first’ approach means that Flink treats batch processing as a special case of stream processing, leveraging its core stream processing capabilities to efficiently handle historical data. This unification simplifies development and reduces the need for separate systems for different data processing needs.
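The "batch as a special case of streaming" idea can be illustrated with Python generators (again a conceptual sketch, not Flink code): the identical pipeline logic consumes a bounded source, which runs to completion, and an unbounded one, from which results are taken incrementally.

```python
import itertools

def doubled_evens(source):
    """A small pipeline: keep even numbers (filter), then double them (map)."""
    for x in source:
        if x % 2 == 0:
            yield x * 2

# Bounded source (batch): the pipeline runs to completion.
batch_result = list(doubled_evens([1, 2, 3, 4, 5]))

# Unbounded source (stream): the same pipeline, results taken lazily.
stream_head = list(itertools.islice(doubled_evens(itertools.count()), 3))

print(batch_result, stream_head)
# [4, 8] [0, 4, 8]
```

Only the source differs; the transformation logic is shared, which is the essence of Flink's unified runtime.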
Flink boasts a robust and highly scalable architecture. It is designed to run on a variety of cluster configurations, including standalone clusters, YARN, and Kubernetes (older releases also supported Mesos). Fault tolerance rests on distributed snapshots: Flink periodically checkpoints operator state and stream positions using an asynchronous barrier-snapshotting protocol derived from the Chandy-Lamport algorithm, allowing it to recover quickly from node failures without losing data or sacrificing exactly-once state consistency.
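A heavily simplified sketch of checkpoint-based recovery (illustrative only; Flink's actual barrier-snapshot protocol coordinates snapshots across distributed operators) shows the core idea: state is snapshotted together with the input position, and a restart resumes from the last snapshot rather than from the beginning.

```python
class CountingOperator:
    """Toy stateful operator that checkpoints every `interval` records."""

    def __init__(self):
        self.count = 0            # operator state
        self.checkpoint = (0, 0)  # (input offset, state snapshot)

    def process(self, records, interval=3, fail_at=None):
        # Recovery: restore state and position from the last checkpoint.
        offset, self.count = self.checkpoint
        for i in range(offset, len(records)):
            if fail_at is not None and i == fail_at:
                raise RuntimeError("simulated node failure")
            self.count += records[i]
            if (i + 1) % interval == 0:
                self.checkpoint = (i + 1, self.count)  # take a snapshot
        return self.count

op = CountingOperator()
try:
    op.process([1, 1, 1, 1, 1, 1], fail_at=4)  # crash mid-stream
except RuntimeError:
    pass
print(op.process([1, 1, 1, 1, 1, 1]))          # resumes from offset 3
# 6
```

Note that after the crash, records 0-2 are not re-applied: the restored snapshot already reflects them, which is how checkpointing preserves exactly-once state semantics.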
Beyond its core capabilities, Flink offers a rich ecosystem of connectors. These connectors enable Flink to seamlessly integrate with various data sources and sinks, including Apache Kafka, Apache Hadoop, Amazon Kinesis, databases like MySQL and PostgreSQL, and various file formats. The framework also provides a comprehensive API for Java, Scala, and Python, catering to a wide range of developer preferences. Flink’s community is vibrant and active, providing extensive documentation, support, and a wealth of examples.
Furthermore, Flink is increasingly focused on machine learning within the stream processing context. Its machine learning library, Flink ML, lets developers train and deploy models directly within their data pipelines, enabling real-time predictions and anomaly detection. Ongoing development and the commitment of the Apache Software Foundation ensure Flink remains a leading solution for modern data processing challenges.
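The kind of per-record scoring such a pipeline performs can be sketched with a toy online anomaly detector in plain Python (not the Flink ML API): it maintains a running mean and variance with Welford's algorithm and flags values more than `k` standard deviations from the mean.

```python
import math

class StreamingAnomalyDetector:
    """Flags values more than k standard deviations from the running mean."""

    def __init__(self, k=3.0):
        self.k, self.n, self.mean, self.m2 = k, 0, 0.0, 0.0

    def observe(self, x):
        """Score x against the statistics so far, then fold it in."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) > self.k * std
        # Welford's incremental mean/variance update
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = StreamingAnomalyDetector(k=3.0)
flags = [det.observe(x) for x in [10, 11, 9, 10, 11, 9, 10, 50]]
print(flags)
# [False, False, False, False, False, False, False, True]
```

Because the detector keeps only constant-size state per key, it fits naturally into a keyed stream: each record is scored and the model updated in a single pass, with no batch retraining step.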