impala by apache

Description: Apache Impala


Summary Information

Updated 37 minutes ago

Added to GitGenius on January 4th, 2025

Created on April 13th, 2016

Open Issues & Pull Requests: 4 (+0)

Number of forks: 556

Total Stargazers: 1,276 (+0)

Total Subscribers: 74 (+0)

Issue Activity (beta)

Open issues: 0

New in 7 days: 0

Closed in 7 days: 0

Avg open age: N/A

Stale 30+ days: 0

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Full issues analysis pending...

Detailed Description

Apache Impala is a distributed, massively parallel SQL query engine written in C++ for low-latency analysis of petabytes of data across multiple storage systems and formats. It is designed for performance and scalability on interactive analytics workloads, and it is commonly deployed as part of the Apache Hadoop ecosystem.

The engine supports a range of data sources and storage backends, including Apache Iceberg, HDFS, Apache HBase, Apache Kudu, Amazon S3, Azure Data Lake Storage, and Apache Hadoop Ozone. This lets organizations query data where it already lives instead of moving it into a single store first. Impala also supports the most commonly used Hadoop file formats, including Apache Parquet and Apache ORC, both columnar formats suited to analytical workloads.

Impala offers wide analytic SQL support, including window functions and subqueries. A key performance technique is on-the-fly code generation using LLVM: for each query, Impala generates code tailored to that query. This avoids much of the overhead of interpreting a generic plan and yields machine code specialized for the exact data types and operations the query uses.
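The idea behind query-specific code generation can be sketched in a few lines of Python. This is only a toy analogy, not Impala's implementation: Impala emits LLVM IR from C++, whereas the sketch below generates Python source. The expression tree, the `interpret`, `to_source`, and `specialize` helpers, and the sample rows are all hypothetical names introduced for illustration.

```python
import operator

# Hypothetical expression tree for a predicate like: price * qty > 100
# Nodes are tuples: ("col", name), ("const", value), or (op, left, right).
EXPR = (">", ("*", ("col", "price"), ("col", "qty")), ("const", 100))

OPS = {"*": operator.mul, "+": operator.add, ">": operator.gt}


def interpret(node, row):
    """Generic evaluator: walks the tree and dispatches on every node, per row."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    return OPS[kind](interpret(node[1], row), interpret(node[2], row))


def to_source(node):
    """Translate the tree into a Python expression string, once per query."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "const":
        return repr(node[1])
    return f"({to_source(node[1])} {kind} {to_source(node[2])})"


def specialize(node):
    """Compile the generated source into a function specific to this query."""
    source = f"lambda row: {to_source(node)}"
    return eval(compile(source, "<generated>", "eval"))


rows = [
    {"price": 10, "qty": 5},
    {"price": 30, "qty": 4},
    {"price": 7, "qty": 20},
]

predicate = specialize(EXPR)  # tree walking happens once, here
interpreted = [interpret(EXPR, r) for r in rows]  # tree walking happens per row
compiled = [predicate(r) for r in rows]

print(to_source(EXPR))
print(interpreted == compiled)
```

Both paths return the same results. The difference is where the tree walk happens: the interpreter repeats it for every row, while the specialized function pays that cost once when the query is prepared.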

Impala supports industry-standard security protocols, including Kerberos, LDAP, and TLS. These features allow it to be deployed in enterprise environments where authentication and encryption are mandatory.

The project is fully open source under the Apache License, Version 2.0, and is freely available for both commercial and non-commercial use. The codebase is written primarily in C++, chosen for the performance needed to run large-scale distributed queries efficiently.

Platform support is currently limited to Linux. The community tests and validates Impala on Ubuntu 20.04, 22.04, and 24.04, and on Rocky Linux and RHEL 8, 9, and 10. As of version 4.4, Impala supports both x86_64 and arm64 architectures. Other Linux distributions such as SLES 15 and SLES 16 may work but are not actively tested by the community.

For users who want to experiment with Impala, the project provides a quickstart Docker container that removes the need to install dependencies manually. The container can automatically load test data sets into Apache Kudu and Apache Parquet, so users can start running queries on a single machine within minutes.

Documentation is split between two places: user and administrator documentation on the Impala homepage, and detailed developer documentation on the Impala wiki. The repository's README files provide build instructions and an overview of the project layout for developers who want to contribute or to understand Impala's internals.
