hive by apache

Description: Apache Hive

Summary Information

Updated 4 minutes ago

Added to GitGenius on January 3rd, 2025

Created on May 21st, 2009

Open Issues & Pull Requests: 76 (+0)

Number of forks: 4,788

Total Stargazers: 5,990 (+0)

Total Subscribers: 304 (+0)

Issue Activity (beta)

Open issues: 0

New in 7 days: 0

Closed in 7 days: 0

Avg open age: N/A

Stale 30+ days: 0

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Detailed Description

Apache Hive is a data warehouse software system built on top of Apache Hadoop that enables users to read, write, and manage large datasets stored in distributed storage systems using SQL. Written primarily in Java, Hive abstracts the complexity of distributed computing by providing a SQL interface to data that would otherwise require lower-level programming approaches. The project is classified across multiple data management domains including relational databases, big data analytics, data warehousing, ETL processes, and batch processing, reflecting its broad applicability in the Hadoop ecosystem.

Hive's core function is SQL access to data stored in HDFS and in other storage systems such as Apache HBase. It is not designed for online transaction processing; it is optimized for traditional data warehousing tasks in which large volumes of data are processed across many machines. Hive imposes structure on a variety of data formats, so organizations can query unstructured or semi-structured data as tables. It supports standard SQL, including features from the SQL:2003 and SQL:2011 standards, with particular emphasis on OLAP functions, subqueries, and common table expressions, all of which analytical workloads rely on.
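To make the SQL features named above concrete, here is a minimal sketch of a query that combines a common table expression with an OLAP (window) function. It runs against an in-memory SQLite database through Python's standard `sqlite3` module purely as a local stand-in; it has not been run against Hive. The `page_views` table, its columns, and the sample rows are hypothetical, and the window function assumes SQLite 3.25 or newer.

```python
import sqlite3

# SQLite is only a stand-in engine so the query can run locally.
# Table name, columns, and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("US", "/home", 120),
        ("US", "/docs", 80),
        ("DE", "/home", 40),
        ("DE", "/docs", 60),
    ],
)

QUERY = """
WITH totals AS (              -- common table expression
    SELECT country, page, SUM(views) AS views
    FROM page_views
    GROUP BY country, page
)
SELECT country, page, views,
       RANK() OVER (PARTITION BY country ORDER BY views DESC) AS rnk
FROM totals
ORDER BY country, rnk
"""

# Each row is (country, page, views, rank of the page within its country).
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The query ranks pages by view count within each country, which is the kind of per-group analytical question that window functions are meant for.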

A key architectural feature of Hive is that it executes queries through the Apache Tez framework, which substantially reduces overhead compared with the earlier MapReduce execution path. This choice reflects the project's focus on improving performance while keeping the scalability and fault tolerance that distributed systems need. Hive is extensible through user-defined functions (UDFs), user-defined aggregates (UDAFs), and user-defined table functions (UDTFs), which let developers add SQL functionality for domain-specific requirements.
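The three extension points differ mainly in how many rows go in and how many come out. The sketch below illustrates those shapes in plain Python. It is not Hive's API: real Hive extensions are written against Hive's own Java classes, and every name here (`normalize_country`, `Mean`, `explode_tags`, and the `iterate`/`terminate` method names) is hypothetical and chosen only for illustration.

```python
from typing import Iterator, Optional, Tuple


def normalize_country(code: str) -> str:
    """UDF shape: one input value in, one output value out."""
    return code.strip().upper()


class Mean:
    """UDAF shape: many input rows are folded into one output value."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def iterate(self, value: float) -> None:
        # Called once per input row to update the running state.
        self.total += value
        self.count += 1

    def terminate(self) -> Optional[float]:
        # Called once at the end; None when no rows were seen.
        return self.total / self.count if self.count else None


def explode_tags(row_id: int, tags: str) -> Iterator[Tuple[int, str]]:
    """UDTF shape: one input row in, zero or more output rows out."""
    for tag in tags.split(","):
        tag = tag.strip()
        if tag:
            yield row_id, tag
```

For example, `list(explode_tags(7, "a, b,,c"))` yields one output row per non-empty tag, while feeding 1, 2, and 3 to a `Mean` instance and calling `terminate()` collapses them into a single average.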

Hive's Metastore component is a metadata repository that tracks table schemas, partitions, and other structural information about datasets. The project provides upgrade scripts for several database backends, including MySQL, PostgreSQL, Oracle, Microsoft SQL Server, and Derby, since organizations run different infrastructure. Java requirements differ by release: Hive 4.0.1 requires Java 8, Hive 4.1.x requires Java 17, and Hive 4.2.x requires Java 21. All recent versions depend on Hadoop 3.x.
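The Java requirements listed above can be captured as a small lookup, which may be handy in a setup or CI script. This is a minimal sketch: the function name `required_java` and the table names are hypothetical, and the helper deliberately returns `None` for any version the description does not mention rather than guessing.

```python
from typing import Optional

# Requirements exactly as stated in the description above.
EXACT_RELEASES = {"4.0.1": 8}          # a single named release
RELEASE_LINES = {"4.1": 17, "4.2": 21}  # 4.1.x and 4.2.x release lines


def required_java(hive_version: str) -> Optional[int]:
    """Return the Java version for a Hive version, or None if not listed."""
    if hive_version in EXACT_RELEASES:
        return EXACT_RELEASES[hive_version]
    parts = hive_version.split(".")
    if len(parts) >= 2:
        # Match on "major.minor" for the x-style release lines.
        return RELEASE_LINES.get(".".join(parts[:2]))
    return None
```

With this table, `required_java("4.1.0")` resolves through the 4.1.x line, and a version such as `"3.1.3"` yields `None` because the description gives no requirement for it.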

The project keeps its community engaged through dedicated mailing lists for users, developers, and commit monitoring. Its classification under schema evolution and large dataset handling reflects the problems it targets: schemas that change over time and datasets that outgrow a single machine. By combining SQL accessibility with distributed processing, Hive bridges the gap between traditional database systems and the scalability requirements of big data analytics, which makes it a foundational component of many data warehousing architectures built on Hadoop.