hyperloglog by axiomhq

Description: HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction) brought to you by Axiom


Summary Information

Updated 2 hours ago

Added to GitGenius on July 28th, 2024

Created on June 18th, 2017

Open Issues & Pull Requests: 10 (+0)

Number of forks: 83

Total Stargazers: 1,043 (+0)

Total Subscribers: 18 (+0)

Issue Activity (beta)

Open issues: 7

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 934 days

Stale 30+ days: 7

Stale 90+ days: 6

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

help wanted (3)
question (2)

Most active issues this week

No issue events were indexed in the last 7 days.


Repository Insights (GitGenius)

Median issue/PR response: 28.6 hours

Mean response time: 133.7 days

90th percentile: 761.4 days

Tracked items: 7

Most active contributors

lukasmalkmus - 5 events, 4 issues
seiflotfy - 3 events, 2 issues
CodeZHXS - 1 event, 1 issue
HurSungYun - 1 event, 1 issue
axw - 1 event, 1 issue

Related by overlapping contributors

Detailed Description

The axiomhq/hyperloglog repository is a Go implementation of the HyperLogLog algorithm, a probabilistic data structure designed to approximate the number of distinct elements in large datasets with minimal memory overhead. Developed and maintained by Axiom, this library provides an enhanced version of the classical HyperLogLog algorithm specifically optimized for cardinality estimation problems in big data and stream processing contexts.

The implementation has evolved significantly from its initial version. The original v0.1.0 was based on research by Qingjun Xiao, You Zhou, and Shigang Chen on improving cardinality estimation performance for large data streams. However, the current implementation has moved away from that foundation and now uses the LogLog-Beta algorithm as described in 2016 research by Jason Qin, Denys Kim, and Yumei Tung. This shift represents a deliberate architectural decision to provide better overall performance and simplicity.

Key technical features distinguish this implementation from standard HyperLogLog variants. The library uses Metro hash instead of xxhash for hashing operations, incorporates sparse representation for handling lower cardinalities similar to HyperLogLog++, and implements LogLog-Beta for dynamic bias correction across all cardinality ranges. The use of 8-bit registers simplifies the implementation while maintaining practical accuracy. The algorithm supports order-independent insertions and merging, ensuring consistent results regardless of the sequence in which data is processed or how sketches are combined. The removal of the tailcut method from earlier versions streamlines the approach without sacrificing performance.

Flexibility in precision is a notable design choice. Users can configure the number of registers from 2^4 to 2^18, allowing fine-grained control over the memory-accuracy tradeoff. This flexibility translates to practical memory usage ranging from 16 bytes at minimum precision to 256 KB at maximum precision, with the default configuration using 2^14 registers and consuming 16 KB of memory. This range makes the library suitable for diverse applications from memory-constrained environments to high-precision scenarios.
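The memory figures above follow directly from one 8-bit register per bucket. As a rough sketch of that arithmetic (the `registerBytes` and `classicStdError` helpers are hypothetical and not part of the library), the snippet below computes the dense register size for a few precisions. The error column uses the classical HyperLogLog approximation of about 1.04/sqrt(m), which is an assumption here and not a measured figure for this LogLog-Beta implementation.

```go
package main

import (
	"fmt"
	"math"
)

// registerBytes returns the size in bytes of a dense register array for
// precision p, assuming 2^p registers of 8 bits (1 byte) each.
func registerBytes(p uint) int {
	return 1 << p
}

// classicStdError returns the classical HyperLogLog relative standard
// error, roughly 1.04/sqrt(m) for m registers. This is an assumption used
// for illustration only.
func classicStdError(p uint) float64 {
	m := float64(registerBytes(p))
	return 1.04 / math.Sqrt(m)
}

func main() {
	for _, p := range []uint{4, 14, 18} {
		b := registerBytes(p)
		fmt.Printf("precision %2d: %6d registers, %6d bytes (%.2f KiB), ~%.2f%% error\n",
			p, b, b, float64(b)/1024, 100*classicStdError(p))
	}
}
```

This counts only the register array; any struct overhead or the sparse representation used at low cardinalities is left out.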

GitGenius activity tracking covers seven issues and pull requests for this repository. The median response latency was 28.6 hours, but the mean of 3209 hours (about 133.7 days) and the 90th percentile of 761.4 days show that a few items waited far longer for a reply. The most active contributor tracked was lukasmalkmus with five events, followed by seiflotfy with three events. Among the top labels, help wanted appears on three issues and question on two, suggesting occasional requests for community assistance. The repository is connected to other significant projects through overlapping contributors, including cockroachdb/cockroach, cockroachdb/pebble, and open-telemetry/opentelemetry-collector-contrib, indicating adoption within the broader data infrastructure ecosystem.

The repository is classified across multiple domains, including probabilistic data structures, big data processing, approximate counting, stream processing, and memory-efficient algorithms. These classifications reflect the library's positioning as a tool for distributed systems and large-scale data analysis where exact cardinality counts are computationally prohibitive. The implementation maintains backwards compatibility with previous versions while providing a cleaner, more efficient codebase. The project is distributed under the MIT License and includes contribution guidelines for developers interested in proposing improvements or bug fixes.
