pgvectorscale by timescale

Description: Postgres extension for vector search (DiskANN), complements pgvector for performance and scale. Postgres OSS licensed.

Summary Information

Updated 55 minutes ago

Added to GitGenius on August 18th, 2025

Created on July 1st, 2023

Open Issues & Pull Requests: 20 (+0)

Number of forks: 142

Total Stargazers: 3,079 (+0)

Total Subscribers: 26 (+0)

Issue Activity (beta)

Open issues: 14

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 272 days

Stale 30+ days: 14

Stale 90+ days: 9

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

community (51)
pgvectorscale (49)
bug (39)
enhancement (10)
question (8)
good first issue (4)

Most active issues this week

No issue events were indexed in the last 7 days.

Repository Insights (GitGenius)

Median issue/PR response: 0.0 hours

Mean response time: 19.3 days

90th percentile: 17.9 days

Tracked items: 99

Most active contributors

tjgreen42 - 91 events, 42 issues
cevian - 82 events, 46 issues
alejandrodnm - 58 events, 28 issues
mgrosso - 21 events, 2 issues
syvb - 17 events, 11 issues

Detailed Description

pgvectorscale is a PostgreSQL extension from Timescale that speeds up vector similarity search on data stored with pgvector. It complements pgvector rather than replacing it: it builds on pgvector's `vector` data type and distance operators and adds a new index type, StreamingDiskANN, inspired by Microsoft's DiskANN research, along with a compression method called Statistical Binary Quantization. The goal is faster approximate nearest neighbor (ANN) search on large datasets, including ones whose index does not fit comfortably in memory. It does not require TimescaleDB, and it does not use GPUs.

At its heart, pgvectorscale uses a graph-based index from the DiskANN family rather than HNSW (Hierarchical Navigable Small World), the graph index that pgvector provides. Both are approximate methods that walk a graph of neighboring vectors toward the query, but a DiskANN-style graph is designed to be served from disk, which keeps memory requirements lower as the dataset grows. Statistical Binary Quantization compresses the vectors kept in the index to reduce its size further. The extension is written in Rust using the PGRX framework and runs entirely on the CPU; there is no CUDA kernel or GPU offload. It supports various distance metrics including L2 (Euclidean), cosine distance, and inner product.
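To make the three distance metrics concrete, here is a minimal pure-Python sketch of what each one computes. The function names are illustrative, not part of any extension API. The operator mapping in the comments follows pgvector's conventions, where `<=>` returns cosine distance (1 minus cosine similarity) and `<#>` returns the negated inner product so that smaller always means closer.

```python
import math


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; corresponds to pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; corresponds to pgvector's `<=>` operator."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Negated dot product; corresponds to pgvector's `<#>` operator."""
    return -inner_product(a, b)
```

All three are ordered so that an `ORDER BY ... LIMIT k` query returns the k closest vectors first, which is why the inner product is negated.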

The architecture is that of a standard PostgreSQL extension: it registers a new index access method, `diskann`, alongside pgvector's own `hnsw` and `ivfflat`. It does not intercept or redirect pgvector requests. After running `CREATE EXTENSION vectorscale CASCADE` (which also installs pgvector), you create an index with `CREATE INDEX ... USING diskann (vector_column vector_cosine_ops)`, and the PostgreSQL planner can then use it for queries of the usual pgvector form (e.g., `SELECT ... ORDER BY vector_column <=> query_vector LIMIT 10`). Because queries keep the same shape, applications already using pgvector only need to add the index; their query code does not change. This ease of integration is a major advantage.
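As a rough sketch of that workflow, the helpers below assemble the SQL an application would send. The extension name `vectorscale`, the access method `diskann`, and the operator class `vector_cosine_ops` are taken from the project's README; the helper names and the table and column names in the usage note are hypothetical. The `%s` placeholder assumes a psycopg-style driver, and identifiers are interpolated directly, so this is only suitable for trusted, hard-coded names.

```python
# One-time setup; CASCADE also installs pgvector if it is missing.
SETUP_SQL = "CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;"


def diskann_index_sql(table, column, opclass="vector_cosine_ops"):
    """Build a CREATE INDEX statement using the diskann access method."""
    index_name = f"{table}_{column}_idx"
    return (
        f"CREATE INDEX {index_name} ON {table} "
        f"USING diskann ({column} {opclass});"
    )


def nearest_neighbors_sql(table, column, operator="<=>", limit=10):
    """Build a k-nearest-neighbor query; the query vector is bound as %s."""
    return (
        f"SELECT * FROM {table} "
        f"ORDER BY {column} {operator} %s LIMIT {int(limit)};"
    )
```

For example, `diskann_index_sql("document_embedding", "embedding")` yields a statement that indexes the `embedding` column for cosine distance, and `nearest_neighbors_sql("document_embedding", "embedding")` yields the matching query. The operator in the query should match the operator class used for the index, or the planner will not use it.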

pgvectorscale runs on ordinary CPU hardware and needs no CUDA installation. Timescale provides Docker images with the extension preinstalled for simplified deployment and testing, and it can also be built from source with the Rust toolchain. Timescale's published benchmarks report substantial latency and throughput gains on large embedding datasets, but the actual speedup depends on factors like vector dimensionality, dataset size, index parameters, and the recall target, so it is worth measuring on your own workload.

The project is still under active development. It is particularly well suited to applications like recommendation systems, image/video search, natural language processing, and fraud detection, where efficient similarity search is critical and the data already lives in PostgreSQL. It lets such workloads stay in a general-purpose database instead of moving to a separate, specialized vector database.
