pgai by timescale

Description: A suite of tools to develop RAG, semantic search, and other AI applications more easily with PostgreSQL

View timescale/pgai on GitHub ↗

Summary Information

Updated 37 minutes ago

Added to GitGenius on August 18th, 2025

Created on May 16th, 2024

Open Issues/Pull Requests: 45 (+0)

Number of forks: 301

Total Stargazers: 5,760 (+0)

Total Subscribers: 39 (+0)

Detailed Description

pgai (Postgres AI) is a PostgreSQL extension from Timescale that brings large language models (LLMs) directly into your PostgreSQL database. It changes how you interact with and analyze time-series and relational data: in addition to writing traditional SQL queries, you can use natural language prompts to get insights. Instead of needing to know *how* to query, you ask *what* you want to know, and pgai translates that into executable SQL.

At its core, pgai uses a retrieval-augmented generation (RAG) pipeline, so it does not rely solely on the LLM's pre-trained knowledge. It first *retrieves* data from your Postgres database that is relevant to the user's prompt. The retrieved context is combined with the prompt and passed to the LLM, which *generates* either a natural language answer or a SQL query. A generated SQL query is then executed against your database, and the results are returned to the user, often formatted as natural language. Grounding the model in retrieved data generally yields more accurate and relevant answers than relying on the LLM alone, especially for domain-specific data.
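The retrieve, generate, execute flow described above can be sketched in a few lines. This is a minimal illustration and not pgai's API: it uses Python's built-in sqlite3 as a stand-in for Postgres, a word-overlap ranking in place of vector search, and a hypothetical `fake_llm` function in place of a real model call. The `notes` and `orders` tables are invented for the example.

```python
import sqlite3


def retrieve_context(conn, question, limit=2):
    """Rank rows of the hypothetical `notes` table by word overlap with the question."""
    def tokens(text):
        return {w.strip("?.,").lower() for w in text.split()}

    asked = tokens(question)
    scored = []
    for (body,) in conn.execute("SELECT body FROM notes"):
        overlap = len(asked & tokens(body))
        if overlap:
            scored.append((overlap, body))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [body for _, body in scored[:limit]]


def build_prompt(question, context):
    """Combine the retrieved context with the user's question."""
    lines = ["Context:"] + [f"- {c}" for c in context]
    lines += ["", f"Question: {question}", "Answer with a single SQL query."]
    return "\n".join(lines)


def fake_llm(prompt):
    """Stand-in for a model call (assumption): always returns the same query."""
    return "SELECT COUNT(*) FROM orders WHERE status = 'late'"


def answer(conn, question, llm=fake_llm):
    """Retrieve context, ask the model for SQL, then run that SQL."""
    context = retrieve_context(conn, question)
    sql = llm(build_prompt(question, context))
    return sql, conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notes (body TEXT);
    CREATE TABLE orders (id INTEGER, status TEXT);
    INSERT INTO notes VALUES
        ('The orders table has a status column: late or on_time.'),
        ('Invoices are stored in a separate schema.');
    INSERT INTO orders VALUES (1, 'late'), (2, 'on_time'), (3, 'late');
""")
sql, rows = answer(conn, "How many orders are late?")
print(sql)
print(rows)
```

In a real deployment the generated SQL would need validation (for example, read-only access and a statement allow-list) before being executed.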

The architecture of pgai is modular and extensible. It consists of three main components: a Postgres extension providing the core functionality, a vector store (currently pgvector, with the design leaving room for other integrations), and a connection to an LLM provider (currently OpenAI, with the design leaving room for other providers such as Cohere or open-source models). The extension provides functions for embedding data (converting text into vector representations), storing those embeddings in the vector store, and querying the database using natural language. The RAG pipeline is orchestrated within Postgres itself, which reduces the amount of data that has to leave the database.
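To make the embed, store, and query steps concrete, here is a toy in-memory version. It is a sketch under stated assumptions, not how pgai or pgvector work internally: `embed` is a bag-of-words counter rather than a learned embedding model, and `VectorStore` is a hypothetical class that ranks stored texts by cosine similarity.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a sparse bag-of-words vector, word -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store of (text, embedding) pairs."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        self.rows.append((text, embed(text)))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = VectorStore()
store.add("hourly temperature readings from each sensor")
store.add("customer invoices and payments")
store.add("sensor maintenance schedule")

print(store.search("temperature sensor", k=2))
```

A production setup would replace `embed` with a call to an embedding model and the list scan with an indexed similarity search.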

A key feature is the ability to define "capabilities": pre-defined prompts and SQL templates that guide the LLM. Capabilities let you tailor pgai to specific use cases, such as anomaly detection, forecasting, or root cause analysis. Instead of leaving the LLM to work everything out from scratch, you give it a structured framework, which improves reliability and consistency. pgai also includes features for managing and monitoring LLM interactions, including logging and cost tracking.
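One way to picture a capability as a paired prompt and SQL template is the sketch below. The `Capability` class, its fields, and the `readings` table are all hypothetical names chosen for illustration; none of this is taken from pgai's actual interface.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Capability:
    """Hypothetical pairing of a prompt template with a SQL template."""
    name: str
    prompt: Template
    sql: Template

    def render(self, **params):
        """Fill both templates with the same parameters."""
        return self.prompt.substitute(params), self.sql.substitute(params)


# Plain string substitution is used only for illustration; real code should
# bind values as query parameters rather than splice them into SQL text.
ANOMALY = Capability(
    name="anomaly_detection",
    prompt=Template("Explain any rows in $table with value above $threshold."),
    sql=Template("SELECT ts, value FROM $table WHERE value > $threshold ORDER BY ts"),
)

prompt, sql = ANOMALY.render(table="readings", threshold=100)
print(prompt)
print(sql)
```

Keeping the SQL shape fixed and letting only a few parameters vary is what makes the output more predictable than free-form generation.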

The repository provides documentation, examples, and a quickstart guide to help users get up and running. It includes Docker Compose files for local deployment and demonstrates how to integrate pgai with TimescaleDB features such as hypertables for efficient time-series data storage. The project is actively developed and maintained by Timescale, with a focus on improving performance, expanding LLM support, and adding new capabilities. The overall aim is to make data insights accessible to anyone, regardless of SQL expertise, who wants to get more value out of their Postgres databases.
