Description: A suite of tools to develop RAG, semantic search, and other AI applications more easily with PostgreSQL
Repository: timescale/pgai on GitHub
pgai (Postgres AI) is a PostgreSQL extension developed by Timescale that brings large language models (LLMs) directly into your PostgreSQL database. It lets you work with time-series and relational data through natural language prompts as well as traditional SQL queries. Instead of needing to know *how* to query, you ask *what* you want to know, and pgai translates the request into executable SQL.
At its core, pgai uses a retrieval-augmented generation (RAG) pipeline, so it does not rely solely on the LLM's pre-trained knowledge. It first *retrieves* data relevant to the user's prompt from your Postgres database. The retrieved context is then combined with the prompt and passed to the LLM, which *generates* either a natural language answer or a SQL query. Generated SQL is executed against your database, and the results are returned to the user, often summarized in natural language. Because the model works from your own data rather than from its training set alone, answers tend to be more accurate and relevant, especially for domain-specific data.
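The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not pgai's implementation: the `DOCUMENTS` list stands in for rows in Postgres, `retrieve` uses simple word overlap instead of vector search, and the `llm` argument is a hypothetical callable that you would replace with a real model client.

```python
import re
from typing import Callable, List

# Hypothetical in-memory "table" standing in for rows stored in Postgres.
DOCUMENTS = [
    "Sensor 7 reported a temperature spike on 2024-03-01.",
    "Invoices are generated on the first day of each month.",
    "Sensor 7 was recalibrated after the temperature spike.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and return its alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(prompt: str, documents: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by the number of tokens shared with the prompt."""
    prompt_tokens = tokenize(prompt)
    scored = [(len(prompt_tokens & tokenize(doc)), doc) for doc in documents]
    # Stable sort: documents with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question: str, context: List[str]) -> str:
    """Combine the retrieved context with the user's question."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


def answer(question: str, llm: Callable[[str], str],
           documents: List[str] = DOCUMENTS, k: int = 2) -> str:
    """Retrieve context, augment the prompt, then ask the model to generate."""
    context = retrieve(question, documents, k)
    return llm(build_prompt(question, context))


if __name__ == "__main__":
    # A stub model that only reports the prompt size; swap in a real client.
    print(answer("What happened to sensor 7?", lambda p: f"(stub) prompt of {len(p)} chars"))
```

Keeping retrieval, prompt construction, and generation as separate functions makes it easy to replace any one stage, for example swapping the word-overlap retriever for a vector similarity query.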
The architecture of pgai is modular and extensible. It has three main components: a Postgres extension that provides the core functionality, vector storage (currently pgvector, with room for other integrations), and a connection to an LLM provider (currently OpenAI, with the flexibility to use other models such as those from Cohere or open-source options). The extension provides functions for embedding data (converting text into vector representations), storing those embeddings, and querying the database using natural language. The RAG pipeline is orchestrated within Postgres itself, which avoids moving data out of the database.
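To make the embed, store, and query steps concrete, here is a small self-contained sketch. It does not use pgai or pgvector: `embed` is a toy bag-of-words embedding over a fixed vocabulary, and `VectorStore` is a hypothetical in-memory class. In a real deployment an embedding model would produce the vectors, and a pgvector column with its cosine distance operator (`<=>`) would play the role of `search`.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, vocabulary: List[str]) -> List[float]:
    """Toy embedding: one dimension per vocabulary word, holding its count."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory stand-in for a table with an embedding column."""

    def __init__(self, vocabulary: List[str]) -> None:
        self.vocabulary = vocabulary
        self.rows: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        """Embed the text and store it alongside its vector."""
        self.rows.append((text, embed(text, self.vocabulary)))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored texts most similar to the query."""
        query_vector = embed(query, self.vocabulary)
        ranked = sorted(
            self.rows,
            key=lambda row: cosine_similarity(query_vector, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    texts = ["cpu usage is high on host a", "disk space is low on host b"]
    vocabulary = sorted({token for t in texts for token in tokenize(t)})
    store = VectorStore(vocabulary)
    for t in texts:
        store.add(t)
    print(store.search("high cpu usage"))
```

A linear scan like this is fine for illustration; at scale the same ranking would normally be served by an approximate nearest-neighbor index.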
A key feature is the ability to define "capabilities": pre-defined prompts and SQL templates that guide the LLM. Capabilities let you tailor pgai to specific use cases, such as anomaly detection, forecasting, or root cause analysis. Rather than leaving the LLM to work everything out from scratch, you give it a structured framework, which improves reliability and consistency. pgai also includes features for managing and monitoring LLM interactions, including logging and cost tracking.
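One way to picture a capability is as a named pair of templates: a prompt template for the LLM and a parameterized SQL template for the database. The sketch below is an assumption about how such a structure could look, not pgai's actual API. The `Capability` class, the `readings` table, and the `ANOMALY` instance are all hypothetical, and Python's built-in `sqlite3` module stands in for Postgres so the example runs on its own.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: a prompt template paired with a SQL template."""

    name: str
    prompt_template: str  # filled in with str.format
    sql_template: str     # uses named placeholders such as :sensor

    def render_prompt(self, **values: Any) -> str:
        """Fill the prompt template with concrete values."""
        return self.prompt_template.format(**values)

    def run(self, conn: sqlite3.Connection, params: Dict[str, Any]) -> List[Tuple]:
        """Execute the SQL template with bound parameters (no string splicing)."""
        return conn.execute(self.sql_template, params).fetchall()


# Hypothetical capability for flagging readings above a threshold.
ANOMALY = Capability(
    name="anomaly_detection",
    prompt_template="Explain why readings above {threshold} occurred for {sensor}.",
    sql_template=(
        "SELECT ts, value FROM readings "
        "WHERE sensor = :sensor AND value > :threshold ORDER BY ts"
    ),
)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, ts TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("s1", "2024-01-01", 20.0), ("s1", "2024-01-02", 75.0), ("s2", "2024-01-02", 90.0)],
    )
    params = {"sensor": "s1", "threshold": 50}
    print(ANOMALY.render_prompt(**params))
    print(ANOMALY.run(conn, params))
```

Binding values as parameters instead of formatting them into the SQL string keeps the template fixed, so the model or the user can only influence the values and not the shape of the query.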
The repository provides documentation, examples, and a quickstart guide to help users get up and running. It includes Docker Compose files for local deployment and shows how to integrate pgai with TimescaleDB features such as hypertables for efficient time-series storage. The project is actively developed and maintained by Timescale, with a focus on improving performance, expanding LLM support, and adding new capabilities. The overall aim of pgai is to make data insights accessible to anyone, regardless of SQL expertise, by making it easier to query and analyze what is stored in their Postgres databases.