pipecat by pipecat-ai

Description: Open Source framework for voice and multimodal conversational AI

Summary Information

Updated 2 hours ago

Added to GitGenius on August 4th, 2025

Created on December 27th, 2023

Open Issues & Pull Requests: 227 (+1)

Number of forks: 2,291

Total Stargazers: 13,328 (+1)

Total Subscribers: 74 (+0)

Issue Activity (beta)

Open issues: 98

New in 7 days: 5

Closed in 7 days: 4

Avg open age: 77 days

Stale 30+ days: 50

Stale 90+ days: 32

Recent activity

Opened in 7 days: 3

Closed in 7 days: 2

Comments in 7 days: 6

Events in 7 days: 12

Top labels

in-progress (59)
help wanted (29)
need-more-info (20)
code-scan-issue-found (17)
langchain (7)
community integration (3)
duplicate (3)
good first issue (3)

Most active issues this week

#4707 FilterIncompleteUserTurnStrategies (1.2.x): duplicate bot responses per user turn + talk-over (no dedup / no VAD re-validation on ✓-finalization) - 8 events / 0 comments
#4901 AnthropicLLMService does not route inline <thinking> text to LLMThoughtTextFrame (leaks to TTS) - 6 events / 1 comment
#4963 NLTK downloads on runtime - 6 events / 3 comments
#4830 [Detail Bug] Workers: Streaming jobs remain cancellable after stream end, causing incorrect CANCELLED handling - 2 events / 0 comments
#4484 SpeechmaticsSTTService: streaming materially worse than batch on identical config (trailing tokens clipped by immediate finalize) - 1 event / 1 comment

Repository Insights (GitGenius)

Median issue/PR response: 0.0 hours

Mean response time: 3.5 days

90th percentile: 3.9 days

Tracked items: 1,196

Most active contributors

markbackman - 2,466 events, 825 issues
aconchillo - 417 events, 212 issues
filipi87 - 342 events, 128 issues
chadbailey59 - 338 events, 152 issues
vipyne - 87 events, 50 issues

Detailed Description

Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI applications. It orchestrates the components a voice agent needs, such as speech-to-text (STT), large language models (LLMs), text-to-speech (TTS), and the audio or video transport, so that developers can focus on the conversation itself.

At its core, Pipecat is built around pipelines of frame processors. Data moves through a pipeline as frames (audio, text, images, and control signals), and each processor consumes the frames it cares about and emits new ones. A typical voice pipeline takes audio from a transport, transcribes it with an STT service, passes the text to an LLM, synthesizes the reply with a TTS service, and sends the resulting audio back to the user.

The framework emphasizes modularity. Service integrations are interchangeable classes; the issues listed above reference `AnthropicLLMService` and `SpeechmaticsSTTService`, for example, along with frame types such as `LLMThoughtTextFrame`. Real-time concerns such as voice activity detection (VAD), user turn handling, and interruptions are handled inside the framework, which is also where several of this week's most active issues are concentrated.

Pipecat is developed under the pipecat-ai organization on GitHub, with a growing community contributing to its expansion. The repository also includes examples to help users get started quickly.
