promptfoo
by
promptfoo

Description: Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

View promptfoo/promptfoo on GitHub ↗

Summary Information

Updated 17 minutes ago
Added to GitGenius on March 13th, 2026
Created on April 28th, 2023
Open Issues & Pull Requests: 263 (+0)
Number of forks: 1,825
Total Stargazers: 21,076 (+1)
Total Subscribers: 53 (+0)

Issue Activity (beta)

Open issues: 71
New in 7 days: 5
Closed in 7 days: 3
Avg open age: 393 days
Stale 30+ days: 66
Stale 90+ days: 25

Recent activity

Opened in 7 days: 4
Closed in 7 days: 3
Comments in 7 days: 6
Events in 7 days: 11

Top labels

  • Feature Request (79)
  • Open Source (75)
  • bug (64)
  • question (37)
  • good-first-issue (13)
  • From Github (10)
  • help-wanted (9)
  • in-progress (8)

Most active issues this week

Detailed Description

Promptfoo is a command-line interface (CLI) and library designed to evaluate and red-team applications built on Large Language Models (LLMs). Its primary purpose is to help developers build more secure, reliable, and performant AI applications by providing tools for testing, vulnerability scanning, and model comparison. The project aims to move away from the trial-and-error approach often associated with LLM development, offering a data-driven methodology for prompt engineering and model selection.

At its core, Promptfoo allows users to automate the evaluation of prompts and models. This is achieved through a simple, declarative configuration system that defines test cases and evaluation criteria. Users can specify inputs, expected outputs, and metrics to assess the performance of different LLMs, including GPT, Claude, Gemini, Llama, and others. The tool then runs these evaluations and presents the results in a clear and concise manner, often in a side-by-side comparison format. This enables developers to quickly identify the strengths and weaknesses of different models and prompts.

A key feature of Promptfoo is its red-teaming capabilities. This involves using the tool to identify potential vulnerabilities in LLM applications. By simulating malicious inputs and probing for weaknesses, developers can proactively secure their AI systems against attacks. Promptfoo provides tools for vulnerability scanning, helping to uncover potential security flaws and compliance issues. This is particularly important as LLM applications become more integrated into critical systems. The tool can generate security vulnerability reports, providing actionable insights for remediation.

Promptfoo offers several key benefits for developers. It allows for automated evaluations, saving time and effort compared to manual testing. It supports CI/CD integration, enabling automated checks as part of the software development lifecycle. This ensures that LLM-related security and performance issues are caught early in the development process. The tool also facilitates the comparison of different models, allowing developers to choose the best model for their specific needs. Furthermore, Promptfoo is designed to be developer-friendly, with features like live reload and caching to speed up the development process. The tool prioritizes privacy, with evaluations running locally, ensuring that prompts and sensitive data never leave the user's machine.

The project is open-source and MIT-licensed, fostering an active community. This allows for community contributions and collaborative development. The project provides comprehensive documentation, including getting started guides, red-teaming guides, and CLI usage instructions. It also offers a Discord community for support and discussion. Promptfoo is designed to be flexible, working with any LLM API or programming language. It is a battle-tested tool, powering LLM applications serving millions of users in production. In essence, Promptfoo empowers developers to build better, more secure, and more reliable AI applications by providing a comprehensive suite of tools for evaluation, red-teaming, and model comparison.

promptfoo
by
promptfoopromptfoo/promptfoo

Repository Details

Fetching additional details & charts...