bloom
by
safety-research

Description: bloom - evaluate any behavior immediately 🌸🌱

View on GitHub ↗

Summary Information

Updated 2 hours ago

Added to GitGenius on December 23rd, 2025

Created on June 24th, 2025

Open Issues & Pull Requests: 9 (+0)

Number of forks: 173

Total Stargazers: 1,362 (+0)

Total Subscribers: 12 (+0)

Issue Activity (beta)

Open issues: 0

New in 7 days: 0

Closed in 7 days: 0

Avg open age: N/A days

Stale 30+ days: 0

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

No label distribution available yet.

Most active issues this week

No issue events were indexed in the last 7 days.

Full issues analysis pending...

Repository Insights (GitGenius)

Median issue/PR response: 25.7 hours

Mean response time: 2.7 days

90th percentile: 9.0 days

Tracked items: 17

Most active contributors

isha-gpt - 37 events, 16 issues
Butanium - 6 events, 3 issues
rht - 6 events, 3 issues
Shashikant86 - 2 events, 1 issues

Related by overlapping contributors

Detailed Description

Bloom is a Python-based tool for generating and executing automated behavioral evaluations of large language models. Rather than relying on fixed benchmarks, Bloom dynamically creates evaluation suites tailored to probe for specific behaviors such as sycophancy, self-preservation, political bias, and other characteristics. The core workflow involves providing a seed configuration that describes the target behavior and evaluation parameters, after which Bloom generates diverse test scenarios, executes conversations with a specified model, and scores the results to determine whether and how strongly the behavior manifests.

The repository is classified across multiple AI safety and interpretability domains including counterfactuals, LLM analysis, model interpretability, AI debugging, prompt engineering, language model evaluation, behavior understanding, input perturbation, and explanations. This breadth reflects Bloom's position as a comprehensive tool for understanding and evaluating model behavior rather than a narrowly focused utility. According to GitGenius activity tracking, the repository has maintained a median issue and pull request response latency of 25.7 hours across 17 tracked items, with a mean latency of 65.7 hours, indicating reasonably active maintenance. The most active contributor tracked by GitGenius is isha-gpt with 37 events, followed by Butanium and rht with 6 events each. The repository shares contributors with several major projects including Anthropic's Claude Code, the vLLM project, and Hugging Face Transformers, suggesting integration points with the broader language model ecosystem.

The evaluation pipeline consists of four distinct stages. The Understanding stage analyzes the target behavior and provided examples. The Ideation stage generates diverse evaluation scenarios based on configuration parameters. The Rollout stage executes conversations with the target model. The Judgment stage scores the presence of the behavior and evaluates additional qualities. Users can run individual stages independently or execute the full pipeline. Configuration is managed through a seed.yaml file where users specify the behavior name, example transcripts, number of scenarios to generate, variation dimensions, target model, and modality (conversation or simenv with tool calls).

A distinctive feature is the variation dimensions system, which allows users to generate targeted variations of base scenarios to test behavior stability under different conditions. For example, users can specify dimensions like noise or emotional pressure, and Bloom will automatically create variations along those axes. This approach means that evaluation suites are reproducible only when cited with their full seed configuration, distinguishing Bloom from traditional fixed benchmarks.

The tool supports multiple model providers through LiteLLM integration and includes advanced features such as Weights and Biases integration for large-scale experiments, interactive chat for manual testing, extended thinking support for Claude and OpenAI o-series models, and programmatic usage as both a command-line tool and Python library. Results are saved to structured directories and can be viewed through an interactive viewer. The repository includes development infrastructure with pre-commit hooks for linting, formatting, and type checking, as well as a test suite using mocked API responses.

It is important to note that Bloom has been transferred to Meridian Labs and is now maintained at meridianlabs-ai.github.io/petri_bloom. The GitHub repository is frozen at its last standalone release and will not receive further updates, though existing users can continue using the documented version. New projects are directed to start from the Meridian Labs version instead.

bloom
by
safety-research

Summary Information

Issue Activity (beta)

Recent activity

Top labels

Most active issues this week

Repository Insights (GitGenius)

Most active contributors

Related by overlapping contributors

bloom
by
safety-researchsafety-research/bloom

Repository Details

bloom by safety-research

Summary Information

Issue Activity (beta)

Recent activity

Top labels

Most active issues this week

Repository Insights (GitGenius)

Most active contributors

Related by overlapping contributors

bloom by safety-researchsafety-research/bloom

Repository Details

bloom
by
safety-research

bloom
by
safety-researchsafety-research/bloom