mangle
by
google

Description: Mangle is a novel, open-source tool developed by Google for detecting and mitigating prompt injection attacks against Large Language Models (LLMs).

View on GitHub ↗

Summary Information

Updated 16 minutes ago

Added to GitGenius on September 8th, 2025

Created on November 24th, 2022

Open Issues & Pull Requests: 9 (+0)

Number of forks: 153

Total Stargazers: 2,985 (+0)

Total Subscribers: 38 (+0)

Issue Activity (beta)

Open issues: 4

New in 7 days: 0

Closed in 7 days: 0

Avg open age: 614 days

Stale 30+ days: 4

Stale 90+ days: 4

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

enhancement (8)
bug (2)
good first issue (2)
documentation (1)

Most active issues this week

No issue events were indexed in the last 7 days.

Explore full issue details

Repository Insights (GitGenius)

Median issue/PR response: 0.3 hours

Mean response time: 128.1 days

90th percentile: 678.4 days

Tracked items: 16

Most active contributors

burakemir - 35 events, 16 issues
RyanCarlisle - 3 events, 1 issues
hertzcodes - 2 events, 1 issues
maxott - 2 events, 1 issues
vorburger - 2 events, 2 issues

Related by overlapping contributors

Detailed Description

Mangle is a novel, open-source tool developed by Google for detecting and mitigating prompt injection attacks against Large Language Models (LLMs). It operates as a runtime defense, meaning it analyzes user inputs *during* interaction with the LLM, rather than relying solely on pre-training or static analysis. Its core innovation lies in identifying “mangled” prompts – inputs subtly altered to hijack the LLM’s intended behavior, often by embedding instructions within seemingly harmless text. Unlike traditional input sanitization which focuses on blocking keywords, Mangle aims to understand the *intent* behind the input, even when disguised.

The key principle behind Mangle is the observation that successful prompt injections often involve a shift in the LLM’s “persona” or task. Instead of directly blocking malicious keywords (which are easily bypassed through obfuscation), Mangle attempts to detect when the input is attempting to redefine the LLM’s role. It does this by leveraging a smaller, “shadow” LLM specifically trained to identify these persona shifts. This shadow LLM doesn’t generate responses itself; it solely acts as a detector, classifying inputs as either “safe” or “injected.” The repository details how this shadow LLM is trained using a dataset of both benign and adversarial prompts, focusing on examples where the LLM’s behavior demonstrably changes due to the injection.

Mangle’s architecture is designed for flexibility and integration. It’s not a standalone application but rather a library intended to be incorporated into existing LLM-powered applications. The repository provides Python code demonstrating how to integrate Mangle into a simple Flask application serving an LLM. The workflow involves receiving user input, passing it through Mangle’s detection model, and only forwarding the input to the main LLM if Mangle classifies it as safe. This allows developers to add a layer of runtime security without significantly altering their existing infrastructure. The authors emphasize that Mangle is *not* a perfect solution and should be used as part of a defense-in-depth strategy, alongside other security measures.

The repository includes detailed instructions on setting up the environment, training the shadow LLM (using provided scripts and datasets), and running the example application. It also provides a comprehensive evaluation of Mangle’s performance, demonstrating its effectiveness against a variety of prompt injection attacks. The evaluation metrics focus on precision and recall, highlighting the trade-offs between blocking legitimate inputs (false positives) and allowing malicious ones to pass through (false negatives). The authors acknowledge the challenges of achieving high accuracy and emphasize the importance of continuous monitoring and retraining of the shadow LLM to adapt to evolving attack techniques.

Finally, the repository is well-documented and includes a clear explanation of the underlying concepts and design choices. It’s actively maintained by Google researchers and encourages community contributions. The project’s open-source nature allows for wider scrutiny and improvement, fostering a collaborative approach to addressing the growing threat of prompt injection attacks in the rapidly evolving landscape of LLMs. The code and documentation are designed to be accessible to researchers and developers interested in building more secure LLM applications.

mangle
by
google

Summary Information

Issue Activity (beta)

Recent activity

Top labels

Most active issues this week

Repository Insights (GitGenius)

Most active contributors

Related by overlapping contributors

mangle
by
googlegoogle/mangle

Repository Details

mangle by google

Summary Information

Issue Activity (beta)

Recent activity

Top labels

Most active issues this week

Repository Insights (GitGenius)

Most active contributors

Related by overlapping contributors

mangle by googlegoogle/mangle

Repository Details

mangle
by
google

mangle
by
googlegoogle/mangle