trulens
by
truera

Description: Evaluation and Tracking for LLM Experiments and AI Agents


Summary Information

Updated 11 minutes ago

Added to GitGenius on May 28th, 2024

Created on November 2nd, 2020

Open Issues & Pull Requests: 108 (+0)

Number of forks: 309

Total Stargazers: 3,434 (+0)

Total Subscribers: 23 (+0)

Issue Activity (beta)

Open issues: 50

New in 7 days: 0

Closed in 7 days: 1

Avg open age: 40 days

Stale 30+ days: 25

Stale 90+ days: 0

Recent activity

Opened in 7 days: 0

Closed in 7 days: 0

Comments in 7 days: 0

Events in 7 days: 0

Top labels

bug (149)
enhancement (86)
help wanted (56)
good first issue (43)
documentation (26)
Examples (12)
question (12)
feature (8)

Most active issues this week

#2585 Finance + worldview eval datasets - 2 events / 1 comments


Repository Insights (GitGenius)

Median issue/PR response: 0.0 hours

Mean response time: 9.6 days

90th percentile: 9.9 days

Tracked items: 260

Most active contributors

sfc-gh-jreini - 404 events, 155 issues
joshreini1 - 256 events, 101 issues
sfc-gh-pdharmana - 51 events, 40 issues
piotrm0 - 45 events, 23 issues
yuvneshtruera - 37 events, 29 issues

Related by overlapping contributors

Detailed Description

TruLens is a Python-based evaluation and tracking framework designed for LLM experiments and AI agents. The project provides systematic evaluation capabilities that go beyond informal testing, enabling developers to understand performance, identify failure modes, and iteratively improve their applications as they develop prompts, models, retrievers, and knowledge sources.

The framework is built on OpenTelemetry (OTEL) tracing, which records every function call, LLM generation, retrieval, and tool invocation as a structured OTEL span. Because the spans follow the OTEL standard, traces can be exported to existing observability infrastructure such as Jaeger, Grafana Tempo, Datadog, or any other OTLP-compatible backend. The project describes its instrumentation as fine-grained and stack-agnostic, meaning it is not tied to a particular framework or application architecture.
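A minimal, stdlib-only sketch of the span idea follows, assuming nothing about TruLens internals: a hypothetical `traced` decorator records each call as a `Span` with a kind, attributes, timing, and a parent id, so nested calls form a tree. `traced`, `Span`, `FINISHED`, and the attribute keys are illustrative names, not the TruLens or OpenTelemetry API; a real setup would use an OTEL SDK with an OTLP exporter to ship spans to a backend such as Jaeger.

```python
import functools
import time
import uuid
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical toy tracer; NOT the TruLens or OpenTelemetry API.
# Field names loosely mirror OTEL concepts (span id, parent, attributes).
@dataclass
class Span:
    name: str
    span_id: str
    parent_id: Optional[str]
    attributes: dict[str, Any] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0

FINISHED: list[Span] = []  # completed spans, in the order they ended
_current: ContextVar[Optional[Span]] = ContextVar("current_span", default=None)

def traced(kind: str) -> Callable:
    """Record each call of the wrapped function as a span of the given kind."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _current.get()
            span = Span(
                name=fn.__name__,
                span_id=uuid.uuid4().hex[:16],
                parent_id=parent.span_id if parent else None,
                attributes={"span.kind": kind, "input.args": repr(args)},
                start=time.perf_counter(),
            )
            token = _current.set(span)  # make this span the parent of nested calls
            try:
                result = fn(*args, **kwargs)
                span.attributes["output"] = repr(result)
                return result
            finally:
                span.end = time.perf_counter()
                _current.reset(token)  # restore the previous parent
                FINISHED.append(span)
        return wrapper
    return decorator

@traced("retrieval")
def retrieve(query: str) -> list[str]:
    return ["TruLens traces apps with OpenTelemetry spans."]

@traced("generation")
def generate(query: str, context: list[str]) -> str:
    return f"Answer to {query!r} using {len(context)} passage(s)."

@traced("agent")
def answer(query: str) -> str:
    return generate(query, retrieve(query))

answer("How does TruLens trace apps?")
for s in FINISHED:
    print(s.name, s.attributes["span.kind"], "parent:", s.parent_id)
```

Tracking the current parent in a `ContextVar` rather than a plain global keeps concurrent threads and asyncio tasks from mixing up each other's span trees, which is the same mechanism OpenTelemetry's Python API uses for its context.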

TruLens introduces seven purpose-built evaluators for agentic systems, each measuring a distinct aspect of agent behavior:

LogicalConsistency - reasoning coherence and hallucination detection
ExecutionEfficiency - redundant steps and wasted computation
PlanAdherence - whether execution followed the stated plan
PlanQuality - the intrinsic quality of the strategy
ToolSelection - whether the correct tools were chosen
ToolCalling - argument validity and interpretation of tool output
ToolQuality - reliability of the external tools themselves

The framework also supports Model Context Protocol (MCP) tool calls through dedicated MCP span types that capture tool names, arguments, output, and latency.
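To make the evaluator idea concrete, here is a rule-based sketch of one facet of ExecutionEfficiency: flagging tool calls repeated with identical arguments. It assumes a hypothetical `ToolCallSpan` record shaped after the fields the MCP span types are described as capturing (tool name, arguments, output, latency). This is not TruLens' implementation; `make_call`, `redundant_calls`, and `efficiency_score` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any

# Hypothetical record of one tool call (NOT a TruLens class), modeled on the
# fields MCP spans are described as capturing: name, arguments, output, latency.
@dataclass(frozen=True)
class ToolCallSpan:
    tool_name: str
    arguments: tuple[tuple[str, Any], ...]  # sorted (key, value) pairs, hashable
    output: str
    latency_ms: float

def make_call(tool: str, output: str, latency_ms: float, **args: Any) -> ToolCallSpan:
    """Build a span, normalizing keyword arguments into a sorted tuple."""
    return ToolCallSpan(tool, tuple(sorted(args.items())), output, latency_ms)

def redundant_calls(trace: list[ToolCallSpan]) -> list[tuple[str, int]]:
    """Return (tool_name, repeat_count) for calls repeated with identical arguments.

    A crude, rule-based proxy for one facet of execution efficiency only.
    """
    counts = Counter((c.tool_name, c.arguments) for c in trace)
    return [(tool, n) for (tool, _args), n in counts.items() if n > 1]

def efficiency_score(trace: list[ToolCallSpan]) -> float:
    """Fraction of calls that were not exact repeats (1.0 = no redundancy)."""
    if not trace:
        return 1.0
    unique = len({(c.tool_name, c.arguments) for c in trace})
    return unique / len(trace)

trace = [
    make_call("search", "3 hits", 120.0, query="trulens otel"),
    make_call("search", "3 hits", 118.0, query="trulens otel"),  # wasted repeat
    make_call("fetch_page", "page text", 340.0, url="https://example.com"),
]
print(redundant_calls(trace))             # [('search', 2)]
print(round(efficiency_score(trace), 3))  # 0.667
```

An exact-match rule is cheap and deterministic but misses near-duplicates such as a reworded search query, which a model-based evaluator could catch.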

The evaluation system supports several modes: batch evaluation on existing data, inline evaluation alongside a running application, and offline evaluation workflows. A flexible Selector API lets developers target any span attribute for evaluation. Core concepts documented by the project include Feedback Functions, the RAG Triad, and the Honest, Harmless, and Helpful (HHH) evals.
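The Selector and Feedback Function concepts can be sketched in plain Python, assuming a record is a nested dict of span attributes: a selector is a dotted path, a feedback function maps selected values to a score in [0, 1], and batch mode applies it to stored records. `select`, `groundedness`, and `run_feedback` are hypothetical names, and the token-overlap score is a deliberately crude stand-in for the groundedness leg of the RAG Triad (the other legs being context relevance and answer relevance).

```python
import re
from typing import Any, Callable

# Hypothetical stand-ins, NOT the TruLens API: a "selector" is a dotted path
# into a record, and a "feedback function" returns a score in [0, 1].

def select(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'retrieval.contexts' into nested dicts."""
    value: Any = record
    for key in path.split("."):
        value = value[key]
    return value

def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def groundedness(answer: str, contexts: list[str]) -> float:
    """Toy groundedness: share of answer tokens that also appear in the contexts.
    Real RAG Triad evaluators typically use an LLM judge instead."""
    answer_toks = _tokens(answer)
    if not answer_toks:
        return 0.0
    context_toks = set().union(*(_tokens(c) for c in contexts))
    return len(answer_toks & context_toks) / len(answer_toks)

def run_feedback(fn: Callable[..., float], records: list[dict[str, Any]],
                 **selectors: str) -> list[float]:
    """Batch mode: apply fn to each record, pulling each argument via a selector."""
    return [fn(**{arg: select(r, path) for arg, path in selectors.items()})
            for r in records]

records = [
    {"output": {"text": "TruLens uses OpenTelemetry"},
     "retrieval": {"contexts": ["TruLens is built on OpenTelemetry tracing."]}},
    {"output": {"text": "It runs on Mars"},
     "retrieval": {"contexts": ["TruLens is built on OpenTelemetry tracing."]}},
]
scores = run_feedback(groundedness, records,
                      answer="output.text", contexts="retrieval.contexts")
print([round(s, 2) for s in scores])  # [0.67, 0.25]
```

Keeping the selector separate from the scoring function means the same feedback function can be pointed at different span attributes without being rewritten.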

TruLens supports a broad range of LLM providers through dedicated packages including OpenAI and Azure OpenAI, LiteLLM for Anthropic and other providers, Google Gemini, AWS Bedrock, Snowflake Cortex, HuggingFace, and LangChain models. The project is available on PyPI and includes interactive examples runnable in Google Colab.
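One way to keep evaluation code independent of the LLM vendor is to write feedback logic against a small provider interface. A minimal sketch of that separation follows, assuming a hypothetical one-method `Provider` protocol and an LLM-as-judge prompt; neither matches the real TruLens provider classes, and `FakeProvider` exists only so the logic can run without network access.

```python
import re
from typing import Protocol

class Provider(Protocol):
    """Hypothetical minimal interface; real TruLens provider classes differ."""
    def complete(self, prompt: str) -> str: ...

def relevance(provider: Provider, question: str, answer: str) -> float:
    """LLM-as-judge sketch: ask for a 0-10 rating and normalize it to [0, 1]."""
    prompt = (
        "Rate from 0 to 10 how relevant the ANSWER is to the QUESTION. "
        "Reply with a number only.\n"
        f"QUESTION: {question}\nANSWER: {answer}"
    )
    reply = provider.complete(prompt)
    match = re.search(r"\d+(?:\.\d+)?", reply)  # first number in the reply
    if match is None:
        raise ValueError(f"judge returned no score: {reply!r}")
    return min(max(float(match.group()) / 10.0, 0.0), 1.0)  # clamp to [0, 1]

class FakeProvider:
    """Deterministic stand-in (hypothetical) so the feedback logic runs offline."""
    def __init__(self, canned_reply: str) -> None:
        self.canned_reply = canned_reply
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.canned_reply

judge = FakeProvider("8")
print(relevance(judge, "What does TruLens trace?", "LLM calls and tool invocations."))  # 0.8
```

The regex-and-clamp parsing is defensive because judge models do not always reply with a bare, in-range number.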

According to GitGenius activity tracking, the repository has 260 tracked issues and pull requests. The median response latency is 0.0 hours while the mean is 231.1 hours (about 9.6 days): at least half of submissions receive a first response almost immediately, but a long tail of slower responses pulls the mean up. Bug reports are the most common label category with 140 items, followed by enhancement requests with 78 items and help wanted issues with 56 items. The most active tracked contributors are sfc-gh-jreini with 404 events and joshreini1 with 256 events. The project shares contributors with major ecosystem projects including run-llama/llama_index, mastra-ai/mastra, and langchain-ai/langchain, suggesting ties to the broader LLM application development ecosystem. GitGenius classifies the repository across multiple domains, including AI model transparency, fairness assessment, bias detection, AI risk management, explainable AI techniques, and algorithmic accountability.
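The two reported figures for mean response time (231.1 hours and 9.6 days) agree once units are converted, and a small made-up sample shows how a median of 0.0 hours can coexist with a mean above 200 hours. The sample values are hypothetical and are not GitGenius data.

```python
import statistics

# Unit check: the mean response time reported in hours vs. in days.
mean_hours = 231.1
mean_days = mean_hours / 24
print(round(mean_days, 1))  # 9.6

# Hypothetical response times in hours (NOT GitGenius data): most items get an
# immediate response, a few wait weeks. The median stays at 0 while the mean
# is pulled up by the long tail.
sample_hours = [0.0] * 7 + [400.0, 700.0, 1200.0]
print(statistics.median(sample_hours))  # 0.0
print(statistics.mean(sample_hours))    # 230.0
```

For skewed distributions like this, the median and a high percentile together describe responsiveness better than the mean alone.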
