Description: AssetOpsBench - Industry 4.0
View IBM/AssetOpsBench on GitHub ↗
AssetOpsBench is a comprehensive benchmark designed to advance the development and evaluation of AI agents for industrial asset operations and maintenance within the context of Industry 4.0. Its primary purpose is to provide a unified framework for researchers and practitioners to build, orchestrate, and assess the performance of domain-specific AI agents in realistic industrial scenarios. The repository offers a valuable resource for those seeking to apply AI to improve efficiency, reliability, and predictive capabilities in industrial settings.
The core functionality of AssetOpsBench revolves around its ability to simulate and evaluate multi-step workflows within a controlled environment. It achieves this through a combination of domain-specific AI agents, multi-agent orchestration frameworks, and a rich dataset of industrial scenarios. The repository provides four key domain-specific agents, each designed to handle specific tasks relevant to asset operations: an IoT Agent for data retrieval, an FMSR (Failure Mode, Sensor, and Rule) Agent for failure analysis, a TSFM (Time Series Forecasting and Monitoring) Agent for predictive maintenance, and a WO (Work Order) Agent for automating work order generation. These agents act as the building blocks for more complex, multi-step workflows.
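To make the "agents as building blocks" idea concrete, here is a minimal sketch of chaining the four domain agents into a multi-step workflow. The `DomainAgent` interface, agent constructors, and task strings are illustrative assumptions, not the repository's actual API:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent interface: each domain agent maps a task string to a
# result string. Names mirror the four agents described above; signatures
# are assumed for exposition only.
@dataclass
class DomainAgent:
    name: str
    run: Callable[[str], str]

iot_agent = DomainAgent("IoT", lambda task: f"sensor data for: {task}")
fmsr_agent = DomainAgent("FMSR", lambda task: f"failure modes for: {task}")
tsfm_agent = DomainAgent("TSFM", lambda task: f"forecast for: {task}")
wo_agent = DomainAgent("WO", lambda task: f"work order for: {task}")

def run_workflow(steps: list[tuple[DomainAgent, str]]) -> list[str]:
    """Execute a multi-step workflow, feeding each agent's output forward."""
    trace: list[str] = []
    context = ""
    for agent, task in steps:
        result = agent.run(task + context)
        trace.append(f"{agent.name}: {result}")
        context = " | " + result  # pass prior result to the next step
    return trace

trace = run_workflow([
    (iot_agent, "retrieve chiller sensor readings"),
    (tsfm_agent, "forecast anomaly risk"),
    (wo_agent, "create preventive work order"),
])
```

The point is the composition pattern: each agent handles one domain, and an end-to-end scenario is a sequence of such calls with context threaded between them.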
To facilitate the orchestration of these agents, AssetOpsBench offers two multi-agent frameworks: MetaAgent, which uses a ReAct-based single-agent-as-tool approach, and AgentHive, which employs a plan-and-execute sequential workflow. These frameworks provide different strategies for coordinating the domain-specific agents, letting users experiment with alternative approaches to complex industrial problems. The repository also includes the supporting infrastructure needed to run and evaluate these agents, including MCP (Model Context Protocol) servers and a plan-execute runner.
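The two orchestration styles can be contrasted in a few lines. The sketch below is a deliberately stubbed illustration (the tool table, controller logic, and fixed plan are all assumptions, with an LLM's role hard-coded), not the repository's implementation:

```python
# Domain agents exposed as tools; the real tools would be the four agents above.
TOOLS = {
    "iot": lambda q: "sensor data",
    "wo": lambda q: "work order created",
}

def react_controller(goal: str, max_turns: int = 4) -> list[str]:
    """MetaAgent-style: one reasoning loop picks a tool each turn,
    observing the result before deciding the next action."""
    trace, done, turn = [], False, 0
    while not done and turn < max_turns:
        # A real ReAct loop would ask an LLM for a Thought/Action pair;
        # here the choice is hard-coded for illustration.
        tool = "iot" if turn == 0 else "wo"
        trace.append(f"{tool} -> {TOOLS[tool](goal)}")
        done = tool == "wo"
        turn += 1
    return trace

def plan_and_execute(goal: str) -> list[str]:
    """AgentHive-style: commit to a plan up front, then run steps in order."""
    plan = ["iot", "wo"]  # a real planner would derive this with an LLM
    return [f"{t} -> {TOOLS[t](goal)}" for t in plan]
```

The ReAct controller decides one step at a time based on observations, while plan-and-execute fixes the whole sequence before any step runs; on this toy goal both produce the same trace, but they diverge when intermediate results should change the plan.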
A key component of AssetOpsBench is its extensive dataset of 141 industrial scenarios. These scenarios cover a wide range of asset classes, including turbines, HVAC systems, pumps, and more. The tasks within these scenarios span various domains, such as IoT data retrieval, failure mode analysis, time series forecasting, and work order generation. Some scenarios focus on single-domain tasks, while others represent complex, multi-step end-to-end workflows, providing a comprehensive testing ground for AI agents. The dataset is available on Hugging Face, making it easily accessible for researchers and practitioners.
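Working with the scenarios programmatically might look like the following. The record fields (`id`, `asset`, `domains`) are a simplified, assumed schema for illustration; the actual Hugging Face dataset's columns may differ:

```python
# Illustrative scenario records in a simplified, assumed schema.
scenarios = [
    {"id": 1, "asset": "turbine", "domains": ["iot"]},
    {"id": 2, "asset": "hvac", "domains": ["iot", "tsfm", "wo"]},
    {"id": 3, "asset": "pump", "domains": ["fmsr"]},
]

def end_to_end(scenario: dict) -> bool:
    """A scenario spanning multiple domains is a multi-step workflow."""
    return len(scenario["domains"]) > 1

multi_step = [s["id"] for s in scenarios if end_to_end(s)]
# multi_step == [2]
```

This mirrors the split described above: single-domain scenarios exercise one agent in isolation, while multi-domain scenarios require orchestration across agents.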
The repository also provides a leaderboard for evaluating the performance of AI agents. The leaderboards are generated using seven different Large Language Models (LLMs), and the trajectories of the agents are scored using an LLM Judge (Llama-4-Maverick-17B). The evaluation criteria are based on six dimensions, measuring reasoning, execution, and data handling capabilities. This allows for a standardized and objective comparison of different AI agent approaches.
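Aggregating judge scores into a leaderboard entry can be sketched as below. The six dimension names are placeholders (the source only names reasoning, execution, and data handling among them), and the scoring scale is assumed:

```python
from statistics import mean

# Hypothetical dimension names for the six-dimension rubric; the
# benchmark's actual rubric may use different labels.
DIMENSIONS = ["reasoning", "planning", "tool_use",
              "execution", "data_handling", "completion"]

def aggregate(trajectories: list[dict[str, float]]) -> dict[str, float]:
    """Average each judged dimension over all of an agent's trajectories."""
    return {d: mean(t[d] for t in trajectories) for d in DIMENSIONS}

scores = aggregate([
    {d: 0.8 for d in DIMENSIONS},  # judge scores for trajectory 1
    {d: 0.6 for d in DIMENSIONS},  # judge scores for trajectory 2
])
# scores["reasoning"] is approximately 0.7
```

Per-dimension averages like these, computed over each LLM's trajectories, are what a leaderboard row would summarize.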
AssetOpsBench is actively being expanded. The repository includes a call for contributions, inviting researchers and practitioners to submit new scenarios and broaden the benchmark's scope. The project is also engaged with the academic community, with papers accepted at major conferences such as NeurIPS, EMNLP, and AAAI. Resources including tutorials, papers, and blog posts help users understand and apply the benchmark, and pre-built Docker images together with a detailed setup guide make it easy to run and experiment with AssetOpsBench. Overall, it is a valuable resource for anyone developing and evaluating AI agents for industrial asset operations and maintenance.