Grafana Labs open-sources o11y-bench and expands AI observability tools to OSS users, aiming to standardise how AI agents are monitored and evaluated at scale.
Grafana Labs has extended its open-source strategy for AI systems by releasing o11y-bench, a new benchmark designed to evaluate AI agents in real-world observability workflows, and by bringing its AI-powered tooling to Grafana OSS environments.
The o11y-bench framework, built on Harbor and designed to run on live Grafana stacks, measures how agents perform on tasks spanning metrics, logs, and traces, as well as incident investigations and dashboard modifications. Unlike conventional benchmarks, it focuses on real-world agent behaviour rather than static outputs, aiming to standardise evaluation across open, multi-tool observability ecosystems.
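To make the idea of per-area scoring concrete, the following is a minimal sketch of how results from such a benchmark might be summarised as a pass rate per task category. It does not reflect o11y-bench's actual data model or scoring: the `TaskResult` type, the category names, and the `pass_rate_by_category` helper are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one benchmark task run (not o11y-bench's schema)."""
    category: str  # assumed labels, e.g. "metrics", "logs", "dashboard"
    passed: bool   # whether the agent's outcome satisfied the task's check


def pass_rate_by_category(results):
    """Return the fraction of passed tasks for each category."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for result in results:
        totals[result.category] += 1
        passes[result.category] += int(result.passed)
    return {category: passes[category] / totals[category] for category in totals}


# Made-up sample data, purely to show the shape of the summary.
results = [
    TaskResult("metrics", True),
    TaskResult("metrics", False),
    TaskResult("logs", True),
    TaskResult("dashboard", True),
]
print(pass_rate_by_category(results))
```

Reporting a rate per category rather than a single overall score keeps a weakness in one area, such as dashboard edits, from being hidden by strength in another.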
Alongside this, the company introduced AI Observability in Grafana Cloud (public preview), enabling real-time monitoring of LLM-powered applications. The platform provides visibility into inputs, outputs, and execution flows, while continuously evaluating responses for anomalies, policy violations, and risks such as data leakage or misuse. It also elevates agent interactions into telemetry signals, addressing gaps where traditional observability fails to capture silent degradation or inconsistent AI behaviour.
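As a rough illustration of what evaluating responses for data leakage and turning the result into a telemetry signal can mean in practice, the sketch below scans an LLM response for two simple patterns and returns a structured record. It is an assumption-laden toy, not Grafana's implementation: the `LEAK_PATTERNS` table, the `sk-` key format, the record fields, and the `evaluate_response` function are all hypothetical.

```python
import re

# Hypothetical patterns; a real evaluator would use far richer checks.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # assumed key format
}


def evaluate_response(prompt, response):
    """Build a telemetry-style record describing one LLM interaction."""
    findings = sorted(
        name for name, pattern in LEAK_PATTERNS.items() if pattern.search(response)
    )
    return {
        "prompt_chars": len(prompt),
        "response_chars": len(response),
        "findings": findings,       # names of the patterns that matched
        "flagged": bool(findings),  # True if any pattern matched
    }


record = evaluate_response("Who do I contact?", "Please email bob@example.com.")
print(record)
```

A record like this could be shipped alongside ordinary logs or traces, so that a rise in flagged responses is visible even when latency and error rates look healthy.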
Grafana Assistant has been expanded beyond cloud environments to include on-premises Grafana Enterprise and Grafana OSS (via cloud integration), bringing AI-assisted monitoring, automation, and workflow orchestration into open deployments.
To further support agent-driven workflows, the new Grafana Cloud CLI (GCX) integrates observability directly into AI-native development environments, enabling agents to query telemetry, correlate it with code changes, and recommend fixes.
“AI systems are starting to look a lot like distributed systems did a decade ago: powerful, but difficult to reason about and even harder to operate,” said Jen Villa.
“AI breaks in ways traditional observability wasn’t designed for,” added Mat Ryer.
These moves reinforce Grafana Labs’ open-core approach, combining commercial cloud capabilities with open standards, open ecosystems, and community-driven innovation for AI observability.



