AgentFlow automatically synthesizes multi-agent systems to uncover Chrome sandbox escape zero-day vulnerabilities

According to Beating Monitoring, Feng Yu's team at UCSB, working with fuzz.land and other organizations, has proposed AgentFlow, a system that automatically synthesizes multi-agent harnesses for vulnerability discovery. A harness is the program that coordinates agent roles, information passing, tool allocation, and retry logic. The paper states that, with the model held fixed, changing the harness alone can improve success rates severalfold, yet existing approaches are mostly handcrafted or search only part of the design space.

AgentFlow uses a typed graph DSL to unify the five dimensions of a harness (roles, topology, message patterns, tool bindings, and coordination protocols) into a single editable graph program, so that one edit step can modify agents, topology, prompts, and toolsets at the same time. The outer loop locates failure points using runtime signals such as target-program coverage and sanitizer reports, rather than relying on binary success/failure feedback. On TerminalBench-2 with Claude Opus 4.6, it scored 84.3% (75/89), the highest among comparable entries on that leaderboard.
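
To make the idea of an editable graph program concrete, here is a minimal Python sketch. It is not AgentFlow's actual DSL, which the article does not show. The names `Agent`, `Edge`, `Harness`, `apply_edit`, `RunSignals`, and `weakest_agent` are all hypothetical, as is the rule used to pick a failure point. The sketch only illustrates two ideas from the paragraph above: one edit step that changes agents, topology, and toolsets together, and feedback drawn from per-agent runtime signals instead of a single pass/fail bit.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Agent:
    role: str                      # e.g. "explorer", "crash_filter"
    prompt: str
    tools: frozenset = frozenset()  # tool binding for this agent


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    message: str                   # message pattern carried on this edge


@dataclass(frozen=True)
class Harness:
    agents: dict                   # agent name -> Agent
    edges: tuple = ()              # topology
    protocol: str = "sequential"   # coordination protocol

    def validate(self) -> None:
        # The "typed" part of this sketch: every edge must join known agents.
        for e in self.edges:
            if e.src not in self.agents or e.dst not in self.agents:
                raise ValueError(f"edge {e.src}->{e.dst} references unknown agent")


def apply_edit(h, *, add_agents=None, add_edges=(), retool=None):
    """Apply agent, topology, and tool changes as one validated step."""
    agents = dict(h.agents)
    agents.update(add_agents or {})
    for name, tools in (retool or {}).items():
        agents[name] = replace(agents[name], tools=frozenset(tools))
    edited = replace(h, agents=agents, edges=h.edges + tuple(add_edges))
    edited.validate()              # reject the whole edit if the graph is broken
    return edited


@dataclass
class RunSignals:
    coverage: dict                 # agent name -> fraction of target covered
    sanitizer_reports: dict        # agent name -> number of sanitizer reports


def weakest_agent(sig):
    """Hypothetical rule: fewest sanitizer reports, then lowest coverage."""
    return min(
        sig.coverage,
        key=lambda n: (sig.sanitizer_reports.get(n, 0), sig.coverage[n]),
    )
```

Because `apply_edit` returns a new `Harness` and leaves the old one untouched, an outer search loop could keep the previous graph and fall back to it whenever an edit fails validation or scores worse.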

On the Chrome codebase (35 million lines of C/C++), the system synthesized a harness with 18 roles and approximately 210 agents, including 7 subsystem analyzers, 192 parallel explorers, and a four-stage crash classification pipeline. Dedicated agents such as Crash Filter and Root Cause Analyzer deduplicate crashes by their unique ASAN crash signatures. Running the open-source model Kimi K2.5 on 192 H100 GPUs for 7 days, it discovered 10 zero-day vulnerabilities, all confirmed by the Chrome VRP. Six have been assigned CVE numbers; they affect WebCodecs, Proxy, Network, Codecs, and Rendering, and include use-after-free (UAF), integer overflow, and heap buffer overflow bugs. Two of them, CVE-2026-5280 and CVE-2026-6297, are critical sandbox escape vulnerabilities.
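
The article does not say how the crash signatures are computed. A common approach, sketched below in Python as an assumption rather than AgentFlow's method, is to key each AddressSanitizer report on its bug type plus the top few symbolized stack frames, so that the same bug hit by many parallel explorers collapses into one entry. The function names `signature` and `dedupe` and the default depth of three frames are hypothetical choices.

```python
import re
from collections import defaultdict

# Matches the bug type in "==PID==ERROR: AddressSanitizer: heap-use-after-free ..."
ERROR_RE = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")
# Matches symbolized frames such as "    #0 0x55d1 in ns::Func() file.cc:42:7"
# and captures the function name up to the first space or parenthesis.
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+([^\s(]+)", re.M)


def signature(report: str, depth: int = 3) -> tuple:
    """Return (bug type, top `depth` frame names) for one ASAN report."""
    match = ERROR_RE.search(report)
    kind = match.group(1) if match else "unknown"
    frames = FRAME_RE.findall(report)[:depth]
    return (kind, *frames)


def dedupe(reports) -> dict:
    """Group raw reports by signature; each key is one distinct crash."""
    groups = defaultdict(list)
    for report in reports:
        groups[signature(report)].append(report)
    return dict(groups)
```

Addresses and process IDs are deliberately left out of the key, since they differ between runs of the same bug. A deeper frame count separates more crashes but risks splitting one bug into several groups.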

Fuzz.land co-founder Shou Chaofan said that some of the vulnerabilities were first discovered using MiniMax M2.5, and that MiniMax M2.5 and Opus 4.6 can also find most of them. AgentFlow has been open-sourced.
