AgentFlow Automatically Synthesizes Multi-Agent Systems to Uncover Chrome Sandbox-Escape Zero-Day Vulnerabilities

According to monitoring by Dongcha Beating, a UCSB team led by Feng Yu, in collaboration with fuzz.land and other organizations, has proposed AgentFlow, a system that automatically synthesizes multi-agent harnesses for vulnerability discovery. A harness is the program that orchestrates agent roles, information transfer, tool allocation, and retry logic. The paper points out that, with the model held fixed, changing the harness alone can significantly improve success rates, yet existing harnesses are mostly written by hand or found by searching only a local region of the design space.

AgentFlow uses a typed graph DSL to unify five dimensions of the harness (roles, topology, message patterns, tool bindings, and coordination protocols) into a single editable graph program, so that one step can modify agents, topology, prompts, and toolsets at the same time. Its outer loop locates failure points from runtime signals such as target-program coverage and sanitizer reports, rather than relying on binary pass/fail feedback.

On TerminalBench-2, paired with Claude Opus 4.6, it achieved an 84.3% success rate (75/89), the highest score in its category on that leaderboard. On the Chrome codebase (35 million lines of C/C++), the system synthesized a harness with 18 roles and approximately 210 agents, including 7 subsystem analyzers, 192 parallel explorers, and a four-stage crash classification pipeline in which dedicated agents such as Crash Filter and Root Cause Analyzer deduplicate crashes by their unique ASAN crash signatures. Running the open-source model Kimi K2.5 on 192 H100 GPUs for 7 days, it discovered 10 zero-day vulnerabilities, all confirmed by Chrome's Vulnerability Reward Program (VRP). Six have been assigned CVE numbers; they affect WebCodecs, Proxy, Network, Codecs, and Rendering, with bug types including UAF, integer overflow, and heap buffer overflow. Of these, CVE-2026-5280 and CVE-2026-6297 are critical-severity sandbox escapes.
Fuzz.land co-founder Shou Chaofan stated that some of the vulnerabilities were first discovered using MiniMax M2.5, and that both MiniMax M2.5 and Opus 4.6 can also find most of them. AgentFlow has been open-sourced.
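
The crash-deduplication step mentioned above can be illustrated with a minimal Python sketch. The report does not describe how AgentFlow actually builds its ASAN crash signatures, so everything here is an assumption made for illustration: the signature is taken to be a hash of the ASAN bug type plus the top three function names of the first stack trace, and the helper names `crash_signature` and `deduplicate`, as well as the sample reports and their `demo::` symbols, are hypothetical.

```python
import hashlib
import re
from typing import Dict, Iterable, Optional

# Matches the ASAN headline, e.g. "ERROR: AddressSanitizer: heap-use-after-free ..."
ERROR_RE = re.compile(r"ERROR: AddressSanitizer: (\S+)")
# Matches a symbolized frame, e.g. "    #0 0x401234 in ns::Func() /path/file.cc:1:2"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(.+?)\s+\S+$")


def crash_signature(report: str, top_n: int = 3) -> Optional[str]:
    """Hash the bug type and the top frames of the first stack (assumed scheme)."""
    match = ERROR_RE.search(report)
    if match is None:
        return None  # not an ASAN report
    bug_type = match.group(1)
    frames = []
    for line in report[match.end():].splitlines():
        frame = FRAME_RE.match(line)
        if frame is None:
            continue
        if frame.group(1) == "0" and frames:
            break  # a second stack (e.g. the "freed by" trace) starts here
        frames.append(frame.group(2))
        if len(frames) == top_n:
            break
    key = "|".join([bug_type, *frames])
    return hashlib.sha256(key.encode()).hexdigest()[:16]


def deduplicate(reports: Iterable[str]) -> Dict[str, str]:
    """Keep the first report seen for each signature."""
    unique: Dict[str, str] = {}
    for report in reports:
        sig = crash_signature(report)
        if sig is not None and sig not in unique:
            unique[sig] = report
    return unique


# Made-up sample reports: A and B differ only in PID and addresses.
REPORT_A = """==101==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010 at pc 0x000000401234
READ of size 4 at 0x602000000010 thread T0
    #0 0x401234 in demo::Decoder::Reset() /src/demo/decoder.cc:42:5
    #1 0x401300 in demo::Pipeline::Stop() /src/demo/pipeline.cc:10:3
"""
REPORT_B = """==202==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000000abc at pc 0x000000501234
READ of size 4 at 0x603000000abc thread T0
    #0 0x501234 in demo::Decoder::Reset() /src/demo/decoder.cc:42:5
    #1 0x501300 in demo::Pipeline::Stop() /src/demo/pipeline.cc:10:3
"""
REPORT_C = """==303==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000000020 at pc 0x000000409999
WRITE of size 8 at 0x604000000020 thread T0
    #0 0x409999 in demo::Parser::ReadHeader() /src/demo/parser.cc:77:9
"""

print(len(deduplicate([REPORT_A, REPORT_B, REPORT_C])))  # 2 distinct crashes
```

Hashing only the bug type and symbolized function names, and ignoring raw addresses and process IDs, is what lets repeated hits on the same bug from many parallel explorers collapse into one entry. A production pipeline would likely need further normalization (inlined frames, unsymbolized stacks) that this sketch omits.
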
