March 16, 2026 | Framework, Open Source, Agents, Tool

Alibaba Page Agent — Open Source In-Page GUI Agent for Natural Language Web Control

Alibaba has open-sourced Page Agent, a JavaScript library that turns any web page into an agent-operable interface. Add a single script tag and an AI agent can control the page through natural language commands — clicking buttons, filling forms, navigating menus, and extracting data via pure DOM manipulation.

Unlike browser automation agents that rely on screenshots and vision models, Page Agent works directly with the DOM. It's lightweight, requires no browser extension or headless Chrome, and runs entirely in the page context. It supports any LLM backend — OpenAI, Claude, DeepSeek, Qwen, Gemini, or local models via Ollama.
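To make the DOM-versus-screenshot contrast concrete, here is a minimal sketch of how a DOM-based agent might turn a page into a short indexed text listing that a language model can refer to. It is an illustration only, not Page Agent's actual API or internal representation: `SimpleNode`, `collectInteractive`, and `describeForModel` are hypothetical names, and a plain object tree stands in for real DOM elements so the example runs outside a browser.

```typescript
// Hypothetical, simplified stand-in for a DOM node (assumption for illustration;
// code running in a real page would walk actual Element objects instead).
interface SimpleNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: SimpleNode[];
}

// Tags treated as interactive in this sketch; a real agent would likely use a richer test.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Walk the tree depth-first and collect interactive elements in document order.
function collectInteractive(root: SimpleNode): SimpleNode[] {
  const found: SimpleNode[] = [];
  const visit = (node: SimpleNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) found.push(node);
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return found;
}

// Render each interactive element as one indexed line the model can point at.
function describeForModel(root: SimpleNode): string {
  return collectInteractive(root)
    .map((node, index) => {
      const label = node.text ?? node.attrs?.placeholder ?? node.attrs?.name ?? "";
      return `[${index}] <${node.tag}> ${label}`.trimEnd();
    })
    .join("\n");
}

const page: SimpleNode = {
  tag: "form",
  children: [
    { tag: "label", text: "Email" },
    { tag: "input", attrs: { name: "email", placeholder: "you@example.com" } },
    { tag: "button", text: "Subscribe" },
  ],
};

console.log(describeForModel(page));
// [0] <input> you@example.com
// [1] <button> Subscribe
```

A listing like this is small enough to send to a text-only model, which can then answer with something like "click element 1". That is the general appeal of a DOM-based design over pixel interpretation, whatever the concrete encoding a given library uses.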

A built-in human approval step lets users review and approve each action before it executes, keeping humans in the loop for critical operations. The project is released under the MIT license, which permits modification and commercial use.
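The approval step can be pictured as a gate between the actions a model proposes and the code that performs them. The sketch below assumes a synchronous approver (in a browser this could be as simple as `window.confirm`) and stops the plan at the first rejection. The `AgentAction` shape and the `runWithApproval` function are hypothetical and are not taken from the Page Agent source.

```typescript
// Hypothetical action shape (assumption for illustration only).
type AgentAction =
  | { kind: "click"; target: string }
  | { kind: "fill"; target: string; value: string };

interface StepResult {
  action: AgentAction;
  executed: boolean;
}

// Ask the approver before each action; run it only on approval.
// The plan stops at the first rejection so later steps never run on a stale page state.
function runWithApproval(
  actions: AgentAction[],
  approve: (action: AgentAction) => boolean,
  execute: (action: AgentAction) => void,
): StepResult[] {
  const results: StepResult[] = [];
  for (const action of actions) {
    const approved = approve(action);
    if (approved) execute(action);
    results.push({ action, executed: approved });
    if (!approved) break;
  }
  return results;
}

// Usage: approve clicks, reject form fills, and record what actually ran.
const executedLog: string[] = [];
const plan: AgentAction[] = [
  { kind: "click", target: "#open-settings" },
  { kind: "fill", target: "#email", value: "new@example.com" },
  { kind: "click", target: "#save" },
];

const outcome = runWithApproval(
  plan,
  (action) => action.kind === "click",
  (action) => executedLog.push(`${action.kind} ${action.target}`),
);

console.log(executedLog);    // [ 'click #open-settings' ]
console.log(outcome.length); // 2
```

Stopping at the first rejection is one possible policy. An alternative would be to skip the rejected action and continue, but later steps often depend on earlier ones having run.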

Page Agent represents a different approach to web agents: instead of agents operating browsers from outside, the agent lives inside the web page itself. This makes it easier to integrate into existing web applications and gives agents direct access to the DOM structure rather than relying on pixel-level interpretation.

GitHub: https://github.com/alibaba/page-agent
Demo: https://alibaba.github.io/page-agent/