Alibaba Page Agent — Open Source In-Page GUI Agent for Natural Language Web Control
Alibaba has open-sourced Page Agent, a JavaScript library that turns any web page into an agent-operable interface. Add a single script tag and an AI agent can control the page through natural language commands — clicking buttons, filling forms, navigating menus, and extracting data via pure DOM manipulation.
Unlike browser automation agents that rely on screenshots and vision models, Page Agent works directly with the DOM. It's lightweight, requires no browser extension or headless Chrome, and runs entirely in the page context. It supports any LLM backend — OpenAI, Claude, DeepSeek, Qwen, Gemini, or local models via Ollama.
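To make the DOM-based approach concrete, here is a minimal sketch of how an in-page agent might turn a page's interactive elements into a compact, indexed text description for an LLM instead of sending a screenshot. It is not Page Agent's actual implementation: `NodeLike`, `collectInteractive`, and `describeForModel` are hypothetical names, and a plain object tree stands in for real DOM elements so the example runs outside a browser.

```typescript
// Hypothetical minimal node shape; in a browser this would be a real DOM Element.
interface NodeLike {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: NodeLike[];
}

// Tags treated as "interactive" for this sketch (an assumption, not an official list).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

interface IndexedElement {
  index: number;
  node: NodeLike;
}

// Walk the tree depth-first and collect interactive elements in document order.
function collectInteractive(root: NodeLike): IndexedElement[] {
  const found: IndexedElement[] = [];
  const visit = (node: NodeLike): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      found.push({ index: found.length, node });
    }
    for (const child of node.children ?? []) {
      visit(child);
    }
  };
  visit(root);
  return found;
}

// Render the indexed elements as short text lines that fit in an LLM prompt.
function describeForModel(elements: IndexedElement[]): string {
  return elements
    .map(({ index, node }) => {
      const label = node.text ?? node.attrs?.placeholder ?? node.attrs?.name ?? "";
      return `[${index}] <${node.tag}> ${label}`.trimEnd();
    })
    .join("\n");
}

// A tiny stand-in page: a form with one input and a nested button.
const page: NodeLike = {
  tag: "form",
  children: [
    { tag: "input", attrs: { name: "email", placeholder: "Email" } },
    { tag: "div", children: [{ tag: "button", text: "Sign up" }] },
  ],
};

const summary = describeForModel(collectInteractive(page));
console.log(summary);
```

The model can then refer to an element by its index (for example, "click element 1") rather than by pixel coordinates, which is the practical difference from screenshot-driven agents.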
A built-in human approval step lets users review and approve each action before it executes, keeping humans in the loop for critical operations. Page Agent is released under the MIT license, which permits modification and commercial use.
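The approval step can be pictured as a gate between the actions the model proposes and the code that performs them. The sketch below is an illustration under assumptions, not Page Agent's API: `ProposedAction` and `runWithApproval` are hypothetical names, and the approver is synchronous for brevity, whereas a real UI would likely be asynchronous.

```typescript
// Hypothetical shape of an action proposed by the model.
interface ProposedAction {
  kind: "click" | "fill";
  target: string;
  value?: string;
}

// Run actions in order, asking for approval before each one.
// Stops at the first rejection, since later steps may depend on earlier ones.
function runWithApproval(
  actions: ProposedAction[],
  approve: (action: ProposedAction) => boolean,
  execute: (action: ProposedAction) => void,
): ProposedAction[] {
  const executed: ProposedAction[] = [];
  for (const action of actions) {
    if (!approve(action)) {
      break;
    }
    execute(action);
    executed.push(action);
  }
  return executed;
}

// Example plan; the approver here is a stand-in that rejects destructive clicks.
const actionLog: string[] = [];
const plan: ProposedAction[] = [
  { kind: "fill", target: "email", value: "user@example.com" },
  { kind: "click", target: "Delete account" },
  { kind: "click", target: "Sign up" },
];

const executedActions = runWithApproval(
  plan,
  (action) => !action.target.startsWith("Delete"),
  (action) => {
    actionLog.push(`${action.kind}:${action.target}`);
  },
);
console.log(actionLog);
```

In a browser, the `approve` callback could wrap something as simple as `window.confirm`, so that nothing runs until the user accepts the described action.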
Page Agent represents a different approach to web agents: instead of agents operating browsers from outside, the agent lives inside the web page itself. This makes it easier to integrate into existing web applications and gives agents direct access to the DOM structure rather than relying on pixel-level interpretation.
GitHub: https://github.com/alibaba/page-agent
Demo: https://alibaba.github.io/page-agent/