{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

Page Agent

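Embed alibaba/page-agent into your own web application: a pure-JavaScript in-page GUI agent that ships as a single `<script>` tag or npm package and lets end users of your site drive the UI in natural language ("click login, fill username as John"). No Python, no headless browser, no extension. Use it when the user is a web developer who wants to add an AI copilot to a SaaS product, admin panel, or B2B tool, make a legacy web app accessible through natural language, or evaluate page-agent against a local (Ollama) or cloud (Qwen / OpenAI / OpenRouter) LLM. It is not for server-side browser automation; point those users to Hermes' built-in browser tool instead.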

Skill metadata


| Field | Value |
|-------|-------|
| Source | Optional: install with `hermes skills install official/web-development/page-agent` |
| Path | `optional-skills/web-development/page-agent` |
| Version | `1.0.0` |
| Author | Hermes Agent |
| License | MIT |
| Platforms | linux, macos, windows |
| Tags | `web`, `javascript`, `agent`, `browser`, `gui`, `alibaba`, `embed`, `copilot`, `saas` |

Reference: full SKILL.md

Info

Below is the original SKILL.md definition that Hermes loads when it activates this skill. The reference block is kept verbatim so that commands, code, and identifiers are preserved exactly.

# page-agent

alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It lives inside a webpage, reads the DOM as text (no screenshots, no multi-modal LLM), and executes natural-language instructions like "click the login button, then fill username as John" against the current page. Pure client-side — the host site just includes a script and passes an OpenAI-compatible LLM endpoint.

## When to use this skill

Load this skill when a user wants to:

- **Ship an AI copilot inside their own web app** (SaaS, admin panel, B2B tool, ERP, CRM) — "users on my dashboard should be able to type 'create invoice for Acme Corp and email it' instead of clicking through five screens"
- **Modernize a legacy web app** without rewriting the frontend — page-agent drops on top of existing DOM
- **Add accessibility via natural language** — voice / screen-reader users drive the UI by describing what they want
- **Demo or evaluate page-agent** against a local (Ollama) or hosted (Qwen, OpenAI, OpenRouter) LLM
- **Build interactive training / product demos** — let an AI walk a user through "how to submit an expense report" live in the real UI

## When NOT to use this skill

- User wants **Hermes itself to drive a browser** → use Hermes' built-in browser tool (Browserbase / Camofox). page-agent is the *opposite* direction.
- User wants **cross-tab automation without embedding** → use Playwright, browser-use, or the page-agent Chrome extension
- User needs **visual grounding / screenshots** → page-agent is text-DOM only; use a multimodal browser agent instead

## Prerequisites

- Node 22.13+ or 24+, npm 10+ (docs claim 11+ but 10.9 works fine)
- An OpenAI-compatible LLM endpoint: Qwen (DashScope), OpenAI, Ollama, OpenRouter, or anything speaking `/v1/chat/completions`
- Browser with devtools (for debugging)

## Path 1 — 30-second demo via CDN (no install)

Fastest way to see it work. Uses Alibaba's free testing LLM proxy, which is **for evaluation only** and subject to their terms.

Add this to any HTML page (for pages you cannot edit, use the bookmarklet form further down):

```html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js" crossorigin="true"></script>
```

A panel appears. Type an instruction. Done.

Bookmarklet form (drop into bookmarks bar, click on any page):

```javascript
javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();
```
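If you need the same bookmarklet for a different script URL (another pinned version, or a self-hosted copy), it can be generated rather than edited by hand. A minimal sketch; `makeBookmarklet` is a hypothetical helper, not something page-agent provides:

```javascript
// Hypothetical helper: builds a bookmarklet URL that injects a script by URL.
function makeBookmarklet(scriptUrl) {
  // JSON.stringify quotes and escapes the URL so it is a valid JS string literal.
  const body =
    `(function(){var s=document.createElement('script');` +
    `s.src=${JSON.stringify(scriptUrl)};` +
    `document.head.appendChild(s);})();`;
  return `javascript:${body}`;
}

// URL taken from the demo snippet above.
const demoUrl = 'https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';
console.log(makeBookmarklet(demoUrl));
```

Paste the printed string into the URL field of a new bookmark.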

## Path 2 — npm install into your own web app (production use)

Inside an existing web project (React / Vue / Svelte / plain):

```bash
npm install page-agent
```

Wire it up with your own LLM endpoint — **never ship the demo CDN to real users**:

```javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: process.env.LLM_API_KEY,   // never hardcode
    language: 'en-US',
})

// Show the panel for end users:
agent.panel.show()

// Or drive it programmatically:
await agent.execute('Click submit button, then fill username as John')
```

Provider examples (any OpenAI-compatible endpoint works):

| Provider | `baseURL` | `model` |
|----------|-----------|---------|
| Qwen / DashScope | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.5-plus` |
| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
| Ollama (local) | `http://localhost:11434/v1` | `qwen3:14b` |
| OpenRouter | `https://openrouter.ai/api/v1` | `anthropic/claude-sonnet-4.6` |
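
The table rows can be kept as presets so that switching providers is a one-word change. A minimal sketch; `PROVIDERS` and `configFor` are hypothetical names, not part of page-agent, and the values are copied from the table above:

```javascript
// Presets copied from the provider table. Names are illustrative only.
const PROVIDERS = {
  qwen: { baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1', model: 'qwen3.5-plus' },
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-4o-mini' },
  ollama: { baseURL: 'http://localhost:11434/v1', model: 'qwen3:14b' },
  openrouter: { baseURL: 'https://openrouter.ai/api/v1', model: 'anthropic/claude-sonnet-4.6' },
};

// Returns an options object shaped like the `new PageAgent({...})` example.
function configFor(provider, apiKey, overrides = {}) {
  // Object.hasOwn avoids matching inherited keys such as "toString".
  if (!Object.hasOwn(PROVIDERS, provider)) {
    const known = Object.keys(PROVIDERS).join(', ');
    throw new Error(`Unknown provider "${provider}"; expected one of: ${known}`);
  }
  return { ...PROVIDERS[provider], apiKey, language: 'en-US', ...overrides };
}
```

For example, `new PageAgent(configFor('ollama', 'NA'))` would target a local Ollama instance with the placeholder key used in the `.env` example for Path 3.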

**Key config fields** (passed to `new PageAgent({...})`):

- `model`, `baseURL`, `apiKey` — LLM connection
- `language` — UI language (`en-US`, `zh-CN`, etc.)
- Allowlist and data-masking hooks exist for locking down what the agent can touch — see https://alibaba.github.io/page-agent/ for the full option list

**Security.** Don't put your `apiKey` in client-side code for a real deployment: proxy LLM calls through your backend and point `baseURL` at your proxy. The demo CDN exists because Alibaba runs that proxy for evaluation.
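
One way such a proxy could look, as a rough sketch rather than a recommended implementation. It assumes the client POSTs OpenAI-style JSON to `<baseURL>/chat/completions` and that the server runs on Node 22+ (global `fetch`, `Request`, `Response`). `buildUpstreamRequest` and `handleChatCompletions` are hypothetical names:

```javascript
// Server-side sketch. All names here are illustrative, not part of page-agent.

// Build the upstream request: forward the client's JSON body, attach the secret key.
function buildUpstreamRequest(clientBody, config) {
  const base = config.baseURL.replace(/\/+$/, ''); // drop trailing slashes
  return {
    url: `${base}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${config.apiKey}`, // the key stays on the server
      },
      // Pin the model server-side so clients cannot pick an arbitrary one.
      body: JSON.stringify({ ...clientBody, model: config.model }),
    },
  };
}

// Fetch-style handler: takes a web-standard Request, returns a Response.
async function handleChatCompletions(request, config, fetchImpl = fetch) {
  if (request.method !== 'POST') {
    return new Response('Method Not Allowed', { status: 405 });
  }
  let clientBody;
  try {
    clientBody = await request.json();
  } catch {
    return new Response('Invalid JSON', { status: 400 });
  }
  const { url, init } = buildUpstreamRequest(clientBody, config);
  const upstream = await fetchImpl(url, init);
  // Relay status and body; copy only the content-type header.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { 'Content-Type': upstream.headers.get('content-type') ?? 'application/json' },
  });
}
```

Mount the handler behind your existing session checks at a route ending in `/chat/completions`, then set `baseURL` in the browser to the route's parent path. An open proxy would leak your quota just as an exposed key would.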

## Path 3 — clone the source repo (contributing, or hacking on it)

Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.

```bash
git clone https://github.com/alibaba/page-agent.git
cd page-agent
npm ci              # exact lockfile install (or `npm i` to allow updates)
```

Create `.env` in the repo root with an LLM endpoint. Example:

```
LLM_MODEL_NAME=gpt-4o-mini
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
```

Ollama flavor:

```
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=NA
LLM_MODEL_NAME=qwen3:14b
```
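
Before starting the dev server it can help to confirm that `.env` sets all three variables shown in the examples. A minimal sketch; that the repo needs exactly these three keys is an assumption drawn from the examples above, and `parseEnv` / `missingEnvKeys` are hypothetical helpers that do not handle quoted or multi-line values:

```javascript
// Keys taken from the .env examples; assumed to be the full required set.
const REQUIRED_KEYS = ['LLM_MODEL_NAME', 'LLM_API_KEY', 'LLM_BASE_URL'];

// Parse simple KEY=VALUE lines; blank lines and # comments are skipped.
function parseEnv(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not a KEY=VALUE line
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// Returns the required keys that are absent or empty.
function missingEnvKeys(text) {
  const values = parseEnv(text);
  return REQUIRED_KEYS.filter((key) => !values[key]);
}
```

An empty array from `missingEnvKeys` means every expected key has a non-empty value. It says nothing about whether the endpoint or key is valid.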

Common commands:

```bash
npm start           # docs/website dev server
npm run build       # build every package
npm run dev:demo    # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
npm run dev:ext     # develop the browser extension (WXT + React)
npm run build:ext   # build the extension
```

**Test on any website** using the local IIFE bundle. Add this bookmarklet:

```javascript
javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();
```

Then: `npm run dev:demo`, click the bookmarklet on any page, and the local build injects. Auto-rebuilds on save.

**Warning:** your `.env` `LLM_API_KEY` is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from `.env`.)
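
The same grep-style check can be scripted and run against any bundle you intend to publish. A minimal sketch; `findLeakedSecrets` is a hypothetical helper, and the 8-character minimum is an arbitrary choice to skip placeholders such as `NA`:

```javascript
// Returns the names of secrets whose literal value appears in the bundle text.
function findLeakedSecrets(bundleText, secrets, minLength = 8) {
  return Object.entries(secrets)
    .filter(([, value]) =>
      typeof value === 'string' &&
      value.length >= minLength &&     // skip short placeholders like "NA"
      bundleText.includes(value))
    .map(([name]) => name);
}

// Inline demo with a made-up key; in practice, pass the built file's contents.
const sampleBundle = 'const cfg={apiKey:"sk-example-1234567890"}';
console.log(findLeakedSecrets(sampleBundle, { LLM_API_KEY: 'sk-example-1234567890' }));
```

A non-empty result means the bundle contains the key verbatim and should not leave your machine.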

## Repo layout (Path 3)

Monorepo with npm workspaces. Key packages:

| Package | Path | Purpose |
|---------|------|---------|
| `page-agent` | `packages/page-agent/` | Main entry with UI panel |
| `@page-agent/core` | `packages/core/` | Core agent logic, no UI |
| `@page-agent/mcp` | `packages/mcp/` | MCP server (beta) |
| — | `packages/llms/` | LLM client |
| — | `packages/page-controller/` | DOM ops + visual feedback |
| — | `packages/ui/` | Panel + i18n |
| — | `packages/extension/` | Chrome/Firefox extension |
| — | `packages/website/` | Docs + landing site |

## Verifying it works

After Path 1 or Path 2:
1. Open the page in a browser with devtools open
2. You should see a floating panel. If not, check the console for errors (most common: CORS on the LLM endpoint, wrong `baseURL`, or a bad API key)
3. Type a simple instruction matching something visible on the page ("click the Login link")
4. Watch the Network tab — you should see a request to your `baseURL`
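
If the panel loads but instructions fail, probing the endpoint directly can narrow down which of the causes in step 2 applies. A minimal sketch, assuming the endpoint accepts OpenAI-style requests at `<baseURL>/chat/completions`; the status-to-cause mapping reflects typical provider behavior, not anything page-agent guarantees, and both function names are hypothetical:

```javascript
// Map an HTTP status from the LLM endpoint to the likely cause.
// status === null means fetch itself rejected (network error or CORS block).
function likelyCause(status) {
  if (status === null) return 'network error or CORS block on the LLM endpoint';
  if (status === 401 || status === 403) return 'bad or missing API key';
  if (status === 404) return 'wrong baseURL (path not found)';
  if (status === 429) return 'rate limited';
  if (status >= 200 && status < 300) return 'ok';
  return `unexpected status ${status}`;
}

// Sends one tiny chat request and classifies the outcome.
async function probeEndpoint({ baseURL, apiKey, model }, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseURL.replace(/\/+$/, '')}/chat/completions`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ model, messages: [{ role: 'user', content: 'ping' }] }),
    });
    return likelyCause(res.status);
  } catch {
    return likelyCause(null);
  }
}
```

Run `probeEndpoint` from the devtools console of the page itself so that the browser's CORS rules apply to the request the same way they do for the agent.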

After Path 3:
1. `npm run dev:demo` prints `Accepting connections at http://localhost:5174`
2. `curl -I http://localhost:5174/page-agent.demo.js` returns `HTTP/1.1 200 OK` with `Content-Type: application/javascript`
3. Click the bookmarklet on any site; panel appears

## Pitfalls

- **Demo CDN in production** — don't. It's rate-limited, uses alibaba's free proxy, and their terms forbid production use.
- **API key exposure** — any key passed to `new PageAgent({apiKey: ...})` ships in your JS bundle. Always proxy through your own backend for real deployments.
- **Non-OpenAI-compatible endpoints** fail silently or with cryptic errors. If your provider needs native Anthropic/Gemini formatting, use an OpenAI-compatibility proxy (LiteLLM, OpenRouter) in front.
- **CSP blocks** — sites with strict Content-Security-Policy may refuse to load the CDN script or disallow inline eval. In that case, self-host from your origin.
- **Restart dev server** after editing `.env` in Path 3 — Vite only reads env at startup.
- **Node version** — the repo declares `^22.13.0 || >=24`. Node 20 will fail `npm ci` with engine errors.
- **npm 10 vs 11** — docs say npm 11+; npm 10.9 actually works fine.
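
The Node requirement can be checked up front instead of waiting for `npm ci` to fail. A minimal sketch of the `^22.13.0 || >=24` range without a semver library; `satisfiesEngines` is a hypothetical helper and ignores prerelease tags:

```javascript
// Checks a Node version string against the range "^22.13.0 || >=24".
// Prerelease tags (e.g. "24.0.0-rc.1") are not handled in this sketch.
function satisfiesEngines(version) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  if (major >= 24) return true;        // matches ">=24"
  return major === 22 && minor >= 13;  // "^22.13.0" means >=22.13.0 and <23.0.0
}

// process.version looks like "v22.13.0" when run under Node.
console.log(satisfiesEngines(process.version));
```

Note that Node 23 is excluded by the range even though it is newer than 22.13.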

## Reference

- Repo: https://github.com/alibaba/page-agent
- Docs: https://alibaba.github.io/page-agent/
- License: MIT (built on browser-use's DOM processing internals, Copyright 2024 Gregor Zunic)
