awi

module
v0.1.0
Published: Mar 1, 2026 License: MIT

AWI

Agentic Web Interface — A zero-config, single-binary web content fetcher for AI agents. No API keys, no external services, no dependencies.

# Read a webpage
awi read https://example.com

# Search the web
awi search "Claude API rate limits" --limit 5

# Batch fetch
awi batch urls.txt --concurrency 10

Why awi?

AI agents need to read the web. Existing solutions either:

  • Require API keys (Jina Reader, Firecrawl) — costs money, rate limits, privacy concerns
  • Require deploying services (Firecrawl self-hosted) — complex setup
  • Are Python-only (Crawl4AI) — slow startup, dependency hell

awi is a single Go binary that works out of the box:

                  awi            Jina Reader  Firecrawl      Crawl4AI
API key required  No             Yes          Yes            No
Install           Single binary  go install   Docker/deploy  pip install
Anti-bot bypass   ✅ Built-in   Cloud proxy  Cloud proxy    Basic
Offline capable
Language          Go             Go           Python         Python

Install

From source
go install github.com/jasonz/awi/cmd/awi@latest

From binary

Download a prebuilt binary from the Releases page.

Homebrew (coming soon)
brew install jasonz/tap/awi

How it works

awi uses a 3-tier backend architecture with automatic escalation:

awi read <url>
  │
  ├─ 1. direct    → Plain HTTP + readability (fastest, zero overhead)
  │     ↓ 403?
  ├─ 2. stealth   → Chrome TLS fingerprint via tls-client (bypasses Cloudflare)
  │     ↓ still blocked?
  └─ 3. browser   → Headless Chrome with anti-detection (handles JS-heavy pages)

No configuration needed. If a plain HTTP request succeeds, awi uses it. If the site blocks it (for example with a 403), awi escalates to the stealth backend, and if that is still blocked, to the headless browser.
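To make the escalation order concrete, here is a minimal Python sketch of the control flow in the diagram above. It is not awi's Go implementation: `Blocked`, `fetch_with_escalation`, and the stub backends are hypothetical names, and the sketch assumes each backend signals a block (such as a 403) by raising `Blocked`, while any other error propagates unchanged.

```python
from typing import Callable, Optional, Sequence, Tuple


class Blocked(Exception):
    """Hypothetical signal that a site refused a backend (e.g. an HTTP 403)."""


# A backend is a (name, fetch) pair; fetch takes a URL and returns page content.
Backend = Tuple[str, Callable[[str], str]]


def fetch_with_escalation(url: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try backends from cheapest to heaviest; return (name, content) of the first success."""
    last_error: Optional[Exception] = None
    for name, fetch in backends:
        try:
            return name, fetch(url)
        except Blocked as err:
            last_error = err  # blocked: fall through to the next, heavier backend
    raise Blocked(f"every backend was blocked for {url}") from last_error


if __name__ == "__main__":
    def direct(url: str) -> str:
        raise Blocked("403 Forbidden")  # simulate bot protection on plain HTTP

    def stealth(url: str) -> str:
        return "<html>...</html>"

    def browser(url: str) -> str:
        raise AssertionError("not reached: stealth already succeeded")

    print(fetch_with_escalation(
        "https://example.com",
        [("direct", direct), ("stealth", stealth), ("browser", browser)],
    ))
```

Running it prints `('stealth', '<html>...</html>')`: the direct backend is blocked, so the stealth backend answers and the browser backend is never started. Ordering the list from cheapest to most expensive means most pages only pay for a plain HTTP request.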

Usage

Read a webpage
# Default (auto-selects best backend)
awi read https://docs.python.org/3/tutorial/

# Force a specific backend
awi read https://example.com --backend direct
awi read https://spa-app.com --backend browser
awi read https://protected-site.com --backend stealth

# Force JS rendering
awi read https://react-app.com --js

# Output formats
awi read https://example.com --format json      # default
awi read https://example.com --format markdown
awi read https://example.com --format text

Search the web
# Search via DuckDuckGo (no API key needed)
awi search "golang web scraping" --limit 5

Batch fetch
# From file
awi batch urls.txt --concurrency 10

# From stdin
cat urls.txt | awi batch - --concurrency 5
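`--concurrency` caps how many URLs are fetched at the same time. The Python sketch below illustrates that idea with a thread pool; it is not awi's code, and `read_url_list`, `batch_fetch`, and the per-URL result dictionaries are hypothetical. It also assumes urls.txt holds one URL per line.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def read_url_list(lines: Iterable[str]) -> List[str]:
    """Assumed input format: one URL per line; blank lines are skipped."""
    return [line.strip() for line in lines if line.strip()]


def batch_fetch(urls: List[str], fetch: Callable[[str], str], concurrency: int = 10) -> List[Dict]:
    """Run fetch over urls with at most `concurrency` requests in flight."""

    def fetch_one(url: str) -> Dict:
        try:
            return {"url": url, "ok": True, "content": fetch(url)}
        except Exception as err:  # one failing URL should not abort the whole batch
            return {"url": url, "ok": False, "error": str(err)}

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in the order of `urls`, not in completion order.
        return list(pool.map(fetch_one, urls))
```

For example, `batch_fetch(read_url_list(sys.stdin), fetch, concurrency=5)` mirrors `cat urls.txt | awi batch - --concurrency 5`, where `fetch` is whatever single-URL reader the caller supplies.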
Proxy support
awi read https://example.com --proxy http://user:pass@proxy:8080
awi read https://example.com --proxy socks5://127.0.0.1:1080

Output format

JSON (default)
{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "This domain is for use in illustrative examples...",
  "backend": "direct",
  "fetched_at": "2026-02-28T20:43:46Z",
  "cache_hit": false
}
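An agent consuming the JSON output may prefer typed fields to a raw dict. This Python sketch is based only on the six fields in the sample above; the `ReadResult` and `parse_read_result` names are hypothetical, and it assumes every field is always present and that `fetched_at` is a UTC timestamp in the form shown.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReadResult:
    url: str
    title: str
    content: str
    backend: str  # "direct", "stealth" or "browser"
    fetched_at: datetime
    cache_hit: bool


def parse_read_result(raw: str) -> ReadResult:
    data = json.loads(raw)
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    fetched_at = datetime.fromisoformat(data["fetched_at"].replace("Z", "+00:00"))
    return ReadResult(
        url=data["url"],
        title=data["title"],
        content=data["content"],
        backend=data["backend"],
        fetched_at=fetched_at,
        cache_hit=bool(data["cache_hit"]),
    )
```

Checking `backend` tells an agent which tier actually served the page, and `cache_hit` whether the content may be up to one TTL old.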
Markdown
# Example Domain

Source: https://example.com
Backend: direct

This domain is for use in illustrative examples...
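The Markdown output carries the same information as the JSON fields. A minimal Python sketch of that mapping, inferred from the example above (the exact blank lines and the trailing newline are assumptions, and `to_markdown` is a hypothetical name):

```python
def to_markdown(result: dict) -> str:
    """Render a JSON read result in the Markdown layout shown in the example."""
    return (
        f"# {result['title']}\n"
        "\n"
        f"Source: {result['url']}\n"
        f"Backend: {result['backend']}\n"
        "\n"
        f"{result['content']}\n"
    )


if __name__ == "__main__":
    sample = {
        "url": "https://example.com",
        "title": "Example Domain",
        "content": "This domain is for use in illustrative examples...",
        "backend": "direct",
    }
    print(to_markdown(sample), end="")
```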

Configuration

Configuration is optional. To change the defaults, create ~/.awi/config.yaml:

# Default output format
format: json

# Default timeout
timeout: 30s

# Cache settings
cache:
  enabled: true
  ttl: 24h
  dir: ~/.awi/cache

# Network settings
network:
  proxy: ""  # default proxy for all requests
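The `timeout` and `ttl` values look like Go duration strings (the syntax accepted by Go's `time.ParseDuration`). Assuming that is what awi expects, a tool that shares this config file, such as an agent wrapper deciding whether its own saved copy of a page is stale, could read them with a small parser like the hypothetical `parse_duration` below. It handles unsigned values such as `30s`, `24h`, and `1h30m`.

```python
import re
from datetime import timedelta

# Seconds per unit, following Go's duration unit names.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# "ms" must be tried before "m" and "s": regex alternation is ordered.
_TOKEN = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_duration(text: str) -> timedelta:
    """Parse an unsigned Go-style duration such as "30s", "24h" or "1h30m"."""
    s = text.strip()
    if s == "0":
        return timedelta(0)
    total, pos = 0.0, 0
    while pos < len(s):
        match = _TOKEN.match(s, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError(f"empty duration: {text!r}")
    return timedelta(seconds=total)
```

With the sample config, `parse_duration("30s")` gives a 30-second timeout and `parse_duration("24h")` a one-day cache TTL.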

Backend details

direct
  • Pure HTTP with realistic browser headers
  • Content extraction via go-readability
  • Fallback text extraction for complex pages
  • Best for: Static HTML, blogs, documentation, news
stealth
  • Uses tls-client to mimic Chrome 120 TLS fingerprint
  • Bypasses basic Cloudflare and bot detection
  • No browser needed — pure HTTP with browser-like transport
  • Best for: Sites with Cloudflare or basic bot protection
browser
  • Headless Chrome via chromedp
  • Anti-detection: removes navigator.webdriver, disables automation flags
  • Waits for page load + network idle
  • Best for: JavaScript SPAs, dynamic content, heavy anti-bot pages
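The direct backend's fallback text extraction is not specified beyond the bullet above. As a rough illustration of the general technique (this is neither go-readability nor awi's code), the Python sketch below uses the standard-library `html.parser` to drop `<head>`, `<script>`, `<style>` and similar non-visible content, keep the remaining text, and start a new line at common block-level tags. The tag lists and the `extract_text` name are assumptions.

```python
import re
from html.parser import HTMLParser

# Assumed tag lists for this sketch; real extractors use richer heuristics.
SKIP_TAGS = {"head", "script", "style", "noscript", "template"}
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6",
              "tr", "section", "article", "header", "footer", "blockquote", "pre"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and friends in text
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Source newlines are just whitespace; only block tags create line breaks.
            self.parts.append(re.sub(r"\s+", " ", data))


def extract_text(html: str) -> str:
    """Return visible text, one line per block, with runs of spaces collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    cleaned = (re.sub(r" +", " ", line).strip() for line in lines)
    return "\n".join(line for line in cleaned if line)
```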

Caching

awi caches responses locally (default 24h TTL):

  • Cache dir: ~/.awi/cache/
  • SHA256-keyed JSON files
  • Cache keys include URL + backend + options
  • Disable with --no-cache
  • File permissions: 0600 (private)
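The cache bullets above map naturally onto a few small functions. This Python sketch assumes a key computed as SHA-256 over the URL, backend, and options serialized as JSON with sorted keys, one `<key>.json` file per entry, a stored timestamp for the TTL check, and mode 0600 on newly created files. The function names and the on-disk entry layout are hypothetical; awi's actual key format is not documented here.

```python
import hashlib
import json
import os
import time
from pathlib import Path
from typing import Any, Dict, Optional


def cache_key(url: str, backend: str, options: Dict[str, Any]) -> str:
    """SHA-256 hex digest over URL + backend + options (sorted, so key order doesn't matter)."""
    payload = json.dumps(
        {"url": url, "backend": backend, "options": options},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def entry_path(cache_dir: str, key: str) -> Path:
    return Path(cache_dir) / f"{key}.json"


def write_entry(cache_dir: str, key: str, result: Dict[str, Any]) -> None:
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    # Mode 0o600 applies when the file is created: only the owner can read cached pages.
    fd = os.open(entry_path(cache_dir, key), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"stored_at": time.time(), "result": result}, f)


def read_entry(cache_dir: str, key: str, ttl_seconds: float,
               now: Optional[float] = None) -> Optional[Dict[str, Any]]:
    """Return the cached result, or None if it is missing or older than the TTL."""
    try:
        with open(entry_path(cache_dir, key), encoding="utf-8") as f:
            entry = json.load(f)
    except FileNotFoundError:
        return None
    now = time.time() if now is None else now
    if now - entry["stored_at"] > ttl_seconds:
        return None  # expired: the caller should refetch
    return entry["result"]
```

Including the backend and options in the key means that, for example, a `--backend browser` read of a URL never returns a page cached by the direct backend.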

For AI agent developers

awi is designed to be called by AI agents:

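import json
import subprocess

url = "https://example.com"  # the page the agent wants to read

# Request JSON output and skip the local cache to force a fresh fetch.
result = subprocess.run(
    ["awi", "read", url, "--format", "json", "--no-cache"],
    capture_output=True,
    text=True,
    check=True,   # raise CalledProcessError if awi exits with a non-zero status
    timeout=60,   # seconds; raise TimeoutExpired instead of hanging
)
data = json.loads(result.stdout)
content = data["content"]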

Or use it as an OpenClaw skill — SKILL.md coming soon.

Test results

Tested against 30 diverse websites across 8 categories:

Total: 30 | Pass: 27 | Fail: 2 | Flaky: 1
Score: 116/120 (96.7%)

By category:
  static:       100%
  tech_docs:     90%
  news_blog:    100%
  github:       100%
  chinese:      100%
  social_forum: 100%
  cloudflare:    83%
  edge_cases:   100%

The two failures are sites with aggressive, enterprise-grade bot protection (OpenAI and Cloudflare.com). Accessing them typically requires a residential proxy pool, a limitation that local CLI tools without such proxies generally share.

Roadmap

  • OpenClaw SKILL.md
  • Homebrew formula
  • awi extract — LLM-powered structured data extraction
  • Residential proxy pool integration
  • Cookie/session management
  • PDF/document parsing
  • GitHub Actions for cross-platform builds

License

MIT

Credits

Built with:
