tap

package module
v0.1.1 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 31, 2026 License: MIT Imports: 8 Imported by: 0

README

🚰 Tap

Tap into any website from your terminal.

A Go library and CLI toolkit that runs JavaScript scripts against real websites: fast via QuickJS, with full browser fallback when needed. It also extracts clean content from any URL via go-defuddle.

Install

CLI
go install github.com/vaayne/tap/cmd/tap@latest
Library
go get github.com/vaayne/tap

Requires Go 1.22+ and Google Chrome (or Chromium) for browser fallback.

CLI Usage

Site Scripts
# List all available scripts
tap site list

# Run a script (QuickJS first, browser fallback)
tap site v2ex/hot
tap site twitter/search query=claude
tap site bilibili/search keyword=编程 order=click

# Pipe to jq
tap site hackernews/top | jq '.stories[:3]'
Fetch Content
# Extract clean markdown from any URL
tap fetch https://example.com/article

# Output as JSON with full metadata
tap fetch --json https://example.com/article

Library Usage

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/vaayne/tap"
    "github.com/vaayne/tap/fetch"
)

func main() {
    client, err := tap.New(
        tap.WithSitesDir("./sites"),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Run a site script
    result, err := client.RunScript(context.Background(), "v2ex/hot", nil)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result)

    // Fetch clean content from a URL
    content, err := client.Fetch(context.Background(), "https://example.com", &fetch.Options{
        Markdown: true,
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(content.Markdown)
}

How It Works

Both tap site and tap fetch share a common transport layer with two-tier network access:

                    ┌──────────────────────────────────┐
                    │       Shared Transport Layer     │
                    │  Level 1: HTTP  │  Level 2: CDP  │
                    └────────┬────────┴───────┬────────┘
                             │                │
              ┌──────────────┴──┐    ┌────────┴───────────┐
              │    tap site     │    │    tap fetch       │
              │  QuickJS → CDP  │    │  HTTP → CDP        │
              │  → structured   │    │  → defuddle        │
              │    JSON         │    │  → markdown/HTML   │
              └─────────────────┘    └────────────────────┘

Transport layer: a shared HTTP client and headless Chrome (CDP) browser, configured once and used by all consumers.

Site scripts: predefined recipes that know the optimal path to structured data. Each script runs first in QuickJS (fast, with a Go-backed fetch()) and falls back to Chrome via CDP for pages that need cookies, the DOM, or authentication.

Fetch: generic content extraction from any URL. It tries direct HTTP first and falls back to the browser for JavaScript-rendered pages, then parses the result with go-defuddle to extract clean HTML or Markdown.

Configuration

All configuration is set via environment variables, a .env file, or CLI flags:

Variable          Flag            Description                             Default
TAP_SITES_DIR     --sites-dir     Directory containing site scripts       ./sites
TAP_WS_URL        --ws-url        Remote CDP WebSocket URL                (unset; launches local Chrome)
TAP_PROFILE_DIR   --profile-dir   Chrome profile for persistent cookies   ~/.cache/tap/chrome-profile-$USER

Browser Modes

Local Chrome (default): launches headless Chrome with a persistent profile so cookies survive across runs.

Remote browser: connect to a remote CDP endpoint by setting TAP_WS_URL:

export TAP_WS_URL=wss://your-remote-browser/ws
tap site v2ex/hot
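Mode selection comes down to whether a WebSocket URL is configured. The sketch below is a hypothetical illustration (`browserTarget` and `chooseBrowser` are not tap's API); the check that a remote URL uses the `ws` or `wss` scheme is an assumption added for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// browserTarget describes where a hypothetical client gets its browser.
type browserTarget struct {
	Remote     bool
	WSURL      string // set only when Remote is true
	ProfileDir string // used only for a locally launched Chrome
}

// chooseBrowser picks the remote CDP endpoint when wsURL is non-empty,
// otherwise a local headless Chrome that keeps cookies in profileDir.
func chooseBrowser(wsURL, profileDir string) (browserTarget, error) {
	if wsURL == "" {
		return browserTarget{ProfileDir: profileDir}, nil
	}
	u, err := url.Parse(wsURL)
	if err != nil {
		return browserTarget{}, fmt.Errorf("invalid TAP_WS_URL: %w", err)
	}
	// Assumed validation: CDP endpoints are WebSocket URLs.
	if u.Scheme != "ws" && u.Scheme != "wss" {
		return browserTarget{}, fmt.Errorf("TAP_WS_URL must use ws:// or wss://, got %q", u.Scheme)
	}
	return browserTarget{Remote: true, WSURL: wsURL}, nil
}

func main() {
	t, err := chooseBrowser("wss://your-remote-browser/ws", "")
	fmt.Printf("%+v %v\n", t, err)
}
```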

Writing Scripts

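Scripts live in sites/, organized by site name. Each script starts with a JSON @meta header comment followed by an async function that receives the script's arguments: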

/* @meta
{
  "name": "site/action",
  "description": "What this script does",
  "domain": "example.com",
  "args": {
    "query": {"required": true, "description": "Search query"}
  }
}
*/

async function(args) {
  // Encode the query so spaces and special characters are sent intact.
  const resp = await fetch('https://api.example.com?q=' + encodeURIComponent(args.query));
  return await resp.json();
}
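A sketch of how a Go host could extract such a header and check required arguments. This is not tap's script package; `Meta`, `ArgSpec`, `parseMeta`, and `checkArgs` are hypothetical names. It assumes the header is the first `/* @meta` comment in the file, contains no nested `*/`, and that an empty string counts as a missing argument.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// ArgSpec mirrors one entry of the "args" object in a script header.
type ArgSpec struct {
	Required    bool   `json:"required"`
	Description string `json:"description"`
}

// Meta mirrors the JSON object inside /* @meta ... */.
type Meta struct {
	Name        string             `json:"name"`
	Description string             `json:"description"`
	Domain      string             `json:"domain"`
	Args        map[string]ArgSpec `json:"args"`
}

// parseMeta extracts and decodes the first /* @meta ... */ block in src.
func parseMeta(src string) (Meta, error) {
	const open = "/* @meta"
	start := strings.Index(src, open)
	if start < 0 {
		return Meta{}, errors.New("no @meta header")
	}
	rest := src[start+len(open):]
	end := strings.Index(rest, "*/")
	if end < 0 {
		return Meta{}, errors.New("unterminated @meta header")
	}
	var m Meta
	if err := json.Unmarshal([]byte(rest[:end]), &m); err != nil {
		return Meta{}, fmt.Errorf("decode @meta: %w", err)
	}
	return m, nil
}

// checkArgs returns an error listing every required argument missing from args.
func checkArgs(m Meta, args map[string]string) error {
	var missing []string
	for name, spec := range m.Args {
		if spec.Required && args[name] == "" {
			missing = append(missing, name)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	sort.Strings(missing) // deterministic order despite map iteration
	return fmt.Errorf("%s: missing required arguments: %s", m.Name, strings.Join(missing, ", "))
}

func main() {
	src := `/* @meta
{"name": "site/action", "domain": "example.com",
 "args": {"query": {"required": true, "description": "Search query"}}}
*/
async function(args) { /* ... */ }`
	m, err := parseMeta(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Name, m.Domain, checkArgs(m, map[string]string{}))
}
```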
Available Sites

100+ scripts across 30+ sites:

  • Search: Google, Bing, Baidu, DuckDuckGo
  • Social: Twitter, Weibo, Reddit, 小红书 (Xiaohongshu), 即刻 (Jike)
  • Video: YouTube, Bilibili
  • News: Hacker News, BBC, Reuters, 今日头条 (Toutiao), 36氪 (36Kr)
  • Dev: GitHub, Stack Overflow, Dev.to, npm, PyPI
  • Finance: 雪球 (Xueqiu), 东方财富 (East Money), Yahoo Finance
  • Knowledge: Wikipedia, 知乎 (Zhihu), Douban, arXiv

Run tap site list for the full list.

Project Structure

github.com/vaayne/tap/
├── tap.go              # Client API: unified entry point
├── options.go          # Functional options (WithSitesDir, WithWSURL, ...)
├── transport/
│   └── transport.go    # Shared network layer (HTTP + CDP browser)
├── engine/
│   ├── engine.go       # Engine interface + fallback orchestrator
│   ├── quickjs.go      # QuickJS engine with Go fetch() polyfill
│   └── browser.go      # Chrome CDP engine (delegates to transport)
├── fetch/
│   └── fetch.go        # URL → clean content via go-defuddle (HTTP → browser fallback)
├── script/
│   ├── parser.go       # Script @meta parser
│   └── registry.go     # Script directory scanner + index
├── cmd/tap/
│   └── main.go         # CLI binary (urfave/cli)
└── sites/              # 100+ community site scripts

Roadmap

  • Site scripts with QuickJS + browser fallback
  • tap fetch <url>: clean content extraction
  • tap screenshot <url>: page screenshots
  • tap pdf <url>: save as PDF
  • tap eval <js> --url <url>: run arbitrary JS on a page
  • tap fill <script>: form automation

License

MIT

Documentation

Overview

Package tap provides a unified API for interacting with web pages.

Tap can run site scripts (QuickJS first, with browser fallback) and fetch clean content from URLs via go-defuddle. Both share a common transport layer for HTTP and browser-based network access.

Basic usage:

ctx := context.Background()
client, err := tap.New(tap.WithSitesDir("./sites"))
if err != nil {
    log.Fatal(err)
}
defer client.Close()

// Run a site script
result, err := client.RunScript(ctx, "v2ex/hot", nil)

// Fetch clean content
content, err := client.Fetch(ctx, "https://example.com", nil)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is the main entry point for the tap library.

func New

func New(optFns ...Option) (*Client, error)

New creates a new Client with the given options.

func (*Client) Close

func (c *Client) Close() error

Close releases all resources.

func (*Client) Fetch

func (c *Client) Fetch(ctx context.Context, url string, opts *fetch.Options) (*fetch.Result, error)

Fetch retrieves a URL and extracts clean content using go-defuddle.

func (*Client) GetScript

func (c *Client) GetScript(name string) (*script.Script, bool)

GetScript returns the script with the given name and reports whether it was found.

func (*Client) ListScripts

func (c *Client) ListScripts() []*script.Script

ListScripts returns all available scripts sorted by name.

func (*Client) RunScript

func (c *Client) RunScript(ctx context.Context, name string, args map[string]string) (any, error)

RunScript executes a site script by name with the given arguments. It tries QuickJS first, then falls back to the browser, unless browser execution is forced (WithForceBrowser in the library, --browser on the CLI).

type Option

type Option func(*options)

Option configures a Client.

func WithForceBrowser (added in v0.1.1)

func WithForceBrowser(force bool) Option

WithForceBrowser skips QuickJS and runs scripts directly in Chrome.

func WithHeadless (added in v0.1.1)

func WithHeadless(headless bool) Option

WithHeadless sets whether Chrome runs in headless mode (default: true).

func WithProfileDir

func WithProfileDir(dir string) Option

WithProfileDir sets the Chrome user data directory for persistent cookies/storage. Defaults to ~/.cache/tap/chrome-profile-$USER.

func WithSitesDir

func WithSitesDir(dir string) Option

WithSitesDir sets the directory containing site scripts.

func WithTimeout (added in v0.1.1)

func WithTimeout(d time.Duration) Option

WithTimeout sets the execution timeout for scripts and fetches.

func WithWSURL

func WithWSURL(url string) Option

WithWSURL sets the remote CDP WebSocket URL. If empty, a local Chrome is launched.
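Option follows Go's functional-options pattern: each With... function returns a closure that mutates an unexported options struct, and New applies the closures over defaults. A minimal sketch of that pattern, not tap's source: the field names are hypothetical, the ./sites and headless defaults come from this page, and treating a zero timeout as "no timeout" plus the negative-timeout check are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// options holds client settings; the field names are illustrative.
type options struct {
	sitesDir     string
	wsURL        string // empty: launch a local Chrome
	profileDir   string // left empty here; a real client would apply the documented default
	headless     bool
	forceBrowser bool
	timeout      time.Duration // zero: no timeout (assumption)
}

// Option configures a Client, matching the documented signature.
type Option func(*options)

func WithSitesDir(dir string) Option { return func(o *options) { o.sitesDir = dir } }

func WithWSURL(url string) Option { return func(o *options) { o.wsURL = url } }

func WithProfileDir(dir string) Option { return func(o *options) { o.profileDir = dir } }

func WithHeadless(headless bool) Option { return func(o *options) { o.headless = headless } }

func WithForceBrowser(force bool) Option { return func(o *options) { o.forceBrowser = force } }

func WithTimeout(d time.Duration) Option { return func(o *options) { o.timeout = d } }

// Client is a stand-in for tap.Client.
type Client struct{ opts options }

// New starts from the defaults, then applies each option in order,
// so a later option overrides an earlier one.
func New(optFns ...Option) (*Client, error) {
	o := options{sitesDir: "./sites", headless: true}
	for _, fn := range optFns {
		fn(&o)
	}
	// Hypothetical validation step: reject settings that cannot work.
	if o.timeout < 0 {
		return nil, errors.New("timeout must not be negative")
	}
	return &Client{opts: o}, nil
}

func main() {
	c, err := New(WithSitesDir("./my-sites"), WithHeadless(false), WithTimeout(30*time.Second))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c.opts)
}
```

Keeping options unexported means new settings can be added later without breaking callers, since the only public surface is the Option type and its constructors.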

type ScriptNotFoundError (added in v0.1.1)

type ScriptNotFoundError struct {
	Name      string
	Available []string
}

ScriptNotFoundError is returned when a script name doesn't match any registered script.

func (*ScriptNotFoundError) Error (added in v0.1.1)

func (e *ScriptNotFoundError) Error() string

func (*ScriptNotFoundError) Suggestions (added in v0.1.1)

func (e *ScriptNotFoundError) Suggestions(max int) []string

Suggestions returns up to max script names similar to the requested name, ranked by relevance.
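The page does not define the relevance metric behind Suggestions. A hypothetical sketch that uses Levenshtein edit distance (ties broken alphabetically) as a stand-in, plus errors.As to recover the typed error from a lookup; `lookup`, `levenshtein`, and `min3` are illustrative helpers, and tap's real ranking may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ScriptNotFoundError mirrors the documented exported fields.
type ScriptNotFoundError struct {
	Name      string
	Available []string
}

func (e *ScriptNotFoundError) Error() string {
	return fmt.Sprintf("script %q not found", e.Name)
}

// Suggestions returns up to max names from Available, closest first.
// Edit distance as the relevance measure is an assumption.
func (e *ScriptNotFoundError) Suggestions(max int) []string {
	ranked := append([]string(nil), e.Available...) // copy; don't reorder the caller's slice
	dist := make(map[string]int, len(ranked))
	for _, n := range ranked {
		dist[n] = levenshtein(e.Name, n)
	}
	sort.Slice(ranked, func(i, j int) bool {
		if dist[ranked[i]] != dist[ranked[j]] {
			return dist[ranked[i]] < dist[ranked[j]]
		}
		return ranked[i] < ranked[j]
	})
	if max < 0 {
		max = 0
	}
	if max < len(ranked) {
		ranked = ranked[:max]
	}
	return ranked
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// lookup is a hypothetical registry lookup that returns the typed error.
func lookup(name string, available []string) error {
	for _, n := range available {
		if n == name {
			return nil
		}
	}
	return &ScriptNotFoundError{Name: name, Available: available}
}

func main() {
	available := []string{"hackernews/top", "twitter/search", "v2ex/hot", "v2ex/latest"}
	err := lookup("v2ex/hto", available)
	var nf *ScriptNotFoundError
	if errors.As(err, &nf) {
		fmt.Println(nf, "- did you mean:", nf.Suggestions(2))
	}
}
```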

Directories

Path        Synopsis
cmd/tap     tap (command)
engine      Package engine provides execution engines for running site scripts.
fetch       Package fetch provides URL content extraction using go-defuddle.
script      Package script handles parsing and discovery of site scripts.
transport   Package transport provides a shared network layer for fetching web content.
