assembllm-htmltools

command module
v0.0.1
Published: Jun 15, 2024 License: BSD-3-Clause Imports: 4 Imported by: 0

README

assembllm HTML Tools

An Extism WebAssembly plugin, written in Go, that performs web scraping and HTML rewriting. The plugin exposes two functions: scraper and htmlrewrite.

It was built to help generate prompts and related results with assembllm, but it can be used by any Extism host.

Features

  • The scraper function allows extracting text content from HTML elements that match a given CSS selector.
  • The htmlrewrite function modifies HTML content based on a set of rewrite rules, where each rule specifies a CSS selector and new HTML content to replace the matched elements.

Usage

Scraper

Input

The scraper function expects a JSON input with the following structure:

  • html: The HTML content as a string.
  • selector: A CSS selector to identify the elements to extract text from.

Output

The function outputs the text content of the matched elements.

Example:

extism call \
    assembllm-htmltools.wasm scraper \
    --input='{"html": "<ul> <li>foo</li> <li>bar</li> </ul><p class='\''moon'\''>test text</p>", "selector": ".moon"}' \
    --wasi

# ==> test text

HTML Rewriter

Input

The htmlrewrite function expects a JSON input with the following structure:

{
  "html": "<html-content>",
  "rules": [
    {
      "selector": "<css-selector>",
      "html_content": "<new-html-content>"
    },
    ...
  ]
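}

  • html: The HTML content as a string.
  • rules: A list of rewrite rules, applied to the HTML content.
  • selector: A CSS selector to identify the elements to rewrite.
  • html_content: The new HTML content for the matched elements. In the example below, it replaces the inner content of each matched element.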

Output

The function outputs the modified HTML content as a string.

Example:

extism call assembllm-htmltools.wasm htmlrewrite \
    --input='{"html": "<html><body><h1>Title</h1><p>This is a paragraph.</p><div>Some <span>nested</span> text.</div></body></html>", "rules": [{"selector": "p", "html_content": "This is the new paragraph content."}, {"selector": "div", "html_content": "<b>New nested content</b>"}]}' \
    --wasi

# ==> <html><head></head><body><h1>Title</h1><p>This is the new paragraph content.</p><div><b>New nested content</b></div></body></html>
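The extism CLI invocations shown in the examples can also be driven from a Go program. The sketch below assumes the extism binary is on PATH and reuses only the subcommand and flags that appear in this README (call, --input, --wasi); the helper names extismCallArgs and runPlugin are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// extismCallArgs builds the argument list for "extism call", in the same
// order as the examples in this README.
func extismCallArgs(wasmPath, function, input string) []string {
	return []string{"call", wasmPath, function, "--input=" + input, "--wasi"}
}

// runPlugin runs the extism CLI and returns its standard output.
// Arguments are passed directly to the process, so no shell quoting is needed.
func runPlugin(wasmPath, function, input string) (string, error) {
	out, err := exec.Command("extism", extismCallArgs(wasmPath, function, input)...).Output()
	return string(out), err
}

func main() {
	input := `{"html": "<p class='moon'>test text</p>", "selector": ".moon"}`
	out, err := runPlugin("assembllm-htmltools.wasm", "scraper", input)
	if err != nil {
		fmt.Println("extism call failed:", err)
		return
	}
	fmt.Print(out)
}
```

Because exec.Command bypasses the shell, the '\'' escaping used in the command-line example is unnecessary here.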

Build

make build

Test

make test
