structhtml

package module
v0.0.0-...-822286a Latest Latest
Published: Sep 23, 2025 License: MIT Imports: 14 Imported by: 3

README

go-structhtml

Package structhtml provides tools for working with structured-HTML, for the Go programming language.

Structured-HTML formats are data formats that use HTML underneath.

Each structured-HTML format has its own media-type of the form text/???+html, where the ??? is replaced with a format-specific name. For example:

  • text/article+html
  • text/note+html
  • text/person+html
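
As a rough illustration of the text/???+html form, the sketch below pulls the ??? part out of a Content-Type value using only the Go standard library. The helper name structuredHTMLPrefix is hypothetical and is not part of package structhtml.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// structuredHTMLPrefix is a hypothetical helper (NOT part of package structhtml).
// It returns the "???" part of a text/???+html media-type, and false if the
// value does not have that form.
func structuredHTMLPrefix(contentType string) (string, bool) {
	// mime.ParseMediaType lower-cases the media-type and splits off
	// parameters such as charset.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", false
	}

	if !strings.HasPrefix(mediaType, "text/") || !strings.HasSuffix(mediaType, "+html") {
		return "", false
	}

	prefix := strings.TrimSuffix(strings.TrimPrefix(mediaType, "text/"), "+html")
	if prefix == "" {
		return "", false
	}
	return prefix, true
}

func main() {
	for _, contentType := range []string{
		"text/article+html; charset=utf-8",
		"text/note+html",
		"text/html",
	} {
		prefix, ok := structuredHTMLPrefix(contentType)
		fmt.Printf("%q -> prefix=%q ok=%t\n", contentType, prefix, ok)
	}
}
```

Plain text/html is rejected here because it has no +html suffix after a format name.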

Documentation

Online documentation, which includes examples, can be found at: http://godoc.org/codeberg.org/reiver/go-structhtml

Import

To import package structhtml use import code like the following:

import "codeberg.org/reiver/go-structhtml"

Installation

To install package structhtml do the following:

GOPROXY=direct go get codeberg.org/reiver/go-structhtml

Author

Package structhtml was written by Charles Iliya Krempeaux

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExtractCategories

func ExtractCategories(structuredHTML string) []string

ExtractCategories extracts the (zero, one, or many) categories from structured-HTML data.

For example, if the structured-HTML data is:

<article>
	<h1>Supercalifragilisticexpialidocious</h1>
	<p>
		It's supercalifragilisticexpialidocious.
		<br />
		Even though the sound of it is something quite atrocious.
		<br />
		If you say it loud enough you'll always sound precocious.
	</p>
	<meta name="category" content="song" />
	<meta name="category" content="music" />
</article>

Then the extracted categories are:

[]string{"song", "music"}

The categories are declared using HTML <meta> elements with name="category".
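
To make the <meta> rule concrete, here is a minimal sketch that collects category values with the standard library's encoding/xml decoder in non-strict mode. It is not how ExtractCategories is implemented (the source does not show that). The names categoriesSketch and attrValue are hypothetical, and the sketch assumes reasonably well-formed markup like the example above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// attrValue returns the value of the named attribute, or "" if it is absent.
func attrValue(element xml.StartElement, name string) string {
	for _, attribute := range element.Attr {
		if strings.EqualFold(attribute.Name.Local, name) {
			return attribute.Value
		}
	}
	return ""
}

// categoriesSketch is a hypothetical stand-in (NOT the package's implementation).
// It returns the content of every <meta name="category" content="..."> element.
func categoriesSketch(structuredHTML string) []string {
	decoder := xml.NewDecoder(strings.NewReader(structuredHTML))
	decoder.Strict = false                // tolerate HTML-style markup
	decoder.AutoClose = xml.HTMLAutoClose // e.g. <br> and <meta> need no end tag
	decoder.Entity = xml.HTMLEntity       // e.g. &nbsp;

	var categories []string
	for {
		token, err := decoder.Token()
		if err != nil {
			break // io.EOF or a parse error: return what was collected so far
		}
		element, ok := token.(xml.StartElement)
		if !ok {
			continue
		}
		if strings.EqualFold(element.Name.Local, "meta") && attrValue(element, "name") == "category" {
			categories = append(categories, attrValue(element, "content"))
		}
	}
	return categories
}

func main() {
	const data = `<article>
	<h1>Supercalifragilisticexpialidocious</h1>
	<meta name="category" content="song" />
	<meta name="category" content="music" />
</article>`

	fmt.Printf("%q\n", categoriesSketch(data))
}
```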

func ExtractTitle

func ExtractTitle(structuredHTML string) string

ExtractTitle extracts the 'title' from structured-HTML data.

For example, if the structured-HTML data is:

<article>
	<h1>Supercalifragilisticexpialidocious</h1>
	<p>
		It's supercalifragilisticexpialidocious.
		<br />
		Even though the sound of it is something quite atrocious.
		<br />
		If you say it loud enough you'll always sound precocious.
	</p>
</article>

Then the extracted title is:

"Supercalifragilisticexpialidocious"

The title is the first <h1>, <h2>, <h3>, <h4>, <h5>, or <h6>.
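
The "first heading" rule can be sketched as follows, again with encoding/xml in non-strict mode. This illustrates the rule only, not the package's own code. The names titleSketch and isHeading are hypothetical, and trimming surrounding whitespace is an assumption.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// isHeading reports whether name is one of h1 ... h6 (case-insensitive).
func isHeading(name string) bool {
	return len(name) == 2 &&
		(name[0] == 'h' || name[0] == 'H') &&
		name[1] >= '1' && name[1] <= '6'
}

// titleSketch is a hypothetical stand-in (NOT the package's implementation).
// It returns the text of the first <h1>..<h6> element, or "" if there is none.
func titleSketch(structuredHTML string) string {
	decoder := xml.NewDecoder(strings.NewReader(structuredHTML))
	decoder.Strict = false
	decoder.AutoClose = xml.HTMLAutoClose
	decoder.Entity = xml.HTMLEntity

	var heading string       // name of the heading being read; "" while searching
	var text strings.Builder // text collected inside that heading

	for {
		token, err := decoder.Token()
		if err != nil {
			return "" // reached the end (or a parse error) without a complete heading
		}
		switch t := token.(type) {
		case xml.StartElement:
			if heading == "" && isHeading(t.Name.Local) {
				heading = t.Name.Local
			}
		case xml.CharData:
			if heading != "" {
				text.Write(t)
			}
		case xml.EndElement:
			if heading != "" && t.Name.Local == heading {
				return strings.TrimSpace(text.String())
			}
		}
	}
}

func main() {
	const data = `<article>
	<h1>Supercalifragilisticexpialidocious</h1>
	<p>It's supercalifragilisticexpialidocious.</p>
</article>`

	fmt.Printf("%q\n", titleSketch(data))
}
```

Note that "first" here means first in document order, so an <h3> that appears before an <h1> would win in this sketch.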

func MediaType

func MediaType(prefix string) string

MediaType returns a structured HTML media-type for the provided prefix.

For example:

mediaType := structhtml.MediaType("article")
// mediaType == "text/article+html"

Example:

var prefix string = "quotation"

mediaType := structhtml.MediaType(prefix)

fmt.Printf("Media-Type: %s\n", mediaType)

Output:

Media-Type: text/quotation+html

func SanitizeBytes

func SanitizeBytes(html []byte) []byte

SanitizeBytes returns the sanitized version of the HTML that is passed to it.

For example:

var html []byte = []byte(`One Two <script>alert("Hello!")</script> Three`)

sanitizedHTML := structhtml.SanitizeBytes(html)

See also: SanitizeString

func SanitizeString

func SanitizeString(html string) string

SanitizeString returns the sanitized version of the HTML that is passed to it.

For example:

var html string = `One Two <script>alert("Hello!")</script> Three`

sanitizedHTML := structhtml.SanitizeString(html)

See also: SanitizeBytes

Types

type Author

type Author struct {
	Name      string
	Reference string
}

Author is used with ExtractAuthors.

func ExtractAuthors

func ExtractAuthors(structuredHTML string) []Author

ExtractAuthors extracts the 'authors' from structured-HTML data. There can be zero, one, or many authors in structured-HTML data.

In the structured-HTML, author data looks like this:

<a rel="author" href="http://example.com/~joeblow">Joe Blow</a>

Here the author-name is "Joe Blow" and the author-reference is "http://example.com/~joeblow".

What makes that HTML anchor element (<a>) an author is the presence of the rel="author" attribute.
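
The sketch below shows one way the rel="author" rule could be applied, using encoding/xml in non-strict mode. It is an illustration only and not the code behind ExtractAuthors. The local author type and the names authorsSketch, attrValue, and hasRelAuthor are hypothetical. Treating rel as a space-separated list of values is an assumption taken from general HTML practice.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// author mirrors the fields of structhtml.Author for this self-contained sketch.
type author struct {
	Name      string
	Reference string
}

// attrValue returns the value of the named attribute, or "" if it is absent.
func attrValue(element xml.StartElement, name string) string {
	for _, attribute := range element.Attr {
		if strings.EqualFold(attribute.Name.Local, name) {
			return attribute.Value
		}
	}
	return ""
}

// hasRelAuthor reports whether the element's rel attribute lists "author".
func hasRelAuthor(element xml.StartElement) bool {
	for _, value := range strings.Fields(attrValue(element, "rel")) {
		if strings.EqualFold(value, "author") {
			return true
		}
	}
	return false
}

// authorsSketch is a hypothetical stand-in (NOT the package's implementation).
func authorsSketch(structuredHTML string) []author {
	decoder := xml.NewDecoder(strings.NewReader(structuredHTML))
	decoder.Strict = false
	decoder.AutoClose = xml.HTMLAutoClose
	decoder.Entity = xml.HTMLEntity

	var authors []author
	var current *author      // non-nil while inside an <a rel="author"> element
	var name strings.Builder // link text collected for the current author

	for {
		token, err := decoder.Token()
		if err != nil {
			break // io.EOF or a parse error: return what was collected so far
		}
		switch t := token.(type) {
		case xml.StartElement:
			if current == nil && strings.EqualFold(t.Name.Local, "a") && hasRelAuthor(t) {
				current = &author{Reference: attrValue(t, "href")}
				name.Reset()
			}
		case xml.CharData:
			if current != nil {
				name.Write(t)
			}
		case xml.EndElement:
			if current != nil && strings.EqualFold(t.Name.Local, "a") {
				current.Name = strings.TrimSpace(name.String())
				authors = append(authors, *current)
				current = nil
			}
		}
	}
	return authors
}

func main() {
	const data = `<article><p>By <a rel="author" href="http://example.com/~joeblow">Joe Blow</a></p></article>`

	for _, a := range authorsSketch(data) {
		fmt.Printf("name=%q reference=%q\n", a.Name, a.Reference)
	}
}
```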

type HTTPHandler

type HTTPHandler struct {
	MediaType         string
	Logger            Logger
	SubHandler        http.Handler
	TransformerToHTML TransformerToHTML
}

HTTPHandler is an http.Handler middleware that (potentially) transforms structured-HTML that has a particular media-type to (regular) HTML.

HTTPHandler calls the sub-http.Handler's ServeHTTP method, and looks at the Content-Type of what the sub-http.Handler wrote to its http.ResponseWriter. If that Content-Type has the particular media-type HTTPHandler is looking for then it MIGHT do the transformation.

If the HTTP request's Accept header does NOT contain the particular media-type HTTPHandler is looking for, then HTTPHandler does the transformation.

For example:

var handler http.Handler = structhtml.HTTPHandler{
	MediaType:         "text/article+html",                               // <---- the media-type that would need to be set by the sub-http.Handler to trigger HTTPHandler to do the transformation.
	SubHandler:        subHandler,                                        // <---- the sub-http.Handler
	TransformerToHTML: structhtml.TransformerToHTMLFunc(transformToHTML), // <---- the function that does the transformation
}

func (HTTPHandler) ServeHTTP

func (receiver HTTPHandler) ServeHTTP(responseWriter http.ResponseWriter, request *http.Request)

ServeHTTP makes HTTPHandler an http.Handler.

type Logger

type Logger = log.Logger

type TransformerToHTML

type TransformerToHTML interface {
	TransformToHTML([]byte) ([]byte, error)
}

TransformerToHTML represents something that transforms structured-HTML to (regular) HTML.

See also: HTTPHandler

type TransformerToHTMLFunc

type TransformerToHTMLFunc func([]byte) ([]byte, error)

TransformerToHTMLFunc turns a function with a particular signature into a TransformerToHTML.

See also: HTTPHandler

func (TransformerToHTMLFunc) TransformToHTML

func (fn TransformerToHTMLFunc) TransformToHTML(data []byte) ([]byte, error)
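
The function-to-interface adapter follows the same pattern as http.HandlerFunc in the standard library. The sketch below re-declares the two types locally so it compiles on its own. The method body (simply calling the function) is an assumption, since the source shows only the signature, and wrapArticle is a hypothetical transformation.

```go
package main

import (
	"bytes"
	"fmt"
)

// Local copies of the declarations shown above, so the sketch is self-contained.
type TransformerToHTML interface {
	TransformToHTML([]byte) ([]byte, error)
}

type TransformerToHTMLFunc func([]byte) ([]byte, error)

// TransformToHTML calls the underlying function.
// (Assumed body; the source shows only the method signature.)
func (fn TransformerToHTMLFunc) TransformToHTML(data []byte) ([]byte, error) {
	return fn(data)
}

// wrapArticle is a hypothetical transformation that wraps the
// structured-HTML in a minimal HTML document.
func wrapArticle(data []byte) ([]byte, error) {
	var buffer bytes.Buffer
	buffer.WriteString("<!DOCTYPE html>\n<html><body>")
	buffer.Write(data)
	buffer.WriteString("</body></html>\n")
	return buffer.Bytes(), nil
}

func main() {
	// The conversion turns the plain function into a TransformerToHTML value.
	var transformer TransformerToHTML = TransformerToHTMLFunc(wrapArticle)

	html, err := transformer.TransformToHTML([]byte("<article><h1>Hello</h1></article>"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(string(html))
}
```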

Directories

Path Synopsis
internal
