htmlindex

package
v0.3.0
Warning: This package is not in the latest version of its module.
Published: Jan 1, 2025 | License: MIT | Imports: 7 | Imported by: 0

Documentation

Constants

const (
	BackgroundAttribute = "background"
	HrefAttribute       = "href"

	DataSrcAttribute = "data-src"
	SrcAttribute     = "src"

	DataSrcSetAttribute = "data-srcset"
	SrcSetAttribute     = "srcset"
)
const (
	ATag      = "a"
	BodyTag   = "body"
	ImgTag    = "img"
	LinkTag   = "link"
	ScriptTag = "script"
	StyleTag  = "style"
)

Variables

var Nodes = map[string]Node{
	ATag: {
		Attributes: []string{HrefAttribute},
	},
	BodyTag: {
		Attributes: []string{BackgroundAttribute},
	},
	ImgTag: {
		Attributes: []string{SrcAttribute, DataSrcAttribute, SrcSetAttribute, DataSrcSetAttribute},
		// contains filtered or unexported fields
	},
	LinkTag: {
		Attributes: []string{HrefAttribute},
	},
	ScriptTag: {
		Attributes: []string{SrcAttribute},
	},
	StyleTag: {
		// contains filtered or unexported fields
	},
}

Nodes describes the HTML tags and the attributes of each tag that can contain URLs.
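
As a rough illustration of how a table like Nodes can be consulted, the sketch below mirrors the documented entries in a local map instead of importing the package. The `node` type, the `nodes` variable, and the `urlAttributes` helper are hypothetical names introduced for this example and are not part of htmlindex. The style entry lists no attributes here because its unexported fields are not visible in this documentation.

```go
package main

import "fmt"

// node is a local stand-in for htmlindex.Node; only the exported
// Attributes field is reproduced.
type node struct {
	Attributes []string
}

// nodes is a local copy of the documented Nodes table, for illustration only.
var nodes = map[string]node{
	"a":      {Attributes: []string{"href"}},
	"body":   {Attributes: []string{"background"}},
	"img":    {Attributes: []string{"src", "data-src", "srcset", "data-srcset"}},
	"link":   {Attributes: []string{"href"}},
	"script": {Attributes: []string{"src"}},
	"style":  {},
}

// urlAttributes is a hypothetical helper. It returns the attributes of the
// given tag that may carry a URL, and whether the tag is in the table at all.
func urlAttributes(tag string) ([]string, bool) {
	n, ok := nodes[tag]
	return n.Attributes, ok
}

func main() {
	for _, tag := range []string{"img", "style", "div"} {
		attrs, ok := urlAttributes(tag)
		fmt.Printf("%-6s known=%-5t attributes=%v\n", tag, ok, attrs)
	}
}
```

Returning the `ok` flag lets a caller distinguish a tag that is tracked but lists no attributes (such as style) from a tag that is not tracked at all.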

var SrcSetAttributes = map[string]struct{}{
	DataSrcSetAttribute: {},
	SrcSetAttribute:     {},
}

SrcSetAttributes contains the attributes that contain srcset values.
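
A srcset value holds several comma-separated image candidates, each a URL optionally followed by a width or density descriptor, which is presumably why these attributes are tracked separately from single-URL attributes. The sketch below shows a set lookup on a local copy of the map and a deliberately simplified candidate splitter. `isSrcSet` and `srcSetURLs` are hypothetical helpers, not functions of this package, and the splitter does not handle URLs that themselves contain commas.

```go
package main

import (
	"fmt"
	"strings"
)

// srcSetAttributes is a local copy of the documented SrcSetAttributes set.
var srcSetAttributes = map[string]struct{}{
	"data-srcset": {},
	"srcset":      {},
}

// isSrcSet reports whether the attribute name is in the set.
func isSrcSet(attr string) bool {
	_, ok := srcSetAttributes[attr]
	return ok
}

// srcSetURLs is a hypothetical, simplified parser. It splits the value on
// commas and keeps the first whitespace-separated field of each candidate,
// dropping descriptors such as "480w" or "2x".
func srcSetURLs(value string) []string {
	var urls []string
	for _, candidate := range strings.Split(value, ",") {
		fields := strings.Fields(candidate)
		if len(fields) > 0 {
			urls = append(urls, fields[0])
		}
	}
	return urls
}

func main() {
	fmt.Println(isSrcSet("srcset"), isSrcSet("src"))
	fmt.Println(srcSetURLs("small.jpg 480w, large.jpg 1080w"))
}
```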

Functions

This section is empty.

Types

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index provides an index for all HTML tags of relevance for scraping.

func New

func New(logger *log.Logger) *Index

New returns a new index.

func (*Index) Index

func (idx *Index) Index(baseURL *url.URL, node *html.Node)

Index indexes the given HTML document.
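
The documentation does not say how baseURL is used. A reasonable assumption is that relative references found in the document are resolved against it. The sketch below shows that kind of resolution with the standard net/url package only; `resolve` is a hypothetical helper and the example.org addresses are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve is a hypothetical helper that resolves ref against base and
// returns the absolute URL as a string.
func resolve(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parsing base %q: %w", base, err)
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", fmt.Errorf("parsing reference %q: %w", ref, err)
	}
	// ResolveReference follows RFC 3986: absolute references are returned
	// unchanged, relative ones are interpreted against the base.
	return b.ResolveReference(r).String(), nil
}

func main() {
	const base = "https://example.org/docs/index.html"
	refs := []string{"img/logo.png", "/style.css", "https://cdn.example.com/app.js"}
	for _, ref := range refs {
		abs, err := resolve(base, ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ref, "->", abs)
	}
}
```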

func (*Index) Nodes

func (idx *Index) Nodes(tag string) map[string][]*html.Node

Nodes returns a map of all URLs found for the given tag and the HTML nodes that reference them.

func (*Index) URLs

func (idx *Index) URLs(tag string) ([]*url.URL, error)

URLs returns all URLs of the references found for a specific tag.
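
The error return suggests that the stored references are parsed into url.URL values on demand, although the documentation does not confirm this. The sketch below shows one plausible shape for such a conversion over a URL-keyed map. `sortedURLs` is a hypothetical helper, the map value type is a stand-in for the real node slices, and the sorting is a choice made here for deterministic output rather than documented behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// sortedURLs is a hypothetical helper. It parses every key of refs as a URL
// and returns the results ordered by their string form. The map values stand
// in for the HTML nodes that reference each URL and are not inspected.
func sortedURLs(refs map[string][]string) ([]*url.URL, error) {
	keys := make([]string, 0, len(refs))
	for k := range refs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	urls := make([]*url.URL, 0, len(keys))
	for _, k := range keys {
		u, err := url.Parse(k)
		if err != nil {
			return nil, fmt.Errorf("parsing URL %q: %w", k, err)
		}
		urls = append(urls, u)
	}
	return urls, nil
}

func main() {
	refs := map[string][]string{
		"https://example.org/b.png": {"img"},
		"https://example.org/a.png": {"img", "img"},
	}
	urls, err := sortedURLs(refs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```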

type Node added in v0.3.0

type Node struct {
	Attributes []string
	// contains filtered or unexported fields
}
