htmlprocessor

package

v0.0.0-...-75bc046 · Published: Mar 24, 2026 · License: Apache-2.0 · Imports: 12 · Imported by: 0

Repository

github.com/edgecomet/engine

Documentation

Index

type Document
- func ParseWithDOM(htmlBytes []byte) (Document, error)

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Document

type Document interface {
	// Title extracts the page title from the <title> tag.
	// Returns an empty string if not found.
	// Truncates to 200 characters (runes, not bytes).
	Title() string

	// IndexationStatus determines page indexability with priority:
	// non-200 > blocked by meta > non-canonical > indexable
	IndexationStatus(statusCode int, finalURL string) types.IndexStatus

	// CleanScripts removes executable script elements.
	// Returns true if any were removed.
	CleanScripts() bool

	// GoQueryDoc returns the underlying goquery Document for advanced queries.
	GoQueryDoc() *goquery.Document

	// HTML returns current HTML as bytes (re-serialized from DOM).
	HTML() []byte

	// ExtractPageSEO extracts comprehensive SEO metadata from the document.
	// statusCode and pageURL are needed for IndexationStatus calculation.
	ExtractPageSEO(statusCode int, pageURL string) *types.PageSEO
}

Document provides methods for processing HTML documents.
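Two behaviours stated in the method comments above are easy to misread, so a minimal sketch of each may help: truncating by runes rather than bytes (as Title is documented to do), and the first-match-wins priority documented for IndexationStatus. This is not the package's implementation. The names truncateRunes, indexStatus, classify, blockedByMeta, and canonicalURL are hypothetical, the definition of types.IndexStatus is not shown on this page, and the plain string comparison of URLs is an assumption.

```go
package main

import "fmt"

// truncateRunes returns s cut to at most max runes (Unicode code points),
// so a multi-byte character is never split in the middle.
func truncateRunes(s string, max int) string {
	if max <= 0 {
		return ""
	}
	n := 0
	for i := range s { // i is the byte offset at which each rune starts
		if n == max {
			return s[:i]
		}
		n++
	}
	return s
}

// indexStatus is a hypothetical stand-in for types.IndexStatus.
type indexStatus string

const (
	statusNon200       indexStatus = "non-200"
	statusBlockedMeta  indexStatus = "blocked-by-meta"
	statusNonCanonical indexStatus = "non-canonical"
	statusIndexable    indexStatus = "indexable"
)

// classify applies the documented priority order: the first matching
// condition wins. blockedByMeta and canonicalURL are hypothetical inputs;
// the real method presumably derives them from the parsed document.
// Comparing URLs as plain strings is an assumption of this sketch.
func classify(statusCode int, finalURL string, blockedByMeta bool, canonicalURL string) indexStatus {
	switch {
	case statusCode != 200:
		return statusNon200
	case blockedByMeta:
		return statusBlockedMeta
	case canonicalURL != "" && canonicalURL != finalURL:
		return statusNonCanonical
	default:
		return statusIndexable
	}
}

func main() {
	fmt.Println(truncateRunes("héllo wörld", 4))
	fmt.Println(classify(404, "https://example.com/", true, "https://example.com/other"))
	fmt.Println(classify(200, "https://example.com/", false, ""))
}
```

Because the checks run in priority order, a page that is both non-200 and blocked by a meta tag is reported as non-200, which matches the ordering given in the IndexationStatus comment.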

func ParseWithDOM

func ParseWithDOM(htmlBytes []byte) (Document, error)

ParseWithDOM parses HTML bytes into a Document using DOM parsing.
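A typical call sequence might look like the sketch below: parse, check the error, remove scripts, then re-serialize. Since the package's full import path is not shown on this page, the sketch is written against a locally declared subset of the Document interface (doc) and a function type with the same shape as ParseWithDOM (parseFunc). The names sanitize, fakeDoc, and fakeParse are hypothetical and exist only so the example runs on its own; in particular, whether the real parser rejects empty input is not stated here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// doc mirrors three methods of the Document interface. It is declared
// locally only so that this sketch compiles without the real package.
type doc interface {
	Title() string
	CleanScripts() bool
	HTML() []byte
}

// parseFunc has the same shape as ParseWithDOM, with doc in place of Document.
type parseFunc func(htmlBytes []byte) (doc, error)

// sanitize parses raw HTML, removes scripts, and returns the title, the
// re-serialized markup, and whether anything was removed.
func sanitize(parse parseFunc, raw []byte) (string, []byte, bool, error) {
	d, err := parse(raw)
	if err != nil {
		return "", nil, false, fmt.Errorf("parse html: %w", err)
	}
	title := d.Title()
	// Call CleanScripts before HTML so the serialized output reflects the removal.
	changed := d.CleanScripts()
	return title, d.HTML(), changed, nil
}

// fakeDoc is a hypothetical in-memory stand-in used to exercise sanitize.
type fakeDoc struct {
	title string
	body  []byte
}

func (f *fakeDoc) Title() string { return f.title }

// CleanScripts here only strips one fixed script string; it is not a parser.
func (f *fakeDoc) CleanScripts() bool {
	script := []byte("<script>x()</script>")
	if !bytes.Contains(f.body, script) {
		return false
	}
	f.body = bytes.ReplaceAll(f.body, script, nil)
	return true
}

func (f *fakeDoc) HTML() []byte { return f.body }

// fakeParse is a hypothetical parser that rejects empty input.
func fakeParse(b []byte) (doc, error) {
	if len(b) == 0 {
		return nil, errors.New("empty input")
	}
	return &fakeDoc{title: "Demo", body: append([]byte(nil), b...)}, nil
}

func main() {
	title, out, changed, err := sanitize(fakeParse, []byte("<p>hi</p><script>x()</script>"))
	fmt.Println(title, string(out), changed, err)
}
```

With the real package, ParseWithDOM would be passed where fakeParse is used, provided the returned Document is adapted to the narrower doc interface or sanitize is changed to accept Document directly.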
