parser

package
v0.1.5 Latest
Published: Jan 3, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package parser provides file parsers for extracting URLs from various file formats. It defines the shared types, helpers, and multi-file processing functions used by all parsers, and it implements the parser registry, which manages the parsers for the different file types.

Constants

This section is empty.

Variables

var URLRegex = regexp.MustCompile(`https?://[^\s"'\]\}>,]+`)

URLRegex matches HTTP/HTTPS URLs. Exported for use by subpackage parsers.
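The pattern above can be exercised on its own. The following is a minimal sketch that copies the documented expression into a local variable; `urlRegex` and `findURLs` are hypothetical names used only here, not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlRegex reproduces the pattern documented for URLRegex.
var urlRegex = regexp.MustCompile(`https?://[^\s"'\]\}>,]+`)

// findURLs returns every substring of text that matches urlRegex.
func findURLs(text string) []string {
	return urlRegex.FindAllString(text, -1)
}

func main() {
	text := `{"docs": "https://example.com/a", "note": "see http://example.org/b."}`
	for _, u := range findURLs(text) {
		fmt.Println(u)
	}
}
```

The second match keeps its trailing period, because `.` is not in the excluded character set. That kind of leftover punctuation is presumably what CleanURLTrailing is meant to handle.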

Functions

func BuildLineIndex added in v0.1.3

func BuildLineIndex(content []byte) []int

BuildLineIndex creates an index of byte offsets for the start of each line. This index enables O(log n) line/column lookups from byte offsets, which is more efficient than scanning from the start for each lookup. Exported for use by subpackage parsers.
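As an illustration of the idea, a line index can be built in one pass over the content. This is a sketch under assumptions, not the package's code: `buildLineIndex` is a hypothetical stand-in, and it assumes lines are delimited by `\n` and that the first line always starts at offset 0.

```go
package main

import "fmt"

// buildLineIndex is a hypothetical stand-in for BuildLineIndex. It records
// the byte offset at which each line starts; the first line starts at 0.
func buildLineIndex(content []byte) []int {
	lines := []int{0}
	for i, b := range content {
		if b == '\n' {
			// The next line begins right after the newline byte.
			lines = append(lines, i+1)
		}
	}
	return lines
}

func main() {
	fmt.Println(buildLineIndex([]byte("alpha\nbeta\ngamma"))) // [0 6 11]
}
```

Because the offsets are produced in ascending order, the resulting slice can be binary-searched without any further sorting.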

func CleanURLTrailing added in v0.1.3

func CleanURLTrailing(url string) string

CleanURLTrailing removes trailing punctuation from URLs. Exported for use by subpackage parsers.
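The documentation does not list which characters count as trailing punctuation. The sketch below assumes a small set (`.,;:!?`) purely for illustration; `cleanURLTrailing` is a hypothetical stand-in and the real function may trim a different set.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanURLTrailing is a hypothetical stand-in for CleanURLTrailing.
// The punctuation set below is an assumption, not the package's actual list.
func cleanURLTrailing(url string) string {
	return strings.TrimRight(url, ".,;:!?")
}

func main() {
	fmt.Println(cleanURLTrailing("https://example.com/guide."))
}
```

Only characters at the very end are removed, so punctuation inside the path or query string is left alone.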

func IsHTTPURL added in v0.1.3

func IsHTTPURL(url string) bool

IsHTTPURL checks if a URL is an HTTP or HTTPS URL. Non-HTTP URLs (mailto, tel, file, anchors, etc.) are excluded from link checking. Exported for use by subpackage parsers.
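A scheme check of this kind can be sketched with a prefix test. `isHTTPURL` below is a hypothetical stand-in; treating the scheme as case-insensitive is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// isHTTPURL is a hypothetical stand-in for IsHTTPURL. Lowercasing the input
// before the prefix test is an assumption about how schemes are compared.
func isHTTPURL(url string) bool {
	lower := strings.ToLower(url)
	return strings.HasPrefix(lower, "http://") || strings.HasPrefix(lower, "https://")
}

func main() {
	for _, u := range []string{"https://example.com", "mailto:team@example.com", "#install"} {
		fmt.Printf("%-26s %v\n", u, isHTTPURL(u))
	}
}
```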

func OffsetToLineCol added in v0.1.3

func OffsetToLineCol(lines []int, offset int) (lineNum, colNum int)

OffsetToLineCol converts a byte offset to 1-indexed line and column numbers using binary search for O(log n) performance. The lines parameter should be created by BuildLineIndex. Exported for use by subpackage parsers.
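The binary-search lookup can be sketched with `sort.Search` from the standard library. `offsetToLineCol` is a hypothetical stand-in; the fallback for an empty index or negative offset, and counting columns in bytes rather than runes, are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// offsetToLineCol is a hypothetical stand-in for OffsetToLineCol. lines holds
// the starting byte offset of each line in ascending order.
func offsetToLineCol(lines []int, offset int) (lineNum, colNum int) {
	// Find the first line that starts after offset; the line just before it
	// is the one containing offset.
	i := sort.Search(len(lines), func(i int) bool { return lines[i] > offset })
	if i == 0 {
		// Empty index or negative offset: fall back to the first position.
		return 1, 1
	}
	// i is already the 1-indexed line number; the column is the byte
	// distance from the start of that line, plus one.
	return i, offset - lines[i-1] + 1
}

func main() {
	// Line starts for the content "alpha\nbeta\ngamma".
	lines := []int{0, 6, 11}
	line, col := offsetToLineCol(lines, 8)
	fmt.Println(line, col) // 2 3
}
```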

func RegisterParser added in v0.1.3

func RegisterParser(p FileParser)

RegisterParser registers a parser with the default registry.

func SupportedFileTypes added in v0.1.3

func SupportedFileTypes() []string

SupportedFileTypes returns all supported file types from the default registry.

Types

type FileParser added in v0.1.3

type FileParser interface {
	// Extensions returns the file extensions this parser handles (e.g., [".json"]).
	// Extensions should include the leading dot.
	Extensions() []string

	// ValidateAndParse validates the content and extracts links in a single pass.
	// Returns an error if the content is malformed.
	ValidateAndParse(filename string, content []byte) ([]Link, error)
}

FileParser defines the interface for file type parsers. Each parser implementation handles a specific file format (markdown, JSON, YAML, etc.).
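To show what an implementation of this interface might look like, here is a sketch of a plain-text parser. Everything in it is illustrative: `textParser` is hypothetical, `Link` and `FileParser` are trimmed local copies so the example compiles on its own, and treating a NUL byte as malformed content is an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"regexp"
)

// Link and FileParser are trimmed local copies of the documented types.
type Link struct {
	URL      string
	FilePath string
	Line     int
}

type FileParser interface {
	Extensions() []string
	ValidateAndParse(filename string, content []byte) ([]Link, error)
}

var urlRegex = regexp.MustCompile(`https?://[^\s"'\]\}>,]+`)

// textParser is a hypothetical parser for plain-text files.
type textParser struct{}

// Extensions reports the extensions handled, with the leading dot.
func (textParser) Extensions() []string { return []string{".txt"} }

// ValidateAndParse rejects binary-looking input, then collects URLs line by line.
func (textParser) ValidateAndParse(filename string, content []byte) ([]Link, error) {
	if bytes.IndexByte(content, 0) >= 0 {
		return nil, errors.New("binary content")
	}
	var links []Link
	for i, line := range bytes.Split(content, []byte("\n")) {
		for _, u := range urlRegex.FindAll(line, -1) {
			links = append(links, Link{URL: string(u), FilePath: filename, Line: i + 1})
		}
	}
	return links, nil
}

// Compile-time check that textParser satisfies FileParser.
var _ FileParser = textParser{}

func main() {
	links, err := textParser{}.ValidateAndParse("notes.txt", []byte("intro\nsee https://example.com/docs\n"))
	fmt.Println(links, err)
}
```

With the real package, a parser like this would be passed to RegisterParser so that lookups by extension can find it.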

func GetParser added in v0.1.3

func GetParser(ext string) (FileParser, bool)

GetParser returns a parser from the default registry for the given extension.

func GetParserForFile added in v0.1.3

func GetParserForFile(filename string) (FileParser, bool)

GetParserForFile returns a parser from the default registry for the given filename.

type Link

type Link struct {
	URL      string // The actual URL
	FilePath string // Which file it was found in
	Text     string // Link text or alt text for images

	// For reference links.
	RefName string   // Reference name (e.g., "myref" in [text][myref])
	Line    int      // Line number (1-indexed)
	Column  int      // Column position (1-indexed)
	Type    LinkType // Type of link

	RefDefLine int // Line where [ref]: url is defined (0 if not reference)
}

Link represents a URL found in a file.

func ExtractLinksFromMultipleFilesWithRegistry added in v0.1.3

func ExtractLinksFromMultipleFilesWithRegistry(filePaths []string, strict bool) ([]Link, error)

ExtractLinksFromMultipleFilesWithRegistry processes multiple files concurrently using the appropriate parser for each file type from the registry. If strict is true, validation errors will cause the function to return an error. Files with unsupported extensions are silently skipped.
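The concurrency and strict-mode behavior described above can be sketched with goroutines and a `sync.WaitGroup`. This is not the package's implementation: `extractAll`, `parseFunc`, and the `parsers` table are hypothetical, file contents are passed in memory instead of being read from disk, and dropping failed files in non-strict mode is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"sort"
	"strings"
	"sync"
)

type parseFunc func(content string) ([]string, error)

// parsers is a hypothetical extension-to-parser table.
var parsers = map[string]parseFunc{
	".txt": func(content string) ([]string, error) {
		if content == "" {
			return nil, errors.New("empty file")
		}
		var urls []string
		for _, f := range strings.Fields(content) {
			if strings.HasPrefix(f, "http://") || strings.HasPrefix(f, "https://") {
				urls = append(urls, f)
			}
		}
		return urls, nil
	},
}

// extractAll parses every supported file in its own goroutine.
func extractAll(files map[string]string, strict bool) ([]string, error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		urls     []string
		firstErr error
	)
	for name, content := range files {
		parse, ok := parsers[filepath.Ext(name)]
		if !ok {
			continue // unsupported extension: skipped silently
		}
		wg.Add(1)
		go func(name, content string) {
			defer wg.Done()
			found, err := parse(content)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = fmt.Errorf("%s: %w", name, err)
				}
				return
			}
			urls = append(urls, found...)
		}(name, content)
	}
	wg.Wait()
	if strict && firstErr != nil {
		return nil, firstErr
	}
	sort.Strings(urls) // goroutine completion order is not deterministic
	return urls, nil
}

func main() {
	files := map[string]string{"a.txt": "see https://example.com/a", "b.bin": "ignored", "c.txt": ""}
	fmt.Println(extractAll(files, false))
	fmt.Println(extractAll(files, true))
}
```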

func ExtractLinksWithRegistry added in v0.1.3

func ExtractLinksWithRegistry(filePath string, strict bool) ([]Link, error)

ExtractLinksWithRegistry reads a file and returns all HTTP/HTTPS links found, using the appropriate parser from the registry based on file extension. If strict is true, validation errors will cause the function to return an error.

type LinkType

type LinkType int

LinkType represents the type of link found in a file.

const (
	// LinkTypeInline represents a standard markdown link: [text](url).
	LinkTypeInline LinkType = iota
	// LinkTypeReference represents a reference-style link: [text][ref] with [ref]: url.
	LinkTypeReference
	// LinkTypeImage represents an image: ![alt](url).
	LinkTypeImage
	// LinkTypeAutolink represents a bare URL that's auto-linked.
	LinkTypeAutolink
	// LinkTypeHTML represents a link in HTML: <a href="url">.
	LinkTypeHTML
)

func (LinkType) String

func (t LinkType) String() string

String returns the string representation of a LinkType.
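The documentation does not show the strings that String returns. The sketch below redeclares the constants locally and uses assumed names ("inline", "reference", and so on) to show the usual pattern for an iota-based enum; the package's actual strings may differ.

```go
package main

import "fmt"

// LinkType and its constants are local copies of the documented declarations.
type LinkType int

const (
	LinkTypeInline LinkType = iota
	LinkTypeReference
	LinkTypeImage
	LinkTypeAutolink
	LinkTypeHTML
)

// linkTypeNames holds assumed display names, indexed by constant value.
var linkTypeNames = [...]string{"inline", "reference", "image", "autolink", "html"}

// String implements fmt.Stringer; out-of-range values map to "unknown".
func (t LinkType) String() string {
	if t < 0 || int(t) >= len(linkTypeNames) {
		return "unknown"
	}
	return linkTypeNames[t]
}

func main() {
	fmt.Printf("%v %s %v\n", LinkTypeInline, LinkTypeHTML, LinkType(42))
}
```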

type ParseError added in v0.1.3

type ParseError struct {
	FilePath string
	Err      error
}

ParseError represents an error that occurred during file parsing. It includes the file path and the underlying error.

func (*ParseError) Error added in v0.1.3

func (e *ParseError) Error() string

func (*ParseError) Unwrap added in v0.1.3

func (e *ParseError) Unwrap() error
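Because ParseError has an Unwrap method, it works with `errors.Is` and `errors.As` from the standard library. The sketch below uses a local copy of the type; the message format in `Error`, the `errMalformed` sentinel, and the `failedFile` helper are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ParseError is a local copy of the documented struct.
type ParseError struct {
	FilePath string
	Err      error
}

// Error uses an assumed "path: cause" format.
func (e *ParseError) Error() string { return fmt.Sprintf("%s: %v", e.FilePath, e.Err) }

// Unwrap exposes the underlying error to errors.Is and errors.As.
func (e *ParseError) Unwrap() error { return e.Err }

// errMalformed is a hypothetical sentinel error used only in this sketch.
var errMalformed = errors.New("malformed content")

// failedFile reports the path recorded in a *ParseError anywhere in err's chain.
func failedFile(err error) (string, bool) {
	var pe *ParseError
	if errors.As(err, &pe) {
		return pe.FilePath, true
	}
	return "", false
}

func main() {
	err := fmt.Errorf("extract links: %w", &ParseError{FilePath: "config.json", Err: errMalformed})
	path, ok := failedFile(err)
	fmt.Println(path, ok)                     // config.json true
	fmt.Println(errors.Is(err, errMalformed)) // true
}
```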

type Registry added in v0.1.3

type Registry struct {
	// contains filtered or unexported fields
}

Registry manages file parsers by extension. It provides thread-safe registration and lookup of parsers.

func DefaultRegistry added in v0.1.3

func DefaultRegistry() *Registry

DefaultRegistry returns the global parser registry.

func NewRegistry added in v0.1.3

func NewRegistry() *Registry

NewRegistry creates a new empty parser registry.

func (*Registry) ExtensionsForTypes added in v0.1.3

func (r *Registry) ExtensionsForTypes(types []string) ([]string, error)

ExtensionsForTypes returns the file extensions for the given type names. Type names are without the leading dot (e.g., "md", "json"). Returns an error if any type name is not supported.
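The mapping from type names to extensions might look like the sketch below. It assumes each type name corresponds to exactly one extension formed by adding a leading dot, which the documentation does not confirm; `extensionsForTypes` and the `registered` set are hypothetical.

```go
package main

import "fmt"

// extensionsForTypes is a hypothetical stand-in for Registry.ExtensionsForTypes.
// registered is the set of extensions that have a parser.
func extensionsForTypes(registered map[string]bool, types []string) ([]string, error) {
	exts := make([]string, 0, len(types))
	for _, t := range types {
		ext := "." + t // assumption: type "md" maps to extension ".md"
		if !registered[ext] {
			return nil, fmt.Errorf("unsupported file type %q", t)
		}
		exts = append(exts, ext)
	}
	return exts, nil
}

func main() {
	registered := map[string]bool{".md": true, ".json": true}
	fmt.Println(extensionsForTypes(registered, []string{"md", "json"}))
	fmt.Println(extensionsForTypes(registered, []string{"md", "ini"}))
}
```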

func (*Registry) Get added in v0.1.3

func (r *Registry) Get(ext string) (FileParser, bool)

Get returns the parser for the given file extension. Returns nil, false if no parser is registered for the extension.

func (*Registry) GetForFile added in v0.1.3

func (r *Registry) GetForFile(filename string) (FileParser, bool)

GetForFile returns the parser for the given filename based on its extension. Returns nil, false if no parser is registered for the file's extension.

func (*Registry) HasParser added in v0.1.3

func (r *Registry) HasParser(ext string) bool

HasParser returns true if a parser is registered for the given extension.

func (*Registry) Register added in v0.1.3

func (r *Registry) Register(p FileParser)

Register adds a parser to the registry for all its supported extensions. If an extension is already registered, it will be overwritten.
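The registration and lookup semantics described for Register, Get, and GetForFile can be sketched with a map guarded by a `sync.RWMutex`. The types here (`registry`, `parser`, `namedParser`) are hypothetical, and lowercasing extensions before storing and looking them up is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"sync"
)

// parser is a trimmed stand-in for FileParser; only Extensions matters here.
type parser interface {
	Extensions() []string
}

type namedParser struct {
	name string
	exts []string
}

func (p namedParser) Extensions() []string { return p.exts }

// registry is a hypothetical thread-safe extension-to-parser map.
type registry struct {
	mu      sync.RWMutex
	parsers map[string]parser
}

func newRegistry() *registry { return &registry{parsers: make(map[string]parser)} }

// register stores p under each of its extensions, replacing earlier entries.
func (r *registry) register(p parser) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, ext := range p.Extensions() {
		r.parsers[strings.ToLower(ext)] = p
	}
}

// get looks up a parser by extension under a read lock.
func (r *registry) get(ext string) (parser, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.parsers[strings.ToLower(ext)]
	return p, ok
}

// getForFile derives the extension from the filename, then delegates to get.
func (r *registry) getForFile(filename string) (parser, bool) {
	return r.get(filepath.Ext(filename))
}

func main() {
	r := newRegistry()
	r.register(namedParser{name: "yaml-v1", exts: []string{".yaml", ".yml"}})
	r.register(namedParser{name: "yaml-v2", exts: []string{".yml"}}) // overwrites ".yml" only
	p, ok := r.getForFile("deploy/config.YML")
	fmt.Println(p.(namedParser).name, ok) // yaml-v2 true
}
```

A read-write mutex lets concurrent lookups proceed in parallel while registrations, which are typically rare, take the exclusive lock.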

func (*Registry) SupportedExtensions added in v0.1.3

func (r *Registry) SupportedExtensions() []string

SupportedExtensions returns all registered file extensions.

func (*Registry) SupportedTypes added in v0.1.3

func (r *Registry) SupportedTypes() []string

SupportedTypes returns a sorted list of registered file type names. It is intended for display in CLI help text.
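One plausible way to derive such a list is to strip the leading dot from each registered extension and sort the result. The sketch below assumes that derivation; `supportedTypes` is a hypothetical helper, and the real method may group several extensions under one type name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supportedTypes is a hypothetical helper: it strips the leading dot from
// each extension and returns the names in sorted order.
func supportedTypes(extensions []string) []string {
	types := make([]string, 0, len(extensions))
	for _, ext := range extensions {
		types = append(types, strings.TrimPrefix(ext, "."))
	}
	sort.Strings(types)
	return types
}

func main() {
	// Join the names the way a CLI help line might show them.
	fmt.Println(strings.Join(supportedTypes([]string{".yaml", ".json", ".md"}), ", "))
}
```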

Directories

Package json implements a URL extractor for JSON files.
Package markdown implements a URL extractor for Markdown files.
Package toml implements a URL extractor for TOML files.
Package xml implements a URL extractor for XML files.
Package yaml implements a URL extractor for YAML files.
