Documentation ¶
Overview ¶
Package parser provides file parsers for extracting URLs from various file formats. It defines the shared types, helpers, and multi-file processing functions used by all parsers, and it implements the parser registry, which manages the parsers for the different file types.
Index ¶
- Variables
- func BuildLineIndex(content []byte) []int
- func CleanURLTrailing(url string) string
- func IsHTTPURL(url string) bool
- func OffsetToLineCol(lines []int, offset int) (lineNum, colNum int)
- func RegisterParser(p FileParser)
- func SupportedFileTypes() []string
- type FileParser
- type Link
- type LinkType
- type ParseError
- type Registry
- func (r *Registry) ExtensionsForTypes(types []string) ([]string, error)
- func (r *Registry) Get(ext string) (FileParser, bool)
- func (r *Registry) GetForFile(filename string) (FileParser, bool)
- func (r *Registry) HasParser(ext string) bool
- func (r *Registry) Register(p FileParser)
- func (r *Registry) SupportedExtensions() []string
- func (r *Registry) SupportedTypes() []string
Constants ¶
This section is empty.
Variables ¶
var URLRegex = regexp.MustCompile(`https?://[^\s"'\]\}>,]+`)
URLRegex matches HTTP/HTTPS URLs. Exported for use by subpackage parsers.
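As an illustration of what the pattern matches, the following minimal sketch compiles a local copy of the expression shown above and applies it to a sample string. The helper name findURLs and the sample text are assumptions made for this example and are not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlRegex is a local copy of the pattern documented for URLRegex.
var urlRegex = regexp.MustCompile(`https?://[^\s"'\]\}>,]+`)

// findURLs (hypothetical helper) returns every match of urlRegex in text,
// in order of appearance.
func findURLs(text string) []string {
	return urlRegex.FindAllString(text, -1)
}

func main() {
	text := `See https://example.com/a, and "http://b.org/x" for details.`
	for _, u := range findURLs(text) {
		fmt.Println(u)
	}
}
```

The character class stops a match at whitespace, quotes, `]`, `}`, `>`, and commas. A trailing period or closing parenthesis is not excluded by the pattern, which is presumably why CleanURLTrailing exists.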
Functions ¶
func BuildLineIndex ¶ added in v0.1.3
func BuildLineIndex(content []byte) []int
BuildLineIndex creates an index of the byte offsets at which each line starts. The index enables O(log n) line and column lookups from byte offsets, which is more efficient than scanning from the start of the content for every lookup. Exported for use by subpackage parsers.
func CleanURLTrailing ¶ added in v0.1.3
func CleanURLTrailing(url string) string
CleanURLTrailing removes trailing punctuation from URLs. Exported for use by subpackage parsers.
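The documentation does not say which characters count as trailing punctuation. The sketch below assumes a small set (period, comma, semicolon, colon, exclamation mark, question mark) purely for illustration; the name cleanURLTrailing and the constant trailingPunct are hypothetical, and the real function may strip a different set.

```go
package main

import (
	"fmt"
	"strings"
)

// trailingPunct is an ASSUMED set of characters to strip; the real
// CleanURLTrailing may use a different one.
const trailingPunct = ".,;:!?"

// cleanURLTrailing removes any run of trailingPunct characters from the
// end of url and leaves the rest of the string untouched.
func cleanURLTrailing(url string) string {
	return strings.TrimRight(url, trailingPunct)
}

func main() {
	fmt.Println(cleanURLTrailing("https://example.com/page."))
	fmt.Println(cleanURLTrailing("https://example.com/a?b=1"))
}
```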
func IsHTTPURL ¶ added in v0.1.3
func IsHTTPURL(url string) bool
IsHTTPURL reports whether a URL uses the HTTP or HTTPS scheme. Non-HTTP URLs (mailto, tel, file, anchors, etc.) are excluded from link checking. Exported for use by subpackage parsers.
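A scheme check of this kind can be sketched as a prefix test. The version below is an assumption about the behavior, not the package's code; in particular, treating the scheme as case-insensitive is a choice made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// isHTTPURL (hypothetical) reports whether url starts with "http://" or
// "https://". Case-insensitive matching is an assumption of this sketch.
func isHTTPURL(url string) bool {
	lower := strings.ToLower(url)
	return strings.HasPrefix(lower, "http://") || strings.HasPrefix(lower, "https://")
}

func main() {
	for _, u := range []string{"https://example.com", "mailto:a@example.com", "#section"} {
		fmt.Printf("%-24s %v\n", u, isHTTPURL(u))
	}
}
```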
func OffsetToLineCol ¶ added in v0.1.3
func OffsetToLineCol(lines []int, offset int) (lineNum, colNum int)
OffsetToLineCol converts a byte offset to 1-indexed line and column numbers, using binary search for O(log n) performance. The lines parameter should be the index returned by BuildLineIndex. Exported for use by subpackage parsers.
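The line-index technique described for BuildLineIndex and OffsetToLineCol can be sketched as follows. This is one plausible implementation written for illustration, not the package's source: the lowercase function names are hypothetical, columns are counted in bytes, and the guard for negative offsets is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// buildLineIndex returns the byte offset at which each line starts.
// The first line always starts at offset 0.
func buildLineIndex(content []byte) []int {
	lines := []int{0}
	for i, b := range content {
		if b == '\n' {
			lines = append(lines, i+1)
		}
	}
	return lines
}

// offsetToLineCol maps a byte offset to 1-indexed line and column numbers.
// sort.Search performs the binary search: it finds the first line start
// that lies beyond offset, so the line containing offset is the one before.
func offsetToLineCol(lines []int, offset int) (lineNum, colNum int) {
	if len(lines) == 0 || offset < 0 {
		return 1, 1 // assumed fallback for invalid input
	}
	i := sort.Search(len(lines), func(i int) bool { return lines[i] > offset })
	return i, offset - lines[i-1] + 1
}

func main() {
	content := []byte("ab\ncd\n")
	lines := buildLineIndex(content)
	line, col := offsetToLineCol(lines, 4) // offset 4 is the byte 'd'
	fmt.Println(lines, line, col)
}
```

Building the index costs one pass over the content; each later lookup touches only O(log n) entries, which pays off when a parser reports positions for many matches in the same file.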
func RegisterParser ¶ added in v0.1.3
func RegisterParser(p FileParser)
RegisterParser registers a parser with the default registry.
func SupportedFileTypes ¶ added in v0.1.3
func SupportedFileTypes() []string
SupportedFileTypes returns all supported file types from the default registry.
Types ¶
type FileParser ¶ added in v0.1.3
type FileParser interface {
// Extensions returns the file extensions this parser handles (e.g., [".json"]).
// Extensions should include the leading dot.
Extensions() []string
// ValidateAndParse validates the content and extracts links in a single pass.
// Returns an error if the content is malformed.
ValidateAndParse(filename string, content []byte) ([]Link, error)
}
FileParser defines the interface for file type parsers. Each parser implementation handles a specific file format (markdown, JSON, YAML, etc.).
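To show what satisfying this interface might look like, here is a minimal sketch of a plain-text parser. Everything in it is an assumption for illustration: textParser does not exist in the package, Link and FileParser are redeclared locally (Link with only a subset of its fields) so that the example compiles on its own, and treating invalid UTF-8 as "malformed" is an arbitrary validation rule.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// Link and FileParser are local copies (Link trimmed to four fields) of the
// documented declarations, so the sketch is self-contained.
type Link struct {
	URL      string
	FilePath string
	Line     int
	Column   int
}

type FileParser interface {
	Extensions() []string
	ValidateAndParse(filename string, content []byte) ([]Link, error)
}

var urlRegex = regexp.MustCompile(`https?://[^\s"'\]\}>,]+`)

// textParser is a hypothetical parser for plain-text files.
type textParser struct{}

// Extensions includes the leading dot, as the interface comment requires.
func (textParser) Extensions() []string { return []string{".txt"} }

// ValidateAndParse rejects invalid UTF-8 (an assumed validation rule) and
// records each URL with its 1-indexed line and byte column.
func (textParser) ValidateAndParse(filename string, content []byte) ([]Link, error) {
	if !utf8.Valid(content) {
		return nil, errors.New("content is not valid UTF-8")
	}
	var links []Link
	for i, line := range strings.Split(string(content), "\n") {
		for _, loc := range urlRegex.FindAllStringIndex(line, -1) {
			links = append(links, Link{
				URL:      line[loc[0]:loc[1]],
				FilePath: filename,
				Line:     i + 1,
				Column:   loc[0] + 1,
			})
		}
	}
	return links, nil
}

// Compile-time check that textParser satisfies FileParser.
var _ FileParser = textParser{}

func main() {
	links, err := textParser{}.ValidateAndParse("notes.txt", []byte("intro\nsee https://example.com/docs here\n"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Printf("%s:%d:%d %s\n", l.FilePath, l.Line, l.Column, l.URL)
	}
}
```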
func GetParser ¶ added in v0.1.3
func GetParser(ext string) (FileParser, bool)
GetParser returns a parser from the default registry for the given extension.
func GetParserForFile ¶ added in v0.1.3
func GetParserForFile(filename string) (FileParser, bool)
GetParserForFile returns a parser from the default registry for the given filename.
type Link ¶
type Link struct {
URL string // The actual URL
FilePath string // Which file it was found in
Text string // Link text or alt text for images
// For reference links.
RefName string // Reference name (e.g., "myref" in [text][myref])
Line int // Line number (1-indexed)
Column int // Column position (1-indexed)
Type LinkType // Type of link
RefDefLine int // Line where [ref]: url is defined (0 if not reference)
}
Link represents a URL found in a file.
func ExtractLinksFromMultipleFilesWithRegistry ¶ added in v0.1.3
ExtractLinksFromMultipleFilesWithRegistry processes multiple files concurrently using the appropriate parser for each file type from the registry. If strict is true, validation errors will cause the function to return an error. Files with unsupported extensions are silently skipped.
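The signature of this function is not shown here, so the following is only a sketch of the behavior the description names: files processed concurrently, unsupported extensions skipped silently, and the strict flag deciding whether a validation error aborts the call. All names (extractAll, parseFunc, demoParsers) are hypothetical, file contents are passed in memory instead of being read from disk, and results are bare URL strings rather than Link values.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"sort"
	"strings"
	"sync"
)

// parseFunc is a hypothetical stand-in for FileParser.ValidateAndParse.
type parseFunc func(filename string, content []byte) ([]string, error)

type result struct {
	urls []string
	err  error
}

// extractAll parses every supported file in its own goroutine. Files whose
// extension has no parser are skipped. With strict set, the first parse
// error (in filename order) is returned; otherwise failing files are ignored.
func extractAll(files map[string][]byte, parsers map[string]parseFunc, strict bool) ([]string, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		if _, ok := parsers[filepath.Ext(name)]; ok {
			names = append(names, name)
		}
	}
	sort.Strings(names) // deterministic output order

	results := make([]result, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			urls, err := parsers[filepath.Ext(name)](name, files[name])
			results[i] = result{urls: urls, err: err} // each goroutine owns one slot
		}(i, name)
	}
	wg.Wait()

	var all []string
	for i, r := range results {
		if r.err != nil {
			if strict {
				return nil, fmt.Errorf("%s: %w", names[i], r.err)
			}
			continue
		}
		all = append(all, r.urls...)
	}
	return all, nil
}

// demoParsers returns two toy parsers: one that picks out URL-like words
// and one that always fails validation.
func demoParsers() map[string]parseFunc {
	return map[string]parseFunc{
		".txt": func(_ string, content []byte) ([]string, error) {
			var urls []string
			for _, w := range strings.Fields(string(content)) {
				if strings.HasPrefix(w, "http://") || strings.HasPrefix(w, "https://") {
					urls = append(urls, w)
				}
			}
			return urls, nil
		},
		".bad": func(string, []byte) ([]string, error) {
			return nil, errors.New("malformed content")
		},
	}
}

func main() {
	files := map[string][]byte{
		"a.txt": []byte("http://a.example ok"),
		"c.bad": []byte("x"),
		"d.bin": []byte("http://skipped.example"),
	}
	fmt.Println(extractAll(files, demoParsers(), false))
	fmt.Println(extractAll(files, demoParsers(), true))
}
```

Writing each result into its own slice slot avoids a mutex around the shared result list, and sorting the filenames first keeps the output order independent of goroutine scheduling.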
func ExtractLinksWithRegistry ¶ added in v0.1.3
ExtractLinksWithRegistry reads a file and returns all HTTP/HTTPS links found, using the appropriate parser from the registry based on file extension. If strict is true, validation errors will cause the function to return an error.
type LinkType ¶
type LinkType int
LinkType represents the type of link found in a file.
const (
// LinkTypeInline represents a standard markdown link: [text](url).
LinkTypeInline LinkType = iota
// LinkTypeReference represents a reference-style link: [text][ref] with [ref]: url.
LinkTypeReference
// LinkTypeImage represents an image: ![alt](url).
LinkTypeImage
// LinkTypeAutolink represents a bare URL that's auto-linked.
LinkTypeAutolink
// LinkTypeHTML represents a link in HTML: <a href="url">.
LinkTypeHTML
)
type ParseError ¶ added in v0.1.3
ParseError represents an error that occurred during file parsing. It includes the file path and the underlying error.
func (*ParseError) Error ¶ added in v0.1.3
func (e *ParseError) Error() string
func (*ParseError) Unwrap ¶ added in v0.1.3
func (e *ParseError) Unwrap() error
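Because ParseError exposes both Error and Unwrap, callers can presumably inspect it with errors.As and errors.Is. The struct's fields are not shown here, so the sketch below declares its own ParseError with assumed field names (Path and Err) and an assumed message format; errMalformed, failedPath, and isMalformed are likewise hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errMalformed is a hypothetical sentinel standing in for an underlying
// parser error.
var errMalformed = errors.New("malformed content")

// ParseError is a local sketch; the field names Path and Err are ASSUMED.
type ParseError struct {
	Path string
	Err  error
}

// Error formats the path and the underlying error (assumed format).
func (e *ParseError) Error() string { return fmt.Sprintf("parse %s: %v", e.Path, e.Err) }

// Unwrap exposes the underlying error to errors.Is and errors.As.
func (e *ParseError) Unwrap() error { return e.Err }

// failedPath returns the path of the first ParseError in err's chain.
func failedPath(err error) (string, bool) {
	var pe *ParseError
	if errors.As(err, &pe) {
		return pe.Path, true
	}
	return "", false
}

// isMalformed reports whether errMalformed appears anywhere in err's chain.
func isMalformed(err error) bool { return errors.Is(err, errMalformed) }

func main() {
	err := fmt.Errorf("extracting links: %w", &ParseError{Path: "config.json", Err: errMalformed})
	path, ok := failedPath(err)
	fmt.Println(err)
	fmt.Println(path, ok, isMalformed(err))
}
```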
type Registry ¶ added in v0.1.3
type Registry struct {
// contains filtered or unexported fields
}
Registry manages file parsers by extension. It provides thread-safe registration and lookup of parsers.
func DefaultRegistry ¶ added in v0.1.3
func DefaultRegistry() *Registry
DefaultRegistry returns the global parser registry.
func NewRegistry ¶ added in v0.1.3
func NewRegistry() *Registry
NewRegistry creates a new empty parser registry.
func (*Registry) ExtensionsForTypes ¶ added in v0.1.3
func (r *Registry) ExtensionsForTypes(types []string) ([]string, error)
ExtensionsForTypes returns the file extensions for the given type names. Type names are written without the leading dot (e.g., "md", "json"). Returns an error if any type name is not supported.
func (*Registry) Get ¶ added in v0.1.3
func (r *Registry) Get(ext string) (FileParser, bool)
Get returns the parser for the given file extension. Returns nil, false if no parser is registered for the extension.
func (*Registry) GetForFile ¶ added in v0.1.3
func (r *Registry) GetForFile(filename string) (FileParser, bool)
GetForFile returns the parser for the given filename based on its extension. Returns nil, false if no parser is registered for the file's extension.
func (*Registry) HasParser ¶ added in v0.1.3
func (r *Registry) HasParser(ext string) bool
HasParser returns true if a parser is registered for the given extension.
func (*Registry) Register ¶ added in v0.1.3
func (r *Registry) Register(p FileParser)
Register adds a parser to the registry for all its supported extensions. If an extension is already registered, it will be overwritten.
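The registration and lookup behavior described for Register, Get, and GetForFile can be sketched with a map guarded by a sync.RWMutex. This is an assumed design, since the Registry's fields are unexported and not shown; the lowercase registry type, the trimmed FileParser interface, and stubParser are hypothetical, and the sketch does no extension normalization (such as lowercasing).

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// FileParser is trimmed to the one method this sketch needs.
type FileParser interface {
	Extensions() []string
}

// registry is a hypothetical stand-in for Registry: a map from extension
// (with leading dot) to parser, guarded by a read-write mutex.
type registry struct {
	mu      sync.RWMutex
	parsers map[string]FileParser
}

func newRegistry() *registry {
	return &registry{parsers: make(map[string]FileParser)}
}

// Register stores p under each of its extensions, overwriting any parser
// already registered for that extension.
func (r *registry) Register(p FileParser) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, ext := range p.Extensions() {
		r.parsers[ext] = p
	}
}

// Get returns the parser for ext, or nil, false if none is registered.
func (r *registry) Get(ext string) (FileParser, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.parsers[ext]
	return p, ok
}

// GetForFile looks up the parser by the filename's extension.
func (r *registry) GetForFile(filename string) (FileParser, bool) {
	return r.Get(filepath.Ext(filename))
}

// stubParser is a placeholder parser that only reports its extensions.
type stubParser struct {
	name string
	exts []string
}

func (s stubParser) Extensions() []string { return s.exts }

func main() {
	r := newRegistry()
	r.Register(stubParser{name: "yaml-v1", exts: []string{".yaml", ".yml"}})
	r.Register(stubParser{name: "yaml-v2", exts: []string{".yml"}}) // overwrites ".yml" only

	p, _ := r.GetForFile("docs/config.yml")
	fmt.Println(p.(stubParser).name)
	_, ok := r.GetForFile("notes.txt")
	fmt.Println(ok)
}
```

A read-write mutex suits this access pattern because registration typically happens a few times at startup while lookups happen once per file.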
func (*Registry) SupportedExtensions ¶ added in v0.1.3
func (r *Registry) SupportedExtensions() []string
SupportedExtensions returns all registered file extensions.
func (*Registry) SupportedTypes ¶ added in v0.1.3
func (r *Registry) SupportedTypes() []string
SupportedTypes returns a sorted list of registered file type names. The list is intended for display in CLI help text.
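The relationship between type names and extensions described for ExtensionsForTypes and SupportedTypes can be sketched as below. The sketch assumes that a type name is simply the extension without its leading dot and that each type maps to exactly one extension; the real registry may group several extensions under one type. Both function names are hypothetical, and they take the registered extensions as a plain slice instead of reading them from a Registry.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supportedTypes derives sorted type names from registered extensions by
// stripping the leading dot (an ASSUMED naming rule).
func supportedTypes(registered []string) []string {
	types := make([]string, 0, len(registered))
	for _, ext := range registered {
		types = append(types, strings.TrimPrefix(ext, "."))
	}
	sort.Strings(types)
	return types
}

// extensionsForTypes maps type names such as "md" to extensions such as
// ".md" and fails on the first name that is not registered.
func extensionsForTypes(registered, types []string) ([]string, error) {
	known := make(map[string]bool, len(registered))
	for _, ext := range registered {
		known[ext] = true
	}
	exts := make([]string, 0, len(types))
	for _, t := range types {
		ext := "." + t
		if !known[ext] {
			return nil, fmt.Errorf("unsupported file type %q (supported: %s)",
				t, strings.Join(supportedTypes(registered), ", "))
		}
		exts = append(exts, ext)
	}
	return exts, nil
}

func main() {
	registered := []string{".yaml", ".md", ".json"}
	fmt.Println(supportedTypes(registered))
	fmt.Println(extensionsForTypes(registered, []string{"md", "json"}))
	fmt.Println(extensionsForTypes(registered, []string{"md", "pdf"}))
}
```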
Directories ¶
| Path | Synopsis |
|---|---|
| json | Package json implements a URL extractor for JSON files. |
| markdown | Package markdown implements a URL extractor for Markdown files. |
| toml | Package toml implements a URL extractor for TOML files. |
| xml | Package xml implements a URL extractor for XML files. |
| yaml | Package yaml implements a URL extractor for YAML files. |