linkgraph

package

Version: v0.23.0 (latest). Published: May 21, 2026. License: MIT. Imports: 9. Imported by: 0.

Repository

github.com/jeduden/mdsmith

Documentation

Overview

Package linkgraph extracts Markdown links and heading anchors so the link-validity rule (MDS027) and the `backlinks` subcommand share one implementation of the link walk, anchor slug rules, and target parsing.

Index

func CollectAnchors(f *lint.File) map[string]bool
func DecodeAnchor(raw string) string
func ExpandCatalog(globs, files []string) []string
func NormalizeAnchor(raw string) string
func ResolveRelTarget(srcFile, linkPath string) string
type DirectiveEdge
- func ExtractDirectives(f *lint.File) []DirectiveEdge
- func (d DirectiveEdge) IsUnresolved() bool
type DirectiveKind
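type Link
- func ExtractImages(f *lint.File) []Link
- func ExtractLinks(f *lint.File) []Link
- func ExtractRefLinkTargets(f *lint.File) []Link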
type RefLink
- func ExtractRefLinks(f *lint.File) []RefLink
type Target
- func ParseTarget(dest string) (Target, bool)

Functions

func CollectAnchors

func CollectAnchors(f *lint.File) map[string]bool

CollectAnchors returns the set of heading anchors defined in f, with GitHub-compatible disambiguation suffixes (-1, -2, …) when slugs would otherwise collide. Uniqueness is enforced against the running set of produced anchors, so a sequence like "Intro" / "Intro" / "Intro-1" yields three distinct keys (`intro`, `intro-1`, `intro-1-1`) instead of two keys with the third heading colliding on `intro-1`. The set keys are the slugified anchor names; values are always true, so callers can test membership with a plain map lookup.
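The disambiguation rule above can be sketched as follows. This is a minimal illustration, not the package's implementation: `slugify` and `uniqueAnchors` are hypothetical names, and the slug rules here are deliberately simplified (lower-case, spaces to hyphens, other punctuation dropped).

```go
package main

import (
	"fmt"
	"strings"
)

// slugify is a simplified, hypothetical stand-in for the package's slug rules:
// lower-case the heading, turn spaces into hyphens, and drop every character
// that is not an ASCII letter, digit, hyphen, or underscore.
func slugify(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(heading)) {
		switch {
		case r == ' ':
			b.WriteRune('-')
		case r == '-' || r == '_' || (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9'):
			b.WriteRune(r)
		}
	}
	return b.String()
}

// uniqueAnchors assigns one anchor per heading. Each candidate is checked
// against the running set of anchors already produced, so a suffixed
// candidate can never collide with an earlier literal slug.
func uniqueAnchors(headings []string) []string {
	seen := make(map[string]bool)
	out := make([]string, 0, len(headings))
	for _, h := range headings {
		base := slugify(h)
		anchor := base
		for n := 1; seen[anchor]; n++ {
			anchor = fmt.Sprintf("%s-%d", base, n)
		}
		seen[anchor] = true
		out = append(out, anchor)
	}
	return out
}

func main() {
	fmt.Println(uniqueAnchors([]string{"Intro", "Intro", "Intro-1"}))
	// prints: [intro intro-1 intro-1-1]
}
```

The third heading slugifies to `intro-1`, which the second heading already claimed, so it receives its own suffix.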

func DecodeAnchor (added in v0.15.0)

func DecodeAnchor(raw string) string

DecodeAnchor URL-decodes raw and returns the decoded form. On decode failure (e.g. a stray `%` not followed by hex) the input is returned unchanged.

Use NormalizeAnchor when comparing against CollectAnchors output — NormalizeAnchor combines DecodeAnchor with Slugify so callers see one normalised form. DecodeAnchor is exposed for code paths that store the decoded anchor as a distinct field from the slugified one (the LSP locator), where the slugify step happens later.
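The decode-with-fallback contract can be sketched with the standard library. The choice of `url.PathUnescape` is an assumption made for illustration; `decodeAnchor` is a hypothetical local function, not the package's code.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeAnchor mirrors the documented contract: percent-decode the input and
// return the input unchanged when decoding fails.
// Assumption: url.PathUnescape is used as the decoder.
func decodeAnchor(raw string) string {
	decoded, err := url.PathUnescape(raw)
	if err != nil {
		return raw
	}
	return decoded
}

func main() {
	fmt.Println(decodeAnchor("getting%20started")) // getting started
	fmt.Println(decodeAnchor("100%"))              // 100% (stray %, returned unchanged)
}
```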

func ExpandCatalog (added in v0.15.0)

func ExpandCatalog(globs, files []string) []string

ExpandCatalog returns the subset of files that match any of the given glob patterns. Patterns prefixed with `!` are exclusion patterns — see globpath.MatchAny for the precise semantics.

The function does not walk the filesystem; the caller is responsible for supplying the candidate file list (typically the workspace-relative paths the discovery layer produced). Order in the returned slice matches the order in files.
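A rough sketch of the filtering behaviour, under loudly stated assumptions: `matchAny` is a hypothetical stand-in for globpath.MatchAny, it uses `path.Match` (which has no `**` support), and it assumes a file is kept when at least one inclusion pattern matches and no `!` pattern does. The real semantics live in globpath.MatchAny and may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchAny is a hypothetical stand-in for globpath.MatchAny.
// Assumption: a file matches when some inclusion pattern matches it and no
// `!`-prefixed exclusion pattern does.
func matchAny(globs []string, file string) bool {
	included := false
	for _, g := range globs {
		if strings.HasPrefix(g, "!") {
			if ok, _ := path.Match(strings.TrimPrefix(g, "!"), file); ok {
				return false
			}
			continue
		}
		if ok, _ := path.Match(g, file); ok {
			included = true
		}
	}
	return included
}

// expandCatalog filters the caller-supplied candidate list. It never touches
// the filesystem, and the result preserves the order of files.
func expandCatalog(globs, files []string) []string {
	var out []string
	for _, f := range files {
		if matchAny(globs, f) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []string{"docs/b.md", "docs/a.md", "docs/draft.md", "README.md"}
	fmt.Println(expandCatalog([]string{"docs/*.md", "!docs/draft.md"}, files))
	// prints: [docs/b.md docs/a.md]
}
```

Note that `docs/b.md` stays ahead of `docs/a.md`: the output follows the input order and is not sorted.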

func NormalizeAnchor

func NormalizeAnchor(raw string) string

NormalizeAnchor URL-decodes raw and slugifies it so the result can be compared against CollectAnchors output.

func ResolveRelTarget (added in v0.15.0)

func ResolveRelTarget(srcFile, linkPath string) string

ResolveRelTarget joins srcFile's directory with linkPath and returns the workspace-relative result. Absolute paths and ones that escape the workspace root after normalization return the empty string — callers must treat "" as "no in-workspace target" rather than as a valid path.

The function is strict about its inputs:

- srcFile must already be workspace-relative (no leading `/`, no drive letter, no UNC `\\` prefix). Callers that hold absolute paths must convert them first; otherwise a `../../etc/passwd`-style linkPath could escape via path.Join's absolute-path semantics.
- linkPath has every `\` translated to `/` before joining, so a Windows-authored `sub\x.md` resolves the same way on Linux. (filepath.ToSlash is OS-dependent and a no-op on POSIX hosts; this helper translates explicitly via strings.ReplaceAll.)
- Absolute inputs are rejected up front; path.Join of two relative paths never produces an absolute result, so the only escape vector is a leading `../` in the cleaned output, which the function checks for.
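The rules above can be sketched as follows. This is an illustration of the documented behaviour, not the package's source; `resolveRelTarget` and `isAbsolute` are hypothetical local names, and the drive-letter check is a simplifying assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isAbsolute reports whether p looks absolute on any platform: a leading
// slash, a UNC `\\` prefix, or a Windows drive letter such as `C:`.
func isAbsolute(p string) bool {
	if strings.HasPrefix(p, "/") || strings.HasPrefix(p, `\\`) {
		return true
	}
	if len(p) >= 2 && p[1] == ':' {
		c := p[0]
		return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
	}
	return false
}

// resolveRelTarget joins srcFile's directory with linkPath and returns ""
// for absolute inputs or results that escape the workspace root.
func resolveRelTarget(srcFile, linkPath string) string {
	// Translate backslashes explicitly so the result is the same on every OS.
	linkPath = strings.ReplaceAll(linkPath, `\`, "/")
	if isAbsolute(srcFile) || isAbsolute(linkPath) {
		return ""
	}
	// path.Join also cleans the result, collapsing any `..` segments.
	joined := path.Join(path.Dir(srcFile), linkPath)
	if joined == ".." || strings.HasPrefix(joined, "../") {
		return "" // the cleaned path climbs above the workspace root
	}
	return joined
}

func main() {
	fmt.Printf("%q\n", resolveRelTarget("docs/guide/a.md", "../b.md"))   // "docs/b.md"
	fmt.Printf("%q\n", resolveRelTarget("docs/a.md", `sub\x.md`))        // "docs/sub/x.md"
	fmt.Printf("%q\n", resolveRelTarget("docs/a.md", "../../etc/passwd")) // ""
}
```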

Types

type DirectiveEdge (added in v0.15.0)

type DirectiveEdge struct {
	Line  int
	Col   int
	Kind  DirectiveKind
	Path  string
	Globs []string
}

DirectiveEdge is one directive's parsed target.

Line and Col are body-relative (post front-matter strip) — same convention as Link.Line/Column. Callers needing file-relative coordinates must add f.LineOffset themselves.

For DirectiveInclude and DirectiveBuild, Path carries the raw directive value (file: for include, source: for build) verbatim from the directive body. Path is the un-resolved string — callers resolve it against the host file's directory using ResolveRelTarget.

For DirectiveCatalog, Globs carries the raw glob pattern list. Path is empty. The IsUnresolved method returns true for catalog edges so reverse-edge queries skip them generically — see the index layer for the corresponding Unresolved flag.

func ExtractDirectives (added in v0.15.0)

func ExtractDirectives(f *lint.File) []DirectiveEdge

ExtractDirectives walks the top level of f.AST for processing-instruction nodes whose name is "include", "build", or "catalog", parses each one's YAML body, and returns one DirectiveEdge per directive that carries a usable target. Directives with malformed YAML or empty required parameters are skipped silently; the dedicated lint rules surface those as diagnostics, and this extractor only contributes to the link graph.

Like ExtractLinks, ExtractDirectives is pure given its input: it does no file reads, no workspace traversal, and no global state mutation, so callers can invoke it concurrently across files.

func (DirectiveEdge) IsUnresolved (added in v0.15.0)

func (d DirectiveEdge) IsUnresolved() bool

IsUnresolved reports whether this directive points at glob patterns that need workspace-list expansion before they identify concrete files. True for DirectiveCatalog, false otherwise.

type DirectiveKind (added in v0.15.0)

type DirectiveKind int

DirectiveKind enumerates the directive shapes ExtractDirectives recognises.

const (
	// DirectiveInclude is a `<?include file: …?>` directive.
	DirectiveInclude DirectiveKind = iota
	// DirectiveBuild is a `<?build source: …?>` directive.
	DirectiveBuild
	// DirectiveCatalog is a `<?catalog glob: …?>` directive. Catalog
	// targets are glob patterns; concrete files are produced by
	// ExpandCatalog against a workspace file list.
	DirectiveCatalog
)
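To show how a caller might branch on an edge, the sketch below mirrors the documented declarations locally so it compiles on its own. The `IsUnresolved` body follows the documented behaviour (true only for catalog edges); `describe` is a hypothetical helper, and the paths and globs are made-up sample data.

```go
package main

import "fmt"

// Local mirrors of the documented declarations.
type DirectiveKind int

const (
	DirectiveInclude DirectiveKind = iota
	DirectiveBuild
	DirectiveCatalog
)

type DirectiveEdge struct {
	Line  int
	Col   int
	Kind  DirectiveKind
	Path  string
	Globs []string
}

// IsUnresolved follows the documented behaviour: true only for catalog edges.
func (d DirectiveEdge) IsUnresolved() bool { return d.Kind == DirectiveCatalog }

// describe is a hypothetical helper showing the two handling paths: catalog
// edges need glob expansion, the others resolve a single raw path.
func describe(d DirectiveEdge) string {
	if d.IsUnresolved() {
		return fmt.Sprintf("line %d: expand globs %v against the workspace file list", d.Line, d.Globs)
	}
	return fmt.Sprintf("line %d: resolve %q against the host file's directory", d.Line, d.Path)
}

func main() {
	edges := []DirectiveEdge{
		{Line: 3, Col: 1, Kind: DirectiveInclude, Path: "parts/intro.md"},
		{Line: 9, Col: 1, Kind: DirectiveCatalog, Globs: []string{"docs/*.md"}},
	}
	for _, e := range edges {
		fmt.Println(describe(e))
	}
}
```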

type Link

type Link struct {
	Line   int
	Column int
	Text   string
	Target Target
}

Link is one parsed Markdown link occurrence in a source file.

Reference-style links (`[text][label]`) are intentionally omitted from ExtractLinks results because their destinations resolve through the link-reference map rather than a URL; the link-graph builder only sees direct destinations.

Line is body-relative — counted from the start of the parsed body, not the original file. Lint rules return body-relative diagnostics because the engine applies f.LineOffset for front-matter adjustment. CLI callers (like `mdsmith list backlinks`) that want file-relative line numbers must add f.LineOffset themselves.
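As a small illustration of the conversion, the sketch below uses minimal local stand-ins for Link and lint.File that mirror only the fields involved. It assumes LineOffset holds the number of front-matter lines stripped before parsing; `fileLine` is a hypothetical helper.

```go
package main

import "fmt"

// link and file are local stand-ins, not the package's types.
type link struct {
	Line   int
	Column int
}

type file struct {
	LineOffset int // assumed: count of front-matter lines removed before parsing
}

// fileLine converts a body-relative line number to a file-relative one.
func fileLine(f file, l link) int {
	return l.Line + f.LineOffset
}

func main() {
	f := file{LineOffset: 4}      // e.g. a four-line front-matter block
	l := link{Line: 2, Column: 5} // second line of the parsed body
	fmt.Printf("%d:%d\n", fileLine(f, l), l.Column) // 6:5
}
```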

func ExtractImages (added in v0.21.0)

func ExtractImages(f *lint.File) []Link

ExtractImages walks f.AST and returns every Markdown image in document order. Both inline (Reference == nil) and reference-style (Reference != nil) images are included when their destination can be parsed as a local target. Lines are body-relative — same convention as Link.

func ExtractLinks

func ExtractLinks(f *lint.File) []Link

ExtractLinks walks f.AST and returns every regular Markdown link in document order. Lines are body-relative (post front-matter strip); see the Link doc for why.

func ExtractRefLinkTargets (added in v0.21.0)

func ExtractRefLinkTargets(f *lint.File) []Link

ExtractRefLinkTargets walks f.AST and returns every reference-style link whose definition has been resolved by the parser, as Link values with the resolved destination ready for the same file-existence resolver that ExtractLinks feeds. Images are not included — those come from ExtractImages. Lines are body-relative — same convention as Link.

type RefLink (added in v0.15.0)

type RefLink struct {
	Line   int
	Column int
	Text   string
	// Label is the link-reference label, normalised via
	// util.ToLinkReference (lower-cased, internal whitespace
	// collapsed). Use this when keying into the parser-context ref
	// table or matching against a `[label]: url` definition.
	Label string
}

RefLink is one reference-style link use (`[text][label]`, `[text][]`, or `[label]`).

ExtractLinks skips these because reference-style destinations resolve through the link reference map at render time rather than via a URL, so callers that need to map "what file does this link point at" handle them separately (e.g. via the link-ref definition table in parser.Context).

Line and Column are body-relative — same convention as Link.
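The Label normalisation described in the struct comment can be approximated as below. This is only a sketch: `normalizeLabel` is a hypothetical function that lower-cases and collapses whitespace, and it does not reproduce any Unicode case folding that util.ToLinkReference may perform. The definition table is made-up sample data.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLabel approximates the documented normalisation: trim the label,
// collapse internal whitespace runs to one space, and lower-case the result.
func normalizeLabel(label string) string {
	return strings.ToLower(strings.Join(strings.Fields(label), " "))
}

func main() {
	// Hypothetical definition table keyed by normalised label,
	// as if built from `[Style Guide]: docs/style.md`.
	defs := map[string]string{normalizeLabel("Style Guide"): "docs/style.md"}

	// A use such as `[see this][  style   GUIDE ]` finds the same entry.
	fmt.Println(defs[normalizeLabel("  style   GUIDE ")]) // docs/style.md
}
```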

func ExtractRefLinks (added in v0.15.0)

func ExtractRefLinks(f *lint.File) []RefLink

ExtractRefLinks walks f.AST and returns every reference-style link in document order. Inline links (`[text](url)`) are intentionally excluded — those come from ExtractLinks.

type Target

type Target struct {
	Raw         string
	Path        string
	Anchor      string
	LocalAnchor bool
}

Target is the parsed shape of a link destination URL.

Raw is the original destination string as it appeared in the source. Path and Anchor are the decoded path and fragment components — both are populated from url.URL, which percent-decodes them on parse. LocalAnchor is true when the destination was an anchor-only reference (e.g. `#section`).

Anchor matching against CollectAnchors output must still go through NormalizeAnchor: that runs Slugify (and a defensive PathUnescape) to produce the same form CollectAnchors stores.

func ParseTarget

func ParseTarget(dest string) (Target, bool)

ParseTarget parses a Markdown link destination into a Target. Returns ok=false when the destination is empty, has a scheme or host (treated as external), or has neither a path nor a fragment.
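The documented rules can be sketched with `net/url`. This is a minimal approximation rather than the package's code: `parseTarget` is a hypothetical local function, the Target struct is mirrored locally, and treating a parse error as ok=false is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// Target mirrors the documented struct so the sketch is self-contained.
type Target struct {
	Raw         string
	Path        string
	Anchor      string
	LocalAnchor bool
}

// parseTarget rejects empty, external, and path-less/fragment-less
// destinations, and otherwise fills Target from the parsed URL.
func parseTarget(dest string) (Target, bool) {
	if dest == "" {
		return Target{}, false
	}
	u, err := url.Parse(dest)
	if err != nil { // assumption: unparsable destinations are rejected
		return Target{}, false
	}
	if u.Scheme != "" || u.Host != "" {
		return Target{}, false // external link
	}
	if u.Path == "" && u.Fragment == "" {
		return Target{}, false // e.g. a query-only destination
	}
	return Target{
		Raw:         dest,
		Path:        u.Path,     // percent-decoded by url.Parse
		Anchor:      u.Fragment, // percent-decoded by url.Parse
		LocalAnchor: u.Path == "",
	}, true
}

func main() {
	for _, dest := range []string{"guide.md#getting%20started", "#section", "https://example.com/x"} {
		t, ok := parseTarget(dest)
		fmt.Printf("%q ok=%v path=%q anchor=%q local=%v\n", dest, ok, t.Path, t.Anchor, t.LocalAnchor)
	}
}
```

As the Target documentation notes, the Anchor produced here is decoded but not slugified, so it still needs NormalizeAnchor before comparison with CollectAnchors output.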
