schema

package
v0.15.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 14, 2026 License: MIT Imports: 19 Imported by: 0

Documentation

Overview

Package schema models the document-structure schemas that drive MDS020 (required-structure). A schema describes what a Markdown document's front matter, filename, and heading tree must look like.

Two sources feed the same in-memory representation:

  • Inline. A YAML schema block under kinds.<name>.schema: in .mdsmith.yml.
  • File. A proto.md file referenced by rules.required-structure.schema: (the legacy heading-template form).

Both parse into a Schema whose Sections is a recursive tree of Scope nodes. The validator walks a document's AST against that tree, emitting diagnostics through the lint.Diagnostic shape.

See plan/146_inline-schema-in-kinds.md for the design context.

Index

Constants

View Source
const (
	IndexIncludeStepMap      = "step-map"
	IndexIncludeCrossRefs    = "cross-ref-graph"
	IndexIncludeWordCounts   = "word-counts"
	IndexIncludeHeadingsFlat = "headings"
)

Valid include keys for IndexSpec.Include.

View Source
const SectionWildcard = "..."

SectionWildcard is the literal text the file-based parser recognises in a proto.md heading row (`## ...`) as a positional slot — the on-disk surface for what the inline grammar spells `heading: {unlisted: true}`. The inline parser rejects this string when it appears as `heading:` or `aliases:` text; authors must use the mapping form. The constant lives here so the two parsers agree on the same on-disk token.

Variables

This section is empty.

Functions

func BuildIndex

func BuildIndex(f *lint.File, sch *Schema) ([]byte, error)

BuildIndex computes the JSON index document the IndexSpec asks for and returns its serialised bytes. The returned bytes are pretty-printed with two-space indentation so the file is reviewable when diffed. Errors from sub-builders (currently only `cross-ref-graph` can fail, on a bad regex) propagate so ValidateIndex / Fix surface them as diagnostics instead of silently shipping a partial index.

func EmbeddedTypesCUE

func EmbeddedTypesCUE() string

EmbeddedTypesCUE returns the source of the shortcut library shipped at `cue/types/types.cue`. The runtime registry below is the actual lookup table; this accessor exposes the documented source for tooling (e.g. an `mdsmith help schema-types` command) and for the drift test that pins registry entries to the file contents.

func LookupShortcut

func LookupShortcut(name string) (string, bool)

LookupShortcut returns the canonical CUE expression registered under name, or (` `, false) when name is not in the registry. Callers should use resolveBareName for end-to-end shortcut handling; this helper exists for tests and for tools that want to introspect a single entry.

func MatchesHeading

func MatchesHeading(sc Scope, dh DocHeading) bool

MatchesHeading reports whether sc matches the heading text in dh. Exported so callers outside the validator (notably the per-scope rule walker in internal/rules/requiredstructure) reuse the same matching semantics — anchored regex for field-interpolated patterns, exact text otherwise, plus aliases and the "?" wildcard.

func ShortcutNames

func ShortcutNames() []string

ShortcutNames returns the sorted list of registered shortcut names. Used by error messages, the docs page, and the drift test.

func Validate

func Validate(
	f *lint.File, sch *Schema, docFM map[string]any, fmIsCUE bool,
	mkDiag MakeDiag,
) []lint.Diagnostic

Validate walks the document AST against sch, emitting diagnostics for missing/extra/out-of-order sections, level mismatches, frontmatter that fails the schema's CUE constraints, and filename patterns. mkDiag builds the diagnostic with the caller's rule ID.

docFM is the document's parsed front matter (nil when absent). When fmIsCUE is true, the front-matter values are themselves CUE expressions (the `cue-frontmatter` placeholder); the CUE check is skipped because the values are not concrete data.

func ValidateAcronyms

func ValidateAcronyms(
	f *lint.File, sch *Schema, mkDiag MakeDiag,
) []lint.Diagnostic

ValidateAcronyms flags all-caps tokens (length 2-6) that appear for the first time inside a configured scope without a parenthesised expansion. KnownSafe tokens are exempt; a missing scope list applies the check document-wide.

First-use state is per-scope. An acronym defined inside "Check" must be re-introduced when it first appears inside "Expected" — the rule treats the two scope passes independently so each named scope reads as self-contained.

Heading lines are excluded from the scan in both modes: prose rules apply to body text, and a section heading like "## OIDC configuration" should not be flagged as the first use of OIDC when the body that follows immediately spells out the expansion.

func ValidateCrossReferences

func ValidateCrossReferences(
	f *lint.File, sch *Schema, mkDiag MakeDiag,
) []lint.Diagnostic

ValidateCrossReferences walks the document's inline text nodes and, for each cross-reference pattern, checks that every match resolves to an existing heading slug. Unresolved references produce a diagnostic at the source position of the match. Lines whose raw text matches SkipLinesMatching are exempt.

func ValidateFrontmatter

func ValidateFrontmatter(sch *Schema, fm map[string]any) error

ValidateFrontmatter compiles sch.Frontmatter into a CUE schema and unifies it with fm (the document's parsed front matter).

func ValidateFrontmatterSyntax

func ValidateFrontmatterSyntax(sch *Schema) error

ValidateFrontmatterSyntax checks that the schema's frontmatter constraints compile as CUE. Returns nil if there are no constraints.

func ValidateIndex

func ValidateIndex(f *lint.File, sch *Schema, mkDiag MakeDiag) []lint.Diagnostic

ValidateIndex compares the on-disk index file (if any) against the bytes BuildIndex would emit. When the file is missing or its content differs, a diagnostic asks the user to run `mdsmith fix` so the artefact stays in sync. Comparison normalises CRLF line endings to LF so a Windows checkout with `core.autocrlf=true` does not flag a semantically-identical file as stale. Read errors other than "file does not exist" surface as a distinct diagnostic. If the last Fix tried to write this index and failed, the cached I/O error is reported in place of the generic "missing / out of date" message so users can act on the real cause instead of running fix again. `mdsmith check` still respects the read-only contract: it never touches the file.

func WriteIndex

func WriteIndex(f *lint.File, sch *Schema) error

WriteIndex writes the JSON index produced by BuildIndex next to the source file. Output paths are resolved relative to the source file's directory; absolute paths (including Windows drive-letter and leading-backslash forms), parent-traversal segments, and symlinks that escape the allowed root are rejected so a schema cannot trick fix into writing outside the project. Parent directories are created on demand so a nested `output:` path (e.g. `.mdsmith/index/runbook.json`) works on a clean checkout.

The allowed root is f.RootDir when set (the project root), and the source file's directory otherwise. After mkdir we EvalSymlinks the parent directory and verify it still resolves inside that root, so a `sub` directory that turns out to be a symlink to `/etc` is caught before any bytes are written.

The target file itself is also Lstat-checked: if an existing symlink sits at the index path (an in-root symlink that points outside the project — e.g. `.runbook-index.json` → `/etc/passwd`), os.WriteFile would follow it and clobber the link target. We reject the write instead. The write goes through a sibling temp file + os.Rename so the directory entry is replaced atomically and never as a symlink-follow operation.

On error WriteIndex records the failure in the package-level indexWriteErr cache keyed by f.Path so the next Check surfaces the underlying I/O error instead of repeating the generic "missing / out of date" message — otherwise a misconfiguration (e.g. `output: "."` resolving to a directory) would trap users in a fix loop with no signal about what is actually wrong. A successful write clears the entry.

Types

type AcronymRule

type AcronymRule struct {
	KnownSafe []string
	Scope     []string
}

AcronymRule configures first-use acronym detection. KnownSafe is the allowlist of tokens that may appear without expansion. Scope, if non-empty, restricts the check to text inside sections whose heading text matches one of the listed names; empty Scope applies the check document-wide.

type CrossRef

type CrossRef struct {
	Pattern           string
	MustMatch         string
	SkipLinesMatching string
}

CrossRef declares one text-pattern → slug-template binding. The validator searches text nodes for Pattern; for each match it fills the captured groups (numeric `{n}` or named `{slug}`) into MustMatch, slugifies the result, and looks it up in the document's heading slug set. Lines whose raw text matches SkipLinesMatching are exempt — typically blockquoted stale text.

type DocHeading

type DocHeading struct {
	Level int
	Text  string
	Line  int
}

DocHeading is a heading collected from the document under validation.

func ExtractDocHeadings

func ExtractDocHeadings(f *lint.File) []DocHeading

ExtractDocHeadings walks the document AST and collects every heading in source order, with its source line.

type FileHeading

type FileHeading struct {
	Level int
	Text  string
}

FileHeading is a heading collected from a schema markdown file.

type FileReader

type FileReader struct {
	RootFS   fs.FS
	RootDir  string
	MaxBytes int64
}

FileReader resolves a schema path or include reference to the underlying bytes. RootFS, when set, is used to constrain reads to the project root; otherwise reads go through os.ReadFile.

type IndexHeading

type IndexHeading struct {
	Level int    `json:"level"`
	Text  string `json:"text"`
	Slug  string `json:"slug"`
	Line  int    `json:"line"`
}

IndexHeading is one entry in the flat heading list emitted by the "headings" include.

type IndexSpec

type IndexSpec struct {
	Output  string
	Include []string
}

IndexSpec configures the index side-output. Output is the path (relative to the source file's directory) where `mdsmith fix` writes the JSON document. Include selects which sub-objects are emitted; the set is closed so downstream tools can parse the file without referencing a schema.

type MakeDiag

type MakeDiag func(file string, line int, msg string) lint.Diagnostic

MakeDiag is the diagnostic constructor the validator uses. Callers supply it so the schema package stays free of rule-ID coupling.

type Require

type Require struct {
	// Filename is a glob the document basename must match. Empty
	// means no filename constraint.
	Filename string
}

Require captures the schema-level constraints that apply to the document as a whole.

type Schema

type Schema struct {
	// Frontmatter maps each front-matter key to a CUE expression that
	// constrains its value. The map preserves user keys verbatim,
	// including any trailing "?" optional-field marker.
	Frontmatter map[string]string

	// Require carries constraints that apply to the document as a
	// whole (filename pattern, etc.).
	Require Require

	// Sections holds the top-level section list at RootLevel; each
	// Scope may itself nest further sections one heading level
	// deeper. Inline schemas always start at H2 (RootLevel=2), so
	// the document H1 is owned by first-line-heading and any
	// title-bearing frontmatter field rather than represented here.
	// File-based schemas can root at H1 (e.g. a `# ?` wildcard in
	// proto.md, RootLevel=1) — that H1 scope is part of Sections
	// and its children appear at level 2.
	Sections []Scope

	// Closed reports whether the root scope is strict: when true,
	// unlisted top-level headings produce a diagnostic; when false,
	// they are tolerated between listed sections. File-based schemas
	// default to Closed=true to preserve the historical
	// heading-template semantics; inline schemas default to false
	// per plan 146.
	Closed bool

	// Source is a human-readable label (file path for file-based
	// schemas, kind name for inline schemas) used in diagnostics
	// referring to the schema itself.
	Source string

	// RootLevel is the heading level of entries in Sections.
	// Inline schemas use 2 (H1 belongs to the title). File-based
	// schemas adopt whatever level the topmost heading in the
	// proto.md uses — usually 1 for a `# ?` wildcard, 2 when the
	// file declares only `## ...` rows.
	RootLevel int

	// CrossReferences lists patterns whose matches in the document
	// body must resolve to a heading slug. See plan 143.
	CrossReferences []CrossRef

	// Acronyms, if non-nil, asks the validator to flag first-use
	// acronyms (length 2-6, all caps) that lack a parenthesised
	// expansion. See plan 143.
	Acronyms *AcronymRule

	// Index, if non-nil, asks `mdsmith fix` to emit a JSON
	// side-output describing the document. `mdsmith check` reports
	// staleness (missing or outdated file) as a diagnostic so the
	// fixer is triggered, but never writes the file itself. See
	// plan 143.
	Index *IndexSpec
}

Schema is the parsed representation of a single document schema. It is produced by the inline YAML parser or the proto.md file parser; both feed the same struct.

func ParseFile

func ParseFile(r *FileReader, path string) (*Schema, error)

ParseFile loads the proto.md schema at path through r and returns the parsed Schema. It expands <?include?> directives, extracts <?require?> filename constraints, and normalises the flat heading-template into a Scope tree (each H2 becomes a top-level Scope, each H3 nests beneath the previous H2, and so on).

File-based schemas default to Closed=true at the root so the historical heading-template behaviour (extras flagged) survives the migration. Inline schemas opt back to open scopes per plan 146.

func ParseInline

func ParseInline(raw map[string]any, source string) (*Schema, error)

ParseInline builds a Schema from the YAML-decoded inline form found under kinds.<name>.schema: in .mdsmith.yml. The input is the raw map[string]any produced by goyaml so callers do not have to share a dependency on a specific YAML schema struct.

source is a label used in diagnostics that point back at the schema (typically "kind <name>").

Inline schemas default to open scopes (Closed: false) per plan 146. The validator's open-scope semantics still enforce required sections and listed-section ordering; only unlisted headings are tolerated.

func (*Schema) EffectiveRootLevel

func (s *Schema) EffectiveRootLevel() int

EffectiveRootLevel returns the heading level of the root scope list, falling back to 2 when unset.

func (*Schema) FrontmatterCUE

func (s *Schema) FrontmatterCUE() string

FrontmatterCUE returns a CUE struct literal that constrains the document front matter to the schema. The result is suitable for compiling with cuelang and unifying against a JSON-encoded document front matter. Keys with a trailing "?" are emitted as optional CUE fields with the marker stripped from the label.

func (*Schema) IsEmpty

func (s *Schema) IsEmpty() bool

IsEmpty reports whether s carries no constraints. Used by callers (notably MDS020) to short-circuit when a kind declares no schema.

type Scope

type Scope struct {
	// Heading is the heading text to match. No "#" markers; the
	// level comes from depth in the tree. The single-character
	// "?" matches any text. Headings (and aliases) containing
	// `{field}` interpolation are matched as anchored regex
	// patterns, with each placeholder consuming one or more
	// characters of the doc heading text. Empty when Wildcard is
	// true.
	Heading string

	// Required reports whether a matching heading must appear in
	// the document. Literal scopes default to true. Slot scopes
	// (`heading: {unlisted: true}`) always parse to Required=false
	// because the parser rejects an explicit `required:` key on
	// them. Preamble scopes (`heading: null`) default to false but
	// accept an explicit `required:` value; the inline validator
	// does not yet act on it (a future plan that adds preamble-
	// content checks will).
	Required bool

	// Aliases lists alternate heading texts that match this scope.
	// An empty list means only Heading matches.
	Aliases []string

	// Sections is the recursive list of nested sections (one level
	// deeper in the document tree).
	Sections []Scope

	// Repeats reports whether Heading is a pattern (with placeholder
	// tokens) that may match zero or more sections.
	Repeats bool

	// Sequential, on a repeating scope, asserts no gaps and no
	// duplicates in the {n} placeholder values.
	Sequential bool

	// Min and Max bound the match count of a repeating scope. Zero
	// means unbounded.
	Min int
	Max int

	// Closed reports whether this scope is strict: when true,
	// unlisted child headings produce a diagnostic; when false, they
	// are tolerated between listed sub-sections.
	Closed bool

	// Wildcard reports whether this scope is a slot that matches
	// zero or more sections the schema did not list by name. Authors
	// write it inline as `heading: {unlisted: true}` (or as a `## ...`
	// row in a file-based proto.md). Out-of-order detection still
	// claims a heading whose text matches a later listed scope, so
	// the slot only absorbs truly-unlisted sections.
	Wildcard bool

	// Preamble reports whether this scope describes the implicit
	// section before any heading — the document's lead-in content.
	// Authors write it inline as `heading: null`. A preamble scope
	// has no heading text to match; its range is [parent-start,
	// first-child-heading). Plan 146 limits the preamble to
	// carrying `rules:` overrides for that range; `content:` (plan
	// 149) extends it to AST-node constraints.
	Preamble bool

	// Rules carries per-scope rule-config overrides. Each entry maps
	// a rule name to a settings map. The MDS020 walker re-runs each
	// named rule with these settings against the document and
	// filters diagnostics to the scope's heading range.
	//
	// Today the override stacks on the rule's defaults, not the
	// file's full effective config (defaults → kinds → file globs →
	// scope). Threading the full merge through the engine is a
	// tracked follow-up on plan 146; see docs/guides/schemas.md.
	Rules map[string]map[string]any
}

Scope binds an AST subtree (a section) to a set of constraints and per-rule config overrides. The root scope's children are the top-level (H2) section list; their children are H3, and so on. Levels come from depth in the tree.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL