discovery

package
v1.0.1
Published: Mar 23, 2026 License: MIT Imports: 14 Imported by: 0

Documentation

Types

type ContentDiscoverer

type ContentDiscoverer interface {
	Discover(ctx context.Context, input string) (*Result, error)
	RegisterProvider(p Provider)
	Providers() []Provider
}

ContentDiscoverer probes URLs for LLM-targeted content. The default implementation tries well-known paths, companion files, sitemaps, and, optionally, the Context7 API. Alternative implementations could add platform detection, deeper crawling, or custom discovery strategies.

type ContentType

type ContentType string

ContentType classifies discovered content.

const (
	TypeLLMSTxt     ContentType = "llms-txt"
	TypeLLMSFull    ContentType = "llms-full-txt"
	TypeLLMSCtx     ContentType = "llms-ctx-txt"
	TypeLLMSCtxFull ContentType = "llms-ctx-full-txt"
	TypeAITxt       ContentType = "ai-txt"
	TypeCompanion   ContentType = "companion"
	TypeTDMRep      ContentType = "tdmrep"
	TypeWellKnown   ContentType = "well-known"
)
const (

	// TypeContext7 classifies content fetched from Context7.
	TypeContext7 ContentType = "context7"
)
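
As an illustration of how these constants might be assigned, the sketch below maps a URL path to a ContentType. The classifyPath helper and the whole path-to-type mapping are assumptions made for this example (only "/llms.txt" is mentioned in this documentation); the package's actual classification rules are not shown here. The type and constants are local copies so the block compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the documented declarations (subset).
type ContentType string

const (
	TypeLLMSTxt     ContentType = "llms-txt"
	TypeLLMSFull    ContentType = "llms-full-txt"
	TypeLLMSCtx     ContentType = "llms-ctx-txt"
	TypeLLMSCtxFull ContentType = "llms-ctx-full-txt"
	TypeAITxt       ContentType = "ai-txt"
	TypeWellKnown   ContentType = "well-known"
)

// classifyPath is a hypothetical helper, not part of the package.
// The mapping below is an assumption for illustration only.
func classifyPath(path string) (ContentType, bool) {
	switch {
	case strings.HasSuffix(path, "/llms.txt"):
		return TypeLLMSTxt, true
	case strings.HasSuffix(path, "/llms-full.txt"):
		return TypeLLMSFull, true
	case strings.HasSuffix(path, "/llms-ctx.txt"):
		return TypeLLMSCtx, true
	case strings.HasSuffix(path, "/llms-ctx-full.txt"):
		return TypeLLMSCtxFull, true
	case strings.HasSuffix(path, "/ai.txt"):
		return TypeAITxt, true
	case strings.HasPrefix(path, "/.well-known/"):
		return TypeWellKnown, true
	}
	return "", false // unknown path: no classification
}

func main() {
	for _, p := range []string{"/llms.txt", "/llms-full.txt", "/ai.txt", "/index.html"} {
		ct, ok := classifyPath(p)
		fmt.Printf("%s -> %q (matched=%v)\n", p, ct, ok)
	}
}
```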

type Context7Provider

type Context7Provider struct {
	// contains filtered or unexported fields
}

Context7Provider discovers LLM-optimized documentation via the Context7 API. It resolves library names to IDs, then fetches curated documentation snippets.

func NewContext7Provider

func NewContext7Provider(apiKey string) *Context7Provider

NewContext7Provider creates a Context7Provider. apiKey should start with "ctx7sk". If apiKey is empty, the provider can still be registered, but every request fails with an authentication error from the API.

func (*Context7Provider) CanHandle

func (p *Context7Provider) CanHandle(input string) bool

CanHandle returns true for bare library names (not URLs). Inputs like "react", "stripe-node", "nextjs" match. URLs (http:// or https://) do not match.
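
The rule described above can be sketched as a simple prefix check. This is a minimal stand-in, not the package's implementation: isBareLibraryName is a hypothetical name, and rejecting empty input and ignoring case and surrounding whitespace are assumptions that the documentation does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// isBareLibraryName is a hypothetical stand-in for the documented rule:
// inputs that start with http:// or https:// are URLs and do not match.
func isBareLibraryName(input string) bool {
	s := strings.ToLower(strings.TrimSpace(input))
	if s == "" {
		return false // assumption: empty input is not a library name
	}
	return !strings.HasPrefix(s, "http://") && !strings.HasPrefix(s, "https://")
}

func main() {
	inputs := []string{"react", "stripe-node", "nextjs", "https://example.com/docs"}
	for _, in := range inputs {
		fmt.Printf("%q -> %v\n", in, isBareLibraryName(in))
	}
}
```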

func (*Context7Provider) Discover

func (p *Context7Provider) Discover(ctx context.Context, input string) (*Result, error)

Discover resolves the library name, fetches docs, and returns them with Body populated.

func (*Context7Provider) Name

func (p *Context7Provider) Name() string

type DiscoveredFile

type DiscoveredFile struct {
	URL         string      `json:"url"`
	Path        string      `json:"path"` // URL path (e.g., "/llms.txt")
	ContentType ContentType `json:"content_type"`
	Size        int         `json:"size"`
	FoundVia    string      `json:"found_via"` // How it was discovered
	Body        []byte      `json:"-"`         // If non-nil, content is already fetched (skip mirror fetch)
}

DiscoveredFile represents a single piece of discovered LLM content.
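
The struct tags determine how a DiscoveredFile serializes: Body carries the tag json:"-", so it never appears in JSON output even when populated. The sketch below uses local copies of the documented types to show this; encodeFile and the FoundVia value "probe" are hypothetical and chosen only for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented declarations.
type ContentType string

const TypeLLMSTxt ContentType = "llms-txt"

type DiscoveredFile struct {
	URL         string      `json:"url"`
	Path        string      `json:"path"`
	ContentType ContentType `json:"content_type"`
	Size        int         `json:"size"`
	FoundVia    string      `json:"found_via"`
	Body        []byte      `json:"-"` // excluded from JSON
}

// encodeFile is a hypothetical helper that marshals a file to a JSON string.
func encodeFile(f DiscoveredFile) (string, error) {
	b, err := json.Marshal(f)
	return string(b), err
}

func main() {
	f := DiscoveredFile{
		URL:         "https://example.com/llms.txt",
		Path:        "/llms.txt",
		ContentType: TypeLLMSTxt,
		Size:        5,
		FoundVia:    "probe", // hypothetical value
		Body:        []byte("hello"),
	}
	out, err := encodeFile(f)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // the "hello" body is absent from the output
}
```

Anything that persists or transmits the JSON form therefore has to carry the body separately or fetch it again from URL.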

type Discoverer

type Discoverer struct {
	// contains filtered or unexported fields
}

Discoverer routes discovery requests to registered providers.

func New

func New(f fetcher.HTTPFetcher, robotsChecker *robots.Checker, maxProbes int) *Discoverer

New creates a Discoverer with a SiteProvider as the default.

func NewWithProviders

func NewWithProviders(providers ...Provider) *Discoverer

NewWithProviders creates a Discoverer with the given providers.

func (*Discoverer) Discover

func (d *Discoverer) Discover(ctx context.Context, input string) (*Result, error)

Discover routes to the first provider that can handle the input.
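
First-match routing can be sketched as follows. The Provider interface is copied from this documentation and Result is trimmed to one field; stubProvider, pick, route, demoProviders, and errNoProvider are hypothetical names. Returning an error when no provider matches, and registering "site" before "context7", are assumptions, since the documentation does not describe either.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Result is a trimmed local stand-in for the documented type.
type Result struct{ Domain string }

// Provider mirrors the documented interface.
type Provider interface {
	Name() string
	CanHandle(input string) bool
	Discover(ctx context.Context, input string) (*Result, error)
}

// stubProvider is a hypothetical provider driven by an accept function.
type stubProvider struct {
	name   string
	accept func(string) bool
}

func (s stubProvider) Name() string                { return s.name }
func (s stubProvider) CanHandle(input string) bool { return s.accept(input) }
func (s stubProvider) Discover(_ context.Context, input string) (*Result, error) {
	return &Result{Domain: input}, nil
}

func isURL(s string) bool {
	return strings.HasPrefix(s, "http://") || strings.HasPrefix(s, "https://")
}

// pick returns the first provider, in registration order, that accepts input.
func pick(providers []Provider, input string) (Provider, bool) {
	for _, p := range providers {
		if p.CanHandle(input) {
			return p, true
		}
	}
	return nil, false
}

var errNoProvider = errors.New("no provider can handle input") // assumed behaviour

// route delegates to the provider chosen by pick.
func route(ctx context.Context, providers []Provider, input string) (*Result, error) {
	p, ok := pick(providers, input)
	if !ok {
		return nil, errNoProvider
	}
	return p.Discover(ctx, input)
}

func demoProviders() []Provider {
	return []Provider{
		stubProvider{name: "site", accept: isURL},
		stubProvider{name: "context7", accept: func(s string) bool { return !isURL(s) }},
	}
}

func main() {
	providers := demoProviders()
	for _, in := range []string{"https://example.com", "react"} {
		p, _ := pick(providers, in)
		res, err := route(context.Background(), providers, in)
		fmt.Println(in, "->", p.Name(), res.Domain, err)
	}
}
```

Because the first match wins, registration order matters whenever two providers accept the same input.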

func (*Discoverer) Providers

func (d *Discoverer) Providers() []Provider

Providers returns the registered providers.

func (*Discoverer) RegisterProvider

func (d *Discoverer) RegisterProvider(p Provider)

RegisterProvider adds a provider to the discoverer.

type Platform

type Platform struct {
	Theme            string   `json:"theme"`             // mkdocs-material, docusaurus, sphinx, generic
	ContentSelectors []string `json:"content_selectors"` // CSS selectors for main content
	RemoveSelectors  []string `json:"remove_selectors"`  // CSS selectors for chrome to strip
}

Platform identifies a documentation site's framework/theme.

func DetectPlatform

func DetectPlatform(html string) Platform

DetectPlatform inspects HTML to identify the documentation platform and returns appropriate CSS selectors for content extraction. This improves HTML→markdown conversion quality by targeting the actual content area.
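
To give a rough idea of what such detection could look like, the sketch below matches marker substrings in the HTML. It is not the package's algorithm: detectTheme is a hypothetical function, the marker strings are assumptions about what each generator leaves in its output, and the selector fields are left empty because the real selector values are not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// Platform is a local copy of the documented struct.
type Platform struct {
	Theme            string   `json:"theme"`
	ContentSelectors []string `json:"content_selectors"`
	RemoveSelectors  []string `json:"remove_selectors"`
}

// detectTheme is a hypothetical, simplified detector based on substring
// markers. More specific markers are checked first.
func detectTheme(html string) Platform {
	h := strings.ToLower(html)
	switch {
	case strings.Contains(h, "mkdocs-material"):
		return Platform{Theme: "mkdocs-material"}
	case strings.Contains(h, "docusaurus"):
		return Platform{Theme: "docusaurus"}
	case strings.Contains(h, "sphinx"):
		return Platform{Theme: "sphinx"}
	}
	return Platform{Theme: "generic"} // fallback when no marker is found
}

func main() {
	pages := []string{
		`<meta name="generator" content="Docusaurus v3.0.0">`,
		`<meta name="generator" content="mkdocs-1.5.3, mkdocs-material-9.5.3">`,
		`<html><body>plain page</body></html>`,
	}
	for _, page := range pages {
		fmt.Println(detectTheme(page).Theme)
	}
}
```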

type Provider

type Provider interface {
	// Name returns a short identifier for this provider (e.g., "site", "context7").
	Name() string
	// CanHandle returns true if this provider can discover content for the given input.
	CanHandle(input string) bool
	// Discover probes the input and returns discovered content.
	// Files may include Body (content already fetched) or leave it nil (mirror will fetch).
	Discover(ctx context.Context, input string) (*Result, error)
}

Provider discovers LLM-friendly content from a source. Implementations handle different content origins, such as websites, doc aggregators, and package registries.
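
A custom provider only needs the three methods above. The sketch below serves documents from memory and returns them with Body already populated, which the interface comments say lets the mirror skip its own fetch. memoryProvider and discoverFrom are hypothetical, the types are trimmed local copies so the block compiles without the package, and the Path and FoundVia values are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Trimmed local copies of the documented types.
type ContentType string

const TypeLLMSTxt ContentType = "llms-txt"

type DiscoveredFile struct {
	Path        string
	ContentType ContentType
	Size        int
	FoundVia    string
	Body        []byte
}

type Result struct {
	Domain       string
	Files        []DiscoveredFile
	DiscoveredAt time.Time
}

type Provider interface {
	Name() string
	CanHandle(input string) bool
	Discover(ctx context.Context, input string) (*Result, error)
}

// memoryProvider is a hypothetical provider backed by an in-memory map.
type memoryProvider struct {
	docs map[string][]byte
}

// Compile-time check that memoryProvider satisfies Provider.
var _ Provider = (*memoryProvider)(nil)

func (m *memoryProvider) Name() string { return "memory" }

func (m *memoryProvider) CanHandle(input string) bool {
	_, ok := m.docs[input]
	return ok
}

func (m *memoryProvider) Discover(ctx context.Context, input string) (*Result, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // respect cancellation before doing any work
	}
	body, ok := m.docs[input]
	if !ok {
		return nil, fmt.Errorf("memory: no document for %q", input)
	}
	return &Result{
		Domain: input,
		Files: []DiscoveredFile{{
			Path:        "/llms.txt", // placeholder path
			ContentType: TypeLLMSTxt,
			Size:        len(body),
			FoundVia:    "memory", // placeholder label
			Body:        body,     // pre-fetched content
		}},
		DiscoveredAt: time.Now(),
	}, nil
}

// discoverFrom is a small convenience wrapper using a background context.
func discoverFrom(p Provider, input string) (*Result, error) {
	return p.Discover(context.Background(), input)
}

func main() {
	p := &memoryProvider{docs: map[string][]byte{"react": []byte("# React")}}
	res, err := discoverFrom(p, "react")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name(), len(res.Files), res.Files[0].Size)
}
```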

type Result

type Result struct {
	Domain       string           `json:"domain"`
	BaseURL      string           `json:"base_url"`
	Files        []DiscoveredFile `json:"files"`
	ProbedPaths  []string         `json:"probed_paths,omitempty"` // paths that were checked (shown when no files found)
	Platform     *Platform        `json:"platform,omitempty"`     // detected doc platform/theme
	DiscoveredAt time.Time        `json:"discovered_at"`
}

Result holds everything discovered for a site.

type SiteProvider

type SiteProvider struct {
	Fetcher   fetcher.HTTPFetcher
	Robots    *robots.Checker
	MaxProbes int
}

SiteProvider probes websites for LLM-targeted content at well-known paths.

func NewSiteProvider

func NewSiteProvider(f fetcher.HTTPFetcher, robotsChecker *robots.Checker, maxProbes int) *SiteProvider

NewSiteProvider creates a SiteProvider.

func (*SiteProvider) CanHandle

func (p *SiteProvider) CanHandle(input string) bool

func (*SiteProvider) Discover

func (p *SiteProvider) Discover(ctx context.Context, baseURL string) (*Result, error)

Discover probes a site for all LLM content.

func (*SiteProvider) Name

func (p *SiteProvider) Name() string
