Documentation
Types ¶
type ContentDiscoverer ¶
type ContentDiscoverer interface {
Discover(ctx context.Context, input string) (*Result, error)
RegisterProvider(p Provider)
Providers() []Provider
}
ContentDiscoverer probes URLs for LLM-targeted content. The default implementation tries well-known paths, companion files, sitemaps, and, optionally, the Context7 API. Alternative implementations could add platform detection, deeper crawling, or custom discovery strategies.
type ContentType ¶
type ContentType string
ContentType classifies discovered content.
const (
	TypeLLMSTxt     ContentType = "llms-txt"
	TypeLLMSFull    ContentType = "llms-full-txt"
	TypeLLMSCtx     ContentType = "llms-ctx-txt"
	TypeLLMSCtxFull ContentType = "llms-ctx-full-txt"
	TypeAITxt       ContentType = "ai-txt"
	TypeCompanion   ContentType = "companion"
	TypeTDMRep      ContentType = "tdmrep"
	TypeWellKnown   ContentType = "well-known"
)
const (
	// TypeContext7 classifies content fetched from Context7.
	TypeContext7 ContentType = "context7"
)
type Context7Provider ¶
type Context7Provider struct {
// contains filtered or unexported fields
}
Context7Provider discovers LLM-optimized documentation via the Context7 API. It resolves library names to IDs, then fetches curated documentation snippets.
func NewContext7Provider ¶
func NewContext7Provider(apiKey string) *Context7Provider
NewContext7Provider creates a Context7Provider. apiKey should start with "ctx7sk". If apiKey is empty, the provider can still be created and registered, but every request will fail with an authentication error from the API.
func (*Context7Provider) CanHandle ¶
func (p *Context7Provider) CanHandle(input string) bool
CanHandle returns true for bare library names (not URLs). Inputs like "react", "stripe-node", "nextjs" match. URLs (http:// or https://) do not match.
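The rule above can be illustrated with a minimal sketch. The helper name isBareLibraryName is hypothetical and is not part of this package; the case-insensitive scheme check and the rejection of empty input are assumptions added for the example, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// isBareLibraryName is a hypothetical helper, not the package's actual code.
// It reports whether input looks like a library name rather than a URL.
// Assumptions: scheme matching is case-insensitive and empty input is rejected.
func isBareLibraryName(input string) bool {
	s := strings.ToLower(strings.TrimSpace(input))
	if s == "" {
		return false
	}
	return !strings.HasPrefix(s, "http://") && !strings.HasPrefix(s, "https://")
}

func main() {
	inputs := []string{"react", "stripe-node", "nextjs", "https://example.com/docs"}
	for _, in := range inputs {
		fmt.Printf("%q -> %v\n", in, isBareLibraryName(in))
	}
}
```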
func (*Context7Provider) Discover ¶
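func (p *Context7Provider) Discover(ctx context.Context, input string) (*Result, error)
Discover resolves the library name to a Context7 ID, fetches its documentation, and returns the files with Body already populated.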
func (*Context7Provider) Name ¶
func (p *Context7Provider) Name() string
type DiscoveredFile ¶
type DiscoveredFile struct {
URL string `json:"url"`
Path string `json:"path"` // URL path (e.g., "/llms.txt")
ContentType ContentType `json:"content_type"`
Size int `json:"size"`
FoundVia string `json:"found_via"` // How it was discovered
Body []byte `json:"-"` // If non-nil, content is already fetched (skip mirror fetch)
}
DiscoveredFile represents a single piece of discovered LLM content.
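A short sketch of how the struct tags above affect JSON output may help. The types are copied locally so the example stands alone; encodeFile and the FoundVia value "probe" are illustrative assumptions. The point shown is that Body, tagged `json:"-"`, never appears in the encoded form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types, so the sketch compiles on its own.
type ContentType string

const TypeLLMSTxt ContentType = "llms-txt"

type DiscoveredFile struct {
	URL         string      `json:"url"`
	Path        string      `json:"path"`
	ContentType ContentType `json:"content_type"`
	Size        int         `json:"size"`
	FoundVia    string      `json:"found_via"`
	Body        []byte      `json:"-"` // excluded from JSON by the "-" tag
}

// encodeFile is a hypothetical helper that returns the JSON form of f.
func encodeFile(f DiscoveredFile) (string, error) {
	b, err := json.Marshal(f)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body := []byte("# Example\n")
	out, err := encodeFile(DiscoveredFile{
		URL:         "https://example.com/llms.txt",
		Path:        "/llms.txt",
		ContentType: TypeLLMSTxt,
		Size:        len(body),
		FoundVia:    "probe", // illustrative value, not a documented constant
		Body:        body,
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```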
type Discoverer ¶
type Discoverer struct {
// contains filtered or unexported fields
}
Discoverer routes discovery requests to registered providers.
func New ¶
func New(f fetcher.HTTPFetcher, robotsChecker *robots.Checker, maxProbes int) *Discoverer
New creates a Discoverer with a SiteProvider as the default.
func NewWithProviders ¶
func NewWithProviders(providers ...Provider) *Discoverer
NewWithProviders creates a Discoverer with the given providers.
func (*Discoverer) Providers ¶
func (d *Discoverer) Providers() []Provider
Providers returns the registered providers.
func (*Discoverer) RegisterProvider ¶
func (d *Discoverer) RegisterProvider(p Provider)
RegisterProvider adds a provider to the discoverer.
type Platform ¶
type Platform struct {
Theme string `json:"theme"` // mkdocs-material, docusaurus, sphinx, generic
ContentSelectors []string `json:"content_selectors"` // CSS selectors for main content
RemoveSelectors []string `json:"remove_selectors"` // CSS selectors for chrome to strip
}
Platform identifies a documentation site's framework/theme.
func DetectPlatform ¶
DetectPlatform inspects HTML to identify the documentation platform and returns appropriate CSS selectors for content extraction. This improves HTML→markdown conversion quality by targeting the actual content area.
type Provider ¶
type Provider interface {
// Name returns a short identifier for this provider (e.g., "site", "context7").
Name() string
// CanHandle returns true if this provider can discover content for the given input.
CanHandle(input string) bool
// Discover probes the input and returns discovered content.
// Files may include Body (content already fetched) or leave it nil (mirror will fetch).
Discover(ctx context.Context, input string) (*Result, error)
}
Provider discovers LLM-friendly content from a source. Implementations handle different content origins — websites, doc aggregators, package registries, etc.
type Result ¶
type Result struct {
Domain string `json:"domain"`
BaseURL string `json:"base_url"`
Files []DiscoveredFile `json:"files"`
ProbedPaths []string `json:"probed_paths,omitempty"` // paths that were checked (shown when no files found)
Platform *Platform `json:"platform,omitempty"` // detected doc platform/theme
DiscoveredAt time.Time `json:"discovered_at"`
}
Result holds everything discovered for a site.
type SiteProvider ¶
type SiteProvider struct {
Fetcher fetcher.HTTPFetcher
Robots *robots.Checker
MaxProbes int
}
SiteProvider probes websites for LLM-targeted content at well-known paths.
func NewSiteProvider ¶
func NewSiteProvider(f fetcher.HTTPFetcher, robotsChecker *robots.Checker, maxProbes int) *SiteProvider
NewSiteProvider creates a SiteProvider.
func (*SiteProvider) CanHandle ¶
func (p *SiteProvider) CanHandle(input string) bool
func (*SiteProvider) Name ¶
func (p *SiteProvider) Name() string