generator

package
v0.0.0-...-15f27ee
Warning: This package is not in the latest version of its module.
Published: Aug 14, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package generator provides functionality for generating llms.txt files from website content using Firecrawl and OpenAI APIs.

The package supports:

  • Website mapping and URL discovery via Firecrawl
  • Content scraping with configurable parameters
  • AI-powered title and description generation via OpenAI
  • Concurrent processing with rate limiting
  • Generation of both summary (llms.txt) and full content (llms-full.txt) files

Functions

func ParseDomainFromURL

func ParseDomainFromURL(rawURL string) (string, error)
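
ParseDomainFromURL is listed without a description, so its exact behavior is not confirmed here. The sketch below shows one plausible way to extract a domain from a URL with the standard library. The helper name parseDomain, the lowercasing, and the rejection of host-less input are all assumptions made for illustration, not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseDomain is a hypothetical stand-in for ParseDomainFromURL.
// It returns the lowercased host name without any port.
func parseDomain(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", rawURL, err)
	}
	host := u.Hostname() // Hostname drops the ":port" suffix, if any
	if host == "" {
		// Input such as "example.com/docs" has no scheme, so net/url
		// treats it as a path and leaves the host empty.
		return "", errors.New("URL has no host: " + rawURL)
	}
	return strings.ToLower(host), nil
}

func main() {
	for _, raw := range []string{"https://Example.com:8443/docs", "example.com/docs"} {
		domain, err := parseDomain(raw)
		fmt.Printf("input=%q domain=%q err=%v\n", raw, domain, err)
	}
}
```

Whether the real function accepts scheme-less input is unknown; this sketch rejects it rather than guessing.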

Types

type FirecrawlClient

type FirecrawlClient interface {
	MapWebsite(ctx context.Context, url string, limit int, options FirecrawlOptions) ([]string, error)
	ScrapeURL(ctx context.Context, url string, options FirecrawlOptions) (*ScrapedData, error)
}
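
Because FirecrawlClient is an interface, a generator can in principle be exercised without network access by supplying a fake. The following is a minimal sketch under that assumption. The types are mirrored locally from this page so the example compiles on its own, and fakeClient, newFakeClient, and demoFake are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// Local mirrors of the types shown on this page.
type FirecrawlOptions struct {
	OnlyMainContent   bool
	Timeout           int
	Formats           []string
	IncludeSubdomains bool
	IgnoreSitemap     bool
}

type ScrapedData struct {
	URL      string            `json:"url"`
	Markdown string            `json:"markdown"`
	Metadata map[string]string `json:"metadata"`
}

type FirecrawlClient interface {
	MapWebsite(ctx context.Context, url string, limit int, options FirecrawlOptions) ([]string, error)
	ScrapeURL(ctx context.Context, url string, options FirecrawlOptions) (*ScrapedData, error)
}

// fakeClient is a hypothetical in-memory implementation for tests.
type fakeClient struct {
	order []string          // URLs in discovery order
	pages map[string]string // URL -> markdown
}

// Compile-time check that fakeClient satisfies the interface.
var _ FirecrawlClient = (*fakeClient)(nil)

func (f *fakeClient) MapWebsite(ctx context.Context, url string, limit int, _ FirecrawlOptions) ([]string, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	links := f.order
	if limit > 0 && len(links) > limit {
		links = links[:limit]
	}
	return append([]string(nil), links...), nil // return a copy
}

func (f *fakeClient) ScrapeURL(ctx context.Context, url string, _ FirecrawlOptions) (*ScrapedData, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	md, ok := f.pages[url]
	if !ok {
		return nil, fmt.Errorf("no fake page for %s", url)
	}
	return &ScrapedData{URL: url, Markdown: md, Metadata: map[string]string{}}, nil
}

func newFakeClient() *fakeClient {
	return &fakeClient{
		order: []string{"https://example.com/", "https://example.com/docs"},
		pages: map[string]string{
			"https://example.com/":     "# Home",
			"https://example.com/docs": "# Docs",
		},
	}
}

// demoFake maps the fake site with a limit of 1 and scrapes the first link.
func demoFake() ([]string, string, error) {
	var c FirecrawlClient = newFakeClient()
	ctx := context.Background()
	links, err := c.MapWebsite(ctx, "https://example.com", 1, FirecrawlOptions{})
	if err != nil {
		return nil, "", err
	}
	data, err := c.ScrapeURL(ctx, links[0], FirecrawlOptions{})
	if err != nil {
		return nil, "", err
	}
	return links, data.Markdown, nil
}

func main() {
	links, md, err := demoFake()
	fmt.Println(links, md, err)
}
```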

func NewFirecrawlClient

func NewFirecrawlClient(apiKey string) (FirecrawlClient, error)

NewFirecrawlClient returns a FirecrawlClient backed by a new *firecrawl.FirecrawlApp initialized with the given API key.

type FirecrawlOptions

type FirecrawlOptions struct {
	OnlyMainContent   bool
	Timeout           int
	Formats           []string
	IncludeSubdomains bool
	IgnoreSitemap     bool
}

type GenerationOptions

type GenerationOptions struct {
	Model            string
	MaxURLs          int
	OutputDir        string
	NoFullText       bool
	Verbose          bool
	BatchSize        int
	MaxWorkers       int
	BatchDelay       time.Duration
	Timeout          time.Duration
	MaxContentLength int
	FirecrawlOptions FirecrawlOptions
}
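
The fields of GenerationOptions are not individually documented on this page. As a rough illustration, the sketch below fills in a value with the structs mirrored locally. Every number, the placeholder model name, and the checkOptions helper are assumptions chosen for the example, not documented defaults or package behavior.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Local mirrors of the structs shown on this page.
type FirecrawlOptions struct {
	OnlyMainContent   bool
	Timeout           int
	Formats           []string
	IncludeSubdomains bool
	IgnoreSitemap     bool
}

type GenerationOptions struct {
	Model            string
	MaxURLs          int
	OutputDir        string
	NoFullText       bool
	Verbose          bool
	BatchSize        int
	MaxWorkers       int
	BatchDelay       time.Duration
	Timeout          time.Duration
	MaxContentLength int
	FirecrawlOptions FirecrawlOptions
}

// exampleOptions returns illustrative values only.
func exampleOptions() GenerationOptions {
	return GenerationOptions{
		Model:            "example-model", // placeholder, not a real model name
		MaxURLs:          50,
		OutputDir:        "./out",
		BatchSize:        10,
		MaxWorkers:       4,
		BatchDelay:       time.Second,
		Timeout:          30 * time.Second,
		MaxContentLength: 4000,
		FirecrawlOptions: FirecrawlOptions{
			OnlyMainContent: true,
			Formats:         []string{"markdown"},
		},
	}
}

// checkOptions is a hypothetical sanity check a caller might run before
// constructing a generator; the package may or may not validate these itself.
func checkOptions(o GenerationOptions) error {
	switch {
	case o.MaxURLs <= 0:
		return errors.New("MaxURLs must be positive")
	case o.BatchSize <= 0:
		return errors.New("BatchSize must be positive")
	case o.MaxWorkers <= 0:
		return errors.New("MaxWorkers must be positive")
	}
	return nil
}

func main() {
	opts := exampleOptions()
	fmt.Println("valid options:", checkOptions(opts))

	opts.MaxWorkers = 0
	fmt.Println("zero workers:", checkOptions(opts))
}
```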

type GenerationResult

type GenerationResult struct {
	LLMsTxt        string `json:"llms_txt"`
	LLMsFullTxt    string `json:"llms_full_txt"`
	ProcessedCount int    `json:"processed_count"`
	TotalCount     int    `json:"total_count"`
}
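
The struct tags above determine the JSON field names when a result is serialized. A small sketch, with the struct mirrored locally and sample values invented for the example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirror of GenerationResult as shown on this page.
type GenerationResult struct {
	LLMsTxt        string `json:"llms_txt"`
	LLMsFullTxt    string `json:"llms_full_txt"`
	ProcessedCount int    `json:"processed_count"`
	TotalCount     int    `json:"total_count"`
}

// encodeResult serializes a result using the tags declared on the struct.
func encodeResult(r GenerationResult) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample values are made up for illustration.
	out, err := encodeResult(GenerationResult{
		LLMsTxt:        "# example.com",
		ProcessedCount: 2,
		TotalCount:     3,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints {"llms_txt":"# example.com","llms_full_txt":"","processed_count":2,"total_count":3}
}
```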

type LLMsTxtGenerator

type LLMsTxtGenerator struct {
	// contains filtered or unexported fields
}

func NewLLMsTxtGenerator

func NewLLMsTxtGenerator(firecrawlClient FirecrawlClient, SummarizerClient gollm.SummarizerClient, options GenerationOptions) *LLMsTxtGenerator

NewLLMsTxtGenerator creates a new instance of LLMsTxtGenerator with the provided clients and options.

Parameters:

  • firecrawlClient: Client for website mapping and content scraping
  • SummarizerClient: Client for AI-powered content analysis and description generation
  • options: Configuration options for generation behavior, timeouts, and processing limits

Returns a configured generator ready to process websites and generate llms.txt files.

func (*LLMsTxtGenerator) GenerateLLMsTXT

func (g *LLMsTxtGenerator) GenerateLLMsTXT(ctx context.Context, targetURL string) (*GenerationResult, error)

GenerateLLMsTXT generates both llms.txt and llms-full.txt files from a target URL.

The process includes:

  1. Mapping the website to discover all available URLs
  2. Processing URLs in configurable batches with rate limiting
  3. Scraping content from each URL using Firecrawl
  4. Generating AI-powered titles and descriptions using OpenAI
  5. Building structured output files

Parameters:

  • ctx: Context for cancellation and timeout control
  • targetURL: The base URL of the website to process

Returns a *GenerationResult containing the generated content and processing statistics, or an error if generation fails.
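
The batching and rate-limiting step described above can be sketched as follows. This is not the package's implementation: processInBatches, its parameters, and the choice to skip failed URLs are assumptions, loosely named after the BatchSize, MaxWorkers, and BatchDelay options. A buffered channel caps concurrent workers, and a pause between batches provides the rate limiting.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// processInBatches is a hypothetical sketch of batch processing with a
// worker cap and a delay between batches. Failed URLs are skipped.
func processInBatches(
	ctx context.Context,
	urls []string,
	batchSize, maxWorkers int,
	delay time.Duration,
	work func(context.Context, string) (string, error),
) ([]string, int, error) {
	if batchSize < 1 {
		batchSize = 1
	}
	if maxWorkers < 1 {
		maxWorkers = 1
	}
	results := make([]string, len(urls))
	done := make([]bool, len(urls))
	sem := make(chan struct{}, maxWorkers) // limits concurrent workers

	count := func() int {
		n := 0
		for _, ok := range done {
			if ok {
				n++
			}
		}
		return n
	}

	for start := 0; start < len(urls); start += batchSize {
		end := start + batchSize
		if end > len(urls) {
			end = len(urls)
		}
		var wg sync.WaitGroup
		for i := start; i < end; i++ {
			wg.Add(1)
			sem <- struct{}{} // acquire a worker slot
			go func(i int) {
				defer wg.Done()
				defer func() { <-sem }() // release the slot
				out, err := work(ctx, urls[i])
				if err != nil {
					return // skip this URL
				}
				// Each goroutine writes only its own index, so no lock is needed.
				results[i], done[i] = out, true
			}(i)
		}
		wg.Wait()
		if end < len(urls) {
			select { // pause between batches, but stop early on cancellation
			case <-ctx.Done():
				return results, count(), ctx.Err()
			case <-time.After(delay):
			}
		}
	}
	return results, count(), nil
}

// runDemo processes five fake URLs, one of which fails.
func runDemo() ([]string, int, error) {
	work := func(_ context.Context, u string) (string, error) {
		if u == "bad" {
			return "", errors.New("scrape failed")
		}
		return "summary of " + u, nil
	}
	return processInBatches(context.Background(),
		[]string{"a", "b", "bad", "c", "d"}, 2, 2, time.Millisecond, work)
}

func main() {
	out, n, err := runDemo()
	fmt.Printf("%q processed=%d err=%v\n", out, n, err)
}
```

Keeping results in a slice indexed by input position preserves the original URL order even though workers finish out of order.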

func (*LLMsTxtGenerator) SystemPrompt

func (g *LLMsTxtGenerator) SystemPrompt() string

func (*LLMsTxtGenerator) UserPrompt

func (g *LLMsTxtGenerator) UserPrompt(uri string) string

type MapResponse

type MapResponse struct {
	Success bool     `json:"success"`
	Links   []string `json:"links"`
}

type ProcessedURL

type ProcessedURL struct {
	URL         string `json:"url"`
	Title       string `json:"title"`
	Description string `json:"description"`
	Markdown    string `json:"markdown"`
	Index       int    `json:"index"`
}
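
The Index field suggests that processed pages can be put back into discovery order before output is built. The sketch below sorts by Index and renders one link line per page in the style commonly used for llms.txt files. The renderIndex helper and the exact line format are assumptions; the package's actual output format is not shown on this page.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Local mirror of ProcessedURL as shown on this page.
type ProcessedURL struct {
	URL         string `json:"url"`
	Title       string `json:"title"`
	Description string `json:"description"`
	Markdown    string `json:"markdown"`
	Index       int    `json:"index"`
}

// renderIndex is a hypothetical helper that builds a summary listing.
func renderIndex(site string, pages []ProcessedURL) string {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]ProcessedURL(nil), pages...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Index < sorted[j].Index })

	var b strings.Builder
	fmt.Fprintf(&b, "# %s\n\n", site)
	for _, p := range sorted {
		fmt.Fprintf(&b, "- [%s](%s): %s\n", p.Title, p.URL, p.Description)
	}
	return b.String()
}

func main() {
	pages := []ProcessedURL{
		{URL: "https://example.com/b", Title: "B", Description: "Second", Index: 1},
		{URL: "https://example.com/a", Title: "A", Description: "First", Index: 0},
	}
	fmt.Print(renderIndex("example.com", pages))
}
```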

type ScrapeResponse

type ScrapeResponse struct {
	Success bool `json:"success"`
	Data    struct {
		Markdown string            `json:"markdown"`
		Metadata map[string]string `json:"metadata"`
	} `json:"data"`
}
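
A response of this shape can be decoded with encoding/json. The sketch below mirrors the struct locally; both payloads are invented for the example, and decodeScrape is a hypothetical helper. Because Metadata is declared as map[string]string, a payload with a non-string metadata value fails to decode, which the second payload demonstrates.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Local mirror of ScrapeResponse as shown on this page.
type ScrapeResponse struct {
	Success bool `json:"success"`
	Data    struct {
		Markdown string            `json:"markdown"`
		Metadata map[string]string `json:"metadata"`
	} `json:"data"`
}

// Invented payloads for illustration.
const goodPayload = `{"success":true,"data":{"markdown":"# Hello","metadata":{"title":"Hello"}}}`
const mixedPayload = `{"success":true,"data":{"markdown":"# Hello","metadata":{"statusCode":200}}}`

// decodeScrape is a hypothetical helper that decodes and checks a response.
func decodeScrape(raw string) (*ScrapeResponse, error) {
	var resp ScrapeResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, fmt.Errorf("decode scrape response: %w", err)
	}
	if !resp.Success {
		return nil, errors.New("scrape response reported failure")
	}
	return &resp, nil
}

func main() {
	resp, err := decodeScrape(goodPayload)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Data.Markdown, resp.Data.Metadata["title"])

	// A numeric metadata value cannot be stored in a map[string]string.
	_, err = decodeScrape(mixedPayload)
	fmt.Println("mixed payload error:", err != nil)
}
```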

type ScrapedData

type ScrapedData struct {
	URL      string            `json:"url"`
	Markdown string            `json:"markdown"`
	Metadata map[string]string `json:"metadata"`
}
