parsers

package
v0.35.2
Warning

This package is not in the latest version of its module.

Published: Mar 11, 2026 License: MIT Imports: 19 Imported by: 0

Documentation

Overview

Package parsers provides a registry for language-specific parser plugins used in code analysis. Parsers extract metadata and chunk code for various programming languages.

Index

Constants

This section is empty.

Variables

var ErrPluginNotFound = errors.New("language plugin not found")

ErrPluginNotFound is returned when a requested language plugin is not found.

Functions

This section is empty.

Types

type ParserRegistry

type ParserRegistry interface {
	// RegisterParser adds a new parser plugin to the registry.
	RegisterParser(plugin schema.ParserPlugin) error
	// GetParser returns the parser for the given language name.
	GetParser(language string) (schema.ParserPlugin, error)
	// GetParserForFile returns the appropriate parser for the given file.
	GetParserForFile(path string, info fs.FileInfo) (schema.ParserPlugin, error)
	// GetParserForExtension returns the parser for the given file extension.
	GetParserForExtension(ext string) (schema.ParserPlugin, error)
	// GetAllParsers returns all registered parsers.
	GetAllParsers() []schema.ParserPlugin
}

ParserRegistry tracks registered language plugins and provides methods to look up parsers by language, file path, or extension.

func NewRegistry

func NewRegistry(logger *slog.Logger) ParserRegistry

NewRegistry creates a new language plugin registry.

func RegisterLanguagePlugins

func RegisterLanguagePlugins(logger *slog.Logger) (ParserRegistry, error)

RegisterLanguagePlugins initializes and populates a language registry with all built-in parser plugins (Go, TypeScript, Markdown, JSON, YAML, etc.).

type RSSFeedMetadata added in v0.15.0

type RSSFeedMetadata struct {
	Title       string `json:"title"`       // Feed title
	Link        string `json:"link"`        // Feed website URL
	Language    string `json:"language"`    // Feed language
	Description string `json:"description"` // Feed description
}

RSSFeedMetadata represents metadata extracted from an RSS feed channel.

type RSSItemMetadata added in v0.15.0

type RSSItemMetadata struct {
	Title       string    `json:"title"`       // Item title
	Link        string    `json:"link"`        // Item URL
	PubDate     time.Time `json:"pub_date"`    // Publication date
	Author      string    `json:"author"`      // Author name
	Categories  []string  `json:"categories"`  // Categories/tags
	GUID        string    `json:"guid"`        // Unique identifier
	Description string    `json:"description"` // Short description
	Content     string    `json:"content"`     // Full content
}

RSSItemMetadata represents metadata extracted from an RSS feed item.

type RSSParser added in v0.15.0

type RSSParser struct{}

RSSParser implements the ParserPlugin interface for RSS/Atom feed content. It handles RSS 2.0, Atom 1.0, and JSON feeds, treating each feed item as a document.

func NewRSSParser added in v0.15.0

func NewRSSParser() *RSSParser

NewRSSParser creates a new RSS parser instance.

func (*RSSParser) CanHandle added in v0.15.0

func (p *RSSParser) CanHandle(path string, info fs.FileInfo) bool

CanHandle determines if this parser can handle the given file.

func (*RSSParser) Chunk added in v0.15.0

func (p *RSSParser) Chunk(content string, path string, opts *schema.CodeChunkingOptions) ([]schema.CodeChunk, error)

Chunk divides RSS content into processable chunks. For RSS feeds, the entire item content is treated as a single chunk.
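The single-chunk behavior described above could look roughly like the sketch below. It is not the package's code: `Chunk` is a hypothetical stand-in for `schema.CodeChunk` (whose fields are not shown here), `singleChunk` is an illustrative name, and returning no chunks for blank content is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical stand-in for schema.CodeChunk.
type Chunk struct {
	Path      string
	Content   string
	StartLine int
	EndLine   int
}

// singleChunk returns the whole content as one chunk spanning every line.
// Blank content yields no chunks (an assumption made for this sketch).
func singleChunk(content, path string) []Chunk {
	if strings.TrimSpace(content) == "" {
		return nil
	}
	// Count lines, ignoring a single trailing newline.
	lines := strings.Count(content, "\n") + 1
	if strings.HasSuffix(content, "\n") {
		lines--
	}
	return []Chunk{{Path: path, Content: content, StartLine: 1, EndLine: lines}}
}

func main() {
	chunks := singleChunk("Item title\nItem body text\n", "feed.xml")
	for _, c := range chunks {
		fmt.Printf("%s lines %d-%d\n", c.Path, c.StartLine, c.EndLine)
	}
}
```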

func (*RSSParser) Extensions added in v0.15.0

func (p *RSSParser) Extensions() []string

Extensions returns the file extensions this parser handles.

func (*RSSParser) ExtractMetadata added in v0.15.0

func (p *RSSParser) ExtractMetadata(content string, path string) (schema.FileMetadata, error)

ExtractMetadata extracts metadata from RSS feed content. Returns basic file metadata; actual RSS metadata extraction happens in the loader.

func (*RSSParser) ExtractUsedSymbols added in v0.15.0

func (p *RSSParser) ExtractUsedSymbols(content string) []string

ExtractUsedSymbols returns nil as RSS feeds don't have symbol references.

func (*RSSParser) IsGenerated added in v0.15.0

func (p *RSSParser) IsGenerated(content string, path string) bool

IsGenerated returns false as RSS feeds are typically not auto-generated code.

func (*RSSParser) Name added in v0.15.0

func (p *RSSParser) Name() string

Name returns the parser name identifier.

Directories

Path Synopsis
Package html provides an HTML parser plugin for transforming HTML content into clean Markdown suitable for LLM consumption and RAG applications.
core.go - Main plugin file with goldmark integration
extractor - Fixed version
