xml

package
v1.0.1 Latest
Published: Mar 9, 2026 License: MIT Imports: 7 Imported by: 0

README

📄 XML Parser

Status: ✅ Complete
Date: 2024-03-19
Architecture: SAX-style streaming parsing


🌟 Core Features

1. SAX-style streaming parsing
// ✅ Built on encoding/xml.Decoder
// ✅ Reads token by token; O(1) memory use while parsing
// ✅ Supports GB-scale XML files

parser := xml.NewParser()
chunks, err := parser.Parse(ctx, reader)

2. Smart content cleanup
// ✅ Skips comments automatically (configurable)
// ✅ Ignores whitespace
// ✅ Preserves text content

parser.SetPreserveComments(true)  // keep comments

🚀 Quick Start

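package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/DotNetAge/gorag/parser/xml"
)

func main() {
    parser := xml.NewParser()

    // Open the input file; stop if it cannot be read.
    file, err := os.Open("data.xml")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    // Parse reads the file and returns the extracted chunks.
    ctx := context.Background()
    chunks, err := parser.Parse(ctx, file)
    if err != nil {
        log.Fatal(err)
    }

    for _, chunk := range chunks {
        fmt.Println(chunk.Content)
    }
}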

📊 Test Results

$ go test -v -cover ./...
=== RUN   TestParser_Parse
--- PASS: TestParser_Parse (0.00s)
=== RUN   TestParser_ParseWithCallback
--- PASS: TestParser_ParseWithCallback (0.00s)
=== RUN   TestParser_EmptyXML
--- PASS: TestParser_EmptyXML (0.00s)
=== RUN   TestParser_LargeXML
--- PASS: TestParser_LargeXML (0.00s)
PASS
coverage: 91.1% of statements

📍 Location: /Users/ray/workspaces/gorag/gorag/parser/xml/
✅ Status: complete and ready to use
📅 Completion date: 2024-03-19

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser implements an XML parser using SAX-style parsing

func NewParser

func NewParser() *Parser

NewParser creates a new XML parser

func (*Parser) Parse

func (p *Parser) Parse(ctx context.Context, r io.Reader) ([]core.Chunk, error)

Parse parses XML into chunks

func (*Parser) ParseWithCallback

func (p *Parser) ParseWithCallback(ctx context.Context, r io.Reader, callback func(core.Chunk) error) error

ParseWithCallback parses XML and calls the callback for each chunk

func (*Parser) SetChunkOverlap

func (p *Parser) SetChunkOverlap(overlap int)

SetChunkOverlap sets the chunk overlap

func (*Parser) SetChunkSize

func (p *Parser) SetChunkSize(size int)

SetChunkSize sets the chunk size

func (*Parser) SetPreserveComments

func (p *Parser) SetPreserveComments(preserve bool)

SetPreserveComments sets whether to preserve XML comments

func (*Parser) SupportedFormats

func (p *Parser) SupportedFormats() []string

SupportedFormats returns supported formats
