xlsx

Version: v0.0.0-...-8fe9471 (Latest)
Warning

This package is not in the latest version of its module.

Published: Jun 10, 2026 License: Apache-2.0 Imports: 7 Imported by: 1

README

XLSX Parser

The XLSX parser is Eino's document parsing component. It implements the 'Parser' interface for parsing Excel (XLSX) files, supports flexible table parsing configurations, handles Excel files with or without headers, allows selecting a specific worksheet, and lets you customize the document ID prefix.

Features

  • Support for Excel files with or without headers
  • Selection of a single worksheet from a multi-sheet workbook
  • Custom document ID prefixes
  • Automatic conversion of table data to document format
  • Preservation of complete row data as metadata
  • Support for additional metadata injection
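A rough sketch of the row-to-document conversion described above, under stated assumptions: rows are taken to be already read from the selected sheet as `[][]string`, and `Document`, `rowsToDocs`, and the tab-joined `Content` are hypothetical illustrations, not this package's types or output format. Only the `_row`/`_ext` key names and the default 1, 2, 3, ... IDs come from this page.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Metadata keys, mirroring the package's MetaDataRow and MetaDataExt constants.
const (
	metaDataRow = "_row"
	metaDataExt = "_ext"
)

// Document is a hypothetical stand-in for the parser's output document type.
type Document struct {
	ID       string
	Content  string
	Metadata map[string]any
}

// rowsToDocs turns sheet rows into one Document per data row (hypothetical).
// When noHeader is false, the first row supplies the column names used for _row.
func rowsToDocs(rows [][]string, noHeader bool, idPrefix string, ext map[string]any) []Document {
	var header []string
	if !noHeader && len(rows) > 0 {
		header, rows = rows[0], rows[1:]
	}
	docs := make([]Document, 0, len(rows))
	for i, row := range rows {
		meta := map[string]any{}
		if header != nil {
			// Map each header name to the cell in the same column.
			rowMap := map[string]any{}
			for j, name := range header {
				if j < len(row) {
					rowMap[name] = row[j]
				}
			}
			meta[metaDataRow] = rowMap
		}
		if ext != nil {
			// Extra metadata injected by the caller.
			meta[metaDataExt] = ext
		}
		docs = append(docs, Document{
			ID:       idPrefix + strconv.Itoa(i+1), // 1, 2, 3, ... after the optional prefix
			Content:  strings.Join(row, "\t"),      // assumed format: cells joined by tabs
			Metadata: meta,
		})
	}
	return docs
}

func main() {
	rows := [][]string{{"name", "age"}, {"lihua", "21"}}
	docs := rowsToDocs(rows, false, "doc-", map[string]any{"test": "test"})
	fmt.Println(docs[0].ID, docs[0].Metadata[metaDataRow]) // doc-1 map[age:21 name:lihua]
}
```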

Usage examples

  • Refer to xlsx_parser_test.go in the current directory, with test data in ./examples/testdata/
    • TestXlsxParser_Default: the default configuration uses the first worksheet, with the first row as the header
    • TestXlsxParser_WithAnotherSheet: uses the second sheet, with the first row as the header
    • TestXlsxParser_WithHeader: uses the third sheet, without treating the first row as the header
    • TestXlsxParser_WithIDPrefix: uses IDPrefix to customize the IDs of the output documents

Metadata Description

For each doc in the returned docs, doc.Metadata contains the following two types of metadata:

  • _row: a structured mapping of the row's data, keyed by header column
  • _ext: Additional metadata injected via parsing options
  • example:
    • { "_row": { "name": "lihua", "age": "21" }, "_ext": { "test": "test" } }

'_row' is populated only when the first row is used as the header. Alternatively, you can iterate over docs and read doc.Content to get the content of each row directly.
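Reading a single column back out of a document's metadata might look like the sketch below. `rowField` is a hypothetical helper, and it assumes `_row` holds a `map[string]any`; the concrete value type is not documented here, so check it against real parser output before relying on it.

```go
package main

import "fmt"

// Key names match the package's MetaDataRow and MetaDataExt constants.
const (
	metaDataRow = "_row"
	metaDataExt = "_ext"
)

// rowField looks up one column of a parsed row in a document's metadata.
// It assumes the _row value is a map[string]any; ok is false when the sheet
// was parsed without a header (so there is no _row) or the column is absent.
func rowField(metadata map[string]any, column string) (any, bool) {
	row, ok := metadata[metaDataRow].(map[string]any)
	if !ok {
		return nil, false
	}
	v, ok := row[column]
	return v, ok
}

func main() {
	// Metadata shaped like the example above.
	meta := map[string]any{
		metaDataRow: map[string]any{"name": "lihua", "age": "21"},
		metaDataExt: map[string]any{"test": "test"},
	}
	name, ok := rowField(meta, "name")
	fmt.Println(name, ok) // lihua true

	_, ok = rowField(map[string]any{}, "name")
	fmt.Println(ok) // false
}
```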

Examples

See the following examples for more usage:

License

This project is licensed under the Apache-2.0 License.

Documentation

Index

Constants

const (
	MetaDataRow = "_row"
	MetaDataExt = "_ext"
)

Variables

This section is empty.

Functions

func NewXlsxParser

func NewXlsxParser(ctx context.Context, config *Config) (xlp parser.Parser, err error)

NewXlsxParser creates a new XLSX parser.

Types

type Config

type Config struct {
	// SheetName defaults to "Sheet1", which means that the first sheet is processed
	SheetName string
	// NoHeader defaults to false, which means that the first row is used as the table header
	NoHeader bool
	// IDPrefix customizes the prefix of document IDs; by default, IDs are 1, 2, 3, ...
	IDPrefix string
}

Config configures the XLSX parser.
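The defaults documented in the field comments above can be summarized with a small sketch. `withDefaults` and the local `Config` copy are hypothetical; whether `NewXlsxParser` itself accepts a nil config or fills an empty `SheetName` this way is an assumption, not something this page states.

```go
package main

import "fmt"

// Config mirrors the fields of the package's Config, for illustration only.
type Config struct {
	SheetName string
	NoHeader  bool
	IDPrefix  string
}

// withDefaults (hypothetical) returns a copy of cfg with the documented
// defaults applied: a nil config or empty SheetName means "Sheet1".
// NoHeader=false (the zero value) already means the first row is the header,
// and an empty IDPrefix leaves document IDs as plain 1, 2, 3, ...
func withDefaults(cfg *Config) Config {
	var out Config
	if cfg != nil {
		out = *cfg
	}
	if out.SheetName == "" {
		out.SheetName = "Sheet1"
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", withDefaults(nil))
	fmt.Printf("%+v\n", withDefaults(&Config{SheetName: "Sheet2", IDPrefix: "xlsx-"}))
}
```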

type XlsxParser

type XlsxParser struct {
	Config *Config
}

XlsxParser is a custom parser for XLSX file content. It works with XLSX files with or without headers, can select a specific sheet from a multi-sheet workbook, and can customize the prefix of document IDs.

func (*XlsxParser) Parse

func (xlp *XlsxParser) Parse(ctx context.Context, reader io.Reader, opts ...parser.Option) ([]*schema.Document, error)

Parse parses the XLSX content from io.Reader.

