entity

package
Version: v0.0.1
Published: Mar 31, 2026
License: Apache-2.0
Imports: 5
Imported by: 0

Warning: This package is not in the latest version of its module.

Documentation

Constants

const DefaultBatchSize = 32

Variables

This section is empty.

Functions

func NormalizeEntityKey

func NormalizeEntityKey(label, text string) string

NormalizeEntityKey creates a stable, sortable key from an entity label and text.

func NormalizeRelationKey

func NormalizeRelationKey(label, headEntityKey, tailEntityKey string) string

NormalizeRelationKey creates a stable key for a relation between two entity keys.
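
The exact key format is not documented here. As a rough sketch of what "stable" normalization could look like, the hypothetical helpers below (`entityKey` and `relationKey`, neither part of this package) lowercase their inputs, collapse whitespace, and join the parts with separators. The separators and the direction-sensitive relation key are assumptions, not the package's actual behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// entityKey is a hypothetical stand-in for NormalizeEntityKey: it lowercases
// both parts, collapses runs of whitespace, and joins them with a colon.
// The real key format may differ.
func entityKey(label, text string) string {
	norm := func(s string) string {
		return strings.Join(strings.Fields(strings.ToLower(s)), " ")
	}
	return norm(label) + ":" + norm(text)
}

// relationKey is a hypothetical stand-in for NormalizeRelationKey. It is
// direction-sensitive: swapping head and tail yields a different key.
func relationKey(label, headKey, tailKey string) string {
	return headKey + "|" + strings.ToLower(strings.TrimSpace(label)) + "|" + tailKey
}

func main() {
	a := entityKey("Person", "  Ada   Lovelace ")
	b := entityKey("person", "ada lovelace")
	fmt.Println(a == b) // both spellings collapse to the same key
	fmt.Println(relationKey("works_at", a, entityKey("Org", "Analytical Engines Ltd")))
}
```

Because equal inputs always produce equal keys, such keys can serve as map keys or record IDs when merging mentions across sections.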

Types

type Enricher

type Enricher struct {
	// contains filtered or unexported fields
}

Enricher runs batched entity extraction over docsaf sections.

func NewEnricher

func NewEnricher(extractor Extractor, opts ...Option) *Enricher

NewEnricher creates a new entity enricher.

func (*Enricher) Enrich

func (e *Enricher) Enrich(ctx context.Context, sections []docsaf.DocumentSection) (*Result, error)

Enrich extracts entities and relations from sections and groups them by section ID.
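
To illustrate the grouping that Enrich describes, here is a minimal sketch using local stand-ins for the package types. The `aggregate` helper is hypothetical, the key construction is a placeholder for NormalizeEntityKey, and counting every mention (rather than once per section) is an assumption about how MentionCount behaves.

```go
package main

import "fmt"

// Local stand-ins for the package types so the sketch compiles on its own.
type Entity struct {
	Text  string
	Label string
	Score float32
}

type EntityRecord struct {
	ID           string
	Name         string
	Label        string
	MentionCount int
}

// aggregate is a hypothetical helper: it folds per-section entities into
// canonical records and remembers which keys each section mentioned.
func aggregate(perSection map[string][]Entity) (map[string]EntityRecord, map[string][]string) {
	records := make(map[string]EntityRecord)
	sectionKeys := make(map[string][]string)
	for sectionID, entities := range perSection {
		seen := make(map[string]bool)
		for _, e := range entities {
			key := e.Label + ":" + e.Text // placeholder for NormalizeEntityKey
			rec := records[key]
			rec.ID, rec.Name, rec.Label = key, e.Text, e.Label
			rec.MentionCount++ // assumption: every mention counts
			records[key] = rec
			if !seen[key] {
				seen[key] = true
				sectionKeys[sectionID] = append(sectionKeys[sectionID], key)
			}
		}
	}
	return records, sectionKeys
}

func main() {
	records, keys := aggregate(map[string][]Entity{
		"intro": {{Text: "Go", Label: "language"}, {Text: "Go", Label: "language"}},
		"usage": {{Text: "Go", Label: "language"}},
	})
	fmt.Println(records["language:Go"].MentionCount, len(keys["intro"]), len(keys["usage"]))
}
```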

type Entity

type Entity struct {
	Text  string
	Label string
	Score float32
	Start int
	End   int
}

Entity is a generic entity extracted from text.

type EntityRecord

type EntityRecord struct {
	ID           string
	Name         string
	Label        string
	MentionCount int
}

EntityRecord tracks a canonical entity node and how often it was mentioned.

func (EntityRecord) ToDocument

func (r EntityRecord) ToDocument() map[string]any

ToDocument converts an EntityRecord to a storage-ready document map.

type ExtractOptions

type ExtractOptions struct {
	EntityLabels   []string
	RelationLabels []string
}

ExtractOptions configures an Extractor request.

type Extraction

type Extraction struct {
	Entities  []Entity
	Relations []Relation
}

Extraction contains the entities and relations extracted from one input text.

type Extractor

type Extractor interface {
	Extract(ctx context.Context, texts []string, opts ExtractOptions) ([]Extraction, error)
}

Extractor extracts entities and optionally relations from a batch of texts.
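
Any type with a matching Extract method satisfies this interface. The sketch below redeclares the types from this reference locally so it compiles on its own, and implements a hypothetical `keywordExtractor` that tags fixed keywords. It ignores the options and relations, and it reports Start and End as byte offsets, which is an assumption since the offset unit is not documented here.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Local copies of the package types shown in this reference.
type Entity struct {
	Text  string
	Label string
	Score float32
	Start int
	End   int
}

type Relation struct {
	Head  Entity
	Label string
	Score float32
	Tail  Entity
}

type Extraction struct {
	Entities  []Entity
	Relations []Relation
}

type ExtractOptions struct {
	EntityLabels   []string
	RelationLabels []string
}

type Extractor interface {
	Extract(ctx context.Context, texts []string, opts ExtractOptions) ([]Extraction, error)
}

// keywordExtractor is a hypothetical Extractor that tags fixed keywords.
// It maps each keyword to a label and never emits relations.
type keywordExtractor struct {
	keywords map[string]string // keyword -> label
}

func (k keywordExtractor) Extract(ctx context.Context, texts []string, opts ExtractOptions) ([]Extraction, error) {
	out := make([]Extraction, len(texts)) // one Extraction per input text
	for i, text := range texts {
		if err := ctx.Err(); err != nil {
			return nil, err // stop early if the caller cancelled
		}
		for word, label := range k.keywords {
			if at := strings.Index(text, word); at >= 0 {
				out[i].Entities = append(out[i].Entities, Entity{
					Text: word, Label: label, Score: 1,
					Start: at, End: at + len(word), // byte offsets, an assumption
				})
			}
		}
	}
	return out, nil
}

// demo runs the extractor over two texts, one with a match and one without.
func demo() ([]Extraction, error) {
	var ex Extractor = keywordExtractor{keywords: map[string]string{"Go": "language"}}
	return ex.Extract(context.Background(), []string{"Go is fun", "nothing here"}, ExtractOptions{})
}

func main() {
	got, err := demo()
	fmt.Println(len(got), len(got[0].Entities), len(got[1].Entities), err)
}
```

Returning one Extraction per input text, in input order, lets a caller map results back to the sections they came from by index.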

type Option

type Option func(*Enricher)

Option configures an Enricher.

func WithBatchSize

func WithBatchSize(batchSize int) Option

WithBatchSize sets the number of sections per extractor request.
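
Batching of this kind can be sketched as slicing the input into consecutive chunks. The `batches` helper below is hypothetical and not part of the package; falling back to the default for a non-positive size is also an assumption.

```go
package main

import "fmt"

// DefaultBatchSize mirrors the package constant.
const DefaultBatchSize = 32

// batches is a hypothetical helper that splits texts into consecutive chunks
// of at most size elements. A non-positive size falls back to the default.
func batches(texts []string, size int) [][]string {
	if size <= 0 {
		size = DefaultBatchSize
	}
	var out [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts) // the last chunk may be shorter
		}
		out = append(out, texts[start:end])
	}
	return out
}

func main() {
	for i, b := range batches(make([]string, 70), 0) {
		fmt.Println("batch", i, "holds", len(b), "texts")
	}
}
```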

func WithEntityLabels

func WithEntityLabels(labels []string) Option

WithEntityLabels sets entity labels used by the extractor.

func WithEntityThreshold

func WithEntityThreshold(threshold float32) Option

WithEntityThreshold sets the minimum entity score to keep.
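
A minimum-score filter might look like the following sketch. The `keepAbove` helper is hypothetical, and treating the threshold as inclusive (`>=`) is an assumption; the package may compare differently.

```go
package main

import "fmt"

// Entity is a trimmed local stand-in for the package type.
type Entity struct {
	Text  string
	Label string
	Score float32
}

// keepAbove is a hypothetical filter: it keeps entities whose score is at
// least threshold. Whether the package uses >= or > is an assumption here.
func keepAbove(entities []Entity, threshold float32) []Entity {
	kept := make([]Entity, 0, len(entities))
	for _, e := range entities {
		if e.Score >= threshold {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	in := []Entity{
		{Text: "Go", Label: "language", Score: 0.9},
		{Text: "it", Label: "language", Score: 0.2},
	}
	for _, e := range keepAbove(in, 0.5) {
		fmt.Println(e.Text, e.Score)
	}
}
```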

func WithRelationLabels

func WithRelationLabels(labels []string) Option

WithRelationLabels sets relation labels used by the extractor.

func WithRelationThreshold

func WithRelationThreshold(threshold float32) Option

WithRelationThreshold sets the minimum relation score to keep.

func WithTextBuilder

func WithTextBuilder(fn func(docsaf.DocumentSection) string) Option

WithTextBuilder overrides how section text is prepared for extraction.
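
The options above follow Go's functional options pattern: each constructor returns a closure that mutates the Enricher being built. The sketch below reproduces the pattern with a lowercase stand-in type. The field names are hypothetical because the real fields are unexported, and the default batch size is taken from DefaultBatchSize.

```go
package main

import "fmt"

// enricher is a stand-in for Enricher; its field names are hypothetical.
type enricher struct {
	batchSize       int
	entityThreshold float32
	entityLabels    []string
}

// option mirrors the shape of the package's Option type.
type option func(*enricher)

func withBatchSize(n int) option {
	return func(e *enricher) { e.batchSize = n }
}

func withEntityThreshold(t float32) option {
	return func(e *enricher) { e.entityThreshold = t }
}

func withEntityLabels(labels []string) option {
	return func(e *enricher) { e.entityLabels = labels }
}

// newEnricher starts from defaults and applies each option in order,
// so later options override earlier ones.
func newEnricher(opts ...option) *enricher {
	e := &enricher{batchSize: 32}
	for _, opt := range opts {
		opt(e)
	}
	return e
}

func main() {
	e := newEnricher(
		withBatchSize(8),
		withEntityThreshold(0.5),
		withEntityLabels([]string{"person"}),
	)
	fmt.Println(e.batchSize, e.entityThreshold, e.entityLabels)
}
```

The pattern keeps the constructor signature stable: new settings can be added as new option functions without breaking existing callers.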

type Relation

type Relation struct {
	Head  Entity
	Label string
	Score float32
	Tail  Entity
}

Relation is a typed edge between two extracted entities.

type RelationRecord

type RelationRecord struct {
	ID           string
	Label        string
	HeadEntity   string
	TailEntity   string
	HeadName     string
	TailName     string
	HeadLabel    string
	TailLabel    string
	Weight       float64
	MentionCount int
}

RelationRecord tracks a canonical relation between two entities, along with its weight and how often it was mentioned.

func (RelationRecord) ToDocument

func (r RelationRecord) ToDocument() map[string]any

ToDocument converts a RelationRecord to a storage-ready document map.

type Result

type Result struct {
	EntityRecords       map[string]EntityRecord
	SectionEntityKeys   map[string][]string
	RelationRecords     map[string]RelationRecord
	SectionRelationKeys map[string][]string
}

Result contains extracted entities and relations grouped by section ID.
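
Reading a Result for one section could look like the sketch below. It assumes that the strings in SectionEntityKeys are keys into EntityRecords, which the field names suggest but this reference does not state. The `entitiesFor` helper is hypothetical, and the local types mirror only the entity half of Result.

```go
package main

import (
	"fmt"
	"sort"
)

type EntityRecord struct {
	ID           string
	Name         string
	Label        string
	MentionCount int
}

// Result mirrors only the entity half of the package's Result.
type Result struct {
	EntityRecords     map[string]EntityRecord
	SectionEntityKeys map[string][]string
}

// entitiesFor is a hypothetical helper that resolves a section's keys to
// records, skips keys with no record, and sorts the records by name.
func entitiesFor(r *Result, sectionID string) []EntityRecord {
	var out []EntityRecord
	for _, key := range r.SectionEntityKeys[sectionID] {
		if rec, ok := r.EntityRecords[key]; ok {
			out = append(out, rec)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	r := &Result{
		EntityRecords: map[string]EntityRecord{
			"language:go": {ID: "language:go", Name: "Go", Label: "language", MentionCount: 3},
		},
		SectionEntityKeys: map[string][]string{"intro": {"language:go"}},
	}
	for _, rec := range entitiesFor(r, "intro") {
		fmt.Println(rec.Name, rec.Label, rec.MentionCount)
	}
}
```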

func (*Result) EntityLabelCounts

func (r *Result) EntityLabelCounts() map[string]int

EntityLabelCounts summarizes unique entities by label.
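
One plausible reading of "unique entities by label" is that each canonical record counts once, regardless of its MentionCount. The hypothetical `labelCounts` function below sketches that interpretation over a local stand-in type; the package's own method may count differently.

```go
package main

import "fmt"

type EntityRecord struct {
	ID           string
	Name         string
	Label        string
	MentionCount int
}

// labelCounts is a hypothetical take on the idea behind EntityLabelCounts:
// each unique record adds one to its label's count, ignoring MentionCount.
func labelCounts(records map[string]EntityRecord) map[string]int {
	counts := make(map[string]int)
	for _, rec := range records {
		counts[rec.Label]++
	}
	return counts
}

func main() {
	counts := labelCounts(map[string]EntityRecord{
		"person:ada":  {Name: "Ada", Label: "person", MentionCount: 5},
		"person:alan": {Name: "Alan", Label: "person", MentionCount: 1},
		"org:acm":     {Name: "ACM", Label: "org", MentionCount: 2},
	})
	fmt.Println(counts["person"], counts["org"])
}
```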

func (*Result) RelationLabelCounts

func (r *Result) RelationLabelCounts() map[string]int

RelationLabelCounts summarizes unique relations by label.
