datasets

package
v0.6.5

This package is not in the latest version of its module.
Published: Mar 17, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Dataset

type Dataset interface {
	train.Dataset
	Validate() error
	SetTokenizationPipeline(pipeline backends.Pipeline) error
	SetVerbose(bool)
	Close() error
}

type ExamplePreprocessFunc added in v0.3.5

type ExamplePreprocessFunc func([]SemanticSimilarityExample) ([]SemanticSimilarityExample, error)
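A preprocessing function of this shape might normalize or filter a batch before tokenization. A minimal sketch, using a locally defined struct that mirrors SemanticSimilarityExample (the function name and behavior here are illustrative, not part of the package):

```go
package main

import (
	"fmt"
	"strings"
)

// semanticSimilarityExample mirrors the package's SemanticSimilarityExample
// type; it is redefined locally so this sketch is self-contained.
type semanticSimilarityExample struct {
	Sentence1 string
	Sentence2 string
	Score     float32
}

// lowercaseAndFilter is a hypothetical preprocess function matching the
// ExamplePreprocessFunc signature: it lowercases both sentences and drops
// examples where either sentence is empty.
func lowercaseAndFilter(batch []semanticSimilarityExample) ([]semanticSimilarityExample, error) {
	out := make([]semanticSimilarityExample, 0, len(batch))
	for _, ex := range batch {
		if ex.Sentence1 == "" || ex.Sentence2 == "" {
			continue // skip incomplete examples
		}
		ex.Sentence1 = strings.ToLower(ex.Sentence1)
		ex.Sentence2 = strings.ToLower(ex.Sentence2)
		out = append(out, ex)
	}
	return out, nil
}

func main() {
	batch := []semanticSimilarityExample{
		{Sentence1: "A plane is taking off.", Sentence2: "An air plane is taking off.", Score: 1.0},
		{Sentence1: "", Sentence2: "Incomplete example.", Score: 0.0},
	}
	clean, err := lowercaseAndFilter(batch)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(clean), clean[0].Sentence1)
}
```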

type SemanticSimilarityDataset

type SemanticSimilarityDataset struct {
	train.Dataset
	// contains filtered or unexported fields
}

SemanticSimilarityDataset is a dataset for fine-tuning a feature extraction pipeline for textual semantic similarity.

func NewInMemorySemanticSimilarityDataset added in v0.3.5

func NewInMemorySemanticSimilarityDataset(examples []SemanticSimilarityExample, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)

NewInMemorySemanticSimilarityDataset creates a new SemanticSimilarityDataset in memory from a slice of examples. preprocessFunc must be a function that takes a slice of SemanticSimilarityExample and returns a slice of SemanticSimilarityExample; it can be used to apply custom preprocessing to each batch of examples before it is passed to the model.

func NewSemanticSimilarityDataset added in v0.3.5

func NewSemanticSimilarityDataset(trainingPath string, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)

NewSemanticSimilarityDataset creates a new SemanticSimilarityDataset. The trainingPath must be a .jsonl file where each line has the following format:

{"sentence1":"A plane is taking off.","sentence2":"An air plane is taking off.","score":1.0}

The score is a float value between 0 and 1. preprocessFunc must be a function that takes a slice of SemanticSimilarityExample and returns a slice of SemanticSimilarityExample; it can be used to apply custom preprocessing to each batch of examples before it is passed to the model.
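Each line of the training file is a standalone JSON object matching the struct tags on SemanticSimilarityExample. A self-contained sketch of parsing and validating lines in that format (the second sample line is made-up illustrative data, and the struct is redefined locally):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// example mirrors the JSON shape of one training line; the field tags
// match the documented .jsonl format.
type example struct {
	Sentence1 string  `json:"sentence1"`
	Sentence2 string  `json:"sentence2"`
	Score     float32 `json:"score"`
}

// parseLine decodes one .jsonl line and checks the documented score range.
func parseLine(line []byte) (example, error) {
	var ex example
	if err := json.Unmarshal(line, &ex); err != nil {
		return ex, err
	}
	if ex.Score < 0 || ex.Score > 1 {
		return ex, fmt.Errorf("score %v out of range [0, 1]", ex.Score)
	}
	return ex, nil
}

func main() {
	jsonl := `{"sentence1":"A plane is taking off.","sentence2":"An air plane is taking off.","score":1.0}
{"sentence1":"A man is playing a flute.","sentence2":"A man is eating.","score":0.1}`

	scanner := bufio.NewScanner(strings.NewReader(jsonl))
	for scanner.Scan() {
		ex, err := parseLine(scanner.Bytes())
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q vs %q -> %.1f\n", ex.Sentence1, ex.Sentence2, ex.Score)
	}
}
```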

func (*SemanticSimilarityDataset) Close added in v0.3.5

func (s *SemanticSimilarityDataset) Close() error

func (*SemanticSimilarityDataset) Reset added in v0.3.5

func (s *SemanticSimilarityDataset) Reset()

Reset resets the dataset to the beginning of the training data (after the epoch is done).

func (*SemanticSimilarityDataset) SetTokenizationPipeline

func (s *SemanticSimilarityDataset) SetTokenizationPipeline(pipeline backends.Pipeline) error

func (*SemanticSimilarityDataset) SetVerbose

func (s *SemanticSimilarityDataset) SetVerbose(v bool)

func (*SemanticSimilarityDataset) Validate added in v0.3.5

func (s *SemanticSimilarityDataset) Validate() error

func (*SemanticSimilarityDataset) Yield added in v0.3.5

func (s *SemanticSimilarityDataset) Yield() (spec any, inputs []*tensors.Tensor, labels []*tensors.Tensor, err error)

Yield returns the next batch of examples from the dataset. The examples are tokenized and converted to tensors for the training process.

func (*SemanticSimilarityDataset) YieldRaw added in v0.3.5

YieldRaw returns the next raw batch of examples from the dataset. Note that if a preprocessing function has been provided at creation time, the examples will be preprocessed before being returned.

type SemanticSimilarityExample added in v0.3.5

type SemanticSimilarityExample struct {
	Data      map[string]any // to store any additional data for the example. Not used by the dataset.
	Sentence1 string         `json:"sentence1"`
	Sentence2 string         `json:"sentence2"`
	Score     float32        `json:"score"`
}

SemanticSimilarityExample is a single example for the semantic similarity dataset.
