package datasets

v0.7.5 (latest) | Published: Jun 5, 2026 | License: Apache-2.0 | Imports: 13 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/knights-analytics/hugot

Documentation

Index

type Dataset
type ExamplePreprocessFunc
type SemanticSimilarityDataset
- func NewInMemorySemanticSimilarityDataset(examples []SemanticSimilarityExample, batchSize int, ...) (*SemanticSimilarityDataset, error)
- func NewSemanticSimilarityDataset(ctx context.Context, trainingPath string, batchSize int, ...) (*SemanticSimilarityDataset, error)
type SemanticSimilarityExample

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Dataset

type Dataset interface {
	train.Dataset
	Validate() error
	SetTokenizationPipeline(pipeline backends.Pipeline) error
	SetVerbose(bool)
	Close() error
}

type ExamplePreprocessFunc (added in v0.3.5)

type ExamplePreprocessFunc func([]SemanticSimilarityExample) ([]SemanticSimilarityExample, error)
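A minimal sketch of a function with the ExamplePreprocessFunc shape. To stay self-contained it declares local copies of SemanticSimilarityExample and ExamplePreprocessFunc instead of importing hugot. normalizeBatch is a hypothetical name, and the normalization it performs (trim, lowercase, reject scores outside [0, 1]) only illustrates the kind of custom preprocessing the constructors accept.

```go
package main

import (
	"fmt"
	"strings"
)

// SemanticSimilarityExample is a local copy of the documented struct, so that
// this sketch compiles without importing hugot.
type SemanticSimilarityExample struct {
	Data      map[string]any
	Sentence1 string  `json:"sentence1"`
	Sentence2 string  `json:"sentence2"`
	Score     float32 `json:"score"`
}

// ExamplePreprocessFunc mirrors the documented function type.
type ExamplePreprocessFunc func([]SemanticSimilarityExample) ([]SemanticSimilarityExample, error)

// normalizeBatch is a hypothetical preprocessing step. It trims surrounding
// whitespace, lowercases both sentences, and fails the whole batch if any
// score lies outside [0, 1].
func normalizeBatch(batch []SemanticSimilarityExample) ([]SemanticSimilarityExample, error) {
	out := make([]SemanticSimilarityExample, len(batch))
	for i, ex := range batch {
		if ex.Score < 0 || ex.Score > 1 {
			return nil, fmt.Errorf("example %d: score %v outside [0, 1]", i, ex.Score)
		}
		// ex is a copy of batch[i], so the input slice is not modified.
		ex.Sentence1 = strings.ToLower(strings.TrimSpace(ex.Sentence1))
		ex.Sentence2 = strings.ToLower(strings.TrimSpace(ex.Sentence2))
		out[i] = ex
	}
	return out, nil
}

// Compile-time check that normalizeBatch has the ExamplePreprocessFunc shape.
var _ ExamplePreprocessFunc = normalizeBatch

func main() {
	batch := []SemanticSimilarityExample{
		{Sentence1: "  A plane is taking off. ", Sentence2: "An AIR plane is taking off.", Score: 1.0},
	}
	out, err := normalizeBatch(batch)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q | %q\n", out[0].Sentence1, out[0].Sentence2)
}
```

Returning an error aborts the batch, which is usually preferable to silently training on malformed scores.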

type SemanticSimilarityDataset

type SemanticSimilarityDataset struct {
	train.Dataset
	// contains filtered or unexported fields
}

SemanticSimilarityDataset is a dataset for fine-tuning a feature extraction pipeline for textual semantic similarity.

func NewInMemorySemanticSimilarityDataset (added in v0.3.5)

func NewInMemorySemanticSimilarityDataset(examples []SemanticSimilarityExample, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)

NewInMemorySemanticSimilarityDataset creates a new SemanticSimilarityDataset from a slice of examples held in memory. preprocessFunc must be an ExamplePreprocessFunc: it receives a slice of SemanticSimilarityExample and returns a (possibly modified) slice of SemanticSimilarityExample, or an error. Use it to apply custom preprocessing to each batch of examples before the batch is passed to the model.
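The sketch below illustrates how batchSize and preprocessFunc relate, assuming examples are consumed in order, cut into batches of at most batchSize, and that preprocessFunc runs once per batch. splitIntoBatches is hypothetical and is not how hugot implements the dataset; in particular, emitting a final partial batch is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented types, so the sketch compiles without hugot.
type SemanticSimilarityExample struct {
	Data      map[string]any
	Sentence1 string  `json:"sentence1"`
	Sentence2 string  `json:"sentence2"`
	Score     float32 `json:"score"`
}

type ExamplePreprocessFunc func([]SemanticSimilarityExample) ([]SemanticSimilarityExample, error)

// splitIntoBatches is a hypothetical helper, not part of hugot. It cuts
// examples into consecutive batches of at most batchSize and runs preprocess
// (if non-nil) on each batch.
func splitIntoBatches(examples []SemanticSimilarityExample, batchSize int, preprocess ExamplePreprocessFunc) ([][]SemanticSimilarityExample, error) {
	if batchSize <= 0 {
		return nil, errors.New("batchSize must be positive")
	}
	var batches [][]SemanticSimilarityExample
	for start := 0; start < len(examples); start += batchSize {
		end := start + batchSize
		if end > len(examples) {
			end = len(examples) // the last batch may be smaller than batchSize
		}
		// The full slice expression caps capacity so that a preprocess func
		// that appends cannot overwrite examples belonging to the next batch.
		batch := examples[start:end:end]
		if preprocess != nil {
			var err error
			if batch, err = preprocess(batch); err != nil {
				return nil, fmt.Errorf("batch starting at example %d: %w", start, err)
			}
		}
		batches = append(batches, batch)
	}
	return batches, nil
}

func main() {
	examples := make([]SemanticSimilarityExample, 5) // zero-valued placeholders
	logSize := func(b []SemanticSimilarityExample) ([]SemanticSimilarityExample, error) {
		fmt.Println("preprocessing a batch of", len(b))
		return b, nil
	}
	batches, err := splitIntoBatches(examples, 2, logSize)
	if err != nil {
		panic(err)
	}
	fmt.Println("number of batches:", len(batches))
}
```

With 5 examples and a batchSize of 2, this sketch produces three batches (2, 2 and 1 examples) and calls the preprocess function three times.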

func NewSemanticSimilarityDataset (added in v0.3.5)

func NewSemanticSimilarityDataset(ctx context.Context, trainingPath string, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)

NewSemanticSimilarityDataset creates a new SemanticSimilarityDataset from a training file. The trainingPath must point to a .jsonl (JSON Lines) file in which each line has the following format:

	{"sentence1":"A plane is taking off.","sentence2":"An air plane is taking off.","score":1.0}

The score is a float value between 0 and 1. preprocessFunc must be an ExamplePreprocessFunc: it receives a slice of SemanticSimilarityExample and returns a (possibly modified) slice of SemanticSimilarityExample, or an error. Use it to apply custom preprocessing to each batch of examples before the batch is passed to the model.

func (*SemanticSimilarityDataset) Close (added in v0.3.5)

func (s *SemanticSimilarityDataset) Close() error

func (*SemanticSimilarityDataset) Reset (added in v0.3.5)

func (s *SemanticSimilarityDataset) Reset()

Reset rewinds the dataset to the beginning of the training data. Call it once an epoch is done.

func (*SemanticSimilarityDataset) SetTokenizationPipeline

func (s *SemanticSimilarityDataset) SetTokenizationPipeline(pipeline backends.Pipeline) error

func (*SemanticSimilarityDataset) SetVerbose

func (s *SemanticSimilarityDataset) SetVerbose(v bool)

func (*SemanticSimilarityDataset) Validate (added in v0.3.5)

func (s *SemanticSimilarityDataset) Validate() error

func (*SemanticSimilarityDataset) Yield (added in v0.3.5)

func (s *SemanticSimilarityDataset) Yield() (spec any, inputs []*tensors.Tensor, labels []*tensors.Tensor, err error)

Yield returns the next batch of examples from the dataset. The examples are tokenized and converted to tensors for the training process.

func (*SemanticSimilarityDataset) YieldRaw (added in v0.3.5)

func (s *SemanticSimilarityDataset) YieldRaw() ([]SemanticSimilarityExample, error)

YieldRaw returns the next batch of examples from the dataset as SemanticSimilarityExample values, without tokenizing them. If a preprocessing function was provided at creation time, the returned examples have already been passed through it.

type SemanticSimilarityExample (added in v0.3.5)

type SemanticSimilarityExample struct {
	Data      map[string]any // to store any additional data for the example. Not used by the dataset.
	Sentence1 string         `json:"sentence1"`
	Sentence2 string         `json:"sentence2"`
	Score     float32        `json:"score"`
}

SemanticSimilarityExample is a single example for the semantic similarity dataset.
