Documentation ¶
Index ¶
- type Dataset
- type ExamplePreprocessFunc
- type SemanticSimilarityDataset
- func NewInMemorySemanticSimilarityDataset(examples []SemanticSimilarityExample, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)
- func NewSemanticSimilarityDataset(trainingPath string, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)
- func (s *SemanticSimilarityDataset) Close() error
- func (s *SemanticSimilarityDataset) Reset()
- func (s *SemanticSimilarityDataset) SetTokenizationPipeline(pipeline backends.Pipeline) error
- func (s *SemanticSimilarityDataset) SetVerbose(v bool)
- func (s *SemanticSimilarityDataset) Validate() error
- func (s *SemanticSimilarityDataset) Yield() (spec any, inputs []*tensors.Tensor, labels []*tensors.Tensor, err error)
- func (s *SemanticSimilarityDataset) YieldRaw() ([]SemanticSimilarityExample, error)
- type SemanticSimilarityExample
Types ¶
type ExamplePreprocessFunc ¶ added in v0.3.5
type ExamplePreprocessFunc func([]SemanticSimilarityExample) ([]SemanticSimilarityExample, error)
type SemanticSimilarityDataset ¶
SemanticSimilarityDataset is a dataset for fine-tuning a feature extraction pipeline for textual semantic similarity.
func NewInMemorySemanticSimilarityDataset ¶ added in v0.3.5
func NewInMemorySemanticSimilarityDataset(examples []SemanticSimilarityExample, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)
NewInMemorySemanticSimilarityDataset creates a new SemanticSimilarityDataset in memory from a slice of examples. preprocessFunc takes a batch of examples as a []SemanticSimilarityExample and returns the processed batch as a []SemanticSimilarityExample. Use it to apply custom preprocessing to each batch of examples before the batch is passed to the model.
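As an illustration of the shape a preprocessing function can take, here is a minimal sketch. It does not import this package: Example and PreprocessFunc are local stand-ins that mirror SemanticSimilarityExample and ExamplePreprocessFunc, and normalizeBatch is a hypothetical name chosen for this example. Whether the real dataset expects normalized text or rejects out-of-range scores itself is not stated here, so treat both steps as assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Example is a local stand-in mirroring the documented fields of
// SemanticSimilarityExample, so the sketch compiles on its own.
type Example struct {
	Data      map[string]any
	Sentence1 string
	Sentence2 string
	Score     float32
}

// PreprocessFunc has the same shape as ExamplePreprocessFunc.
type PreprocessFunc func([]Example) ([]Example, error)

// normalizeBatch is a hypothetical preprocessing step: it trims whitespace,
// lowercases both sentences, and rejects scores outside [0, 1].
func normalizeBatch(batch []Example) ([]Example, error) {
	out := make([]Example, 0, len(batch))
	for i, ex := range batch {
		if ex.Score < 0 || ex.Score > 1 {
			return nil, fmt.Errorf("example %d: score %v outside [0, 1]", i, ex.Score)
		}
		ex.Sentence1 = strings.ToLower(strings.TrimSpace(ex.Sentence1))
		ex.Sentence2 = strings.ToLower(strings.TrimSpace(ex.Sentence2))
		out = append(out, ex)
	}
	return out, nil
}

func main() {
	var preprocess PreprocessFunc = normalizeBatch
	batch, err := preprocess([]Example{
		{Sentence1: "  A plane is taking off. ", Sentence2: "An air plane is taking off.", Score: 1.0},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", batch[0].Sentence1)
}
```

With the real types, a function of the same shape would be passed as the preprocessFunc argument of the constructor.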
func NewSemanticSimilarityDataset ¶ added in v0.3.5
func NewSemanticSimilarityDataset(trainingPath string, batchSize int, preprocessFunc ExamplePreprocessFunc) (*SemanticSimilarityDataset, error)
NewSemanticSimilarityDataset creates a new SemanticSimilarityDataset from a file. trainingPath must point to a .jsonl file in which each line has the following format:
{"sentence1":"A plane is taking off.","sentence2":"An air plane is taking off.","score":1.0}
The score is a float value between 0 and 1. preprocessFunc takes a batch of examples as a []SemanticSimilarityExample and returns the processed batch as a []SemanticSimilarityExample. Use it to apply custom preprocessing to each batch of examples before the batch is passed to the model.
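To check a training file against the line format described above before handing it to the dataset, something like the following sketch may help. It uses only the standard library. Example is a local stand-in with the same JSON tags as SemanticSimilarityExample, and parseLine is a hypothetical helper, not part of this package. Skipping blank lines is an assumption of the sketch.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Example is a local stand-in using the JSON tags documented for
// SemanticSimilarityExample.
type Example struct {
	Sentence1 string  `json:"sentence1"`
	Sentence2 string  `json:"sentence2"`
	Score     float32 `json:"score"`
}

// parseLine decodes one .jsonl line and checks that the score is in [0, 1].
func parseLine(line string) (Example, error) {
	var ex Example
	if err := json.Unmarshal([]byte(line), &ex); err != nil {
		return Example{}, fmt.Errorf("invalid JSON: %w", err)
	}
	if ex.Score < 0 || ex.Score > 1 {
		return Example{}, fmt.Errorf("score %v outside [0, 1]", ex.Score)
	}
	return ex, nil
}

func main() {
	// In practice the reader would be an opened training file.
	data := `{"sentence1":"A plane is taking off.","sentence2":"An air plane is taking off.","score":1.0}`

	scanner := bufio.NewScanner(strings.NewReader(data))
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		text := strings.TrimSpace(scanner.Text())
		if text == "" {
			continue // assumption: blank lines are ignored
		}
		ex, err := parseLine(text)
		if err != nil {
			panic(fmt.Sprintf("line %d: %v", lineNo, err))
		}
		fmt.Printf("%q / %q\n", ex.Sentence1, ex.Sentence2)
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```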
func (*SemanticSimilarityDataset) Close ¶ added in v0.3.5
func (s *SemanticSimilarityDataset) Close() error
func (*SemanticSimilarityDataset) Reset ¶ added in v0.3.5
func (s *SemanticSimilarityDataset) Reset()
Reset rewinds the dataset to the beginning of the training data. It is meant to be called once an epoch is done.
func (*SemanticSimilarityDataset) SetTokenizationPipeline ¶
func (s *SemanticSimilarityDataset) SetTokenizationPipeline(pipeline backends.Pipeline) error
func (*SemanticSimilarityDataset) SetVerbose ¶
func (s *SemanticSimilarityDataset) SetVerbose(v bool)
func (*SemanticSimilarityDataset) Validate ¶ added in v0.3.5
func (s *SemanticSimilarityDataset) Validate() error
func (*SemanticSimilarityDataset) Yield ¶ added in v0.3.5
func (s *SemanticSimilarityDataset) Yield() (spec any, inputs []*tensors.Tensor, labels []*tensors.Tensor, err error)
Yield returns the next batch of examples from the dataset. The examples are tokenized and converted to tensors for the training process.
func (*SemanticSimilarityDataset) YieldRaw ¶ added in v0.3.5
func (s *SemanticSimilarityDataset) YieldRaw() ([]SemanticSimilarityExample, error)
YieldRaw returns the next raw batch of examples from the dataset. Note that if a preprocessing function has been provided at creation time, the examples will be preprocessed before being returned.
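The batch-by-batch contract of YieldRaw and Reset can be pictured with a small stand-alone model. The sketch below does not use this package: batcher, next, reset, and countBatches are hypothetical names. It assumes that the end of an epoch is signalled with io.EOF and that a final, shorter batch is still returned; neither behavior is confirmed by this documentation.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Example is a local stand-in for SemanticSimilarityExample.
type Example struct {
	Sentence1, Sentence2 string
	Score                float32
}

// batcher is a hypothetical in-memory model of a dataset: it hands out
// fixed-size batches and can be rewound between epochs.
type batcher struct {
	examples  []Example
	batchSize int
	pos       int
}

// next returns the next batch, or io.EOF once the data is exhausted.
// Using io.EOF as the end-of-epoch signal is an assumption of this sketch.
func (b *batcher) next() ([]Example, error) {
	if b.batchSize <= 0 {
		return nil, fmt.Errorf("batch size must be positive, got %d", b.batchSize)
	}
	if b.pos >= len(b.examples) {
		return nil, io.EOF
	}
	end := b.pos + b.batchSize
	if end > len(b.examples) {
		end = len(b.examples) // last batch may be shorter (assumption)
	}
	batch := b.examples[b.pos:end]
	b.pos = end
	return batch, nil
}

// reset rewinds to the first example, as Reset does for the dataset.
func (b *batcher) reset() { b.pos = 0 }

// countBatches drains the batcher for the given number of epochs,
// rewinding after each one, and returns how many batches were seen.
func countBatches(b *batcher, epochs int) (int, error) {
	total := 0
	for e := 0; e < epochs; e++ {
		for {
			_, err := b.next()
			if errors.Is(err, io.EOF) {
				break // epoch finished
			}
			if err != nil {
				return total, err
			}
			total++
		}
		b.reset()
	}
	return total, nil
}

func main() {
	b := &batcher{examples: make([]Example, 5), batchSize: 2}
	total, err := countBatches(b, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println("batches seen:", total)
}
```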
type SemanticSimilarityExample ¶ added in v0.3.5
type SemanticSimilarityExample struct {
	Data      map[string]any // to store any additional data for the example. Not used by the dataset.
	Sentence1 string         `json:"sentence1"`
	Sentence2 string         `json:"sentence2"`
	Score     float32        `json:"score"`
}
SemanticSimilarityExample is a single example for the semantic similarity dataset.