Documentation ¶
Overview ¶
Package chunking provides primitives to interact with the OpenAPI HTTP API.
Code generated by github.com/oapi-codegen/oapi-codegen/v2 version v2.5.1 DO NOT EDIT.
Index ¶
- Constants
- func GetSwagger() (swagger *openapi3.T, err error)
- func PathToRawSpec(pathToFile string) map[string]func() ([]byte, error)
- type AudioChunkOptions
- type BinaryContent
- type Chunk
- func (t Chunk) AsBinaryContent() (BinaryContent, error)
- func (t Chunk) AsTextContent() (TextContent, error)
- func (t *Chunk) FromBinaryContent(v BinaryContent) error
- func (t *Chunk) FromTextContent(v TextContent) error
- func (c Chunk) GetText() string
- func (t Chunk) MarshalJSON() ([]byte, error)
- func (t *Chunk) MergeBinaryContent(v BinaryContent) error
- func (t *Chunk) MergeTextContent(v TextContent) error
- func (t *Chunk) UnmarshalJSON(b []byte) error
- type ChunkOptions
- type Chunker
- type TextChunkOptions
- type TextContent
Constants ¶
const (
// ModelFixedBert uses BERT's WordPiece tokenization (~30k vocab).
// Good for general-purpose text and multilingual content.
ModelFixedBert = "fixed-bert-tokenizer"
// ModelFixedBPE uses OpenAI's tiktoken BPE tokenization (cl100k_base, ~100k vocab).
// Good for GPT-style models and code.
ModelFixedBPE = "fixed-bpe-tokenizer"
)
Fixed chunker model names.
const MIMETypePlainText = "text/plain"
MIMETypePlainText is the MIME type for text/plain content chunks.
Variables ¶
This section is empty.
Functions ¶
func GetSwagger ¶
func GetSwagger() (swagger *openapi3.T, err error)
GetSwagger returns the Swagger specification corresponding to the generated code in this file, with external references resolved. Reference resolution is tightly coupled to the "import-mapping" feature: externally referenced files must be embedded in the corresponding Go packages. URL references could be supported, but that work was out of scope.
Types ¶
type AudioChunkOptions ¶
type AudioChunkOptions struct {
// OverlapDurationMs Overlap duration in milliseconds between audio chunks (default: 0).
OverlapDurationMs int `json:"overlap_duration_ms,omitempty,omitzero"`
// WindowDurationMs Window duration in milliseconds for fixed-window audio chunking (default: 30000).
WindowDurationMs int `json:"window_duration_ms,omitempty,omitzero"`
}
AudioChunkOptions Options specific to audio chunking.
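The interaction between WindowDurationMs and OverlapDurationMs can be illustrated with a small sketch. The helper below is hypothetical and not part of this package; it assumes that each window starts (window - overlap) milliseconds after the previous one and that the last window is clipped to the end of the audio. The actual chunker may behave differently.

```go
package main

import "fmt"

// audioWindow is a hypothetical helper type for this sketch.
type audioWindow struct {
	StartMs, EndMs int
}

// audioWindows computes fixed windows over totalMs milliseconds of audio.
// A zero windowMs falls back to the documented default of 30000 ms.
func audioWindows(totalMs, windowMs, overlapMs int) []audioWindow {
	if windowMs <= 0 {
		windowMs = 30000
	}
	if overlapMs < 0 || overlapMs >= windowMs {
		overlapMs = 0 // assumption: an invalid overlap is treated as no overlap
	}
	step := windowMs - overlapMs
	var out []audioWindow
	for start := 0; start < totalMs; start += step {
		end := start + windowMs
		if end > totalMs {
			end = totalMs // clip the final window
		}
		out = append(out, audioWindow{StartMs: start, EndMs: end})
		if end == totalMs {
			break
		}
	}
	return out
}

func main() {
	// 70 s of audio, 30 s windows, 5 s overlap.
	for _, w := range audioWindows(70000, 30000, 5000) {
		fmt.Printf("%d-%d\n", w.StartMs, w.EndMs)
	}
}
```

With these inputs the sketch yields three windows: 0-30000, 25000-55000, and 50000-70000.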
type BinaryContent ¶
type BinaryContent struct {
// Data Base64-encoded binary data (valid WAV, PNG, etc.)
Data []byte `json:"data,omitempty,omitzero"`
// EndTimeMs Audio: window end time in milliseconds
EndTimeMs float32 `json:"end_time_ms,omitempty,omitzero"`
// FrameDelayMs Animation: display delay in milliseconds
FrameDelayMs int `json:"frame_delay_ms,omitempty,omitzero"`
// FrameIndex Animation: frame number
FrameIndex int `json:"frame_index,omitempty,omitzero"`
// StartTimeMs Audio: window start time in milliseconds
StartTimeMs float32 `json:"start_time_ms,omitempty,omitzero"`
}
BinaryContent Binary media content with format-specific metadata.
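Because Data is a []byte, encoding/json writes it as a base64 string, which matches the field comment. The sketch below uses a local stand-in struct (binaryContent is not the package type, and its tags omit omitzero for brevity) to show the resulting wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// binaryContent mirrors a subset of the documented BinaryContent fields.
// It is a stand-in for illustration only.
type binaryContent struct {
	Data        []byte  `json:"data,omitempty"`
	EndTimeMs   float32 `json:"end_time_ms,omitempty"`
	StartTimeMs float32 `json:"start_time_ms,omitempty"`
}

// encode marshals b to JSON and returns an empty string on failure.
func encode(b binaryContent) string {
	out, err := json.Marshal(b)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	// "RIFF" stands in for the first bytes of a WAV file.
	fmt.Println(encode(binaryContent{Data: []byte("RIFF"), EndTimeMs: 30000}))
	// prints {"data":"UklGRg==","end_time_ms":30000}
}
```

StartTimeMs is left at zero, so omitempty drops it from the output.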
type Chunk ¶
type Chunk struct {
// Id Sequence number of the chunk (0, 1, 2, ...)
Id uint32 `json:"id"`
// MimeType MIME type: text/plain, audio/wav, image/png, etc.
MimeType string `json:"mime_type"`
// contains filtered or unexported fields
}
Chunk defines model for Chunk.
func NewTextChunk ¶
NewTextChunk creates a Chunk containing text content with the given parameters.
func (Chunk) AsBinaryContent ¶
func (t Chunk) AsBinaryContent() (BinaryContent, error)
AsBinaryContent returns the union data inside the Chunk as a BinaryContent.
func (Chunk) AsTextContent ¶
func (t Chunk) AsTextContent() (TextContent, error)
AsTextContent returns the union data inside the Chunk as a TextContent.
func (*Chunk) FromBinaryContent ¶
func (t *Chunk) FromBinaryContent(v BinaryContent) error
FromBinaryContent overwrites any union data inside the Chunk with the provided BinaryContent.
func (*Chunk) FromTextContent ¶
func (t *Chunk) FromTextContent(v TextContent) error
FromTextContent overwrites any union data inside the Chunk with the provided TextContent.
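The From/As pairs follow the usual pattern for generated union types. The sketch below assumes the variant payload is held as raw JSON in an unexported field, which is how oapi-codegen typically represents unions; the chunk and textContent types and their lowercase methods are local stand-ins, not this package's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// textContent mirrors the documented TextContent fields.
type textContent struct {
	EndChar   int    `json:"end_char"`
	StartChar int    `json:"start_char"`
	Text      string `json:"text"`
}

// chunk is a simplified stand-in for Chunk. The union field is an
// assumption about how the variant payload is stored.
type chunk struct {
	ID       uint32
	MimeType string
	union    json.RawMessage
}

// fromTextContent overwrites the stored payload with v.
func (c *chunk) fromTextContent(v textContent) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	c.union = b
	return nil
}

// asTextContent decodes the stored payload as a textContent.
func (c chunk) asTextContent() (textContent, error) {
	var v textContent
	err := json.Unmarshal(c.union, &v)
	return v, err
}

func main() {
	c := chunk{ID: 0, MimeType: "text/plain"}
	if err := c.fromTextContent(textContent{StartChar: 0, EndChar: 5, Text: "hello"}); err != nil {
		fmt.Println("encode:", err)
		return
	}
	tc, err := c.asTextContent()
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(tc.Text, tc.StartChar, tc.EndChar)
}
```

In this sketch, decoding a chunk that never received a payload returns an error, so callers should check the error from an As method rather than rely on the zero value.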
func (Chunk) GetText ¶
func (c Chunk) GetText() string
GetText returns the text content of a text chunk. It returns an empty string if the chunk is not a text chunk or cannot be decoded.
func (Chunk) MarshalJSON ¶
func (t Chunk) MarshalJSON() ([]byte, error)
func (*Chunk) MergeBinaryContent ¶
func (t *Chunk) MergeBinaryContent(v BinaryContent) error
MergeBinaryContent performs a merge with any union data inside the Chunk, using the provided BinaryContent.
func (*Chunk) MergeTextContent ¶
func (t *Chunk) MergeTextContent(v TextContent) error
MergeTextContent performs a merge with any union data inside the Chunk, using the provided TextContent.
func (*Chunk) UnmarshalJSON ¶
func (t *Chunk) UnmarshalJSON(b []byte) error
type ChunkOptions ¶
type ChunkOptions struct {
// Audio Options specific to audio chunking.
Audio AudioChunkOptions `json:"audio,omitempty,omitzero"`
// MaxChunks Maximum number of chunks to generate per document.
MaxChunks int `json:"max_chunks,omitempty,omitzero"`
// Text Options specific to text chunking.
Text TextChunkOptions `json:"text,omitempty,omitzero"`
// Threshold Confidence threshold for model-based chunking (0.0-1.0).
Threshold float32 `json:"threshold,omitempty,omitzero"`
}
ChunkOptions Per-request configuration for chunking. All fields are optional; zero or omitted values use the chunker's defaults.
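One way a chunker might apply this rule is to start from its own defaults and override only the fields the request sets. The sketch below uses a local options struct and made-up default values; neither the resolve helper nor the numbers come from this package.

```go
package main

import "fmt"

// options mirrors two of the documented ChunkOptions fields.
type options struct {
	MaxChunks int
	Threshold float32
}

// resolve returns defaults with every non-zero field of req applied on top.
func resolve(req, defaults options) options {
	out := defaults
	if req.MaxChunks != 0 {
		out.MaxChunks = req.MaxChunks
	}
	if req.Threshold != 0 {
		out.Threshold = req.Threshold
	}
	return out
}

func main() {
	// Hypothetical chunker defaults, chosen only for this example.
	defaults := options{MaxChunks: 100, Threshold: 0.5}
	fmt.Printf("%+v\n", resolve(options{MaxChunks: 10}, defaults))
	// prints {MaxChunks:10 Threshold:0.5}
}
```

A consequence of treating zero as "unset" is that a caller cannot explicitly request a zero value, for example a Threshold of 0.0, through these fields.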
type Chunker ¶
type Chunker interface {
// Chunk splits text using the provided per-request options.
// Options left at their zero value use the chunker's default values.
Chunk(ctx context.Context, text string, opts ChunkOptions) ([]Chunk, error)
Close() error
}
Chunker splits text into semantically meaningful chunks. ChunkOptions is generated from openapi.yaml; see openapi.gen.go.
type TextChunkOptions ¶
type TextChunkOptions struct {
// OverlapTokens Number of tokens to overlap between consecutive chunks. Helps maintain context across chunk boundaries. Only used by fixed-size chunkers.
OverlapTokens int `json:"overlap_tokens,omitempty,omitzero"`
// Separator Separator string for splitting (e.g., '\n\n' for paragraphs). Only used by fixed-size chunkers.
Separator string `json:"separator,omitempty,omitzero"`
// TargetTokens Target number of tokens per chunk.
TargetTokens int `json:"target_tokens,omitempty,omitzero"`
}
TextChunkOptions Options specific to text chunking.
type TextContent ¶
type TextContent struct {
// EndChar Character position in original text where chunk ends (exclusive)
EndChar int `json:"end_char"`
// StartChar Character position in original text where chunk starts
StartChar int `json:"start_char"`
// Text The chunk text content
Text string `json:"text"`
}
TextContent Text content with character offsets.
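StartChar and EndChar form a half-open range, so the chunk text can be recovered from the original document by slicing. The sketch below assumes the offsets count Unicode code points (runes) rather than bytes; the source does not say which, so verify this against real output before relying on it. sliceChars is a hypothetical helper.

```go
package main

import "fmt"

// sliceChars returns original[start:end) measured in runes.
// It reports false when the range is out of bounds.
func sliceChars(original string, start, end int) (string, bool) {
	r := []rune(original)
	if start < 0 || end > len(r) || start > end {
		return "", false
	}
	return string(r[start:end]), true
}

func main() {
	text, ok := sliceChars("héllo wörld", 6, 11)
	fmt.Println(text, ok)
}
```

Because EndChar is exclusive, EndChar - StartChar equals the chunk length in characters, and adjacent non-overlapping chunks satisfy next.StartChar == prev.EndChar.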