Documentation ¶
Index ¶
- func CreateImageGenModel(modelName, modelDir, quantize string, createLayer LayerCreator, ...) error
- func CreateSafetensorsModel(modelName, modelDir, quantize string, createLayer LayerCreator, ...) error
- func ExpertGroupPrefix(tensorName string) string
- func GetModelArchitecture(modelName string) (string, error)
- func GetTensorQuantization(name string, shape []int32, quantize string) string
- func IsImageGenModel(modelName string) bool
- func IsSafetensorsLLMModel(modelName string) bool
- func IsSafetensorsModel(modelName string) bool
- func IsSafetensorsModelDir(dir string) bool
- func IsTensorModelDir(dir string) bool
- func ShouldQuantize(name, component string) bool
- func ShouldQuantizeTensor(name string, shape []int32, quantize string) bool
- type LayerCreator
- type LayerInfo
- type Manifest
- type ManifestLayer
- type ManifestWriter
- type ModelConfig
- type PackedTensorInput
- type PackedTensorLayerCreator
- type QuantizingTensorLayerCreator
- type TensorLayerCreator
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CreateImageGenModel ¶
func CreateImageGenModel(modelName, modelDir, quantize string, createLayer LayerCreator, createTensorLayer QuantizingTensorLayerCreator, writeManifest ManifestWriter, fn func(status string)) error
CreateImageGenModel imports an image generation model from a directory. It stores each tensor as a separate blob for fine-grained deduplication. If quantize is specified, linear weights in the transformer and text_encoder components are quantized. Supported quantization types: int4, int8, nvfp4, and mxfp8 (or empty for no quantization). Layer creation and manifest writing are done via callbacks to avoid import cycles.
func CreateSafetensorsModel ¶
func CreateSafetensorsModel(modelName, modelDir, quantize string, createLayer LayerCreator, createTensorLayer QuantizingTensorLayerCreator, writeManifest ManifestWriter, fn func(status string), createPackedLayer ...PackedTensorLayerCreator) error
CreateSafetensorsModel imports a standard safetensors model from a directory. It handles Hugging Face-style models with config.json and *.safetensors files, storing each tensor as a separate blob for fine-grained deduplication. Expert tensors are packed into per-layer blobs when createPackedLayer is non-nil. If quantize is non-empty (e.g., "int8"), eligible tensors are quantized.
func ExpertGroupPrefix ¶ added in v0.16.0
ExpertGroupPrefix returns the group prefix for expert tensors that should be packed together. For example:
- "model.layers.1.mlp.experts.0.down_proj.weight" -> "model.layers.1.mlp.experts"
- "model.layers.1.mlp.shared_experts.down_proj.weight" -> "model.layers.1.mlp.shared_experts"
- "model.layers.0.mlp.down_proj.weight" -> "" (dense layer, no experts)
- "model.layers.1.mlp.gate.weight" -> "" (routing gate, not an expert)
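The grouping behavior above can be sketched from the documented examples alone. The regular expression below is a hypothetical reconstruction, not the package's actual implementation: it keys on a `.experts` or `.shared_experts` path segment and returns everything up to and including it.

```go
package main

import (
	"fmt"
	"regexp"
)

// expertRe matches tensor names whose path contains an ".experts" or
// ".shared_experts" segment, optionally followed by an expert index.
var expertRe = regexp.MustCompile(`^(.*\.(?:shared_)?experts)(?:\.\d+)?\..+$`)

// expertGroupPrefix is a hypothetical reimplementation of ExpertGroupPrefix,
// reconstructed from the documented input/output examples.
func expertGroupPrefix(tensorName string) string {
	if m := expertRe.FindStringSubmatch(tensorName); m != nil {
		return m[1]
	}
	return "" // dense layers and routing gates are not grouped
}

func main() {
	for _, name := range []string{
		"model.layers.1.mlp.experts.0.down_proj.weight",
		"model.layers.1.mlp.shared_experts.down_proj.weight",
		"model.layers.0.mlp.down_proj.weight",
		"model.layers.1.mlp.gate.weight",
	} {
		fmt.Printf("%q -> %q\n", name, expertGroupPrefix(name))
	}
}
```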
func GetModelArchitecture ¶
GetModelArchitecture returns the architecture from the model's config.json layer.
func GetTensorQuantization ¶ added in v0.15.5
GetTensorQuantization returns the appropriate quantization type for a tensor. Returns "" if the tensor should not be quantized. This implements mixed-precision quantization:
- Attention MLA weights (q_a, q_b, kv_a, kv_b): unquantized (most sensitive)
- Output projection, gate/up weights: int4 (less sensitive)
- Down projection weights: int8 (more sensitive; would be Q6 in GGML, but MLX has no equivalent kernel)
- Norms, embeddings, biases, routing gates: no quantization
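The mixed-precision policy above can be sketched as a name-based dispatch. The substring checks below are illustrative assumptions about how tensor names are matched, not the package's exact rules:

```go
package main

import (
	"fmt"
	"strings"
)

// tensorQuantization sketches the documented mixed-precision policy.
// The substring matching is an assumption for illustration only.
func tensorQuantization(name string) string {
	switch {
	// Norms, embeddings, biases, and routing gates: no quantization.
	case strings.Contains(name, "norm"),
		strings.Contains(name, "embed"),
		strings.HasSuffix(name, ".bias"),
		strings.Contains(name, ".mlp.gate."):
		return ""
	// Attention MLA weights are the most sensitive: keep full precision.
	case strings.Contains(name, "q_a_proj"), strings.Contains(name, "q_b_proj"),
		strings.Contains(name, "kv_a_proj"), strings.Contains(name, "kv_b_proj"):
		return ""
	// Down projections are more sensitive than gate/up: use int8.
	case strings.Contains(name, "down_proj"):
		return "int8"
	// Output projection and gate/up projections tolerate int4.
	default:
		return "int4"
	}
}

func main() {
	fmt.Println(tensorQuantization("model.layers.0.mlp.down_proj.weight")) // int8
	fmt.Println(tensorQuantization("model.layers.0.mlp.up_proj.weight"))   // int4
}
```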
func IsImageGenModel ¶
IsImageGenModel checks if a model is an image generation model (has image capability).
func IsSafetensorsLLMModel ¶
IsSafetensorsLLMModel checks if a model is a safetensors LLM model (has completion capability, not image generation).
func IsSafetensorsModel ¶
IsSafetensorsModel checks if a model was created with the experimental safetensors builder by checking the model format in the config.
func IsSafetensorsModelDir ¶
IsSafetensorsModelDir checks if the directory contains a standard safetensors model by looking for config.json and at least one .safetensors file.
func IsTensorModelDir ¶
IsTensorModelDir checks if the directory contains a diffusers-style tensor model by looking for model_index.json, which is the standard diffusers pipeline config.
func ShouldQuantize ¶
ShouldQuantize returns true if a tensor should be quantized. For image generation models (component non-empty), it quantizes linear weights, skipping the VAE, embeddings, and norms. For LLM models (component empty), it quantizes linear weights, skipping embeddings, norms, and small tensors.
func ShouldQuantizeTensor ¶
ShouldQuantizeTensor returns true if a tensor should be quantized based on name, shape, and quantize type. This is a more detailed check that also considers tensor dimensions. The quantize parameter specifies the quantization type (e.g., "int4", "nvfp4", "int8", "mxfp8").
Types ¶
type LayerCreator ¶
LayerCreator is called to create a blob layer. name is the path-style name (e.g., "tokenizer/tokenizer.json").
type LayerInfo ¶
type LayerInfo struct {
Digest string
Size int64
MediaType string
Name string // Path-style name: "component/tensor" or "path/to/config.json"
}
LayerInfo holds metadata for a created layer.
type Manifest ¶
type Manifest struct {
SchemaVersion int `json:"schemaVersion"`
MediaType string `json:"mediaType"`
Config ManifestLayer `json:"config"`
Layers []ManifestLayer `json:"layers"`
}
Manifest represents the manifest JSON structure.
type ManifestLayer ¶
type ManifestLayer struct {
MediaType string `json:"mediaType"`
Digest string `json:"digest"`
Size int64 `json:"size"`
Name string `json:"name,omitempty"`
}
ManifestLayer represents a layer in the manifest.
type ManifestWriter ¶
ManifestWriter writes the manifest file.
type ModelConfig ¶
type ModelConfig struct {
ModelFormat string `json:"model_format"`
Capabilities []string `json:"capabilities"`
}
ModelConfig represents the config blob stored with a model.
type PackedTensorInput ¶ added in v0.16.0
type PackedTensorInput struct {
Name string
Dtype string
Shape []int32
Quantize string // per-tensor quantization type (may differ within group)
Reader io.Reader // safetensors-wrapped tensor data
}
PackedTensorInput holds metadata for a tensor that will be packed into a multi-tensor blob.
type PackedTensorLayerCreator ¶ added in v0.16.0
type PackedTensorLayerCreator func(groupName string, tensors []PackedTensorInput) (LayerInfo, error)
PackedTensorLayerCreator creates a single blob layer containing multiple packed tensors. groupName is the group prefix (e.g., "model.layers.1.mlp.experts").
type QuantizingTensorLayerCreator ¶
type QuantizingTensorLayerCreator func(r io.Reader, name, dtype string, shape []int32, quantize string) ([]LayerInfo, error)
QuantizingTensorLayerCreator creates tensor layers with optional quantization. When quantize is non-empty (e.g., "int8"), returns multiple layers (weight + scales + biases).