stable_diffusion

package module
v0.1.0
Published: Jan 2, 2026 License: MIT Imports: 6 Imported by: 0

README

stable-diffusion-go

Simplified Chinese

A pure Go binding library for stable-diffusion.cpp, built on github.com/ebitengine/purego: no cgo dependency is required, and it runs cross-platform.

🌟 Project Features

  • Pure Go Implementation: Calls the C++ dynamic libraries through the purego library, with no cgo required
  • Cross-platform Support: Runs on Windows, Linux, macOS, and other mainstream operating systems
  • Complete Functionality: Implements the main APIs of stable-diffusion.cpp, including text-to-image, image-to-image, and video generation
  • Simple and Easy to Use: Provides a concise Go API that is easy to integrate into existing projects
  • High Performance: Supports optimizations such as FlashAttention and model quantization
  • Precompiled Libraries Included: Ships precompiled dynamic libraries for Windows, ready to use out of the box

๐Ÿ“ Project Structure

stable-diffusion-go/
├── examples/           # Example programs directory
│   ├── txt2img.go      # Text-to-image generation example
│   └── txt2vid.go      # Text-to-video generation example
├── lib/                # Dynamic library directory
│   ├── darwin/         # macOS platform dynamic library
│   │   └── libstable-diffusion.dylib
│   ├── linux/          # Linux platform dynamic library
│   │   └── libstable-diffusion.so
│   ├── windows/        # Windows platform dynamic libraries
│   │   ├── avx/        # AVX instruction set version
│   │   │   └── stable-diffusion.dll
│   │   ├── avx2/       # AVX2 instruction set version
│   │   │   └── stable-diffusion.dll
│   │   ├── avx512/     # AVX512 instruction set version
│   │   │   └── stable-diffusion.dll
│   │   ├── cuda12/     # CUDA 12 version
│   │   │   └── stable-diffusion.dll
│   │   ├── noavx/      # No-AVX version
│   │   │   └── stable-diffusion.dll
│   │   ├── rocm/       # ROCm version
│   │   │   └── stable-diffusion.dll
│   │   └── vulkan/     # Vulkan version
│   │       └── stable-diffusion.dll
│   ├── ggml.txt
│   ├── stable-diffusion.cpp.txt
│   └── version.txt
├── pkg/                # Go package directory
│   └── sd/             # Core binding library
│       ├── load_library_unix.go     # Unix dynamic library loading
│       ├── load_library_windows.go  # Windows dynamic library loading
│       ├── stable_diffusion.go      # Core functionality implementation
│       └── utils.go                 # Auxiliary utility functions
├── .gitignore          # Git ignore configuration
├── README.md           # Project documentation
├── go.mod              # Go module file
├── go.sum              # Go dependency checksum file
└── stable_diffusion.go # Root package entry file

Note: All dynamic library files in the lib directory must be downloaded from https://github.com/leejet/stable-diffusion.cpp/releases, matching the version recorded in lib/version.txt.

🚀 Quick Start

1. Install Dependencies
go get github.com/orangelang/stable-diffusion-go
2. Prepare Model Files

Model files must be prepared before use; multiple formats are supported:

  • Diffusion models: .gguf format (e.g., z_image_turbo-Q4_K_M.gguf)
  • LLM models: .gguf format (e.g., Qwen3-4B-Instruct-2507-Q4_K_M.gguf)
  • VAE models: .safetensors format (e.g., diffusion_pytorch_model.safetensors)
3. Dynamic Library Description

The project includes precompiled dynamic libraries for multiple platforms, located in the lib/ directory:

  • Windows: Multiple versions to suit different hardware
    • avx/: Supports AVX instruction set
    • avx2/: Supports AVX2 instruction set
    • avx512/: Supports AVX512 instruction set
    • cuda12/: Supports CUDA 12
    • noavx/: No AVX instruction set dependency
    • rocm/: Supports ROCm
    • vulkan/: Supports Vulkan
  • Linux: libstable-diffusion.so
  • macOS: libstable-diffusion.dylib

The program automatically selects the appropriate dynamic library for the current environment; no manual specification is required.
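
For illustration only, platform selection of this kind usually keys off runtime.GOOS. The following sketch is an assumption about the approach, not the package's actual loader (which lives in pkg/sd/load_library_unix.go and load_library_windows.go):

package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// libraryPath sketches OS-based library selection. It is hypothetical:
// the real loader may also probe CPU features (AVX/AVX2/AVX512) and GPUs
// (CUDA, ROCm, Vulkan) to pick a Windows subdirectory.
func libraryPath(root string) string {
	switch runtime.GOOS {
	case "windows":
		return filepath.Join(root, "lib", "windows", "avx2", "stable-diffusion.dll")
	case "darwin":
		return filepath.Join(root, "lib", "darwin", "libstable-diffusion.dylib")
	default:
		return filepath.Join(root, "lib", "linux", "libstable-diffusion.so")
	}
}

func main() {
	fmt.Println(libraryPath("."))
}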

4. Run Examples
Text-to-Image Generation
# Enter the examples directory
cd examples

# Run text-to-image example
go run txt2img.go

Example code:

package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	fmt.Println("Stable Diffusion Go - Text to Image Example")
	fmt.Println("===============================================")

	// Create Stable Diffusion instance
	sd, err := stablediffusion.NewStableDiffusion(&stablediffusion.ContextParams{
		DiffusionModelPath: "path/to/diffusion_model.gguf",
		LLMPath:            "path/to/llm_model.gguf",
		VAEPath:            "path/to/vae_model.safetensors",
		DiffusionFlashAttn: true,
		OffloadParamsToCPU: true,
	})

	if err != nil {
		fmt.Println("Failed to create instance:", err)
		return
	}
	defer sd.Free()

	// Generate image
	err = sd.GenerateImage(&stablediffusion.ImgGenParams{
		Prompt:      "一位穿着明朝服饰的美女行走在花园中", // "A beauty in Ming-dynasty dress walking in a garden"
		Width:       512,
		Height:      512,
		SampleSteps: 10,
		CfgScale:    1.0,
	}, "output.png")

	if err != nil {
		fmt.Println("Failed to generate image:", err)
		return
	}

	fmt.Println("Image generated successfully!")
}

Text-to-Video Generation
# Run text-to-video example
go run txt2vid.go

📚 Core Features

1. Context Management
  • Create and destroy Stable Diffusion contexts
  • Support multiple model path configurations
  • Provide rich performance optimization parameters
2. Text-to-Image Generation (txt2img)
  • Generate high-quality images from text descriptions
  • Support Chinese and English prompts
  • Adjustable image dimensions, sampling steps, CFG scale, and other parameters
  • Support random seed generation
3. Text-to-Video Generation (txt2vid)
  • Generate videos from text prompts
  • Support custom frame count and resolution
  • Support Easycache optimization
  • Integrate FFmpeg for video encoding

๐Ÿ“ Usage Guide

Basic Usage
  1. Create Instance: Use NewStableDiffusion to create a Stable Diffusion instance
  2. Configure Parameters: Set context parameters and generation parameters
  3. Generate Content: Call GenerateImage or GenerateVideo to generate content
  4. Release Resources: Use defer sd.Free() to release resources
Context Parameters Description

Parameter Name      Type    Description
DiffusionModelPath  string  Diffusion model file path
LLMPath             string  LLM model file path
VAEPath             string  VAE model file path
NThreads            int32   Number of threads
DiffusionFlashAttn  bool    Whether to enable FlashAttention
OffloadParamsToCPU  bool    Whether to offload some parameters to CPU
WType               string  Model quantization (weight) type, e.g. "q4_k"

Image Generation Parameters Description

Parameter Name  Type     Description
Prompt          string   Prompt text
NegativePrompt  string   Negative prompt text
Width           int32    Image width (pixels)
Height          int32    Image height (pixels)
Seed            int64    Random seed (< 0 for a random seed)
SampleSteps     int32    Number of sampling steps
CfgScale        float32  CFG scale
Strength        float32  Initial image strength (img2img only; see the sketch below)
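
Strength only takes effect together with InitImagePath. A minimal image-to-image sketch (model and image paths are placeholders):

package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	sd, err := stablediffusion.NewStableDiffusion(&stablediffusion.ContextParams{
		DiffusionModelPath: "path/to/diffusion_model.gguf",
		VAEPath:            "path/to/vae_model.safetensors",
	})
	if err != nil {
		fmt.Println("Failed to create instance:", err)
		return
	}
	defer sd.Free()

	// Image-to-image: transform an existing picture under a new prompt.
	err = sd.GenerateImage(&stablediffusion.ImgGenParams{
		Prompt:        "the same scene at sunset",
		InitImagePath: "input.png", // initial image used for guidance
		Strength:      0.6,         // conventionally, 0.0 keeps the input and 1.0 fully re-noises it
		Width:         512,
		Height:        512,
		SampleSteps:   15,
		CfgScale:      2.0,
	}, "output_img2img.png")
	if err != nil {
		fmt.Println("Failed to generate image:", err)
	}
}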

🔧 Performance Optimization

1. Adjust Thread Count

Adjust the NThreads parameter according to the number of CPU cores:

ctxParams := &stablediffusion.ContextParams{
    // Other parameters...
    NThreads: 8, // Adjust according to CPU core count
}
2. Use Quantized Models

Using quantized models can improve performance and reduce memory usage:

ctxParams := &stablediffusion.ContextParams{
    // Other parameters...
    WType: "q4_k", // Q4_K quantized weights; valid names are the keys of SDTypeMap
}
3. Adjust Sampling Steps

Reducing the number of sampling steps can improve generation speed but may reduce image quality:

imgGenParams := &stablediffusion.ImgGenParams{
    // Other parameters...
    SampleSteps: 10, // Reduce sampling steps
}
4. Enable FlashAttention

Enabling FlashAttention can accelerate the diffusion process:

ctxParams := &stablediffusion.ContextParams{
    // Other parameters...
    DiffusionFlashAttn: true,
}

⚠️ Notes

  1. Dynamic Library Path: The program automatically selects the appropriate dynamic library from the lib/ directory based on the current environment
  2. Model Compatibility: Make sure the model formats you use are compatible with stable-diffusion.cpp
  3. Dependencies: Install dependencies such as CUDA or Vulkan as needed
  4. Video Generation: Requires FFmpeg for video encoding (see the preflight sketch after this list)
  5. Memory Usage: Large models may require more memory; quantized models are recommended
  6. AMD Graphics Cards (Windows): If you use an AMD graphics card (including AMD integrated graphics), download the ROCm library and place it in the project root directory: https://github.com/leejet/stable-diffusion.cpp/releases/download/master-453-4ff2c8c/sd-master-4ff2c8c-bin-win-rocm-x64.zip
  7. Vulkan: If you use a non-NVIDIA graphics card (such as an AMD or Intel card, including integrated graphics), you can install Vulkan to enable GPU acceleration
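
Because video generation depends on an external FFmpeg binary, a preflight check can fail fast with a clearer error. A minimal sketch, assuming ffmpeg is invoked from PATH:

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Check that ffmpeg is reachable before loading multi-gigabyte models.
	if _, err := exec.LookPath("ffmpeg"); err != nil {
		fmt.Println("ffmpeg not found on PATH; video generation will fail:", err)
		return
	}
	fmt.Println("ffmpeg found")
}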

📦 Example Programs

Text-to-Image Example
package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	// Create instance
	sd, err := stablediffusion.NewStableDiffusion(&stablediffusion.ContextParams{
		DiffusionModelPath: "models/z_image_turbo-Q4_K_M.gguf",
		LLMPath:            "models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf",
		VAEPath:            "models/diffusion_pytorch_model.safetensors",
		DiffusionFlashAttn: true,
	})
	if err != nil {
		fmt.Println("Failed to create instance:", err)
		return
	}
	defer sd.Free()

	// Generate image
	err = sd.GenerateImage(&stablediffusion.ImgGenParams{
		Prompt:      "A cute Corgi dog running on the grass",
		Width:       512,
		Height:      512,
		SampleSteps: 15,
		CfgScale:    2.0,
	}, "output_corgi.png")

	if err != nil {
		fmt.Println("Failed to generate image:", err)
		return
	}

	fmt.Println("Image generated successfully!")
}
Text-to-Video Example
package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	// Create instance
	sd, err := stablediffusion.NewStableDiffusion(&stablediffusion.ContextParams{
		DiffusionModelPath: "D:\\hf-mirror\\wan2.1\\wan2.1_t2v_1.3B_bf16.safetensors",
		T5XXLPath:          "D:\\hf-mirror\\wan2.1\\umt5-xxl-encoder-Q4_K_M.gguf",
		VAEPath:            "D:\\hf-mirror\\wan2.1\\wan_2.1_vae.safetensors",
		DiffusionFlashAttn: true,
		KeepClipOnCPU:      true,
		OffloadParamsToCPU: true,
		NThreads:           4,
		FlowShift:          3.0,
	})

	if err != nil {
		fmt.Println("Failed to create stable diffusion instance:", err)
		return
	}
	defer sd.Free()

	err = sd.GenerateVideo(&stablediffusion.VidGenParams{
		Prompt:      "一个在长满桃花树下拍照的美女", // "A beauty taking photos under blossoming peach trees"
		Width:       300,
		Height:      300,
		SampleSteps: 40,
		VideoFrames: 33,
		CfgScale:    6.0,
	}, "./output.mp4")

	if err != nil {
		fmt.Println("Failed to generate video:", err)
		return
	}

	fmt.Println("Video generated successfully!")
}

📄 License

MIT License

🤝 Contribution

Issues and pull requests are welcome!

📞 Support

If you encounter problems during use, please:

  1. Review the example code
  2. Verify the dynamic library path and model files
  3. Search existing project Issues
  4. Submit a new Issue

Thank you for using stable-diffusion-go! If this project has helped you, please give it a Star ⭐️

Documentation

Index

Constants

This section is empty.

Variables

var LoraApplyModeMap = map[string]sd.LoraApplyMode{
	"auto":                  sd.LoraApplyAuto,
	"immediately":           sd.LoraApplyImmediately,
	"at_runtime":            sd.LoraApplyAtRuntime,
	"lora_apply_mode_count": sd.LoraApplyModeCount,
}

LoraApplyModeMap maps LoRA apply mode names to sd.LoraApplyMode values.

var PredictionMap = map[string]sd.Prediction{
	"eps":        sd.EPSPred,
	"v":          sd.VPred,
	"edm_v":      sd.EDMVPred,
	"flow":       sd.FlowPred,
	"flux_flow":  sd.FluxFlowPred,
	"flux2_flow": sd.Flux2FlowPred,
	"default":    sd.PredictionCount,
}

PredictionMap maps prediction type names to sd.Prediction values.

var PreviewMap = map[string]sd.Preview{
	"none":          sd.PreviewNone,
	"proj":          sd.PreviewProj,
	"tae":           sd.PreviewTAE,
	"vae":           sd.PreviewVAE,
	"preview_count": sd.PreviewCount,
}

PreviewMap maps preview type names to sd.Preview values.

var RNGTypeMap = map[string]sd.RngType{
	"default":    sd.DefaultRNG,
	"cuda":       sd.CUDARNG,
	"cpu":        sd.CPURNG,
	"type_count": sd.RNGTypeCount,
}

RNGTypeMap maps RNG type names to sd.RngType values.

var SDTypeMap = map[string]sd.SDType{
	"f32":  sd.SDTypeF32,
	"f16":  sd.SDTypeF16,
	"q4_0": sd.SDTypeQ4_0,
	"q4_1": sd.SDTypeQ4_1,
	"q5_0": sd.SDTypeQ5_0,
	"q5_1": sd.SDTypeQ5_1,
	"q8_0": sd.SDTypeQ8_0,
	"q8_1": sd.SDTypeQ8_1,

	"q2_k":    sd.SDTypeQ2_K,
	"q3_k":    sd.SDTypeQ3_K,
	"q4_k":    sd.SDTypeQ4_K,
	"q5_k":    sd.SDTypeQ5_K,
	"q6_k":    sd.SDTypeQ6_K,
	"q8_k":    sd.SDTypeQ8_K,
	"iq2_xxs": sd.SDTypeIQ2_XXS,
	"iq2_xs":  sd.SDTypeIQ2_XS,
	"iq3_xxs": sd.SDTypeIQ3_XXS,
	"iq1_s":   sd.SDTypeIQ1_S,
	"iq4_nl":  sd.SDTypeIQ4_NL,
	"iq3_s":   sd.SDTypeIQ3_S,
	"iq2_s":   sd.SDTypeIQ2_S,
	"iq4_xs":  sd.SDTypeIQ4_XS,
	"i8":      sd.SDTypeI8,
	"i16":     sd.SDTypeI16,
	"i32":     sd.SDTypeI32,
	"i64":     sd.SDTypeI64,
	"f64":     sd.SDTypeF64,
	"iq1_m":   sd.SDTypeIQ1_M,
	"bf16":    sd.SDTypeBF16,

	"tq1_0": sd.SDTypeTQ1_0,
	"tq2_0": sd.SDTypeTQ2_0,

	"mxfp4":   sd.SDTypeMXFP4,
	"default": sd.SDTypeCount,
}

SDTypeMap maps weight type names to sd.SDType values.
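
These string keys are the values accepted by fields such as ContextParams.WType. A small sketch validating a user-supplied name before use (assuming the package resolves these strings through this map):

package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	// SDTypeMap doubles as a whitelist for ContextParams.WType, which
	// takes the string form of the weight type.
	wtype := "q4_k"
	if _, ok := stablediffusion.SDTypeMap[wtype]; !ok {
		fmt.Printf("unknown weight type %q\n", wtype)
		return
	}
	fmt.Println("weight type accepted:", wtype)
}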

var SampleMethodMap = map[string]sd.SampleMethod{
	"default":             -1,
	"euler":               sd.EulerSampleMethod,
	"euler_a":             sd.EulerASampleMethod,
	"heun":                sd.HeunSampleMethod,
	"dpm2":                sd.DPM2SampleMethod,
	"dpm++2s_a":           sd.DPMPP2SASampleMethod,
	"dpm++2m":             sd.DPMPP2MSampleMethod,
	"dpm++2mv2":           sd.DPMPP2Mv2SampleMethod,
	"ipndm":               sd.IPNDMSampleMethod,
	"ipndm_v":             sd.IPNDMSampleMethodV,
	"lcm":                 sd.LCMSampleMethod,
	"ddim_trailing":       sd.DDIMTrailingSampleMethod,
	"tcd":                 sd.TCDSampleMethod,
	"sample_method_count": sd.SampleMethodCount,
}

SampleMethodMap maps sampling method names to sd.SampleMethod values.

var SchedulerMap = map[string]sd.Scheduler{
	"default":         -1,
	"discrete":        sd.DiscreteScheduler,
	"karras":          sd.KarrasScheduler,
	"exponential":     sd.ExponentialScheduler,
	"ays":             sd.AYSScheduler,
	"gits":            sd.GITScheduler,
	"sgm_uniform":     sd.SGMUniformScheduler,
	"simple":          sd.SimpleScheduler,
	"smoothstep":      sd.SmoothstepScheduler,
	"kl_optimal":      sd.KLOptimalScheduler,
	"lcm":             sd.LCMScheduler,
	"scheduler_count": sd.SchedulerCount,
}

SchedulerMap maps scheduler names to sd.Scheduler values.
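
ImgGenParams.Scheduler and ImgGenParams.SampleMethod are plain strings, and the keys of SchedulerMap and SampleMethodMap appear to enumerate the accepted values. A sketch that lists them:

package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	// Print the scheduler and sampler names this build understands.
	for name := range stablediffusion.SchedulerMap {
		fmt.Println("scheduler:", name)
	}
	for name := range stablediffusion.SampleMethodMap {
		fmt.Println("sampler:", name)
	}
}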

Functions

func Convert

func Convert(inputPath, vaePath, outputPath, outputType, tensorTypeRules string, convertName bool) error

Convert converts a model to gguf format. inputPath: path to the input model. vaePath: path to the VAE. outputPath: path to save the converted model. outputType: the weight type (default: auto). tensorTypeRules: weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0").
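
A usage sketch (all paths are placeholders; the semantics of the final convertName argument are not documented here):

package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	// Convert a checkpoint to gguf with q8_0 weights.
	err := stablediffusion.Convert(
		"model.safetensors", // inputPath
		"",                  // vaePath (none)
		"model-q8_0.gguf",   // outputPath
		"q8_0",              // outputType
		"",                  // tensorTypeRules (none)
		false,               // convertName (semantics undocumented)
	)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println("conversion finished")
}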

Types

type ContextParams

type ContextParams struct {
	ModelPath                   string     // Full model path
	ClipLPath                   string     // CLIP-L text encoder path
	ClipGPath                   string     // CLIP-G text encoder path
	ClipVisionPath              string     // CLIP Vision encoder path
	T5XXLPath                   string     // T5-XXL text encoder path
	LLMPath                     string     // LLM text encoder path (e.g., qwenvl2.5 for qwen-image, mistral-small3.2 for flux2)
	LLMVisionPath               string     // LLM Vision encoder path
	DiffusionModelPath          string     // Standalone diffusion model path
	HighNoiseDiffusionModelPath string     // Standalone high noise diffusion model path
	VAEPath                     string     // VAE model path
	TAESDPath                   string     // TAE-SD model path, uses Tiny AutoEncoder for fast decoding (low quality)
	ControlNetPath              string     // ControlNet model path
	Embeddings                  *Embedding // Embedding information
	EmbeddingCount              uint32     // Number of embeddings
	PhotoMakerPath              string     // PhotoMaker model path
	TensorTypeRules             string     // Weight type rules per tensor pattern (e.g., "^vae\.=f16,model\.=q8_0")
	VAEDecodeOnly               bool       // Process VAE using only decode mode
	FreeParamsImmediately       bool       // Whether to free parameters immediately
	NThreads                    int32      // Number of threads to use for generation
	WType                       string     // Weight type (default: auto-detect from model file)
	RNGType                     string     // Random number generator type (default: "cuda")
	SamplerRNGType              string     // Sampler random number generator type (default: "cuda")
	Prediction                  string     // Prediction type override
	LoraApplyMode               string     // LoRA application mode (default: "auto")
	OffloadParamsToCPU          bool       // Keep weights in RAM to save VRAM, auto-load to VRAM when needed
	EnableMmap                  bool       // Whether to enable memory mapping
	KeepClipOnCPU               bool       // Keep CLIP on CPU (for low VRAM)
	KeepControlNetOnCPU         bool       // Keep ControlNet on CPU (for low VRAM)
	KeepVAEOnCPU                bool       // Keep VAE on CPU (for low VRAM)
	DiffusionFlashAttn          bool       // Use Flash attention in diffusion model (significantly reduces memory usage)
	TAEPreviewOnly              bool       // Prevent decoding final image with taesd (for preview="tae")
	DiffusionConvDirect         bool       // Use Conv2d direct in diffusion model
	VAEConvDirect               bool       // Use Conv2d direct in VAE model (should improve performance)
	CircularX                   bool       // Enable circular padding on X axis
	CircularY                   bool       // Enable circular padding on Y axis
	ForceSDXLVAConvScale        bool       // Force conv scale on SDXL VAE
	ChromaUseDitMask            bool       // Whether Chroma uses DiT mask
	ChromaUseT5Mask             bool       // Whether Chroma uses T5 mask
	ChromaT5MaskPad             int32      // Chroma T5 mask padding size
	QwenImageZeroCondT          bool       // Qwen-image zero condition T parameter
	FlowShift                   float32    // Shift value for Flow models (e.g., SD3.x or WAN)
}

ContextParams holds the context parameters used to initialize a Stable Diffusion context.

type Embedding

type Embedding struct {
	Name string // Embedding name
	Path string // Embedding file path
}

Embedding describes a model embedding by name and file path.

type ImgGenParams

type ImgGenParams struct {
	Loras              *Lora             // LoRA parameters
	LoraCount          uint32            // Number of LoRAs
	Prompt             string            // Prompt to render
	NegativePrompt     string            // Negative prompt
	ClipSkip           int32             // Skip last layers of CLIP network (1 = no skip, 2 = skip one layer, <=0 = not specified)
	InitImagePath      string            // Initial image path for guidance
	RefImagesPath      []string          // Array of reference image paths for Flux Kontext models
	RefImagesCount     int32             // Number of reference images
	AutoResizeRefImage bool              // Whether to auto-resize reference images
	IncreaseRefIndex   bool              // Whether to auto-increase index based on reference image list order (starting from 1)
	MaskImagePath      string            // Inpainting mask image path
	Width              int32             // Image width (pixels)
	Height             int32             // Image height (pixels)
	CfgScale           float32           // Unconditional guidance scale.
	ImageCfgScale      float32           // Image guidance scale for inpaint or instruct-pix2pix models (default: same as `CfgScale`).
	DistilledGuidance  float32           // Distilled guidance scale for models with guidance input.
	SkipLayers         []int32           // Layers to skip for SLG steps (SLG will be enabled at step int([STEPS]x[START]) and disabled at int([STEPS]x[END])).
	SkipLayerStart     float32           // SLG enabling point.
	SkipLayerEnd       float32           // SLG disabling point.
	SlgScale           float32           // Skip layer guidance (SLG) scale, only for DiT models.
	Scheduler          string            // Denoiser sigma scheduler (default: discrete).
	SampleMethod       string            // Sampling method (default: euler for Flux/SD3/Wan, euler_a otherwise).
	SampleSteps        int32             // Number of sample steps.
	Eta                float32           // Eta in DDIM, only for DDIM and TCD.
	ShiftedTimestep    int32             // Shift timestep for NitroFusion models (default: 0; recommended: ~250 for NitroSD-Realism, ~500 for NitroSD-Vibrant).
	CustomSigmas       []float32         // Custom sigma values for the sampler, comma-separated (e.g. "14.61,7.8,3.5,0.0").
	Strength           float32           // Noise/denoise strength (range [0.0, 1.0])
	Seed               int64             // RNG seed (< 0 for random seed)
	BatchCount         int32             // Number of images to generate
	ControlImagePath   string            // Control condition image path for ControlNet
	ControlStrength    float32           // Strength to apply ControlNet
	PMParams           *PMParams         // PhotoMaker parameters
	VAETilingParams    sd.SDTilingParams // VAE tiling parameters for reducing memory usage
	CacheParams        sd.SDCacheParams  // Cache parameters for DiT models
}

ImgGenParams holds the parameters for image generation.

type Lora

type Lora struct {
	IsHighNoise bool    // Whether it's a high noise LoRA
	Multiplier  float32 // LoRA multiplier
	Path        string  // LoRA file path
}

Lora describes a LoRA model and its multiplier.
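
ImgGenParams.Loras is a pointer plus a LoraCount, mirroring the underlying C API's array convention. Assuming the binding reads LoraCount consecutive elements (an assumption, not confirmed by the docs), a slice can be passed via its first element; ContextParams.Embeddings/EmbeddingCount follow the same shape:

package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	// Hypothetical LoRA setup: pass the address of the first element of a
	// contiguous slice, with LoraCount giving its length.
	loras := []stablediffusion.Lora{
		{Path: "style.safetensors", Multiplier: 0.8},
	}
	params := &stablediffusion.ImgGenParams{
		Prompt:    "a watercolor landscape",
		Loras:     &loras[0],
		LoraCount: uint32(len(loras)),
	}
	fmt.Println("configured", params.LoraCount, "LoRA(s)")
}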

type PMParams

type PMParams struct {
	IDImages      *sd.SDImage // ID images pointer
	IDImagesCount int32       // Number of ID images
	IDEmbedPath   string      // PhotoMaker v2 ID embedding path
	StyleStrength float32     // Strength to keep PhotoMaker input identity
}

PMParams holds PhotoMaker-related parameters.

type StableDiffusion

type StableDiffusion struct {
	// contains filtered or unexported fields
}

StableDiffusion wraps a Stable Diffusion context.

func NewStableDiffusion

func NewStableDiffusion(ctxParams *ContextParams) (*StableDiffusion, error)

NewStableDiffusion creates a Stable Diffusion instance.

func (*StableDiffusion) Free

func (sDiffusion *StableDiffusion) Free()

Free frees the stable diffusion context

func (*StableDiffusion) GenerateImage

func (sDiffusion *StableDiffusion) GenerateImage(imgGenParams *ImgGenParams, newImagePath string) error

GenerateImage generates an image from text, optionally guided by an initial image.

func (*StableDiffusion) GenerateVideo

func (sDiffusion *StableDiffusion) GenerateVideo(vidGenParams *VidGenParams, newVideoPath string) error

GenerateVideo generates a video.

type Upscaler

type Upscaler struct {
	// contains filtered or unexported fields
}

func NewUpscaler

func NewUpscaler(params *UpscalerParams) *Upscaler

NewUpscaler creates a new upscaler context

func (*Upscaler) Upscale

func (us *Upscaler) Upscale(inputImagePath string, upscaleFactor uint32, outputImagePath string) error

Upscale upscales the input image by the given factor and writes the result.
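
A usage sketch (the ESRGAN model path is a placeholder):

package main

import (
	"fmt"
	stablediffusion "github.com/orangelang/stable-diffusion-go"
)

func main() {
	// Upscale output.png 4x using an ESRGAN model.
	us := stablediffusion.NewUpscaler(&stablediffusion.UpscalerParams{
		EsrganPath: "models/esrgan_x4.pth", // placeholder path
		NThreads:   4,
	})
	if err := us.Upscale("output.png", 4, "output_4x.png"); err != nil {
		fmt.Println("upscale failed:", err)
		return
	}
	fmt.Println("upscaled image written")
}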

type UpscalerParams

type UpscalerParams struct {
	EsrganPath         string // ESRGAN model path
	OffloadParamsToCPU bool   // Whether to offload parameters to CPU
	Direct             bool   // Whether to use direct mode
	NThreads           int    // Number of threads to use
	TileSize           int    // Tile size
}

type VidGenParams

type VidGenParams struct {
	Loras             *Lora    // LoRA parameters
	LoraCount         uint32   // Number of LoRAs
	Prompt            string   // Prompt to render
	NegativePrompt    string   // Negative prompt
	ClipSkip          int32    // Skip last layers of CLIP network (1 = no skip, 2 = skip one layer, <=0 = not specified)
	InitImagePath     string   // Initial image path for starting generation
	EndImagePath      string   // End image path for ending generation (required for flf2v)
	ControlFramesPath []string // Array of control frame image paths for video
	ControlFramesSize int32    // Control frame size
	Width             int32    // Video width (pixels)
	Height            int32    // Video height (pixels)

	CfgScale          float32   // Unconditional guidance scale.
	ImageCfgScale     float32   // Image guidance scale for inpaint or instruct-pix2pix models (default: same as `CfgScale`).
	DistilledGuidance float32   // Distilled guidance scale for models with guidance input.
	SkipLayers        []int32   // Layers to skip for SLG steps (SLG will be enabled at step int([STEPS]x[START]) and disabled at int([STEPS]x[END])).
	SkipLayerStart    float32   // SLG enabling point.
	SkipLayerEnd      float32   // SLG disabling point.
	SlgScale          float32   // Skip layer guidance (SLG) scale, only for DiT models.
	Scheduler         string    // Denoiser sigma scheduler (default: discrete).
	SampleMethod      string    // Sampling method (default: euler for Flux/SD3/Wan, euler_a otherwise).
	SampleSteps       int32     // Number of sample steps.
	Eta               float32   // Eta in DDIM, only for DDIM and TCD.
	ShiftedTimestep   int32     // Shift timestep for NitroFusion models (default: 0; recommended: ~250 for NitroSD-Realism, ~500 for NitroSD-Vibrant).
	CustomSigmas      []float32 // Custom sigma values for the sampler, comma-separated (e.g. "14.61,7.8,3.5,0.0").

	HighNoiseCfgScale          float32   // High noise diffusion model equivalent of `cfg_scale`.
	HighNoiseImageCfgScale     float32   // High noise diffusion model equivalent of `image_cfg_scale`.
	HighNoiseDistilledGuidance float32   // High noise diffusion model equivalent of `guidance`.
	HighNoiseSkipLayers        []int32   // High noise diffusion model equivalent of `skip_layers`.
	HighNoiseSkipLayerStart    float32   // High noise diffusion model equivalent of `skip_layer_start`.
	HighNoiseSkipLayerEnd      float32   // High noise diffusion model equivalent of `skip_layer_end`.
	HighNoiseSlgScale          float32   // High noise diffusion model equivalent of `slg_scale`.
	HighNoiseScheduler         string    // High noise diffusion model equivalent of `scheduler`.
	HighNoiseSampleMethod      string    // High noise diffusion model equivalent of `sample_method`.
	HighNoiseSampleSteps       int32     // High noise diffusion model equivalent of `sample_steps` (default: -1 = auto).
	HighNoiseEta               float32   // High noise diffusion model equivalent of `eta`.
	HighNoiseShiftedTimestep   int32     // Shift timestep for NitroFusion models (default: 0; recommended: ~250 for NitroSD-Realism, ~500 for NitroSD-Vibrant).
	HighNoiseCustomSigmas      []float32 // Custom sigma values for the sampler, comma-separated (e.g. "14.61,7.8,3.5,0.0").

	MOEBoundary  float32          // Timestep boundary for Wan2.2 MoE models
	Strength     float32          // Noise/denoise strength (range [0.0, 1.0])
	Seed         int64            // RNG seed (< 0 for random seed)
	VideoFrames  int32            // Number of video frames to generate
	VaceStrength float32          // Wan VACE strength
	CacheParams  sd.SDCacheParams // Cache parameters for DiT models
}

VidGenParams holds the parameters for video generation.

Directories

Path Synopsis
pkg
sd
