ducklake

package
v0.0.0-...-d42ee61
Warning

This package is not in the latest version of its module.

Published: May 2, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package ducklake holds DuckLake configuration, the metadata-store version migration check, and the SQL fragments needed to ATTACH a DuckLake catalog.

This package has no dependency on github.com/duckdb/duckdb-go: the migration check connects to the tenant's metadata Postgres via pgx, and the ATTACH statement helpers return strings. This lets the multi-tenant control plane run the pre-flight migration check without linking libduckdb.

Constants

const DefaultSpecVersion = "1.0"

DefaultSpecVersion is the DuckLake spec version that this build of duckgres expects. When the metadata store is at an older version, duckgres backs up the metadata store and migrates it automatically. This value must match the DuckLake version bundled with the current DuckDB driver.

Functions

func BackupMetadata

func BackupMetadata(dlCfg Config, dataDir string) error

BackupMetadata runs only the metadata backup, with no version check. It is exported so that the control plane can run the backup asynchronously.

func BuildAttachStmt

func BuildAttachStmt(dlCfg Config, migrate bool) string

BuildAttachStmt builds the ATTACH statement for DuckLake. If migrate is true, it adds AUTOMATIC_MIGRATION TRUE to the statement's options.

func BuildDeltaAttachStmt

func BuildDeltaAttachStmt(dlCfg Config) string

BuildDeltaAttachStmt builds the ATTACH statement for the Delta extension catalog. The catalog path defaults to a sibling delta/ prefix at the DuckLake object-store root when the config doesn't set one explicitly.

func CheckAndBackupMigration

func CheckAndBackupMigration(dlCfg Config, dataDir string, targetVersion string) (bool, error)

CheckAndBackupMigration runs the migration check for the given DuckLake config and returns whether migration is needed. If migration is needed, it backs up the metadata store first. This is exported for use by the control plane, which runs the check once before activating workers.

func CheckMigrationVersion

func CheckMigrationVersion(dlCfg Config, targetVersion string) (needed bool, version string, err error)

CheckMigrationVersion checks only whether a DuckLake metadata store needs migration, without performing the backup. This is fast (<1s) and safe to call during startup without risking timeouts.
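This page does not say how the stored version is compared with targetVersion. One plausible approach, sketched below under that assumption, is a numeric comparison of dotted version parts. The names parseVersion and olderThan are hypothetical, and the sketch rejects versions that are not purely numeric (including the empty string), which the real package may handle differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a dotted version such as "0.3" into integer parts.
func parseVersion(v string) ([]int, error) {
	fields := strings.Split(v, ".")
	parts := make([]int, len(fields))
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", v, err)
		}
		parts[i] = n
	}
	return parts, nil
}

// olderThan reports whether current is strictly older than target,
// comparing part by part and treating missing trailing parts as zero.
func olderThan(current, target string) (bool, error) {
	c, err := parseVersion(current)
	if err != nil {
		return false, err
	}
	t, err := parseVersion(target)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(c) || i < len(t); i++ {
		var ci, ti int
		if i < len(c) {
			ci = c[i]
		}
		if i < len(t) {
			ti = t[i]
		}
		if ci != ti {
			return ci < ti, nil
		}
	}
	return false, nil // equal versions: no migration needed
}

func main() {
	needed, err := olderThan("0.3", "1.0")
	fmt.Println(needed, err)
}
```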

func DefaultDeltaCatalogPath

func DefaultDeltaCatalogPath(dlCfg Config) string

DefaultDeltaCatalogPath returns the default Delta Lake catalog location for a DuckLake-backed worker as a sibling of the DuckLake prefix at the same parent level. For s3://bucket/team/ducklake/ this returns s3://bucket/team/delta/, so per-tenant prefixes do not collapse to a shared bucket-root delta/.
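The sibling-prefix rule can be sketched as a small string transformation: drop the last path segment of the DuckLake prefix and put delta/ in its place. The function name siblingDeltaPath is hypothetical, and the behavior for a bare bucket with no prefix is an assumption, since the description above only covers prefixed paths.

```go
package main

import (
	"fmt"
	"strings"
)

// siblingDeltaPath is a hypothetical stand-in for DefaultDeltaCatalogPath.
// It replaces the last path segment of the DuckLake prefix with "delta/".
func siblingDeltaPath(prefix string) string {
	// Keep the URL scheme (e.g. "s3://") out of the segment search.
	scheme, rest := "", prefix
	if i := strings.Index(prefix, "://"); i >= 0 {
		scheme, rest = prefix[:i+3], prefix[i+3:]
	}
	rest = strings.TrimRight(rest, "/")
	if i := strings.LastIndex(rest, "/"); i >= 0 {
		// Keep everything up to the parent and swap the last segment.
		return scheme + rest[:i+1] + "delta/"
	}
	// Assumption: with no parent segment (a bare bucket), nest delta/ under it.
	return scheme + rest + "/delta/"
}

func main() {
	fmt.Println(siblingDeltaPath("s3://bucket/team/ducklake/"))
}
```

Working from the last segment rather than the bucket root is what keeps two tenants under s3://bucket/a/ducklake/ and s3://bucket/b/ducklake/ from sharing one delta/ location.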

func DeltaCatalogPath

func DeltaCatalogPath(dlCfg Config) string

DeltaCatalogPath resolves the Delta catalog path for a worker config: if the config sets an explicit DeltaCatalogPath, that wins; otherwise, fall back to the default sibling-of-DuckLake-prefix path.

func EnsureMigrationCheck

func EnsureMigrationCheck(dlCfg Config, dataDir string)

EnsureMigrationCheck runs the migration check, retrying on transient errors. Once the check succeeds (regardless of whether migration is needed), the result is locked in and subsequent calls are no-ops. If the check fails, the error is stored, but the next call retries the check. This prevents a transient failure (e.g., the metadata store not yet being reachable during pod startup) from permanently blocking all connections.

The backup file is written to dataDir. This should be called BEFORE acquiring the DuckLake attachment semaphore, since the backup can take minutes for large metadata stores.

func MigrationCheckError

func MigrationCheckError(connStr string) error

MigrationCheckError returns the cached error from the most recent migration check for connStr, or nil if the check has not run or has succeeded. Use this instead of MigrationNeeded when you need to bail out of the attach path on a failed pre-flight check.

func MigrationCheckedVersion

func MigrationCheckedVersion(connStr string) string

MigrationCheckedVersion returns the version found in the metadata store for connStr. It returns "" if the check has not run or the metadata store had no version.

func MigrationNeeded

func MigrationNeeded(connStr string) bool

MigrationNeeded reports whether the ATTACH statement for connStr should include AUTOMATIC_MIGRATION TRUE. It is safe to call after EnsureMigrationCheck.

Types

type Config

type Config struct {
	// MetadataStore is the connection string for the DuckLake metadata database.
	// Format: "postgres:host=<host> user=<user> password=<password> dbname=<db>"
	MetadataStore string

	// DisableMetadataThreadLocalCache disables postgres_scanner thread-local
	// connection caching for the hidden DuckLake metadata pool as early as
	// possible, before ATTACH creates that pool. This trades some warm-reuse
	// performance for a lower retained metadata-connection footprint.
	// Nil means use the server default (enabled).
	DisableMetadataThreadLocalCache *bool

	// ObjectStore is the S3-compatible storage path for DuckLake data files.
	// Format: "s3://bucket/path/" for S3/MinIO.
	// If not specified, uses DataPath for local storage.
	ObjectStore string

	// DataPath is the local file system path for DuckLake data files.
	// Used when ObjectStore is not set (for local/non-S3 storage).
	DataPath string

	// DeltaCatalogEnabled attaches the DuckDB Delta extension catalog at worker
	// startup/activation in addition to DuckLake. DeltaCatalogPath defaults to a
	// sibling delta/ prefix at the DuckLake object-store root when omitted.
	DeltaCatalogEnabled bool
	DeltaCatalogPath    string

	// S3 credential provider: "config" (explicit credentials) or "credential_chain"
	// (AWS SDK chain). Default: "config" if S3AccessKey is set, otherwise
	// "credential_chain".
	S3Provider string

	// S3 configuration for "config" provider (explicit credentials for MinIO or S3).
	S3Endpoint     string // e.g., "localhost:9000" for MinIO
	S3AccessKey    string // S3 access key ID
	S3SecretKey    string // S3 secret access key
	S3SessionToken string // STS session token for temporary credentials
	S3Region       string // S3 region (default: us-east-1)
	S3UseSSL       bool   // Use HTTPS for S3 connections (default: false for MinIO)
	S3URLStyle     string // "path" or "vhost" (default: "path" for MinIO compatibility)

	// S3 configuration for "credential_chain" provider (AWS SDK credential chain).
	// Chain specifies which credential sources to check, semicolon-separated.
	// Options: env, config, sts, sso, instance, process.
	// Default: checks all sources in AWS SDK order.
	S3Chain   string // e.g., "env;config" to check env vars then config files
	S3Profile string // AWS profile name to use (for "config" chain)

	// HTTPProxy routes DuckDB httpfs traffic through a forward HTTP proxy.
	// When set, DuckDB signs S3 requests for the real S3 hostname and sends them
	// through the proxy as plain HTTP (requires S3UseSSL=false). Used by the
	// local cache proxy DaemonSet for NVMe caching.
	HTTPProxy string

	// CheckpointInterval controls how often DuckLake CHECKPOINT runs.
	// CHECKPOINT performs full catalog maintenance: expire snapshots,
	// merge adjacent files, rewrite data files, and clean up orphaned files.
	// Set to 0 to disable. Default: 24h.
	CheckpointInterval time.Duration

	// DataInliningRowLimit controls the maximum number of rows to inline
	// in DuckLake metadata instead of writing to Parquet files.
	// Default: 0 (disabled). Set to a positive value to enable inlining.
	DataInliningRowLimit *int

	// Migrate is set by the control plane after running the migration check.
	// When true, AttachDuckLake uses AUTOMATIC_MIGRATION TRUE without
	// re-running the version check. This avoids redundant backups and
	// long-running checks in worker processes.
	Migrate bool `json:"migrate,omitempty" yaml:"-"`

	// SpecVersion is the target DuckLake spec version for this connection.
	// When empty, the worker uses its own built-in default.
	SpecVersion string `json:"spec_version,omitempty" yaml:"-"`

	// ViaPgBouncer is set by the control plane when the DuckLake metadata
	// connection is routed through a network-level pooler (e.g. PgBouncer)
	// rather than direct to Postgres. When true, the worker disables the
	// postgres_scanner in-process pool via `SET GLOBAL pg_pool_max_connections = 0`.
	// See duckdb/ducklake#1031: behind a network pooler, client-side pooling
	// is redundant and prevents the pooler from reclaiming idle connections.
	ViaPgBouncer bool `json:"via_pgbouncer,omitempty" yaml:"-"`
}

Config configures DuckLake catalog attachment.
