lakeorm

package module
v0.0.0-...-efe30ab Latest
Published: Apr 20, 2026 License: MIT Imports: 34 Imported by: 0

README

lake-orm

lake-orm is a batteries-included Go ORM for building on the lakehouse, compatible with Databricks, Delta, and Iceberg.

A Go ORM for Iceberg and Delta Lake tables over Spark Connect. CQRS-shaped, untyped at the core, typed at the edge. Typed structs in. Tagged columns out. A real Iceberg or Delta table on real object storage. No Python sidecars, no hand-rolled gRPC, no bronze landing zone filling up with half-parsed garbage.

Philosophy

The medallion architecture is not an architecture.

The bronze layer you've been told is "best practice" exists for two reasons, and neither of them is data quality. Spark is pull-based — until Spark Connect landed in April 2023, external applications couldn't push data into a running cluster, so everything had to land in S3 first for Spark to pull from. That's the first reason bronze exists. The second is Delta's optimistic concurrency control: concurrent MERGE operations conflict at the file level, so teams batch incoming data into one append-only table and merge in bulk to minimise contention. Both are engine limitations, not principles of OLAP.

Software engineers have validated inputs at the application boundary for decades. A Go struct, a protobuf message, a pydantic model — the type system is the contract, and the contract is enforced before anything touches persistent storage. Whether the storage is Postgres or parquet on S3 is orthogonal to where validation should live. Spark Connect — gRPC over HTTP/2, language-neutral — lets a stateful Go application dial directly into a cluster, construct a DataFrame from already-validated inputs, and write straight to silver. No bronze. No autoloader. No async pipeline that silently fails at 7pm. That's what lake-orm is: a Go ORM that pushes validation upstream and writes validated, tagged structs straight through Spark Connect into Iceberg or Delta.

Read the long form at callumdempseyleach.tech/writing/medallion-architecture. The short version: bronze is a workaround for Spark's IO model dressed up in a marketing department's best clothes. lake-orm treats it as such.

Features

  • Go structs as the schema contract. Tag a field lake:"id,pk" and the column, primary-key, merge-key, partition, and nullability all flow from the struct. lake:"...", lakeorm:"...", and spark:"..." parse equivalently — pick the spelling that reads best (they coexist the way gorm:"..." and json:"..." do on a Go struct).

  • Validation at the application boundary. lakeorm.Validate(records) runs every validate=... check before any I/O. HTTP-handler-shaped: fail fast, fail loud.

  • Two write paths, one API. Small batches go gRPC-direct; large ones stream through the fast path — parquet to object storage, then one Spark INSERT ... SELECT FROM parquet.<staging> so payloads never traverse your Go process.

  • Typed reads, three forms. Query[T] buffers into []T, QueryStream[T] yields one row at a time with backpressure, QueryFirst[T] returns *T. SQL is explicit — there is no chainable query builder.

  • CQRS for reads that involve joins. Drop to the raw DataFrame, write a result-shape struct, materialise with CollectAs[T] / StreamAs[T] / FirstAs[T]. One struct per projection.

  • Iceberg and Delta dialects. Pluggable — new dialects implement the same Dialect interface.

  • Three drivers. Write once, run against any of:

    • driver/spark — generic Spark Connect (spark.Remote(uri)). Self-hosted, EMR, Glue, lake-k8s.
    • driver/databricksconnect — Databricks Connect (databricksconnect.Driver(auth)). OAuth M2M + PAT, automatic token refresh, cluster-ID header. Uses the Spark Connect wire protocol against a Databricks workspace.
    • driver/databricks — Databricks native (databricks.Driver(*sql.DB)). Bring-your-own-connection via databricks-sql-go. You configure OAuth, warehouse selection, CloudFetch, session params, connection-pool lifecycle however your environment requires; lake-orm does the struct ↔ SQL translation through the same lake:"..." tags. See examples/databricks.

    All three implement the same lakeorm.Driver interface. The ORM surface is identical across them.

  • S3, GCS, file, memory backends. Pluggable — Backend is a public interface.

  • Django-style migration authoring. MigrateGenerate replays the most-recent .sql file's State-JSON header to reconstruct prior state, diffs against your struct, emits a goose-format file with -- DESTRUCTIVE: <reason> comments on risky ops. An atlas.sum manifest catches post-generation edits.

  • UUIDv7 ingest_id threading. Tag a string field auto=ingestID and every batch insert populates it with a time-sortable UUID used for the staging prefix + cleanup janitor.

  • Schema fingerprinting. lakeorm.SchemaFingerprint(Type) returns a sha256 hash of the parsed schema — drift detection at startup.

  • No telemetry. The library makes network calls only to endpoints you configured. No opt-in, no opt-out, no "anonymous aggregate usage".

Install

go get github.com/datalake-go/lake-orm

Quick start

import (
    "github.com/datalake-go/lake-orm"
    "github.com/datalake-go/lake-orm/backend"
    "github.com/datalake-go/lake-orm/dialect/iceberg"
    "github.com/datalake-go/lake-orm/driver/spark"
)

type User struct {
    ID    string `lake:"id,pk"`
    Email string `lake:"email,mergeKey,validate=email,required"`
}

store, _ := backend.S3("s3://bucket/lake")
db, _   := lakeorm.Open(spark.Remote("sc://host:15002"), iceberg.Dialect(), store)
defer db.Close()

_ = db.Migrate(ctx, &User{})
_ = db.Insert(ctx, []*User{{ID: "u1", Email: "alice@example.com"}})

users, _ := lakeorm.Query[User](ctx, db, "SELECT * FROM users WHERE id = ?", "u1")

For Databricks, swap the driver:

import "github.com/datalake-go/lake-orm/driver/databricksconnect"

drv := databricksconnect.Driver(databricksconnect.OAuthM2M{
    WorkspaceURL: "https://acme.cloud.databricks.com",
    ClientID:     os.Getenv("DATABRICKS_CLIENT_ID"),
    ClientSecret: os.Getenv("DATABRICKS_CLIENT_SECRET"),
    ClusterID:    "0123-456789-abcdef01",
})
db, _ := lakeorm.Open(drv, iceberg.Dialect(), store)

The rest of the code is identical — Dialect, Backend, Query[T], Migrate, everything. "Databricks Connect" is the specific Spark Connect client shape Databricks workspaces expose; it's distinct from the Databricks Workspace SDK (cluster / job administration), and the ORM names the driver accordingly.

For the composed runtime that wires this together with Spark + migrations + an optional dashboard, see lakehouse.

Examples

Under examples/:

  • basic — the hero path: migrate, validate, insert, query, DataFrame escape hatch.
  • bulk — the fast-path write: 10k rows inserted via ViaObjectStorage() in one call.
  • stream — QueryStream[T] with constant memory and early-break.
  • joins — CQRS: write-side entities, read-side result struct, CollectAs[T] for the join.
  • typed-helpers — Query[T] / QueryStream[T] / QueryFirst[T] / CollectAs[T] side by side.
  • delta — one-line dialect swap from Iceberg to Delta with deletion-vector + Z-order options.
  • lake-tag — the canonical lake:"..." tag with lakeorm:"..." / spark:"..." as accepted synonyms.
  • validation — built-in validators, custom validator registration, typed ValidationError.
  • ingest-id — UUIDv7 threading for batch reconciliation and staging-prefix cleanup.
  • migrations — Django-style offline migration authoring with MigrateGenerate + atlas.sum.
  • databricksconnect — Databricks Connect (OAuth M2M against a workspace cluster).
  • databricks — Databricks native (BYO *sql.DB via databricks-sql-go).

Longer-form docs: MIGRATIONS.md for the full migration workflow, TYPING.md for the typed/untyped boundary.

The tag trio

Three tag names are accepted and parse equivalently: lake:"...", lakeorm:"...", spark:"...". Think of them the way a Go struct already carries gorm:"..." and json:"..." side by side — one tag per reader, no shared contract required.

  • lake:"..." — canonical. New projects should default to this.
  • lakeorm:"..." — written-out synonym. Useful when a team already has single-word tags (db, json) and wants the library name to be unambiguous at a glance.
  • spark:"..." — historical driver-level tag. Retained because the spark-connect-go fork also reads it; leaving it in place lets the same struct be handed directly to a typed sparkconnect.Collect[T] call without a second set of tags.

Mixing tag names across fields of the same struct is fine. Mixing them on one field is a parse error (ErrInvalidTag) — the ambiguity of "which tag is the source of truth?" is caught at init.

Architecture

Intuition

Lakehouse tables are analytical, columnar, and append-heavy. ORMs designed for OLTP — one struct per entity, joins flattened into the struct graph — don't translate. Joins produce row shapes nobody declared up-front, and pretending otherwise makes queries return wrong types silently. A lakehouse ORM has to treat writes and reads as different shapes on purpose.

Approach

CQRS by structure. Write-side: one lake:"..."-tagged struct per table; the struct is the source of truth for the persisted schema. Read-side: one struct per projection, also tagged, used only to shape rows coming back. Joins go through raw DataFrame + a result-shape struct + CollectAs[T]. Typing is opt-in at the materialisation edge — DataFrame stays untyped at the core; Query[T] / QueryStream[T] / QueryFirst[T] are thin ergonomic wrappers, not chainable builders.

Three orthogonal concerns compose at Open:

  • Driver — transport (Spark Connect)
  • Dialect — data dialect (Iceberg, Delta: DDL grammar, MERGE, partitioning, schema-evolution rules)
  • Backend — where bytes live (S3, GCS, file, memory)

Implementation

Go structs (lake:"..." tags)
    │
    ▼
lakeorm.Client ──► Dialect.PlanInsert ──► Driver.Execute ──► Spark Connect
    │  (Iceberg / Delta DDL)                                       │
    │                                                              ▼
    ├──► lakeorm.Query[T] / QueryStream[T] / QueryFirst[T]    Iceberg / Delta
    │    (typed at the materialisation edge)                       │
    │                                                              ▼
    └──► db.DataFrame(ctx, sql) → CollectAs[T] / StreamAs[T]   Backend (S3 / GCS)
         (escape hatch for joins + aggregates)

The fast path (large-batch writes) is the most operationally load-bearing piece and is documented in TECH_SPEC.md. Parquet parts stream through a PartitionWriter onto object storage, then one INSERT INTO target SELECT * FROM parquet.<staging> lands them — the Go process orchestrates, never relays the bytes.

Documentation

Overview

Package lakeorm is the Go ORM for data lakes. Typed structs are the schema contract; validation happens at the application boundary; writes go directly to the target Iceberg or Delta table via Spark Connect — no bronze landing zone by default.

Entry points:

lakeorm.Open(driver, dialect, backend, opts...)
lakeorm.Query[T](ctx, db, sql, args...)  // typed one-shot query
lakeorm.Validate(records)                // boundary-validate before I/O

Composition is always three pieces: a Driver (transport — how we connect and execute), a Dialect (the data-dialect — DDL, DML, capabilities, semantics; Iceberg vs Delta), and a Backend (where the bytes live). Each is a stable public interface; concrete implementations live under driver/, dialect/, and backend/.

Index

Constants

This section is empty.

Variables

var (
	ErrNotImplemented    = errors.New("lakeorm: not implemented")
	ErrAlreadyCommitted  = errors.New("lakeorm: finalizer already committed")
	ErrNoRows            = errors.New("lakeorm: no rows")
	ErrSessionPoolClosed = errors.New("lakeorm: session pool closed")
	ErrInvalidTag        = errors.New("lakeorm: invalid struct tag")
	ErrUnknownDriver     = errors.New("lakeorm: unknown driver")
	ErrDriverMismatch    = errors.New("lakeorm: driver mismatch")
)

Sentinel errors used across the package.

Functions

func CollectAs

func CollectAs[T any](ctx context.Context, df DataFrame) ([]T, error)

CollectAs materialises every row of df into []T, hiding the type-assertion to the driver's native DataFrame. At v0 the only supported driver is Spark Connect, so df must unwrap to a sparksql.DataFrame — callers get ErrDriverMismatch otherwise.

Equivalent to:

sparkDF, ok := df.DriverType().(sparksql.DataFrame)
if !ok { ... }
rows, err := sparksql.Collect[T](ctx, sparkDF)

but compressed to a single call site. Use this when the result shape is known at compile time — the `spark:"..."` tags on T bind result columns to fields.

func FirstAs

func FirstAs[T any](ctx context.Context, df DataFrame) (*T, error)

FirstAs returns the first row of df decoded as T, or ErrNoRows if the DataFrame produced no rows. Symmetric with CollectAs / StreamAs — edge-typed materialisation, no chainable builders.

func IsClusterNotReady

func IsClusterNotReady(err error) bool

IsClusterNotReady is a convenience for errors.As callers.

func Query

func Query[T any](ctx context.Context, db Client, sql string, args ...any) ([]T, error)

Query runs a SQL statement against db and materialises every row into []T. One-shot ergonomic wrapper over db.DataFrame + CollectAs[T] — equivalent to:

df, err := db.DataFrame(ctx, sql, args...)
if err != nil { return nil, err }
return CollectAs[T](ctx, df)

but compressed to a single call. Use when the result shape is known at compile time — the `spark:"..."` tags on T bind result columns to fields.

Query is a top-level convenience, not a typed-query-builder abstraction. Filtering, ordering, projection, and joins belong in the SQL string, not in chainable Where/OrderBy/Select methods. Typed DataFrame transformations are deliberately absent — see TYPING.md for the design contract.

func QueryFirst

func QueryFirst[T any](ctx context.Context, db Client, sql string, args ...any) (*T, error)

QueryFirst runs a SQL statement and returns the first matching row decoded as T. Returns ErrNoRows if the result is empty. Ergonomic wrapper over db.DataFrame + FirstAs[T].

func QueryStream

func QueryStream[T any](ctx context.Context, db Client, sql string, args ...any) iter.Seq2[T, error]

QueryStream runs a SQL statement and yields T values one at a time with constant memory. Rangeable via Go 1.23's iter.Seq2:

for row, err := range lakeorm.QueryStream[User](ctx, db,
    "SELECT * FROM users WHERE country = ?", "UK") {
    if err != nil { break }
    // use row
}

Schema mismatch (a field in T that the projection doesn't contain) surfaces through the iterator's error channel on the first row.

func RegisterValidator

func RegisterValidator(name string, fn ValidatorFunc)

RegisterValidator installs a custom validator callable via `spark:"...,validate=name"`. Safe to call from init().

func SchemaFingerprint

func SchemaFingerprint(v any) (string, error)

SchemaFingerprint returns a stable SHA-256 over the expected LakeSchema for v (or reflect.Type(v) if v is a reflect.Type). The fingerprint is:

  • deterministic across process invocations
  • insensitive to field declaration order within a struct (so struct refactors that only reorder fields don't invalidate previously-applied migrations)
  • sensitive to column names, SQL-level types (reflect.Kind plus the time.Time special case), nullability, pk/mergeKey membership, and the table name

Client.AssertSchema hashes the compiled expectation and compares against the catalog's runtime fingerprint; a mismatch surfaces as a startup error rather than a silent drift.

func StreamAs

func StreamAs[T any](ctx context.Context, df DataFrame) iter.Seq2[T, error]

StreamAs yields decoded T values one at a time. Constant memory regardless of result size. Iteration uses Go 1.23's iter.Seq2 so callers can range directly:

for row, err := range lakeorm.StreamAs[User](ctx, df) { ... }

Schema binding happens on the first row; a mismatch surfaces through the second return value on the first iteration.

func Table

func Table(model any, name string)

Table overrides the derived table name for a Go type. Call once at init, before any Insert / Query / Migrate.

func Validate

func Validate(records any) error

Validate runs all registered validators on a record or slice of records. It is the public entry point — callable from HTTP handlers to fail fast before any I/O happens. Insert calls it internally; callers who want the fast-fail behaviour at the API boundary can also call it directly first.

func Verify

func Verify(_ context.Context, _ Client) error

Verify runs an end-to-end reachability probe against an opened Client. Writes a small probe parquet via the Backend, reads it back via the Driver, compares. Returns nil if all three legs (client → backend, driver → backend, URI agreement) succeed.

v0 stub — implementation lands in a follow-up commit. The signature is stable so callers can wire it into service startup today.

Types

type AutoBehavior

type AutoBehavior int

AutoBehavior describes auto-populated column content. Client.Insert visits every field with AutoBehavior != AutoNone before validation and writes the canonical value for the behaviour.

const (
	AutoNone AutoBehavior = iota
	AutoCreateTime
	AutoUpdateTime
	// AutoIngestID marks a string field to receive the UUIDv7
	// ingest_id of the current Insert operation. Useful for
	// correlating rows in the target table with the staging prefix
	// they were uploaded through. Only string-typed fields are
	// accepted; non-string fields produce ErrInvalidTag at first
	// use. See INGEST_ID.md for the full flow.
	AutoIngestID
)

type Backend

type Backend interface {
	Name() string

	// RootURI returns the URI that the Driver will interpolate into
	// its SQL. Client and Driver must resolve this string to the same
	// physical storage; endpoint/credential differences are handled
	// per-actor, not in Backend.
	RootURI() string

	TableLocation(tableName string) types.Location
	StagingPrefix(ingestID string) string
	// StagingLocation returns the absolute URI that Spark should read
	// from (e.g. "s3a://bucket/lake/staging/<id>"). Distinct from
	// StagingPrefix — Spark's Hadoop-AWS integration requires s3a://
	// even though the Backend's own SDK calls use s3://. The scheme
	// translation happens here, not in the Dialect.
	StagingLocation(ingestID string) types.Location

	Writer(ctx context.Context, key string) (io.WriteCloser, error)
	Reader(ctx context.Context, key string) (io.ReadCloser, error)
	Delete(ctx context.Context, key string) error
	List(ctx context.Context, prefix string) ([]string, error)

	// CleanupStaging removes every object under prefix. Called by
	// Finalizer.Abort and by the staging-TTL janitor.
	CleanupStaging(ctx context.Context, prefix string) error
}

Backend is where the bytes live. Each concrete backend owns its SDK directly (aws-sdk-go-v2 for S3, cloud.google.com/go/storage for GCS, stdlib for File / Memory) — no generic abstraction layer.

type CleanupReport

type CleanupReport struct {
	Deleted []string
	Failed  []string
	Scanned int
}

CleanupReport summarises CleanupStaging's pass. Deleted carries prefix URIs that were removed; Failed carries prefixes that matched the TTL cutoff but whose delete call errored — the operator can retry selectively.

type Client

type Client interface {
	// Core CRUD. records is *T, []*T, or []T where T has `spark` tags.
	// Validation runs before any I/O.
	Insert(ctx context.Context, records any, opts ...InsertOption) error
	InsertRaw(ctx context.Context, records any, opts ...InsertOption) RawInsertion
	Update(ctx context.Context, records any, opts ...UpdateOption) error
	Upsert(ctx context.Context, records any, opts ...UpsertOption) error
	Delete(ctx context.Context, records any, opts ...DeleteOption) error

	// Query is the dynamic (non-generic) entry point. Prefer the
	// top-level lakeorm.Query[T] / QueryStream[T] / QueryFirst[T]
	// helpers when T is known at compile time.
	Query(ctx context.Context) QueryBuilder

	// Escape hatches to raw SQL / DataFrame / Spark.
	Exec(ctx context.Context, sql string, args ...any) (ExecResult, error)
	DataFrame(ctx context.Context, sql string, args ...any) (DataFrame, error)
	Session(ctx context.Context) (*PooledSession, error)

	// Migrate is the bootstrap path — idempotent CREATE TABLE IF NOT
	// EXISTS derived from struct tags. Sufficient for dev + fresh
	// tables; schema evolution on existing tables goes through
	// lake-goose (authoring via MigrateGenerate, execution via the
	// goose-spark-ish CLI or goose's library API against the
	// database/sql driver in datalake-go/spark-connect-go).
	Migrate(ctx context.Context, models ...any) error
	Maintain() Maintenance
	Ping(ctx context.Context) error
	Close() error

	// MetricsRegistry returns the Prometheus registry lakeorm writes
	// runtime metrics into. Returns nil at v0 as a placeholder for
	// v1+ integration.
	MetricsRegistry() *prometheus.Registry

	// MigrateGenerate writes iceberg/delta-dialect .sql files for any
	// pending struct diffs into dir, in goose's migration-file format.
	// Destructive operations (DROP COLUMN, RENAME COLUMN, type
	// narrowings, NOT-NULL tightenings) land with a `-- DESTRUCTIVE:
	// <reason>` informational comment so the reviewer notices them in
	// the PR diff.
	//
	// Execution is not this library's job — it belongs to lake-goose
	// running against the Spark Connect database/sql driver. An
	// atlas.sum manifest is emitted alongside so downstream tooling
	// can detect post-generation edits.
	//
	// Returns the list of generated file paths.
	MigrateGenerate(ctx context.Context, dir string, structs ...any) ([]string, error)

	// AssertSchema verifies the catalog's current schema matches
	// compiled code's expectation (SchemaFingerprint) for each
	// struct. Recommended at app startup after lakeorm.Verify.
	// v0 stub — the DESCRIBE TABLE catalog read lands in v1.
	AssertSchema(ctx context.Context, structs ...any) error
}

Client is the main entry point. All methods are safe to call concurrently; the internal session pool serializes per-session state as needed.

func Local

func Local() (Client, error)

Local is an intentional no-op in the root package — it exists only so godoc surfaces the "how do I run this locally?" question here. The real implementation lives in the `local` subpackage:

import lakeormlocal "github.com/datalake-go/lake-orm/local"
db, err := lakeormlocal.Open()

The split exists because the real Local() must import the spark, iceberg, and backend driver packages, each of which already imports this package for the Driver/Dialect/Backend interfaces — composing them at the top-level would introduce an import cycle. Making Local() a subpackage avoids the cycle and matches database/sql's pattern of sub-package factory imports.

func Open

func Open(driver Driver, dialect Dialect, backend Backend, opts ...ClientOption) (Client, error)

Open composes a Client from a Driver, a Dialect, a Backend, and options. All three positional arguments are required.

func OpenFromConfig

func OpenFromConfig(_ context.Context, _ string) (Client, error)

OpenFromConfig loads lakeorm.toml from disk and opens a Client. v0 stub — real TOML parsing lands alongside the `cmd/dorm stack` subcommand.

type ClientOption

type ClientOption func(*clientConfig)

ClientOption configures the Client at construction time. Functional options so the struct stays extensible without API breakage.

func WithCompression

func WithCompression(c Compression) ClientOption

WithCompression selects the compression codec for fast-path parquet parts. Default ZSTD — 2-3x smaller than Snappy on the shapes we care about (KSUIDs, repeated S3 paths), natively readable by all engines.

func WithDefaultCatalog

func WithDefaultCatalog(name string) ClientOption

WithDefaultCatalog / WithDefaultDatabase set fallbacks for unqualified table names.

func WithDefaultDatabase

func WithDefaultDatabase(name string) ClientOption

func WithDefaultIdempotencyTTL

func WithDefaultIdempotencyTTL(d time.Duration) ClientOption

WithDefaultIdempotencyTTL sets how long idempotency tokens remain valid in the dedup table. Default 24h.

func WithFastPathThreshold

func WithFastPathThreshold(bytes int) ClientOption

WithFastPathThreshold is the advisory byte crossover the Dialect uses to choose between gRPC and object-storage ingest. Default 128 MiB.

func WithLogger

func WithLogger(l zerolog.Logger) ClientOption

WithLogger sets the zerolog logger for the Client. Passed into every driver/format/backend component via the plumbing in lakeorm.Open.

func WithMaxInflightIngests

func WithMaxInflightIngests(n int) ClientOption

WithMaxInflightIngests caps the number of fast-path ingests that can be running concurrently. The bounded semaphore propagates backpressure into Write() — without it a bursty Go producer OOMs the process even when the partition writer "flushes correctly." Default 4.

func WithSessionPoolSize

func WithSessionPoolSize(n int) ClientOption

WithSessionPoolSize controls how many Spark Connect sessions the Client borrows in flight. Default 8 — large enough for moderate concurrency, small enough to bound cluster-side overhead. Increase for high-fanout services; decrease on tight clusters.

type ColumnInfo

type ColumnInfo struct {
	Name     string
	DataType string
	Nullable bool
}

type Compression

type Compression int

Compression selects the codec used by the fast-path partition writer.

const (
	CompressionZSTD Compression = iota
	CompressionSnappy
	CompressionGzip
	CompressionUncompressed
)

type ConflictStrategy

type ConflictStrategy int

ConflictStrategy for Insert.

const (
	ErrorOnConflict ConflictStrategy = iota
	IgnoreOnConflict
	UpdateOnConflict
)

type DataFrame

type DataFrame interface {
	Schema(ctx context.Context) ([]ColumnInfo, error)
	Collect(ctx context.Context) ([][]any, error)
	Count(ctx context.Context) (int64, error)
	Stream(ctx context.Context) iter.Seq2[Row, error]

	// DriverType returns the Driver's native DataFrame handle for
	// callers that need to drop to the underlying API. Kept as any
	// to avoid leaking driver-specific types into the public
	// signature; callers type-assert to the concrete type they
	// expect (e.g. sparksql.DataFrame). Prefer the lakeorm.CollectAs
	// / lakeorm.StreamAs helpers when materialising into a typed
	// result struct.
	DriverType() any
}

DataFrame is the Driver-agnostic handle to a remote DataFrame. At v0 it wraps the Spark Connect DataFrame; v1+ drivers may implement their own. The interface is deliberately minimal — anything fancier is available by unwrapping with DriverType().

type DeleteOption

type DeleteOption func(*deleteConfig)

type DeleteRequest

type DeleteRequest struct {
	Ctx     context.Context
	Schema  *LakeSchema
	Where   string
	Args    []any
	Backend Backend
}

type Dialect

type Dialect interface {
	Name() string

	CreateTableDDL(schema *LakeSchema, loc types.Location) (string, error)
	AlterTableDDL(schema *LakeSchema, existing *TableInfo) ([]string, error)

	PlanInsert(req WriteRequest) (ExecutionPlan, error)
	PlanUpsert(req UpsertRequest) (ExecutionPlan, error)
	PlanDelete(req DeleteRequest) (ExecutionPlan, error)
	PlanQuery(req QueryRequest) (ExecutionPlan, error)

	IndexStrategy(intent IndexIntent) IndexStrategy
	LayoutStrategy(intent LayoutIntent) LayoutStrategy

	Maintenance() Maintenance
}

Dialect describes the data-dialect opinion: DDL shape, DML shape, capabilities, and semantics for Iceberg vs Delta vs any future lakehouse dialect. "Data dialect" rather than "format" because it's not just on-disk layout — it covers CREATE TABLE clauses, MERGE semantics, table properties, partition grammar, and schema-evolution rules. Exactly one Dialect per Client at v0.

type Driver

type Driver interface {
	Name() string

	// Execute runs a plan and returns a Finalizer for two-phase commit.
	// Single-phase plans (e.g. small-batch direct ingest) return a
	// no-op Finalizer so callers never need to branch.
	Execute(ctx context.Context, plan ExecutionPlan) (Result, Finalizer, error)

	// ExecuteStreaming runs a read plan and returns a pull-based row
	// stream. No finalizer — reads don't stage.
	ExecuteStreaming(ctx context.Context, plan ExecutionPlan) (RowStream, error)

	// DataFrame is the escape hatch — raw SQL to a DataFrame that the
	// caller can then chain Spark operations on.
	DataFrame(ctx context.Context, sql string, args ...any) (DataFrame, error)

	// Exec runs a SQL statement that returns no rows (DDL, DML that
	// doesn't need the Dialect-planned path).
	Exec(ctx context.Context, sql string, args ...any) (ExecResult, error)

	Close() error
}

Driver is the connection + execution mechanism. At v0 both concrete Drivers are Spark Connect variants (spark.Remote, databricksconnect.Driver) — same protocol, different connection setup.

type ErrClientStaging

type ErrClientStaging struct {
	URI         string
	BackendName string
	Op          string // "stage-write" | "probe-write" | "cleanup"
	Cause       error
}

ErrClientStaging is returned when the Go client fails to write a staging object via Backend. Indicates a client-side reachability or credential problem.

func (*ErrClientStaging) Error

func (e *ErrClientStaging) Error() string

func (*ErrClientStaging) Unwrap

func (e *ErrClientStaging) Unwrap() error

type ErrClusterNotReady

type ErrClusterNotReady struct {
	State     string // e.g. "Pending", "PENDING"
	RequestID string
	Message   string
	Cause     error
}

ErrClusterNotReady indicates the backing Spark cluster is warming up and the operation should be retried. Databricks in particular returns [FailedPrecondition] errors with state Pending during cluster startup — without this typed error, callers get an opaque gRPC failure and no way to distinguish "cluster warming up, retry" from "cluster dead, give up."

func NewClusterNotReady

func NewClusterNotReady(err error) *ErrClusterNotReady

NewClusterNotReady parses a Databricks gRPC error for the [FailedPrecondition] + "state Pending" pattern and returns a typed error. Returns nil if the error doesn't match. The string-matching looks fragile, but it's the canonical detection pattern and has been stable across multiple Databricks runtime versions.

func (*ErrClusterNotReady) Error

func (e *ErrClusterNotReady) Error() string

func (*ErrClusterNotReady) IsRetryable

func (e *ErrClusterNotReady) IsRetryable() bool

IsRetryable marks this error as safe to retry after backoff.

func (*ErrClusterNotReady) Unwrap

func (e *ErrClusterNotReady) Unwrap() error

type ErrDriverRead

type ErrDriverRead struct {
	StagingURI string
	DriverName string
	Cause      error
}

ErrDriverRead is returned when the compute driver fails to read a staging prefix that the client wrote successfully. Indicates the driver cannot see (or cannot authenticate against) storage the client can reach — the split-view failure mode where client and driver credentials diverge silently.

func (*ErrDriverRead) Error

func (e *ErrDriverRead) Error() string

func (*ErrDriverRead) Unwrap

func (e *ErrDriverRead) Unwrap() error

type ErrURIMismatch

type ErrURIMismatch struct {
	ClientURI string
	DriverURI string
	Detail    string
}

ErrURIMismatch is returned when client and driver both succeed at their respective operations but resolve the same URI to different physical storage. Detected by lakeorm.Verify via probe comparison.

func (*ErrURIMismatch) Error

func (e *ErrURIMismatch) Error() string

type ExecResult

type ExecResult struct {
	RowsAffected int64
}

ExecResult mirrors database/sql.Result for the raw Exec escape hatch.

type ExecutionPlan

type ExecutionPlan struct {
	Kind    PlanKind
	SQL     string         // For KindSQL / KindStream / KindDDL
	Args    []any          // SQL-parameter bindings
	Target  string         // table name for writes
	Staging StagingRef     // populated for KindParquetIngest
	Rows    any            // typed slice for KindDirectIngest
	Schema  *LakeSchema    // referenced by the Driver for type-aware ingest
	Options map[string]any // Dialect-specific hints opaque to the Driver
}

ExecutionPlan is the opaque artifact a Dialect hands to a Driver. The Driver executes the plan without knowing the Dialect's name. Internally it is a tagged union of variants (DirectIngest, ParquetIngest, SQL, Stream); Dialect constructs them, Driver reads the Kind and dispatches.

The type is intentionally a struct (not an interface) so the wire shape can evolve without breaking v1+ drivers that consume plans they didn't construct.

type FieldError

type FieldError struct {
	Field   string
	Column  string
	Message string
	Value   any
}

FieldError is one per-field failure.

type Finalizer

type Finalizer interface {
	Commit(ctx context.Context) error
	Abort(ctx context.Context) error
}

Finalizer is the commit phase of a two-phase write. Modeled on database/sql.Tx: call Commit on success, Abort otherwise, safely defer Abort for the error path. Commit is idempotent.

type IndexIntent

type IndexIntent int

IndexIntent captures the `indexed` / `sortable` / `mergeKey` tag intent. Dialects read this and return a concrete IndexStrategy.

const (
	IntentNone IndexIntent = iota
	IntentIndexed
	IntentSortable
	IntentMergeKey
)

type IndexStrategy

type IndexStrategy string

IndexStrategy / LayoutStrategy are opaque to the core — they are stringified descriptors that Dialect implementations emit for their own DDL generation. Kept as strings rather than typed-per-Dialect enums to avoid a compatibility matrix in the core package.

type InsertOption

type InsertOption func(*insertConfig)

InsertOption configures a single Insert call.

func OnConflict

func OnConflict(strategy ConflictStrategy) InsertOption

OnConflict sets the behaviour for primary-key collisions.

func ViaGRPC

func ViaGRPC() InsertOption

ViaGRPC forces the direct gRPC ingest path regardless of batch size. Useful when latency matters more than throughput.

func ViaObjectStorage

func ViaObjectStorage() InsertOption

ViaObjectStorage forces the S3-Parquet fast path regardless of batch size. Useful for benchmarks or when you know the batch is large.

func WithIdempotencyKey

func WithIdempotencyKey(key string) InsertOption

WithIdempotencyKey sets the idempotency token. If omitted, a SortableID is generated automatically.
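The options above follow the standard Go functional-options pattern. A minimal sketch of how they compose, with local stand-in types (the real `insertConfig` is unexported and its fields are assumptions here):

```go
package main

import "fmt"

// Stand-in enums mirroring the documented types.
type ConflictStrategy int

const (
	ConflictError ConflictStrategy = iota
	ConflictIgnore
)

type WritePath int

const (
	WritePathAuto WritePath = iota
	WritePathGRPC
	WritePathObjectStorage
)

// insertConfig's fields are illustrative; the real struct is unexported.
type insertConfig struct {
	onConflict     ConflictStrategy
	path           WritePath
	idempotencyKey string
}

type InsertOption func(*insertConfig)

func OnConflict(s ConflictStrategy) InsertOption { return func(c *insertConfig) { c.onConflict = s } }
func ViaObjectStorage() InsertOption {
	return func(c *insertConfig) { c.path = WritePathObjectStorage }
}
func WithIdempotencyKey(k string) InsertOption { return func(c *insertConfig) { c.idempotencyKey = k } }

// applyOptions folds the variadic options over the zero-value
// defaults — the shape Client.Insert would use internally.
func applyOptions(opts ...InsertOption) insertConfig {
	cfg := insertConfig{}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := applyOptions(OnConflict(ConflictIgnore), ViaObjectStorage(), WithIdempotencyKey("order-batch-42"))
	fmt.Println(cfg.onConflict == ConflictIgnore, cfg.path == WritePathObjectStorage, cfg.idempotencyKey)
}
```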

type LakeField

type LakeField struct {
	Name         string // Go field name
	Column       string // DB column name
	Index        []int  // reflect.FieldByIndex path
	Type         reflect.Type
	IsNullable   bool
	IsPointer    bool
	AutoBehavior AutoBehavior
	IsJSON       bool
	IsRequired   bool
	Intent       IndexIntent
	Layout       LayoutIntent
	Validators   []string // names resolved against the validator registry
	Ignored      bool
}

LakeField is the parsed metadata for one struct field.

type LakePartitionSpec

type LakePartitionSpec struct {
	FieldIndex int
	Strategy   PartitionStrategy
	Param      int // N for Bucket(N) / Truncate(N); zero otherwise
}

LakePartitionSpec names the field index and strategy of a partition.

type LakeSchema

type LakeSchema struct {
	GoType      reflect.Type
	TableName   string
	Fields      []LakeField
	PrimaryKeys []int
	MergeKeys   []int
	Partitions  []LakePartitionSpec
}

LakeSchema is the parsed form of a struct tagged with `spark:"..."`.

func ParseSchema

func ParseSchema(t reflect.Type) (*LakeSchema, error)

ParseSchema extracts LakeSchema from a Go type via reflection. Cached per-type; safe to call repeatedly.

func (*LakeSchema) ColumnNames

func (s *LakeSchema) ColumnNames() []string

ColumnNames returns the column names in declaration order, skipping ignored fields. Used by the query builder to emit projection-pushdown SELECTs instead of SELECT *.

type LayoutIntent

type LayoutIntent int

LayoutIntent is the coarse layout hint currently exposed — may grow later to cover clustering, data-skipping ranges, etc.

const (
	LayoutNone LayoutIntent = iota
	LayoutSortable
)

type LayoutStrategy

type LayoutStrategy string

IndexStrategy / LayoutStrategy are opaque to the core — they are stringified descriptors that Dialect implementations emit for their own DDL generation. Kept as strings rather than typed-per-Dialect enums to avoid a compatibility matrix in the core package.

type Maintenance

type Maintenance interface {
	Optimize(ctx context.Context, table string, opts MaintenanceOptions) error
	Vacuum(ctx context.Context, table string, opts VacuumOptions) error
	Stats(ctx context.Context, table string) (TableStats, error)
}

Maintenance is the Dialect-level physical-optimization surface. At v0 all implementations return ErrNotImplemented — the interface exists so the Client API doesn't shift when v1 fleshes each Dialect out.

type MaintenanceOptions

type MaintenanceOptions struct {
	Filter string
	ZOrder []string
}

MaintenanceOptions / VacuumOptions are the v0 stubs. Specific Dialects may carry richer option structs behind Dialect.Maintenance() as v1 lands.

type MergeAction

type MergeAction int

MergeAction is the Upsert conflict strategy.

const (
	MergeUpdate MergeAction = iota
	MergeIgnore
	MergeError
)

type MergeFuture

type MergeFuture interface {
	Wait(ctx context.Context) error
}

MergeFuture is returned by RawInsertion.AsyncThenMerge. Stub at v0.

type MergeOpts

type MergeOpts struct {
	Key         string
	TargetTable string
	Filter      string
}

MergeOpts is passed to RawInsertion.ThenMerge for the raw-then-merge escape hatch.

type OrderSpec

type OrderSpec struct {
	Column string
	Desc   bool
}

type ParquetSchema

type ParquetSchema struct {
	// contains filtered or unexported fields
}

ParquetSchema is the bridge between a LakeSchema (the user's lake-orm-tagged model) and a parquet schema (what the fast-path writer serializes against). The whole point: users tag their structs with `spark:"..."` once; they do NOT need to also write `parquet:"..."` tags. The translation lives here.

Internally it synthesizes an anonymous struct type at runtime whose parquet tags are derived from the lake-orm tags, then hands that type to parquet.SchemaOf. At write time, user rows are projected field-by-field into the synthesized struct.

func BuildParquetSchema

func BuildParquetSchema(lake *LakeSchema) (*ParquetSchema, error)

BuildParquetSchema translates a LakeSchema into a parquet schema + row projector. The returned ParquetSchema's methods are cheap to call: the reflection work happens once here and is reused on every Write.

The fast-path writer (internal/parquet) wires this up automatically when Insert routes through object storage — users don't construct ParquetSchema directly.

func (*ParquetSchema) Convert

func (p *ParquetSchema) Convert(row any) any

Convert projects a user-struct row into the synthesized-struct row parquet-go writes. Accepts either T or *T; same semantics either way. Intended to be passed as the RowConverter to the internal partition writer.

func (*ParquetSchema) Schema

func (p *ParquetSchema) Schema() *pq.Schema

Schema returns the underlying parquet schema. Pass to parquet.NewWriter (or to the internal partition writer).

type PartitionStrategy

type PartitionStrategy int

PartitionStrategy is the tag-declared partition intent. Dialect implementations translate this into a physical strategy (Iceberg bucket, Delta ZORDER, etc.) via IndexStrategy().

const (
	PartitionNone PartitionStrategy = iota
	PartitionRaw
	PartitionBucket
	PartitionTruncate
)

type PlanKind

type PlanKind int

PlanKind identifies the plan variant. Values are stable — drivers branch on them.

const (
	KindSQL PlanKind = iota
	KindStream
	KindDDL
	KindDirectIngest
	KindParquetIngest
)
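A Driver consumes these values with a plain switch. The sketch below is a hypothetical Driver-side dispatch (the route names are invented labels), showing why stable PlanKind values matter — a v1 driver must reject kinds it doesn't know rather than misroute them:

```go
package main

import "fmt"

// PlanKind values copied from the const block above; they are stable,
// so a Driver can branch on them without knowing which Dialect built
// the plan.
type PlanKind int

const (
	KindSQL PlanKind = iota
	KindStream
	KindDDL
	KindDirectIngest
	KindParquetIngest
)

// dispatch routes a plan to an execution path and rejects kinds from
// a newer core. Route names are illustrative.
func dispatch(kind PlanKind) (string, error) {
	switch kind {
	case KindSQL, KindDDL:
		return "exec-sql", nil
	case KindStream:
		return "stream-rows", nil
	case KindDirectIngest:
		return "grpc-ingest", nil
	case KindParquetIngest:
		return "staging-insert-select", nil
	default:
		return "", fmt.Errorf("unknown plan kind %d", kind)
	}
}

func main() {
	route, _ := dispatch(KindParquetIngest)
	fmt.Println(route)
}
```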

type PooledSession

type PooledSession struct {
	// contains filtered or unexported fields
}

PooledSession is the typed-session handle returned by Client.Session. v0: wraps driver.DataFrame/Exec; in v1 it carries the raw scsql.SparkSession through for users who need it directly.

func (*PooledSession) Exec

func (s *PooledSession) Exec(ctx context.Context, query string) (ExecResult, error)

func (*PooledSession) Sql

func (s *PooledSession) Sql(ctx context.Context, query string) (DataFrame, error)

func (*PooledSession) Table

func (s *PooledSession) Table(name string) (DataFrame, error)

type QueryBuilder

type QueryBuilder interface {
	From(table string) QueryBuilder
	Where(sql string, args ...any) QueryBuilder
	OrderBy(col string, desc bool) QueryBuilder
	Limit(n int) QueryBuilder
	Offset(n int) QueryBuilder
	Select(cols ...string) QueryBuilder

	Collect(ctx context.Context) ([][]any, error)
	DataFrame(ctx context.Context) (DataFrame, error)
}

QueryBuilder is the dynamic (non-generic) query entry point for cases where the result type isn't known at compile time. Prefer the top-level Query[T] generic when possible.

type QueryRequest

type QueryRequest struct {
	Ctx      context.Context
	Schema   *LakeSchema
	Table    string
	Columns  []string
	Where    string
	WhereArg []any
	OrderBy  []OrderSpec
	Limit    int
	Offset   int
}

QueryRequest is what the Client hands the Dialect on query execution.

type RawInsertion

type RawInsertion interface {
	ThenMerge(ctx context.Context, opts MergeOpts) error
	AsyncThenMerge(opts MergeOpts) MergeFuture
	Commit(ctx context.Context) error
}

RawInsertion is the opt-in raw-then-merge escape hatch for genuinely untrusted external inputs (research datasets, untyped third-party feeds, CSVs of unknown provenance). It is NOT the default — db.Insert writes straight to the target table. Bronze-style landing zones exist here as a conscious opt-in, never as an accidental default.

type Result

type Result struct {
	RowsAffected int64
}

Result is the outcome of a single ExecutionPlan that did not produce rows (writes, DDL). Reads return a RowStream instead.

type Row

type Row interface {
	Values() []any
	Columns() []string
}

Row is a Driver-agnostic row handle. Scanner consumes these to populate typed structs.

type RowStream

type RowStream = iter.Seq2[Row, error]

RowStream is the Driver's streaming primitive — one Row per iteration, constant memory, natural backpressure. It is an alias for iter.Seq2[Row, error], so it is rangeable directly.

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner provides reflection-based row scanning into Go structs via the lake-orm tag trio (`lake`, `lakeorm`, `spark`). It is the Driver-agnostic counterpart of what every Driver's underlying protocol produces (Spark Connect Rows, DuckDB vectors, future Arrow Flight batches) — each Driver converts its native rows into lakeorm.Row and the Scanner takes over.

Two design decisions worth calling out:

  • sqlx/reflectx for field resolution (embedded structs, dot-notation, pointer-to-struct auto-init). Rolling our own would diverge from every other sqlx-using Go service on edge cases.
  • scannerTarget (defined below) for nullable custom types that implement sql.Scanner — SortableID, Location, etc. The wrapper tracks NULL separately from the underlying Scan so pointer fields can be set to nil rather than zero-valued.

sqlx/reflectx binds one tag key per mapper, so we carry three in priority order (lake > lakeorm > spark) and consult them in turn when resolving a column. Same precedence as tag.go's effectiveTag.

func NewScanner

func NewScanner() *Scanner

NewScanner creates a Scanner backed by sqlx's reflectx field mapper. One mapper per accepted tag key — lookups walk them in priority order.

func (*Scanner) ScanRow

func (s *Scanner) ScanRow(row Row, dest any, _ *LakeSchema) error

ScanRow scans a single Row produced by a Driver into dest (a *T). The LakeSchema parameter is currently unused (reserved for column-intent introspection); the actual mapping is by column name via the reflectx mapper.

func (*Scanner) ScanRows

func (s *Scanner) ScanRows(rows *sql.Rows, destSlicePtr any) error

ScanRows scans sql.Rows into a slice of struct pointers. Kept from the production port so existing database/sql-style callers have a path when they use Driver.Exec + database/sql behind the scenes. destSlicePtr must be *[]*T.

type StagingLister

type StagingLister interface {
	ListStagingPrefixes(ctx context.Context) ([]StagingPrefix, error)
}

StagingLister is the optional backend extension Client.CleanupStaging relies on. Backends that can enumerate their _staging/ namespace implement this; backends that can't leave it unimplemented and CleanupStaging returns ErrNotImplemented.

Kept as a separate interface (not folded into Backend) so the Backend contract stays narrow: the fast-path write path only needs StagingPrefix / StagingLocation / CleanupStaging by-URI, and that's the minimum any backend must implement.

type StagingPrefix

type StagingPrefix struct {
	URI      string
	IngestID string
}

StagingPrefix is one entry a StagingLister backend yields. URI is the absolute backend URI; IngestID is the final path segment, which CleanupStaging parses as a UUIDv7.

type StagingRef

type StagingRef struct {
	Backend  Backend
	Prefix   string
	PartKeys []string
	Location types.Location
}

StagingRef names the URI prefix and parts produced by a fast-path partition writer. The Driver reads this to emit the `INSERT ... SELECT FROM parquet.<prefix>/*.parquet` statement.

type TableInfo

type TableInfo struct {
	Name       string
	Location   types.Location
	Columns    []ColumnInfo
	Partitions []string
	SnapshotID string
}

type TableStats

type TableStats struct {
	NumRows       int64
	NumDataFiles  int64
	TotalBytes    int64
	SnapshotCount int
}

type UpdateOption

type UpdateOption func(*updateConfig)

UpdateOption / UpsertOption / DeleteOption are kept as stubs so the Client signature doesn't shift when v1 adds real tuning knobs.

type UpsertOption

type UpsertOption func(*upsertConfig)

type UpsertRequest

type UpsertRequest struct {
	Ctx     context.Context
	Schema  *LakeSchema
	Records any
	OnMatch MergeAction
	Backend Backend
}

UpsertRequest / DeleteRequest share the same shape at v0.

type VacuumOptions

type VacuumOptions struct {
	RetentionHours int
}

type ValidationError

type ValidationError struct {
	Errors []FieldError
}

ValidationError aggregates per-field failures so the caller sees every problem rather than just the first.

func (*ValidationError) Error

func (e *ValidationError) Error() string

func (*ValidationError) Field

func (e *ValidationError) Field(name string) *FieldError

Field returns the FieldError for a specific struct field, or nil.
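A self-contained sketch of the aggregate-everything behaviour, with local copies of the two types (the real Error() message format is an assumption):

```go
package main

import (
	"fmt"
	"strings"
)

// FieldError / ValidationError mirror the types above. Error joins
// every per-field failure so the caller sees the full list at once.
type FieldError struct {
	Field, Column, Message string
	Value                  any
}

type ValidationError struct{ Errors []FieldError }

func (e *ValidationError) Error() string {
	msgs := make([]string, len(e.Errors))
	for i, fe := range e.Errors {
		msgs[i] = fe.Field + ": " + fe.Message
	}
	// Message format is illustrative, not the package's exact output.
	return "validation failed: " + strings.Join(msgs, "; ")
}

// Field returns the failure for one struct field, or nil.
func (e *ValidationError) Field(name string) *FieldError {
	for i := range e.Errors {
		if e.Errors[i].Field == name {
			return &e.Errors[i]
		}
	}
	return nil
}

func main() {
	err := &ValidationError{Errors: []FieldError{
		{Field: "Email", Column: "email", Message: "not a valid address", Value: "nope"},
		{Field: "Age", Column: "age", Message: "must be >= 0", Value: -1},
	}}
	fmt.Println(err.Error())
	fmt.Println(err.Field("Age") != nil, err.Field("Name") == nil)
}
```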

type ValidatorFunc

type ValidatorFunc func(value any) error

ValidatorFunc is the shape of a user-registered custom validator. Validators receive the raw field value (already dereferenced from pointer) and must return nil or a human-readable error message.

type WritePath

type WritePath int

WritePath is the caller's optional override of the Dialect's routing.

const (
	WritePathAuto WritePath = iota
	WritePathGRPC
	WritePathObjectStorage
)

type WriteRequest

type WriteRequest struct {
	Ctx    context.Context
	Schema *LakeSchema
	// IngestID is a UUIDv7 correlation ID generated by Client.Insert
	// for every operation. Threads through staging prefix, logs, and
	// the Finalizer's cleanup target. Required for the fast path;
	// Dialect implementations use this (not Idempotency) to compute
	// staging locations.
	IngestID       string
	Records        any
	RecordCount    int
	ApproxRowBytes int
	// Idempotency is an optional caller-supplied deduplication token.
	// Distinct from IngestID: IngestID is internal operational
	// correlation; Idempotency is the contract with the caller for
	// retry safety. Empty when the caller didn't supply one.
	Idempotency   string
	Backend       Backend
	FastPathBytes int            // advisory crossover threshold
	ForcePath     WritePath      // None / ViaGRPC / ViaObjectStorage
	Options       map[string]any // dialect-specific overrides
}

WriteRequest is what the Client hands the Dialect on Insert.
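The routing a Dialect might derive from ForcePath and FastPathBytes can be sketched as below. The exact crossover heuristic here (RecordCount × ApproxRowBytes against the threshold) is an assumption for illustration, not lake-orm's actual rule:

```go
package main

import "fmt"

type WritePath int

const (
	WritePathAuto WritePath = iota
	WritePathGRPC
	WritePathObjectStorage
)

// routeWrite: an explicit ForcePath always wins; otherwise the
// estimated payload size is compared to the advisory FastPathBytes
// crossover. Threshold logic is a hypothetical sketch.
func routeWrite(force WritePath, recordCount, approxRowBytes, fastPathBytes int) WritePath {
	if force != WritePathAuto {
		return force
	}
	if recordCount*approxRowBytes >= fastPathBytes {
		return WritePathObjectStorage // big batch: stage parquet on object storage
	}
	return WritePathGRPC // small batch: push rows directly over Spark Connect
}

func main() {
	const threshold = 64 << 20 // hypothetical 64 MiB crossover
	fmt.Println(routeWrite(WritePathAuto, 1_000_000, 200, threshold) == WritePathObjectStorage)
	fmt.Println(routeWrite(WritePathAuto, 10, 200, threshold) == WritePathGRPC)
	fmt.Println(routeWrite(WritePathGRPC, 1_000_000, 200, threshold) == WritePathGRPC)
}
```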

Directories

Path Synopsis
Package backend exposes the public constructors for every Backend implementation in a single, ergonomic surface.
file
Package file provides a local-disk Backend.
gcs
Package gcs provides a GCS Backend.
memory
Package memory provides an in-memory Backend, useful for tests and for the fast-path regression harness.
s3
Package s3 provides an S3-compatible Backend built directly on aws-sdk-go-v2/service/s3.
cmd
lakeorm command
Command lakeorm is the lakeorm CLI.
dialect
delta
Package delta is the Delta Lake Dialect.
iceberg
Package iceberg implements lakeorm's default Dialect — Apache Iceberg tables written via Spark Connect.
driver
databricks
Package databricks is lake-orm's native Databricks driver.
databricksconnect
Package databricksconnect is lake-orm's Databricks Connect driver.
spark
Package spark provides lakeorm's generic Spark Connect driver.
examples
basic command
Example: the hero path for lakeorm.
bulk command
Example: the fast-path write — parquet through object storage with a Spark `INSERT ...
databricks command
Example: lake-orm against a Databricks SQL warehouse using the native `databricks-sql-go` driver.
databricksconnect command
Example: lake-orm against a Databricks cluster via Databricks Connect (Spark Connect over a Databricks workspace).
delta command
Example: writing against a Delta Lake table (instead of Iceberg).
ingest-id command
Example: UUIDv7 ingest_id threading via `auto=ingestID`.
joins command
Example: reading joins and aggregates with CQRS-style output types.
lake-tag command
Example: the canonical `lake:"..."` tag.
migrations command
Example: Django-style offline migration authoring.
stream command
Example: streaming a large result back with constant memory.
typed-helpers command
Example: the typed helper API — Query[T] / QueryStream[T] / QueryFirst[T] and their DataFrame-shaped siblings CollectAs[T] / StreamAs[T] / FirstAs[T].
validation command
Example: applying validation at the application boundary.
internal
migrations/migrate
Package migrate is lakeorm's side of the schema-evolution story: it computes struct-diffs against a prior target-state (replayed from the most recent migration file's State-JSON header — the Django MigrationLoader pattern), classifies each change as destructive or not per the target dialect's rule table, and emits goose-format .sql files.
parquet
Package parquet is lakeorm's parquet-writing layer.
sqlbuild
Package sqlbuild renders the generic SELECT shape shared by every Dialect's read path.
Package local is the lake-k8s docker-compose on-ramp.
Package teste2e is the black-box end-to-end suite for lakeorm.
Package testutils is the shared test harness for lakeorm.
Package types defines the small value types lakeorm exposes on its public API: SortableID (KSUID-backed), ObjectURI/Location (typed storage addresses), and SparkTableName.
