cosma

Published: Jun 12, 2026 License: Apache-2.0

Cosma

Cosma is an Arrow-native dataframe engine for Go.

Fast columnar data workflows with a public expression API, lazy+streaming execution, parallel kernels, ADBC database connectivity, and Gonum integration.

Status

v0.1.0 — APIs are evolving. Core packages (dataframe, expr, scan, gonum) are stable enough for experimentation; breaking changes are possible before v1.0.

Install

go get github.com/karthedew/cosma@latest

Quickstart

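package main

import (
    "fmt"
    "log"

    "github.com/karthedew/cosma/dataframe"
)

func main() {
    ids, err := dataframe.NewSeries("ids", []int64{1, 2, 3})
    if err != nil {
        log.Fatal(err)
    }
    names, err := dataframe.NewSeries("names", []string{"alpha", "beta", "gamma"})
    if err != nil {
        log.Fatal(err)
    }

    // Build a two-column DataFrame from the Series and print it.
    df, err := dataframe.New([]*dataframe.Series{ids, names})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(df.String())
}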

Filter & Expressions

Expressions are built with the public cosma/expr package and evaluated lazily or eagerly. The expression tree is fully public — inspectable, serializable, and safe for custom optimizer passes.

import (
    "context"
    "log"

    "github.com/karthedew/cosma/dataframe"
    "github.com/karthedew/cosma/expr"
)

// Keep rows where age > 30 AND active == true.
result, err := df.Filter(context.Background(),
    expr.Col("age").Gt(expr.Lit(int64(30))).And(expr.Col("active").Eq(expr.Lit(true))),
)
if err != nil {
    log.Fatal(err)
}

Supported: comparisons, arithmetic, boolean (Kleene semantics), IsNull/IsNotNull, Cast, aggregates (Sum, Count, Mean, Min, Max). Full Arrow type coverage including string, temporal, and Decimal128.
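Kleene semantics means a boolean expression can evaluate to true, false, or null, where null behaves as "unknown": `false AND null` is false, `true AND null` is null, and `true OR null` is true. The sketch below spells out those truth tables in plain Go. The `Tri` type and the `And`/`Or`/`Not` helpers are hypothetical illustrations, not part of cosma/expr.

```go
package main

import "fmt"

// Tri is a hypothetical three-valued boolean: false, true, or null (unknown).
type Tri int

const (
	False Tri = iota
	True
	Null
)

func (t Tri) String() string {
	switch t {
	case False:
		return "false"
	case True:
		return "true"
	default:
		return "null"
	}
}

// And follows Kleene logic: false dominates, then null, then true.
func And(a, b Tri) Tri {
	if a == False || b == False {
		return False
	}
	if a == Null || b == Null {
		return Null
	}
	return True
}

// Or follows Kleene logic: true dominates, then null, then false.
func Or(a, b Tri) Tri {
	if a == True || b == True {
		return True
	}
	if a == Null || b == Null {
		return Null
	}
	return False
}

// Not swaps true and false and leaves null unchanged.
func Not(a Tri) Tri {
	switch a {
	case True:
		return False
	case False:
		return True
	default:
		return Null
	}
}

func main() {
	vals := []Tri{False, True, Null}
	for _, a := range vals {
		for _, b := range vals {
			fmt.Printf("%-5v AND %-5v = %-5v   %-5v OR %-5v = %v\n", a, b, And(a, b), a, b, Or(a, b))
		}
	}
}
```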

IO (CSV and Parquet)

df, err := dataframe.ReadCSV("data.csv")
err = dataframe.WriteParquet(df, "data.parquet")

Streaming Scan

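import "github.com/karthedew/cosma/scan"

reader, err := scan.ScanCSV("data.csv", scan.WithCSVChunkSize(2048))
if err != nil {
    panic(err)
}
defer reader.Release() // release only after confirming the reader is valid

// Next advances to the next batch; Record returns the current one.
for reader.Next() {
    rec := reader.Record()
    _ = rec // process the batch here
}
if err := reader.Err(); err != nil {
    panic(err)
}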
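The Next/Record/Err loop above is the usual pull-iterator shape for streaming: only the current batch has to be held, and errors are checked once after the loop. As a minimal standard-library sketch of that pattern, the hypothetical `batchReader` below groups `encoding/csv` rows into fixed-size batches. It assumes a batch is just a slice of rows rather than an Arrow record, and it is not how `scan.ScanCSV` is implemented.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// batchReader is a hypothetical chunked CSV reader with a Next/Record/Err
// shape. A "record" here is a slice of CSV rows, not an Arrow record batch.
type batchReader struct {
	r         *csv.Reader
	chunkSize int
	cur       [][]string
	err       error
	done      bool
}

func newBatchReader(src io.Reader, chunkSize int) *batchReader {
	if chunkSize < 1 {
		chunkSize = 1
	}
	return &batchReader{r: csv.NewReader(src), chunkSize: chunkSize}
}

// Next reads up to chunkSize rows into a fresh batch and reports whether
// that batch is non-empty.
func (b *batchReader) Next() bool {
	if b.done {
		return false
	}
	batch := make([][]string, 0, b.chunkSize)
	for len(batch) < b.chunkSize {
		row, err := b.r.Read()
		if err == io.EOF {
			b.done = true
			break
		}
		if err != nil {
			b.err = err
			b.done = true
			return false
		}
		batch = append(batch, row)
	}
	b.cur = batch
	return len(batch) > 0
}

// Record returns the most recently read batch.
func (b *batchReader) Record() [][]string { return b.cur }

// Err reports the first read error, if any; reaching EOF is not an error.
func (b *batchReader) Err() error { return b.err }

// batchSizes drains a reader over data and returns the size of each batch.
func batchSizes(data string, chunkSize int) ([]int, error) {
	reader := newBatchReader(strings.NewReader(data), chunkSize)
	var sizes []int
	for reader.Next() {
		sizes = append(sizes, len(reader.Record()))
	}
	return sizes, reader.Err()
}

func main() {
	sizes, err := batchSizes("1,alpha\n2,beta\n3,gamma\n4,delta\n5,epsilon\n", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(sizes) // [2 2 1]
}
```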

Lazy Execution

Lazy() builds a logical plan. Collect executes it with predicate/projection/limit pushdown into scan nodes.

import (
    "context"
    "github.com/karthedew/cosma/dataframe"
    "github.com/karthedew/cosma/expr"
)

result, err := df.Lazy().
    Filter(expr.Col("age").Gt(expr.Lit(int64(30)))).
    Select("name", "age").
    Limit(100).
    Collect(context.Background())
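Projection pushdown relies on the expression tree being walkable: a scan only needs the columns the query selects plus every column a predicate references. The sketch below assumes hypothetical `Col`, `Lit`, and `Binary` node types (not cosma/expr) and derives the column set a scan would have to read for a filter-then-select query.

```go
package main

import (
	"fmt"
	"sort"
)

// Expr is a hypothetical, minimal expression tree; it is not cosma/expr.
type Expr interface{ isExpr() }

type Col struct{ Name string }
type Lit struct{ Value any }
type Binary struct {
	Op          string
	Left, Right Expr
}

func (Col) isExpr()    {}
func (Lit) isExpr()    {}
func (Binary) isExpr() {}

// collectCols walks the tree and records every column it references.
func collectCols(e Expr, into map[string]bool) {
	switch n := e.(type) {
	case Col:
		into[n.Name] = true
	case Binary:
		collectCols(n.Left, into)
		collectCols(n.Right, into)
	case Lit:
		// literals reference no columns
	}
}

// requiredColumns returns the columns a scan must read to evaluate
// Filter(pred) followed by Select(selected...), sorted for stable output.
func requiredColumns(selected []string, pred Expr) []string {
	set := make(map[string]bool)
	for _, c := range selected {
		set[c] = true
	}
	if pred != nil {
		collectCols(pred, set)
	}
	out := make([]string, 0, len(set))
	for c := range set {
		out = append(out, c)
	}
	sort.Strings(out)
	return out
}

func main() {
	// age > 30 AND active == true, then Select("name", "age").
	pred := Binary{Op: "and",
		Left:  Binary{Op: ">", Left: Col{"age"}, Right: Lit{int64(30)}},
		Right: Binary{Op: "==", Left: Col{"active"}, Right: Lit{true}},
	}
	fmt.Println(requiredColumns([]string{"name", "age"}, pred)) // [active age name]
}
```

Here `active` is read only to evaluate the predicate; an optimizer can discard it again right after the filter.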

For larger-than-memory datasets, use CollectStream to get a per-batch iterator instead of a materialized *DataFrame:

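reader, err := df.Lazy().Filter(...).CollectStream(ctx)
if err != nil { ... }
defer reader.Release() // safe only once err has been checked
for reader.Next() { ... }
if err := reader.Err(); err != nil { ... }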

Sort, GroupBy, Join

// Multi-key sort
sorted, err := df.Sort(ctx,
    expr.By("age").Desc(),
    expr.By("name").WithNullsFirst(),
)

// GroupBy aggregation
grouped, err := df.GroupBy("department").Agg(ctx,
    expr.Col("salary").Sum().As("total_salary"),
    expr.Col("id").Count().As("headcount"),
)

// Hash join (inner or left)
joined, err := df.Join(ctx, other, "user_id", "inner")
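A hash join typically builds a hash table on one input and probes it with the other. The sketch below, using hypothetical `Row`, `JoinedRow`, and `hashJoin` names, builds on the right side and probes with each left row. That choice makes the left-join case simple: a left row with no match is emitted once with `Matched` set to false, standing in for null right-hand columns. It illustrates the technique under those assumptions and is not Cosma's join kernel.

```go
package main

import "fmt"

// Row is a hypothetical row: a join key plus one payload column.
type Row struct {
	Key   int64
	Value string
}

// JoinedRow pairs a left value with a matching right value. Matched is
// false when a left join found no partner (right side is "null").
type JoinedRow struct {
	Key     int64
	Left    string
	Right   string
	Matched bool
}

// hashJoin supports how = "inner" or "left".
func hashJoin(left, right []Row, how string) ([]JoinedRow, error) {
	if how != "inner" && how != "left" {
		return nil, fmt.Errorf("unsupported join type %q", how)
	}
	// Build phase: key -> every right value carrying that key.
	table := make(map[int64][]string, len(right))
	for _, r := range right {
		table[r.Key] = append(table[r.Key], r.Value)
	}
	// Probe phase: one output row per match, in left-input order.
	var out []JoinedRow
	for _, l := range left {
		matches := table[l.Key]
		for _, v := range matches {
			out = append(out, JoinedRow{Key: l.Key, Left: l.Value, Right: v, Matched: true})
		}
		if len(matches) == 0 && how == "left" {
			out = append(out, JoinedRow{Key: l.Key, Left: l.Value})
		}
	}
	return out, nil
}

func main() {
	users := []Row{{1, "ann"}, {2, "bob"}, {3, "cy"}}
	orders := []Row{{1, "book"}, {1, "pen"}, {3, "lamp"}}
	for _, how := range []string{"inner", "left"} {
		rows, err := hashJoin(users, orders, how)
		if err != nil {
			panic(err)
		}
		fmt.Println(how, rows)
	}
}
```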

Parallel Execution

Filter and groupby operations automatically fan out across CPU cores. By default the engine uses GOMAXPROCS workers. compute.SetParallelism can tune this, but it lives in internal/compute, so only code inside the Cosma module can call it. A 1M-row benchmark showed a 3.1× filter speedup at 8 workers.
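Fanning a filter out across cores usually means splitting the input into contiguous chunks, filtering each chunk on its own goroutine, and concatenating the results in chunk order so the output matches a sequential filter. The hypothetical `parallelFilter` below sketches that approach with `sync.WaitGroup`, defaulting the worker count to `runtime.GOMAXPROCS(0)`. It works on a plain `[]int64` rather than Arrow arrays and is not Cosma's kernel.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelFilter keeps the values for which keep returns true. Each worker
// filters one contiguous chunk and writes only its own slot in parts, so no
// locking is needed; joining parts in order preserves input order.
func parallelFilter(data []int64, keep func(int64) bool, workers int) []int64 {
	if workers <= 0 {
		workers = runtime.GOMAXPROCS(0)
	}
	if workers > len(data) {
		workers = len(data)
	}
	if workers == 0 {
		return nil
	}
	chunk := (len(data) + workers - 1) / workers // ceiling division
	parts := make([][]int64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(data) {
			break // rounding up can leave trailing workers with no data
		}
		end := start + chunk
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		go func(w int, part []int64) {
			defer wg.Done()
			var kept []int64
			for _, v := range part {
				if keep(v) {
					kept = append(kept, v)
				}
			}
			parts[w] = kept
		}(w, data[start:end])
	}
	wg.Wait()
	var out []int64
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

func main() {
	data := make([]int64, 1_000_000)
	for i := range data {
		data[i] = int64(i)
	}
	// workers <= 0 means "use GOMAXPROCS", mirroring the default above.
	evens := parallelFilter(data, func(v int64) bool { return v%2 == 0 }, 0)
	fmt.Println(len(evens), "of", len(data), "rows kept")
}
```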

ADBC Connectivity

Any ADBC-compatible database (DuckDB, PostgreSQL, FlightSQL) feeds into the same internal/ingest seam as CSV and Parquet — the result is an array.RecordReader that plugs directly into dataframe.FromRecordBatches. See examples/adbc/ for a complete end-to-end example.

Gonum Integration

import (
    cosmagonum "github.com/karthedew/cosma/gonum"
    "gonum.org/v1/gonum/mat"
)

// Export numeric columns to a Gonum matrix
m, err := cosmagonum.ToMatrix(df, cosmagonum.MatrixOptions{
    Cols:       []string{"x", "y", "z"},
    NullPolicy: cosmagonum.NullDrop,
})

// Export a single Series to a vector
v, err := cosmagonum.ToVector(series, cosmagonum.VectorOptions{
    NullPolicy: cosmagonum.NullFill,
    FillValue:  0,
})

Null handling is never implicit. The default policy, NullError, fails when it meets a null, so dropping nulls (NullDrop) or replacing them with FillValue (NullFill) must be requested explicitly.
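The three policies can be illustrated on a single column stored as values plus a validity mask (one bool per value, false meaning null). In the sketch below, `NullPolicy`, `toFloats`, and `ErrNull` are hypothetical local names standing in for the `cosmagonum` options; `NullError` is the zero value here so that it acts as the default. The sketch handles one vector only: for a multi-column matrix, dropping has to remove a whole row across all selected columns to keep the columns aligned.

```go
package main

import (
	"errors"
	"fmt"
)

// NullPolicy is a hypothetical local mirror of the README's null policies.
type NullPolicy int

const (
	NullError NullPolicy = iota // zero value, so it is the default
	NullDrop
	NullFill
)

// ErrNull is returned under NullError when any value is null.
var ErrNull = errors.New("column contains nulls")

// toFloats converts values plus a validity mask into a dense []float64.
func toFloats(values []float64, valid []bool, policy NullPolicy, fill float64) ([]float64, error) {
	if len(values) != len(valid) {
		return nil, fmt.Errorf("length mismatch: %d values, %d validity flags", len(values), len(valid))
	}
	out := make([]float64, 0, len(values))
	for i, v := range values {
		if valid[i] {
			out = append(out, v)
			continue
		}
		switch policy {
		case NullError:
			return nil, fmt.Errorf("row %d: %w", i, ErrNull)
		case NullDrop:
			// skip the null entry entirely
		case NullFill:
			out = append(out, fill)
		default:
			return nil, fmt.Errorf("unknown null policy %d", policy)
		}
	}
	return out, nil
}

func main() {
	values := []float64{1.5, 0, 3.0}
	valid := []bool{true, false, true} // the middle value is null
	for _, p := range []NullPolicy{NullError, NullDrop, NullFill} {
		out, err := toFloats(values, valid, p, -1)
		fmt.Println(p, out, err)
	}
}
```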

Development

go test -race ./...   # full suite
go test -bench=. ./dataframe/   # benchmarks
golangci-lint run

Docs

License

Apache-2.0. See LICENSE.

Directories

cmd/cosma-csv (command)
cmd/cosma-dev (command)
cmd/cosma-self (command)
examples/adbc (command): Command adbc shows how a database, exposed through ADBC, feeds into Cosma as just another scan source.
examples/basic (command)
gonum: Package gonum exports Cosma DataFrames and Series to Gonum matrices and vectors.
internal/compute: Package compute holds the chunk-level Arrow kernels that back Cosma's eager DataFrame operations: expression evaluation, filtering, and (in time) arithmetic, aggregation, and hashing.
internal/expr: Package expr holds engine-internal binding and coercion helpers that operate on the public cosma/expr tree.
internal/ingest: Package ingest is Cosma's single source-to-record-batch ingestion seam.
