Documentation
¶
Overview ¶
Package gorilla provides a high-performance, in-memory DataFrame library for Go.
Gorilla is built on Apache Arrow and provides fast, efficient data manipulation with lazy evaluation and automatic parallelization. It offers a clear, intuitive API for filtering, selecting, transforming, and analyzing tabular data.
Core Concepts ¶
DataFrame: A 2-dimensional table of data with named columns (Series). Series: A 1-dimensional array of homogeneous data representing a single column. LazyFrame: Deferred computation that builds an optimized query plan. Expression: Type-safe operations for filtering, transforming, and aggregating data.
Memory Management ¶
Gorilla uses Apache Arrow's memory management. Always call Release() on DataFrames and Series to prevent memory leaks. The recommended pattern is:
df := gorilla.NewDataFrame(series1, series2) defer df.Release() // Essential for proper cleanup
Basic Usage ¶
package main
import (
"fmt"
"github.com/apache/arrow-go/v18/arrow/memory"
"github.com/paveg/gorilla"
)
func main() {
mem := memory.NewGoAllocator()
// Create Series (columns)
names := gorilla.NewSeries("name", []string{"Alice", "Bob", "Charlie"}, mem)
ages := gorilla.NewSeries("age", []int64{25, 30, 35}, mem)
defer names.Release()
defer ages.Release()
// Create DataFrame
df := gorilla.NewDataFrame(names, ages)
defer df.Release()
// Lazy evaluation with method chaining
result, err := df.Lazy().
Filter(gorilla.Col("age").Gt(gorilla.Lit(int64(30)))).
Select("name").
Collect()
if err != nil {
panic(err)
}
defer result.Release()
fmt.Println(result)
}
Performance Features ¶
- Zero-copy operations using Apache Arrow columnar format - Automatic parallelization for DataFrames with 1000+ rows - Lazy evaluation with query optimization - Efficient join algorithms with automatic strategy selection - Memory-efficient aggregations and transformations
This package is the sole public API for the library.
Index ¶
- Constants
- func Case() *expr.CaseExpr
- func Coalesce(exprs ...expr.Expr) *expr.FunctionExpr
- func Col(name string) *expr.ColumnExpr
- func Concat(exprs ...expr.Expr) *expr.FunctionExpr
- func Count(column expr.Expr) *expr.AggregationExpr
- func If(condition, thenValue, elseValue expr.Expr) *expr.FunctionExpr
- func Lit(value interface{}) *expr.LiteralExpr
- func Max(column expr.Expr) *expr.AggregationExpr
- func Mean(column expr.Expr) *expr.AggregationExpr
- func Min(column expr.Expr) *expr.AggregationExpr
- func Sum(column expr.Expr) *expr.AggregationExpr
- func WithDataFrame(factory func() *DataFrame, fn func(*DataFrame) error) error
- func WithMemoryManager(allocator memory.Allocator, fn func(*MemoryManager) error) error
- func WithSeries(factory func() ISeries, fn func(ISeries) error) error
- type BatchManager
- type ChunkReader
- type ChunkWriter
- type DataFrame
- func (d *DataFrame) Column(name string) (ISeries, bool)
- func (d *DataFrame) Columns() []string
- func (d *DataFrame) Concat(others ...*DataFrame) *DataFrame
- func (d *DataFrame) Drop(names ...string) *DataFrame
- func (d *DataFrame) GroupBy(columns ...string) *GroupBy
- func (d *DataFrame) HasColumn(name string) bool
- func (d *DataFrame) Join(right *DataFrame, options *JoinOptions) (*DataFrame, error)
- func (d *DataFrame) Lazy() *LazyFrame
- func (d *DataFrame) Len() int
- func (d *DataFrame) Release()
- func (d *DataFrame) Select(names ...string) *DataFrame
- func (d *DataFrame) Slice(start, end int) *DataFrame
- func (d *DataFrame) Sort(column string, ascending bool) (*DataFrame, error)
- func (d *DataFrame) SortBy(columns []string, ascending []bool) (*DataFrame, error)
- func (d *DataFrame) String() string
- func (d *DataFrame) Width() int
- type GroupBy
- type ISeries
- type JoinOptions
- type JoinType
- type LazyFrame
- func (lf *LazyFrame) Collect() (*DataFrame, error)
- func (lf *LazyFrame) Filter(predicate expr.Expr) *LazyFrame
- func (lf *LazyFrame) GroupBy(columns ...string) *LazyGroupBy
- func (lf *LazyFrame) Join(right *LazyFrame, options *JoinOptions) *LazyFrame
- func (lf *LazyFrame) Release()
- func (lf *LazyFrame) Select(columns ...string) *LazyFrame
- func (lf *LazyFrame) Sort(column string, ascending bool) *LazyFrame
- func (lf *LazyFrame) SortBy(columns []string, ascending []bool) *LazyFrame
- func (lf *LazyFrame) String() string
- func (lf *LazyFrame) WithColumn(name string, expression expr.Expr) *LazyFrame
- type LazyGroupBy
- func (lgb *LazyGroupBy) Agg(aggregations ...*expr.AggregationExpr) *LazyFrame
- func (lgb *LazyGroupBy) Count(column string) *LazyFrame
- func (lgb *LazyGroupBy) Max(column string) *LazyFrame
- func (lgb *LazyGroupBy) Mean(column string) *LazyFrame
- func (lgb *LazyGroupBy) Min(column string) *LazyFrame
- func (lgb *LazyGroupBy) Sum(column string) *LazyFrame
- type MemoryAwareChunkReader
- type MemoryManager
- type MemoryStats
- type MemoryUsageMonitor
- func (m *MemoryUsageMonitor) CurrentUsage() int64
- func (m *MemoryUsageMonitor) GetStats() MemoryStats
- func (m *MemoryUsageMonitor) PeakUsage() int64
- func (m *MemoryUsageMonitor) RecordAllocation(bytes int64)
- func (m *MemoryUsageMonitor) RecordDeallocation(bytes int64)
- func (m *MemoryUsageMonitor) SetCleanupCallback(callback func() error)
- func (m *MemoryUsageMonitor) SetSpillCallback(callback func() error)
- func (m *MemoryUsageMonitor) SpillCount() int64
- func (m *MemoryUsageMonitor) StartMonitoring()
- func (m *MemoryUsageMonitor) StopMonitoring()
- type Releasable
- type SpillableBatch
- type StreamingOperation
- type StreamingProcessor
Constants ¶
const ( // DefaultCheckInterval is the default interval for memory monitoring DefaultCheckInterval = 5 * time.Second // HighMemoryPressureThreshold is the threshold for triggering cleanup HighMemoryPressureThreshold = 0.8 )
const ( // DefaultChunkSize is the default size for processing chunks DefaultChunkSize = 1000 // BytesPerValue is the estimated bytes per DataFrame value BytesPerValue = 8 )
Variables ¶
This section is empty.
Functions ¶
func Case ¶
Case starts a case expression.
Returns a concrete *expr.CaseExpr for optimal performance and type safety.
func Coalesce ¶
func Coalesce(exprs ...expr.Expr) *expr.FunctionExpr
Coalesce returns the first non-null expression.
Returns a concrete *expr.FunctionExpr for optimal performance and type safety.
func Col ¶
func Col(name string) *expr.ColumnExpr
Col creates a column expression that references a column by name.
This is the primary way to reference columns in filters, selections, and other DataFrame operations. The column name must exist in the DataFrame when the expression is evaluated.
Returns a concrete *expr.ColumnExpr for optimal performance and type safety.
Example:
// Reference the "age" column
ageCol := gorilla.Col("age")
// Use in filters and operations
result, err := df.Lazy().
Filter(gorilla.Col("age").Gt(gorilla.Lit(30))).
Select(gorilla.Col("name"), gorilla.Col("age")).
Collect()
func Concat ¶
func Concat(exprs ...expr.Expr) *expr.FunctionExpr
Concat concatenates string expressions.
Returns a concrete *expr.FunctionExpr for optimal performance and type safety.
func Count ¶
func Count(column expr.Expr) *expr.AggregationExpr
Count returns an aggregation expression for count.
Returns a concrete *expr.AggregationExpr for optimal performance and type safety.
func If ¶
func If(condition, thenValue, elseValue expr.Expr) *expr.FunctionExpr
If returns a conditional expression.
Returns a concrete *expr.FunctionExpr for optimal performance and type safety.
func Lit ¶
func Lit(value interface{}) *expr.LiteralExpr
Lit creates a literal expression that represents a constant value.
This is used to create expressions with constant values for comparisons, arithmetic operations, and other transformations. The value type should match the expected operation type.
Supported types: string, int64, int32, float64, float32, bool
Returns a concrete *expr.LiteralExpr for optimal performance and type safety.
Example:
// Literal values for comparisons
age30 := gorilla.Lit(int64(30))
name := gorilla.Lit("Alice")
score := gorilla.Lit(95.5)
active := gorilla.Lit(true)
// Use in operations
result, err := df.Lazy().
Filter(gorilla.Col("age").Gt(gorilla.Lit(int64(25)))).
WithColumn("bonus", gorilla.Col("salary").Mul(gorilla.Lit(0.1))).
Collect()
func Max ¶
func Max(column expr.Expr) *expr.AggregationExpr
Max returns an aggregation expression for max.
Returns a concrete *expr.AggregationExpr for optimal performance and type safety.
func Mean ¶
func Mean(column expr.Expr) *expr.AggregationExpr
Mean returns an aggregation expression for mean.
Returns a concrete *expr.AggregationExpr for optimal performance and type safety.
func Min ¶
func Min(column expr.Expr) *expr.AggregationExpr
Min returns an aggregation expression for min.
Returns a concrete *expr.AggregationExpr for optimal performance and type safety.
func Sum ¶
func Sum(column expr.Expr) *expr.AggregationExpr
Sum returns an aggregation expression for sum.
Returns a concrete *expr.AggregationExpr for optimal performance and type safety.
func WithDataFrame ¶
WithDataFrame provides automatic resource management for DataFrame operations.
This helper function creates a DataFrame using the provided factory function, executes the given operation, and automatically releases the DataFrame when done. This pattern is useful for operations where you want guaranteed cleanup.
The factory function should create and return a DataFrame. The operation function receives the DataFrame and performs the desired operations. Any error from the operation function is returned to the caller.
Example:
err := gorilla.WithDataFrame(func() *gorilla.DataFrame {
mem := memory.NewGoAllocator()
series1 := gorilla.NewSeries("name", []string{"Alice", "Bob"}, mem)
series2 := gorilla.NewSeries("age", []int64{25, 30}, mem)
return gorilla.NewDataFrame(series1, series2)
}, func(df *gorilla.DataFrame) error {
result, err := df.Lazy().Filter(gorilla.Col("age").Gt(gorilla.Lit(25))).Collect()
if err != nil {
return err
}
defer result.Release()
fmt.Println(result)
return nil
})
// DataFrame is automatically released here
func WithMemoryManager ¶
func WithMemoryManager(allocator memory.Allocator, fn func(*MemoryManager) error) error
WithMemoryManager creates a memory manager, executes a function with it, and releases all tracked resources
Types ¶
type BatchManager ¶
type BatchManager struct {
// contains filtered or unexported fields
}
BatchManager manages a collection of spillable batches
func NewBatchManager ¶
func NewBatchManager(monitor *MemoryUsageMonitor) *BatchManager
NewBatchManager creates a new batch manager
func (*BatchManager) AddBatch ¶
func (bm *BatchManager) AddBatch(data *DataFrame)
AddBatch adds a new batch to the manager
func (*BatchManager) ReleaseAll ¶
func (bm *BatchManager) ReleaseAll()
ReleaseAll releases all batches and clears the manager
func (*BatchManager) SpillLRU ¶
func (bm *BatchManager) SpillLRU() error
SpillLRU spills the least recently used batch to free memory
type ChunkReader ¶
type ChunkReader interface {
// ReadChunk reads the next chunk of data
ReadChunk() (*DataFrame, error)
// HasNext returns true if there are more chunks to read
HasNext() bool
// Close closes the chunk reader and releases resources
Close() error
}
ChunkReader represents a source of data chunks for streaming processing
type ChunkWriter ¶
type ChunkWriter interface {
// WriteChunk writes a processed chunk of data
WriteChunk(*DataFrame) error
// Close closes the chunk writer and releases resources
Close() error
}
ChunkWriter represents a destination for processed data chunks
type DataFrame ¶
type DataFrame struct {
// contains filtered or unexported fields
}
DataFrame represents a 2-dimensional table of data with named columns.
A DataFrame is composed of multiple Series (columns) and provides operations for filtering, selecting, transforming, joining, and aggregating data. It supports both eager and lazy evaluation patterns.
Key features:
- Zero-copy operations using Apache Arrow columnar format
- Automatic parallelization for large datasets (1000+ rows)
- Type-safe operations through the Expression system
- Memory-efficient storage and computation
Memory management: DataFrames must be released to prevent memory leaks:
df := gorilla.NewDataFrame(series1, series2) defer df.Release()
Operations can be performed eagerly or using lazy evaluation:
// Eager: operations execute immediately
filtered := df.Filter(gorilla.Col("age").Gt(gorilla.Lit(30)))
// Lazy: operations build a query plan, execute on Collect()
result, err := df.Lazy().
Filter(gorilla.Col("age").Gt(gorilla.Lit(30))).
Select("name", "age").
Collect()
func NewDataFrame ¶
NewDataFrame creates a new DataFrame from one or more Series.
A DataFrame is a 2-dimensional table with named columns. Each Series becomes a column in the DataFrame. All Series must have the same length.
Memory management: The returned DataFrame must be released by calling Release() to prevent memory leaks. Use defer for automatic cleanup:
df := gorilla.NewDataFrame(series1, series2) defer df.Release()
Example:
mem := memory.NewGoAllocator()
names := gorilla.NewSeries("name", []string{"Alice", "Bob"}, mem)
ages := gorilla.NewSeries("age", []int64{25, 30}, mem)
defer names.Release()
defer ages.Release()
df := gorilla.NewDataFrame(names, ages)
defer df.Release()
fmt.Println(df) // Displays the DataFrame
func (*DataFrame) Join ¶
func (d *DataFrame) Join(right *DataFrame, options *JoinOptions) (*DataFrame, error)
Join performs a join operation with another DataFrame.
func (*DataFrame) Release ¶
func (d *DataFrame) Release()
Release frees the memory used by the DataFrame.
type GroupBy ¶
type GroupBy struct {
// contains filtered or unexported fields
}
GroupBy is the public type for eager group by operations.
type ISeries ¶
type ISeries interface {
Name() string // Returns the name of the Series
Len() int // Returns the number of elements
DataType() arrow.DataType // Returns the Apache Arrow data type
IsNull(index int) bool // Checks if the value at index is null
String() string // Returns a string representation
Array() arrow.Array // Returns the underlying Arrow array
Release() // Releases memory resources
}
ISeries provides a type-erased interface for Series of any supported type.
ISeries allows different typed Series to be used together in DataFrames and operations. It wraps Apache Arrow arrays and provides common operations for all Series types.
Supported data types: string, int64, int32, float64, float32, bool
Memory management: Series implement the Release() method and must be released to prevent memory leaks:
series := gorilla.NewSeries("name", []string{"Alice", "Bob"}, mem)
defer series.Release()
The interface provides:
- Name() - column name for use in DataFrames
- Len() - number of elements in the Series
- DataType() - Apache Arrow data type information
- IsNull(index) - null value checking
- String() - human-readable representation
- Array() - access to underlying Arrow array
- Release() - memory cleanup
func NewSeries ¶
NewSeries creates a new typed Series from a slice of values.
A Series is a 1-dimensional array of homogeneous data that represents a single column in a DataFrame. The type parameter T determines the data type of the Series.
Supported types: string, int64, int32, float64, float32, bool
Parameters:
- name: The name of the Series (becomes the column name in a DataFrame)
- values: Slice of values to populate the Series
- mem: Apache Arrow memory allocator for managing the underlying storage
Memory management: The returned Series must be released by calling Release() to prevent memory leaks. Use defer for automatic cleanup:
series := gorilla.NewSeries("age", []int64{25, 30, 35}, mem)
defer series.Release()
Example:
mem := memory.NewGoAllocator()
// Create different types of Series
names := gorilla.NewSeries("name", []string{"Alice", "Bob", "Charlie"}, mem)
ages := gorilla.NewSeries("age", []int64{25, 30, 35}, mem)
scores := gorilla.NewSeries("score", []float64{95.5, 87.2, 92.1}, mem)
active := gorilla.NewSeries("active", []bool{true, false, true}, mem)
defer names.Release()
defer ages.Release()
defer scores.Release()
defer active.Release()
type JoinOptions ¶
type JoinOptions struct {
Type JoinType
LeftKey string // Single join key for left DataFrame
RightKey string // Single join key for right DataFrame
LeftKeys []string // Multiple join keys for left DataFrame
RightKeys []string // Multiple join keys for right DataFrame
}
JoinOptions specifies parameters for join operations
type LazyFrame ¶
type LazyFrame struct {
// contains filtered or unexported fields
}
LazyFrame provides deferred computation with query optimization.
LazyFrame builds an Abstract Syntax Tree (AST) of operations without executing them immediately. This allows for query optimization, operation fusion, and efficient memory usage. Operations are only executed when Collect() is called.
Benefits of lazy evaluation:
- Query optimization (predicate pushdown, operation fusion)
- Memory efficiency (only final results allocated)
- Parallel execution planning
- Operation chaining with method syntax
Example:
lazyResult := df.Lazy().
Filter(gorilla.Col("age").Gt(gorilla.Lit(25))).
WithColumn("bonus", gorilla.Col("salary").Mul(gorilla.Lit(0.1))).
GroupBy("department").
Agg(gorilla.Sum(gorilla.Col("salary")).As("total_salary")).
Select("department", "total_salary")
// Execute the entire plan efficiently
result, err := lazyResult.Collect()
defer result.Release()
func (*LazyFrame) Collect ¶
Collect executes all pending operations and returns the final DataFrame.
func (*LazyFrame) Filter ¶
Filter adds a filter operation to the LazyFrame.
The predicate can be any expression that implements expr.Expr interface.
func (*LazyFrame) GroupBy ¶
func (lf *LazyFrame) GroupBy(columns ...string) *LazyGroupBy
GroupBy adds a group by operation to the LazyFrame.
func (*LazyFrame) Join ¶
func (lf *LazyFrame) Join(right *LazyFrame, options *JoinOptions) *LazyFrame
Join adds a join operation to the LazyFrame.
func (*LazyFrame) Release ¶
func (lf *LazyFrame) Release()
Release frees the memory used by the LazyFrame.
type LazyGroupBy ¶
type LazyGroupBy struct {
// contains filtered or unexported fields
}
LazyGroupBy is the public type for lazy group by operations.
func (*LazyGroupBy) Agg ¶
func (lgb *LazyGroupBy) Agg(aggregations ...*expr.AggregationExpr) *LazyFrame
Agg adds aggregation operations to the LazyGroupBy.
Takes aggregation expressions directly without wrapper overhead.
func (*LazyGroupBy) Count ¶
func (lgb *LazyGroupBy) Count(column string) *LazyFrame
Count adds a count aggregation for the specified column.
func (*LazyGroupBy) Max ¶
func (lgb *LazyGroupBy) Max(column string) *LazyFrame
Max adds a max aggregation for the specified column.
func (*LazyGroupBy) Mean ¶
func (lgb *LazyGroupBy) Mean(column string) *LazyFrame
Mean adds a mean aggregation for the specified column.
func (*LazyGroupBy) Min ¶
func (lgb *LazyGroupBy) Min(column string) *LazyFrame
Min adds a min aggregation for the specified column.
func (*LazyGroupBy) Sum ¶
func (lgb *LazyGroupBy) Sum(column string) *LazyFrame
Sum adds a sum aggregation for the specified column.
type MemoryAwareChunkReader ¶
type MemoryAwareChunkReader struct {
// contains filtered or unexported fields
}
MemoryAwareChunkReader wraps a ChunkReader with memory monitoring
func NewMemoryAwareChunkReader ¶
func NewMemoryAwareChunkReader(reader ChunkReader, monitor *MemoryUsageMonitor) *MemoryAwareChunkReader
NewMemoryAwareChunkReader creates a new memory-aware chunk reader
func (*MemoryAwareChunkReader) Close ¶
func (mr *MemoryAwareChunkReader) Close() error
Close closes the underlying reader
func (*MemoryAwareChunkReader) HasNext ¶
func (mr *MemoryAwareChunkReader) HasNext() bool
HasNext returns true if there are more chunks to read
func (*MemoryAwareChunkReader) ReadChunk ¶
func (mr *MemoryAwareChunkReader) ReadChunk() (*DataFrame, error)
ReadChunk reads the next chunk and records memory usage
type MemoryManager ¶
type MemoryManager struct {
// contains filtered or unexported fields
}
MemoryManager helps track and release multiple resources automatically.
MemoryManager is useful for complex scenarios where many short-lived resources are created and need bulk cleanup. For most use cases, prefer the defer pattern with individual Release() calls for better readability.
Use MemoryManager when:
- Creating many temporary resources in loops
- Complex operations with unpredictable resource lifetimes
- Bulk operations where individual defer statements are impractical
The MemoryManager is safe for concurrent use from multiple goroutines.
Example:
err := gorilla.WithMemoryManager(mem, func(manager *gorilla.MemoryManager) error {
for i := 0; i < 1000; i++ {
temp := createTempDataFrame(i)
manager.Track(temp) // Will be released automatically
}
return processData()
})
// All tracked resources are released here
func NewMemoryManager ¶
func NewMemoryManager(allocator memory.Allocator) *MemoryManager
NewMemoryManager creates a new memory manager with the given allocator
func (*MemoryManager) Count ¶
func (m *MemoryManager) Count() int
Count returns the number of tracked resources
func (*MemoryManager) ReleaseAll ¶
func (m *MemoryManager) ReleaseAll()
ReleaseAll releases all tracked resources and clears the tracking list
func (*MemoryManager) Track ¶
func (m *MemoryManager) Track(resource Releasable)
Track adds a resource to be managed and automatically released
type MemoryStats ¶
type MemoryStats struct {
// Total allocated memory in bytes
AllocatedBytes int64
// Peak allocated memory in bytes
PeakAllocatedBytes int64
// Number of active allocations
ActiveAllocations int64
// Number of garbage collections triggered
GCCount int64
// Last garbage collection time
LastGCTime time.Time
// Memory pressure level (0.0 to 1.0)
MemoryPressure float64
}
MemoryStats provides detailed memory usage statistics
type MemoryUsageMonitor ¶
type MemoryUsageMonitor struct {
// contains filtered or unexported fields
}
MemoryUsageMonitor provides real-time memory usage monitoring and automatic spilling
func NewMemoryUsageMonitor ¶
func NewMemoryUsageMonitor(spillThreshold int64) *MemoryUsageMonitor
NewMemoryUsageMonitor creates a new memory usage monitor with the specified spill threshold
func (*MemoryUsageMonitor) CurrentUsage ¶
func (m *MemoryUsageMonitor) CurrentUsage() int64
CurrentUsage returns the current memory usage in bytes
func (*MemoryUsageMonitor) GetStats ¶
func (m *MemoryUsageMonitor) GetStats() MemoryStats
GetStats returns current memory usage statistics
func (*MemoryUsageMonitor) PeakUsage ¶
func (m *MemoryUsageMonitor) PeakUsage() int64
PeakUsage returns the peak memory usage in bytes
func (*MemoryUsageMonitor) RecordAllocation ¶
func (m *MemoryUsageMonitor) RecordAllocation(bytes int64)
RecordAllocation records a new memory allocation
func (*MemoryUsageMonitor) RecordDeallocation ¶
func (m *MemoryUsageMonitor) RecordDeallocation(bytes int64)
RecordDeallocation records a memory deallocation
func (*MemoryUsageMonitor) SetCleanupCallback ¶
func (m *MemoryUsageMonitor) SetCleanupCallback(callback func() error)
SetCleanupCallback sets the callback function to be called for cleanup operations
func (*MemoryUsageMonitor) SetSpillCallback ¶
func (m *MemoryUsageMonitor) SetSpillCallback(callback func() error)
SetSpillCallback sets the callback function to be called when memory usage exceeds threshold
func (*MemoryUsageMonitor) SpillCount ¶
func (m *MemoryUsageMonitor) SpillCount() int64
SpillCount returns the number of times spilling was triggered
func (*MemoryUsageMonitor) StartMonitoring ¶
func (m *MemoryUsageMonitor) StartMonitoring()
StartMonitoring starts the background memory monitoring goroutine
func (*MemoryUsageMonitor) StopMonitoring ¶
func (m *MemoryUsageMonitor) StopMonitoring()
StopMonitoring stops the background memory monitoring goroutine
type Releasable ¶
type Releasable interface {
Release()
}
Releasable represents any resource that can be released to free memory.
This interface is implemented by DataFrames, Series, and other resources that use Apache Arrow memory management. Always call Release() when done with a resource to prevent memory leaks.
The recommended pattern is to use defer for automatic cleanup:
df := gorilla.NewDataFrame(series1, series2) defer df.Release() // Automatic cleanup
type SpillableBatch ¶
type SpillableBatch struct {
// contains filtered or unexported fields
}
SpillableBatch represents a batch of data that can be spilled to disk
func NewSpillableBatch ¶
func NewSpillableBatch(data *DataFrame) *SpillableBatch
NewSpillableBatch creates a new spillable batch
func (*SpillableBatch) GetData ¶
func (sb *SpillableBatch) GetData() (*DataFrame, error)
GetData returns the data, loading from spill if necessary. If the data has been spilled to disk, this method attempts to reload it. Returns an error if the data cannot be loaded from spill storage.
func (*SpillableBatch) Release ¶
func (sb *SpillableBatch) Release()
Release releases the batch resources
func (*SpillableBatch) Spill ¶
func (sb *SpillableBatch) Spill() error
Spill spills the batch to disk and releases memory
type StreamingOperation ¶
type StreamingOperation interface {
// Apply applies the operation to a data chunk
Apply(*DataFrame) (*DataFrame, error)
// Release releases any resources held by the operation
Release()
}
StreamingOperation represents an operation that can be applied to data chunks
type StreamingProcessor ¶
type StreamingProcessor struct {
// contains filtered or unexported fields
}
StreamingProcessor handles processing of datasets larger than memory
func NewStreamingProcessor ¶
func NewStreamingProcessor(chunkSize int, allocator memory.Allocator, monitor *MemoryUsageMonitor) *StreamingProcessor
NewStreamingProcessor creates a new streaming processor with specified chunk size
func (*StreamingProcessor) Close ¶
func (sp *StreamingProcessor) Close() error
Close closes the streaming processor and releases resources
func (*StreamingProcessor) ProcessStreaming ¶
func (sp *StreamingProcessor) ProcessStreaming( reader ChunkReader, writer ChunkWriter, operations []StreamingOperation, ) error
ProcessStreaming processes data in streaming fashion using chunks
Directories
¶
| Path | Synopsis |
|---|---|
|
cmd
|
|
|
gorilla-cli
command
|
|
|
config
command
Package main demonstrates how to use Gorilla's configurable processing parameters
|
Package main demonstrates how to use Gorilla's configurable processing parameters |
|
debug
command
|
|
|
io
command
|
|
|
internal
|
|
|
config
Package config provides configuration management for Gorilla DataFrame operations
|
Package config provides configuration management for Gorilla DataFrame operations |
|
dataframe
Package dataframe provides high-performance DataFrame operations
|
Package dataframe provides high-performance DataFrame operations |
|
errors
Package errors provides standardized error types for DataFrame operations.
|
Package errors provides standardized error types for DataFrame operations. |
|
expr
Package expr provides expression evaluation for DataFrame operations
|
Package expr provides expression evaluation for DataFrame operations |
|
io
Package io provides data input/output operations for DataFrames.
|
Package io provides data input/output operations for DataFrames. |
|
parallel
Package parallel provides concurrent processing utilities
|
Package parallel provides concurrent processing utilities |
|
series
Package series provides data structures for column operations
|
Package series provides data structures for column operations |
|
validation
Package validation provides input validation utilities for DataFrame operations.
|
Package validation provides input validation utilities for DataFrame operations. |