Documentation ¶
Overview ¶
Package io provides I/O operations for reading and writing DataFrame data.

This package includes readers and writers for several data formats (CSV, JSON, and Parquet), with automatic type inference and schema handling. CSV I/O includes support for streaming large datasets.
Key components:
- DataReader/DataWriter interfaces for pluggable I/O backends
- CSVReader/CSVWriter for CSV file operations
- JSONReader/JSONWriter for JSON and JSON Lines operations
- ParquetReader/ParquetWriter for Parquet file operations
- Type inference for automatic schema detection
- Configurable options for delimiters, headers, and batch sizes
Memory management: All I/O operations integrate with Apache Arrow's memory management system and require proper cleanup with defer patterns.
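As a minimal sketch of that pattern, assuming a hypothetical module import path and a Release method on DataFrame (the usual Arrow convention; neither is confirmed by this index):

package main

import (
	"log"
	"os"

	"github.com/apache/arrow/go/v14/arrow/memory" // match the Arrow version this module uses

	dfio "example.com/yourmodule/io" // hypothetical import path for this package
)

func main() {
	f, err := os.Open("records.json")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	mem := memory.NewGoAllocator()
	r := dfio.NewJSONReader(f, dfio.DefaultJSONOptions(), mem)

	df, err := r.Read() // assumes JSONReader satisfies the DataReader interface
	if err != nil {
		log.Fatal(err)
	}
	defer df.Release() // assumed cleanup method, per the Arrow defer pattern

	// ... work with df ...
}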
Index ¶
Constants ¶
const (
	// DefaultChunkSize is the default chunk size for parallel processing.
	DefaultChunkSize = 1000
	// DefaultBatchSize is the default batch size for I/O operations.
	DefaultBatchSize = 1000
	// DefaultRowGroupSize is the default row group size for Parquet files.
	DefaultRowGroupSize = 100000 // 100K rows per group
	// DefaultPageSize is the default page size for Parquet files.
	DefaultPageSize = 1048576 // 1MB pages
)
const (
	// ArrowTypeInt64 is the Arrow type name for int64 columns.
	ArrowTypeInt64 = "int64"
	// ArrowTypeInt32 is the Arrow type name for int32 columns.
	ArrowTypeInt32 = "int32"
	// ArrowTypeFloat64 is the Arrow type name for float64 columns.
	ArrowTypeFloat64 = "float64"
	// ArrowTypeFloat32 is the Arrow type name for float32 columns.
	ArrowTypeFloat32 = "float32"
	// ArrowTypeBool is the Arrow type name for bool columns.
	ArrowTypeBool = "bool"
	// ArrowTypeString is the Arrow type name for string columns.
	ArrowTypeString = "utf8"
)
Arrow data type name constants for consistent usage across I/O implementations.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CSVOptions ¶
type CSVOptions struct {
// Delimiter is the field delimiter (default: comma)
Delimiter rune
// Comment is the comment character (default: 0 = disabled)
Comment rune
// Header indicates whether the first row contains headers
Header bool
// SkipInitialSpace indicates whether to skip initial whitespace
SkipInitialSpace bool
// Parallel indicates whether to use parallel processing
Parallel bool
// ChunkSize is the size of chunks for parallel processing
ChunkSize int
}
CSVOptions contains configuration options for CSV operations.
func DefaultCSVOptions ¶
func DefaultCSVOptions() CSVOptions
DefaultCSVOptions returns default CSV options.
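For instance, a sketch of starting from the defaults and overriding a few fields for a semicolon-delimited file with a header row (dfio is an assumed import alias for this package):

// csvOptionsForSemicolonFile returns options for a semicolon-delimited,
// header-bearing file, starting from the package defaults.
func csvOptionsForSemicolonFile() dfio.CSVOptions {
	opts := dfio.DefaultCSVOptions()
	opts.Delimiter = ';'  // semicolon-separated fields
	opts.Comment = '#'    // skip lines starting with '#'
	opts.Header = true    // first row contains column names
	opts.Parallel = true  // enable chunked parallel parsing
	opts.ChunkSize = 5000 // rows per chunk (default: DefaultChunkSize)
	return opts
}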
type CSVReader ¶
type CSVReader struct {
// contains filtered or unexported fields
}
CSVReader reads CSV data and converts it to DataFrames.
func NewCSVReader ¶
NewCSVReader creates a new CSV reader with the specified options.
type CSVWriter ¶
type CSVWriter struct {
// contains filtered or unexported fields
}
CSVWriter writes DataFrames to CSV format.
func NewCSVWriter ¶
func NewCSVWriter(writer io.Writer, options CSVOptions) *CSVWriter
NewCSVWriter creates a new CSV writer with the specified options.
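A sketch of writing an existing DataFrame to a CSV file with this constructor (dfio and dataframe are assumed import aliases; that CSVWriter's write method mirrors the DataWriter interface below is an assumption):

// writeCSV writes df to a CSV file at path, emitting a header row.
func writeCSV(path string, df *dataframe.DataFrame) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	opts := dfio.DefaultCSVOptions()
	opts.Header = true // emit column names as the first row

	w := dfio.NewCSVWriter(f, opts)
	return w.Write(df) // assumed to mirror DataWriter.Write
}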
type DataReader ¶
type DataReader interface {
// Read reads data from the source and returns a DataFrame
Read() (*dataframe.DataFrame, error)
}
DataReader defines the interface for reading data from various sources.
type DataWriter ¶
type DataWriter interface {
// Write writes the DataFrame to the destination
Write(df *dataframe.DataFrame) error
}
DataWriter defines the interface for writing data to various destinations.
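Because sources and sinks are expressed as interfaces, format-agnostic code can be written against them. A sketch, assuming the concrete readers and writers in this package satisfy DataReader and DataWriter and that DataFrame has a Release method:

// convert copies data from any DataReader to any DataWriter,
// independent of the underlying format (CSV, JSON, Parquet).
func convert(src dfio.DataReader, dst dfio.DataWriter) error {
	df, err := src.Read()
	if err != nil {
		return err
	}
	defer df.Release() // assumed cleanup method, per the Arrow defer pattern

	return dst.Write(df)
}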
type JSONFormat ¶ added in v0.4.0
type JSONFormat int
JSONFormat specifies the JSON format type.
const (
	// JSONArray format stores data as a JSON array of objects.
	JSONArray JSONFormat = iota
	// JSONLines format stores data as newline-delimited JSON objects.
	JSONLines
)
type JSONOptions ¶ added in v0.4.0
type JSONOptions struct {
// Format specifies whether to use JSON array or JSON Lines format
Format JSONFormat
// TypeInference enables automatic type inference from JSON values
TypeInference bool
// DateFormat specifies the format for parsing date strings
DateFormat string
// NullValues specifies string values that should be treated as null
NullValues []string
// MaxRecords limits the number of records to read (0 = no limit)
MaxRecords int
// Parallel enables parallel processing for large JSON files
Parallel bool
}
JSONOptions contains configuration options for JSON operations.
func DefaultJSONOptions ¶ added in v0.4.0
func DefaultJSONOptions() JSONOptions
DefaultJSONOptions returns default JSON options.
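For example, a sketch of options for reading newline-delimited JSON with custom null markers and a record cap (dfio is an assumed import alias for this package):

// jsonLinesOptions configures reading of newline-delimited JSON.
func jsonLinesOptions() dfio.JSONOptions {
	opts := dfio.DefaultJSONOptions()
	opts.Format = dfio.JSONLines // newline-delimited JSON objects
	opts.TypeInference = true    // infer column types from values
	opts.NullValues = []string{"", "null", "N/A"}
	opts.MaxRecords = 100000 // stop after 100K records (0 = no limit)
	return opts
}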
type JSONReader ¶ added in v0.4.0
type JSONReader struct {
// contains filtered or unexported fields
}
JSONReader reads JSON data and converts it to DataFrames.
func NewJSONReader ¶ added in v0.4.0
func NewJSONReader(reader io.Reader, options JSONOptions, mem memory.Allocator) *JSONReader
NewJSONReader creates a new JSON reader with the specified options.
type JSONWriter ¶ added in v0.4.0
type JSONWriter struct {
// contains filtered or unexported fields
}
JSONWriter writes DataFrames to JSON format.
func NewJSONWriter ¶ added in v0.4.0
func NewJSONWriter(writer io.Writer, options JSONOptions) *JSONWriter
NewJSONWriter creates a new JSON writer with the specified options.
type ParquetOptions ¶
type ParquetOptions struct {
// Compression type for Parquet files (snappy, gzip, lz4, zstd, uncompressed)
Compression string
// BatchSize for reading/writing operations
BatchSize int
// ColumnsToRead for selective column reading (nil reads all columns)
ColumnsToRead []string
// ParallelDecoding enables parallel decoding for better performance
ParallelDecoding bool
// RowGroupSize specifies the target size for row groups in rows
RowGroupSize int64
// PageSize specifies the target size for pages in bytes
PageSize int64
// EnableDict enables dictionary encoding for string columns
EnableDict bool
}
ParquetOptions contains configuration options for Parquet operations.
func DefaultParquetOptions ¶
func DefaultParquetOptions() ParquetOptions
DefaultParquetOptions returns default Parquet options.
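As an illustration, a sketch of options for compressed output with dictionary encoding and a larger row-group size (dfio is an assumed import alias for this package):

// parquetWriteOptions configures compressed Parquet output.
func parquetWriteOptions() dfio.ParquetOptions {
	opts := dfio.DefaultParquetOptions()
	opts.Compression = "zstd"  // one of: snappy, gzip, lz4, zstd, uncompressed
	opts.EnableDict = true     // dictionary-encode string columns
	opts.RowGroupSize = 500000 // rows per row group (default: DefaultRowGroupSize)
	return opts
}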
type ParquetReader ¶ added in v0.2.0
type ParquetReader struct {
// contains filtered or unexported fields
}
ParquetReader reads Parquet data and converts it to DataFrames.
func NewParquetReader ¶ added in v0.2.0
func NewParquetReader(reader io.Reader, options ParquetOptions, mem memory.Allocator) *ParquetReader
NewParquetReader creates a new Parquet reader with the specified options.
type ParquetWriter ¶ added in v0.2.0
type ParquetWriter struct {
// contains filtered or unexported fields
}
ParquetWriter writes DataFrames to Parquet format.
func NewParquetWriter ¶ added in v0.2.0
func NewParquetWriter(writer io.Writer, options ParquetOptions) *ParquetWriter
NewParquetWriter creates a new Parquet writer with the specified options.
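Putting the Parquet reader and writer together, a sketch of an in-memory round trip that reads back only selected columns (dfio and dataframe are assumed import aliases, the column names are hypothetical, and releasing the returned DataFrame remains the caller's responsibility):

// roundTrip writes df to an in-memory Parquet buffer, then reads
// back only the "id" and "score" columns.
func roundTrip(df *dataframe.DataFrame, mem memory.Allocator) (*dataframe.DataFrame, error) {
	var buf bytes.Buffer

	w := dfio.NewParquetWriter(&buf, dfio.DefaultParquetOptions())
	if err := w.Write(df); err != nil {
		return nil, err
	}

	opts := dfio.DefaultParquetOptions()
	opts.ColumnsToRead = []string{"id", "score"} // hypothetical column names

	r := dfio.NewParquetReader(bytes.NewReader(buf.Bytes()), opts, mem)
	return r.Read()
}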