io

package
v0.3.0
Published: Jul 27, 2025 License: Apache-2.0, MIT Imports: 19 Imported by: 0

Documentation

Overview

Package io provides data input/output operations for reading and writing DataFrame data. It includes readers and writers for various data formats, with automatic type inference, schema handling, and configurable options. The primary implementations are CSV and Parquet I/O, with support for streaming large datasets.

Key components:

  • DataReader/DataWriter interfaces for pluggable I/O backends
  • CSVReader/CSVWriter for CSV file operations
  • ParquetReader/ParquetWriter for Parquet file operations (added in v0.2.0)
  • Type inference for automatic schema detection
  • Configurable options for delimiters, headers, compression, and batch sizes

Memory management: All I/O operations integrate with Apache Arrow's memory management system and require proper cleanup with defer patterns.
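A minimal end-to-end sketch of this read-and-release pattern is shown below. The NewCSVReader, DefaultCSVOptions, and Read signatures come from the reference that follows; the module import path and the DataFrame Release method are assumptions for illustration, and memory.NewGoAllocator is Apache Arrow's standard Go allocator.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/apache/arrow/go/v14/arrow/memory" // Arrow allocator; the version suffix may differ

	dfio "example.com/project/io" // hypothetical import path for this package
)

func main() {
	data := "name,age\nalice,30\nbob,25\n"

	// NewCSVReader takes any io.Reader, a CSVOptions value, and an Arrow allocator.
	r := dfio.NewCSVReader(strings.NewReader(data), dfio.DefaultCSVOptions(), memory.NewGoAllocator())

	df, err := r.Read()
	if err != nil {
		log.Fatal(err)
	}
	// Assumption: DataFrame exposes Release for Arrow buffer cleanup,
	// per the defer-based cleanup guidance above.
	defer df.Release()

	fmt.Println(df)
}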

Index

Constants

const (
	// DefaultChunkSize is the default chunk size for parallel processing
	DefaultChunkSize = 1000
	// DefaultBatchSize is the default batch size for I/O operations
	DefaultBatchSize = 1000
)

Variables

This section is empty.

Functions

This section is empty.

Types

type CSVOptions

type CSVOptions struct {
	// Delimiter is the field delimiter (default: comma)
	Delimiter rune
	// Comment is the comment character (default: 0 = disabled)
	Comment rune
	// Header indicates whether the first row contains headers
	Header bool
	// SkipInitialSpace indicates whether to skip initial whitespace
	SkipInitialSpace bool
	// Parallel indicates whether to use parallel processing
	Parallel bool
	// ChunkSize is the size of chunks for parallel processing
	ChunkSize int
}

CSVOptions contains configuration options for CSV operations

func DefaultCSVOptions

func DefaultCSVOptions() CSVOptions

DefaultCSVOptions returns default CSV options
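Defaults can be overridden field by field. The field names below come from the CSVOptions definition above; dfio is the hypothetical import alias used in the overview example:

opts := dfio.DefaultCSVOptions() // start from the package defaults
opts.Delimiter = ';'             // parse semicolon-separated files
opts.Comment = '#'               // ignore lines starting with '#'
opts.Parallel = true             // enable chunked parallel processing
opts.ChunkSize = 5000            // override DefaultChunkSize (1000)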

type CSVReader

type CSVReader struct {
	// contains filtered or unexported fields
}

CSVReader reads CSV data and converts it to DataFrames

func NewCSVReader

func NewCSVReader(reader io.Reader, options CSVOptions, mem memory.Allocator) *CSVReader

NewCSVReader creates a new CSV reader with the specified options

func (*CSVReader) Read

func (r *CSVReader) Read() (*dataframe.DataFrame, error)

Read reads CSV data and returns a DataFrame
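Reading from a file rather than an in-memory string follows the same pattern. This sketch reuses the dfio alias and the assumed Release method from the overview example:

f, err := os.Open("data.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

r := dfio.NewCSVReader(f, dfio.DefaultCSVOptions(), memory.NewGoAllocator())
df, err := r.Read()
if err != nil {
	log.Fatal(err)
}
defer df.Release() // assumed cleanup, per the overview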

type CSVWriter

type CSVWriter struct {
	// contains filtered or unexported fields
}

CSVWriter writes DataFrames to CSV format

func NewCSVWriter

func NewCSVWriter(writer io.Writer, options CSVOptions) *CSVWriter

NewCSVWriter creates a new CSV writer with the specified options

func (*CSVWriter) Write

func (w *CSVWriter) Write(df *dataframe.DataFrame) error

Write writes the DataFrame to CSV format
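A corresponding write sketch; df is a previously obtained *dataframe.DataFrame, and the dfio import alias is the same assumption as above:

out, err := os.Create("out.csv")
if err != nil {
	log.Fatal(err)
}
defer out.Close()

w := dfio.NewCSVWriter(out, dfio.DefaultCSVOptions())
if err := w.Write(df); err != nil {
	log.Fatal(err)
}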

type DataReader

type DataReader interface {
	// Read reads data from the source and returns a DataFrame
	Read() (*dataframe.DataFrame, error)
}

DataReader defines the interface for reading data from various sources

type DataWriter

type DataWriter interface {
	// Write writes the DataFrame to the destination
	Write(df *dataframe.DataFrame) error
}

DataWriter defines the interface for writing data to various destinations
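Because every reader and writer in this package satisfies these two interfaces, format conversion can be written once against them. A minimal sketch, assuming the Release method from the overview:

// Convert pipes one DataFrame from any DataReader to any DataWriter,
// e.g. CSV in, Parquet out.
func Convert(r dfio.DataReader, w dfio.DataWriter) error {
	df, err := r.Read()
	if err != nil {
		return err
	}
	defer df.Release() // assumed Arrow cleanup
	return w.Write(df)
}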

type ParquetOptions

type ParquetOptions struct {
	// Compression type for Parquet files
	Compression string
	// BatchSize for reading/writing operations
	BatchSize int
}

ParquetOptions contains configuration options for Parquet operations

func DefaultParquetOptions

func DefaultParquetOptions() ParquetOptions

DefaultParquetOptions returns default Parquet options
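Overriding the Parquet defaults looks the same as for CSV. The accepted Compression strings are not listed on this page, so "snappy" below is an assumption based on common Parquet usage:

popts := dfio.DefaultParquetOptions()
popts.Compression = "snappy" // assumed value; consult the source for accepted strings
popts.BatchSize = 4096       // override DefaultBatchSize (1000)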

type ParquetReader added in v0.2.0

type ParquetReader struct {
	// contains filtered or unexported fields
}

ParquetReader reads Parquet data and converts it to DataFrames

func NewParquetReader added in v0.2.0

func NewParquetReader(reader io.Reader, options ParquetOptions, mem memory.Allocator) *ParquetReader

NewParquetReader creates a new Parquet reader with the specified options

func (*ParquetReader) Read added in v0.2.0

func (r *ParquetReader) Read() (*dataframe.DataFrame, error)

Read reads Parquet data and returns a DataFrame

type ParquetWriter added in v0.2.0

type ParquetWriter struct {
	// contains filtered or unexported fields
}

ParquetWriter writes DataFrames to Parquet format

func NewParquetWriter added in v0.2.0

func NewParquetWriter(writer io.Writer, options ParquetOptions) *ParquetWriter

NewParquetWriter creates a new Parquet writer with the specified options

func (*ParquetWriter) Write added in v0.2.0

func (w *ParquetWriter) Write(df *dataframe.DataFrame) error

Write writes the DataFrame to Parquet format
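Combining the Parquet writer with the Convert helper sketched under DataWriter gives a CSV-to-Parquet conversion:

in, err := os.Open("data.csv")
if err != nil {
	log.Fatal(err)
}
defer in.Close()

out, err := os.Create("data.parquet")
if err != nil {
	log.Fatal(err)
}
defer out.Close()

r := dfio.NewCSVReader(in, dfio.DefaultCSVOptions(), memory.NewGoAllocator())
w := dfio.NewParquetWriter(out, dfio.DefaultParquetOptions())
if err := Convert(r, w); err != nil {
	log.Fatal(err)
}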
