Documentation ¶
Overview ¶
Package io provides I/O operations for reading and writing DataFrame data. It includes readers and writers for multiple data formats, with automatic type inference and schema handling. CSV is the primary implementation, with support for streaming large datasets; Parquet readers and writers are also provided (added in v0.2.0).
Key components:
- DataReader/DataWriter interfaces for pluggable I/O backends
- CSVReader/CSVWriter for CSV file operations
- ParquetReader/ParquetWriter for Parquet file operations
- Type inference for automatic schema detection
- Configurable options for delimiters, headers, and batch sizes
Memory management: All I/O operations integrate with Apache Arrow's memory management system and require proper cleanup with defer patterns.
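For example, a typical read follows the defer-cleanup pattern sketched below; the Release call on the returned DataFrame is an assumption based on Arrow's reference-counting convention, and concrete reader construction is shown under the individual types:

df, err := reader.Read() // reader is any DataReader
if err != nil {
	log.Fatal(err)
}
defer df.Release() // assumed: releases Arrow-backed column memory once the DataFrame is no longer needed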
Index ¶
Constants ¶
const (
	// DefaultChunkSize is the default chunk size for parallel processing
	DefaultChunkSize = 1000
	// DefaultBatchSize is the default batch size for I/O operations
	DefaultBatchSize = 1000
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CSVOptions ¶
type CSVOptions struct {
// Delimiter is the field delimiter (default: comma)
Delimiter rune
// Comment is the comment character (default: 0 = disabled)
Comment rune
// Header indicates whether the first row contains headers
Header bool
// SkipInitialSpace indicates whether to skip initial whitespace
SkipInitialSpace bool
// Parallel indicates whether to use parallel processing
Parallel bool
// ChunkSize is the size of chunks for parallel processing
ChunkSize int
}
CSVOptions contains configuration options for CSV operations
func DefaultCSVOptions ¶
func DefaultCSVOptions() CSVOptions
DefaultCSVOptions returns default CSV options
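As an illustration, the defaults can be taken as a starting point and individual fields overridden; the values below are arbitrary, and dfio stands in for this package's import alias (chosen to avoid shadowing the standard library io):

opts := dfio.DefaultCSVOptions()
opts.Delimiter = '\t' // parse tab-separated input instead of commas
opts.Header = true    // treat the first row as column names
opts.Parallel = true  // enable chunked parallel parsing
opts.ChunkSize = 5000 // override DefaultChunkSize (1000)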
type CSVReader ¶
type CSVReader struct {
// contains filtered or unexported fields
}
CSVReader reads CSV data and converts it to DataFrames
func NewCSVReader ¶
NewCSVReader creates a new CSV reader with the specified options
type CSVWriter ¶
type CSVWriter struct {
// contains filtered or unexported fields
}
CSVWriter writes DataFrames to CSV format
func NewCSVWriter ¶
func NewCSVWriter(writer io.Writer, options CSVOptions) *CSVWriter
NewCSVWriter creates a new CSV writer with the specified options
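A minimal sketch of writing a DataFrame to a file, assuming CSVWriter satisfies the DataWriter interface (its Write method) and that df is an existing *dataframe.DataFrame; dfio again aliases this package:

f, err := os.Create("out.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

w := dfio.NewCSVWriter(f, dfio.DefaultCSVOptions())
if err := w.Write(df); err != nil { // Write is assumed via the DataWriter interface
	log.Fatal(err)
}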
type DataReader ¶
type DataReader interface {
// Read reads data from the source and returns a DataFrame
Read() (*dataframe.DataFrame, error)
}
DataReader defines the interface for reading data from various sources
type DataWriter ¶
type DataWriter interface {
// Write writes the DataFrame to the destination
Write(df *dataframe.DataFrame) error
}
DataWriter defines the interface for writing data to various destinations
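Because both interfaces are single-method, format conversion can be written against them rather than against concrete reader and writer types. A sketch, with the Release call again assumed from the package's Arrow memory model:

// convert reads a DataFrame from any DataReader and writes it to any DataWriter.
func convert(r dfio.DataReader, w dfio.DataWriter) error {
	df, err := r.Read()
	if err != nil {
		return err
	}
	defer df.Release() // assumed cleanup of Arrow-backed memory
	return w.Write(df)
}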
type ParquetOptions ¶
type ParquetOptions struct {
// Compression type for Parquet files
Compression string
// BatchSize for reading/writing operations
BatchSize int
}
ParquetOptions contains configuration options for Parquet operations
func DefaultParquetOptions ¶
func DefaultParquetOptions() ParquetOptions
DefaultParquetOptions returns default Parquet options
type ParquetReader ¶ added in v0.2.0
type ParquetReader struct {
// contains filtered or unexported fields
}
ParquetReader reads Parquet data and converts it to DataFrames
func NewParquetReader ¶ added in v0.2.0
func NewParquetReader(reader io.Reader, options ParquetOptions, mem memory.Allocator) *ParquetReader
NewParquetReader creates a new Parquet reader with the specified options
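A sketch of reading a Parquet file, assuming ParquetReader satisfies the DataReader interface; memory.NewGoAllocator comes from Apache Arrow's Go memory package:

f, err := os.Open("data.parquet")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

r := dfio.NewParquetReader(f, dfio.DefaultParquetOptions(), memory.NewGoAllocator())
df, err := r.Read() // Read is assumed via the DataReader interface
if err != nil {
	log.Fatal(err)
}
defer df.Release() // assumed cleanup of Arrow-backed memory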
type ParquetWriter ¶ added in v0.2.0
type ParquetWriter struct {
// contains filtered or unexported fields
}
ParquetWriter writes DataFrames to Parquet format
func NewParquetWriter ¶ added in v0.2.0
func NewParquetWriter(writer io.Writer, options ParquetOptions) *ParquetWriter
NewParquetWriter creates a new Parquet writer with the specified options
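And the corresponding write side, assuming ParquetWriter satisfies the DataWriter interface; the "snappy" compression value is illustrative, since the supported codec names are not documented here:

opts := dfio.DefaultParquetOptions()
opts.Compression = "snappy" // illustrative value; supported codecs depend on the implementation
opts.BatchSize = 10000      // override DefaultBatchSize (1000)

f, err := os.Create("out.parquet")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

w := dfio.NewParquetWriter(f, opts)
if err := w.Write(df); err != nil { // Write is assumed via the DataWriter interface
	log.Fatal(err)
}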