Documentation ¶
Overview ¶
Package reader contains helpers for reading data and loading it into Arrow records.
Index ¶
- Constants
- Variables
- func InputMap(a any) (map[string]any, error)
- func TextMarshalerHookFunc() mapstructure.DecodeHookFuncValue
- type DataReader
- func (r *DataReader) Cancel()
- func (r *DataReader) Count() int
- func (r *DataReader) DataSource() DataSource
- func (r *DataReader) Err() error
- func (r *DataReader) InputBufferSize() int
- func (r *DataReader) Mode() int
- func (r *DataReader) Next() bool
- func (r *DataReader) NextBatch(batchSize int) bool
- func (r *DataReader) Opts() []Option
- func (r *DataReader) Peek() (int, int)
- func (r *DataReader) Read(a any) error
- func (r *DataReader) ReadToRecord(a any) (arrow.Record, error)
- func (r *DataReader) RecBufferSize() int
- func (r *DataReader) Record() arrow.Record
- func (r *DataReader) RecordBatch() []arrow.Record
- func (r *DataReader) Release()
- func (r *DataReader) Reset()
- func (r *DataReader) ResetCount()
- func (r *DataReader) Retain()
- func (r *DataReader) Schema() *arrow.Schema
- type DataSource
- type Encoder
- type EncoderConfig
- type Option
Constants ¶
const (
	Manual int = iota
	Scanner
)
const DefaultDelimiter byte = byte('\n')
Variables ¶
var (
	ErrUndefinedInput = errors.New("nil input")
	ErrInvalidInput   = errors.New("invalid input")
)
var (
ErrNullStructData = errors.New("null struct data")
)
Functions ¶
func InputMap ¶
func InputMap(a any) (map[string]any, error)
InputMap takes structured input data and attempts to decode it into a map[string]any. The input can be JSON as a string or []byte, or any other Go value that can be decoded by mapstructure/v2 (github.com/go-viper/mapstructure/v2).
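For JSON input, the result is equivalent to decoding into a map[string]any with the standard library. A minimal stdlib-only sketch (jsonToMap is a hypothetical stand-in for illustration, not part of this package):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonToMap mirrors what InputMap does for JSON input: decode a JSON
// document held in a []byte (or string) into a map[string]any.
func jsonToMap(data []byte) (map[string]any, error) {
	var m map[string]any
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	m, err := jsonToMap([]byte(`{"id": 1, "name": "arrow"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(m["name"]) // arrow
}
```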
func TextMarshalerHookFunc ¶
func TextMarshalerHookFunc() mapstructure.DecodeHookFuncValue
TextMarshalerHookFunc returns a DecodeHookFuncValue that checks for the encoding.TextMarshaler interface and calls the MarshalText function if found.
Types ¶
type DataReader ¶ added in v0.3.0
type DataReader struct {
// contains filtered or unexported fields
}
func NewReader ¶ added in v0.3.0
func NewReader(schema *arrow.Schema, source DataSource, opts ...Option) (*DataReader, error)
func (*DataReader) Cancel ¶ added in v0.3.1
func (r *DataReader) Cancel()
Cancel cancels the Reader's io.Reader scan to Arrow.
func (*DataReader) Count ¶ added in v0.3.1
func (r *DataReader) Count() int
func (*DataReader) DataSource ¶ added in v0.3.1
func (r *DataReader) DataSource() DataSource
func (*DataReader) Err ¶ added in v0.3.0
func (r *DataReader) Err() error
Err returns the last error encountered during the reading of data.
func (*DataReader) InputBufferSize ¶ added in v0.3.1
func (r *DataReader) InputBufferSize() int
func (*DataReader) Mode ¶ added in v0.3.1
func (r *DataReader) Mode() int
func (*DataReader) Next ¶ added in v0.3.1
func (r *DataReader) Next() bool
Next returns whether a Record can be received from the converted record queue. The user should check Err() after a call to Next that returned false to check if an error took place.
func (*DataReader) NextBatch ¶ added in v0.3.2
func (r *DataReader) NextBatch(batchSize int) bool
NextBatch returns whether a []arrow.Record of the specified size can be received from the converted record queue. It still returns true when the queue channel is closed and the remaining records number fewer than batchSize. The user should check Err() after a call to NextBatch that returned false to check if an error took place.
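The documented queue semantics can be sketched with a plain channel standing in for the internal record queue; nextBatch below is a hypothetical illustration, not the package's implementation:

```go
package main

import "fmt"

// nextBatch drains up to n items from a queue channel, mirroring
// NextBatch's documented behavior: it reports true even when the
// channel is closed and fewer than n items remain, and false only
// once the queue is fully drained.
func nextBatch(queue chan int, n int) ([]int, bool) {
	batch := make([]int, 0, n)
	for len(batch) < n {
		item, ok := <-queue
		if !ok {
			break // queue closed; return whatever we collected
		}
		batch = append(batch, item)
	}
	return batch, len(batch) > 0
}

func main() {
	queue := make(chan int, 5)
	for i := 1; i <= 5; i++ {
		queue <- i
	}
	close(queue)

	for {
		batch, ok := nextBatch(queue, 2)
		if !ok {
			break
		}
		fmt.Println(batch)
	}
	// Prints [1 2], [3 4], then the short final batch [5].
}
```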
func (*DataReader) Opts ¶ added in v0.3.1
func (r *DataReader) Opts() []Option
func (*DataReader) Peek ¶ added in v0.3.1
func (r *DataReader) Peek() (int, int)
Peek returns the length of the input data and Arrow Record queues.
func (*DataReader) Read ¶ added in v0.3.1
func (r *DataReader) Read(a any) error
Read loads one datum. If the Reader has an io.Reader, Read is a no-op.
func (*DataReader) ReadToRecord ¶ added in v0.3.0
func (r *DataReader) ReadToRecord(a any) (arrow.Record, error)
ReadToRecord decodes a datum directly to an arrow.Record. The record should be released by the user when done with it.
func (*DataReader) RecBufferSize ¶ added in v0.3.1
func (r *DataReader) RecBufferSize() int
func (*DataReader) Record ¶ added in v0.3.1
func (r *DataReader) Record() arrow.Record
Record returns the current Arrow record. It is valid until the next call to Next.
func (*DataReader) RecordBatch ¶ added in v0.3.2
func (r *DataReader) RecordBatch() []arrow.Record
RecordBatch returns the current batch of Arrow records. It is valid until the next call to NextBatch.
func (*DataReader) Release ¶ added in v0.3.0
func (r *DataReader) Release()
Release decreases the reference count by 1. When the reference count goes to zero, the memory is freed. Release may be called simultaneously from multiple goroutines.
func (*DataReader) Reset ¶ added in v0.3.1
func (r *DataReader) Reset()
Reset resets a Reader to its initial state.
func (*DataReader) ResetCount ¶ added in v0.3.1
func (r *DataReader) ResetCount()
func (*DataReader) Retain ¶ added in v0.3.0
func (r *DataReader) Retain()
Retain increases the reference count by 1. Retain may be called simultaneously from multiple goroutines.
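The Retain/Release contract above can be sketched with an atomic reference count; refCounted is a hypothetical illustration of the semantics, not the package's implementation:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// refCounted sketches the Retain/Release contract: Retain increments
// the count, Release decrements it, and the resource is freed exactly
// once, when the count reaches zero. Atomic operations make both safe
// to call simultaneously from multiple goroutines.
type refCounted struct {
	refs  atomic.Int64
	freed bool
}

func newRefCounted() *refCounted {
	r := &refCounted{}
	r.refs.Store(1) // creator holds the initial reference
	return r
}

func (r *refCounted) Retain() { r.refs.Add(1) }

func (r *refCounted) Release() {
	if r.refs.Add(-1) == 0 {
		r.freed = true // free the underlying memory here
	}
}

func main() {
	r := newRefCounted()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		r.Retain()
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.Release()
		}()
	}
	wg.Wait()
	r.Release() // drop the creator's reference
	fmt.Println(r.freed) // true
}
```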
func (*DataReader) Schema ¶ added in v0.3.0
func (r *DataReader) Schema() *arrow.Schema
type DataSource ¶ added in v0.3.0
type DataSource int
const (
	DataSourceGo DataSource = iota
	DataSourceJSON
	DataSourceAvro
)
type Encoder ¶
type Encoder struct {
// contains filtered or unexported fields
}
An Encoder takes structured data and converts it into an interface following the mapstructure tags.
type EncoderConfig ¶
type EncoderConfig struct {
// EncodeHook, if set, is a way to provide custom encoding. It
// will be called before structs and primitive types.
EncodeHook mapstructure.DecodeHookFunc
}
EncoderConfig is the configuration used to create a new encoder.
type Option ¶ added in v0.3.0
type Option func(config)
Option configures a Bodkin Reader.
func WithAllocator ¶ added in v0.3.0
WithAllocator specifies the Arrow memory allocator used while building records.
func WithChunk ¶ added in v0.3.1
WithChunk specifies the chunk size used while reading data to Arrow records.
If n is zero or 1, no chunking will take place and the reader will create one record per row. If n is greater than 1, chunks of n rows will be read.
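The sizing rule can be sketched as follows; chunkRows is a hypothetical illustration (strings standing in for rows), not the package's implementation:

```go
package main

import "fmt"

// chunkRows illustrates WithChunk's sizing rule: n <= 1 yields one
// chunk (record) per row; n > 1 groups rows into chunks of n, with
// the last chunk possibly shorter.
func chunkRows(rows []string, n int) [][]string {
	if n <= 1 {
		n = 1
	}
	var chunks [][]string
	for len(rows) > 0 {
		end := n
		if end > len(rows) {
			end = len(rows)
		}
		chunks = append(chunks, rows[:end])
		rows = rows[end:]
	}
	return chunks
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	fmt.Println(chunkRows(rows, 2)) // [[a b] [c d] [e]]
	fmt.Println(chunkRows(rows, 0)) // one chunk per row
}
```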
func WithIOReader ¶ added in v0.3.1
WithIOReader provides an io.Reader to the Bodkin Reader, along with a delimiter used to split datums in the data stream. If no delimiter is provided, the default is '\n'.
func WithInputBufferSize ¶ added in v0.3.1
WithInputBufferSize specifies the Bodkin Reader's input buffer size.
func WithJSONDecoder ¶ added in v0.3.0
func WithJSONDecoder() Option
WithJSONDecoder specifies whether to use goccy/json-go as the Bodkin Reader's decoder. The default is the Bodkin DataLoader, a linked list of builders which reduces recursive lookups in maps when loading data.
func WithRecordBufferSize ¶ added in v0.3.1
WithRecordBufferSize specifies the Bodkin Reader's record buffer size.