reader

package

v0.3.2 Latest Latest Go to latest Published: Nov 14, 2024 License: Apache-2.0 Imports: 24 Imported by: 1

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/loicalleyne/bodkin

Links

Open Source Insights

Documentation ¶

Overview ¶

Package reader contains helpers for reading data and loading to Arrow.

Index ¶

Constants
Variables
func InputMap(a any) (map[string]any, error)
func TextMarshalerHookFunc() mapstructure.DecodeHookFuncValue
type DataReader
- func NewReader(schema *arrow.Schema, source DataSource, opts ...Option) (*DataReader, error)
type DataSource
type Encoder
- func New(cfg *EncoderConfig) *Encoder
- func (e *Encoder) Encode(input any) (any, error)
type EncoderConfig
type Option

Constants ¶

View Source

const (
	Manual int = iota
	Scanner
)

View Source

const DefaultDelimiter byte = byte('\n')

Variables ¶

View Source

var (
	ErrUndefinedInput = errors.New("nil input")
	ErrInvalidInput   = errors.New("invalid input")
)

View Source

var (
	ErrNullStructData = errors.New("null struct data")
)

Functions ¶

func InputMap ¶

func InputMap(a any) (map[string]any, error)

InputMap takes structured input data and attempts to decode it to map[string]any. Input data can be json in string or []byte, or any other Go data type which can be decoded by MapStructure/v2. MapStructure/v2: github.com/go-viper/mapstructure/v2

func TextMarshalerHookFunc ¶

func TextMarshalerHookFunc() mapstructure.DecodeHookFuncValue

TextMarshalerHookFunc returns a DecodeHookFuncValue that checks for the encoding.TextMarshaler interface and calls the MarshalText function if found.

Types ¶

type DataReader ¶ added in v0.3.0

type DataReader struct {
	// contains filtered or unexported fields
}

func NewReader ¶ added in v0.3.0

func NewReader(schema *arrow.Schema, source DataSource, opts ...Option) (*DataReader, error)

func (*DataReader) Cencel ¶ added in v0.3.1

func (r *DataReader) Cencel()

Cancel cancels the Reader's io.Reader scan to Arrow.

func (*DataReader) Count ¶ added in v0.3.1

func (r *DataReader) Count() int

func (*DataReader) DataSource ¶ added in v0.3.1

func (r *DataReader) DataSource() DataSource

func (*DataReader) Err ¶ added in v0.3.0

func (r *DataReader) Err() error

Err returns the last error encountered during the reading of data.

func (*DataReader) InputBufferSize ¶ added in v0.3.1

func (r *DataReader) InputBufferSize() int

func (*DataReader) Mode ¶ added in v0.3.1

func (r *DataReader) Mode() int

func (*DataReader) Next ¶ added in v0.3.1

func (r *DataReader) Next() bool

Next returns whether a Record can be received from the converted record queue. The user should check Err() after a call to Next that returned false to check if an error took place.

func (*DataReader) NextBatch ¶ added in v0.3.2

func (r *DataReader) NextBatch(batchSize int) bool

NextBatch returns whether a []arrow.Record of a specified size can be received from the converted record queue. Will still return true if the queue channel is closed and last batch of records available < batch size specified. The user should check Err() after a call to NextBatch that returned false to check if an error took place.

func (*DataReader) Opts ¶ added in v0.3.1

func (r *DataReader) Opts() []Option

func (*DataReader) Peek ¶ added in v0.3.1

func (r *DataReader) Peek() (int, int)

Peek returns the length of the input data and Arrow Record queues.

func (*DataReader) Read ¶ added in v0.3.1

func (r *DataReader) Read(a any) error

Read loads one datum. If the Reader has an io.Reader, Read is a no-op.

func (*DataReader) ReadToRecord ¶ added in v0.3.0

func (r *DataReader) ReadToRecord(a any) (arrow.Record, error)

ReadToRecord decodes a datum directly to an arrow.Record. The record should be released by the user when done with it.

func (*DataReader) RecBufferSize ¶ added in v0.3.1

func (r *DataReader) RecBufferSize() int

func (*DataReader) Record ¶ added in v0.3.1

func (r *DataReader) Record() arrow.Record

Record returns the current Arrow record. It is valid until the next call to Next.

func (*DataReader) RecordBatch ¶ added in v0.3.2

func (r *DataReader) RecordBatch() []arrow.Record

Record returns the current Arrow record batch. It is valid until the next call to NextBatch.

func (*DataReader) Release ¶ added in v0.3.0

func (r *DataReader) Release()

Release decreases the reference count by 1. When the reference count goes to zero, the memory is freed. Release may be called simultaneously from multiple goroutines.

func (*DataReader) Reset ¶ added in v0.3.1

func (r *DataReader) Reset()

Reset resets a Reader to its initial state.

func (*DataReader) ResetCount ¶ added in v0.3.1

func (r *DataReader) ResetCount()

func (*DataReader) Retain ¶ added in v0.3.0

func (r *DataReader) Retain()

Retain increases the reference count by 1. Retain may be called simultaneously from multiple goroutines.

func (*DataReader) Schema ¶ added in v0.3.0

func (r *DataReader) Schema() *arrow.Schema

type DataSource ¶ added in v0.3.0

type DataSource int

const (
	DataSourceGo DataSource = iota
	DataSourceJSON
	DataSourceAvro
)

type Encoder ¶

type Encoder struct {
	// contains filtered or unexported fields
}

An Encoder takes structured data and converts it into an interface following the mapstructure tags.

func New ¶

func New(cfg *EncoderConfig) *Encoder

New returns a new encoder for the configuration.

func (*Encoder) Encode ¶

func (e *Encoder) Encode(input any) (any, error)

Encode takes the input and uses reflection to encode it to an interface based on the mapstructure spec.

type EncoderConfig ¶

type EncoderConfig struct {
	// EncodeHook, if set, is a way to provide custom encoding. It
	// will be called before structs and primitive types.
	EncodeHook mapstructure.DecodeHookFunc
}

EncoderConfig is the configuration used to create a new encoder.

type Option ¶ added in v0.3.0

type Option func(config)

Option configures an Avro reader/writer.

func WithAllocator ¶ added in v0.3.0

func WithAllocator(mem memory.Allocator) Option

WithAllocator specifies the Arrow memory allocator used while building records.

func WithChunk ¶ added in v0.3.1

func WithChunk(n int) Option

WithChunk specifies the chunk size used while reading data to Arrow records.

If n is zero or 1, no chunking will take place and the reader will create one record per row. If n is greater than 1, chunks of n rows will be read.

func WithIOReader ¶ added in v0.3.1

func WithIOReader(r io.Reader, delim byte) Option

WithIOReader provides an io.Reader to Bodkin Reader, along with a delimiter to use to split datum in the data stream. Default delimiter '\n' if delimiter is not provided.

func WithInputBufferSize ¶ added in v0.3.1

func WithInputBufferSize(n int) Option

WithInputBufferSize specifies the Bodkin Reader's input buffer size.

func WithJSONDecoder ¶ added in v0.3.0

func WithJSONDecoder() Option

WithJSONDecoder specifies whether to use goccy/json-go as the Bodkin Reader's decoder. The default is the Bodkin DataLoader, a linked list of builders which reduces recursive lookups in maps when loading data.

func WithRecordBufferSize ¶ added in v0.3.1

func WithRecordBufferSize(n int) Option

WithRecordBufferSize specifies the Bodkin Reader's record buffer size.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL