Documentation ¶
Overview ¶
Package imports provides functionality to read data contained in another format to populate a DataFrame. It provides inverse functionality to the exports package.
Index ¶
- func LoadFromCSV(ctx context.Context, r io.ReadSeeker, options ...CSVLoadOptions) (*dataframe.DataFrame, error)
- func LoadFromJSON(ctx context.Context, r io.ReadSeeker, options ...JSONLoadOptions) (*dataframe.DataFrame, error)
- func LoadFromParquet(ctx context.Context, src source.ParquetFile, opts ...ParquetLoadOptions) (*dataframe.DataFrame, error)
- func LoadFromSQL(ctx context.Context, stmt interface{}, options *SQLLoadOptions, ...) (*dataframe.DataFrame, error)
- type CSVLoadOptions
- type Converter
- type Database
- type GenericDataConverter
- type JSONLoadOptions
- type ParquetLoadOptions
- type SQLLoadOptions
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func LoadFromCSV ¶
func LoadFromCSV(ctx context.Context, r io.ReadSeeker, options ...CSVLoadOptions) (*dataframe.DataFrame, error)
LoadFromCSV loads data from a CSV file.
func LoadFromJSON ¶
func LoadFromJSON(ctx context.Context, r io.ReadSeeker, options ...JSONLoadOptions) (*dataframe.DataFrame, error)
LoadFromJSON loads data from a JSON Lines (JSONL) file or a JSON array. The fields present in the first row determine which fields are imported from subsequent rows.
See: https://jsonlines.org for details on the file format.
func LoadFromParquet ¶
func LoadFromParquet(ctx context.Context, src source.ParquetFile, opts ...ParquetLoadOptions) (*dataframe.DataFrame, error)
LoadFromParquet loads data from a Parquet file.
NOTE: This function is experimental and the implementation is likely to change.
Example (gist):
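import (
	"context"

	"github.com/kevinroundy/dataframe-go/imports"
	"github.com/xitongsys/parquet-go-source/local"
)

func main() {
	ctx := context.Background()
	fr, _ := local.NewLocalFileReader("file.parquet") // check the error in real code
	defer fr.Close()
	df, _ := imports.LoadFromParquet(ctx, fr) // check the error in real code
	_ = df // use df here
}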
func LoadFromSQL ¶
func LoadFromSQL(ctx context.Context, stmt interface{}, options *SQLLoadOptions, args ...interface{}) (*dataframe.DataFrame, error)
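LoadFromSQL loads data from a SQL database. stmt must be a *sql.Stmt or the equivalent from the mysql-go package. Alternatively, stmt can be a *sql.DB, *sql.Tx, *sql.Conn or the mysql-go equivalent, in which case the SQL statement is supplied through SQLLoadOptions.Query.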
See: https://godoc.org/github.com/rocketlaunchr/mysql-go#Stmt
Types ¶
type CSVLoadOptions ¶
type CSVLoadOptions struct {
// Comma is the field delimiter.
// The default value is ',' when CSVLoadOptions is not provided.
// Comma must be a valid rune and must not be \r, \n,
// or the Unicode replacement character (0xFFFD).
Comma rune
// Comment, if not 0, is the comment character. Lines beginning with the
// Comment character without preceding whitespace are ignored.
// With leading whitespace the Comment character becomes part of the
// field, even if TrimLeadingSpace is true.
// Comment must be a valid rune and must not be \r, \n,
// or the Unicode replacement character (0xFFFD).
// It must also not be equal to Comma.
Comment rune
// If TrimLeadingSpace is true, leading white space in a field is ignored.
// This is done even if the field delimiter, Comma, is white space.
TrimLeadingSpace bool
// LargeDataSet should be set to true for large datasets.
// It will set the capacity of the underlying slices of the DataFrame by performing a basic parse
// of the full dataset before processing the data fully.
// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use-case.
LargeDataSet bool
// DictateDataType is used to inform LoadFromCSV what the true underlying data type is for a given field name.
// The key must be the case-sensitive field name.
// The value for a given key must be of the data type of the data.
// e.g. for a string use "", and for an int64 use int64(0). Only the data type is relevant, not the value itself.
//
// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
DictateDataType map[string]interface{}
// NilValue allows you to set what string value in the CSV file should be interpreted as a nil value for
// the purposes of insertion.
//
// Common values are: NULL, \N, NaN, NA
NilValue *string
// InferDataTypes can be set to true if the underlying data type should be automatically detected.
// Using DictateDataType is the recommended approach (especially for large datasets or memory constrained systems).
// DictateDataType always takes precedence when determining the type.
// If the data type could not be detected, SeriesString is used.
InferDataTypes bool
// Headers must be set if the CSV file does not contain a header row. This must be nil if the CSV file contains a
// header row.
Headers []string
}
CSVLoadOptions is likely to change.
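The descriptions of Comma, Comment and TrimLeadingSpace match the fields of the same names on the standard library's encoding/csv.Reader, so their behaviour can be explored without this package. The sketch below assumes, without confirmation from this page, that the options are passed through to such a reader; it also shows one way a NilValue-style string could be mapped to nil. The parse function is hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parse reads CSV text with a custom delimiter and comment character and
// maps every cell equal to nilValue to a nil pointer (hypothetical helper).
func parse(data string, comma, comment rune, nilValue string) ([][]*string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = comma
	r.Comment = comment
	r.TrimLeadingSpace = true

	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	out := make([][]*string, len(records))
	for i, rec := range records {
		out[i] = make([]*string, len(rec))
		for j := range rec {
			if rec[j] != nilValue {
				out[i][j] = &rec[j]
			}
		}
	}
	return out, nil
}

func main() {
	data := "name; age\n# this line is skipped\nalice; 30\nbob; NULL\n"
	rows, err := parse(data, ';', '#', "NULL")
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		for _, cell := range row {
			if cell == nil {
				fmt.Print("<nil> ")
			} else {
				fmt.Printf("%q ", *cell)
			}
		}
		fmt.Println()
	}
}
```

A pointer is used for each cell because a plain string cannot distinguish an empty value from a missing one, which is likely the same reason NilValue itself is a *string.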
type Converter ¶
type Converter struct {
ConcreteType interface{}
ConverterFunc GenericDataConverter
}
Converter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("dataframe.SeriesGeneric"). As a special case, if ConcreteType is time.Time, then a SeriesTime is used.
Example:
opts := imports.CSVLoadOptions{
	DictateDataType: map[string]interface{}{
		"Date": imports.Converter{
			ConcreteType: time.Time{},
			ConverterFunc: func(in interface{}) (interface{}, error) {
				return time.Parse("2006-01-02", in.(string))
			},
		},
	},
}
type Database ¶
type Database int
Database identifies the database system in use. Different databases use different syntax for placeholders, among other things.
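As an illustration of the placeholder differences, PostgreSQL uses numbered placeholders ($1, $2, ...) while MySQL uses positional question marks. The sketch below uses a hypothetical dialect type as a stand-in for Database; the constant names are assumptions and are not taken from this package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// dialect is a hypothetical stand-in for the package's Database type.
type dialect int

const (
	postgres dialect = iota // numbered placeholders: $1, $2, ...
	mysql                   // positional placeholders: ?
)

// placeholders returns n comma-separated placeholders for the dialect.
func placeholders(d dialect, n int) string {
	parts := make([]string, n)
	for i := range parts {
		if d == postgres {
			parts[i] = "$" + strconv.Itoa(i+1)
		} else {
			parts[i] = "?"
		}
	}
	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println(placeholders(postgres, 3))
	fmt.Println(placeholders(mysql, 3))
}
```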
type GenericDataConverter ¶
type GenericDataConverter func(in interface{}) (interface{}, error)
GenericDataConverter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("SeriesGeneric").
type JSONLoadOptions ¶
type JSONLoadOptions struct {
// LargeDataSet should be set to true for large datasets.
// It will set the capacity of the underlying slices of the DataFrame by performing a basic parse
// of the full dataset before processing the data fully.
// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use-case.
LargeDataSet bool
// DictateDataType is used to inform LoadFromJSON what the true underlying data type is for a given field name.
// The key must be the case-sensitive field name.
// The value for a given key must be of the data type of the data.
// e.g. for a string use "", and for an int64 use int64(0). Only the data type is relevant, not the value itself.
//
// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
DictateDataType map[string]interface{}
// ErrorOnUnknownFields will generate an error if an unknown field is encountered after the first row.
ErrorOnUnknownFields bool
// Path sets the location of the array containing the data to import. It uses dot notation relative to the root
// JSON object. For JSONL files, it does nothing.
//
// NOTE: Not implemented.
Path string
}
JSONLoadOptions is likely to change.
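The DictateDataType convention, in which only the type of the map value matters, can be pictured as a type switch over a sample value. This is a minimal sketch of that idea and not the package's implementation; parseAs is a hypothetical helper that handles only a few built-in types.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAs converts raw into the type of sample. Only the dynamic type of
// sample is inspected; its value is ignored.
func parseAs(raw string, sample interface{}) (interface{}, error) {
	switch sample.(type) {
	case string:
		return raw, nil
	case int64:
		return strconv.ParseInt(raw, 10, 64)
	case float64:
		return strconv.ParseFloat(raw, 64)
	case bool:
		return strconv.ParseBool(raw)
	default:
		return nil, fmt.Errorf("unsupported type %T", sample)
	}
}

func main() {
	// The zero values only carry type information.
	types := map[string]interface{}{"id": int64(0), "score": float64(0), "name": ""}
	row := map[string]string{"id": "7", "score": "1.5", "name": "alice"}

	for field, raw := range row {
		v, err := parseAs(raw, types[field])
		fmt.Printf("%s: %T %v %v\n", field, v, v, err)
	}
}
```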
type ParquetLoadOptions ¶
type ParquetLoadOptions struct {
}
ParquetLoadOptions is likely to change.
type SQLLoadOptions ¶
type SQLLoadOptions struct {
// KnownRowCount is used to set the capacity of the underlying slices of the DataFrame.
// The maximum number of rows supported (on a 64-bit machine) is 9,223,372,036,854,775,807 (the largest signed 64-bit integer).
// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use-case.
//
// WARNING: Some databases may allow tables to contain more rows than the maximum supported.
KnownRowCount *int
// DictateDataType is used to inform LoadFromSQL what the true underlying data type is for a given column name.
// The key must be the case-sensitive column name.
// The value for a given key must be of the data type of the data.
// e.g. for a string use "", and for an int64 use int64(0). Only the data type is relevant, not the value itself.
//
// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
DictateDataType map[string]interface{}
// Database is used to set the Database.
Database Database
// Query can be set to the SQL statement if a *sql.DB, *sql.Tx, *sql.Conn or the equivalent from the mysql-go package is provided.
//
// See: https://godoc.org/github.com/rocketlaunchr/mysql-go
Query string
}
SQLLoadOptions is likely to change.