imports

package

v1.0.2 Published: Apr 26, 2025 License: MIT Imports: 19 Imported by: 0

Repository

github.com/kevinroundy/dataframe-go

Documentation ¶

Overview ¶

Package imports reads data stored in another format (CSV, JSON, Parquet or SQL) and uses it to populate a DataFrame. It is the inverse of the exports package.

Index ¶

func LoadFromCSV(ctx context.Context, r io.ReadSeeker, options ...CSVLoadOptions) (*dataframe.DataFrame, error)
func LoadFromJSON(ctx context.Context, r io.ReadSeeker, options ...JSONLoadOptions) (*dataframe.DataFrame, error)
func LoadFromParquet(ctx context.Context, src source.ParquetFile, opts ...ParquetLoadOptions) (*dataframe.DataFrame, error)
func LoadFromSQL(ctx context.Context, stmt interface{}, options *SQLLoadOptions, ...) (*dataframe.DataFrame, error)
type CSVLoadOptions
type Converter
type Database
type GenericDataConverter
type JSONLoadOptions
type ParquetLoadOptions
type SQLLoadOptions

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func LoadFromCSV ¶

func LoadFromCSV(ctx context.Context, r io.ReadSeeker, options ...CSVLoadOptions) (*dataframe.DataFrame, error)

LoadFromCSV loads data from a CSV file.

func LoadFromJSON ¶

func LoadFromJSON(ctx context.Context, r io.ReadSeeker, options ...JSONLoadOptions) (*dataframe.DataFrame, error)

LoadFromJSON loads data from a JSONL (JSON Lines) file or a JSON array. The first row determines which fields are imported from all subsequent rows.

See: https://jsonlines.org for details on the file format.

func LoadFromParquet ¶

func LoadFromParquet(ctx context.Context, src source.ParquetFile, opts ...ParquetLoadOptions) (*dataframe.DataFrame, error)

LoadFromParquet loads data from a Parquet file.

NOTE: This function is experimental and the implementation is likely to change.

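Example (gist):

import "context"
import "log"
import "github.com/kevinroundy/dataframe-go/imports"
import "github.com/xitongsys/parquet-go-source/local"

func main() {
	fr, err := local.NewLocalFileReader("file.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer fr.Close()

	df, err := imports.LoadFromParquet(context.Background(), fr)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(df) // print the loaded DataFrame
}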

func LoadFromSQL ¶

func LoadFromSQL(ctx context.Context, stmt interface{}, options *SQLLoadOptions, args ...interface{}) (*dataframe.DataFrame, error)

LoadFromSQL loads data from a SQL database. stmt must be a *sql.Stmt or the equivalent from the mysql-go package.

See: https://godoc.org/github.com/rocketlaunchr/mysql-go#Stmt

Types ¶

type Converter ¶

type Converter struct {
	ConcreteType  interface{}
	ConverterFunc GenericDataConverter
}

Converter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("dataframe.SeriesGeneric"). As a special case, if ConcreteType is time.Time, then a SeriesTime is used.

Example:

opts := imports.CSVLoadOptions{
   DictateDataType: map[string]interface{}{
      "Date": imports.Converter{
         ConcreteType: time.Time{},
         ConverterFunc: func(in interface{}) (interface{}, error) {
            return time.Parse("2006-01-02", in.(string))
         },
      },
   },
}

type Database ¶

type Database int

Database identifies the database engine in use. Different engines use different syntax for placeholders and other details.

const (
	// PostgreSQL database
	PostgreSQL Database = 0
	// MySQL database
	MySQL Database = 1
)
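
To make the placeholder difference concrete: PostgreSQL numbers its bind parameters ($1, $2, ...), while MySQL uses a bare ? for each one. The sketch below uses a local database type as a stand-in for the Database type above; placeholders is a hypothetical helper and not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// database is a local stand-in mirroring the Database type above.
type database int

const (
	postgreSQL database = 0
	mySQL      database = 1
)

// placeholders returns n comma-separated bind placeholders in the
// syntax each engine expects: $1, $2, ... for PostgreSQL and ? for MySQL.
func placeholders(db database, n int) string {
	parts := make([]string, n)
	for i := range parts {
		if db == postgreSQL {
			parts[i] = fmt.Sprintf("$%d", i+1)
		} else {
			parts[i] = "?"
		}
	}
	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println("SELECT * FROM t WHERE a IN (" + placeholders(postgreSQL, 3) + ")")
	fmt.Println("SELECT * FROM t WHERE a IN (" + placeholders(mySQL, 3) + ")")
}
```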

type GenericDataConverter ¶

type GenericDataConverter func(in interface{}) (interface{}, error)

GenericDataConverter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("SeriesGeneric").

type JSONLoadOptions ¶

type JSONLoadOptions struct {

	// LargeDataSet should be set to true for large datasets.
	// It will set the capacity of the underlying slices of the Dataframe by performing a basic parse
	// of the full dataset before processing the data fully.
	// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use-case.
	LargeDataSet bool

	// DictateDataType is used to inform LoadFromJSON what the true underlying data type is for a given field name.
	// The key must be the case-sensitive field name.
	// The value for a given key must be of the data type of the data.
	// e.g. For a string use "". For an int64 use int64(0). What is relevant is the data type and not the value itself.
	//
	// NOTE: To work, a custom Series must implement the NewSerieser interface and be able to interpret strings.
	DictateDataType map[string]interface{}

	// ErrorOnUnknownFields will generate an error if an unknown field is encountered after the first row.
	ErrorOnUnknownFields bool

	// Path sets the location of the array containing the data to import. It uses dot notation relative to the root
	// JSON object. For JSONL files, it does nothing.
	//
	// NOTE: Not implemented.
	Path string
}

JSONLoadOptions is likely to change.

type ParquetLoadOptions ¶

type ParquetLoadOptions struct {
}

ParquetLoadOptions is likely to change.

type SQLLoadOptions ¶

type SQLLoadOptions struct {

	// KnownRowCount is used to set the capacity of the underlying slices of the Dataframe.
	// The maximum number of rows supported (on a 64-bit machine) is 9,223,372,036,854,775,807 (half of 64 bit range).
	// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use-case.
	//
	// WARNING: Some databases may allow tables to contain more rows than the maximum supported.
	KnownRowCount *int

	// DictateDataType is used to inform LoadFromSQL what the true underlying data type is for a given column name.
	// The key must be the case-sensitive column name.
	// The value for a given key must be of the data type of the data.
	// e.g. For a string use "". For an int64 use int64(0). What is relevant is the data type and not the value itself.
	//
	// NOTE: To work, a custom Series must implement the NewSerieser interface and be able to interpret strings.
	DictateDataType map[string]interface{}

	// Database selects the database engine (PostgreSQL or MySQL).
	Database Database

	// Query can be set to the SQL statement if a *sql.DB, *sql.Tx, *sql.Conn or the equivalent from the mysql-go package is provided.
	//
	// See: https://godoc.org/github.com/rocketlaunchr/mysql-go
	Query string
}

SQLLoadOptions is likely to change.
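
Because KnownRowCount is a *int, a nil value can mean "unknown" while a pointer to a number supplies a capacity hint. The sketch below illustrates that pattern only; intPtr and newColumn are hypothetical helpers, and the nil-means-unknown reading is an assumption rather than documented behavior.

```go
package main

import "fmt"

// intPtr is a hypothetical helper for filling a *int option field from a literal.
func intPtr(n int) *int { return &n }

// newColumn stands in for the slice backing one column of the result.
// A nil count is treated as "unknown" (an assumption of this sketch).
func newColumn(knownRowCount *int) []int64 {
	if knownRowCount == nil {
		return nil // capacity grows on demand as rows are appended
	}
	return make([]int64, 0, *knownRowCount)
}

func main() {
	col := newColumn(intPtr(1000))
	fmt.Println(len(col), cap(col))
	fmt.Println(cap(newColumn(nil)))
}
```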
