The highest tagged major version is v3.

csv


v1.0.10 (latest) | Published: Mar 19, 2025 | License: MIT | Imports: 9 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/josephcopenhaver/csv-go

README ¶

csv-go

Why does this exist?

I am tired of rewriting this over and over to cover edge cases where the standard CSV implementations of other languages make assertions about format and formatting that I cannot guarantee hold for a given file, given how it was constructed. Over the years I've written variations that cover far fewer concerns, so I figured I'd make a superset that does everything I need. Feel free to use it however you wish.

Documentation ¶

Index ¶

Variables
type BufferedReader
type Reader
- func NewReader(options ...ReaderOption) (*Reader, error)
- func (r *Reader) Close() error
- func (r *Reader) Err() error
- func (r *Reader) IntoIter() iter.Seq[[]string]
- func (r *Reader) Row() []string
- func (r *Reader) Scan() bool
type ReaderOption
type ReaderOptions
- func ReaderOpts() ReaderOptions
- func (ReaderOptions) BorrowRow(b bool) ReaderOption
- func (ReaderOptions) ClearFreedDataMemory(b bool) ReaderOption
- func (ReaderOptions) Comment(r rune) ReaderOption
- func (ReaderOptions) CommentsAllowedAfterStartOfRecords(b bool) ReaderOption
- func (ReaderOptions) DiscoverRecordSeparator(b bool) ReaderOption
- func (ReaderOptions) ErrorOnNewlineInUnquotedField(b bool) ReaderOption
- func (ReaderOptions) ErrorOnNoByteOrderMarker(b bool) ReaderOption
- func (ReaderOptions) ErrorOnNoRows(b bool) ReaderOption
- func (ReaderOptions) ErrorOnQuotesInUnquotedField(b bool) ReaderOption
- func (ReaderOptions) Escape(r rune) ReaderOption
- func (ReaderOptions) ExpectHeaders(h []string) ReaderOption
- func (ReaderOptions) FieldSeparator(r rune) ReaderOption
- func (ReaderOptions) InitialRecordBuffer(v []byte) ReaderOption
- func (ReaderOptions) InitialRecordBufferSize(v int) ReaderOption
- func (ReaderOptions) NumFields(n int) ReaderOption
- func (ReaderOptions) Quote(r rune) ReaderOption
- func (ReaderOptions) Reader(r io.Reader) ReaderOption
- func (ReaderOptions) RecordSeparator(s string) ReaderOption
- func (ReaderOptions) RemoveByteOrderMarker(b bool) ReaderOption
- func (ReaderOptions) RemoveHeaderRow(b bool) ReaderOption
- func (ReaderOptions) TerminalRecordSeparatorEmitsRecord(b bool) ReaderOption
- func (ReaderOptions) TrimHeaders(b bool) ReaderOption
type WriteHeaderOption
type WriteHeaderOptions
- func WriteHeaderOpts() WriteHeaderOptions
- func (WriteHeaderOptions) CommentLines(s ...string) WriteHeaderOption
- func (WriteHeaderOptions) CommentRune(r rune) WriteHeaderOption
- func (WriteHeaderOptions) Headers(h ...string) WriteHeaderOption
- func (WriteHeaderOptions) IncludeByteOrderMarker(b bool) WriteHeaderOption
- func (WriteHeaderOptions) TrimHeaders(b bool) WriteHeaderOption
type Writer
- func NewWriter(options ...WriterOption) (*Writer, error)
- func (w *Writer) Close() error
- func (w *Writer) WriteHeader(options ...WriteHeaderOption) (int, error)
- func (w *Writer) WriteRow(row ...string) (int, error)
type WriterOption
type WriterOptions
- func WriterOpts() WriterOptions
- func (WriterOptions) ClearFreedDataMemory(b bool) WriterOption
- func (WriterOptions) ErrorOnNonUTF8(v bool) WriterOption
- func (WriterOptions) Escape(r rune) WriterOption
- func (WriterOptions) FieldSeparator(v rune) WriterOption
- func (WriterOptions) NumFields(v int) WriterOption
- func (WriterOptions) Quote(v rune) WriterOption
- func (WriterOptions) RecordSeparator(s string) WriterOption
- func (WriterOptions) Writer(v io.Writer) WriterOption

Constants ¶

This section is empty.

Variables ¶

var (
	// classifications
	ErrIO              = errors.New("io error")
	ErrParsing         = errors.New("parsing error")
	ErrFieldCount      = errors.New("field count error")
	ErrBadConfig       = errors.New("bad config")
	ErrBadReadRuneImpl = errors.New("bad ReadRune implementation")
	// instances
	ErrTooManyFields               = errors.New("too many fields")
	ErrNotEnoughFields             = errors.New("not enough fields")
	ErrReaderClosed                = errors.New("reader closed")
	ErrUnexpectedHeaderRowContents = errors.New("header row values do not match expectations")
	ErrBadRecordSeparator          = errors.New("record separator can only be one valid utf8 rune long or \"\\r\\n\"")
	ErrIncompleteQuotedField       = fmt.Errorf("incomplete quoted field: %w", io.ErrUnexpectedEOF)
	ErrQuoteInUnquotedField        = errors.New("quote found in unquoted field")
	ErrInvalidQuotedFieldEnding    = errors.New("unexpected character found after end of quoted field") // expecting field separator, record separator, quote char, or end of file if field count matches expectations
	ErrNoHeaderRow                 = fmt.Errorf("no header row: %w", io.ErrUnexpectedEOF)
	ErrNoRows                      = fmt.Errorf("no rows: %w", io.ErrUnexpectedEOF)
	ErrNoByteOrderMarker           = errors.New("no byte order marker")
	ErrNilReader                   = errors.New("nil reader")
	ErrInvalidEscapeInQuotedField  = errors.New("invalid escape sequence in quoted field")
	ErrNewlineInUnquotedField      = errors.New("newline rune found in unquoted field")
	ErrUnexpectedQuoteAfterField   = errors.New("unexpected quote after quoted+escaped field")
	ErrBadUnreadRuneImpl           = errors.New("UnreadRune failed")
	ErrUnsafeCRFileEnd             = fmt.Errorf("ended in a carriage return which must be quoted when record separator is CRLF: %w", io.ErrUnexpectedEOF)
	// ReadByte should never fail because we're always preceding this call with UnreadRune
	//
	// it could happen if someone is trying to read concurrently or made their own bad buffered reader implementation
	ErrBadReadByteImpl = errors.New("ReadByte failed")
)

var (
	ErrRowNilOrEmpty             = errors.New("row is nil or empty")
	ErrNonUTF8InRecord           = errors.New("non-utf8 characters in record")
	ErrNonUTF8InComment          = errors.New("non-utf8 characters in comment")
	ErrWriterClosed              = errors.New("writer closed")
	ErrHeaderWritten             = errors.New("header already written")
	ErrInvalidFieldCountInRecord = errors.New("invalid field count in record")
)

Functions ¶

This section is empty.

Types ¶

type BufferedReader ¶ added in v0.0.6

type BufferedReader interface {
	io.Reader
	ReadRune() (r rune, size int, err error)
	UnreadRune() error
	ReadByte() (byte, error)
}

type Reader ¶

type Reader struct {
	// contains filtered or unexported fields
}

func NewReader ¶

func NewReader(options ...ReaderOption) (*Reader, error)

NewReader creates a new instance of a CSV reader which is not safe for concurrent reads.

func (*Reader) Close ¶ added in v0.0.10

func (r *Reader) Close() error

Close should be called after reading all rows successfully from the underlying reader and checking the result of r.Err().

Close currently always returns nil, but in the future it may not. It is not a substitute for checking r.Err().

Should any configuration options require post-flight checks, they will be implemented here.

It will never attempt to close the underlying reader.

func (*Reader) Err ¶

func (r *Reader) Err() error

func (*Reader) IntoIter ¶ added in v0.0.9

func (r *Reader) IntoIter() iter.Seq[[]string]

IntoIter converts the reader state into an iterator. Calling this method more than once returns the same iterator instance.

If the reader is configured with BorrowRow(true), the resulting slice and field strings are only valid until the next iteration and must not be retained beyond that point.

It is best practice to check whether Err() returns a non-nil error after fully traversing this iterator.

This is syntactic sugar for use with range statements in Go 1.23 and later.

func (*Reader) Row ¶

func (r *Reader) Row() []string

Row returns a slice of strings that represents a row of a dataset.

Row only returns valid results after a call to Scan() returns true. For efficiency reasons, this method should not be called more than once between calls to Scan().

If the reader is configured with BorrowRow(true), the resulting slice and field strings are only valid until the next call to Scan and must not be retained beyond that point.

func (*Reader) Scan ¶

func (r *Reader) Scan() bool

type ReaderOption ¶

type ReaderOption func(*rCfg)

type ReaderOptions ¶ added in v1.0.0

type ReaderOptions struct{}

ReaderOptions should never be instantiated manually. Instead, call ReaderOpts().

This type is only exported so that godoc can discover its exported methods.

ReaderOptions will never have exported members, and its zero value is not part of the semver guarantee. Instantiate it incorrectly at your own peril.

Calling ReaderOpts() is a no-op that is compiled away, so instantiating the struct yourself optimizes nothing at all. Use ReaderOpts()!
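ReaderOptions, WriteHeaderOptions, and WriterOptions all follow a functional-options pattern in which an empty struct serves as a namespace for option constructors, and each option is a func that mutates an unexported config. A minimal sketch of that pattern, using hypothetical names (cfg, Option, Options, Opts) in place of rCfg, ReaderOption, ReaderOptions, and ReaderOpts; the ',' default is an assumption for illustration only.

```go
package main

import "fmt"

// cfg plays the role of the unexported config (rCfg in the package).
type cfg struct {
	fieldSep rune
	quote    rune
	quoteSet bool
}

// Option mutates a cfg, like ReaderOption does for rCfg.
type Option func(*cfg)

// Options has no fields; a value of it only groups the constructors.
type Options struct{}

func Opts() Options { return Options{} }

func (Options) FieldSeparator(r rune) Option {
	return func(c *cfg) { c.fieldSep = r }
}

func (Options) Quote(r rune) Option {
	return func(c *cfg) { c.quote = r; c.quoteSet = true }
}

// newCfg applies options over assumed defaults, in the order given.
func newCfg(options ...Option) cfg {
	c := cfg{fieldSep: ','}
	for _, opt := range options {
		opt(&c)
	}
	return c
}

func main() {
	op := Opts()
	c := newCfg(op.FieldSeparator('\t'), op.Quote('"'))
	fmt.Printf("%q %q %v\n", c.fieldSep, c.quote, c.quoteSet) // '\t' '"' true
}
```

Usage with the real package has the same shape, e.g. passing several `ReaderOpts().…` values to NewReader.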

func ReaderOpts ¶

func ReaderOpts() ReaderOptions

func (ReaderOptions) BorrowRow ¶ added in v1.0.0

func (ReaderOptions) BorrowRow(b bool) ReaderOption

BorrowRow alters Row() so that it returns the underlying string slice every time it is called rather than a copy.

Only set this to true if the returned row slice and the field strings within it are never used after the next call to Scan or Close. Otherwise, you must copy the slice and clone each string within it, for example with strings.Clone().

Consider this a micro-optimization in most circumstances, because it tightens the usage contract of the returned row in ways most users would not normally consider.
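When a borrowed row must outlive the next Scan, both the slice and every field need their own memory. The standard library function for copying a string's bytes is strings.Clone (Go 1.20+). A minimal sketch; ownRow is a hypothetical helper, and the reassignment in main only simulates the reader reusing its slice.

```go
package main

import (
	"fmt"
	"strings"
)

// ownRow deep-copies a borrowed row: a new backing array for the slice,
// plus strings.Clone for each field so no field shares memory with the
// reader's internal buffer.
func ownRow(row []string) []string {
	out := make([]string, len(row))
	for i, f := range row {
		out[i] = strings.Clone(f)
	}
	return out
}

func main() {
	borrowed := []string{"a", "b"}
	kept := ownRow(borrowed)
	borrowed[0] = "overwritten" // simulates the reader reusing its slice
	fmt.Println(kept)           // [a b]
}
```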

func (ReaderOptions) ClearFreedDataMemory ¶ added in v1.0.4

func (ReaderOptions) ClearFreedDataMemory(b bool) ReaderOption

ClearFreedDataMemory ensures that whenever a shared memory buffer containing data goes out of scope, zero values are written to every byte within the buffer.

This may significantly degrade performance and is recommended only for sensitive data or long-lived processes.
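The kind of zeroing this option describes can be sketched with the clear builtin (Go 1.21+). wipe is a hypothetical helper, not the package's code; it zeroes up to cap rather than len so bytes hidden behind a shortened slice are also erased. It cannot reach copies made earlier, such as strings already built from the buffer.

```go
package main

import "fmt"

// wipe zeroes every byte in the slice's backing array (up to cap, not just
// len) and returns it emptied, ready for reuse or release.
func wipe(b []byte) []byte {
	b = b[:cap(b)]
	clear(b) // sets every element to its zero value
	return b[:0]
}

func main() {
	buf := make([]byte, 0, 8)
	buf = append(buf, "secret"...)
	view := buf[:cap(buf)]      // a view of the full backing array
	buf = wipe(buf[:2])         // even a shortened slice wipes everything
	fmt.Println(len(buf), view) // 0 [0 0 0 0 0 0 0 0]
}
```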

func (ReaderOptions) Comment ¶ added in v1.0.0

func (ReaderOptions) Comment(r rune) ReaderOption

func (ReaderOptions) CommentsAllowedAfterStartOfRecords ¶ added in v1.0.0

func (ReaderOptions) CommentsAllowedAfterStartOfRecords(b bool) ReaderOption

func (ReaderOptions) DiscoverRecordSeparator ¶ added in v1.0.0

func (ReaderOptions) DiscoverRecordSeparator(b bool) ReaderOption

func (ReaderOptions) ErrorOnNewlineInUnquotedField ¶ added in v1.0.0

func (ReaderOptions) ErrorOnNewlineInUnquotedField(b bool) ReaderOption

func (ReaderOptions) ErrorOnNoByteOrderMarker ¶ added in v1.0.0

func (ReaderOptions) ErrorOnNoByteOrderMarker(b bool) ReaderOption

func (ReaderOptions) ErrorOnNoRows ¶ added in v1.0.0

func (ReaderOptions) ErrorOnNoRows(b bool) ReaderOption

func (ReaderOptions) ErrorOnQuotesInUnquotedField ¶ added in v1.0.0

func (ReaderOptions) ErrorOnQuotesInUnquotedField(b bool) ReaderOption

func (ReaderOptions) Escape ¶ added in v1.0.0

func (ReaderOptions) Escape(r rune) ReaderOption

Escape specifies the character used to escape a quote within a quoted field, as well as a literal occurrence of the escape character itself.

Without this option, a quote character inside a quoted field is expected to be escaped by doubling it.

This is mainly useful when processing Spark CSV files, which do not strictly follow RFC 4180.

In that case, set it to '\\'.

It is not valid to use this option without explicitly setting a quote; doing so results in an error being returned on Reader creation.
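The difference between the two escaping styles is easiest to see on a single, complete quoted field. The sketch below is not the package's parser: unquoteField is a hypothetical helper that decodes one field in isolation, treating escape == quote as RFC 4180 doubling and any other escape rune as a prefix for a quote or a literal escape (the Spark style). Its error messages are invented.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unquoteField decodes one complete quoted field such as `"a""b"`.
// If escape == quote, RFC 4180 doubling applies; otherwise escape must
// precede either a quote or another escape (e.g. `\"` and `\\`).
func unquoteField(field string, quote, escape rune) (string, error) {
	rs := []rune(field)
	if len(rs) < 2 || rs[0] != quote || rs[len(rs)-1] != quote {
		return "", errors.New("field is not wrapped in quotes")
	}
	body := rs[1 : len(rs)-1]
	var sb strings.Builder
	for i := 0; i < len(body); i++ {
		r := body[i]
		if r == escape {
			if i+1 >= len(body) {
				return "", errors.New("dangling escape at end of field")
			}
			next := body[i+1]
			if next != quote && next != escape {
				return "", errors.New("invalid escape sequence")
			}
			sb.WriteRune(next) // emit the escaped rune literally
			i++                // skip the rune we just consumed
			continue
		}
		if r == quote {
			return "", errors.New("unescaped quote inside quoted field")
		}
		sb.WriteRune(r)
	}
	return sb.String(), nil
}

func main() {
	rfc, _ := unquoteField(`"say ""hi"""`, '"', '"')
	spark, _ := unquoteField(`"say \"hi\" \\o/"`, '"', '\\')
	fmt.Println(rfc)   // say "hi"
	fmt.Println(spark) // say "hi" \o/
}
```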

func (ReaderOptions) ExpectHeaders ¶ added in v1.0.0

func (ReaderOptions) ExpectHeaders(h []string) ReaderOption

ExpectHeaders causes the first row to be recognized as a header row.

If the header row does not match the given slice of header values, the reader returns an error.
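A header check of this kind amounts to an element-by-element comparison of the first row against the expected slice. A minimal sketch using slices.Equal (Go 1.21+); headersMatch is a hypothetical helper, and its trim flag only illustrates how whitespace trimming (compare TrimHeaders) could be applied before comparing. Whether and how the package combines ExpectHeaders with TrimHeaders is an assumption here.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// headersMatch reports whether the first row equals the expected headers,
// optionally trimming surrounding whitespace from each header first.
func headersMatch(got, want []string, trim bool) bool {
	if trim {
		trimmed := make([]string, len(got))
		for i, h := range got {
			trimmed[i] = strings.TrimSpace(h)
		}
		got = trimmed
	}
	return slices.Equal(got, want)
}

func main() {
	first := []string{" id", "name "}
	want := []string{"id", "name"}
	fmt.Println(headersMatch(first, want, false), headersMatch(first, want, true)) // false true
}
```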

func (ReaderOptions) FieldSeparator ¶ added in v1.0.0

func (ReaderOptions) FieldSeparator(r rune) ReaderOption

func (ReaderOptions) InitialRecordBuffer ¶ added in v1.0.5

func (ReaderOptions) InitialRecordBuffer(v []byte) ReaderOption

InitialRecordBuffer is a hint that supplies externally pre-allocated record buffer space, reducing the number of re-allocations while processing a reader and allowing the buffer to be reused later, after the reader is closed.

This option should generally not be used. It only exists to assist with processing large numbers of CSV files when memory is a clear constraint. There is no guarantee that this buffer will be used until the end of the csv Reader's lifecycle.

Consider this a micro-optimization in most circumstances, because it tightens the usage contract of the csv Reader in ways most users would not normally consider.

func (ReaderOptions) InitialRecordBufferSize ¶ added in v1.0.5

func (ReaderOptions) InitialRecordBufferSize(v int) ReaderOption

InitialRecordBufferSize is a hint to pre-allocate record buffer space once and reduce the number of re-allocations when processing a reader.

Consider this a micro-optimization in most circumstances: most users are unlikely to know the maximum total record size they want to target, and it is generally better practice to leave these details to the Go runtime and its garbage collector.
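Both buffer options rest on ordinary slice reuse: truncating a buffer to zero length keeps its backing array, so append only reallocates when a record outgrows the current capacity. The sketch below shows that mechanism in isolation; fillRecord is a hypothetical helper and not how the package manages its record buffer internally.

```go
package main

import "fmt"

// fillRecord reuses buf's backing array for one record; append only
// reallocates when the record outgrows the current capacity.
func fillRecord(buf []byte, record string) []byte {
	buf = buf[:0] // drop old contents, keep capacity
	return append(buf, record...)
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, up front
	for _, rec := range []string{"a,b,c", "1,2,3", "x,y,z"} {
		buf = fillRecord(buf, rec)
		fmt.Println(string(buf), cap(buf))
	}
}
```

Each record here fits within the initial 64 bytes, so no further allocation happens after the first make.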

func (ReaderOptions) NumFields ¶ added in v1.0.0

func (ReaderOptions) NumFields(n int) ReaderOption

func (ReaderOptions) Quote ¶ added in v1.0.0

func (ReaderOptions) Quote(r rune) ReaderOption

func (ReaderOptions) Reader ¶ added in v1.0.0

func (ReaderOptions) Reader(r io.Reader) ReaderOption

func (ReaderOptions) RecordSeparator ¶ added in v1.0.0

func (ReaderOptions) RecordSeparator(s string) ReaderOption
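The ErrBadRecordSeparator message states the accepted shapes for a record separator: exactly one valid UTF-8 rune, or "\r\n". A minimal sketch of that rule using unicode/utf8; validRecordSeparator is a hypothetical re-statement of the message, not the package's validation code, and it does not cover any other constraints the package may apply (for example, conflicts with the field separator).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validRecordSeparator accepts exactly one valid UTF-8 rune, or "\r\n".
func validRecordSeparator(s string) bool {
	if s == "\r\n" {
		return true
	}
	return utf8.ValidString(s) && utf8.RuneCountInString(s) == 1
}

func main() {
	for _, s := range []string{"\n", "\r\n", "\u2028", "", "\n\n", "\xff"} {
		fmt.Printf("%q %v\n", s, validRecordSeparator(s))
	}
}
```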

func (ReaderOptions) RemoveByteOrderMarker ¶ added in v1.0.0

func (ReaderOptions) RemoveByteOrderMarker(b bool) ReaderOption

func (ReaderOptions) RemoveHeaderRow ¶ added in v1.0.0

func (ReaderOptions) RemoveHeaderRow(b bool) ReaderOption

RemoveHeaderRow causes the first row to be recognized as a header row.

The row will be skipped over by Scan() and will not be returned by Row().

func (ReaderOptions) TerminalRecordSeparatorEmitsRecord ¶ added in v1.0.0

func (ReaderOptions) TerminalRecordSeparatorEmitsRecord(b bool) ReaderOption

TerminalRecordSeparatorEmitsRecord only exists to acknowledge an edge case when processing CSV documents that contain a single column. If the file contents end in a record separator, it is impossible to determine whether a new record with an empty field should be emitted, unless that record is enclosed in quotes or an option like this one is set.

In most cases this is not an issue, unless the dataset is a single-column list that allows empty strings and the writer that created the file does not always follow the last record with a record separator (that is, it does not consistently treat the record separator as a record terminator).
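The ambiguity is visible with a plain split of an unquoted, single-column document: a trailing "\n" either terminates the last record or starts one more empty record. The sketch below models both readings with a boolean; splitSingleColumn is a hypothetical helper that ignores quoting entirely and is not the package's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSingleColumn splits an unquoted one-column document on "\n".
// terminalEmits=false treats a trailing separator as a record terminator;
// terminalEmits=true treats it as the start of one more (empty) record.
func splitSingleColumn(doc string, terminalEmits bool) []string {
	if doc == "" {
		return nil
	}
	recs := strings.Split(doc, "\n")
	if !terminalEmits && strings.HasSuffix(doc, "\n") {
		recs = recs[:len(recs)-1]
	}
	return recs
}

func main() {
	doc := "a\n\nb\n"
	fmt.Printf("%q\n", splitSingleColumn(doc, false)) // ["a" "" "b"]
	fmt.Printf("%q\n", splitSingleColumn(doc, true))  // ["a" "" "b" ""]
}
```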

func (ReaderOptions) TrimHeaders ¶ added in v1.0.0

func (ReaderOptions) TrimHeaders(b bool) ReaderOption

TrimHeaders causes the first row to be recognized as a header row and all values are returned with whitespace trimmed.

type WriteHeaderOption ¶ added in v1.0.6

type WriteHeaderOption func(*whCfg)

type WriteHeaderOptions ¶ added in v1.0.6

type WriteHeaderOptions struct{}

WriteHeaderOptions should never be instantiated manually. Instead, call WriteHeaderOpts().

This type is only exported so that godoc can discover its exported methods.

WriteHeaderOptions will never have exported members, and its zero value is not part of the semver guarantee. Instantiate it incorrectly at your own peril.

Calling WriteHeaderOpts() is a no-op that is compiled away, so instantiating the struct yourself optimizes nothing at all. Use WriteHeaderOpts()!

func WriteHeaderOpts ¶ added in v1.0.6

func WriteHeaderOpts() WriteHeaderOptions

func (WriteHeaderOptions) CommentLines ¶ added in v1.0.6

func (WriteHeaderOptions) CommentLines(s ...string) WriteHeaderOption

func (WriteHeaderOptions) CommentRune ¶ added in v1.0.6

func (WriteHeaderOptions) CommentRune(r rune) WriteHeaderOption

func (WriteHeaderOptions) Headers ¶ added in v1.0.6

func (WriteHeaderOptions) Headers(h ...string) WriteHeaderOption

func (WriteHeaderOptions) IncludeByteOrderMarker ¶ added in v1.0.6

func (WriteHeaderOptions) IncludeByteOrderMarker(b bool) WriteHeaderOption

func (WriteHeaderOptions) TrimHeaders ¶ added in v1.0.6

func (WriteHeaderOptions) TrimHeaders(b bool) WriteHeaderOption

type Writer ¶ added in v1.0.6

type Writer struct {
	// contains filtered or unexported fields
}

func NewWriter ¶ added in v1.0.6

func NewWriter(options ...WriterOption) (*Writer, error)

NewWriter creates a new instance of a CSV writer which is not safe for concurrent writes.

func (*Writer) Close ¶ added in v1.0.6

func (w *Writer) Close() error

Close should be called after writing all rows successfully to the underlying writer.

Close currently always returns nil, but in the future it may not.

Should any configuration options require post-flight checks, they will be implemented here.

It will never attempt to flush or close the underlying writer instance. That is left to the calling context.

func (*Writer) WriteHeader ¶ added in v1.0.6

func (w *Writer) WriteHeader(options ...WriteHeaderOption) (int, error)

func (*Writer) WriteRow ¶ added in v1.0.6

func (w *Writer) WriteRow(row ...string) (int, error)

type WriterOption ¶ added in v1.0.6

type WriterOption func(*wCfg)

type WriterOptions ¶ added in v1.0.6

type WriterOptions struct{}

WriterOptions should never be instantiated manually. Instead, call WriterOpts().

This type is only exported so that godoc can discover its exported methods.

WriterOptions will never have exported members, and its zero value is not part of the semver guarantee. Instantiate it incorrectly at your own peril.

Calling WriterOpts() is a no-op that is compiled away, so instantiating the struct yourself optimizes nothing at all. Use WriterOpts()!

func WriterOpts ¶ added in v1.0.6

func WriterOpts() WriterOptions

func (WriterOptions) ClearFreedDataMemory ¶ added in v1.0.7

func (WriterOptions) ClearFreedDataMemory(b bool) WriterOption

ClearFreedDataMemory ensures that whenever a shared memory buffer containing data goes out of scope, zero values are written to every byte within the buffer.

This may significantly degrade performance and is recommended only for sensitive data or long-lived processes.

func (WriterOptions) ErrorOnNonUTF8 ¶ added in v1.0.6

func (WriterOptions) ErrorOnNonUTF8(v bool) WriterOption

func (WriterOptions) Escape ¶ added in v1.0.6

func (WriterOptions) Escape(r rune) WriterOption

func (WriterOptions) FieldSeparator ¶ added in v1.0.6

func (WriterOptions) FieldSeparator(v rune) WriterOption

func (WriterOptions) NumFields ¶ added in v1.0.6

func (WriterOptions) NumFields(v int) WriterOption

func (WriterOptions) Quote ¶ added in v1.0.6

func (WriterOptions) Quote(v rune) WriterOption

func (WriterOptions) RecordSeparator ¶ added in v1.0.6

func (WriterOptions) RecordSeparator(s string) WriterOption

func (WriterOptions) Writer ¶ added in v1.0.6

func (WriterOptions) Writer(v io.Writer) WriterOption

Directories ¶

Path	Synopsis
internal
cmd/generate command
examples/reader command
examples/writer command
