go-csvpp

v0.0.1 · Published: Jan 25, 2026 · License: MIT · Imports: 8 · Imported by: 0

A Go implementation of the IETF CSV++ specification (draft-mscaldas-csvpp-00).

CSV++ extends traditional CSV to support arrays and structured fields within cells, enabling complex data representation while maintaining CSV's simplicity.

Features

  • Full IETF CSV++ specification compliance
  • Wraps encoding/csv for RFC 4180 compatibility
  • Four field types: Simple, Array, Structured, ArrayStructured
  • Struct mapping with csvpp tags (Marshal/Unmarshal)
  • Configurable delimiters
  • Security-conscious design (nesting depth limits)

Requirements

  • Go 1.24 or later

Installation

go get github.com/osamingo/go-csvpp

Quick Start

Reading CSV++ Data
package main

import (
    "fmt"
    "io"
    "strings"

    "github.com/osamingo/go-csvpp"
)

func main() {
    input := `name,phone[],geo(lat^lon)
Alice,555-1234~555-5678,34.0522^-118.2437
Bob,555-9999,40.7128^-74.0060
`

    reader := csvpp.NewReader(strings.NewReader(input))

    for {
        record, err := reader.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }

        name := record[0].Value
        phones := record[1].Values
        lat := record[2].Components[0].Value
        lon := record[2].Components[1].Value

        fmt.Printf("%s: phones=%v, location=(%s, %s)\n", name, phones, lat, lon)
    }
}

Output:

Alice: phones=[555-1234 555-5678], location=(34.0522, -118.2437)
Bob: phones=[555-9999], location=(40.7128, -74.0060)

Writing CSV++ Data
package main

import (
    "bytes"
    "fmt"

    "github.com/osamingo/go-csvpp"
)

func main() {
    var buf bytes.Buffer
    writer := csvpp.NewWriter(&buf)

    headers := []*csvpp.ColumnHeader{
        {Name: "name", Kind: csvpp.SimpleField},
        {Name: "tags", Kind: csvpp.ArrayField, ArrayDelimiter: '~'},
    }
    writer.SetHeaders(headers)

    if err := writer.WriteHeader(); err != nil {
        panic(err)
    }
    if err := writer.Write([]*csvpp.Field{
        {Value: "Alice"},
        {Values: []string{"go", "rust", "python"}},
    }); err != nil {
        panic(err)
    }
    writer.Flush()
    if err := writer.Error(); err != nil {
        panic(err)
    }

    fmt.Print(buf.String())
}

Output:

name,tags[]
Alice,go~rust~python

Struct Mapping
package main

import (
    "fmt"
    "strings"

    "github.com/osamingo/go-csvpp"
)

type Person struct {
    Name   string   `csvpp:"name"`
    Phones []string `csvpp:"phone[]"`
    Geo    struct {
        Lat string
        Lon string
    } `csvpp:"geo(lat^lon)"`
}

func main() {
    input := `name,phone[],geo(lat^lon)
Alice,555-1234~555-5678,34.0522^-118.2437
`

    var people []Person
    if err := csvpp.Unmarshal(strings.NewReader(input), &people); err != nil {
        panic(err)
    }

    for _, p := range people {
        fmt.Printf("%s: phones=%v, geo=(%s, %s)\n",
            p.Name, p.Phones, p.Geo.Lat, p.Geo.Lon)
    }
}

Output:

Alice: phones=[555-1234 555-5678], geo=(34.0522, -118.2437)

Field Types

CSV++ supports four field types in headers:

Type             Header Syntax      Example Data        Description
Simple           name               Alice               Plain text value
Array            tags[]             go~rust~python      Multiple values with delimiter
Structured       geo(lat^lon)       34.05^-118.24       Named components
ArrayStructured  addr[](city^zip)   LA^90210~NY^10001   Array of structures

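The table above gives no worked example of the ArrayStructured type, so as a rough illustration: a cell such as LA^90210~NY^10001 decodes by splitting on the array delimiter first, then splitting each element on the component delimiter. A self-contained stdlib sketch of that decomposition (illustrative only, not the package's parser):

```go
package main

import (
	"fmt"
	"strings"
)

// decodeArrayStruct splits an ArrayStructured cell into elements and
// components using the default delimiters '~' (array) and '^' (component).
func decodeArrayStruct(cell string) [][]string {
	var out [][]string
	for _, elem := range strings.Split(cell, "~") {
		out = append(out, strings.Split(elem, "^"))
	}
	return out
}

func main() {
	// addr[](city^zip) with cell data "LA^90210~NY^10001"
	for _, rec := range decodeArrayStruct("LA^90210~NY^10001") {
		fmt.Printf("city=%s zip=%s\n", rec[0], rec[1])
	}
}
```

This prints city=LA zip=90210 and city=NY zip=10001; the real Reader additionally handles per-header custom delimiters and RFC 4180 quoting.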
Default Delimiters
  • Array delimiter: ~ (tilde)
  • Component delimiter: ^ (caret)

Custom delimiters can be specified in the header:

  • phone[|] - uses | as array delimiter
  • geo;(lat;lon) - uses ; as component delimiter
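As an illustration of the bracket syntax (a sketch of the grammar, not the package's actual header parser), an array-field token like phone[|] can be picked apart with plain string operations:

```go
package main

import (
	"fmt"
	"strings"
)

// splitArrayHeader separates a well-formed array-field token such as
// "phone[|]" into its name and array delimiter, falling back to the
// default '~' when the brackets are empty, as in "tags[]".
func splitArrayHeader(tok string) (string, rune) {
	open := strings.IndexByte(tok, '[')
	end := strings.IndexByte(tok, ']')
	delim := '~' // IETF default array delimiter
	if inner := tok[open+1 : end]; inner != "" {
		delim = []rune(inner)[0]
	}
	return tok[:open], delim
}

func main() {
	for _, tok := range []string{"tags[]", "phone[|]"} {
		name, delim := splitArrayHeader(tok)
		fmt.Printf("%s -> array delimiter %q\n", name, delim)
	}
}
```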

Delimiter Progression

For nested structures, the IETF specification recommends:

Level Delimiter
1 (arrays) ~
2 (components) ^
3 ;
4 :

API Reference

Reader
reader := csvpp.NewReader(r) // r is io.Reader

// Configuration (same as encoding/csv)
reader.Comma = ','           // Field delimiter
reader.Comment = '#'         // Comment character
reader.LazyQuotes = false    // Relaxed quote handling
reader.TrimLeadingSpace = false
reader.MaxNestingDepth = 10  // Nesting limit (security)

// Methods
headers, err := reader.Headers()  // Get parsed headers
record, err := reader.Read()      // Read one record
records, err := reader.ReadAll()  // Read all records

Writer
writer := csvpp.NewWriter(w) // w is io.Writer

// Configuration
writer.Comma = ','      // Field delimiter
writer.UseCRLF = false  // Use \r\n line endings

// Methods
writer.SetHeaders(headers)  // Set column headers
writer.WriteHeader()        // Write header row
writer.Write(record)        // Write one record
writer.WriteAll(records)    // Write all records
writer.Flush()              // Flush buffer

Marshal/Unmarshal
// Unmarshal CSV++ data into structs
var people []Person
err := csvpp.Unmarshal(reader, &people)

// Marshal structs to CSV++ data
err = csvpp.Marshal(writer, people)

Struct Tags

Use csvpp struct tags to map fields:

type Record struct {
    Name     string   `csvpp:"name"`           // Simple field
    Tags     []string `csvpp:"tags[]"`         // Array field
    Location struct {                          // Structured field
        Lat string
        Lon string
    } `csvpp:"geo(lat^lon)"`
    Addresses []Address `csvpp:"addr[](street^city)"` // Array structured
}

Compatibility

This package wraps encoding/csv and inherits:

  • Full RFC 4180 compliance
  • Quoted field handling
  • Configurable field/line delimiters
  • Comment support

Security

  • MaxNestingDepth: Limits nested structure depth (default: 10) to prevent stack overflow from malicious input
  • Header names are restricted to ASCII letters, digits, "_", and "-" per the IETF ABNF (field-char)

Specification

This implementation follows the IETF CSV++ specification: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Documentation

Overview

Package csvpp implements the IETF CSV++ specification (draft-mscaldas-csvpp-00).

CSV++ extends traditional CSV to support arrays and structured fields within cells, enabling complex data representation while maintaining CSV's simplicity. This package wraps encoding/csv and is fully compatible with RFC 4180.

Overview

CSV++ introduces four field types beyond simple text values:

  • Simple: "name" - plain text value
  • Array: "tags[]" - multiple values separated by a delimiter (default: ~)
  • Structured: "geo(lat^lon)" - named components separated by a delimiter (default: ^)
  • ArrayStructured: "addresses[](street^city)" - array of structured values

These field types are represented by the FieldKind constants: SimpleField, ArrayField, StructuredField, and ArrayStructuredField.

Basic Usage

Reading CSV++ data:

r := csvpp.NewReader(file)

// Get parsed headers
headers, err := r.Headers()
if err != nil {
    log.Fatal(err)
}

// Read records
for {
    record, err := r.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    // process record
}

Writing CSV++ data:

w := csvpp.NewWriter(file)
w.SetHeaders(headers)

if err := w.WriteHeader(); err != nil {
    log.Fatal(err)
}

for _, record := range records {
    if err := w.Write(record); err != nil {
        log.Fatal(err)
    }
}
w.Flush()
if err := w.Error(); err != nil {
    log.Fatal(err)
}

Struct Mapping

Use Marshal and Unmarshal for automatic struct mapping with struct tags:

type Person struct {
    Name   string   `csvpp:"name"`
    Phones []string `csvpp:"phone[]"`
    Geo    struct {
        Lat string
        Lon string
    } `csvpp:"geo(lat^lon)"`
}

// Read into structs
var people []Person
if err := csvpp.Unmarshal(file, &people); err != nil {
    log.Fatal(err)
}

// Write from structs
var buf bytes.Buffer
if err := csvpp.Marshal(&buf, people); err != nil {
    log.Fatal(err)
}

Delimiter Conventions

The IETF CSV++ specification recommends using specific delimiters for nested structures to avoid conflicts. The recommended progression is:

  • Level 1 (arrays): ~ (tilde)
  • Level 2 (components): ^ (caret)
  • Level 3: ; (semicolon)
  • Level 4: : (colon)

This package uses ~ and ^ as defaults, matching the IETF recommendation.

Compatibility with encoding/csv

This package wraps encoding/csv and inherits its RFC 4180 compliance. The Reader and Writer types expose the same configuration options:

  • Comma: field delimiter (default: ',')
  • Comment: comment character (Reader only)
  • LazyQuotes: relaxed quote handling (Reader only)
  • TrimLeadingSpace: trim leading whitespace (Reader only)
  • UseCRLF: use \r\n line endings (Writer only)

Security Considerations

The MaxNestingDepth option (default: 10) limits the depth of nested structures to prevent stack overflow attacks from maliciously crafted input.

Errors

The package defines the sentinel errors ErrNoHeader, ErrInvalidHeader, and ErrNestingTooDeep (listed under Variables below).

Parse errors are wrapped in ParseError, which provides line and column information.

Constants

Default delimiters follow the IETF recommendations: DefaultArrayDelimiter ('~') and DefaultComponentDelimiter ('^'). See Constants below.

Specification Reference

For the complete IETF CSV++ specification, see: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/

Example
input := `name,phone[],geo(lat^lon)
Alice,555-1234~555-5678,34.0522^-118.2437
Bob,555-9999,40.7128^-74.0060
`

reader := csvpp.NewReader(strings.NewReader(input))

// Get headers
headers, err := reader.Headers()
if err != nil {
	log.Fatal(err)
}

fmt.Printf("Headers: %s, %s, %s\n", headers[0].Name, headers[1].Name, headers[2].Name)

// Read all records
for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	name := record[0].Value
	phones := record[1].Values
	lat := record[2].Components[0].Value
	lon := record[2].Components[1].Value

	fmt.Printf("%s: phones=%v, location=(%s, %s)\n", name, phones, lat, lon)
}
Output:

Headers: name, phone, geo
Alice: phones=[555-1234 555-5678], location=(34.0522, -118.2437)
Bob: phones=[555-9999], location=(40.7128, -74.0060)


Constants

const (
	DefaultArrayDelimiter     = '~' // IETF Section 2.3.2: recommended for array fields
	DefaultComponentDelimiter = '^' // IETF Section 2.3.2: recommended for structured fields
)

Default delimiters as recommended in IETF CSV++ Section 2.3.2. The specification suggests delimiter progression: ~ → ^ → ; → : for nested structures.

const DefaultMaxNestingDepth = 10

DefaultMaxNestingDepth is the default maximum nesting depth. IETF Section 5 (Security Considerations) recommends limiting nesting depth to prevent stack overflow attacks from maliciously crafted input.

Variables

var (
	ErrNoHeader       = errors.New("csvpp: header record is required")
	ErrInvalidHeader  = errors.New("csvpp: invalid column header format")
	ErrNestingTooDeep = errors.New("csvpp: nesting level exceeds limit")
)

Error definitions.

Functions

func Marshal

func Marshal(w io.Writer, src any) error

Marshal encodes a slice of structs to CSV++ data.

Example
people := []Person{
	{Name: "Alice", Phones: []string{"555-1234", "555-5678"}},
	{Name: "Bob", Phones: []string{"555-9999"}},
}

var buf bytes.Buffer
if err := csvpp.Marshal(&buf, people); err != nil {
	log.Fatal(err)
}

fmt.Print(buf.String())
Output:

name,phone[]
Alice,555-1234~555-5678
Bob,555-9999

func MarshalWriter

func MarshalWriter(w *Writer, src any) error

MarshalWriter encodes a slice of structs to a Writer.

func Unmarshal

func Unmarshal(r io.Reader, dst any) error

Unmarshal decodes CSV++ data into a slice of structs. dst must be a pointer to a slice of structs.

Example
input := `name,phone[]
Alice,555-1234~555-5678
Bob,555-9999
`

var people []Person
if err := csvpp.Unmarshal(strings.NewReader(input), &people); err != nil {
	log.Fatal(err)
}

for _, p := range people {
	fmt.Printf("%s: %v\n", p.Name, p.Phones)
}
Output:

Alice: [555-1234 555-5678]
Bob: [555-9999]

Example (Structured)
input := `name,geo(lat^lon)
Los Angeles,34.0522^-118.2437
New York,40.7128^-74.0060
`

var locations []Location
if err := csvpp.Unmarshal(strings.NewReader(input), &locations); err != nil {
	log.Fatal(err)
}

for _, loc := range locations {
	fmt.Printf("%s: (%s, %s)\n", loc.Name, loc.Geo.Lat, loc.Geo.Lon)
}
Output:

Los Angeles: (34.0522, -118.2437)
New York: (40.7128, -74.0060)

func UnmarshalReader

func UnmarshalReader(r *Reader, dst any) error

UnmarshalReader decodes from a Reader into a slice of structs.

Types

type ColumnHeader

type ColumnHeader struct {
	Name               string          // Field name (ABNF: name = 1*field-char)
	Kind               FieldKind       // Field type (IETF Section 2.2)
	ArrayDelimiter     rune            // Array delimiter (ABNF: delimiter)
	ComponentDelimiter rune            // Component delimiter (ABNF: component-delim)
	Components         []*ColumnHeader // Component list (ABNF: component-list)
}

ColumnHeader represents the declaration information for an individual field. It corresponds to the ABNF "field" rule in IETF CSV++ Section 2.2:

field = simple-field / array-field / struct-field / array-struct-field
name  = 1*field-char
field-char = ALPHA / DIGIT / "_" / "-"

type Field

type Field struct {
	Value      string   // Value for SimpleField
	Values     []string // Values for ArrayField (IETF Section 2.2.2)
	Components []*Field // Components for StructuredField/ArrayStructuredField (IETF Section 2.2.3/2.2.4)
}

Field represents a parsed field value from a data row. The populated fields depend on the corresponding ColumnHeader.Kind:

  • SimpleField: Value is set
  • ArrayField: Values is set
  • StructuredField: Components is set (each component is a Field)
  • ArrayStructuredField: Components is set (each is a Field with its own Components)
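To make the ArrayStructured case concrete, here is how a cell like LA^90210~NY^10001 for addr[](city^zip) would populate the tree, using a local struct with the same shape as Field (illustrative, not the package type):

```go
package main

import "fmt"

// field mirrors the documented shape of csvpp.Field.
type field struct {
	Value      string
	Values     []string
	Components []*field
}

// newStructElem builds one array element holding one component per value.
func newStructElem(values ...string) *field {
	e := &field{}
	for _, v := range values {
		e.Components = append(e.Components, &field{Value: v})
	}
	return e
}

func main() {
	// One outer element per array entry; each element holds one
	// component field per named part (city, zip).
	addr := &field{Components: []*field{
		newStructElem("LA", "90210"),
		newStructElem("NY", "10001"),
	}}
	for i, elem := range addr.Components {
		fmt.Printf("elem %d: city=%s zip=%s\n", i, elem.Components[0].Value, elem.Components[1].Value)
	}
}
```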

type FieldKind

type FieldKind int

FieldKind represents the type of field as defined in IETF CSV++ Section 2.2. See: https://datatracker.ietf.org/doc/draft-mscaldas-csvpp/

const (
	SimpleField          FieldKind = iota // IETF Section 2.2.1: simple-field = name
	ArrayField                            // IETF Section 2.2.2: array-field = name "[" [delimiter] "]"
	StructuredField                       // IETF Section 2.2.3: struct-field = name [component-delim] "(" component-list ")"
	ArrayStructuredField                  // IETF Section 2.2.4: array-struct-field = name "[" [delimiter] "]" [component-delim] "(" component-list ")"
)

func (FieldKind) String

func (k FieldKind) String() string

String returns the string representation of FieldKind.

type ParseError

type ParseError struct {
	Line   int    // Line number where the error occurred (1-based)
	Column int    // Column number where the error occurred (1-based)
	Field  string // Field name (if available)
	Err    error  // Original error
}

ParseError holds detailed information about an error that occurred during parsing.

func (*ParseError) Error

func (e *ParseError) Error() string

Error returns the error message for ParseError.

func (*ParseError) Unwrap

func (e *ParseError) Unwrap() error

Unwrap returns the original error.

type Reader

type Reader struct {
	// Comma is the field delimiter (default: ',').
	Comma rune
	// Comment is the comment character (disabled if 0).
	Comment rune
	// LazyQuotes relaxes strict quote checking if true.
	LazyQuotes bool
	// TrimLeadingSpace trims leading whitespace from fields if true.
	TrimLeadingSpace bool
	// MaxNestingDepth is the maximum nesting depth for structured fields (default: 10).
	// This limit prevents stack overflow from deeply nested input (IETF Section 5).
	// If 0, DefaultMaxNestingDepth is used.
	MaxNestingDepth int
	// contains filtered or unexported fields
}

Reader reads CSV++ files according to the IETF CSV++ specification. It wraps encoding/csv.Reader and provides CSV++ header parsing and field parsing. The first row is always treated as the header row (IETF Section 2.1).

func NewReader

func NewReader(r io.Reader) *Reader

NewReader creates a new Reader.

Example (CustomDelimiter)
// Using semicolon as field delimiter (common in European locales)
input := `name;age
Alice;30
Bob;25
`

reader := csvpp.NewReader(strings.NewReader(input))
reader.Comma = ';'

records, err := reader.ReadAll()
if err != nil {
	log.Fatal(err)
}

for _, record := range records {
	fmt.Printf("%s is %s\n", record[0].Value, record[1].Value)
}
Output:

Alice is 30
Bob is 25

func (*Reader) Headers

func (r *Reader) Headers() ([]*ColumnHeader, error)

Headers returns the parsed header information. If headers have not been parsed yet, the first row is read and parsed.

Example
input := `id,name,tags[],address(street^city^zip)
1,Alice,go~rust,123 Main^LA^90210
`

reader := csvpp.NewReader(strings.NewReader(input))
headers, err := reader.Headers()
if err != nil {
	log.Fatal(err)
}

for _, h := range headers {
	fmt.Printf("%s: %s\n", h.Name, h.Kind)
}
Output:

id: SimpleField
name: SimpleField
tags: ArrayField
address: StructuredField

func (*Reader) Read

func (r *Reader) Read() ([]*Field, error)

Read reads and returns one record's worth of fields. The header row is automatically parsed on the first call. Returns io.EOF when the end of file is reached.

Example
input := `name,scores[]
Alice,100~95~88
Bob,77~82
`

reader := csvpp.NewReader(strings.NewReader(input))

for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%s: %v\n", record[0].Value, record[1].Values)
}
Output:

Alice: [100 95 88]
Bob: [77 82]

func (*Reader) ReadAll

func (r *Reader) ReadAll() ([][]*Field, error)

ReadAll reads and returns all records. The header row is automatically parsed on the first call.

Example
input := `name,age
Alice,30
Bob,25
Charlie,35
`

reader := csvpp.NewReader(strings.NewReader(input))
records, err := reader.ReadAll()
if err != nil {
	log.Fatal(err)
}

fmt.Printf("Read %d records\n", len(records))
for _, record := range records {
	fmt.Printf("%s is %s years old\n", record[0].Value, record[1].Value)
}
Output:

Read 3 records
Alice is 30 years old
Bob is 25 years old
Charlie is 35 years old

type Writer

type Writer struct {
	// Comma is the field delimiter (default: ',').
	Comma rune
	// UseCRLF uses \r\n as the line terminator if true.
	UseCRLF bool
	// contains filtered or unexported fields
}

Writer writes CSV++ files according to the IETF CSV++ specification. It wraps encoding/csv.Writer and serializes CSV++ fields using the delimiters defined in the headers. The output is RFC 4180 compliant.

Example
var buf bytes.Buffer
writer := csvpp.NewWriter(&buf)

headers := []*csvpp.ColumnHeader{
	{Name: "name", Kind: csvpp.SimpleField},
	{Name: "tags", Kind: csvpp.ArrayField, ArrayDelimiter: '~'},
}
writer.SetHeaders(headers)

if err := writer.WriteHeader(); err != nil {
	log.Fatal(err)
}

records := [][]*csvpp.Field{
	{{Value: "Alice"}, {Values: []string{"go", "rust"}}},
	{{Value: "Bob"}, {Values: []string{"python"}}},
}

for _, record := range records {
	if err := writer.Write(record); err != nil {
		log.Fatal(err)
	}
}
writer.Flush()

fmt.Print(buf.String())
Output:

name,tags[]
Alice,go~rust
Bob,python

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter creates a new Writer.

func (*Writer) Error

func (w *Writer) Error() error

Error returns any error that occurred during writing.

func (*Writer) Flush

func (w *Writer) Flush()

Flush flushes the buffer.

func (*Writer) SetHeaders

func (w *Writer) SetHeaders(headers []*ColumnHeader)

SetHeaders sets the header information. This must be called before WriteHeader or Write.

func (*Writer) Write

func (w *Writer) Write(record []*Field) error

Write writes one record's worth of fields.

func (*Writer) WriteAll

func (w *Writer) WriteAll(records [][]*Field) error

WriteAll writes all records. The header row is also written automatically.

Example
var buf bytes.Buffer
writer := csvpp.NewWriter(&buf)

headers := []*csvpp.ColumnHeader{
	{Name: "name", Kind: csvpp.SimpleField},
	{Name: "score", Kind: csvpp.SimpleField},
}
writer.SetHeaders(headers)

records := [][]*csvpp.Field{
	{{Value: "Alice"}, {Value: "100"}},
	{{Value: "Bob"}, {Value: "95"}},
}

if err := writer.WriteAll(records); err != nil {
	log.Fatal(err)
}

fmt.Print(buf.String())
Output:

name,score
Alice,100
Bob,95

func (*Writer) WriteHeader

func (w *Writer) WriteHeader() error

WriteHeader writes the header row.
