psv

v0.3.0-alpha
Published: Jan 22, 2025 License: MIT Imports: 1 Imported by: 0

README

PSV - Pipe Separated Values

Introduction

PSV (Pipe Separated Values) is a file format for encoding simple, tabular data in human-readable, plain text.

PSV is similar in concept to Comma-Separated Values (CSV), Tab-Separated Values (TSV) or Delimiter-Separated Values (DSV), but with the distinction that additional spaces are added so that all rows

  • have the same number of columns
  • and all columns align vertically

e.g.:

CSV:

name,score
Alexander,3
Tim,5
Johannes,17

PSV:

| name      | score |
| --------- | ----: |
| Alexander |     3 |
| Tim       |     5 |
| Johannes  |    17 |

PSV tables are also used by Markdown, with some minor restrictions
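
The alignment described above can be sketched in a few lines of Go. This is only an illustration, not the psv package's encoder: formatRows is a hypothetical helper, it left-aligns every column and it does not draw rulers.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// formatRows is a hypothetical helper, not the psv package's encoder.
// It pads every row to the same number of columns and every column
// to the width of its widest cell.
func formatRows(rows [][]string) string {
	// find the column count and the widest cell in each column
	var widths []int
	for _, row := range rows {
		for c, cell := range row {
			if c == len(widths) {
				widths = append(widths, 0)
			}
			if w := utf8.RuneCountInString(cell); w > widths[c] {
				widths[c] = w
			}
		}
	}

	var sb strings.Builder
	for _, row := range rows {
		sb.WriteString("|")
		for c, width := range widths {
			cell := "" // short rows are filled with empty cells
			if c < len(row) {
				cell = row[c]
			}
			pad := width - utf8.RuneCountInString(cell)
			sb.WriteString(" " + cell + strings.Repeat(" ", pad) + " |")
		}
		sb.WriteString("\n")
	}
	return sb.String()
}

func main() {
	fmt.Print(formatRows([][]string{
		{"name", "score"},
		{"Alexander", "3"},
		{"Tim"},
	}))
}
```

Widths are counted in runes rather than bytes, which keeps non-ASCII names aligned in most terminals; wide East Asian characters would need extra care.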

Intended Use Cases

Three basic scenarios are supported:

  1. Generating PSV tables from code:

    doc := &psv.Document{}
    doc.AppendRow([]string{"name", "score"})
    doc.AppendRow([]string{"Alice", "3"})
    doc.AppendRow([]string{"Bob", "2"})
    output, _ := doc.MarshalText()
    fmt.Println(string(output))

    | name  | score |
    | Alice | 3     |
    | Bob   | 2     |

  2. Reading data from PSV tables into code:

    input := `
        | name  | score |
        | Alice | 3     |
        | Bob   | 2     |
        `

    // read the input
    doc := &psv.Document{}
    doc.UnmarshalText([]byte(input))

    // the input may contain any number of tables
    // loop through each table looking for 'interesting' data
    for _, table := range doc.Tables() {

        // ColumnByNameFunc provides a convenient name => column mapping
        value := table.ColumnByNameFunc()

        // DataRows() returns all rows except the first row (assumed to be a header)
        // AllRows() returns all rows including the first row.
        for _, row := range table.DataRows() {
            fmt.Printf("%s points were awarded to %q\n",
                value(row, "score"),
                value(row, "name"),
            )
        }
    }

    // Output:
    // 3 points were awarded to "Alice"
    // 2 points were awarded to "Bob"

  3. Re-formatting existing PSV tables via the psv command:

    % cat input.txt
    | name | score
    | Alice | 3
    | Bob | 2
    % psv < input.txt > output.txt
    % cat output.txt
    | name  | score |
    | Alice | 3     |
    | Bob   | 2     |

PSV is very closely related to comma-separated values (CSV) and delimiter-separated values (DSV). However, PSV explicitly advocates the use of additional whitespace and decoration to improve readability. PSV should be easy for humans to write and easy for humans to read.

The psv unix command line utility and golang package in this project help automate the process of (re-)formatting PSV data and also the use of PSV as a source of data in your programs.

For example, boolean logic is often presented as a table of inputs and outcomes:

| A     | B     | A and B |                     [][]string{
| ----- | ----- | ------- |                               {"A", "B", "A and B" },
| false | false | false   |                               {"false", "false", "false" },
| false | true  | false   |      <== psv ==>              {"false", "true", "false" },
| true  | false | false   |                               {"true", "false", "false" },
| true  | true  | true    |                               {"true", "true", "true" },
                                                          }

Document Structure

  • parsing always returns a document
  • all tables in a document may be aligned with each other by enabling the align_all option
  • a ruler after the first row of data in a table is special; it can
    • specify left, right, center or numeric data alignment per column
    • specify that a column should be sorted before encoding
  • all other rulers are for decoration purposes only, and any additional markers within them will be ignored

Basic Formatting Rules

  • PSV is encoded in UTF-8
  • data rows
    • data rows must begin with a | (ASCII 0x7c, Unicode U+007c)
    • new columns are introduced by further | characters (one | per column)
      • a trailing | at the end of a data row is optional
      • empty columns at the end of a line are always truncated
    • empty columns inside a table may be removed by enabling the squash-empty option
    • UTF-8 whitespace surrounding |s is ignored
    • any other UTF-8 characters are considered data
      • whitespace within data is retained verbatim
      • whitespace and | can be included as data by preceding them with a \ (ASCII 0x5c, Unicode U+005c)
    • \n (ASCII 0x0a, Unicode U+000a) separates data rows
      • \r (ASCII 0x0d, Unicode U+000d) is included as whitespace, and is thus ignored
      • a trailing \n at the end of a file is not required
  • any text lines which do not begin with a | are retained verbatim, but are not part of a PSV table
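
The data row rules above can be sketched as a small row splitter. This is only an illustration under stated assumptions, not the psv package's scanner: splitRow is a hypothetical name, leading indentation before the first | is tolerated (as in the sloppy examples in this README), and rulers and the squash-empty option are not handled.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitRow is a hypothetical helper, not part of the psv package.
// It returns the cells of a data row, or ok == false when the line
// does not start with a '|' and is therefore plain text.
func splitRow(line string) (cells []string, ok bool) {
	line = strings.TrimLeftFunc(line, unicode.IsSpace) // assumption: indentation is allowed
	if !strings.HasPrefix(line, "|") {
		return nil, false
	}

	var cell []rune
	keep := 0 // leading runes of cell that must survive trimming (escaped data)
	flush := func() {
		// drop unescaped whitespace at the end of the cell
		for len(cell) > keep && unicode.IsSpace(cell[len(cell)-1]) {
			cell = cell[:len(cell)-1]
		}
		cells = append(cells, string(cell))
		cell, keep = nil, 0
	}

	runes := []rune(line[1:])
	for i := 0; i < len(runes); i++ {
		r := runes[i]
		switch {
		case r == '\\' && i+1 < len(runes) && (runes[i+1] == '|' || unicode.IsSpace(runes[i+1])):
			// an escaped '|' or whitespace character is data
			i++
			cell = append(cell, runes[i])
			keep = len(cell)
		case r == '|':
			flush()
		case unicode.IsSpace(r) && len(cell) == 0:
			// skip unescaped whitespace after a '|'
		default:
			cell = append(cell, r)
		}
	}
	flush()

	// empty columns at the end of a line are always truncated
	for len(cells) > 0 && cells[len(cells)-1] == "" {
		cells = cells[:len(cells)-1]
	}
	return cells, true
}

func main() {
	for _, line := range []string{
		"|false| true        | false ||||||",
		`  | a \| b | c\ `,
		"not a table row",
	} {
		cells, ok := splitRow(line)
		fmt.Printf("%v %q\n", ok, cells)
	}
}
```

Because \r counts as whitespace for unicode.IsSpace, lines with Windows line endings are handled by the same trimming step.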

These rules are enough to produce simple PSV tables. Horizontal rulers are also available, but they are "somewhat more complicated" and are thus explained in ruler formatting or psv_format.md.

Index

Introductory examples

Creating PSV Tables Manually

To write a PSV table, simply start a line with a | and some text. Don't worry about spacing or indentation, the psv tool will fix that in a minute. For example, the following, deliberately sloppily entered table:

    |A| B     |     A   and B
| -
  | false | false | false
|false| true        | false ||||||
  |true       |       false | false
    |true   | true  | true    | yay

will be turned into this:

    | A     | B     | A   and B |     |
    | ----- | ----- | --------- | --- |
    | false | false | false     |     |
    | false | true  | false     |     |
    | true  | false | false     |     |
    | true  | true  | true      | yay |

with a single call to psv (in this case, the vim [^1] command: vip!psv [^2]).

Some things of note:

  • all table rows are indented to align with the first row
  • all rows have been trimmed to the same number of columns
  • all columns are vertically aligned
  • a trailing | is always included on every data row
  • the horizontal ruler has been resized to match the width of each column
  • the contents of the table have not changed
    • e.g. the extra spacing between A and B was retained

(see ruler formatting)

[^1]: You don't have to use vim! psv can be used from any editor or shell script that lets you pipe text through shell commands.

[^2]: which translates to:

  • v start a visual selection ...
  • i select everything in ...
  • p the current paragraph
  • !psv and replace the current selection with whatever psv makes of it

Using psv Tables Programmatically

psv tables can also help improve the readability of test data.

Here is an example of an actual test suite (containing 14 individual unit tests) from psv's own unit testing code (sort_test.go):

func TestSingleSectionSorting(t *testing.T) {

    testTable := psv.TableFromString(`
        | 0 | b | 3  | partial
        | 1 | D
        | 2 | E | 5
        | 3 | a | 4  | unequal
        | 4 | c | 20
        | 5 | C | 10 | row | lengths
        | 6 | e | 5
        | 7 | d | 7
        `)

    testCases := sortingTestCasesFromTable(`
	| name                         | sort  | columns | exp-col | exp-rows        |
	| ---------------------------- | ----- | ------- | ------- | --------------- |
	| no sort                      | false |         |         | 0 1 2 3 4 5 6 7 |
	| default sort                 |       |         |         | 0 1 2 3 4 5 6 7 |
	| sort only when asked to      | false | 2       |         | 0 1 2 3 4 5 6 7 |
	| reverse default sort         |       | ~       |         | 7 6 5 4 3 2 1 0 |
	| reverse reverse default sort |       | ~~      |         | 0 1 2 3 4 5 6 7 |
	| indexed column sort          |       | 2       |         | 3 0 4 5 7 1 6 2 |
	| indexed column sort          |       | 2       | 2       | a b c C d D e E |
	| reverse column sort          |       | ~2      |         | 2 6 1 7 5 4 0 3 |
	| third column sort            |       | 3       |         | 1 5 4 0 3 2 6 7 |
	| numeric sort                 |       | #3      |         | 1 0 3 2 6 7 5 4 |
	| reverse numeric sort         |       | ~#3     |         | 4 5 7 6 2 3 0 1 |
	| numeric reverse sort         |       | #~3     |         | 4 5 7 6 2 3 0 1 |
	| reverse reverse column sort  |       | ~ #~3   |         | 1 0 3 2 6 7 5 4 |
	| partial column sort          |       | 4 2     |         | 4 7 1 6 2 0 5 3 |
	| non-existent column sort     |       | 9       |         | 0 1 2 3 4 5 6 7 |
	`)

    runSortingTestCases(t, testTable.AllRows(), testCases.DataRows())
}

In the example above, two tables are defined:

  • testTable is the reference table to be tested

    • it simply contains a few rows of data, in various forms suitable for testing some features of psv
    • testTable.AllRows() is used to get a [][]string containing all of the rows in the table.
  • testCases then defines a series of individual unit tests to be run on testTable

    • the first row (|name|...) is used as a header for the table
      • psv always refers to columns by the value in their first row
        • but the first row is treated the same as all other rows
      • testCases.DataRows() is used to get all of the rows except the first row
      • the second row in the table is a ruler
        • rulers are decorative in nature and may be used to influence column alignment and sorting preferences, but they do not appear in the [][]string array of data!

Detailed Description

psv reads, formats and writes simple tables of data in text files.

In doing so, psv focuses on human readability and ease of use, rather than trying to provide a lossless, ubiquitous, machine-readable data transfer format.

The same could be said of markdown, and indeed, psv can be used to generate github-style markdown tables that look nice in their markdown source code, and not just after they have been converted to HTML by the markdown renderer.

Another intended use case is data tables in Gherkin files, which are a central component of Behaviour Driven Development (BDD).

However, the real reason for creating psv was to be able to use text tables as the source of data for running automated tests. Hence the go package.

Main Features

  • normalisation of rows and columns, so that every row has the same number of cells
  • automatic table indentation and column alignment
  • the ability to automatically draw horizontal separation lines, called rulers
  • the ability to re-format existing tables, while leaving lines which "do not look like table rows" unchanged
  • a simple way to read data from tables into go programs via the psv go package
  • the (limited) ability to sort table data
    • without interfering with the rest of the table's formatting
  • and more ...

Not Supported

psv is not intended to replace spreadsheets etc 😄

Among a myriad of other non-features, the following are definitely not supported by psv:

  • the inclusion of | characters in a cell's data
  • multi-line cell data
  • any kind of cell merging or splitting
  • sorting of complex data formats, including:
    • date and/or timestamps (unless they are in ISO-8601 format, which sorts nicely)
    • signed numbers (+ and - signs confuse go's collators 😦)
    • floating point numbers
    • scientific notation
    • hexadecimal notation
  • ...
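
The sorting limitations above are easy to reproduce with plain string comparison. The sketch below uses the standard library's byte-wise sort.Strings, which is an assumption made for illustration: psv's own collation may differ in detail. sortedCopy is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy is a hypothetical helper: it returns a sorted copy of in,
// comparing the strings byte by byte.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

func main() {
	// ISO-8601 dates put the most significant field first, so text order
	// and chronological order agree.
	fmt.Println(sortedCopy([]string{"2024-12-01", "2023-01-15", "2024-02-03"}))

	// Numbers compared as text do not sort numerically, and signs
	// are compared as ordinary characters.
	fmt.Println(sortedCopy([]string{"9", "10", "-2", "+1"}))
}
```

The second call returns "+1", "-2", "10", "9", which is neither ascending nor descending numeric order.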
Design Principles
  • self contained
    • psv is a single go binary with no external dependencies
    • the psv go package is a single package, also with no external dependencies other than go's standard packages
      • exception: I do include another package of mine to provide simplified testing with meaningful success and error messages.
    • all psv actions occur locally (no network access required)
  • non-destructive
    • if psv doesn't know how to interpret a line of text, the text remains unchanged
      • only data rows (lines beginning with a |) and rulers are re-formatted, all other lines remain unchanged
  • idempotent
    • any table generated by psv can also be read by psv
    • running a formatted table through psv again must not change the table in any way
  • ease of use
    • normal use should not require any configuration or additional parameters

Markdown Support

Markdown's table format is a subset of the formatting options provided by psv

Specifically:

  • Markdown tables MUST begin with a row of column names
  • Markdown tables MUST have exactly one ruler as their second line
  • Markdown rulers MAY contain the alignment hints :- (left-aligned), -: (right-aligned) or :-: (centered)
  • Markdown tables MUST NOT have embedded rulers anywhere else
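
As a rough illustration of the alignment hints listed above, the following sketch classifies a single ruler cell. rulerAlignment is a hypothetical helper, not part of the psv package, and it only covers the three Markdown hints; psv's own rulers support more markers than this.

```go
package main

import (
	"fmt"
	"strings"
)

// rulerAlignment is a hypothetical helper, not part of the psv package.
// It reports the Markdown alignment hint of one ruler cell, or ok == false
// when the cell is not a run of dashes with optional colons.
func rulerAlignment(cell string) (align string, ok bool) {
	cell = strings.TrimSpace(cell)
	dashes := strings.Trim(cell, ":")
	if dashes == "" || strings.Trim(dashes, "-") != "" {
		return "", false
	}
	left := strings.HasPrefix(cell, ":")
	right := strings.HasSuffix(cell, ":")
	switch {
	case left && right:
		return "center", true
	case right:
		return "right", true
	case left:
		return "left", true
	}
	return "default", true
}

func main() {
	for _, cell := range []string{" --------- ", " ----: ", ":-", ":-:", "score"} {
		align, ok := rulerAlignment(cell)
		fmt.Printf("%q => %q %v\n", cell, align, ok)
	}
}
```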
TODO's
  • add ability to configure the scanner

    • allow auto-indent detection
      • -I detect indent by capturing the indent before the first | encountered
    • explicitly specify ruler characters (for cli)
      • default autodetect
      • explicit rulers
        • turns off autodetection
        • allows the use of + and - as data
        • options:
          • -rh '-' horizontal ruler
          • -ro '|' outer ruler
          • -ri ':' inner ruler
          • -rc '+' corners
          • -rp 'ophi'
            • o outer vertical ruler
            • p padding character
            • h horizontal ruler (default: same as padding character)
            • i inner vertical ruler (default: same as outer ruler)
  • Replace table.Data with table.DataRows

Installation

psv consists of two components: the psv command and the psv go package.

To use the psv command, you only need the psv binary in your PATH, e.g. ~/bin/psv (see binary installation below).

If you don't want to install "a binary, downloaded from the 'net", you can download the source, (inspect it 😄), and build your own version.

Source Installation

Prerequisites

  • go 1.18 or later
  • make (optional, but recommended)

Build Steps

Clone the psv git repository and use make to build, test and install psv in your $GOBIN directory (typically $GOPATH/bin or ~/go/bin):

git clone -o codeberg https://codeberg.org/japh/psv
cd psv
make install
psv -v

Binary Installation

Note: currently only available for darwin amd64 (64-bit Intel Macs)

  • download the latest psv.gz from https://codeberg.org/japh/psv/releases
  • verify psv.gz with gpg --verify psv.gz.asc
  • compare psv.gz's checksums against those provided with shasum -c psv.gz.sha256
  • unpack psv.gz with gunzip psv.gz
  • copy psv to any directory in your $PATH, or use it directly via ./psv
  • don't forget to check that it is executable, e.g. chmod +x psv

Now you can use the psv command...

Using The psv Package In Go Projects

Prerequisites

  • go 1.18 or later

To use psv in your go project, simply import codeberg.org/japh/psv and go mod tidy will download it, build it and make it available for your project.

See the psv package documentation for the API and code examples.

Alternatives

  • csv, tsv and delimiter-separated-values tables (Wikipedia)

    • generally, psv tables are just a single type of delimiter-separated values format
  • ASCII Table Writer

    • go package for creating tables of almost any form
    • more traditional table.SetHeader, table.SetFooter() interface
    • more features (incl. colors)
    • does not read tables
      • no good for defining test cases etc in code
  • psv-spec

    • an attempt to standardize a CSV replacement using pipes as the delimiter
    • focuses on electronic data transfers
    • does not provide a tabular layout
    • escaping just |, \, \n and \r is nice
      • but does not allow for whitespace quoting
      • future: | " " | could be used by psv to represent a space

References

Copyright 2022 Stephen Riehm japh-codeberg@opensauce.de

Documentation

Overview

psv provides methods for handling tables of Pipe-Separated-Values (PSV)

Three basic use cases are supported:

1. Generating PSV tables from code:

doc := &psv.Document{}
doc.AppendRow([]string{"name","score"})
doc.AppendRow([]string{"Alice","3"})
doc.AppendRow([]string{"Bob","2"})
output, _ := doc.MarshalText()
fmt.Println(string(output))

| name  | score |
| Alice | 3     |
| Bob   | 2     |

2. Reading data from PSV tables into code:

input := `
	| name  | score |
	| Alice | 3     |
	| Bob   | 2     |
	`

// read the input
doc := &psv.Document{}
doc.UnmarshalText([]byte(input))

// the input may contain any number of tables
// loop through each table looking for 'interesting' data
for _, table := range doc.Tables() {

	// ColumnByNameFunc provides a convenient name => column mapping
	value := table.ColumnByNameFunc()

	// DataRows() returns all rows except the first row (assumed to be a header)
	// AllRows() returns all rows including the first row.
	for _, row := range table.DataRows() {
		fmt.Printf("%s points were awarded to %q\n",
			value(row,"score"),
			value(row,"name"),
		)
	}
}

// Output:
// 3 points were awarded to "Alice"
// 2 points were awarded to "Bob"

3. Re-formatting existing PSV tables via the `psv` command:

% cat input.txt
| name | score
| Alice | 3
| Bob | 2
% psv < input.txt > output.txt
% cat output.txt
| name  | score |
| Alice | 3     |
| Bob   | 2     |

Usage

Document is the main aggregate for building or accessing PSV data. The Document type fulfills the encoding.TextMarshaler and encoding.TextUnmarshaler interfaces for conversion to and from the document's text form.

Documents are built incrementally via Append methods and may be read as a slice of rows. The ability to edit data or randomly access data is not provided.
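
For readers unfamiliar with the two interfaces, here is a minimal sketch of a type that satisfies both. The grid type is a hypothetical stand-in written for this example; it is not how psv.Document is implemented, it does no column alignment, and it drops non-table lines instead of retaining them.

```go
package main

import (
	"encoding"
	"fmt"
	"strings"
)

// grid is a hypothetical stand-in for a document: a list of rows of cells.
type grid [][]string

// compile-time checks that *grid satisfies both interfaces
var (
	_ encoding.TextMarshaler   = (*grid)(nil)
	_ encoding.TextUnmarshaler = (*grid)(nil)
)

// MarshalText writes one unaligned "| a | b |" line per row.
func (g *grid) MarshalText() ([]byte, error) {
	var sb strings.Builder
	for _, row := range *g {
		sb.WriteString("| " + strings.Join(row, " | ") + " |\n")
	}
	return []byte(sb.String()), nil
}

// UnmarshalText replaces the grid's contents with the rows found in text.
func (g *grid) UnmarshalText(text []byte) error {
	*g = nil
	for _, line := range strings.Split(string(text), "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "|") {
			continue // this sketch skips non-table lines
		}
		cells := strings.Split(strings.Trim(line, "|"), "|")
		for i := range cells {
			cells[i] = strings.TrimSpace(cells[i])
		}
		*g = append(*g, cells)
	}
	return nil
}

func main() {
	g := grid{{"name", "score"}, {"Alice", "3"}}
	text, _ := g.MarshalText()
	fmt.Print(string(text))

	var back grid
	_ = back.UnmarshalText(text)
	fmt.Println(len(back), "rows read back")
}
```

Implementing these interfaces lets other packages that accept encoding.TextMarshaler or encoding.TextUnmarshaler values work with the type without knowing its internals.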

Internally, a Document may contain any number of Table objects which can be accessed via the Document.Tables method.

Each table then has its own set of column names, prefix etc.

Ruler objects may be used to add separation lines to a table and may be placed anywhere within a table.

The [Markdown] formatter, however, will ignore all but the ruler that appears directly after the first row of data in the table, thus conforming to markdown's requirements.

e.g.

+---------+-------+
| name    | score |
| ------- | ----- |
| Alice   | 25    |
| Bob     | 17    |
| Charlie | 10    |
| ------- - ----- |
| Dave    | 9     |
+---------+-------+

When re-formatted for Markdown would become:

| name    | score |
| ------- | ----- |
| Alice   | 25    |
| Bob     | 17    |
| Charlie | 10    |
| Dave    | 9     |
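
The filtering step in this example can be sketched as follows. This is an illustration only: markdownLines and isRuler are hypothetical helpers, and the ruler test used here (a line made up solely of |, +, -, : and blanks, with at least one -) is an assumption, since the real ruler rules are described elsewhere as "somewhat more complicated".

```go
package main

import (
	"fmt"
	"strings"
)

// isRuler is a hypothetical, simplified ruler test (see the assumptions above).
func isRuler(line string) bool {
	rest := strings.Map(func(r rune) rune {
		if strings.ContainsRune("|+-: \t", r) {
			return -1 // drop ruler and padding characters
		}
		return r
	}, line)
	return rest == "" && strings.Contains(line, "-")
}

// markdownLines keeps all data rows, but only the ruler that
// directly follows the first data row.
func markdownLines(lines []string) []string {
	var out []string
	dataRows := 0
	rulerKept := false
	for _, line := range lines {
		if isRuler(line) {
			if dataRows == 1 && !rulerKept {
				out = append(out, line)
				rulerKept = true
			}
			continue
		}
		dataRows++
		out = append(out, line)
	}
	return out
}

func main() {
	table := []string{
		"+---------+-------+",
		"| name    | score |",
		"| ------- | ----- |",
		"| Alice   | 25    |",
		"| Bob     | 17    |",
		"| Charlie | 10    |",
		"| ------- - ----- |",
		"| Dave    | 9     |",
		"+---------+-------+",
	}
	fmt.Println(strings.Join(markdownLines(table), "\n"))
}
```

A cell that legitimately contains only dashes would be misread as a ruler by this sketch, which is one reason explicit ruler configuration appears in the TODO list.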

Types

type Document

type Document struct {
	*Prefix // prefix to apply to all tables within the document
	// contains filtered or unexported fields
}

Document represents a single text file which may contain any number of tables.

func (*Document) AppendRow

func (doc *Document) AppendRow(row []string)

AppendRow appends a row to the end of the current table.

A new table will be added to the document if necessary.

func (*Document) AppendRuler

func (doc *Document) AppendRuler(ruler *Ruler)

AppendRuler appends a ruler to the end of the current table.

A new table will be added to the document if necessary.

func (*Document) AppendText

func (doc *Document) AppendText(line string)

AppendText appends a line of text to the end of the document.

Text lines are in no way modified and are reproduced verbatim.

Text lines separate multiple tables within a document.
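
How text lines separate tables can be sketched by grouping consecutive lines. The block type and splitBlocks function below are hypothetical names made up for this example; they are not the package's DocumentItem, and rulers that do not start with a | are simply treated as text here.

```go
package main

import (
	"fmt"
	"strings"
)

// block is a hypothetical type: either a run of table rows or a run of text lines.
type block struct {
	isTable bool
	lines   []string
}

// splitBlocks groups consecutive lines of the same kind. A line counts as a
// table row when its first non-blank character is a '|'.
func splitBlocks(text string) []block {
	var blocks []block
	for _, line := range strings.Split(text, "\n") {
		isRow := strings.HasPrefix(strings.TrimSpace(line), "|")
		if n := len(blocks); n == 0 || blocks[n-1].isTable != isRow {
			blocks = append(blocks, block{isTable: isRow})
		}
		last := &blocks[len(blocks)-1]
		last.lines = append(last.lines, line) // lines are kept verbatim
	}
	return blocks
}

func main() {
	text := "scores:\n| name | score |\n| Alice | 3 |\n\nnotes:\n| key | value |"
	for i, b := range splitBlocks(text) {
		fmt.Printf("%d: table=%v lines=%d\n", i, b.isTable, len(b.lines))
	}
}
```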

func (*Document) MarshalText

func (doc *Document) MarshalText() (text []byte, err error)

MarshalText returns a text representation of a document, which can be parsed by Document.UnmarshalText

Each table will be re-aligned according to the rules defined via Document.SetMarshalOptions

See also encoding.TextMarshaler

func (*Document) SetMarshalOptions

func (doc *Document) SetMarshalOptions(opts ...MarshalOption)

TODO: 2025-01-22 define how marshaling / unmarshaling should be configured

func (*Document) SetTablePrefixOnce

func (doc *Document) SetTablePrefixOnce(prefix *Prefix)

SetTablePrefixOnce sets the prefix of the current table if it has not already been set. This is intended for use when parsing tables line-by-line, in which case the prefix of the first row should be used for all rows of the table.

A new table will be added to the document if necessary.

func (*Document) Tables

func (doc *Document) Tables() []*Table

Tables returns the slice of tables within the document

func (*Document) UnmarshalText

func (doc *Document) UnmarshalText(text []byte) error

UnmarshalText parses PSV text into an existing Document object

See also encoding.TextUnmarshaler

type DocumentItem

type DocumentItem struct {
	*Text
	*Table
}

DocumentItem represents a single block of text or a table within a document.

If both Text and Table are not nil, then the Text is positioned before the table.

type MarshalOption

type MarshalOption struct{}

TODO: 2025-01-22 define how marshaling / unmarshaling should be configured

type Prefix

type Prefix struct {
	Pattern string
	// contains filtered or unexported fields
}

Prefix can be used to remove or add a prefix from / to each row of a table.

This is useful for re-formatting tables which are embedded in e.g. code comments.

Example
package main

import (
	"fmt"

	"codeberg.org/japh/psv"
)

func main() {
	text := `
        // verified scores
        | name  | score |
        | Alice | 12    |


        // unverified scores:
        // | name | score |
        // | Adam | 6     |
    `

	doc := &psv.Document{
		Prefix: &psv.Prefix{Pattern: `//`},
	}

	// TODO(Steve): 2025-01-22 UnmarshalText not yet implemented
	doc.UnmarshalText([]byte(text))

	for tn, tbl := range doc.Tables() {
		fmt.Printf("%d: table with indent %q\n", tn, tbl.Prefix.Pattern)
	}

	// TODO: 2025-01-22 Output:
	// 0: table with indent "        "
	// 1: table with indent "     // "
}

func (*Prefix) SplitLine

func (p *Prefix) SplitLine(line string) (*Prefix, string)

SplitLine attempts to match a line with a specific prefix.

If the prefix does not match, then the prefix returned will be nil and the line will be returned as-is.
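
The idea of splitting a prefix from a line can be illustrated with plain strings. The sketch below is not the package's implementation and does not use the Prefix type: splitPrefix is a hypothetical function, and it assumes a prefix consists of optional indentation, a literal comment marker and optional blanks. How Pattern is actually interpreted is not specified in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPrefix is a hypothetical helper; the real Prefix.SplitLine may behave
// differently. It splits "indent + marker + blanks" from the rest of the line.
// When the marker is not found, the line is returned as-is and ok is false.
func splitPrefix(marker, line string) (prefix, rest string, ok bool) {
	trimmed := strings.TrimLeft(line, " \t")
	if !strings.HasPrefix(trimmed, marker) {
		return "", line, false
	}
	rest = strings.TrimLeft(trimmed[len(marker):], " \t")
	prefix = line[:len(line)-len(rest)]
	return prefix, rest, true
}

func main() {
	for _, line := range []string{
		"        // | name | score |",
		"        | name  | score |",
	} {
		prefix, rest, ok := splitPrefix("//", line)
		fmt.Printf("%v %q %q\n", ok, prefix, rest)
	}
}
```

Keeping the matched prefix verbatim makes it possible to write the same indentation and comment marker back in front of each re-formatted row.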

type Ruler

type Ruler struct {
	Line int
}

Ruler represents a horizontal separator line within a table.

Rulers grow and shrink depending on the width of the data in the column.

Rulers may also include formatting hints for each column individually, e.g. whether the column's data should be aligned to the left, right or center, and whether or not the rows should be sorted by the data in a column.

type Table

type Table struct {
	*Prefix
	// contains filtered or unexported fields
}

Table represents a single table of data.

func (*Table) AllRows

func (tbl *Table) AllRows() [][]string
Example
package main

import (
	"fmt"

	"codeberg.org/japh/psv"
)

func main() {
	tbl := &psv.Table{}
	tbl.AppendRow([]string{"name", "score"})
	tbl.AppendRow([]string{"Adam", "6"})

	for r, row := range tbl.AllRows() {
		fmt.Printf("%d: %v\n", r, row)
	}

}
Output:

0: [name score]
1: [Adam 6]

func (*Table) AppendRow

func (tbl *Table) AppendRow(row []string)

func (*Table) AppendRuler added in v0.1.1

func (tbl *Table) AppendRuler(ruler *Ruler)

func (*Table) ColumnByNameFunc

func (tbl *Table) ColumnByNameFunc() func([]string, string) string

ColumnByNameFunc returns a function which returns a column's value from a row of data, indexed by the column name.
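
A closure of this shape could be built roughly as follows. This is a sketch under assumptions, not the package's code: columnByNameFunc is a hypothetical free function that takes the header row explicitly, and returning an empty string for unknown names or short rows is a guess at reasonable behaviour.

```go
package main

import "fmt"

// columnByNameFunc is a hypothetical sketch of a name => column lookup.
// The header row is indexed once; the returned function then resolves
// a column name against any data row.
func columnByNameFunc(header []string) func(row []string, name string) string {
	index := make(map[string]int, len(header))
	for i, name := range header {
		if _, seen := index[name]; !seen {
			index[name] = i // assumption: the first column wins when names repeat
		}
	}
	return func(row []string, name string) string {
		i, ok := index[name]
		if !ok || i >= len(row) {
			return "" // assumption: unknown column or short row yields ""
		}
		return row[i]
	}
}

func main() {
	value := columnByNameFunc([]string{"name", "score"})
	for _, row := range [][]string{{"Alice", "3"}, {"Bob", "2"}} {
		fmt.Printf("%s points were awarded to %q\n", value(row, "score"), value(row, "name"))
	}
}
```

Indexing the header once keeps each lookup to a single map access, which matters little for small tables but keeps the calling code free of column numbers.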

func (*Table) ColumnNames added in v0.1.1

func (tbl *Table) ColumnNames() []string
Example
package main

import (
	"fmt"

	"codeberg.org/japh/psv"
)

func main() {
	tbl := &psv.Table{}
	tbl.AppendRow([]string{"name", "score"})
	tbl.AppendRow([]string{"Adam", "6"})

	names := tbl.ColumnNames()
	fmt.Printf("column names: %v\n", names)

}
Output:

column names: [name score]

func (*Table) DataRows added in v0.1.1

func (tbl *Table) DataRows() [][]string
Example
package main

import (
	"fmt"

	"codeberg.org/japh/psv"
)

func main() {
	tbl := &psv.Table{}
	tbl.AppendRow([]string{"name", "score"})
	tbl.AppendRow([]string{"Adam", "6"})

	for r, row := range tbl.DataRows() {
		fmt.Printf("%d: %v\n", r, row)
	}

}
Output:

0: [Adam 6]

func (*Table) SetPrefixOnce

func (tbl *Table) SetPrefixOnce(prefix *Prefix)

SetPrefixOnce sets the prefix for a table.

type Text

type Text struct {
	Lines []string
}

Text is a collection of lines which appear between tables within a document.

Text lines are never modified in any way and are reproduced verbatim when marshaling a document.

Directories

Path Synopsis
cmd
psv command
