jsonstream

v0.5.4
Published: Mar 8, 2025 License: BSD-2-Clause Imports: 9 Imported by: 1

README

JSONStream

JSONStream is a streaming JSON parser for Go. It's useful if you want to search through a JSON input stream without parsing all of it, or if you want precise control over how the input stream is parsed.

Streaming parsers are more difficult to use than parsers which automatically construct a data structure from the input. This library is not recommended for general-purpose JSON processing.

Key features and design decisions

  • Simple iterator-based API.
  • Line and column info for all tokens.
  • Extensive test suite (including fuzz tests and the JSONTestSuite).
  • Choice of behavior for numeric literals outside the range of float64 or int.
  • Optional support for JavaScript-style comments and trailing commas.
  • Reports errors for all invalid JSON. You need only verify that the JSON has the required structure.
  • Simple path API that can be used to search for values at a given path.
  • Assumes UTF-8 input.
  • Surrogate pair escape sequences decoded correctly (e.g. "\uD834\uDD1E" decodes to UTF-8 []byte{0xf0, 0x9d, 0x84, 0x9e}, i.e. '𝄞').
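
The surrogate pair example in the last bullet can be checked with the standard library alone. The following is a minimal sketch that does not use JSONStream; decodeSurrogatePair is a hypothetical helper name, and the library's own decoding may be implemented differently.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// decodeSurrogatePair (hypothetical helper) combines the two 16-bit code
// units of a "\uXXXX\uXXXX" escape into one code point and returns its
// UTF-8 encoding. It returns nil if the units are not a valid pair.
func decodeSurrogatePair(hi, lo rune) []byte {
	r := utf16.DecodeRune(hi, lo)
	if r == utf8.RuneError { // utf16.DecodeRune yields U+FFFD for invalid pairs
		return nil
	}
	return utf8.AppendRune(nil, r)
}

func main() {
	b := decodeSurrogatePair(0xD834, 0xDD1E)
	fmt.Printf("% x %s\n", b, b) // prints "f0 9d 84 9e 𝄞"
}
```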

Usage

Create a parser:

var p jsonstream.Parser

Optionally change the default configuration (default is strict JSON):

p.AllowComments = true
p.AllowTrailingCommas = true

Call the Tokenize method with a byte slice to obtain an iterator over a sequence of tokens:

for tok := range p.Tokenize(input) {
	...
}

If you would prefer to pull tokens one-by-one rather than looping, you can use iter.Pull.
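
As a rough illustration of the pull style, the sketch below applies iter.Pull to a stand-in sequence of strings. With JSONStream, the sequence would presumably be the result of p.Tokenize(input); firstN is a hypothetical helper, not part of the library.

```go
package main

import (
	"fmt"
	"iter"
	"slices"
)

// firstN (hypothetical helper) pulls at most n items from seq one at a
// time and then stops, without consuming the rest of the sequence.
func firstN[T any](seq iter.Seq[T], n int) []T {
	next, stop := iter.Pull(seq)
	defer stop() // always release the iterator, even when stopping early
	var out []T
	for len(out) < n {
		v, ok := next()
		if !ok {
			break // the sequence is exhausted
		}
		out = append(out, v)
	}
	return out
}

func main() {
	// Stand-in for a token sequence such as p.Tokenize(input).
	tokens := slices.Values([]string{"[", "1", "2", "]"})
	fmt.Println(firstN(tokens, 2)) // prints "[[ 1]"
}
```

Calling stop is what lets the underlying iterator clean up when the caller loses interest before the sequence ends.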

Errors are reported via error tokens, for which IsError(token.Kind) is true and token.AsError() returns a non-nil error value. These tokens have their ErrorMsg field set. JSONStream does not automatically halt on errors.

JSONStream always yields at least one error token for any input that is not valid JSON. This includes input with mismatched {}[].

Parsing numeric values

The JSON standard specifies only the syntactic format of numeric literals. The interpretation of very large and very small values may therefore vary. JSONStream does not automatically parse numeric literals and so does not force any particular handling of out of range literals or other edge cases.

The convenience methods AsInt, AsInt32, AsInt64, AsFloat32, and AsFloat64 are provided for parsing numeric values. These methods add a decode error to the associated Parser object if a value cannot be represented in the requested type (for example, because it is out of range). Decode errors can be accessed and manipulated via the DecodeError, LastDecodeError, DecodeErrors, and PopDecodeErrorIf methods of Parser.

If none of the As* methods has the desired behavior, the Value field of a Token struct may be accessed directly in order to implement custom parsing of numeric values.
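
One possible form of custom parsing is to hand the raw literal to math/big. The sketch below assumes the bytes come from the Value field of a Number token and have already passed the tokenizer's syntax checks; parseBigNumber is a hypothetical helper and the 256-bit precision is an arbitrary choice.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseBigNumber (hypothetical helper) parses a raw numeric literal into
// an arbitrary-precision float, so values outside the float64 range are
// preserved rather than overflowing.
func parseBigNumber(raw []byte) (*big.Float, error) {
	f, ok := new(big.Float).SetPrec(256).SetString(string(raw))
	if !ok {
		return nil, fmt.Errorf("invalid numeric literal %q", raw)
	}
	return f, nil
}

func main() {
	f, err := parseBigNumber([]byte("1e400")) // too large for float64
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Text('g', 10))
}
```

Note that big.Float.SetString accepts some syntax that JSON does not (hexadecimal mantissas, for instance), so it is best applied only to literals the tokenizer has already accepted.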

Parsing arrays

The sequence of tokens for the array [1,2,3] is as follows:

Token{Kind: ArrayStart, ...}
Token{Kind: Number, Value: []byte("1"), ...}
Token{Kind: Number, Value: []byte("2"), ...}
Token{Kind: Number, Value: []byte("3"), ...}
Token{Kind: ArrayEnd, ...}

Parsing objects

Within an object each token represents a value. The associated key is obtained via the Key field. The sequence of tokens for the object {"foo": "bar", "baz": "amp"} is as follows:

Token{Kind: ObjectStart, ...}
Token{Kind: String, Key: []byte("foo"), Value: []byte("bar"), ...}
Token{Kind: String, Key: []byte("baz"), Value: []byte("amp"), ...}
Token{Kind: ObjectEnd, ...}

The KeyAsString method can be used to obtain a token's key as a string.

Source position information

Each token has Line and Col fields for the start of the token, and Start and End fields giving the indices of the first and last byte of the token in the input.

Performance

JSONStream is written in a simple and straightforward style. It should perform acceptably for most purposes, but it is not intended to be an ultra-high-performance parsing library (such as json-iterator). Informal benchmarking suggests that performance is a little better than encoding/json, though much depends on whether and how you construct a parsed representation of the input.

Examples

Parse an array of integers

import (
	"errors"
	"github.com/addrummond/jsonstream"
)

func parseIntArray(input []byte) ([]int, error) {
	state := 0
	ints := make([]int, 0)
	var p jsonstream.Parser
	for t := range p.Tokenize(input) {
		if err := t.AsError(); err != nil {
			return nil, err
		}

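		// The first token must open the array.
		if state == 0 {
			state++
			if t.Kind != jsonstream.ArrayStart {
				return nil, errors.New("expected opening '['")
			}
			continue
		}

		if t.Kind == jsonstream.ArrayEnd {
			// Report any decode error recorded by AsInt (e.g. a non-integer
			// or out-of-range value) rather than silently discarding it.
			return ints, p.DecodeError()
		}
		if t.Kind == jsonstream.Number {
			ints = append(ints, t.AsInt())
			continue
		}

		return nil, errors.New("expected integer or closing ']'")
	}

	return ints, p.DecodeError()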
}

Parse an object with string values

import (
	"errors"
	"github.com/addrummond/jsonstream"
)

func parseObjectWithStringValues(input []byte) (map[string]string, error) {
	state := 0
	var p jsonstream.Parser
	dict := make(map[string]string)
	for t := range p.Tokenize(input) {
		if err := t.AsError(); err != nil {
			return nil, err
		}

		// The first token must open the object.
		if state == 0 {
			state++
			if t.Kind != jsonstream.ObjectStart {
				return nil, errors.New("expected opening '{'")
			}
			continue
		}

		if t.Kind == jsonstream.ObjectEnd {
			return dict, nil
		}
		// Inside an object, each value token carries its key.
		if t.Kind == jsonstream.String {
			dict[t.KeyAsString()] = t.AsString()
			continue
		}

		return nil, errors.New("expected string or closing '}'")
	}

	return dict, p.DecodeError()
}

Search for a value at a given path

import (
	"errors"
	"github.com/addrummond/jsonstream"
)

// Example call:
//
//	findByPath(
//	  []byte(`{"a": {"b": {"c": 1}}}`),
//	  []any{"a", "b", "c"},
//	) // returns "1", nil
func findByPath(input []byte, path []any) (string, error) {
	var p jsonstream.Parser
	for twp := range jsonstream.WithPaths(p.Tokenize(input)) {
		if err := twp.Token.AsError(); err != nil {
			return "", err
		}
		if jsonstream.PathEquals(twp.Path, path) {
			return string(twp.Token.Value), nil
		}
	}
	return "", errors.New("path not found")
}

Documentation

Overview

Package jsonstream provides a JSON tokenizer that reports line and column information for tokens. It optionally supports /* */ and // comment syntax as an extension to standard JSON.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsError

func IsError(k Kind) bool

IsError returns true for Error* token kinds and false for all others.
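
The Kind constants listed under Types combine iota with an unexported isError flag, which suggests that a check like this can be a single bit test. The sketch below shows that general pattern with hypothetical names and values (kind, errorBit, and the constants are illustrative); it is not the library's actual implementation.

```go
package main

import "fmt"

type kind int

// errorBit is a hypothetical flag value; the library's flag is unexported
// and its actual value is not documented.
const errorBit kind = 1 << 8

const (
	objectStart kind = iota
	objectEnd
	// From here on, each constant repeats the expression iota | errorBit.
	errTrailingInput kind = iota | errorBit
	errUnexpectedEOF
)

// isError reports whether k carries the error flag.
func isError(k kind) bool { return k&errorBit != 0 }

func main() {
	fmt.Println(isError(objectEnd))        // prints "false"
	fmt.Println(isError(errUnexpectedEOF)) // prints "true"
}
```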

func IsNonIntegerDecodeError

func IsNonIntegerDecodeError(e error) bool

IsNonIntegerDecodeError returns true iff a decode error results from an attempt to parse a non-integer numeric value as an integer.

func IsOutOfRangeDecodeError

func IsOutOfRangeDecodeError(e error) bool

IsOutOfRangeDecodeError returns true iff a decode error results from an attempt to parse a numeric value that is out of range.

func PathEquals added in v0.2.0

func PathEquals(path Path, elems []any) bool

PathEquals returns true iff the given path is equivalent to the given sequence of int and string values.

func PathToSlice added in v0.2.0

func PathToSlice(p Path) []any

PathToSlice converts a Path to a slice of int and string values.

func WithPaths added in v0.2.0

func WithPaths(tokens iter.Seq[Token]) iter.Seq[TokenWithPath]

WithPaths converts a sequence of Token values into a sequence of TokenWithPath values.

Types

type Kind

type Kind int

Kind represents the kind of a JSON token.

const (
	// A '{' token
	ObjectStart Kind = iota
	// A '}' token
	ObjectEnd
	// A '[' token
	ArrayStart
	// A ']' token
	ArrayEnd
	// A string
	String Kind = iota
	// A number
	Number Kind = iota
	// A true boolean value
	True Kind = iota
	// A false boolean value
	False Kind = iota
	// A null value
	Null Kind = iota
	// A // or /* */ comment. If you need to distinguish between the two, you can
	// look at the second byte of the token's Value field.
	Comment Kind = iota
	// A parse error
	ErrorTrailingInput Kind = iota | isError
	// An unexpected EOF was encountered
	ErrorUnexpectedEOF
	// An unexpected token was encountered
	ErrorUnexpectedToken
	// There is a trailing comma in an object or array (not permitted by the JSON
	// standard).
	ErrorTrailingComma
	// There is a comma in an unexpected position (either immediately following '['
	// or '{' or immediately following another comma).
	ErrorUnexpectedComma
	// An unexpected character was encountered while tokenizing the input.
	ErrorUnexpectedCharacter
	// A numeric literal has leading zeros (not permitted by the JSON standard).
	// Tokens with this kind can also be treated as tokens of kind Number, if you
	// wish to be liberal in what you accept.
	ErrorLeadingZerosNotPermitted
	// A decimal point was not followed by a digit.
	ErrorExpectedDigitAfterDecimalPoint
	// The 'e' (or 'E') in a number was not followed by a digit.
	ErrorExpectedDigitFollowingEInNumber
	// A bad "\uXXXX" escape sequence was encountered in a string.
	ErrorBadUnicodeEscape
	// A control character not permitted by the JSON standard was found inside a
	// string.
	ErrorIllegalControlCharInsideString
	// UTF-8 decoding failed inside a string.
	ErrorUTF8DecodingErrorInsideString
)
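
The note on the Comment kind says the two comment styles can be told apart by the second byte of the token's Value. A minimal sketch of that check follows, assuming Value includes the comment delimiters; isBlockComment is a hypothetical helper, not part of the library.

```go
package main

import "fmt"

// isBlockComment (hypothetical helper) reports whether the raw bytes of a
// comment token represent a /* */ comment rather than a // comment,
// judged by the second byte.
func isBlockComment(value []byte) bool {
	return len(value) >= 2 && value[1] == '*'
}

func main() {
	fmt.Println(isBlockComment([]byte("/* note */"))) // prints "true"
	fmt.Println(isBlockComment([]byte("// note")))    // prints "false"
}
```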

func (Kind) String

func (k Kind) String() string

type Parser

type Parser struct {
	AllowComments       bool // Set to true to allow /* */ and // comments in the input
	AllowTrailingCommas bool // Set to true to allow trailing commas in arrays and objects (does not allow initial commas or multiple commas)
	// contains filtered or unexported fields
}

Parser is a streaming JSON parser. It is valid when default initialized.

func (*Parser) DecodeError

func (p *Parser) DecodeError() error

DecodeError returns the first decode error if any, or nil otherwise. A decode error is an error caused by invalid input to AsInt, AsInt32, AsInt64, AsFloat32, or AsFloat64.

func (*Parser) DecodeErrors

func (p *Parser) DecodeErrors() []error

DecodeErrors returns a slice containing all decode errors in the order they occurred. A decode error is an error occurring in AsInt, AsInt32, AsInt64, AsFloat32, or AsFloat64.

func (*Parser) LastDecodeError

func (p *Parser) LastDecodeError() error

LastDecodeError returns the last decode error if any, or nil otherwise. A decode error is an error caused by invalid input to AsInt, AsInt32, AsInt64, AsFloat32, or AsFloat64.

func (*Parser) PopDecodeErrorIf

func (p *Parser) PopDecodeErrorIf(predicate func(error) bool)

PopDecodeErrorIf removes the last decode error if it satisfies the predicate. This is useful with the supplied predicates IsNonIntegerDecodeError and IsOutOfRangeDecodeError. For example, if p.PopDecodeErrorIf(jsonstream.IsOutOfRangeDecodeError) is called immediately after AsInt(), then errors caused by out-of-range integers are ignored.

func (*Parser) Tokenize

func (p *Parser) Tokenize(inp []byte) iter.Seq[Token]

Tokenize returns an iter.Seq[Token] from a byte slice input.

type Path added in v0.2.0

type Path struct {
	// contains filtered or unexported fields
}

Path represents a sequence of strings and integers >= 0 that gives the path to a value inside a JSON document. For example, the sequence {1, "foo", 0} is the path to document[1]["foo"][0].

func SliceToPath added in v0.2.0

func SliceToPath(elems []any) Path

SliceToPath converts a slice of int and string values to a Path.

func (Path) String added in v0.2.0

func (p Path) String() string

String() returns a string representation of the path. The string is a sequence of JavaScript indexation operators that can be used to access the value (e.g. [0]["foo"][1]).
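
To make the format concrete, the sketch below builds the same kind of string from a slice of int and string values. formatPath is a hypothetical helper written for illustration; it uses Go's strconv.Quote, whose escaping may differ from the library's output for unusual keys.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatPath (hypothetical helper) renders path elements as a chain of
// JavaScript-style index operators. Elements other than int and string
// are skipped.
func formatPath(elems []any) string {
	var sb strings.Builder
	for _, e := range elems {
		switch v := e.(type) {
		case int:
			sb.WriteString("[" + strconv.Itoa(v) + "]")
		case string:
			sb.WriteString("[" + strconv.Quote(v) + "]")
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(formatPath([]any{0, "foo", 1})) // prints [0]["foo"][1]
}
```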

type Token

type Token struct {
	Line     int    // the line number of the first character of the token
	Col      int    // the column of the first character of the token
	Start    int    // the start position of the token in the input (byte index)
	End      int    // the end position of the token in the input (byte index)
	Key      []byte // the key of the token, or nil if none (may be a sub-slice of the input)
	Kind     Kind   // the kind of token
	Value    []byte // the value of the token (may be a sub-slice of the input).
	ErrorMsg string // error message set if IsError(token.Kind) == true
	// contains filtered or unexported fields
}

Token represents a JSON token.

func (*Token) AsBool

func (t *Token) AsBool() bool

AsBool returns the token's value as a bool. Its return value is defined only for tokens where Kind == True or Kind == False.

func (Token) AsError added in v0.1.2

func (t Token) AsError() error

AsError returns an error value if the token is an error token or nil otherwise.

func (*Token) AsFloat32

func (t *Token) AsFloat32() float32

AsFloat32 returns the token's value as a float32. Its return value is defined only for tokens where Kind == Number. The input is parsed using strconv.ParseFloat. If ParseFloat signals an error, a decode error is added to the associated Parser.

func (*Token) AsFloat64

func (t *Token) AsFloat64() float64

AsFloat64 returns the token's value as a float64. Its return value is defined only for tokens where Kind == Number. The input is parsed using strconv.ParseFloat. If ParseFloat signals an error, a decode error is added to the associated Parser.

func (*Token) AsInt

func (t *Token) AsInt() int

AsInt returns the token's value as an int. Its return value is defined only for tokens where Kind == Number. If the value is not an integer or does not fit in an int, then a decode error is added to the associated Parser. A decode error is not added for in-range integer values specified using floating point syntax (e.g. '1.5e1', which evaluates to 15). If the decode error satisfies IsNonIntegerDecodeError(err) or IsOutOfRangeDecodeError(err), then the returned value approximates the value of the literal as closely as possible. The function may therefore be used to parse floating point values as the nearest int value.

For more on decode errors see the following methods of Parser: DecodeError(), LastDecodeError(), DecodeErrors(), PopDecodeErrorIf().
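
The distinction between integers written in floating point syntax, non-integers, and out-of-range values can be illustrated with math/big. This is a sketch of the idea only, not the library's implementation; parseInt64Literal and the two sentinel errors are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// Hypothetical sentinel errors standing in for the two decode error classes.
var (
	errNotInteger = errors.New("not an integer")
	errOutOfRange = errors.New("out of int64 range")
)

// parseInt64Literal (hypothetical helper) parses a numeric literal exactly
// as a rational number, then checks that it is a whole number that fits in
// an int64. Literals such as "1.5e1" are accepted because they equal 15.
func parseInt64Literal(raw []byte) (int64, error) {
	r, ok := new(big.Rat).SetString(string(raw))
	if !ok {
		return 0, fmt.Errorf("invalid numeric literal %q", raw)
	}
	if !r.IsInt() {
		return 0, errNotInteger
	}
	if !r.Num().IsInt64() {
		return 0, errOutOfRange
	}
	return r.Num().Int64(), nil
}

func main() {
	n, err := parseInt64Literal([]byte("1.5e1"))
	fmt.Println(n, err) // prints "15 <nil>"
}
```

Unlike AsInt as described above, this sketch returns zero on failure instead of the nearest representable value.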

func (*Token) AsInt32

func (t *Token) AsInt32() int32

AsInt32 is like AsInt, but for int32.

func (*Token) AsInt64

func (t *Token) AsInt64() int64

AsInt64 is like AsInt, but for int64.

func (*Token) AsString

func (t *Token) AsString() string

AsString returns the token's value as a string. Its return value is defined only for tokens where Kind == String.

func (Token) Error added in v0.1.2

func (t Token) Error() string

func (*Token) KeyAsString

func (t *Token) KeyAsString() string

KeyAsString returns the token's associated object Key as a string.

func (Token) String

func (t Token) String() string

type TokenWithPath added in v0.2.0

type TokenWithPath struct {
	Token Token
	Path  Path
}

func (TokenWithPath) String added in v0.2.0

func (twp TokenWithPath) String() string
