gobls

package module
v1.3.1
Warning: This package is not in the latest version of its module.
Published: Aug 12, 2019 License: MIT Imports: 3 Imported by: 4

README

gobls

Gobls is a buffered line scanner for Go.

Description

Gobls is similar to bufio.Scanner, but it wraps bufio.Reader.ReadLine so that lines of arbitrary length can be scanned. It uses a hybrid approach: in most cases, when lines are not unusually long, the fast code path is taken. When a line is unusually long, the scanner uses its per-scanner pre-allocated byte slice to reassemble the fragments into a single slice of bytes.

Example

Enumerating lines from an io.Reader (drop-in replacement for bufio.Scanner)

When you have an io.Reader whose lines you want to enumerate, you would normally wrap it in a bufio.Scanner. This library is a drop-in replacement for that particular circumstance: change bufio.NewScanner(r) to gobls.NewScanner(r), and you no longer have to worry about "token too long" errors.

    var lines, characters int
    ls := gobls.NewScanner(os.Stdin)
    for ls.Scan() {
        lines++
        characters += len(ls.Bytes())
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", characters, "characters.")

Enumerating lines from []byte

If you already have a slice of bytes whose lines you want to enumerate, it is much more performant to wrap that byte slice with gobls.NewBufferScanner(buf) than to wrap the slice in an io.Reader and pass it to either gobls.NewScanner or bufio.NewScanner.

    var lines, characters int
    ls := gobls.NewBufferScanner(buf)
    for ls.Scan() {
        lines++
        characters += len(ls.Bytes())
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", characters, "characters.")

Performance

On my test system, gobls scanner takes from 2% to nearly 40% longer than bufio scanner, depending on the length of the lines to be scanned. The 40% longer times were only observed when line lengths were bufio.MaxScanTokenSize bytes long. Usually the performance penalty is 2% to 15% of bufio measurements.

Run go test -bench=. on your system for comparison. I'm sure the testing method could be improved. Suggestions are welcomed.

When there is no concern about enumerating lines longer than the maximum token length of bufio, I recommend using the standard library. However, if you already have a slice of bytes, this library is much more performant than the equivalent bufio.NewScanner(bytes.NewReader(buf)).

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/karrick/gobls
BenchmarkScannerAverage/bufio-12    10000000    198  ns/op  0  B/op  0  allocs/op
BenchmarkScannerAverage/reader-12   10000000    199  ns/op  0  B/op  0  allocs/op
BenchmarkScannerAverage/buffer-12   10000000    122  ns/op  0  B/op  0  allocs/op
BenchmarkScannerShort/bufio-12      30000000   45.4  ns/op  0  B/op  0  allocs/op
BenchmarkScannerShort/reader-12     30000000   56.5  ns/op  0  B/op  0  allocs/op
BenchmarkScannerShort/buffer-12     50000000   37.7  ns/op  0  B/op  0  allocs/op
BenchmarkScannerLong/bufio-12        2000000    614  ns/op  0  B/op  0  allocs/op
BenchmarkScannerLong/reader-12       2000000    628  ns/op  0  B/op  0  allocs/op
BenchmarkScannerLong/buffer-12       5000000    379  ns/op  0  B/op  0  allocs/op
BenchmarkScannerVeryLong/bufio-12     200000   9616  ns/op  0  B/op  0  allocs/op
BenchmarkScannerVeryLong/reader-12    100000  13177  ns/op  2  B/op  0  allocs/op
BenchmarkScannerVeryLong/buffer-12    200000   6163  ns/op  0  B/op  0  allocs/op
PASS
ok      github.com/karrick/gobls        159.336s

Documentation

Constants

const DefaultBufferSize = 16 * 1024

DefaultBufferSize specifies the initial size, in bytes, of the buffer that each gobls scanner allocates for aggregating line fragments.

Types

type BufferScanner added in v1.3.0

type BufferScanner struct {
	// contains filtered or unexported fields
}

BufferScanner enumerates newline-terminated strings from a provided slice of bytes faster than bufio.Scanner and gobls.Scanner. This is particularly useful when a program already has the entire buffer in a slice of bytes. This structure uses newline as the line terminator, but returns neither the newline nor an optional carriage return from each discovered string.
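
The line-splitting rule described above (newline terminates a line, and neither the newline nor an optional preceding carriage return is returned) can be sketched as follows. This is not the BufferScanner implementation, only one plausible way to apply the same rule to a byte slice with bytes.IndexByte; nextLine and splitLines are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextLine returns the first line of buf without its "\n" or "\r\n"
// terminator, the unscanned remainder, and whether a line was found.
// Hypothetical helper for illustration; not part of gobls.
func nextLine(buf []byte) (line, rest []byte, ok bool) {
	if len(buf) == 0 {
		return nil, nil, false
	}
	i := bytes.IndexByte(buf, '\n')
	if i < 0 {
		// Final line without a terminating newline.
		line, rest = buf, nil
	} else {
		line, rest = buf[:i], buf[i+1:]
	}
	// Drop an optional carriage return at the end of the line.
	if n := len(line); n > 0 && line[n-1] == '\r' {
		line = line[:n-1]
	}
	return line, rest, true
}

// splitLines collects every line of buf as a string.
func splitLines(buf []byte) []string {
	var out []string
	for {
		line, rest, ok := nextLine(buf)
		if !ok {
			return out
		}
		out = append(out, string(line))
		buf = rest
	}
}

func main() {
	for _, l := range splitLines([]byte("alpha\r\nbeta\n\ngamma")) {
		fmt.Printf("%q\n", l)
	}
}
```

Because each line is a sub-slice of the original buffer, no copying or reading through an io.Reader is involved, which is consistent with the benchmark advantage reported for the buffer variant.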

func (*BufferScanner) Bytes added in v1.3.0

func (b *BufferScanner) Bytes() []byte

Bytes returns the byte slice that was just scanned. It does not return the terminating newline character, nor any optional preceding carriage return character.

func (*BufferScanner) Err added in v1.3.0

func (b *BufferScanner) Err() error

Err returns nil because scanning from a slice of bytes will never cause an error.

func (*BufferScanner) Scan added in v1.3.0

func (b *BufferScanner) Scan() bool

Scan scans the next line from the original slice of bytes. It returns true if scanning ought to continue, or false when scanning is complete because the end of the slice of bytes has been reached.

func (*BufferScanner) Text added in v1.3.0

func (b *BufferScanner) Text() string

Text returns the string representation of the byte slice returned by the most recent Scan call. It does not return the terminating newline character, nor any optional preceding carriage return character.

type Scanner

type Scanner interface {
	Bytes() []byte
	Err() error
	Scan() bool
	Text() string
}

Scanner provides an interface for reading newline-delimited lines of text. It is similar to bufio.Scanner, but wraps the ReadLine method of bufio.Reader so lines of arbitrary length can be scanned. Successive calls to the Scan method will step through the lines of a file, skipping the newline whitespace between lines.

Scanning stops unrecoverably at EOF, or at the first I/O error. Unlike bufio.Scanner, however, attempting to scan a line longer than bufio.MaxScanTokenSize will not result in an error, but will return the long line.

Also like bufio.Scanner, it is not necessary to check for errors by calling the Err method until after scanning stops, when the Scan method returns false.

This Scanner ought to behave exactly like bufio.Scanner. All methods ought to have the exact same return values while stepping through the provided io.Reader.

func NewBufferScanner added in v1.3.0

func NewBufferScanner(buf []byte) Scanner

NewBufferScanner returns a BufferScanner that enumerates newline terminated strings from buf.

func NewScanner

func NewScanner(r io.Reader) Scanner

NewScanner returns a scanner that reads from the specified io.Reader. It allocates a scanning buffer with the default buffer size. This per-scanner buffer will grow to accommodate extremely long lines.

    var lines, characters int
    ls := gobls.NewScanner(os.Stdin)
    for ls.Scan() {
        lines++
        characters += len(ls.Bytes())
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", characters, "characters.")
