pipeline

package module
v0.0.4
Published: May 22, 2024 License: MIT Imports: 10 Imported by: 1

README


Fork of script

This is a heavily restructured version of script, optimized for extending it with custom implementations. The original version of script can be found at: https://github.com/bitfield/script/.

Pipeline isn't intended to run by itself. It provides the core pipeline logic for (custom) implementations of script. An example implementation that closely replicates the original script API can be found at https://github.com/bartdeboer/script/.

import "github.com/bartdeboer/pipeline"

Magical gopher logo

What is script?

script is a Go library for doing the kind of tasks that shell scripts are good at: reading files, executing subprocesses, counting lines, matching strings, and so on.

Why shouldn't it be as easy to write system administration programs in Go as it is in a typical shell? script aims to make it just that easy.

Shell scripts often compose a sequence of operations on a stream of data (a pipeline). This is how script works, too.

This is one absolutely superb API design. Taking inspiration from shell pipes and turning it into a Go library with syntax this clean is really impressive.
Simon Willison

Read more: Scripting with Go

Quick start: Unix equivalents

If you're already familiar with shell scripting and the Unix toolset, here is a rough guide to the equivalent script operation for each listed Unix command.

Unix / shell | script equivalent
(any program name) | Exec
[ -f FILE ] | IfExists
> | WriteFile
>> | AppendFile
$* | Args
basename | Basename
cat | File / Concat
curl | Do / Get / Post
cut | Column
dirname | Dirname
echo | Echo
find | FindFiles
grep | Match / MatchRegexp
grep -v | Reject / RejectRegexp
head | First
jq | JQ
ls | ListFiles
sed | Replace / ReplaceRegexp
sha256sum | SHA256Sum / SHA256Sums
tail | Last
tee | Tee
uniq -c | Freq
wc -l | CountLines
xargs | ExecForEach

Some examples

Let's see some simple examples. Suppose you want to read the contents of a file as a string:

contents, err := script.File("test.txt").String()

That looks straightforward enough, but suppose you now want to count the lines in that file.

numLines, err := script.File("test.txt").CountLines()

For something a bit more challenging, let's try counting the number of lines in the file that match the string Error:

numErrors, err := script.File("test.txt").Match("Error").CountLines()

But what if, instead of reading a specific file, we want to simply pipe input into this program, and have it output only matching lines (like grep)?

script.Stdin().Match("Error").Stdout()

Just for fun, let's filter all the results through some arbitrary Go function:

script.Stdin().Match("Error").FilterLine(strings.ToUpper).Stdout()

That was almost too easy! So let's pass in a list of files on the command line, and have our program read them all in sequence and output the matching lines:

script.Args().Concat().Match("Error").Stdout()

Maybe we're only interested in the first 10 matches. No problem:

script.Args().Concat().Match("Error").First(10).Stdout()

What's that? You want to append that output to a file instead of printing it to the terminal? You've got some attitude, mister. But okay:

script.Args().Concat().Match("Error").First(10).AppendFile("/var/log/errors.txt")

And if we'd like to send the output to the terminal as well as to the file, we can do that:

script.Echo("data").Tee().AppendFile("data.txt")

We're not limited to getting data only from files or standard input. We can get it from HTTP requests too:

script.Get("https://wttr.in/London?format=3").Stdout()
// Output:
// London: 🌦   +13°C

That's great for simple GET requests, but suppose we want to send some data in the body of a POST request, for example. Here's how that works:

script.Echo(data).Post(URL).Stdout()

If we need to customise the HTTP behaviour in some way, such as using our own HTTP client, we can do that:

script.NewPipe().WithHTTPClient(&http.Client{
	Timeout: 10 * time.Second,
}).Get("https://example.com").Stdout()

Or maybe we need to set some custom header on the request. No problem. We can just create the request in the usual way, and set it up however we want. Then we pass it to Do, which will actually perform the request:

req, err := http.NewRequest(http.MethodGet, "http://example.com", nil)
req.Header.Add("Authorization", "Bearer "+token)
script.Do(req).Stdout()

The HTTP server could return some non-okay response, though; for example, “404 Not Found”. So what happens then?

In general, when any pipe stage (such as Do) encounters an error, it produces no output to subsequent stages. And script treats HTTP response status codes outside the range 200-299 as errors. So the answer for the previous example is that we just won't see any output from this program if the server returns an error response.

Instead, the pipe “remembers” any error that occurs, and we can retrieve it later by calling its Error method, or by using a sink method such as String, which returns an error value along with the result.
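
For example, we can split the pipeline up and inspect its error status explicitly. Here's a minimal sketch (the URL is just a placeholder):

p := script.Get("https://example.com/does-not-exist")
output, err := p.String()
if err != nil {
	fmt.Println("pipe error:", p.Error()) // the same error returned by String
}
fmt.Print(output)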

Stdout also returns an error, plus the number of bytes successfully written (which we don't care about for this particular case). So we can check that error, which is always a good idea in Go:

_, err := script.Do(req).Stdout()
if err != nil {
	log.Fatal(err)
}

If, as is common, the data we get from an HTTP request is in JSON format, we can use JQ queries to interrogate it:

data, err := script.Do(req).JQ(".[0] | {message: .commit.message, name: .commit.committer.name}").String()

We can also run external programs and get their output:

script.Exec("ping 127.0.0.1").Stdout()

Note that Exec runs the command concurrently: it doesn't wait for the command to complete before returning any output. That's good, because this ping command will run forever (or until we get bored).

Instead, when we read from the pipe using Stdout, we see each line of output as it's produced:

PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.056 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.054 ms
...

In the ping example, we knew the exact arguments we wanted to send the command, and we just needed to run it once. But what if we don't know the arguments yet? We might get them from the user, for example.

We might like to be able to run the external command repeatedly, each time passing it the next line of data from the pipe as an argument. No worries:

script.Args().ExecForEach("ping -c 1 {{.}}").Stdout()

That {{.}} is standard Go template syntax; it'll substitute each line of data from the pipe into the command line before it's executed. You can write as fancy a Go template expression as you want here (but this simple example probably covers most use cases).
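
For instance, here's a purely illustrative variation that uses the template's built-in printf function to quote each line before running the command:

script.Args().ExecForEach("echo {{printf \"%q\" .}}").Stdout()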

If there isn't a built-in operation that does what we want, we can just write our own, using Filter:

script.Echo("hello world").Filter(func (r io.Reader, w io.Writer) error {
	n, err := io.Copy(w, r)
	fmt.Fprintf(w, "\nfiltered %d bytes\n", n)
	return err
}).Stdout()
// Output:
// hello world
// filtered 11 bytes

The func we supply to Filter takes just two parameters: a reader to read from, and a writer to write to. The reader reads the output of the previous stage of the pipe, as you might expect, and anything written to the writer goes to the next stage of the pipe.

If our func returns some error, then, just as with the Do example, the pipe's error status is set, and subsequent stages become a no-op.

Filters run concurrently, so the pipeline can start producing output before the input has been fully read, as it did in the ping example. In fact, most built-in pipe methods, including Exec, are implemented using Filter.

If we want to scan input line by line, we could do that with a Filter function that creates a bufio.Scanner on its input, but we don't need to:

script.Echo("a\nb\nc").FilterScan(func(line string, w io.Writer) {
	fmt.Fprintf(w, "scanned line: %q\n", line)
}).Stdout()
// Output:
// scanned line: "a"
// scanned line: "b"
// scanned line: "c"

And there's more. Much more. Read the docs for full details, and more examples.

A realistic use case

Let's use script to write a program that system administrators might actually need. One thing I often find myself doing is counting the most frequent visitors to a website over a given period of time. Given an Apache log in the Common Log Format like this:

212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/" "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"

we would like to extract the visitor's IP address (the first column in the logfile), and count the number of times this IP address occurs in the file. Finally, we might like to list the top 10 visitors by frequency. In a shell script we might do something like:

cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head

There's a lot going on there, and it's pleasing to find that the equivalent script program is quite brief:

package main

import (
	"github.com/bitfield/script"
)

func main() {
	script.Stdin().Column(1).Freq().First(10).Stdout()
}

Let's try it out with some sample data:

16 176.182.2.191
 7 212.205.21.11
 1 190.253.121.1
 1 90.53.111.17

Documentation

See pkg.go.dev for the full documentation, or read on for a summary.

Sources

These are functions that create a pipe with the given contents:

Source | Contents
Args | command-line arguments
Do | HTTP response
Echo | a string
Exec | command output
File | file contents
FindFiles | recursive file listing
Get | HTTP response
IfExists | do something only if some file exists
ListFiles | file listing (including wildcards)
Post | HTTP response
Slice | slice elements, one per line
Stdin | standard input

Filters

Filters are methods on an existing pipe that also return a pipe, allowing you to chain filters indefinitely. Each filter transforms the contents of the pipe as described below:

Filter | Results
Basename | removes leading path components from each line, leaving only the filename
Column | Nth column of input
Concat | contents of multiple files
Dirname | removes filename from each line, leaving only leading path components
Do | response to supplied HTTP request
Echo | all input replaced by given string
Exec | filtered through external command
ExecForEach | execute given command template for each line of input
Filter | user-supplied function filtering a reader to a writer
FilterLine | user-supplied function filtering each line to a string
FilterScan | user-supplied function filtering each line to a writer
First | first N lines of input
Freq | frequency count of unique input lines, most frequent first
Get | response to HTTP GET on supplied URL
Join | replace all newlines with spaces
JQ | result of jq query
Last | last N lines of input
Match | lines matching given string
MatchRegexp | lines matching given regexp
Post | response to HTTP POST on supplied URL
Reject | lines not matching given string
RejectRegexp | lines not matching given regexp
Replace | matching text replaced with given string
ReplaceRegexp | matching text replaced with given string
SHA256Sums | SHA-256 hashes of each listed file
Tee | input copied to supplied writers

Note that filters run concurrently, rather than producing nothing until each stage has fully read its input. This is convenient for executing long-running commands, for example. If you do need to wait for the pipeline to complete, call Wait.
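
For example, a pipeline ending in a concurrent stage such as ExecForEach may still be running when the last method call returns, so we can block until it finishes (a minimal sketch, assuming some matching files exist):

script.ListFiles("*.log").ExecForEach("gzip {{.}}").Wait()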

Sinks

Sinks are methods that return some data from a pipe, ending the pipeline and extracting its full contents in a specified way:

Sink | Destination | Results
AppendFile | appended to file, creating it if it doesn't exist | bytes written, error
Bytes | | data as []byte, error
CountLines | | number of lines, error
Read | given []byte | bytes read, error
SHA256Sum | | SHA-256 hash, error
Slice | | data as []string, error
Stdout | standard output | bytes written, error
String | | data as string, error
Wait | | none
WriteFile | specified file, truncating if it exists | bytes written, error
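
For example, Slice collects the pipe's output into a []string, one element per line (a minimal sketch):

goFiles, err := script.ListFiles("*.go").Slice()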

What's new

Version | New
v0.22.0 | Tee, WithStderr
v0.21.0 | HTTP support: Do, Get, Post
v0.20.0 | JQ

Contributing

See the contributor's guide for some helpful tips if you'd like to contribute to the script project.

Gopher image by MariaLetta

Documentation

Overview

Package script aims to make it easy to write shell-type scripts in Go, for general system administration purposes: reading files, counting lines, matching strings, and so on.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BaseProgram added in v0.0.3

type BaseProgram struct {
	Stdin  io.Reader
	Stdout io.Writer
	Stderr io.Writer

	StartFn func() error
	// contains filtered or unexported fields
}

BaseProgram is a convenience type for creating new programs that only need a custom StartFn function.
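
As a rough sketch (not from the package itself), a custom program might be built on BaseProgram like this, assuming Start invokes StartFn and that the pipeline wires up Stdin and Stdout before starting the program (io and bytes are from the standard library):

func upperProgram() *pipeline.BaseProgram {
	p := pipeline.NewBaseProgram()
	p.StartFn = func() error {
		// Assumes Stdin and Stdout have been set by the pipeline.
		data, err := io.ReadAll(p.Stdin)
		if err != nil {
			return err
		}
		_, err = p.Stdout.Write(bytes.ToUpper(data))
		return err
	}
	return p
}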

func NewBaseProgram added in v0.0.3

func NewBaseProgram() *BaseProgram

func (*BaseProgram) Error added in v0.0.3

func (b *BaseProgram) Error() error

func (*BaseProgram) Exit added in v0.0.3

func (b *BaseProgram) Exit(err error) error

func (*BaseProgram) Fprint added in v0.0.3

func (b *BaseProgram) Fprint(a ...any) error

func (*BaseProgram) FprintStderr added in v0.0.3

func (b *BaseProgram) FprintStderr(a ...any)

func (*BaseProgram) SetError added in v0.0.3

func (b *BaseProgram) SetError(err error) error

func (*BaseProgram) SetStderr added in v0.0.3

func (b *BaseProgram) SetStderr(stderr io.Writer)

func (*BaseProgram) SetStdin added in v0.0.3

func (b *BaseProgram) SetStdin(stdin io.Reader)

func (*BaseProgram) SetStdout added in v0.0.3

func (b *BaseProgram) SetStdout(stdout io.Writer)

func (*BaseProgram) Start added in v0.0.3

func (b *BaseProgram) Start() error

type ExitError

type ExitError struct {
	Code    int
	Message string
}

func (*ExitError) Error

func (e *ExitError) Error() string

func (*ExitError) ExitCode

func (e *ExitError) ExitCode() int

func (*ExitError) Exited

func (e *ExitError) Exited() bool

func (*ExitError) String

func (e *ExitError) String() string

type Pipe

type Pipe struct {
	// contains filtered or unexported fields
}

Pipe provides a pipe that streams out what was streamed into it.

func NewPipe

func NewPipe() *Pipe

NewPipe initializes a new Pipe.

func NewReadOnlyPipe

func NewReadOnlyPipe(reader io.Reader) *Pipe

NewReadOnlyPipe initializes a read-only pipe with the provided stream.

func (*Pipe) Close

func (p *Pipe) Close() error

Close closes the pipe output, useful for signaling no more writes.

func (*Pipe) IsClosed

func (p *Pipe) IsClosed() bool

IsClosed reports whether the pipe is closed.

func (*Pipe) Read

func (p *Pipe) Read(b []byte) (int, error)

Read implements the io.Reader interface.

func (*Pipe) Write

func (p *Pipe) Write(b []byte) (int, error)

Write implements the io.Writer interface.
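
A minimal usage sketch, assuming Close marks end-of-stream so that readers eventually see io.EOF:

p := pipeline.NewPipe()
go func() {
	fmt.Fprintln(p, "hello") // write via the io.Writer interface
	p.Close()                // signal that no more writes will follow
}()
b, _ := io.ReadAll(p) // read until EOF via the io.Reader interface
fmt.Print(string(b))  // hello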

type Pipeline

type Pipeline struct {
	Stdin  io.Reader
	Stdout io.Writer
	Stderr io.Writer
	// contains filtered or unexported fields
}

Pipeline represents a pipeline object with an associated [ReadAutoCloser].

func NewPipeline

func NewPipeline() *Pipeline

NewPipeline creates a new pipeline with an empty reader (use [Pipeline.WithReader] to attach another reader to it).

func (*Pipeline) Add

func (p *Pipeline) Add(programs ...Program) *Pipeline

Add adds one or more programs to the pipeline.

func (*Pipeline) Bytes

func (p *Pipeline) Bytes() ([]byte, error)

Bytes returns the contents of the pipe as a []byte, or an error.

func (*Pipeline) Close

func (p *Pipeline) Close() error

Close closes the pipe's associated reader. This is a no-op if the reader is not an io.Closer.

func (*Pipeline) Error

func (p *Pipeline) Error() error

Error returns any error present on the pipe, or nil otherwise.

func (*Pipeline) ExitStatus

func (p *Pipeline) ExitStatus() int

ExitStatus returns the integer exit status of a previous command (for example run by [Pipe.Exec]). This will be zero unless the pipe's error status is set and the error matches the pattern “exit status %d”.

func (*Pipeline) Int

func (p *Pipeline) Int() (int, error)

Int returns the pipe's contents as an int, together with any error.

func (*Pipeline) Int64

func (p *Pipeline) Int64() (int64, error)

Int64 returns the pipe's contents as an int64, together with any error.

func (*Pipeline) IsClosed

func (p *Pipeline) IsClosed() bool

IsClosed reports whether the pipeline's last reader is closed.

func (*Pipeline) Pipe

func (p *Pipeline) Pipe(program Program) *Pipeline

Pipe (Filter) adds a new program to the pipeline, connecting the output stream of the previous program to the input stream of the new program. The pipeline then reads the new program's output for further processing.

Pipe runs concurrently, so its goroutine will not exit until the pipe has been fully read. Use [Pipeline.Wait] to wait for all concurrent pipes to complete.
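
A hedged sketch, assuming Pipe returns the same *Pipeline so calls can be chained, and that a sink such as String then reads the last program's output:

out, err := pipeline.NewPipeline().
	WithReader(strings.NewReader("a\nb\n")).
	Pipe(pipeline.Scanner(func(line string, w io.Writer) {
		fmt.Fprintln(w, strings.ToUpper(line))
	})).
	String()
// out == "A\nB\n" (if the assumptions above hold)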

func (*Pipeline) Read

func (p *Pipeline) Read(b []byte) (int, error)

Read reads up to len(b) bytes from the pipe into b. It returns the number of bytes read and any error encountered. At end of file, or on a nil pipe, Read returns 0, io.EOF.

func (*Pipeline) Run

func (p *Pipeline) Run(programs ...Program) (int64, error)

Run adds the given programs (if any) to the pipeline and then runs the pipeline with all programs added to it.
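
A hedged sketch, assuming Run starts every added program, streams the final program's output to the pipeline's standard output (os.Stdout by default), and returns the number of bytes written (pipeline.Scanner is the package-level helper documented below):

p := pipeline.NewPipeline().
	WithReader(strings.NewReader("error: one\nok: two\nerror: three\n"))
n, err := p.Run(pipeline.Scanner(func(line string, w io.Writer) {
	if strings.HasPrefix(line, "error:") {
		fmt.Fprintln(w, line)
	}
}))
if err != nil {
	log.Fatal(err)
}
fmt.Println(n, "bytes written")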

func (*Pipeline) Scanner

func (p *Pipeline) Scanner(filter func(string, io.Writer)) *Pipeline

Scanner (FilterScan) sends the contents of the pipe to the function filter, a line at a time, and produces the result. filter takes each line as a string and an io.Writer to write its output to. See [Pipe.Filter] for concurrency handling.

func (*Pipeline) SetCombinedOutput added in v0.0.3

func (p *Pipeline) SetCombinedOutput(v bool) *Pipeline

SetCombinedOutput configures the pipeline to combine stderr with stdout.

func (*Pipeline) SetError

func (p *Pipeline) SetError(err error) *Pipeline

SetError sets the error err on the pipe.

func (*Pipeline) SetExitOnError

func (p *Pipeline) SetExitOnError(v bool) *Pipeline

SetExitOnError configures the pipeline to exit when an error has occurred.

func (*Pipeline) Slice

func (p *Pipeline) Slice() ([]string, error)

Slice returns the pipe's contents as a slice of strings, one element per line, or an error.

An empty pipe will produce an empty slice. A pipe containing a single empty line (that is, a single \n character) will produce a slice containing the empty string as its single element.

func (*Pipeline) String

func (p *Pipeline) String() (string, error)

String returns the pipe's contents as a string, together with any error.

func (*Pipeline) Wait

func (p *Pipeline) Wait() *Pipeline

Wait reads the pipe to completion and discards the result. This is mostly useful for waiting until concurrent filters have completed (see [Pipe.Filter]).

func (*Pipeline) WithError

func (p *Pipeline) WithError(err error) *Pipeline

WithError sets the error err on the pipe.

func (*Pipeline) WithReader

func (p *Pipeline) WithReader(r io.Reader) *Pipeline

WithReader sets the pipe's input reader to r. Once r has been completely read, it will be closed if necessary.

func (*Pipeline) WithStderr

func (p *Pipeline) WithStderr(w io.Writer) *Pipeline

WithStderr redirects the standard error output for commands run via [Pipe.Exec] or [Pipe.ExecForEach] to the writer w, instead of going to the pipe as it normally would.

func (*Pipeline) WithStdout

func (p *Pipeline) WithStdout(w io.Writer) *Pipeline

WithStdout sets the pipe's standard output to the writer w, instead of the default os.Stdout.

type Program added in v0.0.3

type Program interface {
	Start() error
	SetError(err error) error
	Error() error
	Exit(err error) error
	SetStdin(stdin io.Reader)
	SetStdout(stdout io.Writer)
	SetStderr(stderr io.Writer)
}

func Scanner

func Scanner(filter func(string, io.Writer)) Program

Scanner returns a program that applies the specified filter function to each line of its input.
