Documentation ¶
Overview ¶
Package diff compares two sorted data streams and identifies the differences between them. It operates efficiently on pre-sorted input channels, passes each item that exists in only one stream to a callback, and counts the items that exist in both streams.
The package supports three main use cases:
- Generic comparison using Generic() with custom comparison functions
- Ordered type comparison using Ordered() with built-in comparison operators
- String comparison using Strings() for backward compatibility
All diff operations assume that input channels provide data in sorted order. This assumption is not validated for performance reasons.
Example usage:
func printDifferences(a, b <-chan string, aErr, bErr <-chan error) {
	result, err := diff.Strings(context.Background(), a, b, aErr, bErr,
		func(d diff.Delta, s string) error {
			fmt.Printf("%s %s\n", d, s)
			return nil
		})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Compared %d items\n", result.TotalA+result.TotalB-result.Common)
}
The package is designed to work efficiently with the extsort library's sorted output streams for comparing large datasets.
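For small in-memory inputs, a sorted channel can also be produced without extsort. The following is a minimal sketch, assuming the data fits in memory; sortedChan is a hypothetical helper and not part of package diff. It sorts a copy of the input so that the sorted-order precondition described above holds before any item is sent.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// sortedChan is a hypothetical helper (not part of package diff). It sorts a
// copy of items in ascending order and streams the result through a channel
// that is closed once every item has been sent.
func sortedChan[T cmp.Ordered](items []T) <-chan T {
	sorted := slices.Clone(items)
	slices.Sort(sorted)

	out := make(chan T)
	go func() {
		defer close(out)
		for _, v := range sorted {
			out <- v
		}
	}()
	return out
}

func main() {
	// The consumer sees the items in ascending order regardless of input order.
	for v := range sortedChan([]string{"pear", "apple", "fig"}) {
		fmt.Println(v)
	}
}
```

Note that the sending goroutine in this sketch blocks until every item has been received, so the consumer should drain the channel.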
Index ¶
- Constants
- func PrintDiff[T any](d Delta, s T) error
- type CompareFunc
- type Delta
- type Result
- func Generic[T any](ctx context.Context, aChan, bChan <-chan T, aErrChan, bErrChan <-chan error, ...) (r Result, err error)
- func Ordered[T cmp.Ordered](ctx context.Context, aChan, bChan <-chan T, aErrChan, bErrChan <-chan error, ...) (r Result, err error)
- func Strings(ctx context.Context, aChan, bChan <-chan string, ...) (r Result, err error)
- type ResultFunc
- type StringChanResult
- type StringResultFunc
Constants ¶
const (
	// NEW indicates an item that exists only in the second stream (B).
	// This represents a "new" or "added" item when comparing A to B.
	NEW = iota // +

	// OLD indicates an item that exists only in the first stream (A).
	// This represents an "old" or "removed" item when comparing A to B.
	OLD // -
)
Functions ¶
Types ¶
type CompareFunc ¶ added in v1.2.0
CompareFunc defines a comparison function for ordering items of type T. It should return a negative value if the first argument is less than the second, zero if they are equal, and a positive value if the first is greater than the second. This follows the same convention as strings.Compare and cmp.Compare.
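As an illustration of this convention, the sketch below orders a struct by two fields. The declaration of CompareFunc is not shown in this documentation, so the signature used here is an assumption, and the user type and compareUsers function are hypothetical names introduced only for the example.

```go
package main

import (
	"cmp"
	"fmt"
)

// CompareFunc mirrors the convention described above. The exact declaration
// in package diff is not shown here, so this signature is an assumption.
type CompareFunc[T any] func(a, b T) int

// user is a hypothetical record type used only for this illustration.
type user struct {
	Name string
	ID   int
}

// compareUsers orders users by Name and falls back to ID to break ties.
// It returns a negative value, zero, or a positive value, like cmp.Compare.
func compareUsers(a, b user) int {
	if c := cmp.Compare(a.Name, b.Name); c != 0 {
		return c
	}
	return cmp.Compare(a.ID, b.ID)
}

func main() {
	var f CompareFunc[user] = compareUsers
	ann := user{Name: "ann", ID: 1}
	bob := user{Name: "bob", ID: 1}
	fmt.Println(f(ann, bob) < 0)  // ann sorts before bob
	fmt.Println(f(ann, ann) == 0) // identical values compare equal
}
```

Both input streams would need to be sorted with the same function that is later used for the comparison.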
type Delta ¶
type Delta int
Delta represents the type of difference found when comparing two sorted streams. It indicates whether an item is unique to the first stream (OLD) or second stream (NEW).
type Result ¶
type Result struct {
	// ExtraA is the count of items that exist only in stream A (OLD items)
	ExtraA uint64
	// ExtraB is the count of items that exist only in stream B (NEW items)
	ExtraB uint64
	// TotalA is the total count of items processed from stream A
	TotalA uint64
	// TotalB is the total count of items processed from stream B
	TotalB uint64
	// Common is the count of items that exist in both streams
	Common uint64
}
Result contains statistical information about the differences between two sorted streams. It provides counts of items that are unique to each stream as well as common items.
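The counts can be combined into summary figures. The sketch below uses a local copy of the struct so that it compiles on its own; distinct and consistent are hypothetical helpers, and the relationships they check are assumptions that should hold only when neither stream contains duplicate items and the comparison ran to completion.

```go
package main

import "fmt"

// Result is a local copy of the documented struct.
type Result struct {
	ExtraA, ExtraB, TotalA, TotalB, Common uint64
}

// distinct returns the number of distinct items seen across both streams.
// Assumption: no duplicates within a stream and a completed comparison.
func distinct(r Result) uint64 {
	return r.TotalA + r.TotalB - r.Common
}

// consistent reports whether each total equals its stream's unique items
// plus the common items, under the same assumption.
func consistent(r Result) bool {
	return r.TotalA == r.ExtraA+r.Common && r.TotalB == r.ExtraB+r.Common
}

func main() {
	r := Result{ExtraA: 1, ExtraB: 2, TotalA: 4, TotalB: 5, Common: 3}
	fmt.Println(distinct(r), consistent(r))
}
```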
func Generic ¶ added in v1.2.0
func Generic[T any](ctx context.Context, aChan, bChan <-chan T, aErrChan, bErrChan <-chan error, compareFunc CompareFunc[T], resultFunc ResultFunc[T]) (r Result, err error)
Generic performs a diff operation on two sorted channels of any type T. It compares items from both channels using the provided comparison function and calls resultFunc for each item that exists in only one channel (differences).
Parameters:
- ctx: Context for cancellation and timeout control
- aChan, bChan: Sorted channels to compare (MUST be pre-sorted)
- aErrChan, bErrChan: Error channels corresponding to each data channel
- compareFunc: Function that returns <0, 0, or >0 for ordering comparison
- resultFunc: Callback function called for each difference found
Returns statistical information about the comparison and any errors encountered. The function assumes both input channels provide items in sorted order according to the comparison function. This assumption is not validated for performance reasons.
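The comparison described here can be pictured as a merge-style walk over both inputs. The sketch below is an illustration under stated assumptions, not the package's implementation: mergeDiff and feed are hypothetical names, the types are local mirrors of the documented ones (with the constants typed as Delta for convenience), and the context and error channels are left out for brevity.

```go
package main

import (
	"cmp"
	"fmt"
)

type Delta int

const (
	NEW Delta = iota // only in B
	OLD              // only in A
)

type Result struct {
	ExtraA, ExtraB, TotalA, TotalB, Common uint64
}

// mergeDiff walks two sorted channels in step and reports items found in
// only one of them. Hypothetical sketch: no context, no error channels.
func mergeDiff[T any](a, b <-chan T, compare func(x, y T) int, report func(Delta, T) error) (Result, error) {
	var r Result
	av, aOK := <-a
	bv, bOK := <-b
	for aOK || bOK {
		switch {
		case !bOK || (aOK && compare(av, bv) < 0):
			// av sorts before everything left in B, so it is only in A.
			r.ExtraA++
			r.TotalA++
			if err := report(OLD, av); err != nil {
				return r, err
			}
			av, aOK = <-a
		case !aOK || compare(av, bv) > 0:
			// bv sorts before everything left in A, so it is only in B.
			r.ExtraB++
			r.TotalB++
			if err := report(NEW, bv); err != nil {
				return r, err
			}
			bv, bOK = <-b
		default:
			// Equal items are common to both streams; advance both sides.
			r.Common++
			r.TotalA++
			r.TotalB++
			av, aOK = <-a
			bv, bOK = <-b
		}
	}
	return r, nil
}

// feed returns a closed, buffered channel holding items in the given order.
func feed[T any](items ...T) <-chan T {
	ch := make(chan T, len(items))
	for _, v := range items {
		ch <- v
	}
	close(ch)
	return ch
}

func main() {
	r, err := mergeDiff(feed(1, 2, 4), feed(2, 3, 4, 5), cmp.Compare[int],
		func(d Delta, v int) error {
			fmt.Println(d, v)
			return nil
		})
	fmt.Printf("%+v %v\n", r, err)
}
```

Because each step only looks at the current head of each channel, an input that is not sorted would make a walk like this report matching items as differences, which is why the sorted-order requirement matters.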
func Ordered ¶ added in v1.2.0
func Ordered[T cmp.Ordered](ctx context.Context, aChan, bChan <-chan T, aErrChan, bErrChan <-chan error, resultFunc ResultFunc[T]) (r Result, err error)
Ordered performs a diff operation on two sorted channels of cmp.Ordered types. It uses the standard comparison operators (<, ==, >) for ordering, making it convenient for built-in types like numbers and strings. This is a wrapper around Generic that provides the comparison function automatically.
Parameters:
- ctx: Context for cancellation and timeout control
- aChan, bChan: Sorted channels to compare (MUST be pre-sorted)
- aErrChan, bErrChan: Error channels corresponding to each data channel
- resultFunc: Callback function called for each difference found
Returns statistical information about the comparison and any errors encountered. The channels must provide items in ascending sorted order.
func Strings ¶
func Strings(ctx context.Context, aChan, bChan <-chan string, aErrChan, bErrChan <-chan error, resultFunc StringResultFunc) (r Result, err error)
Strings performs a diff operation on two sorted string channels. This is a convenience function that uses lexicographic string comparison. It's equivalent to calling Ordered[string] but uses the StringResultFunc type for backward compatibility.
Parameters:
- ctx: Context for cancellation and timeout control
- aChan, bChan: Sorted string channels to compare (MUST be pre-sorted)
- aErrChan, bErrChan: Error channels corresponding to each string channel
- resultFunc: Callback function called for each string difference found
Returns statistical information about the comparison and any errors encountered.
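Lexicographic comparison of Go strings is byte-wise, so uppercase ASCII letters sort before lowercase ones. The sketch below contrasts that order with a case-insensitive one; bytewiseSorted and foldedSorted are hypothetical helpers. A stream sorted case-insensitively would presumably not satisfy the sorted-order requirement of a byte-wise comparison.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// bytewiseSorted returns a copy of words in the order produced by the
// built-in < operator on strings, which compares byte by byte.
func bytewiseSorted(words []string) []string {
	out := slices.Clone(words)
	slices.Sort(out)
	return out
}

// foldedSorted returns a copy of words sorted without regard to ASCII case.
// This order differs from the byte-wise one whenever case is mixed.
func foldedSorted(words []string) []string {
	out := slices.Clone(words)
	slices.SortFunc(out, func(a, b string) int {
		return strings.Compare(strings.ToLower(a), strings.ToLower(b))
	})
	return out
}

func main() {
	words := []string{"banana", "Cherry", "apple"}
	fmt.Println(bytewiseSorted(words)) // [Cherry apple banana]
	fmt.Println(foldedSorted(words))   // [apple banana Cherry]
}
```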
type ResultFunc ¶ added in v1.2.0
ResultFunc is a generic callback function type for processing diff results. It is called once for each item that appears in only one of the two streams. The Delta parameter indicates which stream the item belongs to (NEW or OLD). The T parameter contains the actual item value. If the function returns an error, the diff operation will be terminated.
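A callback of this shape might collect differences and stop the diff early by returning an error. The sketch below is self-contained: the ResultFunc declaration is inferred from the callback used in the overview example, and collector, record, and errTooManyDiffs are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

type Delta int

const (
	NEW Delta = iota // only in B
	OLD              // only in A
)

// ResultFunc mirrors the callback shape used in the overview example.
// The exact declaration in package diff is an assumption.
type ResultFunc[T any] func(d Delta, item T) error

var errTooManyDiffs = errors.New("too many differences")

// collector gathers differences and gives up after limit items.
type collector[T any] struct {
	added, removed []T
	limit          int
}

// record stores one difference, or returns an error once the limit is
// reached so that the diff operation would be terminated.
func (c *collector[T]) record(d Delta, item T) error {
	if len(c.added)+len(c.removed) >= c.limit {
		return errTooManyDiffs
	}
	switch d {
	case NEW:
		c.added = append(c.added, item)
	case OLD:
		c.removed = append(c.removed, item)
	}
	return nil
}

func main() {
	c := &collector[string]{limit: 2}
	var f ResultFunc[string] = c.record
	fmt.Println(f(NEW, "x"), f(OLD, "y"), f(NEW, "z"))
	fmt.Println(c.added, c.removed)
}
```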
type StringChanResult ¶
type StringChanResult struct {
	// D indicates whether the string is NEW (only in stream B) or OLD (only in stream A)
	D Delta
	// S contains the actual string value that differs between streams
	S string
}
StringChanResult holds a single diff result from a string comparison. It contains both the difference type (NEW/OLD) and the actual string value. This type is used with StringResultChan to enable parallel processing of diff results.
type StringResultFunc ¶
type StringResultFunc ResultFunc[string]
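StringResultFunc is a defined type whose underlying type is ResultFunc[string]. Because it is declared without an equals sign, it is a distinct type rather than a type alias. It provides backward compatibility with the original string-specific diff API.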
func StringResultChan ¶
func StringResultChan() (StringResultFunc, chan *StringChanResult)
StringResultChan creates a channel-based result processing system for string diffs. It returns a StringResultFunc that can be passed to diff.Strings() and a channel for consuming the results in a separate goroutine. This enables parallel processing where the diff operation runs in one goroutine while results are processed in another.
Returns:
- StringResultFunc: A callback function to pass to diff.Strings()
- chan *StringChanResult: A channel to receive diff results from
The caller is responsible for closing the returned channel when done.
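The producer and consumer pattern described above might look like the following. This is a sketch with local stand-ins: resultChan imitates the documented behaviour of StringResultChan using an unbuffered channel (an assumption), and the produce argument takes the place of a call to diff.Strings. The point of interest is the ordering: the channel is closed only after the producer has returned, and the caller then waits for the consumer to finish.

```go
package main

import "fmt"

type Delta int

const (
	NEW Delta = iota // only in B
	OLD              // only in A
)

type StringChanResult struct {
	D Delta
	S string
}

type StringResultFunc func(d Delta, s string) error

// sign maps each Delta to the symbol noted beside the constants.
var sign = map[Delta]string{NEW: "+", OLD: "-"}

// resultChan is a hypothetical stand-in for StringResultChan: the returned
// callback forwards every difference to the returned channel.
func resultChan() (StringResultFunc, chan *StringChanResult) {
	ch := make(chan *StringChanResult)
	fn := func(d Delta, s string) error {
		ch <- &StringChanResult{D: d, S: s}
		return nil
	}
	return fn, ch
}

// consume runs produce (standing in for diff.Strings) while a separate
// goroutine formats the results it receives.
func consume(produce func(StringResultFunc) error) ([]string, error) {
	fn, ch := resultChan()

	var lines []string
	done := make(chan struct{})
	go func() {
		defer close(done)
		for r := range ch {
			lines = append(lines, sign[r.D]+" "+r.S)
		}
	}()

	err := produce(fn)
	close(ch) // the caller closes the channel once no more results can arrive
	<-done    // wait until the consumer has processed everything
	return lines, err
}

func main() {
	lines, err := consume(func(fn StringResultFunc) error {
		if err := fn(NEW, "b"); err != nil {
			return err
		}
		return fn(OLD, "a")
	})
	fmt.Println(lines, err)
}
```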