difflib

package
v2.1.3 Latest Latest
Warning

This package is not in the latest version of its module.

Published: Jan 3, 2026 License: Apache-2.0, Apache-2.0 Imports: 6 Imported by: 0

README

internal/difflib

This is an internalized and modernized copy of github.com/pmezard/go-difflib, a partial port of Python's difflib module for generating textual diffs.

Original Source

  • Source repository: github.com/pmezard/go-difflib
  • Original license: BSD 3-Clause License (see LICENSE)
  • Copyright: 2013 Patrick Mezard
  • Maintenance status: ⚠️ No longer maintained (archived by author)

go-difflib provides tools to compare sequences of strings and generate textual diffs in unified or context format. It implements Python's SequenceMatcher class and unified_diff()/context_diff() functions.

Why Internalized

This fork of testify maintains zero external dependencies for the core assertion packages. By internalizing go-difflib, we:

  1. Eliminate the external dependency on an unmaintained package (last updated 2014)
  2. Gain full control to apply modernizations aligned with our go1.24 target
  3. Can apply targeted fixes and optimizations specific to testify's use cases

Modernizations Applied

This internalized copy has been significantly modernized and refactored from the original go-difflib codebase:

Go Language Features (Go 1.21+)
  • Built-in functions: Removed custom min() and max() functions in favor of Go 1.21+ built-ins
  • Modern operators: Used -- instead of -= 1 for decrement operations
  • Efficient conversions: Used strconv.Itoa() instead of fmt.Sprintf("%d", ...) for integer-to-string conversion
  • Buffer handling: Used bytes.Buffer.String() instead of string(bytes.Buffer.Bytes())
  • Modern initialization: Used new(bytes.Buffer) instead of &bytes.Buffer{}
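
The idioms listed above can be seen side by side in a small, self-contained sketch. The helper formatRange is hypothetical and is not part of this package; it exists only to show the built-in max(), strconv.Itoa(), new(bytes.Buffer), and Buffer.String() in one place.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// formatRange is a hypothetical helper (not part of difflib).
// It renders "start,length" for the half-open range [start, stop).
func formatRange(start, stop int) string {
	buf := new(bytes.Buffer)     // rather than &bytes.Buffer{}
	length := max(stop-start, 0) // built-in max, available since Go 1.21

	buf.WriteString(strconv.Itoa(start)) // rather than fmt.Sprintf("%d", start)
	buf.WriteByte(',')
	buf.WriteString(strconv.Itoa(length))

	return buf.String() // rather than string(buf.Bytes())
}

func main() {
	n := 3
	n-- // rather than n -= 1

	fmt.Println(formatRange(3, 7)) // 3,4
	fmt.Println(formatRange(5, 2)) // 5,0
	fmt.Println(min(n, 9))         // 2
}
```
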
Code Organization & Complexity Reduction
  • Function extraction: Refactored complex functions by extracting helper functions:
    • writeGroup() - Handles writing diff groups
    • writeEqual() - Writes unchanged lines
    • writeReplaceOrDelete() - Writes deleted/replaced lines (prefix -)
    • writeReplaceOrInsert() - Writes inserted/replaced lines (prefix +)
  • Method reorganization: Reordered methods for better logical flow (public methods first, then helpers)
  • Named constants: Added named constants for magic numbers (e.g., hundred = 100, maxDisplayElements = 200)
Code Quality
  • Godoc compliance: Updated all function comments to start with the function name for proper godoc generation
    • // Set two sequences → // SetSeqs sets two sequences
    • // Return list of triples → // GetMatchingBlocks returns the list of triples
  • Linting compliance: Removed blank identifiers in range loops where value is unused
    • for s, _ := range → for s := range
  • Modern control flow: Replaced if-else chains with switch statements for better readability
  • Simplified logic: Improved boolean expressions using De Morgan's laws
    • !(len(group) == 1 && group[0].Tag == 'e') → (len(group) != 1 || group[0].Tag != 'e')
  • Struct literals: Simplified composite literals where types are inferred
    • OpCode{'e', 0, 1, 0, 1} instead of OpCode{Tag: 'e', I1: 0, I2: 1, J1: 0, J2: 1}
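
As a quick check of the De Morgan rewrite and the switch-based style described above, the sketch below compares both forms of the condition over a few inputs. The opCode type only mirrors the documented OpCode fields, and the functions before, after, and tagName are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// opCode mirrors the documented OpCode fields, for illustration only.
type opCode struct {
	Tag            byte
	I1, I2, J1, J2 int
}

// before is the original form of the condition.
func before(group []opCode) bool {
	return !(len(group) == 1 && group[0].Tag == 'e')
}

// after is the form obtained by applying De Morgan's law.
// Short-circuit evaluation keeps group[0] safe when the group is empty.
func after(group []opCode) bool {
	return len(group) != 1 || group[0].Tag != 'e'
}

// tagName uses a switch instead of an if-else chain.
func tagName(tag byte) string {
	switch tag {
	case 'r':
		return "replace"
	case 'd':
		return "delete"
	case 'i':
		return "insert"
	case 'e':
		return "equal"
	default:
		return "unknown"
	}
}

func main() {
	groups := [][]opCode{
		{},                   // empty group
		{{'e', 0, 1, 0, 1}},  // single equal opcode (unkeyed literal)
		{{'r', 0, 1, 0, 1}},  // single replace opcode
		{{'e', 0, 1, 0, 1}, {'d', 1, 2, 1, 1}},
	}
	for _, g := range groups {
		fmt.Println(before(g) == after(g)) // true for every group
	}
	fmt.Println(tagName('i')) // insert
}
```
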
Documentation
  • Comment punctuation: Standardized comment formatting and added proper punctuation
  • Code clarity: Added inline comments for switch cases explaining tag meanings ('r', 'd', 'i', 'e')

API Compatibility

The internalized copy maintains full API compatibility with the original go-difflib while incorporating the modernizations above. All public functions and types work identically to the upstream version.

Key exports:

  • SequenceMatcher - Compares sequences of strings using the Ratcliff-Obershelp algorithm
  • UnifiedDiff / WriteUnifiedDiff() / GetUnifiedDiffString() - Generate unified diff format
  • ContextDiff / WriteContextDiff() / GetContextDiffString() - Generate context diff format
  • SplitLines() - Split strings on newlines while preserving them

Use in Testify

This package is used by testify's assertion functions to generate human-readable diffs when assertions fail, particularly for comparing strings, slices, and complex data structures.

The diff output helps developers quickly identify what changed between expected and actual values during test failures.

Future Enhancements

As an internalized dependency, this copy can receive targeted improvements:

  • Potential: Performance optimizations for large diffs
  • Potential: Enhanced diff algorithms for specific data types
  • Potential: Colorized output support (if implemented as an enable/color module)

These enhancements would be difficult to incorporate if difflib remained an external, unmaintained dependency.

Maintenance

This internalized copy is maintained as part of github.com/go-openapi/testify/v2 and follows the same Go version requirements (currently go1.24). It does not track upstream go-difflib releases, as the original repository is no longer maintained and this copy has diverged through modernization and refactoring.

For issues or improvements specific to this internalized version, please file issues at: https://github.com/go-openapi/testify/issues

License

This code retains its original BSD 3-Clause License. See LICENSE for the full license text.

The original copyright and license terms are preserved in accordance with the BSD License requirements.

Documentation

Overview

Package difflib is a partial port of Python's difflib module.

It provides tools to compare sequences of strings and generate textual diffs.

The following class and functions have been ported:

- SequenceMatcher

- unified_diff

Getting unified diffs was the main goal of the port. Keep in mind that this code is mostly suited to presenting text differences in a human-friendly way; there is no guarantee that the generated diffs are consumable by patch(1).

This package was adopted from github.com/pmezard/go-difflib which is no longer maintained.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetUnifiedDiffString

func GetUnifiedDiffString(diff UnifiedDiff) (string, error)

GetUnifiedDiffString is like WriteUnifiedDiff but returns the diff as a string.

func SplitLines

func SplitLines(s string) []string

SplitLines splits a string on "\n" while preserving them. The output can be used as input for UnifiedDiff and ContextDiff structures.
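
The idea of splitting on "\n" while keeping each newline attached to its line can be sketched with the standard library alone. The function splitKeepNewlines below is a hypothetical stand-in, not the package's SplitLines; in particular, how the package treats the final line (for example, input without a trailing newline) is not specified above, and this sketch makes its own choice.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeepNewlines is a hypothetical stand-in for illustration.
// It splits s after every "\n" so each line keeps its terminator.
func splitKeepNewlines(s string) []string {
	lines := strings.SplitAfter(s, "\n")
	// SplitAfter yields a trailing empty element when s is empty or
	// ends with "\n"; this sketch chooses to drop it.
	if n := len(lines); n > 0 && lines[n-1] == "" {
		lines = lines[:n-1]
	}
	return lines
}

func main() {
	fmt.Printf("%q\n", splitKeepNewlines("a\nb\n")) // ["a\n" "b\n"]
	fmt.Printf("%q\n", splitKeepNewlines("a\nb"))   // ["a\n" "b"]
}
```
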

func WriteUnifiedDiff

func WriteUnifiedDiff(writer io.Writer, diff UnifiedDiff) error

WriteUnifiedDiff writes the comparison between two sequences of lines. It generates the delta as a unified diff.

Unified diffs are a compact way of showing line changes and a few lines of context. The number of context lines is set by 'n' which defaults to three.

By default, the diff control lines (those with ---, +++, or @@) are created with a trailing newline. This is helpful so that inputs created from file.readlines() result in diffs that are suitable for file.writelines() since both the inputs and outputs have trailing newlines.

For inputs that do not have trailing newlines, set the lineterm argument to "" so that the output will be uniformly newline free.

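The unidiff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for 'fromfile', 'tofile', 'fromfiledate', and 'tofiledate' (the FromFile, ToFile, FromDate, and ToDate fields of UnifiedDiff). The modification times are normally expressed in the ISO 8601 format.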
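
To make the control lines concrete, here is a minimal sketch of how the ---, +++, and @@ lines of a unified diff can be assembled. It follows the range convention used by Python's difflib (1-based start, length omitted when it is 1); the function names formatRangeUnified and controlLines are assumptions for illustration and are not claimed to match this package's internals.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatRangeUnified renders the half-open range [start, stop) in the
// "start,length" style used inside @@ hunk headers.
func formatRangeUnified(start, stop int) string {
	beginning := start + 1 // hunk headers count lines from 1
	length := stop - start
	if length == 1 {
		return strconv.Itoa(beginning) // a length of 1 is left implicit
	}
	if length == 0 {
		beginning-- // empty ranges point at the line before the range
	}
	return strconv.Itoa(beginning) + "," + strconv.Itoa(length)
}

// controlLines (hypothetical) builds the two file header lines and one
// hunk header, each terminated by eol.
func controlLines(fromFile, fromDate, toFile, toDate, eol string, i1, i2, j1, j2 int) string {
	var sb strings.Builder
	writeFile := func(marker, name, date string) {
		sb.WriteString(marker + " " + name)
		if date != "" {
			sb.WriteString("\t" + date) // the date follows a tab when present
		}
		sb.WriteString(eol)
	}
	writeFile("---", fromFile, fromDate)
	writeFile("+++", toFile, toDate)
	sb.WriteString("@@ -" + formatRangeUnified(i1, i2) +
		" +" + formatRangeUnified(j1, j2) + " @@" + eol)
	return sb.String()
}

func main() {
	fmt.Print(controlLines("a.txt", "2026-01-03", "b.txt", "2026-01-04", "\n", 0, 3, 0, 4))
}
```
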

Types

type Match

type Match struct {
	A    int
	B    int
	Size int
}

type OpCode

type OpCode struct {
	Tag byte
	I1  int
	I2  int
	J1  int
	J2  int
}

type SequenceMatcher

type SequenceMatcher struct {
	IsJunk func(string) bool
	// contains filtered or unexported fields
}

SequenceMatcher compares sequences of strings. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (R-O doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people.

SequenceMatcher tries to compute a "human-friendly diff" between two sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the longest *contiguous* & junk-free matching subsequence. That's what catches people's eyes. The Windows(tm) windiff has another interesting notion, pairing up elements that appear uniquely in each sequence. That, and the method here, appear to yield more intuitive difference reports than does diff. This method appears to be the least vulnerable to synching up on blocks of "junk lines", though (like blank lines in ordinary text files, or maybe "<P>" lines in HTML files). That may be because this is the only method of the 3 that has a *concept* of "junk" <wink>.

Timing: Basic R-O is cubic time worst case and quadratic time expected case. SequenceMatcher is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear.
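
The recursive idea described above (find the longest contiguous match, then repeat on the pieces to its left and right) can be sketched in a few lines. This is a deliberately naive, brute-force illustration with no junk handling and none of the indexing that gives SequenceMatcher its better running time; the names match, longestMatch, and matchingBlocks are hypothetical and do not reflect the package's implementation.

```go
package main

import "fmt"

// match describes a block where a[A:A+Size] == b[B:B+Size].
type match struct{ A, B, Size int }

// longestMatch scans a[alo:ahi] and b[blo:bhi] by brute force and
// returns the longest contiguous matching block. Ties are resolved in
// favor of the earliest start in a, then the earliest start in b.
func longestMatch(a, b []string, alo, ahi, blo, bhi int) match {
	best := match{alo, blo, 0}
	for i := alo; i < ahi; i++ {
		for j := blo; j < bhi; j++ {
			k := 0
			for i+k < ahi && j+k < bhi && a[i+k] == b[j+k] {
				k++
			}
			if k > best.Size {
				best = match{i, j, k}
			}
		}
	}
	return best
}

// matchingBlocks applies longestMatch recursively to the regions on
// the left and right of each match, returning blocks in order.
func matchingBlocks(a, b []string, alo, ahi, blo, bhi int) []match {
	m := longestMatch(a, b, alo, ahi, blo, bhi)
	if m.Size == 0 {
		return nil
	}
	left := matchingBlocks(a, b, alo, m.A, blo, m.B)
	right := matchingBlocks(a, b, m.A+m.Size, ahi, m.B+m.Size, bhi)
	return append(append(left, m), right...)
}

func main() {
	a := []string{"a", "b", "c", "d"}
	b := []string{"b", "c", "x", "d"}
	blocks := matchingBlocks(a, b, 0, len(a), 0, len(b))
	// Close the list with a zero-length sentinel at the end of both inputs.
	blocks = append(blocks, match{len(a), len(b), 0})
	fmt.Println(blocks) // [{1 0 2} {3 3 1} {4 4 0}]
}
```
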

func NewMatcher

func NewMatcher(a, b []string) *SequenceMatcher

func (*SequenceMatcher) GetGroupedOpCodes

func (m *SequenceMatcher) GetGroupedOpCodes(n int) [][]OpCode

GetGroupedOpCodes isolates change clusters by eliminating ranges with no changes.

It returns a slice of groups with up to n lines of context. Each group is in the same format as returned by GetOpCodes().

func (*SequenceMatcher) GetMatchingBlocks

func (m *SequenceMatcher) GetMatchingBlocks() []Match

GetMatchingBlocks returns the list of triples describing matching subsequences.

Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and in j. It's also guaranteed that if (i, j, n) and (i', j', n') are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'. IOW, adjacent triples never describe adjacent equal blocks.

The last triple is a dummy, (len(a), len(b), 0), and is the only triple with n==0.

func (*SequenceMatcher) GetOpCodes

func (m *SequenceMatcher) GetOpCodes() []OpCode

GetOpCodes returns the list of 5-tuples describing how to turn a into b.

Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the tuple preceding it, and likewise for j1 == the previous j2.

The tags are characters, with these meanings:

'r' (replace): a[i1:i2] should be replaced by b[j1:j2]

'd' (delete): a[i1:i2] should be deleted, j1==j2 in this case.

'i' (insert): b[j1:j2] should be inserted at a[i1:i1], i1==i2 in this case.

'e' (equal): a[i1:i2] == b[j1:j2].
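
One way to read the four tags is as instructions for rebuilding b from a. The sketch below applies a hand-written opcode list to do exactly that and checks the chaining rule on i1 and j1. The opCode type mirrors the documented OpCode fields; the functions apply and contiguous are hypothetical helpers, and the opcode list is written by hand rather than produced by the package.

```go
package main

import "fmt"

// opCode mirrors the documented OpCode fields, for illustration only.
type opCode struct {
	Tag            byte
	I1, I2, J1, J2 int
}

// apply rebuilds b from a by following ops in order.
func apply(a, b []string, ops []opCode) []string {
	var out []string
	for _, op := range ops {
		switch op.Tag {
		case 'e': // equal: a[i1:i2] is kept unchanged
			out = append(out, a[op.I1:op.I2]...)
		case 'r', 'i': // replace or insert: b[j1:j2] is emitted
			out = append(out, b[op.J1:op.J2]...)
		case 'd': // delete: a[i1:i2] is dropped, nothing is emitted
		}
	}
	return out
}

// contiguous reports whether each opcode starts where the previous one
// ended, beginning at i1 == j1 == 0.
func contiguous(ops []opCode) bool {
	i, j := 0, 0
	for _, op := range ops {
		if op.I1 != i || op.J1 != j {
			return false
		}
		i, j = op.I2, op.J2
	}
	return true
}

func main() {
	a := []string{"a", "b", "c", "d"}
	b := []string{"b", "c", "x", "d"}
	ops := []opCode{
		{'d', 0, 1, 0, 0}, // delete "a"
		{'e', 1, 3, 0, 2}, // keep "b", "c"
		{'i', 3, 3, 2, 3}, // insert "x"
		{'e', 3, 4, 3, 4}, // keep "d"
	}
	fmt.Println(contiguous(ops))  // true
	fmt.Println(apply(a, b, ops)) // [b c x d]
}
```
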

func (*SequenceMatcher) SetSeq1

func (m *SequenceMatcher) SetSeq1(a []string)

SetSeq1 sets the first sequence to be compared. The second sequence to be compared is not changed.

SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence S against many sequences, use .SetSeq2(s) once and call .SetSeq1(x) repeatedly for each of the other sequences.

See also SetSeqs() and SetSeq2().

func (*SequenceMatcher) SetSeq2

func (m *SequenceMatcher) SetSeq2(b []string)

SetSeq2 sets the second sequence to be compared. The first sequence to be compared is not changed.

func (*SequenceMatcher) SetSeqs

func (m *SequenceMatcher) SetSeqs(a, b []string)

SetSeqs sets two sequences to be compared.

type UnifiedDiff

type UnifiedDiff struct {
	A        []string // First sequence lines
	FromFile string   // First file name
	FromDate string   // First file time
	B        []string // Second sequence lines
	ToFile   string   // Second file name
	ToDate   string   // Second file time
	Eol      string   // Headers end of line, defaults to LF
	Context  int      // Number of context lines
}

UnifiedDiff holds the unified diff parameters.
