checksum

package

v0.11.0 Latest Latest Go to latest Published: Mar 4, 2026 License: Apache-2.0 Imports: 15 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/block/spirit

Links

Open Source Insights

README ¶

Checksum

Checksums validate data consistency between two tables. During schema changes, this means comparing the original table with its _new counterpart. For move operations, checksums verify consistency between source and destination tables.

Key Features

Column intersection: The checksum operates only on the intersection of non-generated columns between tables, enabling schema changes where columns have been added, removed, or modified.
Type normalization: A CAST operation converts columns to a comparable type before comparison. This enables comparisons when data types have changed and their string representations differ (e.g., TIMESTAMP vs. TIMESTAMP(6)).
Automatic repair: When inconsistencies are detected, the checksum automatically repairs differences by recopying affected chunks.
Parallel execution: Checksums process chunks concurrently across multiple threads for efficient handling of large tables.
Consistent snapshot: A brief table lock establishes a consistent snapshot before being released. The checksum remains immune to concurrent modifications during execution.
Server-side execution: The checksum computation is pushed down to MySQL, with each chunk returning only a CRC32 value and row count to Spirit. This minimizes network overhead and is significantly more efficient than approaches that extract all data for client-side comparison.

Why Checksums Matter

Checksums are a defensive feature against bugs. While Spirit is designed to correctly copy and apply data changes, subtle data corruption can occur during online operations in many ways.

Naive implementations that only compare row counts fail to catch most of these problems—validating the actual data is essential. Common issues include:

Trailing space handling: Storage engines and column types may handle trailing spaces inconsistently
Special character mangling: Character encoding issues can corrupt special characters during copy operations
Character set mishandling: Converting between character sets (e.g., latin1 → utf8mb4) can introduce subtle corruption
Timezone conversions: Timestamp values may be incorrectly converted between timezones
Lost updates: Race conditions or replication lag can cause updates to be missed during the copy process
Type conversion edge cases: Implicit type conversions may produce unexpected results (e.g., floating point precision)
NULL mangling: NULLs can be incorrectly replaced by empty strings during data operations

While we do our best to prevent such bugs, we also want to be pedantic when it comes to data integrity. In most cases we have observed that the checksum process takes about 10% of the time as the copy-rows stage, which makes it an easy cost to justify.

There are also some known cases where a checksum failure is not a bug. This includes adding a unique index on non-unique data, or a lossy data type conversion (e.g., VARCHAR(100) → VARCHAR(10) when records exist requiring more than 10 characters). Both are important cases to handle, and prevent a cutover operation from executing.

Implementations

The checksum package contains two implementations:

SingleChecker - Compares two tables on the same MySQL server (for schema changes, or 1:1 moves)
DistributedChecker - Compares a source table against multiple distributed target databases (for sharded scenarios)

Both implementations use the same underlying checksum algorithm: CRC32 with XOR aggregation. This technique computes a checksum for each chunk of rows and can efficiently detect differences without comparing individual rows.

Checksum Algorithm

The checksum is computed using (simplified version):

SELECT BIT_XOR(CRC32(CONCAT(...))) as checksum, COUNT(*) as c 
FROM table 
WHERE <chunk_range>

This approach:

Computes a CRC32 hash for each row (using concatenated column values)
Aggregates the row checksums using XOR (BIT_XOR)
Provides both a checksum value and row count for verification

The actual implementation includes additional handling:

NULL normalization: Uses IFNULL() and ISNULL() to ensure NULLs are consistently represented
Type casting: Applies CAST operations to convert columns to the target table's type for comparable string representations

The CRC32 + XOR aggregate technique for table checksumming was pioneered by pt-table-checksum from Percona Toolkit, which established this as a reliable method for verifying data consistency in MySQL. This same approach has since been adopted by other database tools, including TiDB's data migration and verification utilities, demonstrating its effectiveness for distributed database scenarios.

Documentation ¶

Overview ¶

Package checksum provides online checksum functionality. Two tables on the same MySQL server can be compared with only an initial lock. It is not in the row/ package because it requires a replClient to be passed in, which would cause a circular dependency.

Index ¶

type Checker
- func NewChecker(db *sql.DB, chunker table.Chunker, feed *repl.Client, config *CheckerConfig) (Checker, error)
type CheckerConfig
- func NewCheckerDefaultConfig() *CheckerConfig
type DistributedChecker
type SingleChecker

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Checker ¶

type Checker interface {
	// Run performs the checksum operation.
	Run(ctx context.Context) error
	GetProgress() string
	StartTime() time.Time
	ExecTime() time.Duration
}

func NewChecker ¶

func NewChecker(db *sql.DB, chunker table.Chunker, feed *repl.Client, config *CheckerConfig) (Checker, error)

NewChecker creates a new checksum object.

type CheckerConfig ¶

type CheckerConfig struct {
	Concurrency     int
	TargetChunkTime time.Duration
	DBConfig        *dbconn.DBConfig
	Logger          *slog.Logger
	FixDifferences  bool
	Watermark       string // optional; defines a watermark to start from
	MaxRetries      int
	Applier         applier.Applier // optional; indicates it is a distributed checker
}

func NewCheckerDefaultConfig ¶

func NewCheckerDefaultConfig() *CheckerConfig

type DistributedChecker ¶ added in v0.10.1

type DistributedChecker struct {
	sync.Mutex
	// contains filtered or unexported fields
}

func (*DistributedChecker) ChecksumChunk ¶ added in v0.10.1

func (c *DistributedChecker) ChecksumChunk(ctx context.Context, trxPool *dbconn.TrxPool, chunk *table.Chunk) error

func (*DistributedChecker) ExecTime ¶ added in v0.10.1

func (c *DistributedChecker) ExecTime() time.Duration

func (*DistributedChecker) GetProgress ¶ added in v0.10.1

func (c *DistributedChecker) GetProgress() string

GetProgress returns the progress of the checker this is really just a proxy to the chunker progress.

func (*DistributedChecker) Run ¶ added in v0.10.1

func (c *DistributedChecker) Run(ctx context.Context) error

func (*DistributedChecker) StartTime ¶ added in v0.10.1

func (c *DistributedChecker) StartTime() time.Time

type SingleChecker ¶ added in v0.10.1

type SingleChecker struct {
	sync.Mutex
	// contains filtered or unexported fields
}

func (*SingleChecker) ChecksumChunk ¶ added in v0.10.1

func (c *SingleChecker) ChecksumChunk(ctx context.Context, trxPool *dbconn.TrxPool, chunk *table.Chunk) error

func (*SingleChecker) ExecTime ¶ added in v0.10.1

func (c *SingleChecker) ExecTime() time.Duration

func (*SingleChecker) GetProgress ¶ added in v0.10.1

func (c *SingleChecker) GetProgress() string

GetProgress returns the progress of the checker this is really just a proxy to the chunker progress.

func (*SingleChecker) Run ¶ added in v0.10.1

func (c *SingleChecker) Run(ctx context.Context) error

func (*SingleChecker) StartTime ¶ added in v0.10.1

func (c *SingleChecker) StartTime() time.Time

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL