simhash

package
v1.4.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 6, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package simhash implements the SimHash algorithm for near-duplicate detection.

The original algorithm is taken from https://github.com/yahoo/gryffin/blob/master/html-distance/feature.go. This package is an optimized implementation with performance improvements.

Original Copyright 2015, Yahoo Inc. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Distance

func Distance(a, b uint64) uint8

Distance returns the similarity distance between two fingerprints.
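
For SimHash fingerprints, the similarity distance is conventionally the Hamming distance: the number of bit positions in which the two values differ. The sketch below assumes that convention. `hammingDistance` is a hypothetical stand-in written for illustration, not this package's code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance is a hypothetical stand-in for Distance. It assumes the
// distance is the number of differing bits between two 64-bit fingerprints.
func hammingDistance(a, b uint64) uint8 {
	// XOR leaves a 1 in every position where a and b disagree;
	// OnesCount64 counts those positions (at most 64, so it fits in uint8).
	return uint8(bits.OnesCount64(a ^ b))
}

func main() {
	a := uint64(0b1011)
	b := uint64(0b0001)
	fmt.Println(hammingDistance(a, b)) // bits 1 and 3 differ, so this prints 2
}
```

Under this assumption, a distance of 0 means identical fingerprints and 64 means every bit differs.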

func Fingerprint

func Fingerprint(r io.Reader, shingle int) uint64

Fingerprint keeps the original function signature for compatibility.

Types

type Oracle

type Oracle struct {
	// contains filtered or unexported fields
}

func NewOracle

func NewOracle() *Oracle

NewOracle returns an oracle that can tell whether a fingerprint has been seen.

func (*Oracle) See

func (n *Oracle) See(f uint64) *Oracle

See asks the oracle to see the fingerprint.

func (*Oracle) Seen

func (n *Oracle) Seen(f uint64, r uint8) bool

Seen asks the oracle whether anything close to the fingerprint, within a range r, has been seen before.
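
The See and Seen semantics can be illustrated with a naive linear scan. This sketch is not the package's Oracle, whose fields are unexported and whose lookup strategy is not described here. `naiveOracle` is a hypothetical type, and two details are assumptions: that closeness is measured as Hamming distance, and that the range r is inclusive.

```go
package main

import (
	"fmt"
	"math/bits"
)

// naiveOracle is a hypothetical linear-scan stand-in for Oracle.
type naiveOracle struct {
	seen []uint64
}

// See records f and returns the oracle so that calls can be chained.
func (o *naiveOracle) See(f uint64) *naiveOracle {
	o.seen = append(o.seen, f)
	return o
}

// Seen reports whether any recorded fingerprint differs from f in at most
// r bits. The inclusive bound is an assumption of this sketch.
func (o *naiveOracle) Seen(f uint64, r uint8) bool {
	for _, s := range o.seen {
		if bits.OnesCount64(s^f) <= int(r) {
			return true
		}
	}
	return false
}

func main() {
	o := &naiveOracle{}
	o.See(0b1111_0000).See(0b0000_1111)
	fmt.Println(o.Seen(0b1111_0001, 1)) // true: one bit away from the first entry
	fmt.Println(o.Seen(0b1010_1010, 1)) // false: four bits away from both entries
}
```

A linear scan costs one comparison per stored fingerprint, so a production oracle would normally use an index structure instead.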
