lsh

package
v1.2.2
Warning

This package is not in the latest version of its module.

Published: May 14, 2026 License: MIT Imports: 5 Imported by: 0

README

lsh

Package lsh provides query helpers for FrogoDB server-side locality-sensitive hashing (LSH). It maps event fields from queries.Source into the direct client LSH commands:

  • Dedup extracts a string reference and returns a near-duplicate bucket ID.
  • Vector extracts ordered numeric attributes and returns a behavioral bucket ID for the vector.

The helpers accept small client interfaces. A *client.Client satisfies them, and tests can pass mocks that implement only LSHDedup, LSHVector, or Delete.

Strengths

  • Keeps LSH query helpers reusable: callers only need the small client interfaces used by this package, so tests can pass narrow mocks.
  • Leaves the LSH algorithms and bucket ownership on the FrogoDB server. The SDK stays focused on extracting source values and sending protocol-compatible LSHDedup or LSHVector commands.
  • Dedup avoids a server round trip when the reference value is empty or missing, which is useful for sparse event streams.
  • Vector preserves the caller-provided attribute order and sends an explicit []float64 vector, making feature layout easy to keep stable across callers.

Weaknesses / Tradeoffs

  • These helpers depend on server-side LSH support. They do not compute buckets locally, so behavior, index parameters, and availability come from the FrogoDB server handling the request.
  • Vector allocates a []float64 for the extracted attributes, and the direct client serializes that vector into a byte buffer before sending it. That is a simple and deterministic path, but it is not zero-allocation.
  • Missing or non-numeric attributes become 0 with queries.MapSource; callers that need stricter feature validation should check input before calling Vector.
  • Dedup treats an empty reference value as "no result" rather than an error. That is convenient for optional fields, but it can hide upstream extraction mistakes if empty values are unexpected.
  • Delete helpers remove the server-side helper entries addressed by the current request fields. They do not provide bulk cleanup for all related LSH buckets.
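Because missing or non-numeric attributes read as 0 with queries.MapSource, a caller that needs stricter validation can inspect the raw event map before calling Vector. The following is a minimal sketch and not part of this package: checkNumericAttrs is a hypothetical helper, and the list of accepted numeric types is an assumption, not the exact set that queries.MapSource converts.

```go
package main

import (
	"fmt"
)

// checkNumericAttrs is a hypothetical pre-flight check (not part of package lsh).
// It returns an error for the first attribute that is missing from fields or
// whose value is not one of the numeric types listed in the switch below.
func checkNumericAttrs(fields map[string]any, attrs []string) error {
	for _, name := range attrs {
		v, ok := fields[name]
		if !ok {
			return fmt.Errorf("attribute %q is missing", name)
		}
		switch v.(type) {
		case float64, float32, int, int32, int64:
			// Accepted. ASSUMPTION: the real source may accept more or fewer types.
		default:
			return fmt.Errorf("attribute %q has non-numeric type %T", name, v)
		}
	}
	return nil
}

func main() {
	fields := map[string]any{
		"amount":    125.50,
		"frequency": int64(7),
		"recency":   "2 days", // would silently read as 0 without the check
	}
	attrs := []string{"amount", "frequency", "recency"}

	if err := checkNumericAttrs(fields, attrs); err != nil {
		fmt.Println("skipping Vector:", err)
		return
	}
	fmt.Println("input looks numeric; safe to call Vector")
}
```

Running the check on the same attribute list that is passed in VectorRequest.Attributes keeps validation and feature layout in one place.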

String Deduplication

DedupRequest.Reference is read with src.String. If the value is empty or missing, Dedup returns an empty DedupResult without calling the server. TTL, Namespace, and Reference are required.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/FrogoAI/fdb-client/pkg/client"
	"github.com/FrogoAI/fdb-client/pkg/queries"
	"github.com/FrogoAI/fdb-client/pkg/queries/lsh"
)

func main() {
	c, err := client.New("localhost:3000")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	src := queries.NewMapSource(map[string]any{
		"standard.email": "alice@example.com",
	})

	result, err := lsh.Dedup(context.Background(), c, lsh.DedupRequest{
		Namespace: "scoring",
		Reference: "standard.email",
		TTL:       24 * time.Hour,
	}, src)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(result.BucketID)
}

Dedup calls LSHDedup with group lsh_dedup. The server treats that helper group as namespace-scoped, so the same reference value may produce different bucket IDs in different namespaces.

Vector Clustering

VectorRequest.Attributes are read in order with src.Float and sent as one []float64 vector. TTL, Namespace, and at least one attribute are required. Missing or non-numeric attributes read as 0 when using queries.MapSource.

src := queries.NewMapSource(map[string]any{
	"amount":    125.50,
	"frequency": int64(7),
	"recency":   2.0,
})

result, err := lsh.Vector(ctx, c, lsh.VectorRequest{
	Namespace:  "scoring",
	Attributes: []string{"amount", "frequency", "recency"},
	TTL:        24 * time.Hour,
}, src)
if err != nil {
	return err
}

fmt.Println(result.BehavioralID)

Delete Helpers

Use DeleteDedup to remove the dedup entry for the current reference value. Use DeleteVector to remove the vector helper bucket for the request namespace.

err := lsh.DeleteDedup(ctx, c, lsh.DedupRequest{
	Namespace: "scoring",
	Reference: "standard.email",
	TTL:       time.Hour,
}, src)
if err != nil {
	return err
}

err = lsh.DeleteVector(ctx, c, lsh.VectorRequest{
	Namespace:  "scoring",
	Attributes: []string{"amount", "frequency", "recency"},
	TTL:        time.Hour,
}, src)
if err != nil {
	return err
}

Documentation

Constants

const SetDedup = "lsh_dedup"

SetDedup is the query-helper group for LSH dedup storage. The server scopes this group by request namespace when computing client-visible bucket IDs.

const SetVector = "lsh_vector"

SetVector is the set name for LSH vector storage.

Functions

func DeleteDedup

func DeleteDedup(ctx context.Context, c deleteClient, req DedupRequest, src queries.Source) error

DeleteDedup removes the LSH dedup bucket entry for the given reference value. The server stores dedup data under the namespace with the reference value as key.

func DeleteVector

func DeleteVector(ctx context.Context, c deleteClient, req VectorRequest, src queries.Source) error

DeleteVector removes the LSH vector bucket entry for the given attribute vector.

Types

type DedupRequest

type DedupRequest struct {
	Namespace string        // FrogoDB namespace and server dedup scope for SetDedup
	Reference string        // source key for reference value (e.g., "standard.email")
	TTL       time.Duration // REQUIRED. Expiration for LSH bucket entries.
}

DedupRequest describes an LSH string deduplication query.

func (DedupRequest) Validate

func (r DedupRequest) Validate() error

Validate checks that required fields are set.
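The README states that TTL, Namespace, and Reference are required. As an illustration of what such a check could look like, here is a minimal sketch on a local mirror type. The type dedupRequest, the method validate, the error messages, and the rule that a non-positive TTL counts as unset are all assumptions; the actual Validate implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dedupRequest is a local mirror of lsh.DedupRequest, used for illustration only.
type dedupRequest struct {
	Namespace string
	Reference string
	TTL       time.Duration
}

// validate returns an error for the first required field that is unset.
// ASSUMPTION: a TTL of zero or less is treated as "not set".
func (r dedupRequest) validate() error {
	switch {
	case r.Namespace == "":
		return errors.New("namespace is required")
	case r.Reference == "":
		return errors.New("reference is required")
	case r.TTL <= 0:
		return errors.New("ttl must be positive")
	}
	return nil
}

func main() {
	req := dedupRequest{Namespace: "scoring", Reference: "standard.email"}
	// TTL was left at its zero value, so validation fails before any server call.
	fmt.Println(req.validate())

	req.TTL = 24 * time.Hour
	fmt.Println(req.validate() == nil)
}
```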

type DedupResult

type DedupResult struct {
	BucketID string // client-visible lowercase hex LSH dedup bucket identifier
}

DedupResult holds the outcome of an LSH deduplication query.

func Dedup

func Dedup(ctx context.Context, c dedupClient, req DedupRequest, src queries.Source) (DedupResult, error)

Dedup performs an LSH string deduplication query. It extracts the reference value from the source, sends it to the FrogoDB LSH dedup service, and returns the bucket ID.

Flow:

  1. value = src.String(req.Reference); if value is empty, return an empty DedupResult without calling the server
  2. Send OpLSHDedup to FrogoDB with group lsh_dedup and value
  3. Server computes bucket ID from req.Namespace and value
  4. Return DedupResult{BucketID: bucketID}
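The flow above, including the documented shortcut for empty reference values, can be sketched with a narrow mock. This is an illustration under stated assumptions, not the package's code: stringSource and dedupCaller are local stand-ins, the LSHDedup parameter list is a guess at the shape of the real client method, and the mock's return value is a placeholder instead of a real bucket ID.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// stringSource mirrors the one queries.Source method this sketch needs.
// ASSUMPTION: the real signature may differ.
type stringSource interface {
	String(key string) string
}

// dedupCaller is a hypothetical stand-in for the package's small client interface.
// ASSUMPTION: the real LSHDedup parameter list is not shown in this document.
type dedupCaller interface {
	LSHDedup(ctx context.Context, namespace, group, value string, ttl time.Duration) (string, error)
}

// mapSource is a tiny in-memory source; missing keys read as "".
type mapSource map[string]string

func (m mapSource) String(key string) string { return m[key] }

// countingMock records how often the "server" is called.
type countingMock struct{ calls int }

func (m *countingMock) LSHDedup(_ context.Context, namespace, _ string, value string, _ time.Duration) (string, error) {
	m.calls++
	return "placeholder:" + namespace + ":" + value, nil // not a real bucket ID
}

// dedupSketch reads the reference value, skips the server when it is empty,
// and otherwise forwards it with the lsh_dedup group.
func dedupSketch(ctx context.Context, c dedupCaller, namespace, reference string, ttl time.Duration, src stringSource) (string, error) {
	value := src.String(reference)
	if value == "" {
		return "", nil // empty result, no round trip
	}
	return c.LSHDedup(ctx, namespace, "lsh_dedup", value, ttl)
}

func main() {
	mock := &countingMock{}
	src := mapSource{"standard.email": "alice@example.com"}
	ctx := context.Background()

	id, _ := dedupSketch(ctx, mock, "scoring", "standard.email", time.Hour, src)
	fmt.Println(id != "", mock.calls)

	id, _ = dedupSketch(ctx, mock, "scoring", "missing.key", time.Hour, src)
	fmt.Println(id == "", mock.calls) // the call count does not change for the empty value
}
```

A counting mock like this is one way for a test to assert that sparse events do not reach the server.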

type VectorRequest

type VectorRequest struct {
	Namespace  string        // FrogoDB namespace for vector LSH index
	Attributes []string      // ordered source keys for vector components
	TTL        time.Duration // REQUIRED. Expiration for vector LSH bucket entries.
}

VectorRequest describes an LSH vector query for behavioral clustering.

func (VectorRequest) Validate

func (r VectorRequest) Validate() error

Validate checks that required fields are set.

type VectorResult

type VectorResult struct {
	BehavioralID string // LSH bucket identifier for this behavior vector
}

VectorResult holds the outcome of an LSH vector query.

func Vector

func Vector(ctx context.Context, c vectorClient, req VectorRequest, src queries.Source) (VectorResult, error)

Vector performs an LSH vector query. It extracts ordered attribute floats from the source, forms a vector, sends it to the FrogoDB LSH vector service, and returns the behavioral ID.

Flow:

  1. vec[i] = src.Float(req.Attributes[i]) for each attribute
  2. Send OpLSHVector to FrogoDB with vector bytes
  3. Server computes vector LSH -> behavioral_id
  4. Return VectorResult{BehavioralID: id}
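Steps 1 and 2 can be sketched as follows to show why attribute order matters. This is an illustration only: floatSource and mapSource are local stand-ins, and the little-endian float64 encoding in encodeVector is an assumption, since the actual wire format is defined by the FrogoDB client and is not described here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// floatSource mirrors the one queries.Source method this sketch needs.
// ASSUMPTION: the real signature may differ.
type floatSource interface {
	Float(key string) float64
}

// mapSource is a tiny in-memory source; missing keys read as 0.
type mapSource map[string]float64

func (m mapSource) Float(key string) float64 { return m[key] }

// buildVector reads the attributes in the caller-provided order, so the
// position of each feature in the vector is fixed by the attrs slice.
func buildVector(src floatSource, attrs []string) []float64 {
	vec := make([]float64, len(attrs))
	for i, name := range attrs {
		vec[i] = src.Float(name)
	}
	return vec
}

// encodeVector packs the vector as consecutive little-endian float64 values.
// ASSUMPTION: chosen for illustration; the real client encoding may differ.
func encodeVector(vec []float64) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, vec); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	src := mapSource{"amount": 125.50, "frequency": 7, "recency": 2}
	vec := buildVector(src, []string{"amount", "frequency", "recency"})

	payload, err := encodeVector(vec)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(vec, len(payload))
}
```

Reordering the attribute list produces a different vector for the same event, so callers that share a namespace should agree on one attribute order.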
