Documentation ¶
Overview ¶
Package duplicate provides memo duplicate detection for P2-C002. It contains the detector implementation and the similarity calculation.
Index ¶
- Constants
- Variables
- func CalculateWeightedSimilarity(b *Breakdown, w Weights) float64
- func CosineSimilarity(a, b []float32) float64
- func ExtractTitle(content string) string
- func FindSharedTags(tags1, tags2 []string) []string
- func TagCoOccurrence(tags1, tags2 []string) float64
- func TimeProximity(newTime, candidateTime time.Time) float64
- func Truncate(content string, maxLen int) string
- type Breakdown
- type DetectRequest
- type DetectResponse
- type DuplicateDetector
- type SimilarMemo
- type Weights
Constants ¶
const (
	DuplicateThreshold = 0.9 // >90% = duplicate
	RelatedThreshold   = 0.7 // 70-90% = related
	DefaultTopK        = 5
)
Thresholds for duplicate detection.
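As a rough illustration of how the two thresholds partition a similarity score, here is a minimal sketch. The `classify` helper is hypothetical and not part of the package; the boundary handling (strictly above 0.9, at least 0.7) is an assumption read off the constant comments.

```go
package main

import "fmt"

// Values copied from the package constants.
const (
	DuplicateThreshold = 0.9
	RelatedThreshold   = 0.7
)

// classify is a hypothetical helper: it maps a score to a level string.
// Assumption: "duplicate" means strictly greater than DuplicateThreshold,
// and "related" means at least RelatedThreshold.
func classify(score float64) string {
	switch {
	case score > DuplicateThreshold:
		return "duplicate"
	case score >= RelatedThreshold:
		return "related"
	default:
		return "" // below both thresholds: not reported
	}
}

func main() {
	for _, s := range []float64{0.95, 0.8, 0.4} {
		fmt.Printf("%.2f -> %q\n", s, classify(s))
	}
}
```

The level strings match the values documented for SimilarMemo.Level.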
const TimeDecayDays = 7
TimeDecayDays is the decay period for time proximity calculation.
Variables ¶
var DefaultWeights = Weights{
	Vector:     0.5,
	TagCoOccur: 0.3,
	TimeProx:   0.2,
}
DefaultWeights are the default weights for similarity calculation.
Functions ¶
func CalculateWeightedSimilarity ¶
func CalculateWeightedSimilarity(b *Breakdown, w Weights) float64
CalculateWeightedSimilarity computes the weighted similarity from a Breakdown.
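The exact formula is not shown in this documentation. A plausible reading is a weighted sum of the three Breakdown components, sketched below. The `weightedSum` name, the nil check, and the `float64` field types of `Weights` are assumptions for illustration only.

```go
package main

import "fmt"

// Local copies of the package types; Weights field types are assumed.
type Breakdown struct {
	Vector, TagCoOccur, TimeProx float64
}

type Weights struct {
	Vector, TagCoOccur, TimeProx float64
}

// Values copied from DefaultWeights.
var defaultWeights = Weights{Vector: 0.5, TagCoOccur: 0.3, TimeProx: 0.2}

// weightedSum is a hypothetical stand-in for CalculateWeightedSimilarity.
// Assumption: the score is the sum of each component times its weight.
func weightedSum(b *Breakdown, w Weights) float64 {
	if b == nil {
		return 0
	}
	return b.Vector*w.Vector + b.TagCoOccur*w.TagCoOccur + b.TimeProx*w.TimeProx
}

func main() {
	b := &Breakdown{Vector: 0.8, TagCoOccur: 0.5, TimeProx: 1.0}
	fmt.Printf("%.2f\n", weightedSum(b, defaultWeights))
}
```

Because the default weights sum to 1.0, a score computed this way stays in [0, 1] whenever every component does.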
func CosineSimilarity ¶
func CosineSimilarity(a, b []float32) float64
CosineSimilarity calculates the cosine similarity between two vectors.
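For reference, cosine similarity is the dot product divided by the product of the vector norms. The sketch below is one way to compute it; how the package treats mismatched lengths or zero vectors is not documented, so returning 0 in those cases is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// cosine is an illustrative implementation, not the package's own code.
// Assumption: mismatched lengths, empty input, and zero vectors yield 0.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	fmt.Printf("%.3f\n", cosine([]float32{1, 2, 3}, []float32{2, 4, 6}))
	fmt.Printf("%.3f\n", cosine([]float32{1, 0}, []float32{0, 1}))
}
```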
func ExtractTitle ¶
func ExtractTitle(content string) string
ExtractTitle extracts the title from memo content (the first line).
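A minimal sketch of first-line extraction follows. The `firstLine` helper is hypothetical, and trimming surrounding whitespace is an assumption; the package may also strip Markdown markers or cap the length.

```go
package main

import (
	"fmt"
	"strings"
)

// firstLine returns the text before the first newline, trimmed.
// Assumption: surrounding whitespace (including a trailing "\r") is dropped.
func firstLine(content string) string {
	line, _, _ := strings.Cut(content, "\n")
	return strings.TrimSpace(line)
}

func main() {
	fmt.Println(firstLine("Groceries\nmilk, eggs"))
}
```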
func FindSharedTags ¶
func FindSharedTags(tags1, tags2 []string) []string
FindSharedTags returns the tags that appear in both slices.
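One common way to intersect two tag slices is a set lookup, sketched here. The `sharedTags` helper is hypothetical; the result order (following the second slice) and the removal of repeated tags are assumptions, since the documentation does not specify either.

```go
package main

import "fmt"

// sharedTags returns tags present in both slices, each at most once.
// Assumption: output order follows tags2.
func sharedTags(tags1, tags2 []string) []string {
	seen := make(map[string]struct{}, len(tags1))
	for _, t := range tags1 {
		seen[t] = struct{}{}
	}
	var out []string
	for _, t := range tags2 {
		if _, ok := seen[t]; ok {
			out = append(out, t)
			delete(seen, t) // prevents emitting the same tag twice
		}
	}
	return out
}

func main() {
	fmt.Println(sharedTags([]string{"go", "ai", "notes"}, []string{"ai", "go", "todo"}))
}
```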
func TagCoOccurrence ¶
func TagCoOccurrence(tags1, tags2 []string) float64
TagCoOccurrence calculates the Jaccard similarity between two tag sets.
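Jaccard similarity is the size of the intersection divided by the size of the union. The sketch below illustrates that definition; the `jaccard` and `toSet` names are hypothetical, and returning 0 when both sets are empty is an assumption.

```go
package main

import "fmt"

// toSet deduplicates a tag slice.
func toSet(tags []string) map[string]struct{} {
	set := make(map[string]struct{}, len(tags))
	for _, t := range tags {
		set[t] = struct{}{}
	}
	return set
}

// jaccard computes |A ∩ B| / |A ∪ B| over the two tag sets.
// Assumption: two empty sets give 0 rather than 1.
func jaccard(tags1, tags2 []string) float64 {
	a, b := toSet(tags1), toSet(tags2)
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	inter := 0
	for t := range a {
		if _, ok := b[t]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return float64(inter) / float64(union)
}

func main() {
	// Two shared tags out of four distinct tags.
	fmt.Println(jaccard([]string{"a", "b", "c"}, []string{"b", "c", "d"}))
}
```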
func TimeProximity ¶
func TimeProximity(newTime, candidateTime time.Time) float64
TimeProximity calculates time proximity using exponential decay. It returns 1.0 for the same day and decays exponentially over TimeDecayDays.
Types ¶
type Breakdown ¶
type Breakdown struct {
	Vector     float64 `json:"vector"`
	TagCoOccur float64 `json:"tag_co_occur"`
	TimeProx   float64 `json:"time_prox"`
}
Breakdown shows how similarity was calculated.
type DetectRequest ¶
type DetectRequest struct {
	UserID  int32    `json:"user_id"`
	Title   string   `json:"title"`
	Content string   `json:"content"`
	Tags    []string `json:"tags,omitempty"`
	TopK    int      `json:"top_k,omitempty"` // default 5
}
DetectRequest contains input for duplicate detection.
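To show how the struct tags shape the wire format, the sketch below marshals a request with encoding/json. The struct is a local copy of the definition above; the `encode` helper and the sample values are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copy of DetectRequest so the example compiles on its own.
type DetectRequest struct {
	UserID  int32    `json:"user_id"`
	Title   string   `json:"title"`
	Content string   `json:"content"`
	Tags    []string `json:"tags,omitempty"`
	TopK    int      `json:"top_k,omitempty"` // default 5
}

// encode is a hypothetical helper that returns the JSON form of a request.
func encode(req DetectRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Tags and TopK are left at their zero values, so omitempty drops them.
	s, err := encode(DetectRequest{UserID: 1, Title: "Groceries", Content: "milk, eggs"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Since `top_k` is omitted when zero, a receiver cannot distinguish "unset" from an explicit 0, which fits the field comment describing a default of 5.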
type DetectResponse ¶
type DetectResponse struct {
	HasDuplicate bool          `json:"has_duplicate"`
	HasRelated   bool          `json:"has_related"`
	Duplicates   []SimilarMemo `json:"duplicates,omitempty"`
	Related      []SimilarMemo `json:"related,omitempty"`
	LatencyMs    int64         `json:"latency_ms"`
}
DetectResponse contains detection results.
type DuplicateDetector ¶
type DuplicateDetector interface {
	// Detect finds duplicate and related memos for given content.
	Detect(ctx context.Context, req *DetectRequest) (*DetectResponse, error)
	// Merge merges source memo into target memo.
	Merge(ctx context.Context, userID int32, sourceID, targetID string) error
	// Link creates a bidirectional relation between two memos.
	Link(ctx context.Context, userID int32, memoID1, memoID2 string) error
}
DuplicateDetector detects duplicate and related memos.
func NewDuplicateDetector ¶
func NewDuplicateDetector(s *store.Store, embedding ai.EmbeddingService, model string) DuplicateDetector
NewDuplicateDetector creates a new DuplicateDetector.
func NewDuplicateDetectorWithWeights ¶
func NewDuplicateDetectorWithWeights(s *store.Store, embedding ai.EmbeddingService, model string, weights Weights) DuplicateDetector
NewDuplicateDetectorWithWeights creates a detector with custom weights.
type SimilarMemo ¶
type SimilarMemo struct {
	ID         string     `json:"id"`
	Name       string     `json:"name"`
	Title      string     `json:"title"`
	Snippet    string     `json:"snippet"`
	Similarity float64    `json:"similarity"`
	Level      string     `json:"level"` // "duplicate" or "related"
	Breakdown  *Breakdown `json:"breakdown,omitempty"`
}
SimilarMemo represents a memo similar to the input.