matcher

package
v0.2.5-alpha
Warning

This package is not in the latest version of its module.

Published: Apr 15, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Constants

const (
	// PatternExplicit indicates explicit multipart patterns (pt1, part2, -1, -2)
	// These are always considered multipart without directory context validation.
	PatternExplicit = "explicit"
	// PatternLetter indicates ambiguous single-letter patterns (A, B, C)
	// These need directory context validation to confirm multipart status.
	PatternLetter = "letter"
	// PatternNone indicates no multipart pattern detected
	PatternNone = ""
)

Pattern type constants for multipart detection

Variables

This section is empty.

Functions

func CalculateOptimalScrapers

func CalculateOptimalScrapers(
	requestScrapers []string,
	configPriority []string,
	parsed *ParsedInput,
) []string

CalculateOptimalScrapers determines the optimal scraper list for a given input. This consolidates the scraper selection logic used by both /scrape and /rescrape endpoints, ensuring consistent behavior and preventing logic drift.

The function applies two optimizations when a URL is detected:

  1. FILTER: Reduces scraper list to only URL-compatible scrapers
  2. REORDER: Places hinted scraper first for best performance

Parameters:

  • requestScrapers: User's explicitly selected scrapers (can be empty)
  • configPriority: Default scraper priority from configuration
  • parsed: Result from ParseInput containing URL detection info (can be nil)

Returns the optimized scraper list to use for scraping.

func DetectPartSuffix

func DetectPartSuffix(nameWithoutExt, id string) (int, string, string)

DetectPartSuffix parses the portion of the filename after the first occurrence of id and returns (number, suffix, patternType) where:

  • number: 0 for single-part, 1..N for part index
  • suffix: normalized string to append to base name (including leading dash)
  • patternType: "explicit" for unambiguous patterns, "letter" for ambiguous single-letter, "" for no pattern detected
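
A rough sketch of how such suffix detection might work is shown below. The regular expressions, the accepted separators, and the case handling are all assumptions made for illustration; `detectPartSuffix` is a hypothetical stand-in, and the real function may recognize more or different patterns.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Assumed patterns, applied to the text that follows the ID.
var (
	explicitRe = regexp.MustCompile(`(?i)^[-_. ]?(pt|part)[-_. ]?(\d{1,2})$`)
	numericRe  = regexp.MustCompile(`^-(\d{1,2})$`)
	letterRe   = regexp.MustCompile(`(?i)^[-_. ]?([a-z])$`)
)

// detectPartSuffix is a hypothetical re-implementation for illustration.
func detectPartSuffix(nameWithoutExt, id string) (int, string, string) {
	i := strings.Index(nameWithoutExt, id)
	if i < 0 {
		return 0, "", "" // ID not present: no pattern
	}
	rest := nameWithoutExt[i+len(id):]

	// Explicit "pt1" / "part2" style suffixes.
	if m := explicitRe.FindStringSubmatch(rest); m != nil {
		n, _ := strconv.Atoi(m[2])
		return n, "-" + strings.ToLower(m[1]) + m[2], "explicit"
	}
	// Explicit "-1" / "-2" style suffixes.
	if m := numericRe.FindStringSubmatch(rest); m != nil {
		n, _ := strconv.Atoi(m[1])
		return n, "-" + m[1], "explicit"
	}
	// Ambiguous single-letter suffixes: A=1, B=2, C=3, ...
	if m := letterRe.FindStringSubmatch(rest); m != nil {
		letter := strings.ToUpper(m[1])
		return int(letter[0]-'A') + 1, "-" + letter, "letter"
	}
	return 0, "", ""
}

func main() {
	fmt.Println(detectPartSuffix("IPX-535-pt1", "IPX-535")) // 1 -pt1 explicit
	fmt.Println(detectPartSuffix("ABW-121-C", "ABW-121"))   // 3 -C letter
}
```

The "letter" result is only a hint: as the PatternLetter comment notes, it still needs directory context before the file is treated as multipart.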

func FilterScrapersForURL

func FilterScrapersForURL(userScrapers []string, parsed *ParsedInput) []string

FilterScrapersForURL filters a list of scrapers to only those compatible with a parsed URL. This helper is used by API endpoints to optimize scraper selection when a URL is detected.

Parameters:

  • userScrapers: User's selected scrapers (can be empty to use all compatible)
  • parsed: Result from ParseInput containing CompatibleScrapers

Returns the filtered scrapers, or all compatible scrapers if userScrapers is empty. If no compatible scrapers exist, returns an empty slice (the caller should handle this case).

func GroupByID

func GroupByID(results []MatchResult) map[string][]MatchResult

GroupByID groups match results by their ID
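
As a minimal sketch of the grouping, assuming a trimmed-down result type: `matchResult` and its `Path` field are hypothetical stand-ins for MatchResult and its File field, and `groupByID` is a local illustration rather than the package's code.

```go
package main

import "fmt"

// matchResult is a trimmed, hypothetical stand-in for MatchResult.
type matchResult struct {
	Path string // stands in for File (scanner.FileInfo)
	ID   string
}

// groupByID collects results under their extracted ID, preserving input order
// within each group.
func groupByID(results []matchResult) map[string][]matchResult {
	groups := make(map[string][]matchResult)
	for _, r := range results {
		groups[r.ID] = append(groups[r.ID], r)
	}
	return groups
}

func main() {
	groups := groupByID([]matchResult{
		{Path: "/media/IPX-535-pt1.mp4", ID: "IPX-535"},
		{Path: "/media/ABW-121.mp4", ID: "ABW-121"},
		{Path: "/media/IPX-535-pt2.mp4", ID: "IPX-535"},
	})
	fmt.Println(len(groups), len(groups["IPX-535"]), len(groups["ABW-121"]))
}
```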

func ReorderWithPriority

func ReorderWithPriority(scrapers []string, priority string) []string

ReorderWithPriority moves the priority scraper to the front of the list. This is useful when multiple compatible scrapers exist for a URL - the hinted scraper should be tried first for best performance.

Parameters:

  • scrapers: List of scraper names
  • priority: Scraper name to move to front

Returns reordered list with priority scraper first. If scrapers is empty, returns a single-item list with just the priority scraper.

Types

type MatchResult

type MatchResult struct {
	File             scanner.FileInfo
	ID               string // Extracted JAV ID (e.g., "IPX-535")
	PartNumber       int    // 0 = single-part, 1..N = part index
	PartSuffix       string // "-A", "-pt1", "-part2" (always with leading dash)
	IsMultiPart      bool   // Whether this is a multi-part file
	MatchedBy        string // "regex" or "builtin"
	MultipartPattern string // Pattern type: "explicit", "letter", or "" (see PatternExplicit, PatternLetter, PatternNone)
}

MatchResult represents a matched file with extracted ID

func FilterMultiPart

func FilterMultiPart(results []MatchResult) []MatchResult

FilterMultiPart filters results to only include multi-part files

func FilterSinglePart

func FilterSinglePart(results []MatchResult) []MatchResult

FilterSinglePart filters results to only include single-part files

func ValidateMultipartInDirectory

func ValidateMultipartInDirectory(results []MatchResult) []MatchResult

ValidateMultipartInDirectory validates letter-based multipart patterns by checking for sibling files in the same directory with the same ID. Files with ambiguous letter patterns (-A, -B, -C) are only marked as multipart if multiple files with the same movie ID exist in the same directory. This prevents false positives for files like "ABW-121-C.mp4" where -C means Chinese subtitles, not part 3.
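
The sibling check can be sketched by counting results per (directory, ID) pair. Everything here is illustrative: `matchResult` and its `Path` field are hypothetical stand-ins for MatchResult and its File field, and resetting PartNumber and PartSuffix for a lone letter-suffixed file is an assumption about how demotion works.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchResult is a trimmed, hypothetical stand-in for MatchResult.
type matchResult struct {
	Path             string // stands in for File (scanner.FileInfo)
	ID               string
	PartNumber       int
	PartSuffix       string
	IsMultiPart      bool
	MultipartPattern string
}

// dirID identifies one movie ID inside one directory.
type dirID struct{ dir, id string }

// validateMultipartInDirectory is a hypothetical re-implementation for illustration.
func validateMultipartInDirectory(results []matchResult) []matchResult {
	// Count how many files share each ID within each directory.
	counts := make(map[dirID]int)
	for _, r := range results {
		counts[dirID{filepath.Dir(r.Path), r.ID}]++
	}
	out := make([]matchResult, len(results))
	copy(out, results)
	for i := range out {
		r := &out[i]
		if r.MultipartPattern != "letter" {
			continue // explicit and pattern-less results are left as they are
		}
		if counts[dirID{filepath.Dir(r.Path), r.ID}] > 1 {
			r.IsMultiPart = true
		} else {
			// Assumption: a lone letter suffix is demoted to single-part.
			r.IsMultiPart = false
			r.PartNumber = 0
			r.PartSuffix = ""
		}
	}
	return out
}

func main() {
	out := validateMultipartInDirectory([]matchResult{
		{Path: "/a/ABW-121-C.mp4", ID: "ABW-121", PartNumber: 3, PartSuffix: "-C", MultipartPattern: "letter"},
		{Path: "/b/IPX-535-A.mp4", ID: "IPX-535", PartNumber: 1, PartSuffix: "-A", MultipartPattern: "letter"},
		{Path: "/b/IPX-535-B.mp4", ID: "IPX-535", PartNumber: 2, PartSuffix: "-B", MultipartPattern: "letter"},
	})
	for _, r := range out {
		fmt.Println(r.Path, r.IsMultiPart)
	}
}
```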

type Matcher

type Matcher struct {
	// contains filtered or unexported fields
}

Matcher identifies JAV IDs from filenames

func NewMatcher

func NewMatcher(cfg *config.MatchingConfig) (*Matcher, error)

NewMatcher creates a new file matcher

func (*Matcher) Match

func (m *Matcher) Match(files []scanner.FileInfo) []MatchResult

Match extracts JAV IDs from a list of files

func (*Matcher) MatchFile

func (m *Matcher) MatchFile(file scanner.FileInfo) *MatchResult

MatchFile attempts to extract a JAV ID from a single file

func (*Matcher) MatchString

func (m *Matcher) MatchString(s string) string

MatchString is a helper to extract ID from a string directly

type ParsedInput

type ParsedInput struct {
	ID                 string   // Extracted movie ID
	ScraperHint        string   // Suggested scraper ("dmm", "r18dev", or "")
	IsURL              bool     // true if input was a URL
	CompatibleScrapers []string // List of scrapers that can handle this URL (if IsURL)
}

ParsedInput represents the result of parsing user input

func ParseInput

func ParseInput(input string, registry *models.ScraperRegistry) (*ParsedInput, error)

ParseInput determines if input is a URL or ID and extracts the movie ID. The parser is agnostic about URL patterns - it delegates URL detection to scrapers that implement the URLHandler interface. If no scraper handles the URL, the input is treated as a plain movie ID.

When input is a URL, the function also returns the list of all compatible scrapers that can handle the URL, avoiding redundant registry iteration in callers.
