slugs

package

v2.7.0 (latest) | Published: Nov 25, 2025 | License: GPL-3.0 | Imports: 8 | Imported by: 0

Repository

github.com/ZaparooProject/zaparoo-core

Documentation ¶

Index ¶

func ConvertRomanNumerals(s string) string
func ExpandAbbreviations(s string) string
func ExpandNumberWords(s string) string
func GenerateBigrams(s string) []string
func GenerateTrigrams(s string) []string
func IsBurmese(s string) bool
func IsKhmer(s string) bool
func IsLao(s string) bool
func IsThai(s string) bool
func JaccardSimilarity(set1, set2 []string) float64
func NormalizeDotSeparators(s string) string
func NormalizeOrdinals(s string) string
func NormalizePunctuation(s string) string
func NormalizeSymbolsAndSeparators(s string) string
func NormalizeToWords(input string) []string
func NormalizeUnicode(s string, ctx *pipelineContext) string
func NormalizeWidth(s string) string
func ParseGame(title string) string
func ParseMovie(title string) string
func ParseMusic(title string) string
func ParseTVShow(title string) string
func ParseWithMediaType(mediaType MediaType, title string) string
func Slugify(mediaType MediaType, input string) string
func SplitAndStripArticles(s string) string
func SplitTitle(title string) (mainTitle, secondaryTitle string, hasSecondary bool)
func StripEditionAndVersionSuffixes(s string) string
func StripLeadingArticle(s string) string
func StripMetadataBrackets(s string) string
func StripMovieSceneTags(s string) string
func StripMusicSceneTags(s string) string
func StripSceneTags(s string) string
func StripTrailingArticle(s string) string
type MediaType
type ScriptType
- func DetectScript(s string) ScriptType
type SlugifyResult
- func SlugifyWithTokens(mediaType MediaType, input string) SlugifyResult

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func ConvertRomanNumerals ¶

func ConvertRomanNumerals(s string) string

ConvertRomanNumerals converts Roman numerals (II-XIX) to Arabic numbers. Note: X is intentionally NOT converted to avoid "Mega Man X" → "Mega Man 10".

Useful for:

Games: "Final Fantasy VII" → "Final Fantasy 7", "Street Fighter II" → "Street Fighter 2"
Movies: "Rocky III" → "Rocky 3"
Music: "Symphony No. IX" → "Symphony No. 9"

Examples:

"Final Fantasy VII" → "Final Fantasy 7"
"Street Fighter II" → "Street Fighter 2"
"Mega Man X" → "Mega Man X" (unchanged - X preserved)

Optimization: Performs case-insensitive matching without full-string case conversions, converting to lowercase directly during output.
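
The conversion rule above can be sketched as a whole-word table lookup. This is a minimal illustration, not the package's implementation: `convertRoman` and `romanToArabic` are hypothetical names, and the sketch uses a simple per-word uppercase lookup rather than the optimized matching the note describes.

```go
package main

import (
	"fmt"
	"strings"
)

// romanToArabic is a hypothetical lookup table covering II through XIX.
// "I" and "X" are deliberately absent so that they pass through unchanged.
var romanToArabic = map[string]string{
	"II": "2", "III": "3", "IV": "4", "V": "5", "VI": "6", "VII": "7",
	"VIII": "8", "IX": "9", "XI": "11", "XII": "12", "XIII": "13", "XIV": "14",
	"XV": "15", "XVI": "16", "XVII": "17", "XVIII": "18", "XIX": "19",
}

// convertRoman replaces whole-word Roman numerals, matching case-insensitively.
// Simplification: splitting on whitespace collapses runs of spaces and does
// not recognise numerals that touch punctuation (for example "VII:").
func convertRoman(s string) string {
	words := strings.Fields(s)
	for i, w := range words {
		if n, ok := romanToArabic[strings.ToUpper(w)]; ok {
			words[i] = n
		}
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(convertRoman("Final Fantasy VII")) // Final Fantasy 7
	fmt.Println(convertRoman("Mega Man X"))        // Mega Man X
}
```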

func ExpandAbbreviations ¶

func ExpandAbbreviations(s string) string

ExpandAbbreviations expands common abbreviations found in titles. Uses word boundaries to avoid false matches (e.g., "versus" won't become "versuersus"). Handles two types of abbreviations:

Period-required: Only expand when period is present (e.g., "feat." but not "feat")
Flexible: Expand with or without period (e.g., "vs" or "vs.")

Useful for:

Games: "Super Mario Bros." → "Super Mario Brothers", "Mario vs DK" → "Mario versus DK"
Music: "Song feat. Artist" → "Song featuring Artist"
Movies: "Dr. Strangelove" → "Doctor Strangelove"

Examples:

"Mario vs Donkey Kong" → "Mario versus Donkey Kong"
"Super Mario Bros." → "Super Mario Brothers"
"Dr. Mario" → "Doctor Mario"
"St. Louis Blues" → "Saint Louis Blues"
"Song feat. Artist" → "Song featuring Artist"
"A great feat" → "A great feat" (not expanded - no period)

func ExpandNumberWords ¶

func ExpandNumberWords(s string) string

ExpandNumberWords expands number words (one, two, three, etc.) to their numeric forms. Handles words 1-20 in both forms:

"one" or "one." → "1"
"twenty" or "twenty." → "20"

Useful for:

Games: "Street Fighter Two" → "Street Fighter 2"
Movies: "Ocean's Eleven" → "Ocean's 11"
TV: "Chapter One" → "Chapter 1"

Examples:

"Game One" → "Game 1"
"Part Two" → "Part 2"
"Street Fighter Two" → "Street Fighter 2"

func GenerateBigrams ¶

func GenerateBigrams(s string) []string

GenerateBigrams creates overlapping 2-character chunks from a string. This is used for matching scripts that don't have word boundaries (Thai, Burmese, Khmer, Lao).

Example:

GenerateBigrams("เพลงไทย") → ["เพ", "พล", "ลง", "งไ", "ไท", "ทย"]

For strings shorter than 2 characters, returns the original string as a single-element slice.
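
A minimal sketch of overlapping n-gram generation, assuming chunks are counted in runes (Unicode code points) rather than bytes. The helper `ngrams` and its `n` parameter are hypothetical; the package exposes separate bigram and trigram functions, and this sketch does not reproduce the trigram fallback behaviour.

```go
package main

import "fmt"

// ngrams returns every overlapping run of n runes in s.
// Inputs shorter than n runes are returned as a single-element slice.
func ngrams(s string, n int) []string {
	runes := []rune(s)
	if len(runes) < n {
		return []string{s}
	}
	out := make([]string, 0, len(runes)-n+1)
	for i := 0; i+n <= len(runes); i++ {
		out = append(out, string(runes[i:i+n]))
	}
	return out
}

func main() {
	fmt.Println(ngrams("abcd", 2)) // [ab bc cd]
	fmt.Println(ngrams("a", 2))    // [a]
}
```

Working on runes matters here: Thai characters occupy several bytes each in UTF-8, so slicing the string by byte index would split characters.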

func GenerateTrigrams ¶

func GenerateTrigrams(s string) []string

GenerateTrigrams creates overlapping 3-character chunks from a string. This is an alternative to bigrams that may provide better accuracy for longer queries.

Example:

GenerateTrigrams("เพลงไทย") → ["เพล", "พลง", "ลงไ", "งไท", "ไทย"]

For strings shorter than 3 characters, falls back to bigrams or returns the original string.

func IsBurmese ¶

func IsBurmese(s string) bool

IsBurmese returns true if the string contains Burmese characters.

func IsKhmer ¶

func IsKhmer(s string) bool

IsKhmer returns true if the string contains Khmer characters.

func IsLao ¶

func IsLao(s string) bool

IsLao returns true if the string contains Lao characters.

func IsThai ¶

func IsThai(s string) bool

IsThai returns true if the string contains Thai characters. This is a convenience function for the resolution workflow.
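
The four script checks above share one shape, which can be sketched with the standard library's Unicode range tables. This is an assumption about how such a check might be written, not the package's code; `containsScript`, `hasThai`, and `hasLao` are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode"
)

// containsScript reports whether any rune in s belongs to the given table.
func containsScript(s string, table *unicode.RangeTable) bool {
	for _, r := range s {
		if unicode.Is(table, r) {
			return true
		}
	}
	return false
}

// The standard library also provides unicode.Myanmar (Burmese) and
// unicode.Khmer, which would slot in the same way.
func hasThai(s string) bool { return containsScript(s, unicode.Thai) }
func hasLao(s string) bool  { return containsScript(s, unicode.Lao) }

func main() {
	fmt.Println(hasThai("เพลงไทย"), hasThai("Zelda")) // true false
}
```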

func JaccardSimilarity ¶

func JaccardSimilarity(set1, set2 []string) float64

JaccardSimilarity computes the Jaccard similarity coefficient between two sets of strings. This is defined as the size of the intersection divided by the size of the union.

Returns a value between 0.0 (no overlap) and 1.0 (identical sets).

Example:

set1 := []string{"a", "b", "c"}
set2 := []string{"b", "c", "d"}
similarity := JaccardSimilarity(set1, set2)  // Returns 0.5 (2 common / 4 total)
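
For reference, the coefficient can be sketched as follows. The name `jaccard` is hypothetical, duplicates within a slice are assumed to count once, and returning 0 for two empty inputs is an assumption of this sketch rather than documented behaviour.

```go
package main

import "fmt"

// toSet deduplicates a slice into a set.
func toSet(items []string) map[string]struct{} {
	set := make(map[string]struct{}, len(items))
	for _, it := range items {
		set[it] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B|.
func jaccard(a, b []string) float64 {
	setA, setB := toSet(a), toSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 0 // assumption: avoid dividing by zero
	}
	inter := 0
	for x := range setA {
		if _, ok := setB[x]; ok {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	return float64(inter) / float64(union)
}

func main() {
	fmt.Println(jaccard([]string{"a", "b", "c"}, []string{"b", "c", "d"})) // 0.5
}
```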

func NormalizeDotSeparators ¶

func NormalizeDotSeparators(s string) string

NormalizeDotSeparators converts the dot separators commonly used in scene release filenames to spaces. Scene releases typically use dots to separate words, for example "Show.Name.S01E02.mkv"; this function converts those dots to spaces for better normalization.

Note: Preserves dots in:

Dates (e.g., "2024.01.15" stays as-is for date parsing)
Episode markers like "S01.E02" (preserved for episode format normalization)

Note: Does NOT preserve generic numeric decimals (e.g., "5.1" → "5 1"). However, known scene tags like "DD5.1", "AAC2.0", "H.264" are stripped by StripSceneTags() before this function runs, so they never reach here.

Useful for:

TV shows: "Show.Name.S01E02" → "Show Name S01E02"
Movies: "Movie.Name.2024" → "Movie Name 2024"

Examples:

"Breaking.Bad.S01E02" → "Breaking Bad S01E02"
"Attack.on.Titan.1x02" → "Attack on Titan 1x02"
"Show.Episode.Title" → "Show Episode Title"
"Show.2024.01.15" → "Show 2024.01.15" (date preserved)

func NormalizeOrdinals ¶

func NormalizeOrdinals(s string) string

NormalizeOrdinals removes ordinal suffixes from numbers. This allows "2nd" and "II" to both normalize to "2" for consistent matching.

Useful for:

Games: "Sonic the Hedgehog 2nd" → "Sonic the Hedgehog 2"
Movies: "21st Century" → "21 Century"

Examples:

"Street Fighter 2nd Impact" → "Street Fighter 2 Impact"
"21st Century" → "21 Century"
"3rd Strike" → "3 Strike"

func NormalizePunctuation ¶

func NormalizePunctuation(s string) string

NormalizePunctuation normalizes Unicode punctuation variants to their ASCII equivalents. This ensures consistent behavior across all pipeline stages, particularly for:

Conjunction detection (" 'n' " patterns in Stage 7)
Separator normalization (dash handling in Stage 7)
Abbreviation expansion (word boundary detection in Stage 9)

Normalized characters:

Curly quotes: ‘ ’ “ ” → ' "
Prime marks: ′ ″ → ' "
Grave/acute: ` ´ → '
Dashes: – — ― − → -
Ellipsis: … → ...

Examples:

"Link's Awakening" → "Link's Awakening" (curly apostrophe → straight)
"Super–Bros." → "Super-Bros." (en dash → hyphen, enables "Bros" expansion)
"Rock 'n' Roll" → "Rock 'n' Roll" (curly quotes → straight, enables conjunction)

This is Stage 2 of the normalization pipeline (character-level normalization). Must be called BEFORE Stage 3 (Unicode normalization) and Stage 7 (symbol/separator processing).
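
The character table above maps naturally onto a `strings.Replacer`. The sketch below is an illustration only; `normalizePunct` and `punctReplacer` are hypothetical names, and the code points are written as escapes so the mapping is unambiguous.

```go
package main

import (
	"fmt"
	"strings"
)

// punctReplacer maps Unicode punctuation variants to ASCII equivalents.
var punctReplacer = strings.NewReplacer(
	"\u2018", "'", "\u2019", "'", // curly single quotes
	"\u201C", "\"", "\u201D", "\"", // curly double quotes
	"\u2032", "'", "\u2033", "\"", // prime and double prime
	"`", "'", "\u00B4", "'", // grave and acute accents
	"\u2013", "-", "\u2014", "-", // en dash and em dash
	"\u2015", "-", "\u2212", "-", // horizontal bar and minus sign
	"\u2026", "...", // ellipsis
)

func normalizePunct(s string) string {
	return punctReplacer.Replace(s)
}

func main() {
	fmt.Println(normalizePunct("Link\u2019s Awakening")) // Link's Awakening
	fmt.Println(normalizePunct("Super\u2013Bros."))      // Super-Bros.
}
```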

func NormalizeSymbolsAndSeparators ¶

func NormalizeSymbolsAndSeparators(s string) string

NormalizeSymbolsAndSeparators converts conjunctions and separators to normalized forms:

Conjunctions: "&", " + ", " 'n' " variants → "and"
Plus symbol: "+" → "plus"
Separators: ":", "_", "-", "/", "\", ",", ";" → space

Note: Period "." is NOT converted here; it is handled after abbreviation expansion.

Examples:

"Sonic & Knuckles" → "Sonic and Knuckles"
"Rock + Roll Racing" → "Rock and Roll Racing"
"Game+" → "Game plus"
"Zelda:Link" → "Zelda Link"
"Super_Mario_Bros" → "Super Mario Bros"
"Game/Part\One" → "Game Part One"

This is Stage 7 of the normalization pipeline.
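
The ordering matters: a spaced " + " must be treated as a conjunction before a bare "+" becomes "plus". A minimal sketch of that ordering follows; `normalizeSymbols` and `separatorReplacer` are hypothetical names, and collapsing whitespace at the end is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// separatorReplacer turns single-character separators into spaces.
var separatorReplacer = strings.NewReplacer(
	":", " ", "_", " ", "-", " ", "/", " ", "\\", " ", ",", " ", ";", " ",
)

func normalizeSymbols(s string) string {
	// Conjunctions first, so that " + " is not caught by the bare "+" rule.
	s = strings.ReplaceAll(s, " + ", " and ")
	s = strings.ReplaceAll(s, " 'n' ", " and ")
	s = strings.ReplaceAll(s, "&", " and ")
	// Any "+" left over is a suffix such as "Game+".
	s = strings.ReplaceAll(s, "+", " plus ")
	s = separatorReplacer.Replace(s)
	// Collapse the extra spaces introduced above.
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Println(normalizeSymbols("Rock + Roll Racing")) // Rock and Roll Racing
	fmt.Println(normalizeSymbols("Game+"))              // Game plus
}
```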

func NormalizeToWords ¶

func NormalizeToWords(input string) []string

NormalizeToWords converts a game title to a normalized form with preserved word boundaries. This function applies game-specific parsing followed by universal normalization, then returns word tokens for scoring and ranking operations.

The result preserves spaces between words, enabling word-level operations like:

Token-based similarity matching
Word sequence validation
Sequel suffix detection
Weighted word scoring

Example:

NormalizeToWords("The Legend of Zelda: Ocarina of Time (USA)")
→ "legend of zelda ocarina of time"
→ []string{"legend", "of", "zelda", "ocarina", "of", "time"}

Note: For database queries and slug matching, use Slugify() instead. This function is for scoring and ranking operations only.

func NormalizeUnicode ¶

func NormalizeUnicode(s string, ctx *pipelineContext) string

NormalizeUnicode performs Unicode normalization with symbol removal and script-aware processing. This combines several operations:

Removes Unicode symbols (trademark ™, copyright ©, currency $€¥)
Applies script-specific normalization (NFKC for Latin, NFC for CJK, etc.)
Removes diacritics for Latin scripts (Pokémon → Pokemon)
Preserves essential marks for CJK scripts

Examples:

Symbols: "Sonic™" → "Sonic", "Game©" → "Game"
Diacritics (Latin): "Pokémon" → "Pokemon", "Café" → "Cafe"
Ligatures: "ﬁnal" → "final"
CJK preserved: "ドラゴンクエスト" → "ドラゴンクエスト"

This is Stage 3 of the normalization pipeline. Returns the input unchanged if normalization fails or if input is pure ASCII.

The optional ctx parameter enables caching optimizations during pipeline processing. When ctx is nil, caching is skipped (useful for standalone calls or tests). When ctx is provided, ASCII check and script detection results are cached for reuse.

func NormalizeWidth ¶

func NormalizeWidth(s string) string

NormalizeWidth performs width normalization on a string. Converts fullwidth ASCII characters to halfwidth (for Latin text processing). Converts halfwidth CJK characters to fullwidth (for consistent display and matching).

Examples:

Fullwidth ASCII: "ＡＢＣＤＥＦ" → "ABCDEF"
Fullwidth numbers: "１２３" → "123"
Halfwidth katakana: "ｳｴｯｼﾞ" → "ウエッジ"
Mixed: "Super Ｍario １２３" → "Super Mario 123"

This is Stage 1 of the normalization pipeline. Returns the input unchanged if normalization fails.
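
The fullwidth-to-halfwidth half of this stage can be sketched with plain rune arithmetic: the fullwidth forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E. The name `foldFullwidthASCII` is hypothetical, and the sketch deliberately leaves halfwidth katakana untouched, which needs a real mapping table.

```go
package main

import (
	"fmt"
	"strings"
)

// foldFullwidthASCII maps fullwidth ASCII variants and the ideographic
// space to their ordinary ASCII counterparts; other runes pass through.
func foldFullwidthASCII(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 0xFF01 && r <= 0xFF5E:
			return r - 0xFEE0
		case r == 0x3000: // ideographic space
			return ' '
		}
		return r
	}, s)
}

func main() {
	fmt.Println(foldFullwidthASCII("Super Ｍario １２３")) // Super Mario 123
}
```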

func ParseGame ¶

func ParseGame(title string) string

ParseGame normalizes game titles by applying game-specific transformations. This handles common game title patterns and variations to ensure consistent matching.

Transformations applied (in order):

Split titles and strip articles: "The Zelda: Link's Awakening" → "Zelda Link's Awakening"
Strip trailing articles: "Legend, The" → "Legend"
Strip metadata brackets: (USA), [!], {Europe}, <Beta> → removed
Strip edition/version suffixes: "Edition", "Version", v1.0 → removed
Normalize separators: Convert periods to spaces (for abbreviation matching)
Expand abbreviations: "Bros" → "brothers", "vs" → "versus", "Dr" → "doctor"
Expand number words: "one" → "1", "two" → "2"
Normalize ordinals: "1st" → "1", "2nd" → "2"
Convert roman numerals: "VII" → "7", "II" → "2" (preserves "X" for games like Mega Man X)

Examples:

"Super Mario Bros. III (USA) [!]" → "super mario brothers 3"
"Street Fighter II Version" → "street fighter 2"
"Mega Man X" → "mega man x" (X preserved)
"Final Fantasy VII" → "final fantasy 7"

func ParseMovie ¶

func ParseMovie(title string) string

ParseMovie normalizes movie titles to a canonical format. Handles scene release tags, edition suffix stripping, and article stripping. Years are stripped from the slug (like games) and extracted as tags by the tag parser.

Transformations applied (in order):

Width normalization: Convert fullwidth characters to ASCII
Scene tag stripping: Remove quality, codec, source, HDR, 3D tags
Scene group stripping: Remove trailing release group tags (-GROUP)
Dot normalization: Convert scene release dots to spaces
Edition suffix stripping: Remove "Edition", "Version", "Cut", "Release" suffixes (preserves qualifiers like "Director's", "Extended", "Theatrical")
Bracket stripping: Remove metadata brackets including years (extracted as tags)
Split titles and strip articles: "The Movie: Subtitle" → "Movie Subtitle"
Strip trailing articles: "Movie, The" → "Movie"

Supported formats:

Standard: "Movie Name (2024)"
Scene: "Movie.Name.2024.1080p.BluRay.x264-GROUP"
With edition: "Movie Name (2024) Director's Cut Edition" → "Movie Name Director's"
With ID: "Movie Name (2024) {imdb-tt1234567}"

Examples:

"The.Matrix.1999.1080p.BluRay.x264.DTS-WAF" → "Matrix 1999"
"Blade Runner (1982) Director's Cut" → "Blade Runner Director's"
"Avatar.2009.Extended.Edition.1080p" → "Avatar 2009 Extended"
"The Dark Knight (2008)" → "Dark Knight"
"Lord of the Rings (2001) Extended Edition" → "Lord of Rings Extended"
"Movie, The (2024)" → "Movie"

Note: Years like (1999) are extracted as tags (year:1999) by the tag parser, allowing users to filter by year when needed: launch.title Movie/Matrix (+year:1999)

TODO: Scene releases use bare years without parentheses (Movie.Name.1999.1080p), but we can't safely strip them without breaking movies with years in their titles (e.g., "2001: A Space Odyssey", "1917", "1984"). For now, we only strip years in parentheses/brackets. This means scene releases will include the year in the slug (e.g., "Matrix 1999" vs "Matrix" from standard naming). Cross-format matching happens at the Slugify level where lowercasing provides some normalization.

func ParseMusic ¶

func ParseMusic(title string) string

ParseMusic normalizes music album titles to a canonical format. This is a CONSERVATIVE implementation that focuses on cleaning scene release tags while preserving artist names for uniqueness.

Transformations applied (in order):

Width normalization: Convert fullwidth characters to ASCII
Scene tag stripping: Remove format, quality, source tags and release group
Separator normalization: Convert dots, underscores, and dashes to spaces
Bracket stripping: Remove metadata brackets including years (extracted as tags)
Disc number stripping: Remove CD1, CD2, Disc 1, etc.
Split titles and strip articles: "The Album: Subtitle" → "Album Subtitle"
Strip trailing articles: "Album, The" → "Album"

Supported formats:

Scene release: "Artist-Album-2024-CD-FLAC-GROUP" → "Artist Album 2024"
User-friendly: "Artist - Album (2024)" → "Artist Album"
With quality: "Artist - Album (2024) [FLAC 24bit]" → "Artist Album"
With disc: "Artist - Album CD1" → "Artist Album"

Examples:

"Pink.Floyd-The.Wall-1979-CD-FLAC-GROUP" → "Pink Floyd Wall 1979"
"The Beatles - Abbey Road (1969)" → "Beatles Abbey Road"
"VA - Best of 2024 [FLAC]" → "VA Best of 2024"
"Miles Davis - Kind of Blue (1959)" → "Miles Davis Kind of Blue"

Note: Years in parentheses/brackets are extracted as tags (year:1997) by the tag parser. Bare years (from scene releases) are kept in the slug.

Design note: This implementation intentionally keeps artist names to preserve uniqueness. Many albums share the same title across different artists ("IV", "Nevermind", etc.). More sophisticated artist/album extraction can be added later if needed.

func ParseTVShow ¶

func ParseTVShow(title string) string

ParseTVShow normalizes TV show titles to a canonical format. Handles various episode number formats, scene release tags, and reorders components.

Transformations applied (in order):

Width normalization: Convert fullwidth characters to ASCII
Scene tag stripping: Remove quality, codec, source tags (1080p, x264, BluRay, etc.)
Dot normalization: Convert scene release dots to spaces
Split titles and strip articles: "The Show: Episode Title" → "Show Episode Title"
Strip trailing articles: "Show, The" → "Show"
Strip metadata brackets: [720p], (extended), etc. → removed
Normalize episode formats: S01E02, 1x02, dates, absolute → canonical formats
Component reordering: Place episode marker in consistent position

Supported episode formats:

Season-based: S01E02, s01e02, 1x02, S01.E02, S01_E02, 102 (multi-episode supported)
Date-based: YYYY-MM-DD, DD-MM-YYYY, various separators (-, ., /)
Absolute: Episode 001, Ep 42, E001, #001 (anime)
Various delimiter variations (-, ., _, space)

Examples:

"Breaking.Bad.S01E02.1080p.BluRay.x264-GROUP" → "Breaking Bad s01e02"
"Show - S01E02 [720p]" → "Show s01e02"
"S01E02 - Show - Episode Title" → "Show s01e02 Episode Title"
"Attack on Titan - 1x02 - Title" → "Attack on Titan s01e02 Title"
"Daily Show - 2024-01-15" → "Daily Show 2024-01-15"
"One Piece - Episode 001" → "One Piece e001"

func ParseWithMediaType ¶

func ParseWithMediaType(mediaType MediaType, title string) string

ParseWithMediaType is the entry point for media-type-aware parsing. It delegates to the appropriate parser based on media type. Each parser applies media-specific normalization BEFORE the universal pipeline.

Media-specific parsers are implemented in separate files:

ParseTVShow → media_parsing_tv.go
ParseGame → media_parsing_game.go
ParseMovie, ParseMusic, etc. → TODO (return unchanged for now)

func Slugify ¶

func Slugify(mediaType MediaType, input string) string

Slugify applies media-type-aware parsing before slugification. It normalizes media titles based on their type (TV shows, movies, music, etc.) to ensure consistent matching across different format variations.

Media type should be a string matching one of the MediaType constants from systemdefs: "TVShow", "Movie", "Music", "Audio", "Video", "Game", "Image", "Application"

For TV shows, this normalizes episode markers:

"Show - S01E02 - Title" and "Show - 1x02 - Title" both normalize to the same slug

For other media types, parsing is applied based on the type, or the title passes through to the standard slugification pipeline.

Example:

Slugify(MediaTypeTVShow, "Breaking Bad - S01E02 - Gray Matter")
→ same as Slugify(MediaTypeTVShow, "Breaking Bad - 1x02 - Gray Matter")

func SplitAndStripArticles ¶

func SplitAndStripArticles(s string) string

SplitAndStripArticles splits a title into main and secondary parts, then strips leading articles from both. This combines title splitting and article removal into a single operation.

Delimiter priority (highest to lowest): ":", " - ", "'s "

Note: For the "'s " delimiter, the "'s" is retained in the main title.

Examples:

"The Legend of Zelda: Link's Awakening" → "Legend of Zelda Link's Awakening"
"The Game - A Subtitle" → "Game Subtitle"
"Mario's Adventure" → "Mario's Adventure" (no leading article)

This function is shared by all media parsers to ensure consistent article handling.

func SplitTitle ¶

func SplitTitle(title string) (mainTitle, secondaryTitle string, hasSecondary bool)

SplitTitle splits a title into main and secondary parts based on common delimiters. This is a public API function used by other packages for metadata processing.

Delimiter priority (highest to lowest): ":", " - ", "'s "

Note: For the "'s " delimiter, the "'s" is retained in the main title.

Returns:

mainTitle: The primary part of the title
secondaryTitle: The secondary part (subtitle)
hasSecondary: Whether a secondary title was found

Examples:

"The Legend of Zelda: Link's Awakening" → ("The Legend of Zelda", "Link's Awakening", true)
"Super Mario Bros." → ("Super Mario Bros.", "", false)
"Game - Subtitle" → ("Game", "Subtitle", true)

func StripEditionAndVersionSuffixes ¶

func StripEditionAndVersionSuffixes(s string) string

StripEditionAndVersionSuffixes removes edition/version words and version numbers from titles. Strips standalone words ("version", "edition") and their multi-language equivalents. Does NOT strip semantic edition markers like "Special", "Ultimate", "Remastered" - these represent different products and users may want to target them specifically.

Useful for:

Games: "Pokemon Red Version" → "Pokemon Red"
Applications: "Photoshop v2024" → "Photoshop"
Movies: "Blade Runner Director's Cut Edition" → "Blade Runner Director's Cut"

Supported languages:

English: version, edition
German: ausgabe (edition)
Italian: versione, edizione
Portuguese: versao, edicao (after diacritic normalization)
Japanese: バージョン (version), エディション (edition), ヴァージョン (version alt.)

Examples:

"Pokemon Red Version" → "Pokemon Red"
"Game Edition" → "Game"
"Super Mario Edition" → "Super Mario"
"ドラゴンクエストバージョン" → "ドラゴンクエスト" (CJK)
"Game Special Edition" → "Game Special" (Edition stripped, Special kept)

func StripLeadingArticle ¶

func StripLeadingArticle(s string) string

StripLeadingArticle removes leading articles ("The", "A", "An") from a string. This is a utility function used by both slug normalization and word-level matching. It preserves the original case of non-article portions.

Examples:

"The Legend of Zelda" → "Legend of Zelda"
"A New Hope" → "New Hope"
"An American Tail" → "American Tail"

func StripMetadataBrackets ¶

func StripMetadataBrackets(s string) string

StripMetadataBrackets removes all bracket types (parentheses, square brackets, braces, angle brackets) from a string. Commonly used to clean metadata like region codes, dump info, and tags.

Useful for:

Games: "Sonic (USA) [!]" → "Sonic"
Movies: "Movie (2024) [Remastered]" → "Movie" (all bracketed content removed, including the year)
TV shows: "Show - S01E02 [720p]" → "Show - S01E02"

Examples:

"Game (USA) [!]" → "Game"
"Title {Europe} <Beta>" → "Title"
"Game ((nested)) [test]" → "Game"

func StripMovieSceneTags ¶

func StripMovieSceneTags(s string) string

StripMovieSceneTags removes scene release tags specific to movies. Unlike the shared StripSceneTags(), this function does not strip edition qualifiers (Extended, Unrated, Director's Cut, Remastered), because they identify different movie editions.

Removed tags include:

Quality: 480p, 720p, 1080p, 2160p, 4K, 8K, UHD, HD, SD
Source: BluRay, WEB-DL, HDTV, DVDRip, Remux, etc.
Codec: x264, x265, H.264, H.265, HEVC, XviD, AVC, VC-1, 10bit, 8bit
Audio: AC3, AAC, DTS, DD5.1, DD7.1, Atmos, TrueHD, etc.
HDR: HDR, HDR10, HDR10+, Dolby Vision, HLG
3D: 3D, HSBS, HOU, Half-SBS, Half-OU
Tags: PROPER, REPACK, INTERNAL, LIMITED, MULTI, KORSUB (but NOT Extended, Unrated, etc.)
Group: -GROUP at end

Preserved edition qualifiers:

Extended, Unrated, Director's Cut, Remastered (these identify different editions)

Examples:

"Movie.2024.2160p.WEB-DL.DV.HDR10.HEVC-GROUP" → "Movie 2024"
"Avatar.2009.Extended.3D.HSBS.1080p.BluRay" → "Avatar 2009 Extended"
"Film.2020.Unrated.1080p.BluRay.x264.DTS" → "Film 2020 Unrated"

func StripMusicSceneTags ¶

func StripMusicSceneTags(s string) string

StripMusicSceneTags removes scene release tags specific to music. Like StripMovieSceneTags() and unlike the shared StripSceneTags(), it preserves edition qualifiers (Remastered, Deluxe, etc.), as these identify different album editions.

Removed tags include:

Format: FLAC, MP3, AAC, ALAC, APE, WAV, OGG, WMA, M4A, OPUS
Quality: V0, V2, 320, 192, 256, CBR, VBR, LAME, 24bit, 96kHz, etc.
Source: CD, WEB, Vinyl, SACD, DVD, Blu-ray, DAT, Cassette
Disc numbers: CD1, CD2, Disc1, Disc2
Group: -GROUP at end

Preserved edition qualifiers:

Remastered, Deluxe, Limited, Expanded, Anniversary, Bonus, Special

Examples:

"Artist-Album-2024-CD-FLAC-V0-GROUP" → "Artist-Album 2024"
"Album.Title.1979.Vinyl.FLAC.24bit.96kHz" → "Album Title 1979"
"Album.2020.Remastered.WEB.FLAC" → "Album 2020 Remastered"

func StripSceneTags ¶

func StripSceneTags(s string) string

StripSceneTags removes scene release tags commonly found in TV show filenames. Scene releases use specific tags to indicate quality, source, codec, audio, and release group. This function strips all such tags to normalize titles for matching.

Removed tags include:

Quality: 480p, 720p, 1080p, 2160p, 4K, HD, SD, UHD
Source: BluRay, BDRip, BRRip, WEBRip, WEB-DL, HDTV, DVDRip, etc.
Codec: x264, x265, H.264, H.265, HEVC, XviD, AVC, 10bit, 8bit
Audio: AC3, AAC, DTS, DD5.1, DD7.1, Atmos, TrueHD, etc.
Other: PROPER, REPACK, INTERNAL, LIMITED, EXTENDED, UNRATED, Director's Cut, etc.
Group: Trailing release group tag (e.g., "-GROUP")

Useful for:

TV shows: "Show.Name.S01E02.1080p.BluRay.x264-GROUP" → "Show Name S01E02"
Movies: "Movie.Name.2024.720p.WEB-DL.AAC2.0.H.264-RELEASE" → "Movie Name 2024"

Examples:

"Breaking.Bad.S01E02.1080p.BluRay.x264-GROUP" → "Breaking Bad S01E02"
"Show.S01E02.720p.WEB-DL.AAC2.0.H.264" → "Show S01E02"
"Episode.4K.HDR.Atmos.PROPER" → "Episode"

func StripTrailingArticle ¶

func StripTrailingArticle(s string) string

StripTrailingArticle removes trailing articles like ", The" from the end of a string.

Pattern: `, The` followed by end of string or separator characters (space, colon, dash, parenthesis, bracket)

Examples:

"Legend, The" → "Legend"
"Mega Man, The" → "Mega Man"
"Story, the:" → "Story:" (case insensitive)

Types ¶

type MediaType ¶

type MediaType string

MediaType categorizes the type of media content being slugified. This determines which media-specific parsing rules are applied before slugification.

const (
	// MediaTypeGame represents gaming systems (consoles, computers, arcade).
	MediaTypeGame MediaType = "Game"
	// MediaTypeMovie represents film and movie content.
	MediaTypeMovie MediaType = "Movie"
	// MediaTypeTVShow represents TV episodes and shows.
	MediaTypeTVShow MediaType = "TVShow"
	// MediaTypeMusic represents music and song content.
	MediaTypeMusic MediaType = "Music"
	// MediaTypeImage represents image files.
	MediaTypeImage MediaType = "Image"
	// MediaTypeAudio represents general audio content (audiobooks, podcasts).
	MediaTypeAudio MediaType = "Audio"
	// MediaTypeVideo represents general video content (music videos).
	MediaTypeVideo MediaType = "Video"
	// MediaTypeApplication represents application/software content.
	MediaTypeApplication MediaType = "Application"
)

type ScriptType ¶

type ScriptType int

ScriptType represents different writing systems supported by the slug system. Each script type may require different normalization strategies.

const (
	ScriptLatin    ScriptType = iota // Latin alphabet (English, French, Spanish, etc.)
	ScriptCJK                        // Chinese, Japanese, Korean
	ScriptCyrillic                   // Russian, Ukrainian, Bulgarian, Serbian, etc.
	ScriptGreek                      // Greek
	ScriptIndic                      // Devanagari, Bengali, Tamil, Telugu, etc.
	ScriptArabic                     // Arabic, Urdu, Persian/Farsi
	ScriptHebrew                     // Hebrew
	ScriptThai                       // Thai (requires n-gram matching)
	ScriptBurmese                    // Burmese/Myanmar (requires n-gram matching)
	ScriptKhmer                      // Khmer/Cambodian (requires n-gram matching)
	ScriptLao                        // Lao (requires n-gram matching)
	ScriptAmharic                    // Amharic/Ethiopic
)

func DetectScript ¶

func DetectScript(s string) ScriptType

DetectScript identifies the primary writing system used in a string. Returns the first matching script type, or ScriptLatin as the default.

type SlugifyResult ¶

type SlugifyResult struct {
	Slug   string
	Tokens []string
}

SlugifyResult contains the slug and tokens generated during slugification. This ensures metadata is computed from the EXACT tokens used during slug generation, not from re-tokenization.

func SlugifyWithTokens ¶

func SlugifyWithTokens(mediaType MediaType, input string) SlugifyResult

SlugifyWithTokens performs 14-stage normalization and returns both slug and tokens. This is the core implementation - it returns tokens extracted DURING slug generation to ensure metadata is computed from the EXACT same tokenization that produces the slug.

Use this function when you need both the slug and token-based metadata (e.g., word count). For simple slug generation, use Slugify() instead.

Example:

result := SlugifyWithTokens(MediaTypeGame, "The Legend of Zelda: Ocarina of Time (USA)")
result.Slug   → "legendofzeldaocarinaoftime"
result.Tokens → []string{"legend", "of", "zelda", "ocarina", "of", "time"}
