ldt

Version: v1.2.2
Published: Aug 20, 2025 | License: MIT | Imports: 5 | Imported by: 0

README

ldt

ldt is a Go library that automatically detects the language of a text.

This package was created by abadojack and later forked by the pango project to incorporate bug fixes and new features.

Natural language detection for Go.

Features

  • Supports 84 languages
  • 100% written in Go
  • No external dependencies
  • Fast
  • Recognizes not only the language but also the script (Latin, Cyrillic, etc.)

Getting started

Installation:

    go get -u github.com/askasoft/pango

Simple usage example:

package main

import (
	"fmt"

	"github.com/askasoft/pango/ldt"
)

func main() {
	info := ldt.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), "Confidence:", info.Confidence)
}

With Options:

package main

import (
	"fmt"

	"github.com/askasoft/pango/ldt"
)

func main() {
	// Excludes
	options := ldt.Options{
		Excludes: []ldt.Lang{ldt.Ydd},
	}

	info := ldt.DetectWithOptions("האקדמיה ללשון העברית", options)

	fmt.Println("Language:", info.Lang.String())

	// Includes
	options1 := ldt.Options{
		Includes: []ldt.Lang{ldt.Epo, ldt.Ukr},
	}

	info = ldt.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String())
}

Requirements

Go 1.8 or higher

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, a particular case of n-gram models. For the underlying idea, see the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
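
To make the trigram idea concrete, here is a minimal sketch of the rank-order ("out-of-place") comparison described by Cavnar and Trenkle. It is not the code used by ldt: the function names, the space padding, the tie-breaking rule, and the tiny training sentences are all assumptions chosen for illustration, and a real detector would use much larger language profiles.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigramCounts returns how often each 3-rune window occurs in text.
// The text is lowercased and padded with one space on each side so that
// word boundaries at the edges contribute trigrams too (an assumption).
func trigramCounts(text string) map[string]int {
	runes := []rune(" " + strings.ToLower(text) + " ")
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	return counts
}

// rankedProfile orders trigrams by descending frequency. Ties are broken
// alphabetically so the result is deterministic.
func rankedProfile(counts map[string]int) []string {
	profile := make([]string, 0, len(counts))
	for tri := range counts {
		profile = append(profile, tri)
	}
	sort.Slice(profile, func(i, j int) bool {
		a, b := profile[i], profile[j]
		if counts[a] != counts[b] {
			return counts[a] > counts[b]
		}
		return a < b
	})
	return profile
}

// outOfPlace sums, for every trigram in the text profile, the distance
// between its rank there and its rank in the language profile. Trigrams
// missing from the language profile get the maximum penalty.
// A lower total means a closer match.
func outOfPlace(textProfile, langProfile []string) int {
	rank := make(map[string]int, len(langProfile))
	for i, tri := range langProfile {
		rank[tri] = i
	}
	maxPenalty := len(langProfile)
	total := 0
	for i, tri := range textProfile {
		r, ok := rank[tri]
		if !ok {
			total += maxPenalty
			continue
		}
		d := i - r
		if d < 0 {
			d = -d
		}
		total += d
	}
	return total
}

func main() {
	// Toy "language models" built from a single sentence each.
	english := rankedProfile(trigramCounts("the quick brown fox jumps over the lazy dog and the cat"))
	esperanto := rankedProfile(trigramCounts("la rapida bruna vulpo saltas super la maldiligenta hundo"))

	sample := rankedProfile(trigramCounts("the dog and the fox"))
	fmt.Println("distance to English profile:  ", outOfPlace(sample, english))
	fmt.Println("distance to Esperanto profile:", outOfPlace(sample, esperanto))
}
```

The language whose profile yields the smallest distance is taken as the best guess; the gap between the best and second-best distances is what a confidence measure can build on.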

How is IsReliable calculated?

It is based on the following factors:

  • How many unique trigrams the given text contains
  • How large the difference is between the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, the result can be represented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola and looks like the following:

[Figure: language recognition in whatlang (Rust)]

For more details, see the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
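
The two factors described in this section can be sketched as a hyperbolic threshold over the number of unique trigrams. The constants and function names below are hypothetical and only shape an illustrative curve; they are not taken from the ldt code base.

```go
package main

import "fmt"

// Hypothetical constants: they only define the shape of the example curve.
const (
	curveScale  = 12.0
	curveOffset = 0.05
)

// threshold is a hyperbola in the number of unique trigrams: short texts
// (few trigrams) need a large rate, long texts get by with a small one.
// The caller must pass a positive count.
func threshold(uniqueTrigrams int) float64 {
	return curveScale/float64(uniqueTrigrams) + curveOffset
}

// isReliable reports whether the point (uniqueTrigrams, rate) lies above
// the curve, i.e. in the "Reliable" area of the 2D space.
func isReliable(uniqueTrigrams int, rate float64) bool {
	if uniqueTrigrams <= 0 {
		return false // no trigrams, nothing to base a decision on
	}
	return rate > threshold(uniqueTrigrams)
}

func main() {
	for _, n := range []int{10, 50, 120, 500} {
		fmt.Printf("trigrams=%d threshold=%.3f reliable(rate=0.2)=%v\n",
			n, threshold(n), isReliable(n, 0.2))
	}
}
```

With this shape, the same rate can be "Not reliable" for a very short text and "Reliable" for a longer one, which matches the intuition that short inputs carry less evidence.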

Documentation

Overview

Package ldt detects natural languages and scripts (writing systems). Languages are represented by a fixed list of constants, while scripts are represented by *unicode.RangeTable.
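
As an illustration of how a script can be represented by a *unicode.RangeTable, the following sketch uses only the standard library's unicode package to find the dominant script of a string. The helper name dominantScript and the short list of scripts are assumptions for this example, not part of the ldt API.

```go
package main

import (
	"fmt"
	"unicode"
)

// scripts lists a few of the standard library's script tables by name.
var scripts = []struct {
	name  string
	table *unicode.RangeTable
}{
	{"Latin", unicode.Latin},
	{"Cyrillic", unicode.Cyrillic},
	{"Hebrew", unicode.Hebrew},
	{"Greek", unicode.Greek},
}

// dominantScript returns the name of the listed script that covers the
// most runes in text, or "" if no rune belongs to any listed script.
func dominantScript(text string) string {
	counts := make([]int, len(scripts))
	for _, r := range text {
		for i, s := range scripts {
			if unicode.Is(s.table, r) {
				counts[i]++
				break
			}
		}
	}
	best, bestCount := "", 0
	for i, c := range counts {
		if c > bestCount {
			best, bestCount = scripts[i].name, c
		}
	}
	return best
}

func main() {
	fmt.Println(dominantScript("Привет, мир"))
	fmt.Println(dominantScript("Mi ne scias"))
	fmt.Println(dominantScript("האקדמיה ללשון העברית"))
}
```

Punctuation, digits, and spaces belong to none of the listed tables, so they do not influence the result.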


Constants

const ReliableConfidenceThreshold = 0.8

ReliableConfidenceThreshold is the confidence rating that has to be exceeded for the language detection to be considered reliable.

Variables

var Langs = map[Lang]string{
	Afr: "Afrikaans",
	Aka: "Akan",
	Amh: "Amharic",
	Arb: "Arabic",
	Azj: "Azerbaijani",
	Bel: "Belarusian",
	Ben: "Bengali",
	Bho: "Bhojpuri",
	Bul: "Bulgarian",
	Ceb: "Cebuano",
	Ces: "Czech",
	Dan: "Danish",
	Deu: "German",
	Ell: "Greek",
	Eng: "English",
	Epo: "Esperanto",
	Est: "Estonian",
	Fin: "Finnish",
	Fra: "French",
	Guj: "Gujarati",
	Hat: "Haitian Creole",
	Hau: "Hausa",
	Heb: "Hebrew",
	Hin: "Hindi",
	Hrv: "Croatian",
	Hun: "Hungarian",
	Ibo: "Igbo",
	Ilo: "Ilocano",
	Ind: "Indonesian",
	Ita: "Italian",
	Jav: "Javanese",
	Jpn: "Japanese",
	Kan: "Kannada",
	Kat: "Georgian",
	Khm: "Khmer",
	Kin: "Kinyarwanda",
	Kor: "Korean",
	Kur: "Kurdish",
	Lav: "Latvian",
	Lit: "Lithuanian",
	Mai: "Maithili",
	Mal: "Malayalam",
	Mar: "Marathi",
	Mkd: "Macedonian",
	Mlg: "Malagasy",
	Mya: "Burmese",
	Nep: "Nepali",
	Nld: "Dutch",
	Nno: "Nynorsk",
	Nob: "Bokmal",
	Nya: "Chewa",
	Ori: "Oriya",
	Orm: "Oromo",
	Pan: "Punjabi",
	Pes: "Persian",
	Pol: "Polish",
	Por: "Portuguese",
	Ron: "Romanian",
	Run: "Rundi",
	Rus: "Russian",
	Sin: "Sinhalese",
	Skr: "Saraiki",
	Slv: "Slovene",
	Sna: "Shona",
	Som: "Somali",
	Spa: "Spanish",
	Srp: "Serbian",
	Swe: "Swedish",
	Tam: "Tamil",
	Tel: "Telugu",
	Tgl: "Tagalog",
	Tha: "Thai",
	Tir: "Tigrinya",
	Tuk: "Turkmen",
	Tur: "Turkish",
	Uig: "Uyghur",
	Ukr: "Ukrainian",
	Urd: "Urdu",
	Uzb: "Uzbek",
	Vie: "Vietnamese",
	Ydd: "Yiddish",
	Yor: "Yoruba",
	Zho: "Chinese",
	Zul: "Zulu",
}

Langs maps each Lang to its language name.

Functions

func LangToString

func LangToString(lang Lang) string

LangToString converts a Lang into its ISO 639-3 code as a string. Deprecated: LangToString is deprecated and exists for historical compatibility. Please use `Lang.Iso6393()` instead.

func LangToStringShort

func LangToStringShort(lang Lang) string

LangToStringShort converts a Lang into its ISO 639-1 code as a string. It returns an empty string when there is no ISO 639-1 code. Deprecated: LangToStringShort is deprecated and exists for historical compatibility. Please use `Lang.Iso6391()` instead.

Types

type ArabicDetector

type ArabicDetector struct {
	// contains filtered or unexported fields
}

func (*ArabicDetector) Chars

func (sc *ArabicDetector) Chars() int

func (*ArabicDetector) Count

func (ad *ArabicDetector) Count(s string) bool

func (*ArabicDetector) Detect

func (ad *ArabicDetector) Detect(text string, options Options) (Lang, float64)

func (*ArabicDetector) Words

func (sc *ArabicDetector) Words() int

type BengaliDetector

type BengaliDetector struct {
	// contains filtered or unexported fields
}

func (*BengaliDetector) Chars

func (sc *BengaliDetector) Chars() int

func (*BengaliDetector) Count

func (bd *BengaliDetector) Count(s string) bool

func (*BengaliDetector) Detect

func (bd *BengaliDetector) Detect(text string, options Options) (Lang, float64)

func (*BengaliDetector) Words

func (sc *BengaliDetector) Words() int

type CJKDetector

type CJKDetector struct {
	// contains filtered or unexported fields
}

func (*CJKDetector) Chars

func (sc *CJKDetector) Chars() int

func (*CJKDetector) Count

func (cjk *CJKDetector) Count(s string) bool

func (*CJKDetector) Detect

func (cjk *CJKDetector) Detect(text string, options Options) (Lang, float64)

func (*CJKDetector) Words

func (sc *CJKDetector) Words() int

type CyrillicDetector

type CyrillicDetector struct {
	// contains filtered or unexported fields
}

func (*CyrillicDetector) Chars

func (sc *CyrillicDetector) Chars() int

func (*CyrillicDetector) Count

func (cd *CyrillicDetector) Count(s string) bool

func (*CyrillicDetector) Detect

func (cd *CyrillicDetector) Detect(text string, options Options) (Lang, float64)

func (*CyrillicDetector) Words

func (sc *CyrillicDetector) Words() int

type Detector

type Detector interface {
	Count(string) bool
	Detect(string, Options) (Lang, float64)
	Chars() int
	Words() int
}

type Detectors

type Detectors []Detector

func AllDetectors

func AllDetectors() Detectors

func (Detectors) Best

func (ds Detectors) Best() Detector

func (Detectors) Counts

func (ds Detectors) Counts(s string) bool

func (Detectors) Len

func (ds Detectors) Len() int

func (Detectors) Less

func (ds Detectors) Less(i, j int) bool

func (Detectors) Swap

func (ds Detectors) Swap(i, j int)
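
The Len, Less, and Swap methods above have the shape of sort.Interface, so a Detectors value can presumably be passed to sort.Sort. The sketch below uses reduced stand-in types to show the pattern. The ordering rule (more matched characters sorts first) and the best helper are assumptions for illustration; the package's actual ordering is not documented here.

```go
package main

import (
	"fmt"
	"sort"
)

// detector is a reduced stand-in for the Detector interface.
type detector interface {
	Chars() int
}

// fixed is a hypothetical detector that reports a preset character count.
type fixed struct {
	name  string
	chars int
}

func (f fixed) Chars() int { return f.chars }

// detectors mirrors the Detectors slice type and implements sort.Interface.
type detectors []detector

func (ds detectors) Len() int { return len(ds) }

// Less is an assumption: detectors that matched more characters sort first.
func (ds detectors) Less(i, j int) bool { return ds[i].Chars() > ds[j].Chars() }

func (ds detectors) Swap(i, j int) { ds[i], ds[j] = ds[j], ds[i] }

// best returns the first detector after sorting a copy of ds,
// or nil if ds is empty. The input slice is left untouched.
func best(ds detectors) detector {
	if len(ds) == 0 {
		return nil
	}
	sorted := append(detectors(nil), ds...)
	sort.Sort(sorted)
	return sorted[0]
}

func main() {
	ds := detectors{
		fixed{"latin", 3},
		fixed{"cyrillic", 10},
		fixed{"greek", 0},
	}
	fmt.Println(best(ds).(fixed).name)
}
```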

type DevanagariDetector

type DevanagariDetector struct {
	// contains filtered or unexported fields
}

func (*DevanagariDetector) Chars

func (sc *DevanagariDetector) Chars() int

func (*DevanagariDetector) Count

func (dd *DevanagariDetector) Count(s string) bool

func (*DevanagariDetector) Detect

func (dd *DevanagariDetector) Detect(text string, options Options) (Lang, float64)

func (*DevanagariDetector) Words

func (sc *DevanagariDetector) Words() int

type EthiopicDetector

type EthiopicDetector struct {
	// contains filtered or unexported fields
}

func (*EthiopicDetector) Chars

func (sc *EthiopicDetector) Chars() int

func (*EthiopicDetector) Count

func (ed *EthiopicDetector) Count(s string) bool

func (*EthiopicDetector) Detect

func (ed *EthiopicDetector) Detect(text string, options Options) (Lang, float64)

func (*EthiopicDetector) Words

func (sc *EthiopicDetector) Words() int

type GeorgianDetector

type GeorgianDetector struct {
	// contains filtered or unexported fields
}

func (*GeorgianDetector) Chars

func (sc *GeorgianDetector) Chars() int

func (*GeorgianDetector) Count

func (gd *GeorgianDetector) Count(s string) bool

func (*GeorgianDetector) Detect

func (gd *GeorgianDetector) Detect(text string, options Options) (Lang, float64)

func (*GeorgianDetector) Words

func (sc *GeorgianDetector) Words() int

type GreekDetector

type GreekDetector struct {
	// contains filtered or unexported fields
}

func (*GreekDetector) Chars

func (sc *GreekDetector) Chars() int

func (*GreekDetector) Count

func (gd *GreekDetector) Count(s string) bool

func (*GreekDetector) Detect

func (gd *GreekDetector) Detect(text string, options Options) (Lang, float64)

func (*GreekDetector) Words

func (sc *GreekDetector) Words() int

type GujaratiDetector

type GujaratiDetector struct {
	// contains filtered or unexported fields
}

func (*GujaratiDetector) Chars

func (sc *GujaratiDetector) Chars() int

func (*GujaratiDetector) Count

func (gd *GujaratiDetector) Count(s string) bool

func (*GujaratiDetector) Detect

func (gd *GujaratiDetector) Detect(text string, options Options) (Lang, float64)

func (*GujaratiDetector) Words

func (sc *GujaratiDetector) Words() int

type GurmukhiDetector

type GurmukhiDetector struct {
	// contains filtered or unexported fields
}

func (*GurmukhiDetector) Chars

func (sc *GurmukhiDetector) Chars() int

func (*GurmukhiDetector) Count

func (gd *GurmukhiDetector) Count(s string) bool

func (*GurmukhiDetector) Detect

func (gd *GurmukhiDetector) Detect(text string, options Options) (Lang, float64)

func (*GurmukhiDetector) Words

func (sc *GurmukhiDetector) Words() int

type HebrewDetector

type HebrewDetector struct {
	// contains filtered or unexported fields
}

func (*HebrewDetector) Chars

func (sc *HebrewDetector) Chars() int

func (*HebrewDetector) Count

func (hd *HebrewDetector) Count(s string) bool

func (*HebrewDetector) Detect

func (hd *HebrewDetector) Detect(text string, options Options) (Lang, float64)

func (*HebrewDetector) Words

func (sc *HebrewDetector) Words() int

type Info

type Info struct {
	Lang       Lang
	Confidence float64
}

Info represents a full outcome of language detection.

func Detect

func Detect(text string) Info

Detect detects the language info of the given text.

func DetectWithOptions

func DetectWithOptions(text string, options Options) Info

DetectWithOptions detects the language info of the given text with the provided options.

func (*Info) IsReliable

func (info *Info) IsReliable() bool

IsReliable returns true if Confidence is greater than ReliableConfidenceThreshold.
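
A minimal sketch of the comparison described above, using a local stand-in for Info rather than the package's own type. The threshold value 0.8 is the one documented for ReliableConfidenceThreshold; everything else in the block is illustrative.

```go
package main

import "fmt"

// reliableConfidenceThreshold mirrors the documented constant value.
const reliableConfidenceThreshold = 0.8

// info is a reduced stand-in for the Info struct.
type info struct {
	Confidence float64
}

// isReliable is strict: a confidence exactly equal to the threshold
// does not count as reliable.
func (i *info) isReliable() bool {
	return i.Confidence > reliableConfidenceThreshold
}

func main() {
	for _, c := range []float64{0.5, 0.8, 0.95} {
		res := info{Confidence: c}
		fmt.Printf("confidence=%.2f reliable=%v\n", c, res.isReliable())
	}
}
```

Callers that want a looser or stricter cutoff can compare Confidence against their own value instead of relying on IsReliable.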

type KannadaDetector

type KannadaDetector struct {
	// contains filtered or unexported fields
}

func (*KannadaDetector) Chars

func (sc *KannadaDetector) Chars() int

func (*KannadaDetector) Count

func (kd *KannadaDetector) Count(s string) bool

func (*KannadaDetector) Detect

func (kd *KannadaDetector) Detect(text string, options Options) (Lang, float64)

func (*KannadaDetector) Words

func (sc *KannadaDetector) Words() int

type KhmerDetector

type KhmerDetector struct {
	// contains filtered or unexported fields
}

func (*KhmerDetector) Chars

func (sc *KhmerDetector) Chars() int

func (*KhmerDetector) Count

func (kd *KhmerDetector) Count(s string) bool

func (*KhmerDetector) Detect

func (kd *KhmerDetector) Detect(text string, options Options) (Lang, float64)

func (*KhmerDetector) Words

func (sc *KhmerDetector) Words() int

type Lang

type Lang int

Lang represents a language following ISO 639-3 standard.

const (
	Unknown Lang = iota
	Afr
	Aka
	Amh
	Arb
	Azj
	Bel
	Ben
	Bho
	Bul
	Ceb
	Ces
	Dan
	Deu
	Ell
	Eng
	Epo
	Est
	Fin
	Fra
	Guj
	Hat
	Hau
	Heb
	Hin
	Hrv
	Hun
	Ibo
	Ilo
	Ind
	Ita
	Jav
	Jpn
	Kan
	Kat
	Khm
	Kin
	Kor
	Kur
	Lav
	Lit
	Mai
	Mal
	Mar
	Mkd
	Mlg
	Mya
	Nep
	Nld
	Nno
	Nob
	Nya
	Ori
	Orm
	Pan
	Pes
	Pol
	Por
	Ron
	Run
	Rus
	Sin
	Skr
	Slv
	Sna
	Som
	Spa
	Srp
	Swe
	Tam
	Tel
	Tgl
	Tha
	Tir
	Tuk
	Tur
	Uig
	Ukr
	Urd
	Uzb
	Vie
	Ydd
	Yor
	Zho
	Zul
)

Language constants, named after their ISO 639-3 codes. Unknown is the zero value of Lang.

func CodeToLang

func CodeToLang(code string) Lang

CodeToLang returns the Lang for the given ISO 639-3 code string.

func DetectLang

func DetectLang(text string) Lang

DetectLang detects only the language of the given text.

func DetectLangWithOptions

func DetectLangWithOptions(text string, options Options) Lang

DetectLangWithOptions detects only the language of the given text with the provided options.

func (Lang) Iso6391

func (lang Lang) Iso6391() string

Iso6391 returns ISO 639-1 code of Lang as a string.

func (Lang) Iso6393

func (lang Lang) Iso6393() string

Iso6393 returns ISO 639-3 code of Lang as a string.

func (Lang) String

func (lang Lang) String() string

String returns the human-readable name of the language as a string.
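
The relationship between the Lang constants, their names, and their ISO codes can be pictured with the following reduced stand-in. It covers only three languages and is not the package's source. In particular, returning the zero value for an unrecognized code and an empty string for an unknown language are assumptions of this sketch.

```go
package main

import "fmt"

// lang is a reduced stand-in for the Lang type.
type lang int

const (
	unknown lang = iota
	eng
	epo
	ukr
)

// names plays the role of the Langs map for this three-language subset.
var names = map[lang]string{
	eng: "English",
	epo: "Esperanto",
	ukr: "Ukrainian",
}

// codes holds the ISO 639-3 code of each language in the subset.
var codes = map[lang]string{
	eng: "eng",
	epo: "epo",
	ukr: "ukr",
}

// String returns the human-readable name ("" for values not in names).
func (l lang) String() string { return names[l] }

// iso6393 returns the ISO 639-3 code ("" for values not in codes).
func (l lang) iso6393() string { return codes[l] }

// codeToLang is the reverse lookup. Falling back to unknown for an
// unrecognized code is an assumption of this sketch.
func codeToLang(code string) lang {
	for l, c := range codes {
		if c == code {
			return l
		}
	}
	return unknown
}

func main() {
	l := codeToLang("epo")
	fmt.Println(l, l.iso6393())
	fmt.Println(codeToLang("xxx") == unknown)
}
```

Because lang has a String method, fmt.Println prints the language name rather than the underlying integer.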

type LatinDetector

type LatinDetector struct {
	// contains filtered or unexported fields
}

func (*LatinDetector) Chars

func (sc *LatinDetector) Chars() int

func (*LatinDetector) Count

func (ld *LatinDetector) Count(s string) bool

func (*LatinDetector) Detect

func (ld *LatinDetector) Detect(text string, options Options) (Lang, float64)

func (*LatinDetector) Words

func (sc *LatinDetector) Words() int

type MalayalamDetector

type MalayalamDetector struct {
	// contains filtered or unexported fields
}

func (*MalayalamDetector) Chars

func (sc *MalayalamDetector) Chars() int

func (*MalayalamDetector) Count

func (md *MalayalamDetector) Count(s string) bool

func (*MalayalamDetector) Detect

func (md *MalayalamDetector) Detect(text string, options Options) (Lang, float64)

func (*MalayalamDetector) Words

func (sc *MalayalamDetector) Words() int

type MyanmarDetector

type MyanmarDetector struct {
	// contains filtered or unexported fields
}

func (*MyanmarDetector) Chars

func (sc *MyanmarDetector) Chars() int

func (*MyanmarDetector) Count

func (md *MyanmarDetector) Count(s string) bool

func (*MyanmarDetector) Detect

func (md *MyanmarDetector) Detect(text string, options Options) (Lang, float64)

func (*MyanmarDetector) Words

func (sc *MyanmarDetector) Words() int

type Options

type Options struct {
	Detectors []Detector
	Includes  []Lang
	Excludes  []Lang
}

Options represents the options that can be set when detecting a language and/or script, such as excluding languages from the check.
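
One plausible reading of Includes and Excludes is sketched below with local stand-in types. The rule used here (an empty Includes list allows every language, and Excludes takes precedence when a language appears in both lists) is an assumption for illustration; the package does not document how the two lists interact.

```go
package main

import "fmt"

// lang and options are reduced stand-ins for Lang and Options.
type lang string

type options struct {
	Includes []lang
	Excludes []lang
}

// contains reports whether l is present in list.
func contains(list []lang, l lang) bool {
	for _, x := range list {
		if x == l {
			return true
		}
	}
	return false
}

// allowed applies the assumed rule: excluded languages are always
// rejected; otherwise a language passes if Includes is empty or lists it.
func allowed(l lang, o options) bool {
	if contains(o.Excludes, l) {
		return false
	}
	return len(o.Includes) == 0 || contains(o.Includes, l)
}

// filter keeps only the candidates that the options allow.
func filter(candidates []lang, o options) []lang {
	var kept []lang
	for _, l := range candidates {
		if allowed(l, o) {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	candidates := []lang{"heb", "ydd", "epo", "ukr", "eng"}
	fmt.Println(filter(candidates, options{Excludes: []lang{"ydd"}}))
	fmt.Println(filter(candidates, options{Includes: []lang{"epo", "ukr"}}))
}
```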

type OriyaDetector

type OriyaDetector struct {
	// contains filtered or unexported fields
}

func (*OriyaDetector) Chars

func (sc *OriyaDetector) Chars() int

func (*OriyaDetector) Count

func (od *OriyaDetector) Count(s string) bool

func (*OriyaDetector) Detect

func (od *OriyaDetector) Detect(text string, options Options) (Lang, float64)

func (*OriyaDetector) Words

func (sc *OriyaDetector) Words() int

type SinhalaDetector

type SinhalaDetector struct {
	// contains filtered or unexported fields
}

func (*SinhalaDetector) Chars

func (sc *SinhalaDetector) Chars() int

func (*SinhalaDetector) Count

func (sd *SinhalaDetector) Count(s string) bool

func (*SinhalaDetector) Detect

func (sd *SinhalaDetector) Detect(text string, options Options) (Lang, float64)

func (*SinhalaDetector) Words

func (sc *SinhalaDetector) Words() int

type TamilDetector

type TamilDetector struct {
	// contains filtered or unexported fields
}

func (*TamilDetector) Chars

func (sc *TamilDetector) Chars() int

func (*TamilDetector) Count

func (td *TamilDetector) Count(s string) bool

func (*TamilDetector) Detect

func (td *TamilDetector) Detect(text string, options Options) (Lang, float64)

func (*TamilDetector) Words

func (sc *TamilDetector) Words() int

type TeluguDetector

type TeluguDetector struct {
	// contains filtered or unexported fields
}

func (*TeluguDetector) Chars

func (sc *TeluguDetector) Chars() int

func (*TeluguDetector) Count

func (td *TeluguDetector) Count(s string) bool

func (*TeluguDetector) Detect

func (td *TeluguDetector) Detect(text string, options Options) (Lang, float64)

func (*TeluguDetector) Words

func (sc *TeluguDetector) Words() int

type ThaiDetector

type ThaiDetector struct {
	// contains filtered or unexported fields
}

func (*ThaiDetector) Chars

func (sc *ThaiDetector) Chars() int

func (*ThaiDetector) Count

func (td *ThaiDetector) Count(s string) bool

func (*ThaiDetector) Detect

func (td *ThaiDetector) Detect(text string, options Options) (Lang, float64)

func (*ThaiDetector) Words

func (sc *ThaiDetector) Words() int
