Documentation ¶
Overview ¶
Package ldt detects natural languages and scripts (writing systems). Languages are represented by a fixed list of constants, while scripts are represented by *unicode.RangeTable.
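As an illustration of how a script can be represented by a *unicode.RangeTable, the following sketch uses only the standard library's `unicode` package. It is not this package's implementation: the `scripts` list and the `dominantScript` helper are hypothetical names introduced here, and only four scripts are covered.

```go
package main

import (
	"fmt"
	"unicode"
)

// scripts is a small, illustrative subset of writing systems, each
// identified by a range table from the standard library. This list is
// an assumption for the example, not the package's own table.
var scripts = []struct {
	name  string
	table *unicode.RangeTable
}{
	{"Latin", unicode.Latin},
	{"Cyrillic", unicode.Cyrillic},
	{"Greek", unicode.Greek},
	{"Arabic", unicode.Arabic},
}

// dominantScript (hypothetical helper) returns the name of the script
// that most runes in s belong to, or "" if no rune matches any table.
func dominantScript(s string) string {
	counts := make([]int, len(scripts))
	for _, r := range s {
		for i, sc := range scripts {
			if unicode.Is(sc.table, r) {
				counts[i]++
				break
			}
		}
	}
	best, bestCount := "", 0
	for i, c := range counts {
		if c > bestCount {
			best, bestCount = scripts[i].name, c
		}
	}
	return best
}

func main() {
	fmt.Println(dominantScript("Привет, мир"))
	fmt.Println(dominantScript("Hello, world"))
}
```

Punctuation, digits and spaces belong to none of the four tables, so they do not influence the result.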
Index ¶
- Constants
- Variables
- func LangToString(lang Lang) string
- func LangToStringShort(lang Lang) string
- type ArabicDetector
- type BengaliDetector
- type CJKDetector
- type CyrillicDetector
- type Detector
- type Detectors
- type DevanagariDetector
- type EthiopicDetector
- type GeorgianDetector
- type GreekDetector
- type GujaratiDetector
- type GurmukhiDetector
- type HebrewDetector
- type Info
- type KannadaDetector
- type KhmerDetector
- type Lang
- type LatinDetector
- type MalayalamDetector
- type MyanmarDetector
- type Options
- type OriyaDetector
- type SinhalaDetector
- type TamilDetector
- type TeluguDetector
- type ThaiDetector
Constants ¶
const ReliableConfidenceThreshold = 0.8
ReliableConfidenceThreshold is the confidence rating that has to be exceeded for the language detection to be considered reliable.
Variables ¶
var Langs = map[Lang]string{
	Afr: "Afrikaans",
	Aka: "Akan",
	Amh: "Amharic",
	Arb: "Arabic",
	Azj: "Azerbaijani",
	Bel: "Belarusian",
	Ben: "Bengali",
	Bho: "Bhojpuri",
	Bul: "Bulgarian",
	Ceb: "Cebuano",
	Ces: "Czech",
	Dan: "Danish",
	Deu: "German",
	Ell: "Greek",
	Eng: "English",
	Epo: "Esperanto",
	Est: "Estonian",
	Fin: "Finnish",
	Fra: "French",
	Guj: "Gujarati",
	Hat: "Haitian Creole",
	Hau: "Hausa",
	Heb: "Hebrew",
	Hin: "Hindi",
	Hrv: "Croatian",
	Hun: "Hungarian",
	Ibo: "Igbo",
	Ilo: "Ilocano",
	Ind: "Indonesian",
	Ita: "Italian",
	Jav: "Javanese",
	Jpn: "Japanese",
	Kan: "Kannada",
	Kat: "Georgian",
	Khm: "Khmer",
	Kin: "Kinyarwanda",
	Kor: "Korean",
	Kur: "Kurdish",
	Lav: "Latvian",
	Lit: "Lithuanian",
	Mai: "Maithili",
	Mal: "Malayalam",
	Mar: "Marathi",
	Mkd: "Macedonian",
	Mlg: "Malagasy",
	Mya: "Burmese",
	Nep: "Nepali",
	Nld: "Dutch",
	Nno: "Nynorsk",
	Nob: "Bokmal",
	Nya: "Chewa",
	Ori: "Oriya",
	Orm: "Oromo",
	Pan: "Punjabi",
	Pes: "Persian",
	Pol: "Polish",
	Por: "Portuguese",
	Ron: "Romanian",
	Run: "Rundi",
	Rus: "Russian",
	Sin: "Sinhalese",
	Skr: "Saraiki",
	Slv: "Slovene",
	Sna: "Shona",
	Som: "Somali",
	Spa: "Spanish",
	Srp: "Serbian",
	Swe: "Swedish",
	Tam: "Tamil",
	Tel: "Telugu",
	Tgl: "Tagalog",
	Tha: "Thai",
	Tir: "Tigrinya",
	Tuk: "Turkmen",
	Tur: "Turkish",
	Uig: "Uyghur",
	Ukr: "Ukrainian",
	Urd: "Urdu",
	Uzb: "Uzbek",
	Vie: "Vietnamese",
	Ydd: "Yiddish",
	Yor: "Yoruba",
	Zho: "Chinese",
	Zul: "Zulu",
}
Langs maps each Lang constant to its English language name.
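The map has no entry for the zero value Unknown, so a lookup is best done with the comma-ok form. The sketch below reproduces the pattern in miniature with local, hypothetical names (`lang`, `names`, `nameOf`) and three languages only; it does not import this package.

```go
package main

import "fmt"

// lang is a local stand-in for the package's Lang type (assumption:
// modelled here as an int enumeration starting at an unknown value).
type lang int

const (
	unknown lang = iota
	afr
	deu
	eng
)

// names mirrors the shape of Langs for a three-language subset.
// As in Langs, the unknown value deliberately has no entry.
var names = map[lang]string{
	afr: "Afrikaans",
	deu: "German",
	eng: "English",
}

// nameOf (hypothetical helper) reports the language name and whether
// the map contains the given value.
func nameOf(l lang) (string, bool) {
	n, ok := names[l]
	return n, ok
}

func main() {
	for _, l := range []lang{eng, unknown} {
		if n, ok := nameOf(l); ok {
			fmt.Println("name:", n)
		} else {
			fmt.Println("no name registered for value", int(l))
		}
	}
}
```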
Functions ¶
func LangToString ¶
LangToString converts a Lang value into its ISO 639-3 code as a string. Deprecated: LangToString is deprecated and exists for historical compatibility. Please use `Lang.Iso6393()` instead.
func LangToStringShort ¶
LangToStringShort converts a Lang value into its ISO 639-1 code as a string. It returns an empty string when there is no ISO 639-1 code. Deprecated: LangToStringShort is deprecated and exists for historical compatibility. Please use `Lang.Iso6391()` instead.
Types ¶
type ArabicDetector ¶
type ArabicDetector struct {
// contains filtered or unexported fields
}
func (*ArabicDetector) Count ¶
func (ad *ArabicDetector) Count(s string) bool
type BengaliDetector ¶
type BengaliDetector struct {
// contains filtered or unexported fields
}
func (*BengaliDetector) Count ¶
func (bd *BengaliDetector) Count(s string) bool
type CJKDetector ¶
type CJKDetector struct {
// contains filtered or unexported fields
}
func (*CJKDetector) Count ¶
func (cjk *CJKDetector) Count(s string) bool
type CyrillicDetector ¶
type CyrillicDetector struct {
// contains filtered or unexported fields
}
func (*CyrillicDetector) Count ¶
func (cd *CyrillicDetector) Count(s string) bool
type DevanagariDetector ¶
type DevanagariDetector struct {
// contains filtered or unexported fields
}
func (*DevanagariDetector) Count ¶
func (dd *DevanagariDetector) Count(s string) bool
type EthiopicDetector ¶
type EthiopicDetector struct {
// contains filtered or unexported fields
}
func (*EthiopicDetector) Count ¶
func (ed *EthiopicDetector) Count(s string) bool
type GeorgianDetector ¶
type GeorgianDetector struct {
// contains filtered or unexported fields
}
func (*GeorgianDetector) Count ¶
func (gd *GeorgianDetector) Count(s string) bool
type GreekDetector ¶
type GreekDetector struct {
// contains filtered or unexported fields
}
func (*GreekDetector) Count ¶
func (gd *GreekDetector) Count(s string) bool
type GujaratiDetector ¶
type GujaratiDetector struct {
// contains filtered or unexported fields
}
func (*GujaratiDetector) Count ¶
func (gd *GujaratiDetector) Count(s string) bool
type GurmukhiDetector ¶
type GurmukhiDetector struct {
// contains filtered or unexported fields
}
func (*GurmukhiDetector) Count ¶
func (gd *GurmukhiDetector) Count(s string) bool
type HebrewDetector ¶
type HebrewDetector struct {
// contains filtered or unexported fields
}
func (*HebrewDetector) Count ¶
func (hd *HebrewDetector) Count(s string) bool
type Info ¶
Info represents a full outcome of language detection.
func DetectWithOptions ¶
DetectWithOptions detects the language info of the given text with the provided options.
func (*Info) IsReliable ¶
IsReliable returns true if Confidence is greater than ReliableConfidenceThreshold.
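The reliability check described above can be sketched as a strict comparison against the threshold. This is a minimal illustration, not the package's code: `result` is a hypothetical stand-in for Info that models only a Confidence field (assumed here to be a float64), and the constant copies the documented value 0.8.

```go
package main

import "fmt"

// reliableConfidenceThreshold copies the documented value of
// ReliableConfidenceThreshold.
const reliableConfidenceThreshold = 0.8

// result is a hypothetical stand-in for Info; only Confidence is modelled.
type result struct {
	Confidence float64
}

// isReliable mirrors the documented rule: the confidence must be
// strictly greater than the threshold.
func (r *result) isReliable() bool {
	return r.Confidence > reliableConfidenceThreshold
}

func main() {
	for _, c := range []float64{0.95, 0.8, 0.42} {
		r := &result{Confidence: c}
		fmt.Printf("confidence %.2f reliable: %t\n", c, r.isReliable())
	}
}
```

Because the comparison is strict, a confidence of exactly 0.8 is not treated as reliable in this sketch.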
type KannadaDetector ¶
type KannadaDetector struct {
// contains filtered or unexported fields
}
func (*KannadaDetector) Count ¶
func (kd *KannadaDetector) Count(s string) bool
type KhmerDetector ¶
type KhmerDetector struct {
// contains filtered or unexported fields
}
func (*KhmerDetector) Count ¶
func (kd *KhmerDetector) Count(s string) bool
type Lang ¶
type Lang int
Lang represents a language following the ISO 639-3 standard.
const (
	Unknown Lang = iota
	Afr
	Aka
	Amh
	Arb
	Azj
	Bel
	Ben
	Bho
	Bul
	Ceb
	Ces
	Dan
	Deu
	Ell
	Eng
	Epo
	Est
	Fin
	Fra
	Guj
	Hat
	Hau
	Heb
	Hin
	Hrv
	Hun
	Ibo
	Ilo
	Ind
	Ita
	Jav
	Jpn
	Kan
	Kat
	Khm
	Kin
	Kor
	Kur
	Lav
	Lit
	Mai
	Mal
	Mar
	Mkd
	Mlg
	Mya
	Nep
	Nld
	Nno
	Nob
	Nya
	Ori
	Orm
	Pan
	Pes
	Pol
	Por
	Ron
	Run
	Rus
	Sin
	Skr
	Slv
	Sna
	Som
	Spa
	Srp
	Swe
	Tam
	Tel
	Tgl
	Tha
	Tir
	Tuk
	Tur
	Uig
	Ukr
	Urd
	Uzb
	Vie
	Ydd
	Yor
	Zho
	Zul
)
Aka ...
func CodeToLang ¶
CodeToLang returns the Lang value for the given ISO 639-3 code string.
func DetectLang ¶
DetectLang detects only the language of the given text.
func DetectLangWithOptions ¶
DetectLangWithOptions detects only the language of the given text with the provided options.
type LatinDetector ¶
type LatinDetector struct {
// contains filtered or unexported fields
}
func (*LatinDetector) Count ¶
func (ld *LatinDetector) Count(s string) bool
type MalayalamDetector ¶
type MalayalamDetector struct {
// contains filtered or unexported fields
}
func (*MalayalamDetector) Count ¶
func (md *MalayalamDetector) Count(s string) bool
type MyanmarDetector ¶
type MyanmarDetector struct {
// contains filtered or unexported fields
}
func (*MyanmarDetector) Count ¶
func (md *MyanmarDetector) Count(s string) bool
type Options ¶
Options represents options that can be set when detecting a language and/or script, such as blacklisting languages to skip checking.
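The fields of Options are not shown on this page, so the following is only a sketch of what blacklisting could look like. It assumes a hypothetical `Blacklist` field of type map[lang]bool and a hypothetical `candidates` helper; `lang` is a local string type used for readability and is not the package's Lang.

```go
package main

import "fmt"

// lang is a local stand-in for a language identifier (assumption).
type lang string

// options is a hypothetical stand-in for Options. The Blacklist field
// name and type are assumptions made for this example.
type options struct {
	Blacklist map[lang]bool
}

// candidates (hypothetical helper) returns the languages from all that
// are not blacklisted, preserving their order.
func candidates(all []lang, opts options) []lang {
	var kept []lang
	for _, l := range all {
		if opts.Blacklist[l] {
			continue // skip languages the caller asked not to check
		}
		kept = append(kept, l)
	}
	return kept
}

func main() {
	all := []lang{"eng", "deu", "nld", "afr"}
	opts := options{Blacklist: map[lang]bool{"nld": true, "afr": true}}
	for _, l := range candidates(all, opts) {
		fmt.Println(l)
	}
}
```

A nil Blacklist map is safe to read in Go, so passing an empty options value keeps every candidate.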
type OriyaDetector ¶
type OriyaDetector struct {
// contains filtered or unexported fields
}
func (*OriyaDetector) Count ¶
func (od *OriyaDetector) Count(s string) bool
type SinhalaDetector ¶
type SinhalaDetector struct {
// contains filtered or unexported fields
}
func (*SinhalaDetector) Count ¶
func (sd *SinhalaDetector) Count(s string) bool
type TamilDetector ¶
type TamilDetector struct {
// contains filtered or unexported fields
}
func (*TamilDetector) Count ¶
func (td *TamilDetector) Count(s string) bool
type TeluguDetector ¶
type TeluguDetector struct {
// contains filtered or unexported fields
}
func (*TeluguDetector) Count ¶
func (td *TeluguDetector) Count(s string) bool
type ThaiDetector ¶
type ThaiDetector struct {
// contains filtered or unexported fields
}
func (*ThaiDetector) Count ¶
func (td *ThaiDetector) Count(s string) bool