Overview ¶
Package cdt ports character set detection from ICU.
Constants ¶
This section is empty.
Variables ¶
var (
	ErrNotDetected = errors.New("cdt: charset not detected")
)
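ErrNotDetected is a sentinel error value, so callers can test for it with errors.Is and fall back to a default charset. The sketch below is a hedged, self-contained illustration only: errNotDetected is a local stand-in for the package variable, and detect and charsetOrDefault are hypothetical helpers that are not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotDetected is a local stand-in for the package's ErrNotDetected.
var errNotDetected = errors.New("cdt: charset not detected")

// detect is a hypothetical placeholder for a real detection call.
// It fails on empty input and otherwise reports a fixed charset.
func detect(data []byte) (string, error) {
	if len(data) == 0 {
		return "", fmt.Errorf("detect: %w", errNotDetected)
	}
	return "UTF-8", nil
}

// charsetOrDefault returns the detected charset, or fallback when
// detection fails with the sentinel error. Other errors are passed on.
func charsetOrDefault(data []byte, fallback string) (string, error) {
	cs, err := detect(data)
	if errors.Is(err, errNotDetected) {
		return fallback, nil
	}
	return cs, err
}

func main() {
	cs, err := charsetOrDefault(nil, "ISO-8859-1")
	fmt.Println(cs, err)
}
```

Using errors.Is rather than == keeps the check working if the error is wrapped on its way up the call stack.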
Functions ¶
This section is empty.
Types ¶
type Detector ¶
type Detector struct {
	// contains filtered or unexported fields
}
Detector implements charset detection.
Example ¶
package main

import (
	"fmt"
	// Also import the cdt package here; its import path is not shown on this page.
)

var (
	// Sample text encoded as GB18030.
	zh_gb18030_text = []byte{
		71, 111, 202, 199, 71, 111, 111, 103, 108, 101, 233, 95, 176, 108, 181, 196, 210, 187, 214, 214, 190, 142, 215, 103, 208, 205, 163, 172, 129, 75, 176, 108,
		208, 205, 163, 172, 178, 162, 190, 223, 211, 208, 192, 172, 187, 248, 187, 216, 202, 213, 185, 166, 196, 220, 181, 196, 177, 224, 179, 204, 211, 239, 209, 212,
		161, 163, 10,
	}
)

func main() {
	// Create a detector for plain text and ask for the single best match.
	detector := cdt.NewTextDetector()
	result, err := detector.DetectBest(zh_gb18030_text)
	if err == nil {
		fmt.Printf(
			"Detected charset is %s, language is %s",
			result.Charset,
			result.Language)
	}
}
Output: Detected charset is GB18030, language is zh
func NewDetector ¶
NewDetector creates a Detector for plain text or HTML.
func NewHtmlDetector ¶
func NewHtmlDetector() *Detector
NewHtmlDetector creates a Detector for HTML.
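This page does not describe how the HTML detector differs from the plain-text one. ICU's detector offers an input filter that removes markup before the bytes are analyzed, so a port may plausibly do something similar. The sketch below illustrates that idea only, under that assumption: stripTags is a hypothetical helper, not this package's implementation, and it ignores comments, scripts, and quoted attribute values.

```go
package main

import "fmt"

// stripTags returns a copy of data with every byte between '<' and the
// next '>' removed, including the delimiters themselves. It is a
// simplified illustration of filtering markup before charset detection.
func stripTags(data []byte) []byte {
	out := make([]byte, 0, len(data))
	inTag := false
	for _, b := range data {
		switch {
		case b == '<':
			inTag = true
		case b == '>' && inTag:
			inTag = false
		case !inTag:
			out = append(out, b)
		}
	}
	return out
}

func main() {
	fmt.Printf("%s\n", stripTags([]byte("<p>Hello <b>world</b></p>")))
}
```

Removing the ASCII-only markup leaves a sample dominated by the document's actual text, which is the part that carries the encoding signal.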
func NewTextDetector ¶
func NewTextDetector() *Detector
NewTextDetector creates a Detector for plain text.
type Result ¶
type Result struct {
	// IANA name of the detected charset.
	Charset string
	// IANA name of the detected language. It may be empty for some charsets.
	Language string
	// Confidence of the Result, on a scale from 1 to 100. Higher values mean greater confidence.
	Confidence int
}
Result contains all the information that the charset detector returns.
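Because Confidence is a plain integer score, a caller holding several candidate results can filter and rank them directly. The following is a minimal sketch, assuming the candidates are already available as a slice: the Result type here is a local mirror of the documented fields, and bestAbove and its threshold parameter are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Result mirrors the fields of the documented Result type.
type Result struct {
	Charset    string
	Language   string
	Confidence int
}

// bestAbove returns the candidate with the highest Confidence among
// those scoring at least threshold. ok is false if none qualifies.
func bestAbove(candidates []Result, threshold int) (best Result, ok bool) {
	for _, c := range candidates {
		if c.Confidence < threshold {
			continue
		}
		if !ok || c.Confidence > best.Confidence {
			best, ok = c, true
		}
	}
	return best, ok
}

func main() {
	candidates := []Result{
		{Charset: "Big5", Language: "zh", Confidence: 20},
		{Charset: "GB18030", Language: "zh", Confidence: 85},
	}
	if r, ok := bestAbove(candidates, 50); ok {
		fmt.Printf("%s (%s), confidence %d\n", r.Charset, r.Language, r.Confidence)
	}
}
```

A threshold lets the caller treat low-confidence guesses as "unknown" instead of acting on them; the right cutoff depends on the input and is not specified by this page.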