cdt

package
v1.2.9
Published: Nov 25, 2025 | License: MIT | Imports: 5 | Imported by: 0

README

cdt

cdt is a library for automatically detecting the charset of text in the Go programming language. It is based on the algorithm and data in ICU's implementation.

This package was created by saintfish and later forked by the pango project to incorporate bug fixes and new features.

Documentation

Overview

Package cdt ports character set detection from ICU.

Constants

This section is empty.

Variables

var (
	ErrNotDetected = errors.New("cdt: charset not detected")
)
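
A caller can compare a returned error against this sentinel with errors.Is. The sketch below is self-contained, so it uses a local stand-in (errNotDetected) and a hypothetical detect function; neither is part of cdt, and the assumption that the detection methods return this error when nothing matches is not stated in the documentation above. In real code the comparison would be against ErrNotDetected itself.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotDetected is a local stand-in for the package's ErrNotDetected.
var errNotDetected = errors.New("cdt: charset not detected")

// detect is a hypothetical detection call used only for illustration:
// it fails on empty input and otherwise reports a fixed charset name.
func detect(b []byte) (string, error) {
	if len(b) == 0 {
		// Wrapping with %w keeps the sentinel reachable through errors.Is.
		return "", fmt.Errorf("detect: %w", errNotDetected)
	}
	return "UTF-8", nil
}

// describe turns the outcome of detect into a short message.
func describe(b []byte) string {
	charset, err := detect(b)
	switch {
	case errors.Is(err, errNotDetected):
		return "no charset detected"
	case err != nil:
		return "unexpected error: " + err.Error()
	default:
		return "charset " + charset
	}
}

func main() {
	fmt.Println(describe(nil))
	fmt.Println(describe([]byte("hello")))
}
```

Using errors.Is rather than == keeps the check working if the error is ever wrapped on its way to the caller.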

Functions

This section is empty.

Types

type Detector

type Detector struct {
	// contains filtered or unexported fields
}

Detector implements charset detection.

Example
package main

import (
	"fmt"
)

var (
	zh_gb18030_text = []byte{
		71, 111, 202, 199, 71, 111, 111, 103, 108, 101, 233, 95, 176, 108, 181, 196, 210, 187, 214, 214, 190, 142, 215, 103, 208, 205, 163, 172, 129, 75, 176, 108,
		208, 205, 163, 172, 178, 162, 190, 223, 211, 208, 192, 172, 187, 248, 187, 216, 202, 213, 185, 166, 196, 220, 181, 196, 177, 224, 179, 204, 211, 239, 209, 212,
		161, 163, 10,
	}
)

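func main() {
	// NewTextDetector is unqualified because this example appears to be written
	// from inside package cdt; other packages would call it through their import of cdt.
	detector := NewTextDetector()
	result, err := detector.DetectBest(zh_gb18030_text)
	if err != nil {
		// Report the failure instead of silently printing nothing.
		fmt.Println("detection failed:", err)
		return
	}
	fmt.Printf(
		"Detected charset is %s, language is %s",
		result.Charset,
		result.Language)
}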
Output:

Detected charset is GB18030, language is zh

func NewDetector

func NewDetector(html ...bool) (d *Detector)

NewDetector creates a Detector for plain text or HTML.
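
The variadic html parameter lets the flag be omitted entirely. How the package interprets it is not documented here; the sketch below shows one plausible reading, in which only the first value is consulted and no value means plain text. The detector type and newDetector function are hypothetical stand-ins, not the package's code.

```go
package main

import "fmt"

// detector is a hypothetical stand-in for Detector.
type detector struct {
	html bool // assumed meaning: treat the input as HTML
}

// newDetector mirrors the shape of NewDetector(html ...bool).
// Assumption: only the first value is used, and omitting it selects plain text.
func newDetector(html ...bool) *detector {
	d := &detector{}
	if len(html) > 0 {
		d.html = html[0]
	}
	return d
}

func main() {
	fmt.Println(newDetector().html)      // flag omitted
	fmt.Println(newDetector(true).html)  // HTML requested
	fmt.Println(newDetector(false).html) // plain text requested explicitly
}
```

When the intent is fixed at the call site, NewTextDetector and NewHtmlDetector state it more directly than a bare boolean.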

func NewHtmlDetector

func NewHtmlDetector() *Detector

NewHtmlDetector creates a Detector for HTML.

func NewTextDetector

func NewTextDetector() *Detector

NewTextDetector creates a Detector for plain text.

func (*Detector) DetectAll

func (d *Detector) DetectAll(b []byte) ([]Result, error)

DetectAll returns all Results which have non-zero Confidence. The Results are sorted by Confidence in descending order.
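
The contract described above (drop zero-confidence results, then order by Confidence, highest first) can be sketched on its own. The result type below only mirrors the documented fields of Result, the rank function is a hypothetical helper, and the candidate values are made up for illustration; this is not the package's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// result mirrors the documented fields of Result.
type result struct {
	Charset    string
	Language   string
	Confidence int
}

// rank drops candidates with zero Confidence and sorts the rest
// by Confidence in descending order.
func rank(candidates []result) []result {
	kept := make([]result, 0, len(candidates))
	for _, c := range candidates {
		if c.Confidence > 0 {
			kept = append(kept, c)
		}
	}
	// A stable sort keeps equally confident candidates in input order.
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Confidence > kept[j].Confidence
	})
	return kept
}

func main() {
	// Made-up candidates, for illustration only.
	ranked := rank([]result{
		{Charset: "ISO-8859-1", Language: "en", Confidence: 20},
		{Charset: "Shift_JIS", Language: "ja", Confidence: 0},
		{Charset: "GB18030", Language: "zh", Confidence: 100},
	})
	for _, r := range ranked {
		fmt.Printf("%s %s %d\n", r.Charset, r.Language, r.Confidence)
	}
}
```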

func (*Detector) DetectBest

func (d *Detector) DetectBest(b []byte) (*Result, error)

DetectBest returns the Result with the highest Confidence.

type Result

type Result struct {
	// IANA name of the detected charset.
	Charset string
	// IANA name of the detected language. It may be empty for some charsets.
	Language string
	// Confidence of the Result. Scale from 1 to 100. The bigger, the more confident.
	Confidence int
}

Result contains all the information that the charset detector provides.
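
Because Language may be empty and Confidence is a score rather than a guarantee, callers may want to guard both before acting on a Result. The following is a minimal sketch under those assumptions; the result type mirrors the documented fields, while accept, label, and the threshold of 50 are hypothetical choices, not part of the package.

```go
package main

import "fmt"

// result mirrors the documented fields of Result.
type result struct {
	Charset    string
	Language   string
	Confidence int
}

// accept reports whether r is confident enough to act on.
// minConfidence is a caller-chosen threshold on the documented 1-100 scale.
func accept(r *result, minConfidence int) bool {
	return r != nil && r.Confidence >= minConfidence
}

// label formats a result, omitting the language when it is empty.
func label(r result) string {
	if r.Language == "" {
		return r.Charset
	}
	return r.Charset + " (" + r.Language + ")"
}

func main() {
	// Made-up value, for illustration only.
	r := &result{Charset: "GB18030", Language: "zh", Confidence: 100}
	if accept(r, 50) {
		fmt.Println("using", label(*r))
	} else {
		fmt.Println("confidence too low, falling back to a default charset")
	}
}
```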
