Documentation ¶
Index ¶
- Variables
- func AddFilter(filter Filter)
- func ApplySegmentConfig(c *config.Config)
- func CleanStopWords(v string) string
- func CleanStopWordsFromSlice(v []string) (r []string)
- func DoFilter(v string) bool
- func Has(name string) bool
- func IsInitialized() bool
- func IsNop(segment Segment) bool
- func LoadStopWordsDict(stopWordsFile string, args ...bool)
- func NewAPI(apiURL string, apiKey string) *apiSegment
- func Register(name string, c func() Segment)
- func ReloadDict(dictFiles ...string) error
- func ResetSegment()
- func ResetStopwords()
- func SplitWords(b []byte, args ...string) []string
- func SplitWordsAsString(b []byte, args ...string) string
- func SplitWordsBy(b []byte, mode string, args ...string) []string
- func StopWords() []string
- func Unregister(name string)
- type Filter
- type Segment
Constants ¶
This section is empty.
Variables ¶
var (
	DefaultEngine = atomic.NewString(`sego`) // gojieba / sego / jiebago
	Filters       []Filter
)
var (
	// DictFile is the word segmentation dictionary file
	DictFile string
)
Functions ¶
func AddFilter ¶
func AddFilter(filter Filter)
AddFilter appends a new filter to the global Filters collection.
func ApplySegmentConfig ¶
func ApplySegmentConfig(c *config.Config)
ApplySegmentConfig applies segment configuration from the given config object. It initializes or updates the segment engine based on the configuration. If the engine is 'api', it will register a new API segment with the provided URL and key. The function handles engine switching and cleanup of previous segment instances.
func CleanStopWords ¶
func CleanStopWords(v string) string
CleanStopWords replaces every stop word in the input string with a space and returns the cleaned string.
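The replacement behavior can be sketched as follows. This is a minimal illustration, not the package's code: `cleanStopWords` and its explicit stop-word list are hypothetical, and plain substring matching is an assumption, since the documentation does not say how stop words are matched.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanStopWords is a hypothetical stand-in for CleanStopWords. It replaces
// each occurrence of a stop word with a single space. Substring matching is
// an assumption; the package may match on word boundaries instead.
func cleanStopWords(v string, stopWords []string) string {
	for _, w := range stopWords {
		if w == "" {
			continue // an empty pattern would insert spaces between every rune
		}
		v = strings.ReplaceAll(v, w, " ")
	}
	return v
}

func main() {
	fmt.Printf("%q\n", cleanStopWords("我的书和你的笔", []string{"的", "和"}))
}
```

Because stop words become spaces rather than being deleted, the neighboring words stay separated in the result.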
func CleanStopWordsFromSlice ¶
func CleanStopWordsFromSlice(v []string) (r []string)
CleanStopWordsFromSlice removes stop words from the input string slice and returns a new slice containing only non-stop words. Stop words are loaded from StopWords() if not already initialized.
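A rough sketch of the slice variant, assuming stop words are held in a set for constant-time lookup. The function name and the explicit `stop` parameter are hypothetical; the real function takes only the slice and consults the package's own stop-word list.

```go
package main

import "fmt"

// cleanStopWordsFromSlice is a hypothetical stand-in for
// CleanStopWordsFromSlice. It copies every word that is not in the
// stop-word set into a new slice, preserving the original order.
func cleanStopWordsFromSlice(words []string, stop map[string]struct{}) []string {
	kept := make([]string, 0, len(words))
	for _, w := range words {
		if _, isStop := stop[w]; !isStop {
			kept = append(kept, w)
		}
	}
	return kept
}

func main() {
	stop := map[string]struct{}{"the": {}, "a": {}}
	fmt.Println(cleanStopWordsFromSlice([]string{"the", "quick", "a", "fox"}, stop))
}
```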
func DoFilter ¶
func DoFilter(v string) bool
DoFilter checks if the given string passes all registered filters. Returns true if all filters accept the string, false otherwise.
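The AddFilter and DoFilter pair can be pictured as a simple predicate chain. The sketch below assumes that Filter is a function of the form `func(string) bool`; the actual definition of Filter is not shown in this documentation, so treat the type and the lowercase helper names as hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Filter is assumed here to be a predicate on a word. The package's actual
// definition is not shown in this documentation.
type Filter func(string) bool

// filters mirrors the global Filters collection.
var filters []Filter

// addFilter appends a filter, as AddFilter is described to do.
func addFilter(f Filter) {
	filters = append(filters, f)
}

// doFilter reports whether every registered filter accepts v.
// In this sketch an empty filter list accepts everything.
func doFilter(v string) bool {
	for _, f := range filters {
		if !f(v) {
			return false
		}
	}
	return true
}

func main() {
	addFilter(func(s string) bool { return s != "" })
	addFilter(func(s string) bool { return utf8.RuneCountInString(s) > 1 })
	fmt.Println(doFilter("分词"), doFilter("a"))
}
```

The chain stops at the first rejecting filter, so cheap checks are best registered first.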
func IsInitialized ¶
func IsInitialized() bool
IsInitialized reports whether the default segment has been initialized.
func LoadStopWordsDict ¶
func LoadStopWordsDict(stopWordsFile string, args ...bool)
LoadStopWordsDict loads stop words from the specified file into memory. If rebuild is true, it clears existing stop words before loading new ones. Non-empty lines in the file are treated as stop words after trimming whitespace.
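The line-handling rule above can be sketched like this. The helper `parseStopWords` is hypothetical and takes the file content as a string so that it runs standalone; the real function reads from a file path and keeps its result in package state.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseStopWords is a hypothetical sketch of the loading rule. Each line is
// trimmed, blank lines are skipped, and the rest are added to the set.
// When rebuild is true, the existing set is discarded first.
func parseStopWords(content string, existing map[string]struct{}, rebuild bool) map[string]struct{} {
	if rebuild || existing == nil {
		existing = make(map[string]struct{})
	}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		word := strings.TrimSpace(sc.Text())
		if word == "" {
			continue // blank lines are not stop words
		}
		existing[word] = struct{}{}
	}
	// sc.Err() is ignored in this sketch; a file-based loader should check it.
	return existing
}

func main() {
	words := parseStopWords("的\n\n  and  \nthe\n", nil, false)
	fmt.Println(len(words))
	words = parseStopWords("or\n", words, true) // rebuild drops the earlier entries
	fmt.Println(len(words))
}
```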
func ResetSegment ¶
func ResetSegment()
ResetSegment closes the current segment if it exists and resets the segment initialization state. This allows for a new segment to be created on next use.
func ResetStopwords ¶
func ResetStopwords()
ResetStopwords resets the stopwords initialization state, allowing stopwords to be reloaded on next use.
func SplitWordsAsString ¶
func SplitWordsAsString(b []byte, args ...string) string
SplitWordsAsString returns the segmentation result as a string.
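func SplitWordsBy ¶
func SplitWordsBy(b []byte, mode string, args ...string) []string
SplitWordsBy segments the text using the given mode.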
func StopWords ¶
func StopWords() []string
StopWords returns the list of stop words that are loaded once and cached. The stop words are initialized on first call and returned from cache on subsequent calls.
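The load-once behavior of StopWords, together with the reset offered by ResetStopwords, can be modeled as a small resettable cache. This is a sketch under assumptions: the `stopWordCache` type and its loader callback are hypothetical, and the package may well use a different synchronization primitive.

```go
package main

import (
	"fmt"
	"sync"
)

// stopWordCache is a hypothetical resettable cache. The loader runs on the
// first get, later gets return the cached slice, and reset forces a reload.
type stopWordCache struct {
	mu     sync.Mutex
	loaded bool
	words  []string
	load   func() []string
}

// get returns the cached words, loading them on first use.
func (c *stopWordCache) get() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.loaded {
		c.words = c.load()
		c.loaded = true
	}
	return c.words
}

// reset clears the initialization state so the next get reloads.
func (c *stopWordCache) reset() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.loaded = false
	c.words = nil
}

func main() {
	calls := 0
	cache := &stopWordCache{load: func() []string {
		calls++
		return []string{"the", "a"}
	}}
	cache.get()
	cache.get() // served from cache, loader not called again
	cache.reset()
	cache.get() // loader runs again after the reset
	fmt.Println(calls)
}
```

A mutex with a flag is used instead of `sync.Once` because a `sync.Once` value cannot be re-armed without replacing it.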
func Unregister ¶
func Unregister(name string)
Unregister removes the segment with the specified name from the registry.
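Register, Has, and Unregister suggest a registry keyed by engine name. A minimal sketch, assuming a mutex-guarded map of constructors; the lowercase names and the map itself are hypothetical and not taken from the package source.

```go
package main

import (
	"fmt"
	"sync"
)

// Segment repeats the interface listed under Types.
type Segment interface {
	LoadDict(string, ...string) error
	Segment(string, ...string) []string
	SegmentBy(string, string, ...string) []string
	Close() error
}

// registry is a hypothetical name-to-constructor map guarded by a mutex.
var (
	mu       sync.RWMutex
	registry = map[string]func() Segment{}
)

// register stores a constructor under name, replacing any earlier entry.
func register(name string, c func() Segment) {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = c
}

// has reports whether a constructor is registered under name.
func has(name string) bool {
	mu.RLock()
	defer mu.RUnlock()
	_, ok := registry[name]
	return ok
}

// unregister removes the entry for name; unknown names are ignored.
func unregister(name string) {
	mu.Lock()
	defer mu.Unlock()
	delete(registry, name)
}

func main() {
	register("demo", func() Segment { return nil }) // placeholder constructor
	fmt.Println(has("demo"))
	unregister("demo")
	fmt.Println(has("demo"))
}
```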
Types ¶
type Segment ¶
type Segment interface {
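// LoadDict loads a dictionary (dictionary path, dictionary type)
LoadDict(string, ...string) error
// Segment splits text into words (text, part of speech)
Segment(string, ...string) []string
// SegmentBy splits text into words (text, segmentation mode, part of speech)
SegmentBy(string, string, ...string) []string
// Close closes the segment or releases its resources
Close() error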
Close() error
}
Segment is the interface implemented by segmentation engines.
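To show what satisfying the interface involves, here is a minimal sketch of a custom engine. `spaceSegment` is hypothetical: it splits on whitespace only and ignores dictionaries, modes, and part-of-speech arguments, so it is not a substitute for the sego, gojieba, or jiebago engines.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment repeats the interface documented above.
type Segment interface {
	LoadDict(string, ...string) error
	Segment(string, ...string) []string
	SegmentBy(string, string, ...string) []string
	Close() error
}

// spaceSegment is a hypothetical engine that splits on whitespace.
type spaceSegment struct{}

// LoadDict does nothing because this engine uses no dictionary.
func (spaceSegment) LoadDict(string, ...string) error { return nil }

// Segment splits text on runs of whitespace.
func (spaceSegment) Segment(text string, _ ...string) []string {
	return strings.Fields(text)
}

// SegmentBy ignores the mode and delegates to Segment.
func (s spaceSegment) SegmentBy(text string, _ string, args ...string) []string {
	return s.Segment(text, args...)
}

// Close has no resources to release.
func (spaceSegment) Close() error { return nil }

// Compile-time check that spaceSegment satisfies Segment.
var _ Segment = spaceSegment{}

func main() {
	var seg Segment = spaceSegment{}
	defer seg.Close()
	fmt.Println(seg.Segment("hello segment world"))
}
```

An engine like this could presumably be made available by passing a constructor such as `func() Segment { return spaceSegment{} }` to Register.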