classifier

package
v0.23.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 21, 2026 | License: MIT | Imports: 9 | Imported by: 0

Documentation

Constants

const (
	EmbeddedArtifactPath   = "data/cue-linear.json"
	EmbeddedArtifactSHA256 = "0e54d9f4199aa3b7a46c7c7feed7e5e2d3ebe78b98421b9502e1ba670b78608c"
)

Embedded artifact metadata used for checksum and loading validation.
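
As a rough illustration of how a SHA-256 constant like this is typically used, the sketch below compares the digest of a byte slice against an expected hex string. The helper name `verifyChecksum` is hypothetical, and the package's actual validation logic is not shown on this page; this is only one plausible shape for it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifyChecksum is a hypothetical helper (not part of the package API).
// It returns an error when the SHA-256 digest of data differs from wantHex,
// which is expected to be a lowercase hex string.
func verifyChecksum(data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	if got != wantHex {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	// SHA-256 of the three bytes "abc", a standard test vector.
	const want = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	fmt.Println(verifyChecksum([]byte("abc"), want) == nil) // prints "true"
}
```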

Variables

This section is empty.

Functions

func AvgWordLength

func AvgWordLength(tokens []string) float64

AvgWordLength returns the mean character length of tokens. Returns 0.0 for an empty slice.
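
A minimal sketch of the documented behavior, assuming "character length" means Unicode code points rather than bytes (the real function may count differently). The lowercase name `avgWordLength` marks this as an illustration, not the package's source.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// avgWordLength is an illustrative re-implementation: the mean rune count
// per token, or 0.0 when there are no tokens.
func avgWordLength(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	total := 0
	for _, t := range tokens {
		total += utf8.RuneCountInString(t) // assumption: runes, not bytes
	}
	return float64(total) / float64(len(tokens))
}

func main() {
	fmt.Println(avgWordLength([]string{"go", "rust"})) // prints 3
}
```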

func CompressionRatio

func CompressionRatio(tokens []string) float64

CompressionRatio estimates text redundancy using bigram repetition. It returns the fraction of repeated token bigrams. Higher values indicate more repetitive (redundant) text. Returns 0.0 if tokens has fewer than 2 entries.
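
One way to read "fraction of repeated token bigrams" is: of all adjacent token pairs, the share that has already appeared earlier in the sequence. The sketch below implements that reading; the exact definition used by the package is an assumption here, and `bigramRepetition` is a hypothetical name.

```go
package main

import "fmt"

// bigramRepetition counts adjacent token pairs that were already seen
// earlier and divides by the total number of pairs (len(tokens) - 1).
// It returns 0.0 when there are fewer than two tokens.
func bigramRepetition(tokens []string) float64 {
	if len(tokens) < 2 {
		return 0.0
	}
	seen := make(map[[2]string]struct{})
	repeated := 0
	for i := 0; i+1 < len(tokens); i++ {
		bg := [2]string{tokens[i], tokens[i+1]}
		if _, ok := seen[bg]; ok {
			repeated++
		} else {
			seen[bg] = struct{}{}
		}
	}
	return float64(repeated) / float64(len(tokens)-1)
}

func main() {
	// Bigrams: (a,b) (b,a) (a,b) (b,a); the last two repeat earlier pairs.
	fmt.Println(bigramRepetition([]string{"a", "b", "a", "b", "a"})) // prints 0.5
}
```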

func EmbeddedArtifactBytes

func EmbeddedArtifactBytes() int

EmbeddedArtifactBytes returns the embedded artifact size in bytes.

func FuncWordRatio

func FuncWordRatio(tokens []string) float64

FuncWordRatio returns the fraction of tokens that are function words (determiners, prepositions, conjunctions, pronouns). Returns 0.0 for an empty slice.
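
The sketch below shows the general shape of such a ratio. The six-word list and the case-insensitive match are assumptions made for the example; the package's real function-word list is loaded from its artifact and is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// functionWords is a tiny hypothetical list used only for this example.
var functionWords = map[string]struct{}{
	"the": {}, "a": {}, "of": {}, "and": {}, "it": {}, "in": {},
}

// funcWordRatio returns the share of tokens found in functionWords,
// matching case-insensitively, or 0.0 for an empty slice.
func funcWordRatio(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	n := 0
	for _, t := range tokens {
		if _, ok := functionWords[strings.ToLower(t)]; ok {
			n++
		}
	}
	return float64(n) / float64(len(tokens))
}

func main() {
	tokens := []string{"The", "cat", "sat", "in", "a", "box"}
	fmt.Println(funcWordRatio(tokens)) // prints 0.5
}
```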

func LyAdverbDensity

func LyAdverbDensity(tokens []string) float64

LyAdverbDensity returns the fraction of tokens that end in "ly" and are at least 4 characters long (the length check excludes short words like "fly"). Returns 0.0 for an empty slice.
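
A minimal sketch of the suffix-and-length rule described above. It assumes tokens are already lowercased and measures length in bytes, which matches character count only for ASCII input; `lyDensity` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// lyDensity returns the share of tokens that end in "ly" and have a
// length of at least 4, or 0.0 for an empty slice.
func lyDensity(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	n := 0
	for _, t := range tokens {
		if len(t) >= 4 && strings.HasSuffix(t, "ly") {
			n++
		}
	}
	return float64(n) / float64(len(tokens))
}

func main() {
	// "quickly" and "only" qualify; "fly" is too short; "runs" lacks the suffix.
	fmt.Println(lyDensity([]string{"quickly", "fly", "runs", "only"})) // prints 0.5
}
```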

func NominalDensity

func NominalDensity(tokens []string) float64

NominalDensity returns the fraction of tokens ending in common nominalization suffixes (-tion, -ment, -ness, -ity, -ance, -ence). Returns 0.0 for an empty slice.
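
The suffix check can be sketched as follows, using the six suffixes listed in the description. The sketch assumes lowercased tokens and applies no minimum length, both of which may differ from the package's behavior; `nominalDensity` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// nominalSuffixes mirrors the suffixes named in the documentation.
var nominalSuffixes = []string{"tion", "ment", "ness", "ity", "ance", "ence"}

// nominalDensity returns the share of tokens ending in any of
// nominalSuffixes, or 0.0 for an empty slice.
func nominalDensity(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	n := 0
	for _, t := range tokens {
		for _, s := range nominalSuffixes {
			if strings.HasSuffix(t, s) {
				n++
				break // count each token at most once
			}
		}
	}
	return float64(n) / float64(len(tokens))
}

func main() {
	tokens := []string{"information", "is", "a", "statement"}
	fmt.Println(nominalDensity(tokens)) // prints 0.5
}
```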

func SentLenVariance

func SentLenVariance(text string) float64

SentLenVariance splits text into sentences on `.`, `!`, and `?`, then returns the coefficient of variation (stddev / mean) of the per-sentence word counts. Despite the name, the result is a unitless ratio rather than a variance. Returns 0.0 when fewer than 2 sentences are found.
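
The computation can be sketched as below. Two details are assumptions not stated in the documentation: words are counted by whitespace splitting, and the standard deviation is the population form (dividing by n rather than n - 1). `sentLenCV` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// sentLenCV splits text on '.', '!' and '?', counts whitespace-separated
// words per non-empty sentence, and returns stddev/mean of those counts.
// It returns 0.0 when fewer than two sentences are found.
func sentLenCV(text string) float64 {
	parts := strings.FieldsFunc(text, func(r rune) bool {
		return r == '.' || r == '!' || r == '?'
	})
	var counts []float64
	for _, p := range parts {
		if n := len(strings.Fields(p)); n > 0 {
			counts = append(counts, float64(n))
		}
	}
	if len(counts) < 2 {
		return 0.0
	}
	var sum float64
	for _, c := range counts {
		sum += c
	}
	mean := sum / float64(len(counts))
	var sq float64
	for _, c := range counts {
		d := c - mean
		sq += d * d
	}
	// Population standard deviation (assumption) divided by the mean.
	return math.Sqrt(sq/float64(len(counts))) / mean
}

func main() {
	// Word counts are 3 and 1: mean 2, stddev 1, ratio 0.5.
	fmt.Println(sentLenCV("One two three. One.")) // prints 0.5
}
```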

func TypeTokenRatio

func TypeTokenRatio(tokens []string) float64

TypeTokenRatio returns the ratio of unique tokens to total tokens. Higher values indicate more varied vocabulary. Returns 0.0 for an empty slice.
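
A minimal sketch of the ratio, assuming tokens are compared as exact strings with no case folding (the package may normalize differently). `typeTokenRatio` is a hypothetical name.

```go
package main

import "fmt"

// typeTokenRatio returns the number of distinct tokens divided by the
// total number of tokens, or 0.0 for an empty slice.
func typeTokenRatio(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	unique := make(map[string]struct{}, len(tokens))
	for _, t := range tokens {
		unique[t] = struct{}{}
	}
	return float64(len(unique)) / float64(len(tokens))
}

func main() {
	// Three distinct tokens ("a", "b", "c") out of four.
	fmt.Println(typeTokenRatio([]string{"a", "b", "a", "c"})) // prints 0.75
}
```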

Types

type LexiconCounts

type LexiconCounts struct {
	FillerWords    int
	ModalWords     int
	VagueWords     int
	ActionWords    int
	StopWords      int
	HedgePhrases   int
	VerbosePhrases int
}

LexiconCounts captures loaded cue-list sizes for quality and drift checks.

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model is a deterministic linear classifier loaded from an embedded artifact.

func LoadEmbedded

func LoadEmbedded() (*Model, error)

LoadEmbedded loads and verifies the embedded model artifact.

func (*Model) Classify

func (m *Model) Classify(text string) Result

Classify computes a deterministic risk score and binary label.
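
To make "deterministic linear classifier" concrete, here is a generic sketch of scoring features with a weighted sum and comparing the result to a threshold. Everything in it is an assumption for illustration: the type names, the logistic squashing, the `>=` comparison, and the "high"/"low" labels are not taken from the package. Features are summed in sorted key order because Go map iteration order is randomized and floating-point addition is order-sensitive.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// linearModel and decision are hypothetical types for this sketch only.
type linearModel struct {
	weights   map[string]float64
	bias      float64
	threshold float64
}

type decision struct {
	Label     string
	RiskScore float64
	Threshold float64
}

// classify computes bias + sum(weight*feature), maps it to (0,1) with a
// logistic function, and labels the input by comparing to the threshold.
func (m linearModel) classify(features map[string]float64) decision {
	keys := make([]string, 0, len(features))
	for k := range features {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed summation order keeps the score reproducible

	z := m.bias
	for _, k := range keys {
		z += m.weights[k] * features[k] // unknown features get weight 0
	}
	score := 1 / (1 + math.Exp(-z))

	label := "low"
	if score >= m.threshold {
		label = "high"
	}
	return decision{Label: label, RiskScore: score, Threshold: m.threshold}
}

func main() {
	m := linearModel{weights: map[string]float64{"x": 2}, threshold: 0.5}
	fmt.Println(m.classify(map[string]float64{"x": 0}).Label)  // prints "high"
	fmt.Println(m.classify(map[string]float64{"x": -5}).Label) // prints "low"
}
```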

func (*Model) LexiconCounts

func (m *Model) LexiconCounts() LexiconCounts

LexiconCounts returns the active cue-list sizes loaded from the artifact.

func (*Model) ModelID

func (m *Model) ModelID() string

ModelID returns the artifact model identifier.

func (*Model) Threshold

func (m *Model) Threshold() float64

Threshold returns the decision threshold.

func (*Model) Version

func (m *Model) Version() string

Version returns the artifact version identifier.

type Result

type Result struct {
	Label          string
	RiskScore      float64
	Threshold      float64
	ModelID        string
	Backend        string
	Version        string
	TriggeredCues  []string
	FeatureSummary map[string]float64
	WordCount      int
}

Result is the classifier decision contract for one paragraph.
