package classifier

Version: v0.23.0 | Published: May 21, 2026 | License: MIT | Imports: 9 | Imported by: 0

Repository

github.com/jeduden/mdsmith

Documentation ¶

Index ¶

Constants
func AvgWordLength(tokens []string) float64
func CompressionRatio(tokens []string) float64
func EmbeddedArtifactBytes() int
func FuncWordRatio(tokens []string) float64
func LyAdverbDensity(tokens []string) float64
func NominalDensity(tokens []string) float64
func SentLenVariance(text string) float64
func TypeTokenRatio(tokens []string) float64
type LexiconCounts
type Model
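- func LoadEmbedded() (*Model, error)
- func (m *Model) Classify(text string) Result
- func (m *Model) LexiconCounts() LexiconCounts
- func (m *Model) ModelID() string
- func (m *Model) Threshold() float64
- func (m *Model) Version() string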
type Result

Constants ¶

const (
	EmbeddedArtifactPath   = "data/cue-linear.json"
	EmbeddedArtifactSHA256 = "0e54d9f4199aa3b7a46c7c7feed7e5e2d3ebe78b98421b9502e1ba670b78608c"
)

Embedded artifact metadata used for checksum and loading validation.
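The checksum constant suggests that the artifact bytes are hashed and compared before loading. The following is a minimal sketch of that kind of check, not the package's own code: `hexDigest` and `verifyArtifact` are hypothetical helpers, and the sample bytes are a stand-in for the real artifact.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hexDigest returns the lowercase hex SHA-256 digest of data.
func hexDigest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verifyArtifact returns an error when data does not hash to wantHex.
// Hypothetical helper: the package's real verification logic is not
// shown in these docs.
func verifyArtifact(data []byte, wantHex string) error {
	if got := hexDigest(data); got != wantHex {
		return fmt.Errorf("artifact checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	// Placeholder bytes, not the real data/cue-linear.json content.
	data := []byte(`{"example":"stand-in artifact"}`)
	want := hexDigest(data)

	fmt.Println(verifyArtifact(data, want) == nil)               // true
	fmt.Println(verifyArtifact(append(data, '\n'), want) == nil) // false
}
```

In real use, `wantHex` would presumably be `EmbeddedArtifactSHA256` and `data` the embedded file contents.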

Variables ¶

This section is empty.

Functions ¶

func AvgWordLength ¶

func AvgWordLength(tokens []string) float64

AvgWordLength returns the mean character length of tokens. Returns 0.0 for an empty slice.
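A standalone sketch of the documented behavior, for illustration only. The helper name `avgWordLength` is hypothetical, and counting length in runes rather than bytes is an assumption; the docs do not say which the package uses.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// avgWordLength is a stand-in for AvgWordLength: the mean token length,
// counted in runes (an assumption), or 0.0 for an empty slice.
func avgWordLength(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	total := 0
	for _, t := range tokens {
		total += utf8.RuneCountInString(t)
	}
	return float64(total) / float64(len(tokens))
}

func main() {
	fmt.Println(avgWordLength([]string{"go", "code"})) // (2 + 4) / 2 = 3
	fmt.Println(avgWordLength(nil))                     // 0
}
```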

func CompressionRatio ¶

func CompressionRatio(tokens []string) float64

CompressionRatio estimates text redundancy as the fraction of token bigrams that are repeated. Higher values indicate more repetitive text. It returns 0.0 if tokens has fewer than 2 entries.
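The docs do not define exactly how a "repeated" bigram is counted. The sketch below picks one plausible reading, as an assumption: each bigram occurrence that duplicates an earlier bigram counts as repeated. The name `bigramRepetition` is hypothetical and the package may count differently.

```go
package main

import (
	"fmt"
	"strings"
)

// bigramRepetition returns the fraction of adjacent token pairs that
// have already appeared earlier in the sequence. This is one assumed
// reading of "fraction of repeated token bigrams".
func bigramRepetition(tokens []string) float64 {
	if len(tokens) < 2 {
		return 0.0
	}
	seen := make(map[[2]string]bool)
	total := len(tokens) - 1
	repeated := 0
	for i := 0; i < total; i++ {
		bigram := [2]string{tokens[i], tokens[i+1]}
		if seen[bigram] {
			repeated++
		}
		seen[bigram] = true
	}
	return float64(repeated) / float64(total)
}

func main() {
	// Bigrams: (a b) (b a) (a b) (b a); the last two repeat earlier ones.
	fmt.Println(bigramRepetition(strings.Fields("a b a b a"))) // 0.5
	fmt.Println(bigramRepetition(strings.Fields("a b c")))     // 0
}
```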

func EmbeddedArtifactBytes ¶

func EmbeddedArtifactBytes() int

EmbeddedArtifactBytes returns the embedded artifact size in bytes.

func FuncWordRatio ¶

func FuncWordRatio(tokens []string) float64

FuncWordRatio returns the fraction of tokens that are function words (determiners, prepositions, conjunctions, pronouns). Returns 0.0 for an empty slice.
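As a rough illustration, the ratio can be computed with a set lookup. The word list below is a tiny hypothetical sample, not the package's list, and lower-casing tokens before lookup is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// functionWords is a small illustrative sample of determiners,
// prepositions, conjunctions, and pronouns. The package's actual
// list is not shown in these docs.
var functionWords = map[string]bool{
	"the": true, "a": true, "of": true, "in": true,
	"and": true, "but": true, "it": true, "they": true,
}

// funcWordRatio returns the fraction of tokens found in functionWords,
// or 0.0 for an empty slice.
func funcWordRatio(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	n := 0
	for _, t := range tokens {
		if functionWords[strings.ToLower(t)] {
			n++
		}
	}
	return float64(n) / float64(len(tokens))
}

func main() {
	tokens := strings.Fields("The cat sat in the box")
	fmt.Println(funcWordRatio(tokens)) // 3 of 6 tokens match: 0.5
}
```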

func LyAdverbDensity ¶

func LyAdverbDensity(tokens []string) float64

LyAdverbDensity returns the fraction of tokens that end in "ly" and have length >= 4, which excludes short words like "fly". Returns 0.0 for an empty slice.
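A minimal sketch of this heuristic, assuming length is measured in bytes and that matching is case-insensitive; neither detail is stated in the docs, and `lyAdverbDensity` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// lyAdverbDensity returns the fraction of tokens that are at least
// 4 bytes long and end in "ly", or 0.0 for an empty slice.
func lyAdverbDensity(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	n := 0
	for _, t := range tokens {
		if len(t) >= 4 && strings.HasSuffix(strings.ToLower(t), "ly") {
			n++
		}
	}
	return float64(n) / float64(len(tokens))
}

func main() {
	// "quickly" and "only" count; "fly" is too short; "run" has no suffix.
	fmt.Println(lyAdverbDensity([]string{"quickly", "fly", "only", "run"})) // 0.5
}
```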

func NominalDensity ¶

func NominalDensity(tokens []string) float64

NominalDensity returns the fraction of tokens ending in common nominalization suffixes (-tion, -ment, -ness, -ity, -ance, -ence). Returns 0.0 for an empty slice.
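The suffix test can be sketched as below. The suffix list comes from the description above; the helper name `nominalDensity`, the lower-casing step, and the absence of a minimum length are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// nominalSuffixes lists the suffixes named in the documentation.
var nominalSuffixes = []string{"tion", "ment", "ness", "ity", "ance", "ence"}

// nominalDensity returns the fraction of tokens ending in one of
// nominalSuffixes, or 0.0 for an empty slice.
func nominalDensity(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	n := 0
	for _, t := range tokens {
		lower := strings.ToLower(t)
		for _, suffix := range nominalSuffixes {
			if strings.HasSuffix(lower, suffix) {
				n++
				break // count each token at most once
			}
		}
	}
	return float64(n) / float64(len(tokens))
}

func main() {
	tokens := []string{"information", "agreement", "run", "clarity"}
	fmt.Println(nominalDensity(tokens)) // 3 of 4 tokens match: 0.75
}
```

Under this reading, short words such as "city" also match the "-ity" suffix; whether the package guards against that is not stated.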

func SentLenVariance ¶

func SentLenVariance(text string) float64

SentLenVariance splits text into sentences on `.`, `!`, `?` and returns the coefficient of variation (stddev / mean) of sentence word counts. Returns 0.0 when fewer than 2 sentences are found.
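The computation described above can be sketched as follows. This is an illustration under stated assumptions: the population standard deviation is used (the docs do not say population or sample), fragments with no words are skipped, and `sentLenCV` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// sentLenCV splits text on '.', '!', and '?', counts the words in each
// sentence, and returns stddev / mean of those counts. It returns 0.0
// when fewer than 2 non-empty sentences are found.
func sentLenCV(text string) float64 {
	sentences := strings.FieldsFunc(text, func(r rune) bool {
		return r == '.' || r == '!' || r == '?'
	})
	var counts []float64
	for _, s := range sentences {
		if n := len(strings.Fields(s)); n > 0 {
			counts = append(counts, float64(n))
		}
	}
	if len(counts) < 2 {
		return 0.0
	}
	mean := 0.0
	for _, c := range counts {
		mean += c
	}
	mean /= float64(len(counts))

	variance := 0.0
	for _, c := range counts {
		d := c - mean
		variance += d * d
	}
	variance /= float64(len(counts)) // population variance (assumption)

	return math.Sqrt(variance) / mean
}

func main() {
	// Word counts 2 and 4: mean 3, stddev 1, so the result is 1/3.
	fmt.Printf("%.4f\n", sentLenCV("One two. One two three four.")) // 0.3333
	fmt.Println(sentLenCV("Only one sentence."))                     // 0
}
```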

func TypeTokenRatio ¶

func TypeTokenRatio(tokens []string) float64

TypeTokenRatio returns the ratio of unique tokens to total tokens. Higher values indicate more varied vocabulary. Returns 0.0 for an empty slice.
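A short sketch of the ratio, assuming tokens are compared exactly as given (case-sensitive). The name `typeTokenRatio` is hypothetical, and the package may normalize tokens first.

```go
package main

import "fmt"

// typeTokenRatio returns the number of distinct tokens divided by the
// total number of tokens, or 0.0 for an empty slice.
func typeTokenRatio(tokens []string) float64 {
	if len(tokens) == 0 {
		return 0.0
	}
	unique := make(map[string]struct{}, len(tokens))
	for _, t := range tokens {
		unique[t] = struct{}{}
	}
	return float64(len(unique)) / float64(len(tokens))
}

func main() {
	fmt.Println(typeTokenRatio([]string{"a", "b", "a", "c"})) // 3 unique / 4 total = 0.75
}
```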

Types ¶

type LexiconCounts ¶

type LexiconCounts struct {
	FillerWords    int
	ModalWords     int
	VagueWords     int
	ActionWords    int
	StopWords      int
	HedgePhrases   int
	VerbosePhrases int
}

LexiconCounts captures loaded cue-list sizes for quality and drift checks.

type Model ¶

type Model struct {
	// contains filtered or unexported fields
}

Model is a deterministic linear classifier loaded from an embedded artifact.

func LoadEmbedded ¶

func LoadEmbedded() (*Model, error)

LoadEmbedded loads and verifies the embedded model artifact.

func (*Model) Classify ¶

func (m *Model) Classify(text string) Result

Classify computes a deterministic risk score and binary label.

func (*Model) LexiconCounts ¶

func (m *Model) LexiconCounts() LexiconCounts

LexiconCounts returns the active cue-list sizes loaded from the artifact.

func (*Model) ModelID ¶

func (m *Model) ModelID() string

ModelID returns the artifact model identifier.

func (*Model) Threshold ¶

func (m *Model) Threshold() float64

Threshold returns the decision threshold.

func (*Model) Version ¶

func (m *Model) Version() string

Version returns the artifact version identifier.

type Result ¶

type Result struct {
	Label          string
	RiskScore      float64
	Threshold      float64
	ModelID        string
	Backend        string
	Version        string
	TriggeredCues  []string
	FeatureSummary map[string]float64
	WordCount      int
}

Result is the classifier decision contract for one paragraph.
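To make the "deterministic linear classifier" idea concrete, here is a generic sketch of a linear score compared against a threshold. Everything in it is assumed for illustration: the `linearModel` and `result` types, the feature names, the weights, the label strings, and the use of `>=` for the threshold test. None of it is taken from the package's artifact or source.

```go
package main

import (
	"fmt"
	"sort"
)

// result mirrors a few fields of Result for illustration only.
type result struct {
	Label     string
	RiskScore float64
	Threshold float64
}

// linearModel is a hypothetical linear classifier: bias plus a
// weighted sum of named features.
type linearModel struct {
	bias      float64
	weights   map[string]float64
	threshold float64
}

// classify scores the features and labels the input by comparing the
// score with the threshold. Missing features count as 0.
func (m linearModel) classify(features map[string]float64) result {
	names := make([]string, 0, len(m.weights))
	for name := range m.weights {
		names = append(names, name)
	}
	// Go map iteration order is random; summing in sorted order keeps
	// the floating-point result reproducible across runs.
	sort.Strings(names)

	score := m.bias
	for _, name := range names {
		score += m.weights[name] * features[name]
	}
	label := "ok" // hypothetical label strings
	if score >= m.threshold {
		label = "flagged"
	}
	return result{Label: label, RiskScore: score, Threshold: m.threshold}
}

func main() {
	m := linearModel{
		bias:      -0.5,
		weights:   map[string]float64{"ly_adverb_density": 2.0, "func_word_ratio": 1.0},
		threshold: 0.25,
	}
	r := m.classify(map[string]float64{"ly_adverb_density": 0.25, "func_word_ratio": 0.5})
	fmt.Println(r.Label, r.RiskScore) // flagged 0.5
}
```

Carrying the threshold alongside the score in the result, as Result does, lets a caller see how close a decision was without holding a reference to the model.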
