Documentation
¶
Overview ¶
Package categorizer classifies pod failures into named categories based on configurable rules. It runs at the executor, where full Kubernetes pod status is available. The resulting category names are included in the FailureInfo proto attached to error events.
Configuration ¶
Categories are defined in the executor config under application.errorCategories. Each category has a name and one or more rules. Rules within a category are OR'd: if any rule matches, the category is included. A pod can match multiple categories.
Each rule uses exactly one matcher:
- OnConditions: matches Kubernetes failure signals (OOMKilled, Evicted, DeadlineExceeded, AppError)
- OnExitCodes: matches non-zero container exit codes using In/NotIn set operators
- OnTerminationMessage: matches container termination messages against a regex
Exit code 0 is always skipped. Both regular and init containers are checked.
Example ¶
application:
errorCategories:
- name: oom
rules:
- onConditions: ["OOMKilled"]
- name: cuda_error
rules:
- onExitCodes:
operator: In
values: [74, 75]
- onTerminationMessage:
pattern: "(?i)cuda.*error"
- name: transient_infra
rules:
- onConditions: ["Evicted"]
- onExitCodes:
operator: In
values: [137]
Validation ¶
NewClassifier validates all config upfront: unknown condition strings, invalid exit code operators, empty value lists, and invalid regexes all return errors at construction time.
Usage ¶
classifier, err := categorizer.NewClassifier(config.ErrorCategories)
if err != nil {
// handle invalid config
}
categories := classifier.Classify(pod) // returns []string{"oom", "cuda_error"} or nil
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CategoryConfig ¶
type CategoryConfig struct {
Name string `yaml:"name"`
Rules []CategoryRule `yaml:"rules"`
}
CategoryConfig defines a named error category with rules that match against pod failure signals. When any rule matches, the category name is included in the FailureInfo.categories field of the error event.
type CategoryRule ¶
type CategoryRule struct {
ContainerName string `yaml:"containerName,omitempty"`
OnExitCodes *errormatch.ExitCodeMatcher `yaml:"onExitCodes,omitempty"`
OnTerminationMessage *errormatch.RegexMatcher `yaml:"onTerminationMessage,omitempty"`
OnConditions []string `yaml:"onConditions,omitempty"`
}
CategoryRule defines a single matching condition. Exactly one matcher must be set per rule (validated by NewClassifier). Rules within a category are OR'd. When ContainerName is set, only failures from that container are considered. When empty, failures from any container can match (default).
type Classifier ¶
type Classifier struct {
// contains filtered or unexported fields
}
Classifier evaluates pods against a set of category rules and returns the names of all matching categories.
func NewClassifier ¶
func NewClassifier(configs []CategoryConfig) (*Classifier, error)
NewClassifier validates config and compiles regex patterns. Returns an error if any regex is invalid, a condition is unknown, or an exit code matcher has an invalid operator.