Documentation
Index
- Constants
- type APIConfig
- type APISettings
- type AlertChannelSettings
- type InfoChannelSettings
- type IssueReaction
- type IssueReactionSettings
- type IssueStatusSettings
- type ManagerConfig
- type ManagerSettings
- func (s *ManagerSettings) Clone() (*ManagerSettings, error)
- func (s *ManagerSettings) GetInfoChannelConfig(channelID string) (*InfoChannelSettings, bool)
- func (s *ManagerSettings) InitAndValidate() error
- func (s *ManagerSettings) IsInfoChannel(channelID string) bool
- func (s *ManagerSettings) IssueProcessingInterval(channelID string) time.Duration
- func (s *ManagerSettings) MapSlackPostReaction(reaction string) IssueReaction
- func (s *ManagerSettings) OrderIssuesBySeverity(channelID string, openIssueCount int) bool
- func (s *ManagerSettings) UserIsChannelAdmin(ctx context.Context, channelID, userID string, ...) bool
- func (s *ManagerSettings) UserIsGlobalAdmin(userID string) bool
- type RateLimitConfig
- type RoutingRule
- type SlackClientConfig
Constants
const (
	// EncryptionKeyLength is the required length for the encryption key.
	// The key must be exactly 32 alphanumeric characters to work with AES-256 encryption.
	EncryptionKeyLength = 32

	// MinRestPort is the minimum valid TCP port number. Port 0 is reserved and cannot be used.
	MinRestPort = 1

	// MaxRestPort is the maximum valid TCP port number. Ports above 65535 do not exist
	// in the TCP/IP protocol.
	MaxRestPort = 65535

	// MinMaxUsersInAlertChannel is the minimum allowed value for MaxUsersInAlertChannel.
	// At least one user must be allowed in a channel for alerts to be useful.
	MinMaxUsersInAlertChannel = 1

	// MaxMaxUsersInAlertChannel is the maximum allowed value for MaxUsersInAlertChannel.
	// This limit prevents accidental alerts to large public channels which could cause
	// spam and excessive Slack API usage. The value of 10,000 accommodates most
	// legitimate use cases while still providing protection.
	MaxMaxUsersInAlertChannel = 10000
)
Validation constants for APIConfig fields.
const (
	// MinAlertsPerSecond is the minimum token refill rate.
	// A value of 0.001 means 1 token per 1000 seconds (~16.7 minutes), which is
	// useful for very strict rate limiting during incidents or testing.
	MinAlertsPerSecond = 0.001

	// MaxAlertsPerSecond is the maximum token refill rate.
	// A value of 1000 allows up to 1000 alerts per second per channel, which is
	// more than sufficient for any realistic alerting workload.
	MaxAlertsPerSecond = 1000

	// MinAllowedBurst is the minimum bucket capacity.
	// At least 1 token must be allowed, otherwise no alerts could ever be sent.
	MinAllowedBurst = 1

	// MaxAllowedBurst is the maximum bucket capacity.
	// A burst of 10,000 alerts allows handling large batch imports or incident
	// recovery scenarios while still providing meaningful rate protection.
	MaxAllowedBurst = 10000

	// MinMaxRequestWaitTime is the minimum time a request will wait for tokens.
	// At least 1 second is required to allow the rate limiter to function.
	MinMaxRequestWaitTime = 1 * time.Second

	// MaxMaxRequestWaitTime is the maximum time a request will wait for tokens.
	// Waiting longer than 5 minutes is impractical as HTTP clients and load
	// balancers typically have shorter timeouts.
	MaxMaxRequestWaitTime = 5 * time.Minute
)
Validation constants for RateLimitConfig fields. These limits ensure the rate limiter operates within reasonable bounds.
const (
	// MinDrainTimeout is the minimum allowed value for drain timeout fields.
	// This must be at least 2 seconds because internal operations (message acknowledgment,
	// negative acknowledgment, and distributed lock release) each have 2-second timeouts.
	// Setting a drain timeout shorter than this could cause message loss during shutdown.
	MinDrainTimeout = 2 * time.Second

	// MaxDrainTimeout is the maximum allowed value for drain timeout fields.
	// A 5-minute maximum prevents excessively long shutdown times that could delay
	// deployments or cause orchestration systems (like Kubernetes) to forcefully
	// terminate the process. In practice, if draining takes longer than a few minutes,
	// there is likely a deeper issue that won't be resolved by waiting longer.
	MaxDrainTimeout = 5 * time.Minute

	// DefaultCoordinatorDrainTimeout is the default drain timeout for the coordinator.
	// 5 seconds provides enough time to process in-flight messages while keeping
	// shutdown times reasonable for typical deployments.
	DefaultCoordinatorDrainTimeout = 5 * time.Second

	// DefaultChannelManagerDrainTimeout is the default drain timeout for channel managers.
	// 3 seconds is typically sufficient for channel managers, which have smaller internal
	// buffers than the coordinator. This is intentionally shorter than the coordinator
	// timeout to allow for sequential draining if needed.
	DefaultChannelManagerDrainTimeout = 3 * time.Second

	// MinSocketModeMaxWorkers is the minimum allowed value for concurrent socket mode handlers.
	// At least 10 workers ensures the system can handle basic event processing even under
	// constrained resource environments.
	MinSocketModeMaxWorkers = int64(10)

	// MaxSocketModeMaxWorkers is the maximum allowed value for concurrent socket mode handlers.
	// 1000 workers is a reasonable upper bound that prevents excessive goroutine spawning
	// while still allowing high-throughput event processing.
	MaxSocketModeMaxWorkers = int64(1000)

	// DefaultSocketModeMaxWorkers is the default number of concurrent socket mode handlers.
	// 100 workers provides a good balance between throughput and resource usage for
	// typical Slack workspaces.
	DefaultSocketModeMaxWorkers = int64(100)

	// DefaultSocketModeDrainTimeout is the default drain timeout for socket mode handlers.
	// 5 seconds provides enough time for most handlers to complete their work (Slack API
	// calls, queue writes) during graceful shutdown.
	DefaultSocketModeDrainTimeout = 5 * time.Second
)
Validation bounds and defaults for ManagerConfig drain timeout fields and for socket mode handler concurrency. Drain timeouts control how long the system waits during graceful shutdown to process remaining messages before forcefully terminating.
const (
	// DefaultPostIconEmoji is the default emoji shown as the bot's icon in Slack posts.
	// This appears next to the bot's username in the message header. Must be in :emoji: format.
	// Can be overridden per-alert via the alert payload's iconEmoji field.
	DefaultPostIconEmoji = ":female-detective:"

	// DefaultPostUsername is the default display name for the bot in Slack posts.
	// This appears as the sender name in the message header. Can be overridden per-alert
	// via the alert payload's username field.
	DefaultPostUsername = "Slack Manager"

	// DefaultAlertSeverity is the severity assigned to alerts that don't specify one.
	// Valid severities are: panic (highest), error, warning, info (lowest).
	// Severity affects issue ordering when reordering is enabled - higher severity
	// issues appear at the bottom of the channel for visibility.
	DefaultAlertSeverity = types.AlertError

	// DefaultAppFriendlyName is the application name shown in Slack messages and modals.
	// Used in greeting messages, help text, and error dialogs to identify the bot.
	DefaultAppFriendlyName = "Slack Manager"
)
Default values for ManagerSettings general fields. These defaults are applied during InitAndValidate() when fields are empty or zero.
const (
	// DefaultIssueArchivingDelaySeconds is the default delay before archiving resolved issues.
	// 12 hours provides a reasonable window for team members across time zones to re-open
	// issues if needed before they are permanently archived.
	DefaultIssueArchivingDelaySeconds = 12 * 3600 // 12 hours

	// MinIssueArchivingDelaySeconds is the minimum allowed archiving delay.
	// At least 1 minute is required to give users a chance to re-open issues via new
	// alerts before the issue is permanently archived.
	MinIssueArchivingDelaySeconds = 60 // 1 minute

	// MaxIssueArchivingDelaySeconds is the maximum allowed archiving delay.
	// 30 days prevents issues from remaining in a "resolved but re-openable" state
	// indefinitely. Longer delays provide diminishing value as the likelihood of
	// needing to re-open an old resolved issue decreases over time.
	MaxIssueArchivingDelaySeconds = 30 * 24 * 3600 // 30 days
)
Issue archiving delay constants control when resolved issues are automatically archived.
When an issue is resolved (via emoji reaction or auto-resolve), it remains in an "open" state for the archiving delay period before being marked as archived. This gives team members time to re-open the issue if needed. Archived issues retain their Slack posts but are marked as archived in the database to prevent re-opening via new alerts.
const (
	// DefaultIssueReorderingLimit is the default maximum number of open issues in a channel
	// before reordering is automatically disabled. With 30 issues, reordering operations
	// remain responsive and API usage stays reasonable.
	DefaultIssueReorderingLimit = 30

	// MinIssueReorderingLimit is the minimum allowed reordering limit.
	// Below 5 issues, reordering provides minimal visual benefit since users can easily
	// scan a small number of messages regardless of order.
	MinIssueReorderingLimit = 5

	// MaxIssueReorderingLimit is the maximum allowed reordering limit.
	// Above 100 issues, the API calls required for reordering become excessive and may
	// trigger Slack rate limits, causing delays in issue processing.
	MaxIssueReorderingLimit = 100
)
Issue reordering constants control automatic severity-based ordering of issues in channels.
When reordering is enabled, issues are automatically arranged so higher-severity issues (panic, error) appear at the bottom of the channel where they are most visible. This is achieved by deleting and reposting Slack messages, which consumes API quota.
Reordering is automatically disabled when the number of open issues exceeds the limit, preventing excessive API calls during incident storms. Once the issue count drops below the limit, reordering resumes automatically.
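A sketch of the two decisions described above, under stated assumptions: reorderingEnabled and orderForChannel are hypothetical helpers, a count equal to the limit is treated as not exceeding it, and a zero limit selects the default. Sorting ascending by severity puts panic-level issues last, which corresponds to the bottom of the channel.

```go
package main

import (
	"fmt"
	"sort"
)

const DefaultIssueReorderingLimit = 30

// severityRank ranks severities from lowest to highest, following the
// documented order panic > error > warning > info.
var severityRank = map[string]int{"info": 0, "warning": 1, "error": 2, "panic": 3}

type issue struct {
	Title    string
	Severity string
}

// reorderingEnabled is a hypothetical helper: reordering is off when disabled
// explicitly or when the open issue count exceeds the limit (0 = default).
func reorderingEnabled(disabled bool, limit, openIssueCount int) bool {
	if disabled {
		return false
	}
	if limit == 0 {
		limit = DefaultIssueReorderingLimit
	}
	return openIssueCount <= limit
}

// orderForChannel sorts issues so the highest severity comes last. SliceStable
// keeps creation order among issues of equal severity.
func orderForChannel(issues []issue) {
	sort.SliceStable(issues, func(i, j int) bool {
		return severityRank[issues[i].Severity] < severityRank[issues[j].Severity]
	})
}

func main() {
	issues := []issue{{"disk full", "panic"}, {"slow API", "warning"}, {"5xx spike", "error"}}
	if reorderingEnabled(false, 0, len(issues)) {
		orderForChannel(issues)
	}
	for _, it := range issues {
		fmt.Println(it.Severity, it.Title)
	}
	// prints:
	// warning slow API
	// error 5xx spike
	// panic disk full
}
```

In the real system each reorder means deleting and reposting Slack messages, which is why the limit check comes first.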
const (
	// DefaultIssueProcessingIntervalSeconds is the default interval between processing runs.
	// 10 seconds provides responsive issue management while limiting API usage.
	DefaultIssueProcessingIntervalSeconds = 10

	// MinIssueProcessingIntervalSeconds is the minimum allowed processing interval.
	// Intervals shorter than 3 seconds may cause excessive Slack API calls during
	// high-activity periods and risk hitting rate limits.
	MinIssueProcessingIntervalSeconds = 3

	// MaxIssueProcessingIntervalSeconds is the maximum allowed processing interval.
	// Intervals longer than 10 minutes cause unacceptable delays in auto-resolution,
	// archiving, and escalation, making the system feel unresponsive.
	MaxIssueProcessingIntervalSeconds = 600 // 10 minutes
)
Issue processing interval constants control the frequency of background issue maintenance.
Each channel manager runs a periodic processing loop that handles:
- Reordering issues by severity (when enabled and issue count is below limit)
- Auto-resolving issues that have been quiet for their configured auto-resolve period
- Archiving resolved issues after their archiving delay has elapsed
- Escalating issues that have triggered escalation rules
The interval can be configured globally or per-channel via AlertChannelSettings.
const (
	// DefaultMinIssueCountForThrottle is the minimum number of open issues in a channel
	// before throttling is considered. With fewer than 5 issues, API usage is manageable
	// and immediate updates provide better user experience.
	DefaultMinIssueCountForThrottle = 5

	// MinMinIssueCountForThrottle is the minimum allowed threshold for throttling.
	// Setting this to 1 means throttling can activate with just one open issue, which
	// may be useful for channels with very strict API limits.
	MinMinIssueCountForThrottle = 1

	// MaxMinIssueCountForThrottle is the maximum allowed threshold for throttling.
	// Setting this above 100 effectively disables throttling in most scenarios, which
	// may be appropriate for low-volume channels with generous API limits.
	MaxMinIssueCountForThrottle = 100

	// DefaultMaxThrottleDurationSeconds is the maximum delay between updates to a single issue.
	// 90 seconds provides meaningful batching during storms while ensuring updates are
	// eventually delivered within a reasonable timeframe.
	DefaultMaxThrottleDurationSeconds = 90

	// MinMaxThrottleDurationSeconds is the minimum allowed maximum throttle duration.
	// At least 1 second is required for throttling to have any effect.
	MinMaxThrottleDurationSeconds = 1

	// MaxMaxThrottleDurationSeconds is the maximum allowed throttle duration.
	// Delays longer than 10 minutes make the system feel unresponsive and may cause
	// users to miss important updates about ongoing issues.
	MaxMaxThrottleDurationSeconds = 600 // 10 minutes
)
Throttle constants control rate limiting of Slack post updates during high-activity periods.
Throttling prevents excessive API calls when many issues are being updated simultaneously, which can occur during incident storms or when a monitoring system sends many alerts at once. The throttle algorithm delays updates to individual issue posts based on:
- The number of open issues in the channel (throttling only activates above a threshold)
- The number of alerts already received for each issue (more alerts = longer delays)
- Whether the alert text has changed (text changes reduce the delay)
This allows critical first alerts to post immediately while batching rapid updates to the same issue, reducing visual noise and API usage.
const (
	// DefaultIssueTerminateEmoji triggers immediate issue termination without resolution.
	// Use for false positives or issues that should be dismissed without tracking.
	DefaultIssueTerminateEmoji = ":firecracker:"

	// DefaultIssueResolveEmoji marks an issue as resolved. The issue remains visible
	// for the configured archiving delay before being archived.
	DefaultIssueResolveEmoji = ":white_check_mark:"

	// DefaultIssueInvestigateEmoji marks the user as investigating the issue.
	// The investigator's name appears in the issue post, signaling to others that
	// someone is actively working on it.
	DefaultIssueInvestigateEmoji = ":eyes:"

	// DefaultIssueMuteEmoji mutes an issue, preventing escalation notifications while
	// keeping the issue visible. Useful for known issues being monitored.
	DefaultIssueMuteEmoji = ":mask:"

	// DefaultIssueShowOptionButtonsEmoji reveals the interactive option buttons on the
	// issue post, providing quick access to actions without using emoji reactions.
	DefaultIssueShowOptionButtonsEmoji = ":information_source:"
)
Default emoji constants for issue reaction commands.
Users interact with issues by adding emoji reactions to Slack posts. Each reaction type triggers a specific action. Multiple emojis can be configured for each action via IssueReactionSettings, allowing teams to use their preferred emoji while maintaining consistent behavior.
Actions requiring channel admin permissions: terminate, resolve, mute. Actions available to all users: investigate, show option buttons.
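Mapping a reaction back to an action is a reverse lookup over the configured emojis, in the spirit of ManagerSettings.MapSlackPostReaction. In this sketch, reactionSettings, buildLookup, and requiresAdmin are hypothetical; the colon trimming reflects that Slack's reaction_added event reports bare emoji names (for example "eyes" rather than ":eyes:").

```go
package main

import (
	"fmt"
	"strings"
)

type IssueReaction string

const (
	IssueReactionTerminate         IssueReaction = "terminate"
	IssueReactionResolve           IssueReaction = "resolve"
	IssueReactionInvestigate       IssueReaction = "investigate"
	IssueReactionMute              IssueReaction = "mute"
	IssueReactionShowOptionButtons IssueReaction = "show_option_buttons"
)

// reactionSettings is a hypothetical subset of IssueReactionSettings: each
// action maps to one or more configured emojis in :emoji: form.
type reactionSettings map[IssueReaction][]string

// normalizeEmoji turns ":eyes:" and "eyes" into the same key.
func normalizeEmoji(s string) string {
	return strings.Trim(strings.TrimSpace(s), ":")
}

// buildLookup inverts the settings into emoji name -> action.
func buildLookup(rs reactionSettings) map[string]IssueReaction {
	lookup := make(map[string]IssueReaction)
	for action, emojis := range rs {
		for _, e := range emojis {
			lookup[normalizeEmoji(e)] = action
		}
	}
	return lookup
}

// requiresAdmin applies the permission split documented above.
func requiresAdmin(r IssueReaction) bool {
	switch r {
	case IssueReactionTerminate, IssueReactionResolve, IssueReactionMute:
		return true
	}
	return false
}

func main() {
	lookup := buildLookup(reactionSettings{
		IssueReactionTerminate:         {":firecracker:"},
		IssueReactionResolve:           {":white_check_mark:"},
		IssueReactionInvestigate:       {":eyes:"},
		IssueReactionMute:              {":mask:"},
		IssueReactionShowOptionButtons: {":information_source:"},
	})
	r := lookup[normalizeEmoji("eyes")]
	fmt.Println(r, requiresAdmin(r)) // investigate false
}
```

If the same emoji were configured for two actions, the map iteration order would decide the winner, so a validation step should reject duplicates.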
const (
	// DefaultPanicEmoji indicates a panic-level issue (highest severity).
	DefaultPanicEmoji = ":scream:"

	// DefaultErrorEmoji indicates an error-level issue.
	DefaultErrorEmoji = ":x:"

	// DefaultWarningEmoji indicates a warning-level issue.
	DefaultWarningEmoji = ":warning:"

	// DefaultInfoEmoji indicates an info-level issue (lowest severity).
	DefaultInfoEmoji = ":information_source:"

	// DefaultMutePanicEmoji indicates a muted panic-level issue.
	DefaultMutePanicEmoji = ":no_bell:"

	// DefaultMuteErrorEmoji indicates a muted error-level issue.
	DefaultMuteErrorEmoji = ":no_bell:"

	// DefaultMuteWarningEmoji indicates a muted warning-level issue.
	DefaultMuteWarningEmoji = ":no_bell:"

	// DefaultInconclusiveEmoji indicates an issue resolved as inconclusive
	// (neither confirmed fixed nor confirmed false positive).
	DefaultInconclusiveEmoji = ":grey_question:"

	// DefaultResolvedEmoji indicates a successfully resolved issue.
	DefaultResolvedEmoji = ":white_check_mark:"
)
Default emoji constants for issue status indicators.
Status emojis appear in issue posts to visually indicate the current severity and state. They help users quickly scan a channel to identify high-priority issues. Muted emojis indicate issues that are being tracked but won't trigger escalations.
const (
	// DefaultConcurrency is the default number of concurrent Slack API requests.
	// A value of 3 balances throughput with Slack API rate limits. Higher values
	// may trigger more rate limit responses; lower values reduce throughput.
	DefaultConcurrency = 3

	// DefaultMaxAttemptsForRateLimitError is the default number of retry attempts
	// for rate limit (429) errors. Slack rate limits are usually short-lived,
	// so 10 attempts with exponential backoff typically succeed.
	DefaultMaxAttemptsForRateLimitError = 10

	// DefaultMaxAttemptsForTransientError is the default number of retry attempts
	// for transient errors (network timeouts, 5xx responses). These errors often
	// resolve quickly, so 5 attempts is usually sufficient.
	DefaultMaxAttemptsForTransientError = 5

	// DefaultMaxAttemptsForFatalError is the default number of retry attempts for
	// errors that are typically permanent (4xx except 429). While these usually
	// indicate a problem that won't resolve with retries, a few attempts help
	// handle edge cases like brief authentication issues.
	DefaultMaxAttemptsForFatalError = 5

	// DefaultMaxRateLimitErrorWaitTimeSeconds is the maximum time (in seconds) to
	// wait between retries for rate limit errors. Slack usually sends a Retry-After
	// header with 429 responses, but this caps the wait to prevent excessive delays.
	DefaultMaxRateLimitErrorWaitTimeSeconds = 120

	// DefaultMaxTransientErrorWaitTimeSeconds is the maximum time (in seconds) to
	// wait between retries for transient errors. Shorter than rate limit waits
	// since transient issues often resolve quickly.
	DefaultMaxTransientErrorWaitTimeSeconds = 30

	// DefaultMaxFatalErrorWaitTimeSeconds is the maximum time (in seconds) to wait
	// between retries for fatal errors. Kept short since fatal errors rarely
	// resolve on their own.
	DefaultMaxFatalErrorWaitTimeSeconds = 30

	// DefaultHTTPTimeoutSeconds is the default HTTP client timeout in seconds.
	// 30 seconds accommodates slow Slack API responses while not hanging indefinitely.
	DefaultHTTPTimeoutSeconds = 30
)
Default values for SlackClientConfig fields. These provide sensible defaults for typical production deployments.
const (
	// MinConcurrency is the minimum allowed concurrency value.
	// At least 1 concurrent request is required to make any API calls.
	MinConcurrency = 1

	// MaxConcurrency is the maximum allowed concurrency value.
	// Higher values risk overwhelming Slack's API and triggering aggressive rate limiting.
	// 50 concurrent requests is more than sufficient for any realistic workload.
	MaxConcurrency = 50

	// MinMaxAttempts is the minimum allowed value for retry attempt fields.
	// At least 1 attempt is required to make any request.
	MinMaxAttempts = 1

	// MaxMaxAttempts is the maximum allowed value for retry attempt fields.
	// More than 100 attempts indicates a configuration error or misunderstanding
	// of retry behavior. Extended retries should use longer wait times, not more attempts.
	MaxMaxAttempts = 100

	// MinWaitTimeSeconds is the minimum allowed wait time between retries.
	// At least 1 second prevents tight retry loops that could overwhelm the API.
	MinWaitTimeSeconds = 1

	// MaxWaitTimeSeconds is the maximum allowed wait time between retries.
	// Waiting longer than 10 minutes between retries is impractical for most use cases.
	MaxWaitTimeSeconds = 600

	// MinHTTPTimeoutSeconds is the minimum HTTP client timeout.
	// At least 1 second is needed for any network operation.
	MinHTTPTimeoutSeconds = 1

	// MaxHTTPTimeoutSeconds is the maximum HTTP client timeout.
	// Timeouts longer than 5 minutes typically indicate a configuration error.
	// Most Slack API calls should complete within seconds.
	MaxHTTPTimeoutSeconds = 300
)
Validation bounds for SlackClientConfig numeric fields. These ensure configuration values are within reasonable operational ranges.
Variables
This section is empty.
Functions
This section is empty.
Types
type APIConfig
type APIConfig struct {
// LogJSON controls the log output format. When true, logs are written as structured
// JSON objects suitable for log aggregation systems like Elasticsearch or Splunk.
// When false, logs are written in a human-readable text format for local development.
LogJSON bool `json:"logJson" yaml:"logJson"`
// Verbose enables debug-level logging when true. This includes detailed information
// about request processing, rate limiting decisions, and internal state changes.
// Should be disabled in production to reduce log volume and avoid exposing
// sensitive information.
Verbose bool `json:"verbose" yaml:"verbose"`
// RestPort specifies the TCP port the HTTP server listens on. The value is stored
// as a string to allow easy configuration from environment variables. Common values
// are "8080" for development and "80" or "443" for production (though TLS termination
// is typically handled by a reverse proxy).
RestPort string `json:"restPort" yaml:"restPort"`
// EncryptionKey is a 32-character alphanumeric key used for AES-256 encryption of
// sensitive data such as webhook payloads. This key must be identical in both the
// API and Manager configurations to ensure encrypted data can be decrypted correctly.
// Generate a secure random key for production use; never use predictable values.
EncryptionKey string `json:"encryptionKey" yaml:"encryptionKey"`
// CacheKeyPrefix is prepended to all Redis cache keys to namespace them. Using the
// same prefix in both the API and Manager allows them to share cached data (such as
// channel information), reducing Slack API calls. Use different prefixes if running
// multiple independent Slack Manager instances against the same Redis cluster.
CacheKeyPrefix string `json:"cacheKeyPrefix" yaml:"cacheKeyPrefix"`
// ErrorReportChannelID is the Slack channel where the API posts error reports for
// failed requests (4xx and 5xx responses). This is useful for monitoring API health
// and debugging integration issues. Leave empty to disable error reporting.
// The channel must exist and the bot must be a member.
ErrorReportChannelID string `json:"errorReportChannelId" yaml:"errorReportChannelId"`
// MaxUsersInAlertChannel sets the maximum number of members allowed in a channel
// that receives alerts. The API returns HTTP 400 if an alert targets a channel
// exceeding this limit. This prevents accidental alerts to large public channels
// (like #general) which could spam many users and trigger Slack API rate limits.
MaxUsersInAlertChannel int `json:"maxUsersInAlertChannel" yaml:"maxUsersInAlertChannel"`
// RateLimitPerAlertChannel configures the token bucket rate limiter that controls
// how many alerts can be sent to each Slack channel. Each channel has its own
// independent rate limiter. See RateLimitConfig for detailed documentation.
RateLimitPerAlertChannel *RateLimitConfig `json:"rateLimitPerAlertChannel" yaml:"rateLimitPerAlertChannel"`
// SlackClient contains configuration for connecting to the Slack API, including
// authentication tokens and API behavior settings.
SlackClient *SlackClientConfig `json:"slackClient" yaml:"slackClient"`
}
APIConfig holds configuration for the Slack Manager REST API server.
These settings are read at startup and cannot be changed without restarting the server. For runtime-configurable settings, see APISettings which can be reloaded dynamically.
func NewDefaultAPIConfig
func NewDefaultAPIConfig() *APIConfig
NewDefaultAPIConfig returns an APIConfig populated with sensible default values.
The defaults are configured for a typical production deployment:
- JSON logging enabled for log aggregation compatibility
- Verbose logging disabled to reduce noise
- Port 8080 (standard non-privileged HTTP port)
- Rate limiting: 1 alert/second sustained, 30-alert burst capacity, 15s max wait
- Max 100 users per alert channel (prevents accidental spam to large channels)
The EncryptionKey is intentionally left empty and must be set before use. The ErrorReportChannelID is also empty; set it to enable error reporting.
func (*APIConfig) Validate
Validate checks that all required fields are present and all values are within acceptable ranges. It returns a descriptive error for the first validation failure encountered, or nil if the configuration is valid.
Validation includes:
- RestPort: must be a valid port number (1-65535)
- EncryptionKey: if non-empty, must be exactly 32 alphanumeric characters
- CacheKeyPrefix: must not be empty
- MaxUsersInAlertChannel: must be between 1 and 10,000
- RateLimitPerAlertChannel: must not be nil, and all fields must be valid
- SlackClient: must not be nil, and must pass its own validation
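A trimmed sketch of the checks listed above, covering the scalar fields only (the nested RateLimitPerAlertChannel and SlackClient checks are omitted). apiConfig and validate are hypothetical mirrors, not the package's Validate implementation; note that RestPort is a string and must be parsed before its range can be checked.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Values copied from the APIConfig validation constants.
const (
	EncryptionKeyLength       = 32
	MinRestPort               = 1
	MaxRestPort               = 65535
	MinMaxUsersInAlertChannel = 1
	MaxMaxUsersInAlertChannel = 10000
)

// apiConfig is a trimmed, hypothetical mirror of APIConfig.
type apiConfig struct {
	RestPort               string
	EncryptionKey          string
	CacheKeyPrefix         string
	MaxUsersInAlertChannel int
}

// isAlphanumeric reports whether s contains only ASCII letters and digits.
func isAlphanumeric(s string) bool {
	for _, r := range s {
		if !(r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z' || r >= '0' && r <= '9') {
			return false
		}
	}
	return true
}

// validate returns the first failure found, in the order documented above.
func (c *apiConfig) validate() error {
	port, err := strconv.Atoi(c.RestPort)
	if err != nil || port < MinRestPort || port > MaxRestPort {
		return fmt.Errorf("restPort must be a number between %d and %d, got %q", MinRestPort, MaxRestPort, c.RestPort)
	}
	// An empty key is allowed here; a non-empty key must be well-formed.
	if c.EncryptionKey != "" && (len(c.EncryptionKey) != EncryptionKeyLength || !isAlphanumeric(c.EncryptionKey)) {
		return fmt.Errorf("encryptionKey must be exactly %d alphanumeric characters", EncryptionKeyLength)
	}
	if c.CacheKeyPrefix == "" {
		return errors.New("cacheKeyPrefix must not be empty")
	}
	if c.MaxUsersInAlertChannel < MinMaxUsersInAlertChannel || c.MaxUsersInAlertChannel > MaxMaxUsersInAlertChannel {
		return fmt.Errorf("maxUsersInAlertChannel must be between %d and %d, got %d",
			MinMaxUsersInAlertChannel, MaxMaxUsersInAlertChannel, c.MaxUsersInAlertChannel)
	}
	return nil
}

func main() {
	cfg := &apiConfig{RestPort: "8080", CacheKeyPrefix: "slack-manager", MaxUsersInAlertChannel: 100}
	fmt.Println(cfg.validate()) // <nil>
	cfg.EncryptionKey = "too-short"
	fmt.Println(cfg.validate())
}
```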
type APISettings
type APISettings struct {
// RoutingRules defines how alerts with route keys are mapped to Slack channels.
// Rules are evaluated in precedence order: exact match, prefix match, regex match,
// then match-all. Within each precedence level, rules with a matching AlertType
// take priority over rules with no AlertType specified.
//
// If no rules match an alert's route key, the API returns an error. Consider adding
// a catch-all rule (MatchAll: true) as the last rule to handle unmatched alerts.
RoutingRules []*RoutingRule `json:"routingRules" yaml:"routingRules"`
// contains filtered or unexported fields
}
APISettings contains runtime configuration for the REST API server.
These settings control how incoming alerts are routed to Slack channels when alerts specify a route key rather than a direct channel ID. Unlike APIConfig (which requires a restart to change), APISettings can be updated at runtime via the Server's update method.
Alert Routing
Alerts can specify their destination in two ways:
- Direct channel ID: The alert's slackChannelID field contains a channel ID (e.g., "C1234567890")
- Route key: The alert's routeKey field contains a logical identifier that is matched against RoutingRules to determine the target channel
When an alert uses a route key, the API evaluates RoutingRules in precedence order (exact match > prefix match > regex match > match-all) to find the target channel. See the Match method and RoutingRule documentation for details.
Immutability
Settings passed to the API Server are cloned internally to prevent external modifications from affecting the running system. Once settings are passed to the Server, any subsequent changes to the original struct will have no effect. To update settings, create a new APISettings instance and pass it to the Server's update method.
func (*APISettings) Clone
func (s *APISettings) Clone() (*APISettings, error)
Clone creates a deep copy of the APISettings by marshaling to JSON and back. The returned clone is NOT initialized - InitAndValidate() must be called before use.
This method is used internally by the API Server to ensure that external modifications to the original settings struct do not affect the running system.
Only exported fields are copied. Unexported fields (compiled regexes, cache, mutex, initialized flag) are intentionally excluded since InitAndValidate() rebuilds them.
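The JSON round trip behind Clone can be illustrated with stand-in types. Here apiSettings, routingRule, and the hasPrefix tag are hypothetical; the point is that encoding/json copies exported fields (including nested slices and pointers) into fresh memory and silently skips unexported ones.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routingRule and apiSettings are hypothetical stand-ins.
type routingRule struct {
	Name      string   `json:"name"`
	HasPrefix []string `json:"hasPrefix"`
}

type apiSettings struct {
	RoutingRules []*routingRule `json:"routingRules"`
	initialized  bool           // unexported: skipped by encoding/json
}

// clone deep-copies the exported fields via a JSON round trip. The result is
// left uninitialized, so callers must run their own init/validation step.
func (s *apiSettings) clone() (*apiSettings, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return nil, fmt.Errorf("marshal settings: %w", err)
	}
	c := &apiSettings{}
	if err := json.Unmarshal(data, c); err != nil {
		return nil, fmt.Errorf("unmarshal settings: %w", err)
	}
	return c, nil
}

func main() {
	orig := &apiSettings{RoutingRules: []*routingRule{{Name: "db", HasPrefix: []string{"db-"}}}, initialized: true}
	c, err := orig.clone()
	if err != nil {
		panic(err)
	}
	orig.RoutingRules[0].Name = "changed" // does not affect the clone
	fmt.Println(c.RoutingRules[0].Name, c.initialized) // db false
}
```

The trade-off is speed for simplicity: a JSON round trip is slower than a hand-written copy, but it cannot forget a newly added exported field.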
func (*APISettings) InitAndValidate
func (s *APISettings) InitAndValidate(logger types.Logger) error
InitAndValidate initializes internal data structures and validates all routing rules.
This method performs the following operations:
- Normalizes all string values (trims whitespace, converts to lowercase for matching)
- Validates that each rule has a unique, non-empty name
- Compiles regular expressions for MatchesRegex patterns
- Validates that each rule has at least one matching mechanism configured
- Validates that all channel IDs are valid Slack channel ID format
- Initializes the route match cache for performance
This method is idempotent - calling it multiple times on an already-initialized APISettings has no effect. The API Server calls this internally when settings are updated, so external callers typically don't need to call it directly.
Returns an error describing the first validation failure encountered, or nil if valid.
func (*APISettings) Match
Match finds the target Slack channel for an alert based on its route key and alert type.
The method evaluates routing rules in precedence order to find the best match:
- Exact match (Equals) - highest precedence
- Prefix match (HasPrefix)
- Regex match (MatchesRegex)
- Match-all rule (MatchAll) - lowest precedence, used as catch-all
Within each precedence level, rules with a matching AlertType take priority over rules with no AlertType configured. This allows different alert types to be routed to different channels even when they share the same route key.
Results are cached in a thread-safe map for performance, as route matching occurs frequently during alert processing. Both the route key and alert type are normalized to lowercase before matching.
Returns the target channel ID and true if a match is found, or ("", false) if no routing rule matches the given route key and alert type.
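The precedence rules can be sketched as a loop over the four levels with a typed-match-wins check inside each. This is a simplified illustration: the rule fields mirror the names used above, but Channel is a hypothetical name for the target channel ID, rule order is assumed to break ties within a level, and misses are cached alongside hits.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
)

// rule is a hypothetical, simplified routing rule. Values are assumed to be
// lowercase already (InitAndValidate normalizes them).
type rule struct {
	AlertType    string
	Equals       []string
	HasPrefix    []string
	MatchesRegex []*regexp.Regexp // compiled during initialization
	MatchAll     bool
	Channel      string // hypothetical target channel ID field
}

// matchesAt checks one precedence level: 0 exact, 1 prefix, 2 regex, 3 match-all.
func (r *rule) matchesAt(level int, routeKey string) bool {
	switch level {
	case 0:
		for _, v := range r.Equals {
			if v == routeKey {
				return true
			}
		}
	case 1:
		for _, p := range r.HasPrefix {
			if strings.HasPrefix(routeKey, p) {
				return true
			}
		}
	case 2:
		for _, re := range r.MatchesRegex {
			if re.MatchString(routeKey) {
				return true
			}
		}
	case 3:
		return r.MatchAll
	}
	return false
}

type router struct {
	rules []*rule
	cache sync.Map // "routeKey|alertType" -> channel ID ("" records a miss)
}

// match walks the levels in order; within a level a rule whose AlertType equals
// the alert's type beats the first matching rule with no AlertType.
func (rt *router) match(routeKey, alertType string) (string, bool) {
	routeKey = strings.ToLower(strings.TrimSpace(routeKey))
	alertType = strings.ToLower(strings.TrimSpace(alertType))
	key := routeKey + "|" + alertType
	if v, ok := rt.cache.Load(key); ok {
		return v.(string), v.(string) != ""
	}
	channel := ""
	for level := 0; level <= 3 && channel == ""; level++ {
		fallback := ""
		for _, r := range rt.rules {
			if !r.matchesAt(level, routeKey) {
				continue
			}
			if r.AlertType != "" && r.AlertType == alertType {
				channel = r.Channel
				break
			}
			if r.AlertType == "" && fallback == "" {
				fallback = r.Channel
			}
		}
		if channel == "" {
			channel = fallback
		}
	}
	rt.cache.Store(key, channel)
	return channel, channel != ""
}

func main() {
	rt := &router{rules: []*rule{
		{Equals: []string{"payments"}, Channel: "C100"},
		{Equals: []string{"payments"}, AlertType: "security", Channel: "C200"},
		{HasPrefix: []string{"db-"}, Channel: "C300"},
		{MatchesRegex: []*regexp.Regexp{regexp.MustCompile(`^k8s-.*-prod$`)}, Channel: "C400"},
		{MatchAll: true, Channel: "C999"},
	}}
	fmt.Println(rt.match("Payments", "security")) // C200 true
	fmt.Println(rt.match("db-orders", ""))        // C300 true
	fmt.Println(rt.match("k8s-api-prod", ""))     // C400 true
	fmt.Println(rt.match("billing", ""))          // C999 true
}
```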
type AlertChannelSettings
type AlertChannelSettings struct {
// ID is the Slack channel ID (e.g., "C1234567890"). This must be the channel's
// internal ID, not its display name. Required and must be unique across all
// AlertChannels and InfoChannels.
ID string `json:"id" yaml:"id"`
// AdminUsers lists Slack user IDs (e.g., "U1234567890") with admin permissions
// in this channel. These users can resolve, terminate, and mute issues in this
// channel only. For cross-channel admin access, use ManagerSettings.GlobalAdmins.
AdminUsers []string `json:"adminUsers" yaml:"adminUsers"`
// AdminGroups lists Slack user group IDs whose members have admin permissions in
// this channel. Group membership is checked dynamically via the Slack API. This
// allows permission management through Slack groups rather than individual user IDs.
AdminGroups []string `json:"adminGroups" yaml:"adminGroups"`
// DisableIssueReordering overrides the global DisableIssueReordering setting for
// this channel. When true, issues in this channel are displayed in creation order
// regardless of severity, even if global reordering is enabled.
DisableIssueReordering bool `json:"disableIssueReordering" yaml:"disableIssueReordering"`
// IssueReorderingLimit overrides the global IssueReorderingLimit for this channel.
// When the number of open issues exceeds this limit, reordering is temporarily
// disabled for this channel. Set to 0 to use the global default.
// Must be between 5 and 100 if specified.
IssueReorderingLimit int `json:"issueReorderingLimit" yaml:"issueReorderingLimit"`
// IssueProcessingIntervalSeconds overrides the global IssueProcessingIntervalSeconds
// for this channel. Controls how frequently background maintenance runs for this
// channel's issues. Set to 0 to use the global default.
// Must be between 3 and 600 seconds if specified.
IssueProcessingIntervalSeconds int `json:"issueProcessingIntervalSeconds" yaml:"issueProcessingIntervalSeconds"`
// contains filtered or unexported fields
}
AlertChannelSettings provides per-channel configuration overrides for alert channels.
Use this to customize admin permissions and processing behavior for specific channels without affecting the global defaults. Channels not configured here use the global ManagerSettings values.
Admin Hierarchy
A user is considered an admin for this channel if any of the following are true:
- The user is listed in ManagerSettings.GlobalAdmins (global admin)
- The user is listed in this channel's AdminUsers
- The user is a member of a group listed in this channel's AdminGroups
Channel-specific admins cannot perform actions in other channels unless they are also global admins or admins of those specific channels.
type InfoChannelSettings
type InfoChannelSettings struct {
// ID is the Slack channel ID (e.g., "C1234567890"). Required and must be unique
// across all AlertChannels and InfoChannels.
ID string `json:"channelId" yaml:"channelId"`
// TemplatePath is the file path to a Go template file used to render content
// for this info channel. The template has access to system status data and is
// re-rendered periodically. Required.
TemplatePath string `json:"templatePath" yaml:"templatePath"`
}
InfoChannelSettings configures a channel designated for system information display.
Info channels cannot receive alerts - the API returns an error if an alert targets an info channel. Use info channels for:
- System status dashboards
- Documentation or help content
- Operational announcements
The Manager can post templated content to info channels based on the configured template path, which is rendered using the Go template engine with access to system status information.
type IssueReaction ¶
type IssueReaction string
const (
// IssueReactionTerminate is the reaction used to indicate that an issue should be terminated.
IssueReactionTerminate IssueReaction = "terminate"
// IssueReactionResolve is the reaction used to indicate that an issue has been resolved.
IssueReactionResolve IssueReaction = "resolve"
// IssueReactionInvestigate is the reaction used to indicate that an issue is being investigated.
IssueReactionInvestigate IssueReaction = "investigate"
// IssueReactionMute is the reaction used to indicate that an issue has been muted.
IssueReactionMute IssueReaction = "mute"
// IssueReactionShowOptionButtons is the reaction used to indicate that option buttons should be shown for an issue.
IssueReactionShowOptionButtons IssueReaction = "show_option_buttons"
)
type IssueReactionSettings ¶
type IssueReactionSettings struct {
// TerminateEmojis lists emojis that trigger immediate issue termination. Terminated
// issues are removed without being marked as resolved - use for false positives or
// issues that should be dismissed without tracking. Requires admin permissions.
// Default: [":firecracker:"]
TerminateEmojis []string `json:"terminateEmojis" yaml:"terminateEmojis"`
// ResolveEmojis lists emojis that mark an issue as resolved. Resolved issues can be
// re-opened by new alerts until the archiving delay elapses. Requires admin permissions.
// Default: [":white_check_mark:"]
ResolveEmojis []string `json:"resolveEmojis" yaml:"resolveEmojis"`
// InvestigateEmojis lists emojis that mark the reacting user as investigating the
// issue. The user's name appears in the issue post to signal that someone is working
// on it. Available to all users. Default: [":eyes:"]
InvestigateEmojis []string `json:"investigateEmojis" yaml:"investigateEmojis"`
// MuteEmojis lists emojis that mute an issue, suppressing escalation notifications
// while keeping the issue visible. Useful for known issues being monitored.
// Requires admin permissions. Default: [":mask:"]
MuteEmojis []string `json:"muteEmojis" yaml:"muteEmojis"`
// ShowOptionButtonsEmojis lists emojis that reveal the interactive action buttons
// on the issue post (when AlwaysShowOptionButtons is false). Provides quick access
// to actions without memorizing emoji shortcuts. Available to all users.
// Default: [":information_source:"]
ShowOptionButtonsEmojis []string `json:"showOptionButtonsEmojis" yaml:"showOptionButtonsEmojis"`
}
IssueReactionSettings configures which emoji reactions trigger issue actions.
Users interact with issues by adding emoji reactions to Slack posts. Each action type can have multiple emojis configured, allowing teams to use their preferred symbols while maintaining consistent behavior. Emojis can be specified with or without colons - they are normalized during validation.
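Normalization lets `eyes`, `:eyes:` and `:eyes` all refer to the same emoji. A minimal sketch, assuming the canonical form is `:name:` (the format DefaultPostIconEmoji documents). `normalizeEmoji` is a hypothetical helper, not this package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEmoji returns the emoji in ":name:" form, accepting input with or
// without surrounding colons. Empty input stays empty.
func normalizeEmoji(s string) string {
	name := strings.Trim(strings.TrimSpace(s), ":")
	if name == "" {
		return ""
	}
	return ":" + name + ":"
}

func main() {
	for _, in := range []string{"eyes", ":eyes:", " :mask"} {
		fmt.Println(normalizeEmoji(in)) // :eyes:, :eyes:, :mask:
	}
}
```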
Permissions ¶
Some actions require channel admin permissions:
- Terminate: Requires admin (destructive action)
- Resolve: Requires admin (changes issue state)
- Mute: Requires admin (affects escalation)
Other actions are available to all channel members:
- Investigate: Any user can mark themselves as investigating
- ShowOptionButtons: Any user can reveal the action buttons
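The permission table above maps directly to a switch over the IssueReaction constants. This sketch copies the constant values from this page into a local type. `requiresAdmin` and `allowed` are hypothetical helpers used only to illustrate the rule.

```go
package main

import "fmt"

// IssueReaction mirrors the package type; the values are copied from the constants above.
type IssueReaction string

const (
	IssueReactionTerminate         IssueReaction = "terminate"
	IssueReactionResolve           IssueReaction = "resolve"
	IssueReactionInvestigate       IssueReaction = "investigate"
	IssueReactionMute              IssueReaction = "mute"
	IssueReactionShowOptionButtons IssueReaction = "show_option_buttons"
)

// requiresAdmin reports whether the action needs channel admin permissions:
// terminate, resolve and mute do; investigate and show-option-buttons do not.
func requiresAdmin(r IssueReaction) bool {
	switch r {
	case IssueReactionTerminate, IssueReactionResolve, IssueReactionMute:
		return true
	default:
		return false
	}
}

// allowed reports whether a user with the given admin status may perform r.
func allowed(r IssueReaction, isAdmin bool) bool {
	return isAdmin || !requiresAdmin(r)
}

func main() {
	fmt.Println(allowed(IssueReactionResolve, false))     // false
	fmt.Println(allowed(IssueReactionInvestigate, false)) // true
}
```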
type IssueStatusSettings ¶
type IssueStatusSettings struct {
// PanicEmoji indicates a panic-level issue (highest severity, unmuted).
// Default: ":scream:"
PanicEmoji string `json:"panicEmoji" yaml:"panicEmoji"`
// ErrorEmoji indicates an error-level issue (unmuted).
// Default: ":x:"
ErrorEmoji string `json:"errorEmoji" yaml:"errorEmoji"`
// WarningEmoji indicates a warning-level issue (unmuted).
// Default: ":warning:"
WarningEmoji string `json:"warningEmoji" yaml:"warningEmoji"`
// InfoEmoji indicates an info-level issue (lowest severity, unmuted).
// Default: ":information_source:"
InfoEmoji string `json:"infoEmoji" yaml:"infoEmoji"`
// MutePanicEmoji indicates a muted panic-level issue. Escalations are suppressed
// but the issue remains visible and can still receive alerts.
// Default: ":no_bell:"
MutePanicEmoji string `json:"mutePanicEmoji" yaml:"mutePanicEmoji"`
// MuteErrorEmoji indicates a muted error-level issue.
// Default: ":no_bell:"
MuteErrorEmoji string `json:"muteErrorEmoji" yaml:"muteErrorEmoji"`
// MuteWarningEmoji indicates a muted warning-level issue.
// Default: ":no_bell:"
MuteWarningEmoji string `json:"muteWarningEmoji" yaml:"muteWarningEmoji"`
// InconclusiveEmoji indicates an issue resolved as inconclusive - neither confirmed
// as fixed nor identified as a false positive. Used when the root cause is unclear.
// Default: ":grey_question:"
InconclusiveEmoji string `json:"unresolvedEmoji" yaml:"unresolvedEmoji"`
// ResolvedEmoji indicates a successfully resolved (and possibly archived) issue.
// Default: ":white_check_mark:"
ResolvedEmoji string `json:"resolvedEmoji" yaml:"resolvedEmoji"`
}
IssueStatusSettings configures the visual indicators displayed in issue post headers.
Status emojis provide at-a-glance severity and state information, helping users quickly scan a channel to identify high-priority issues. Each issue displays one status emoji based on its current severity and mute state.
Severity Hierarchy (highest to lowest) ¶
- Panic: Critical issues requiring immediate attention
- Error: Significant issues that need prompt resolution
- Warning: Potential issues that should be monitored
- Info: Informational alerts (lowest priority)
Muted States ¶
Muted issues display different emojis to indicate they won't trigger escalations while remaining visible for monitoring. Muting is useful for known issues or issues being actively worked on.
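The selection of a header emoji from severity and mute state can be sketched as follows, using the documented defaults. `statusEmojis` and `statusEmoji` are hypothetical. Because the settings define no muted-info emoji, the sketch assumes muted info issues keep InfoEmoji. Resolved and inconclusive states are left out.

```go
package main

import "fmt"

// statusEmojis is a hypothetical subset of IssueStatusSettings.
type statusEmojis struct {
	Panic, Error, Warning, Info       string
	MutePanic, MuteError, MuteWarning string
}

// defaults holds the documented default emojis.
var defaults = statusEmojis{
	Panic: ":scream:", Error: ":x:", Warning: ":warning:", Info: ":information_source:",
	MutePanic: ":no_bell:", MuteError: ":no_bell:", MuteWarning: ":no_bell:",
}

// statusEmoji picks the header emoji for an open issue.
func statusEmoji(s statusEmojis, severity string, muted bool) string {
	var normal, mutedEmoji string
	switch severity {
	case "panic":
		normal, mutedEmoji = s.Panic, s.MutePanic
	case "error":
		normal, mutedEmoji = s.Error, s.MuteError
	case "warning":
		normal, mutedEmoji = s.Warning, s.MuteWarning
	default: // "info": assumed to have no muted variant
		return s.Info
	}
	if muted {
		return mutedEmoji
	}
	return normal
}

func main() {
	fmt.Println(statusEmoji(defaults, "panic", false)) // :scream:
	fmt.Println(statusEmoji(defaults, "error", true))  // :no_bell:
}
```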
type ManagerConfig ¶
type ManagerConfig struct {
// EncryptionKey is a 32-character alphanumeric key used for AES-256 encryption of
// sensitive data in alert payloads. This key must be identical to the key configured
// in APIConfig to ensure the Manager can decrypt payloads encrypted by the API.
//
// Security considerations:
// - Generate a cryptographically secure random key for production
// - Never use predictable values such as a repeated word or sequential characters
// - Rotate keys by deploying new API and Manager instances simultaneously
// - Store the key securely (e.g., Kubernetes secrets, HashiCorp Vault)
EncryptionKey string `json:"encryptionKey" yaml:"encryptionKey"`
// CacheKeyPrefix is prepended to all Redis cache keys to namespace them. Using the
// same prefix in both the API and Manager allows them to share cached data (such as
// Slack channel information and user lookups), significantly reducing Slack API calls.
//
// Use different prefixes only if running multiple independent Slack Manager deployments
// against the same Redis cluster that should not share cache data.
CacheKeyPrefix string `json:"cacheKeyPrefix" yaml:"cacheKeyPrefix"`
// SkipDatabaseCache disables the in-memory database query cache when set to true.
// The database cache reduces load on the database by caching frequently accessed
// data like issue states and channel configurations.
//
// Set to true only for:
// - Debugging cache-related issues
// - Development environments where you need immediate database consistency
// - Testing scenarios that require predictable database behavior
//
// In production, keep this false (the default) for optimal performance.
SkipDatabaseCache bool `json:"skipDatabaseCache" yaml:"skipDatabaseCache"`
// Location specifies the timezone used for timestamp parsing and formatting in
// Slack messages, logs, and issue metadata. All time-related operations use this
// location for consistency.
//
// Common values:
// - time.UTC (default): Recommended for distributed systems and log aggregation
// - A location loaded with time.LoadLocation("America/New_York"): For teams in a specific timezone
// - time.Local: Uses the server's local timezone (not recommended for production)
//
// Must not be nil. The default is time.UTC.
Location *time.Location `json:"location" yaml:"location"`
// SlackClient contains configuration for connecting to the Slack API, including
// authentication tokens, retry behavior, and timeout settings. See SlackClientConfig
// for detailed documentation of each field.
SlackClient *SlackClientConfig `json:"slackClient" yaml:"slackClient"`
// CoordinatorDrainTimeout is the maximum time the coordinator waits during shutdown
// to drain messages from its internal channels before terminating.
//
// During graceful shutdown, the coordinator:
// 1. Stops accepting new messages from the queue
// 2. Attempts to deliver all buffered messages to channel managers
// 3. Waits up to this timeout for channel managers to acknowledge receipt
//
// Messages not delivered within this timeout remain in the external queue and will
// be reprocessed by another Manager instance (or the same instance after restart).
//
// Default: 5 seconds. Must be between 2 seconds and 5 minutes.
CoordinatorDrainTimeout time.Duration `json:"coordinatorDrainTimeout" yaml:"coordinatorDrainTimeout"`
// ChannelManagerDrainTimeout is the maximum time each channel manager waits during
// shutdown to process messages from its internal buffer before terminating.
//
// During graceful shutdown, each channel manager:
// 1. Stops accepting new messages from the coordinator
// 2. Processes all buffered messages (creating issues, updating Slack, etc.)
// 3. Acknowledges processed messages to remove them from the queue
//
// Messages not processed within this timeout are negatively acknowledged (nack'd)
// and will be redelivered to the queue for reprocessing.
//
// Default: 3 seconds. Must be between 2 seconds and 5 minutes.
ChannelManagerDrainTimeout time.Duration `json:"channelManagerDrainTimeout" yaml:"channelManagerDrainTimeout"`
// SocketModeMaxWorkers limits the number of concurrent socket mode event handlers.
// This prevents goroutine explosion under high load by using a semaphore to limit
// the number of handlers that can run simultaneously.
//
// Each Slack event (reactions, interactions, slash commands, etc.) is processed
// by a separate goroutine. Without this limit, a burst of events could spawn
// thousands of goroutines, exhausting system resources.
//
// Default: 100. Must be between 10 and 1000.
SocketModeMaxWorkers int64 `json:"socketModeMaxWorkers" yaml:"socketModeMaxWorkers"`
// SocketModeDrainTimeout is the maximum time to wait for in-flight socket mode
// handlers to complete during graceful shutdown.
//
// During graceful shutdown, the socket mode handler:
// 1. Stops accepting new events from the Slack socket
// 2. Waits for all in-flight handlers to complete their work
// 3. If timeout is exceeded, logs a warning and proceeds with shutdown
//
// Handlers that don't complete within this timeout may have their work interrupted.
// For critical operations (like queue writes), handlers should complete quickly.
//
// Default: 5 seconds. Must be between 2 seconds and 5 minutes.
SocketModeDrainTimeout time.Duration `json:"socketModeDrainTimeout" yaml:"socketModeDrainTimeout"`
}
ManagerConfig holds configuration for the Slack Manager service.
The Manager service is responsible for processing alerts from the queue, managing issue lifecycle, and handling Slack events via Socket Mode. This configuration controls startup-time settings that cannot be changed without restarting the service.
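SocketModeMaxWorkers is documented as a semaphore limit on concurrent socket mode handlers. A minimal sketch of that pattern, using a buffered channel as a counting semaphore from the standard library rather than whatever semaphore the Manager actually uses. `boundedDispatcher` and `peakConcurrency` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// boundedDispatcher runs each event handler in its own goroutine, but lets at
// most maxWorkers handlers run at once. A buffered channel acts as the semaphore.
type boundedDispatcher struct {
	sem chan struct{}
	wg  sync.WaitGroup
}

func newBoundedDispatcher(maxWorkers int64) *boundedDispatcher {
	return &boundedDispatcher{sem: make(chan struct{}, maxWorkers)}
}

// dispatch blocks until a slot is free, then runs handle in a new goroutine.
func (d *boundedDispatcher) dispatch(handle func()) {
	d.sem <- struct{}{} // acquire a slot (blocks while all slots are taken)
	d.wg.Add(1)
	go func() {
		defer d.wg.Done()
		defer func() { <-d.sem }() // release the slot
		handle()
	}()
}

// peakConcurrency dispatches n short handlers and returns the highest number
// observed running at the same time.
func peakConcurrency(maxWorkers int64, n int) int64 {
	d := newBoundedDispatcher(maxWorkers)
	var running, peak atomic.Int64
	for i := 0; i < n; i++ {
		d.dispatch(func() {
			cur := running.Add(1)
			for {
				p := peak.Load()
				if cur <= p || peak.CompareAndSwap(p, cur) {
					break
				}
			}
			time.Sleep(time.Millisecond)
			running.Add(-1)
		})
	}
	d.wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println("peak concurrency:", peakConcurrency(10, 100))
}
```

The dispatcher, not the handler, blocks when all slots are busy. A burst of events therefore queues at the event loop instead of spawning unbounded goroutines.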
Relationship with APIConfig ¶
The Manager and API services share some configuration values that must match:
- EncryptionKey: Must be identical for encrypted payloads to be decrypted correctly
- CacheKeyPrefix: Should be identical to share cached Slack channel data and reduce API calls
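Both services need the same 32-character alphanumeric key. A minimal sketch of producing such a key with `crypto/rand` and checking the documented format. `generateKey` and `validKey` are hypothetical helpers, not this package's API.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

const encryptionKeyLength = 32 // mirrors EncryptionKeyLength

const alphanumeric = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// generateKey returns a random 32-character alphanumeric key drawn uniformly
// from alphanumeric using the cryptographically secure crypto/rand reader.
func generateKey() (string, error) {
	b := make([]byte, encryptionKeyLength)
	limit := big.NewInt(int64(len(alphanumeric)))
	for i := range b {
		n, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		b[i] = alphanumeric[n.Int64()]
	}
	return string(b), nil
}

// validKey reports whether key is exactly 32 ASCII letters or digits.
func validKey(key string) bool {
	if len(key) != encryptionKeyLength {
		return false
	}
	for _, c := range key {
		if !(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9') {
			return false
		}
	}
	return true
}

func main() {
	key, err := generateKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(key), validKey(key)) // 32 true
}
```

Generate the key once and store it in your secret store. Do not generate it separately in each service.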
Graceful Shutdown ¶
The drain timeout settings control graceful shutdown behavior. During shutdown:
- The coordinator stops accepting new messages and drains its internal channels
- Each channel manager drains its internal buffers
- Messages that cannot be processed within the timeout are left in (or nack'd back to) the queue for reprocessing by another instance, or by the same instance after a restart
func NewDefaultManagerConfig ¶
func NewDefaultManagerConfig() *ManagerConfig
NewDefaultManagerConfig returns a ManagerConfig populated with sensible default values.
The defaults are configured for a typical production deployment:
- Cache key prefix "slack-manager:" for Redis key namespacing
- UTC timezone for consistent timestamp handling
- Coordinator drain timeout of 5 seconds
- Channel manager drain timeout of 3 seconds
- Socket mode max workers of 100
- Socket mode drain timeout of 5 seconds
- Database cache enabled (SkipDatabaseCache = false)
The EncryptionKey is intentionally left empty and must be set before use. The SlackClient tokens are also empty and must be configured.
func (*ManagerConfig) Validate ¶
func (c *ManagerConfig) Validate() error
Validate checks that all required fields are present and all values are within acceptable ranges. It returns a descriptive error for the first validation failure encountered, or nil if the configuration is valid.
Validation includes:
- EncryptionKey: if non-empty, must be exactly 32 alphanumeric characters
- CacheKeyPrefix: must not be empty (required for cache namespacing)
- Location: must not be nil
- SlackClient: must not be nil, and must pass its own validation
- CoordinatorDrainTimeout: must be between 2 seconds and 5 minutes
- ChannelManagerDrainTimeout: must be between 2 seconds and 5 minutes
- SocketModeMaxWorkers: must be between 10 and 1000
- SocketModeDrainTimeout: must be between 2 seconds and 5 minutes
type ManagerSettings ¶
type ManagerSettings struct {
// AppFriendlyName is the application name displayed in Slack messages, modals, and
// greeting messages. Used to identify the bot in user-facing text.
// Default: "Slack Manager"
AppFriendlyName string `json:"appFriendlyName" yaml:"appFriendlyName"`
// GlobalAdmins is a list of Slack user IDs (e.g., "U1234567890") with admin permissions
// in all channels. Global admins can resolve, terminate, and mute issues in any channel.
//
// Use sparingly - prefer channel-specific admins via AlertChannelSettings.AdminUsers
// for better access control. Global admins are typically reserved for on-call leads
// or platform administrators who need cross-channel access.
GlobalAdmins []string `json:"globalAdmins" yaml:"globalAdmins"`
// DefaultPostIconEmoji is the emoji displayed as the bot's avatar in Slack posts
// when no per-alert icon is specified. Must be in :emoji: format; missing colons
// are added automatically during validation.
// Default: ":female-detective:"
DefaultPostIconEmoji string `json:"defaultPostIconEmoji" yaml:"defaultPostIconEmoji"`
// DefaultPostUsername is the display name shown for the bot in Slack posts when no
// per-alert username is specified. This appears as the sender name in message headers.
// Default: "Slack Manager"
DefaultPostUsername string `json:"defaultPostUsername" yaml:"defaultPostUsername"`
// DefaultAlertSeverity is the severity level assigned to alerts that don't specify one.
// Severity determines issue ordering (when reordering is enabled) and visual indicators.
// Valid values: "panic" (highest), "error", "warning". Info-level alerts are not allowed
// as the default since they typically don't require immediate attention.
// Default: "error"
DefaultAlertSeverity types.AlertSeverity `json:"defaultAlertSeverity" yaml:"defaultAlertSeverity"`
// DefaultIssueArchivingDelaySeconds is the time (in seconds) after resolution before
// an issue is marked as archived. During this period, new alerts can still re-open the
// issue. Once archived, the Slack post remains but the issue cannot be re-opened.
// Can be overridden per-alert via the alert payload.
// Default: 43200 (12 hours). Must be between 60 seconds and 30 days.
DefaultIssueArchivingDelaySeconds int `json:"defaultIssueArchivingDelaySeconds" yaml:"defaultIssueArchivingDelaySeconds"`
// DisableIssueReordering globally disables automatic severity-based ordering of issues.
// When false (default), higher-severity issues are moved to the bottom of the channel
// for visibility. When true, issues remain in creation order regardless of severity.
// Can be overridden per-channel via AlertChannelSettings.DisableIssueReordering.
DisableIssueReordering bool `json:"disableIssueReordering" yaml:"disableIssueReordering"`
// IssueReorderingLimit is the maximum number of open issues in a channel before
// automatic reordering is temporarily disabled. This prevents excessive API calls
// during incident storms. When the issue count drops below this limit, reordering
// resumes automatically. Can be overridden per-channel.
// Default: 30. Must be between 5 and 100.
IssueReorderingLimit int `json:"issueReorderingLimit" yaml:"issueReorderingLimit"`
// IssueProcessingIntervalSeconds is the frequency (in seconds) of background issue
// maintenance tasks: reordering, auto-resolution, archiving, and escalation.
// Lower values provide more responsive behavior but increase API usage.
// Can be overridden per-channel via AlertChannelSettings.IssueProcessingIntervalSeconds.
// Default: 10. Must be between 3 and 600 seconds.
IssueProcessingIntervalSeconds int `json:"issueProcessingIntervalSeconds" yaml:"issueProcessingIntervalSeconds"`
// IssueReactions configures which emoji reactions trigger issue actions. Each action
// can have multiple emojis configured, allowing teams to use their preferred symbols.
// If nil, a default IssueReactionSettings is created during InitAndValidate().
IssueReactions *IssueReactionSettings `json:"issueReactions" yaml:"issueReactions"`
// IssueStatus configures the emoji indicators displayed in issue posts to show
// severity and state. These appear in the issue header for quick visual scanning.
// If nil, a default IssueStatusSettings is created during InitAndValidate().
IssueStatus *IssueStatusSettings `json:"issueStatus" yaml:"issueStatus"`
// MinIssueCountForThrottle is the minimum number of open issues in a channel before
// update throttling is considered. Below this threshold, all updates are immediate.
// Above it, rapid updates to the same issue may be delayed to reduce API usage.
// Default: 5. Must be between 1 and 100.
MinIssueCountForThrottle int `json:"minIssueCountForThrottle" yaml:"minIssueCountForThrottle"`
// MaxThrottleDurationSeconds is the maximum delay (in seconds) between updates to
// a single issue when throttling is active. The actual delay scales with alert count
// but never exceeds this value. After this duration, updates are always allowed.
// Default: 90. Must be between 1 and 600 seconds.
MaxThrottleDurationSeconds int `json:"maxThrottleDurationSeconds" yaml:"maxThrottleDurationSeconds"`
// AlwaysShowOptionButtons controls whether interactive action buttons are permanently
// visible on issue posts. When false (default), buttons are hidden until a user reacts
// with one of the IssueReactions.ShowOptionButtonsEmojis, reducing visual clutter. When
// true, buttons are always visible for immediate access to actions.
AlwaysShowOptionButtons bool `json:"alwaysShowOptionButtons" yaml:"alwaysShowOptionButtons"`
// ShowIssueCorrelationIDInSlackPost controls whether the issue's correlation ID
// (used for grouping related alerts) is displayed in the post footer. When false
// (default), the ID is hidden but accessible via the Issue Details modal.
// Enable for debugging or when correlation IDs are meaningful to users.
ShowIssueCorrelationIDInSlackPost bool `json:"showIssueCorrelationIdInSlackPost" yaml:"showIssueCorrelationIdInSlackPost"`
// DocsURL is an optional URL to documentation about your Slack Manager deployment.
// When set, this URL is included in help messages and error dialogs to direct users
// to relevant documentation. Leave empty if no documentation is available.
DocsURL string `json:"docsUrl" yaml:"docsUrl"`
// AlertChannels contains per-channel configuration overrides. Use this to customize
// admin permissions, reordering behavior, and processing intervals for specific channels.
// Channels not listed here use the global settings. Channel IDs must be unique and
// cannot overlap with InfoChannels.
AlertChannels []*AlertChannelSettings `json:"alertChannels" yaml:"alertChannels"`
// InfoChannels defines channels designated for system information display. These
// channels cannot receive alerts - the API returns an error if an alert targets an
// info channel. Use for dashboards, status pages, or documentation channels.
// Channel IDs must be unique and cannot overlap with AlertChannels.
InfoChannels []*InfoChannelSettings `json:"infoChannels" yaml:"infoChannels"`
// contains filtered or unexported fields
}
ManagerSettings contains runtime configuration for the Slack Manager service.
These settings control how the Manager processes alerts, displays issues in Slack, handles user interactions, and manages issue lifecycle. Unlike ManagerConfig (which requires a restart to change), ManagerSettings can be updated at runtime.
Structure ¶
Settings are organized into three levels:
- Global settings: Apply to all channels unless overridden
- AlertChannels: Per-channel overrides for processing behavior and admin permissions
- InfoChannels: Channels designated for system information (cannot receive alerts)
Defaults ¶
All fields are optional. Missing or zero values are replaced with sensible defaults during InitAndValidate(). See the Default* constants for specific default values.
Immutability ¶
Settings passed to the Manager are cloned internally to prevent external modifications from affecting the running system. Once settings are passed to the Manager, any subsequent changes to the original struct will have no effect. To update settings, create a new ManagerSettings instance and pass it to the Manager's update method.
Admin Permissions ¶
Admin permissions are checked for destructive actions (resolve, terminate, mute). A user is considered an admin if they are:
- Listed in GlobalAdmins (has admin rights in all channels)
- Listed in a channel's AdminUsers (has admin rights in that channel only)
- A member of a group listed in a channel's AdminGroups
func (*ManagerSettings) Clone ¶
func (s *ManagerSettings) Clone() (*ManagerSettings, error)
Clone creates a deep copy of the ManagerSettings by marshaling to JSON and back. The returned clone is NOT initialized - InitAndValidate() must be called before use.
This method is used internally by the Manager to ensure that external modifications to the original settings struct do not affect the running system.
Only exported fields are copied. Unexported fields (internal maps, mutex, initialized flag) are intentionally excluded since InitAndValidate() rebuilds them.
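The JSON round trip behaves as described because `encoding/json` ignores unexported fields. A minimal sketch with a hypothetical miniature settings type shows both effects: the copy is independent of the original, and it starts uninitialized.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// settings is a hypothetical miniature of ManagerSettings.
type settings struct {
	GlobalAdmins []string `json:"globalAdmins"`
	initialized  bool     // unexported: never marshaled, so clones start false
}

// clone deep-copies s via a JSON round trip.
func clone(s *settings) (*settings, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return nil, err
	}
	var c settings
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	orig := &settings{GlobalAdmins: []string{"U1"}, initialized: true}
	c, _ := clone(orig)
	orig.GlobalAdmins[0] = "U2"                   // mutating the original...
	fmt.Println(c.GlobalAdmins[0], c.initialized) // ...leaves the clone alone: U1 false
}
```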
func (*ManagerSettings) GetInfoChannelConfig ¶
func (s *ManagerSettings) GetInfoChannelConfig(channelID string) (*InfoChannelSettings, bool)
GetInfoChannelConfig returns the InfoChannelSettings for the given channel ID. The second return value indicates whether the channel was found in the info channel configuration. Returns (nil, false) if the settings have not been initialized or if the channel is not configured as an info channel.
func (*ManagerSettings) InitAndValidate ¶
func (s *ManagerSettings) InitAndValidate() error
InitAndValidate initializes internal data structures and validates all settings.
This method performs the following operations:
- Sets default values for any missing or zero-value fields
- Normalizes emoji formats (adds colons if missing)
- Builds internal lookup maps for efficient access (global admins, channels, reactions)
- Validates all fields are within acceptable ranges
- Checks for duplicate or overlapping channel IDs
This method is idempotent - calling it multiple times on an already-initialized ManagerSettings has no effect. The Manager calls this internally when settings are updated, so external callers typically don't need to call it directly.
Returns an error describing the first validation failure encountered, or nil if valid.
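The idempotence described above is typically built on an "initialized" flag guarded by a mutex. This sketch uses a hypothetical single-field settings type with the documented IssueReorderingLimit default (30) and range (5 to 100). It is not the package's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// settings is a hypothetical miniature of ManagerSettings.
type settings struct {
	IssueReorderingLimit int

	mu          sync.Mutex
	initialized bool
}

// initAndValidate fills in defaults and validates on the first successful call;
// once initialized, later calls return nil without doing anything.
func (s *settings) initAndValidate() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.initialized {
		return nil
	}
	if s.IssueReorderingLimit == 0 {
		s.IssueReorderingLimit = 30 // documented default
	}
	if s.IssueReorderingLimit < 5 || s.IssueReorderingLimit > 100 {
		return fmt.Errorf("issueReorderingLimit must be between 5 and 100, got %d", s.IssueReorderingLimit)
	}
	s.initialized = true
	return nil
}

func main() {
	s := &settings{}
	err := s.initAndValidate()
	fmt.Println(err, s.IssueReorderingLimit) // <nil> 30
	bad := &settings{IssueReorderingLimit: 500}
	fmt.Println(bad.initAndValidate())
}
```

Since later calls are no-ops, edits made to an already-initialized struct are never re-validated. This is one reason the Manager works on clones.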
func (*ManagerSettings) IsInfoChannel ¶
func (s *ManagerSettings) IsInfoChannel(channelID string) bool
IsInfoChannel reports whether the channel is configured as an info channel. Info channels cannot receive alerts - they are designated for system information display only. The API returns an error if an alert targets an info channel.
Returns false if the settings have not been initialized or if the channel is not configured as an info channel.
func (*ManagerSettings) IssueProcessingInterval ¶
func (s *ManagerSettings) IssueProcessingInterval(channelID string) time.Duration
IssueProcessingInterval returns the configured interval between background processing runs for the given channel. This controls how frequently the channel manager performs maintenance tasks like reordering, auto-resolution, and archiving.
Returns the channel-specific interval if configured in AlertChannels, otherwise returns the global IssueProcessingIntervalSeconds value. If the settings have not been initialized, returns the default interval (10 seconds).
func (*ManagerSettings) MapSlackPostReaction ¶
func (s *ManagerSettings) MapSlackPostReaction(reaction string) IssueReaction
MapSlackPostReaction maps a Slack reaction emoji name to an IssueReaction action type.
The reaction parameter is the emoji name without colons (e.g., "white_check_mark"), as provided by Slack's reaction_added events. The method checks against all configured reaction emojis in IssueReactionSettings and returns the corresponding action type.
Results are cached in a thread-safe map for performance, as reaction lookups occur frequently during event processing. Returns an empty string if the reaction doesn't match any configured action emoji or if the settings have not been initialized.
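A cached reaction lookup can be sketched with `sync.Map`. Here `reactionMapper` is hypothetical. The sketch caches misses as well as hits, which is an assumption. It also assumes each emoji is configured for at most one action, because it stops at the first match.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type IssueReaction string

// reactionMapper resolves Slack reaction names (no colons) to actions and caches
// every lookup result, including "" for unmatched reactions.
type reactionMapper struct {
	emojis map[IssueReaction][]string // configured emojis in ":name:" form
	cache  sync.Map                   // reaction name -> IssueReaction
}

func (m *reactionMapper) mapReaction(reaction string) IssueReaction {
	if v, ok := m.cache.Load(reaction); ok {
		return v.(IssueReaction)
	}
	var result IssueReaction // "" means no configured action
search:
	for action, list := range m.emojis {
		for _, e := range list {
			if strings.Trim(e, ":") == reaction {
				result = action
				break search
			}
		}
	}
	m.cache.Store(reaction, result)
	return result
}

func main() {
	m := &reactionMapper{emojis: map[IssueReaction][]string{
		"resolve":     {":white_check_mark:"},
		"investigate": {":eyes:"},
	}}
	fmt.Println(m.mapReaction("eyes"))         // investigate
	fmt.Printf("%q\n", m.mapReaction("tada")) // ""
}
```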
func (*ManagerSettings) OrderIssuesBySeverity ¶
func (s *ManagerSettings) OrderIssuesBySeverity(channelID string, openIssueCount int) bool
OrderIssuesBySeverity reports whether automatic severity-based issue ordering should be applied for the given channel with the current number of open issues.
Reordering is enabled when all of the following are true:
- DisableIssueReordering is false (globally or for this channel)
- The number of open issues does not exceed IssueReorderingLimit
Channel-specific settings override global settings when configured. Returns false if the settings have not been initialized.
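This sketch expresses the two conditions as a single boolean. `channelOverride` and `orderBySeverity` are hypothetical. The text does not say how a channel value of `false` interacts with a global `true`, so the sketch assumes reordering is off if either flag disables it. A non-zero channel limit replaces the global limit.

```go
package main

import "fmt"

// channelOverride is a hypothetical subset of AlertChannelSettings.
type channelOverride struct {
	DisableIssueReordering bool
	IssueReorderingLimit   int // 0 means "use the global limit"
}

// orderBySeverity reports whether severity-based reordering applies.
// ch is nil when the channel has no specific configuration.
func orderBySeverity(globalDisabled bool, globalLimit int, ch *channelOverride, openIssues int) bool {
	disabled, limit := globalDisabled, globalLimit
	if ch != nil {
		disabled = disabled || ch.DisableIssueReordering // assumption: either flag disables
		if ch.IssueReorderingLimit > 0 {
			limit = ch.IssueReorderingLimit
		}
	}
	return !disabled && openIssues <= limit
}

func main() {
	fmt.Println(orderBySeverity(false, 30, nil, 12))                                   // true
	fmt.Println(orderBySeverity(false, 30, &channelOverride{IssueReorderingLimit: 5}, 12)) // false
}
```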
func (*ManagerSettings) UserIsChannelAdmin ¶
func (s *ManagerSettings) UserIsChannelAdmin(ctx context.Context, channelID, userID string, userIsInGroup func(ctx context.Context, groupID, userID string) bool) bool
UserIsChannelAdmin reports whether the given user has admin permissions for the channel.
A user is considered a channel admin if any of the following are true:
- The user is a global admin (listed in GlobalAdmins)
- The user is listed in the channel's AdminUsers
- The user is a member of a group listed in the channel's AdminGroups
The userIsInGroup callback is used to check group membership via the Slack API. If userIsInGroup is nil, group-based permissions are not checked.
Returns false if the settings have not been initialized, if the channel has no specific configuration, or if the user doesn't match any admin criteria.
func (*ManagerSettings) UserIsGlobalAdmin ¶
func (s *ManagerSettings) UserIsGlobalAdmin(userID string) bool
UserIsGlobalAdmin reports whether the given user ID is listed in GlobalAdmins. Global admins have admin permissions in all channels and can perform any action.
Returns false if the settings have not been initialized or if userID is not found.
type RateLimitConfig ¶
type RateLimitConfig struct {
// AlertsPerSecond controls the token refill rate - how many tokens are added to
// the bucket per second. This determines the sustained alert rate after a burst
// has depleted the bucket. For example:
// - 1.0 = one alert per second (one token added every second)
// - 0.5 = one alert every 2 seconds (one token added every 2 seconds)
// - 10.0 = ten alerts per second (ten tokens added every second)
//
// This value can be fractional to allow rates slower than 1 per second.
AlertsPerSecond float64 `json:"alertsPerSecond" yaml:"alertsPerSecond"`
// AllowedBurst sets the token bucket capacity - the maximum number of tokens that
// can accumulate. This determines how many alerts can be sent in a rapid burst
// before rate limiting kicks in. The bucket starts full and refills over time
// when not in use.
//
// This value also serves as the maximum number of alerts allowed in a single API
// request. Requests exceeding this limit are rejected with HTTP 400 immediately,
// without waiting for rate limit tokens.
AllowedBurst int `json:"allowedBurst" yaml:"allowedBurst"`
// MaxRequestWaitTime is the maximum duration an API request will wait for rate
// limit tokens to become available. If tokens are not available within this time,
// the request fails with HTTP 429 (Too Many Requests).
//
// For example, with AlertsPerSecond=1 and an empty bucket, a request for 5 alerts
// would need to wait ~5 seconds for enough tokens. If MaxRequestWaitTime is 3s,
// the request would fail after 3 seconds rather than waiting the full 5 seconds.
//
// Set this based on your HTTP client timeout and user experience requirements.
// Longer waits reduce rejected requests but increase response latency.
MaxRequestWaitTime time.Duration `json:"maxRequestWaitTime" yaml:"maxRequestWaitTime"`
}
RateLimitConfig configures the token bucket rate limiter for alert processing.
Token Bucket Algorithm ¶
The rate limiter uses a token bucket algorithm, which can be visualized as a bucket that holds tokens. Each alert consumes one token from the bucket. The bucket has two key properties:
- Capacity (AllowedBurst): The maximum number of tokens the bucket can hold.
- Refill Rate (AlertsPerSecond): How fast tokens are added back to the bucket.
How It Works ¶
1. The bucket starts full with AllowedBurst tokens.
2. Each alert removes one token from the bucket, so a request carrying N alerts consumes N tokens.
3. Tokens are continuously added at the AlertsPerSecond rate, up to the maximum capacity.
4. If the bucket does not hold enough tokens, the request waits for tokens to be added (for at most MaxRequestWaitTime).
Example Configuration ¶
With AlertsPerSecond=1 and AllowedBurst=30:
- A burst of up to 30 alerts can be sent immediately (draining the bucket).
- After the burst, alerts are limited to 1 per second (the refill rate).
- If no alerts are sent for 30 seconds, an empty bucket refills completely (30 tokens at 1 token per second).
Choosing Values ¶
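- AllowedBurst: Set this to handle expected traffic spikes. For example, if your monitoring system can send 20 alerts at once during an incident, set AllowedBurst >= 20. Because AllowedBurst also caps the number of alerts in a single request, it must be at least as large as your largest batch.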
- AlertsPerSecond: Set this to the sustained rate you want to allow. A value of 1 means one alert per second during continuous traffic.
The combination allows short bursts while preventing sustained flooding.
type RoutingRule ¶
type RoutingRule struct {
// Name is a unique identifier for the rule, used in logging and debugging.
// Does not affect routing behavior. Required and must be unique across all rules.
Name string `json:"name" yaml:"name"`
// Description provides human-readable documentation for the rule.
// Does not affect routing behavior. Optional.
Description string `json:"description" yaml:"description"`
// AlertType restricts this rule to alerts of a specific type. The value is
// case-insensitive and matched against the alert's type field. Common values
// include "security", "compliance", "metrics", "infrastructure".
//
// When multiple rules match a route key, rules with a matching AlertType take
// precedence over rules with no AlertType. Leave empty to match all alert types.
AlertType string `json:"alertType" yaml:"alertType"`
// Equals lists exact route key values that this rule matches. Matching is
// case-insensitive. The rule matches if the alert's route key equals any value
// in this list. Exact matches have the highest precedence.
Equals []string `json:"equals" yaml:"equals"`
// HasPrefix lists route key prefixes that this rule matches. Matching is
// case-insensitive. The rule matches if the alert's route key starts with any
// value in this list. Prefix matches have second-highest precedence.
HasPrefix []string `json:"hasPrefix" yaml:"hasPrefix"`
// MatchesRegex lists regular expressions that this rule matches against route keys.
// Patterns are automatically made case-insensitive ((?i) is prepended if not present).
// The rule matches if the alert's route key matches any pattern. Regex matches
// have third-highest precedence.
MatchesRegex []string `json:"matchesRegex" yaml:"matchesRegex"`
// MatchAll makes this rule match any route key, regardless of value. Used to
// create catch-all rules that handle alerts not matched by more specific rules.
// Match-all rules have the lowest precedence: they only match when no other
// rule matches. Only one catch-all rule per AlertType is typically needed.
MatchAll bool `json:"matchAll" yaml:"matchAll"`
// Channel is the Slack channel ID where matching alerts are sent. Must be a valid
// channel ID (e.g., "C1234567890"), not a channel name. Required.
Channel string `json:"channel" yaml:"channel"`
// contains filtered or unexported fields
}
RoutingRule defines a single rule for mapping alert route keys to Slack channels.
Matching Behavior ¶
A rule matches an alert's route key using one of these mechanisms (in precedence order):
- Exact match (Equals): The route key exactly matches one of the Equals values
- Prefix match (HasPrefix): The route key starts with one of the HasPrefix values
- Regex match (MatchesRegex): The route key matches one of the regular expressions
- Match-all (MatchAll): The rule matches any route key (used for catch-all rules)
All matching is case-insensitive. A rule must have at least one matching mechanism configured (Equals, HasPrefix, MatchesRegex, or MatchAll).
AlertType Filtering ¶
If AlertType is set, the rule only matches alerts with that specific type. Rules with a matching AlertType take precedence over rules with no AlertType at the same precedence level. This allows routing different alert types to different channels even when they share the same route key.
Example Configuration ¶
routingRules:
  - name: "security-alerts"
    alertType: "security"
    hasPrefix: ["prod-", "staging-"]
    channel: "C1234567890" # #security-alerts
  - name: "prod-metrics"
    equals: ["prod-metrics", "production"]
    channel: "C2345678901" # #prod-monitoring
  - name: "catch-all"
    matchAll: true
    channel: "C3456789012" # #alerts-general
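The matching and AlertType precedence rules above can be sketched in Go using the rules from the example configuration. This is a minimal illustration under stated assumptions, not the package's implementation: the rule type is a hypothetical mirror of RoutingRule's matching fields, ties between rules at the same level and with the same AlertType status are assumed to go to the first rule in the list, and regular expressions are compiled on every call for brevity.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule is a hypothetical mirror of RoutingRule's matching fields.
type rule struct {
	Name         string
	AlertType    string
	Equals       []string
	HasPrefix    []string
	MatchesRegex []string
	MatchAll     bool
	Channel      string
}

// Precedence levels; a lower value wins.
const (
	levelExact = iota
	levelPrefix
	levelRegex
	levelMatchAll
	levelNone
)

// matchLevel returns the highest-precedence level at which r matches routeKey.
func matchLevel(r rule, routeKey string) int {
	for _, v := range r.Equals {
		if strings.EqualFold(v, routeKey) {
			return levelExact
		}
	}
	lowerKey := strings.ToLower(routeKey)
	for _, p := range r.HasPrefix {
		if strings.HasPrefix(lowerKey, strings.ToLower(p)) {
			return levelPrefix
		}
	}
	for _, pattern := range r.MatchesRegex {
		if !strings.HasPrefix(pattern, "(?i)") {
			pattern = "(?i)" + pattern // make the pattern case-insensitive
		}
		// Invalid patterns are skipped here; a real loader would reject them up front.
		if re, err := regexp.Compile(pattern); err == nil && re.MatchString(routeKey) {
			return levelRegex
		}
	}
	if r.MatchAll {
		return levelMatchAll
	}
	return levelNone
}

// route returns the channel of the winning rule. Lower levels win; at the same
// level, a rule whose AlertType matches beats a rule with no AlertType.
func route(rules []rule, routeKey, alertType string) (string, bool) {
	bestLevel, bestTyped, channel := levelNone, false, ""
	for _, r := range rules {
		typed := r.AlertType != ""
		if typed && !strings.EqualFold(r.AlertType, alertType) {
			continue // restricted to a different alert type
		}
		lvl := matchLevel(r, routeKey)
		if lvl == levelNone {
			continue
		}
		if lvl < bestLevel || (lvl == bestLevel && typed && !bestTyped) {
			bestLevel, bestTyped, channel = lvl, typed, r.Channel
		}
	}
	return channel, bestLevel != levelNone
}

func main() {
	// The rules from the example configuration.
	rules := []rule{
		{Name: "security-alerts", AlertType: "security", HasPrefix: []string{"prod-", "staging-"}, Channel: "C1234567890"},
		{Name: "prod-metrics", Equals: []string{"prod-metrics", "production"}, Channel: "C2345678901"},
		{Name: "catch-all", MatchAll: true, Channel: "C3456789012"},
	}
	fmt.Println(route(rules, "PROD-api", "security"))     // C1234567890 true
	fmt.Println(route(rules, "Production", "metrics"))    // C2345678901 true
	fmt.Println(route(rules, "prod-api", "metrics"))      // C3456789012 true
	fmt.Println(route(rules, "prod-metrics", "security")) // C2345678901 true
}
```

Under these assumptions, a "prod-metrics" route key with alert type "security" goes to C2345678901: the exact match outranks the AlertType-specific prefix match, because AlertType only breaks ties within the same precedence level.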
type SlackClientConfig ¶
type SlackClientConfig struct {
// AppToken is the Slack App-Level Token (xapp-...) used for Socket Mode connections.
// This token enables real-time event handling including reactions, mentions, and
// interactive components. Obtain from: Slack App Settings > Basic Information >
// App-Level Tokens.
AppToken string `json:"appToken" yaml:"appToken"`
// BotToken is the Slack Bot User OAuth Token (xoxb-...) used for API operations.
// This token is used for posting messages, reading channels, managing reactions,
// and other bot actions. Obtain from: Slack App Settings > OAuth & Permissions.
BotToken string `json:"botToken" yaml:"botToken"`
// DebugLogging enables verbose logging of all Slack API requests and responses.
// Useful for debugging but should be disabled in production as it may expose
// sensitive data in logs.
DebugLogging bool `json:"debugLogging" yaml:"debugLogging"`
// DryRun prevents actual Slack API calls when true. The client logs intended
// actions without executing them. Useful for testing configuration and validating
// message formatting without affecting real Slack channels.
DryRun bool `json:"dryRun" yaml:"dryRun"`
// Concurrency sets the maximum number of concurrent Slack API requests.
// Higher values increase throughput but may trigger more rate limit responses.
// Default: 3. Must be between 1 and 50.
Concurrency int `json:"concurrency" yaml:"concurrency"`
// MaxAttemptsForRateLimitError sets how many times to retry after receiving
// a rate limit (HTTP 429) response. Slack rate limits are usually short-lived
// (seconds to minutes). Default: 10. Must be between 1 and 100.
MaxAttemptsForRateLimitError int `json:"maxAttemptsForRateLimitError" yaml:"maxAttemptsForRateLimitError"`
// MaxAttemptsForTransientError sets how many times to retry after transient
// errors like network timeouts or 5xx server errors. Default: 5. Must be
// between 1 and 100.
MaxAttemptsForTransientError int `json:"maxAttemptsForTransientError" yaml:"maxAttemptsForTransientError"`
// MaxAttemptsForFatalError sets how many times to retry after fatal errors
// (4xx responses except 429). These rarely succeed on retry but a few attempts
// help with edge cases. Default: 5. Must be between 1 and 100.
MaxAttemptsForFatalError int `json:"maxAttemptsForFatalError" yaml:"maxAttemptsForFatalError"`
// MaxRateLimitErrorWaitTimeSeconds caps the wait time between rate limit retries.
// Even if Slack's Retry-After header suggests longer, we won't wait more than
// this many seconds. Default: 120. Must be between 1 and 600.
MaxRateLimitErrorWaitTimeSeconds int `json:"maxRateLimitErrorWaitTimeSeconds" yaml:"maxRateLimitErrorWaitTimeSeconds"`
// MaxTransientErrorWaitTimeSeconds caps the wait time between transient error
// retries using exponential backoff. Default: 30. Must be between 1 and 600.
MaxTransientErrorWaitTimeSeconds int `json:"maxTransientErrorWaitTimeSeconds" yaml:"maxTransientErrorWaitTimeSeconds"`
// MaxFatalErrorWaitTimeSeconds caps the wait time between fatal error retries.
// Kept short since fatal errors rarely resolve on their own. Default: 30.
// Must be between 1 and 600.
MaxFatalErrorWaitTimeSeconds int `json:"maxFatalErrorWaitTimeSeconds" yaml:"maxFatalErrorWaitTimeSeconds"`
// HTTPTimeoutSeconds sets the HTTP client timeout for Slack API requests.
// This should be long enough to accommodate slow responses but short enough
// to fail fast on unresponsive endpoints. Default: 30. Must be between 1 and 300.
HTTPTimeoutSeconds int `json:"httpTimeoutSeconds" yaml:"httpTimeoutSeconds"`
}
SlackClientConfig holds configuration for the Slack API client used by both the API server and Manager service.
Authentication ¶
Two tokens are required for full functionality:
- AppToken (xapp-...): Used for Socket Mode connections to receive real-time events
- BotToken (xoxb-...): Used for API calls like posting messages and reading channels
Both tokens are obtained from the Slack App configuration at https://api.slack.com/apps.
Retry Behavior ¶
The client retries failed requests automatically; how long it waits between attempts and how many attempts it makes depend on the error type:
- Rate limit errors (429): Retried after the delay given by Slack's Retry-After header, capped at MaxRateLimitErrorWaitTimeSeconds, for up to MaxAttemptsForRateLimitError attempts.
- Transient errors (5xx, timeouts): Retried with exponential backoff, capped at MaxTransientErrorWaitTimeSeconds, for up to MaxAttemptsForTransientError attempts.
- Fatal errors (4xx except 429): Retried sparingly (up to MaxAttemptsForFatalError attempts, with waits capped at MaxFatalErrorWaitTimeSeconds), since these usually indicate permanent problems such as invalid parameters.
Concurrency ¶
The Concurrency setting controls how many parallel Slack API requests can be made. Higher values increase throughput but risk triggering rate limits. The default of 3 is conservative; increase only if you need higher throughput and are prepared to handle more rate limit responses.
Debug Mode ¶
When DebugLogging is true, all Slack API requests and responses are logged at debug level. This is useful for troubleshooting but should be disabled in production as it may log sensitive information.
Dry Run Mode ¶
When DryRun is true, the client logs what it would do without actually calling the Slack API. Useful for testing configuration and message formatting without sending real messages.
func NewDefaultSlackClientConfig ¶
func NewDefaultSlackClientConfig() *SlackClientConfig
NewDefaultSlackClientConfig returns a SlackClientConfig populated with sensible defaults.
The defaults are configured for typical production use:
- Concurrency: 3 (balanced throughput vs rate limits)
- Rate limit retries: 10 attempts, max 120s wait
- Transient error retries: 5 attempts, max 30s wait
- Fatal error retries: 5 attempts, max 30s wait
- HTTP timeout: 30 seconds
The AppToken and BotToken are intentionally left empty and must be set before use.
func (*SlackClientConfig) SetDefaults ¶
func (c *SlackClientConfig) SetDefaults()
SetDefaults sets default values for any numeric fields that have zero or negative values. This is useful when the config is loaded from a file or environment variables where some fields may not be specified.
This method does not set defaults for required fields like AppToken and BotToken, as those must be explicitly configured.
func (*SlackClientConfig) Validate ¶
func (c *SlackClientConfig) Validate() error
Validate checks that all required fields are present and all values are within acceptable ranges. It returns a descriptive error for the first validation failure encountered, or nil if the configuration is valid.
Validation includes:
- AppToken: must not be empty
- BotToken: must not be empty
- Concurrency: must be between 1 and 50
- All MaxAttemptsFor* fields: must be between 1 and 100
- All Max*WaitTimeSeconds fields: must be between 1 and 600 seconds
- HTTPTimeoutSeconds: must be between 1 and 300 seconds
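The documented defaults and validation ranges can be summarized in one sketch. It uses a hypothetical slackConfig type that mirrors SlackClientConfig's fields rather than the real type, so it makes no assumptions about the package's import path; calling setDefaults on a zero value yields the values listed for NewDefaultSlackClientConfig. The error messages and the check order are assumptions that follow the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// slackConfig is a hypothetical mirror of SlackClientConfig's fields.
type slackConfig struct {
	AppToken, BotToken               string
	Concurrency                      int
	MaxAttemptsForRateLimitError     int
	MaxAttemptsForTransientError     int
	MaxAttemptsForFatalError         int
	MaxRateLimitErrorWaitTimeSeconds int
	MaxTransientErrorWaitTimeSeconds int
	MaxFatalErrorWaitTimeSeconds     int
	HTTPTimeoutSeconds               int
}

// setDefaults fills zero or negative numeric fields with the documented defaults.
// Tokens are left untouched because they must be configured explicitly.
func (c *slackConfig) setDefaults() {
	def := func(v *int, d int) {
		if *v <= 0 {
			*v = d
		}
	}
	def(&c.Concurrency, 3)
	def(&c.MaxAttemptsForRateLimitError, 10)
	def(&c.MaxAttemptsForTransientError, 5)
	def(&c.MaxAttemptsForFatalError, 5)
	def(&c.MaxRateLimitErrorWaitTimeSeconds, 120)
	def(&c.MaxTransientErrorWaitTimeSeconds, 30)
	def(&c.MaxFatalErrorWaitTimeSeconds, 30)
	def(&c.HTTPTimeoutSeconds, 30)
}

// validate returns an error for the first failing check, or nil.
func (c *slackConfig) validate() error {
	if c.AppToken == "" {
		return errors.New("appToken is required")
	}
	if c.BotToken == "" {
		return errors.New("botToken is required")
	}
	checks := []struct {
		name      string
		v, lo, hi int
	}{
		{"concurrency", c.Concurrency, 1, 50},
		{"maxAttemptsForRateLimitError", c.MaxAttemptsForRateLimitError, 1, 100},
		{"maxAttemptsForTransientError", c.MaxAttemptsForTransientError, 1, 100},
		{"maxAttemptsForFatalError", c.MaxAttemptsForFatalError, 1, 100},
		{"maxRateLimitErrorWaitTimeSeconds", c.MaxRateLimitErrorWaitTimeSeconds, 1, 600},
		{"maxTransientErrorWaitTimeSeconds", c.MaxTransientErrorWaitTimeSeconds, 1, 600},
		{"maxFatalErrorWaitTimeSeconds", c.MaxFatalErrorWaitTimeSeconds, 1, 600},
		{"httpTimeoutSeconds", c.HTTPTimeoutSeconds, 1, 300},
	}
	for _, ch := range checks {
		if ch.v < ch.lo || ch.v > ch.hi {
			return fmt.Errorf("%s must be between %d and %d, got %d", ch.name, ch.lo, ch.hi, ch.v)
		}
	}
	return nil
}

func main() {
	c := &slackConfig{AppToken: "xapp-example", BotToken: "xoxb-example", Concurrency: 80}
	c.setDefaults()           // 80 is positive, so it is kept
	fmt.Println(c.validate()) // concurrency must be between 1 and 50, got 80

	c.Concurrency = 0
	c.setDefaults()           // 0 is replaced by the default of 3
	fmt.Println(c.validate()) // <nil>
}
```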