config

package
v0.2.0-beta.4

Warning: This package is not in the latest version of its module.
Published: May 3, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Constants

const SelfServiceName = "otelcontext"

SelfServiceName is the OTel service.name attribute the binary attaches to its own self-instrumentation spans. Mirrors the literal in main.initTracerProvider — keep the two in sync.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	Env               string
	LogLevel          string
	HTTPPort          string
	GRPCPort          string
	DBDriver          string
	DBDSN             string
	DLQPath           string
	DLQReplayInterval string

	// Ingestion Filtering
	IngestMinSeverity      string
	IngestAllowedServices  string
	IngestExcludedServices string

	// Storage Filtering. Logs that pass IngestMinSeverity (so they reach the
	// receiver and feed in-memory consumers like vectordb / GraphRAG) but
	// fall below StoreMinSeverity are skipped during the DB persist pass —
	// only the row-write is dropped, not the in-memory enrichment. Empty
	// (default) means StoreMinSeverity == IngestMinSeverity, i.e. no
	// behavior change vs. the single-threshold semantics.
	StoreMinSeverity string

	// DB Connection Pool
	DBMaxOpenConns    int
	DBMaxIdleConns    int
	DBConnMaxLifetime string // e.g. "1h", "30m"

	// Postgres-only opt-in: declarative range partitioning of the logs table by
	// day. When set to "daily", AutoMigrate provisions logs as a partitioned
	// table and the PartitionScheduler creates lookahead partitions and drops
	// expired ones (DROP PARTITION beats DELETE for retention by orders of
	// magnitude). Greenfield only — startup refuses if `logs` already exists
	// as a non-partitioned table. Empty / "none" = legacy unpartitioned schema.
	DBPostgresPartitioning string

	// Number of future daily partitions to maintain ahead of "today" when
	// DBPostgresPartitioning=daily. Defaults to 3. Tune up if your retention
	// policy is short and ingest spikes around a daily boundary.
	DBPartitionLookaheadDays int

	// Retention
	HotRetentionDays int

	// Retention tuning. Defaults (batch=50000, sleep=1ms) work for Postgres at
	// 100k logs/sec sustained. Lower on resource-constrained hosts; raise on
	// dedicated DB machines. 0/negative values use defaults.
	RetentionBatchSize    int
	RetentionBatchSleepMs int

	// TSDB
	TSDBRingBufferDuration string // e.g. "1h"

	// Smart Observability — Adaptive Sampling
	SamplingRate               float64
	SamplingAlwaysOnErrors     bool
	SamplingLatencyThresholdMs int

	// Smart Observability — Metric Cardinality
	MetricAttributeKeys  string // comma-separated allowlist
	MetricMaxCardinality int

	// Per-tenant cardinality cap. 0 = unlimited (only the global cap
	// applies, preserving legacy single-tenant behavior). Setting this
	// gives every tenant its own series budget so a noisy tenant cannot
	// starve siblings of fresh series in the in-memory TSDB. The global
	// cap (MetricMaxCardinality) remains a backstop and is checked
	// after the per-tenant cap.
	MetricMaxCardinalityPerTenant int

	// DLQ Safety
	DLQMaxFiles   int
	DLQMaxDiskMB  int
	DLQMaxRetries int
	// DLQMaxReplayPerTick caps how many DLQ files the replay worker attempts
	// in a single tick. Without it, an outage that filled the DLQ with 10k
	// files would replay all of them in the first post-restart tick,
	// hammering the (just-restarted) DB and exhausting connections.
	// 0 = unlimited (legacy default).
	DLQMaxReplayPerTick int

	// API Protection
	APIRateLimitRPS int

	// MCP Server
	MCPEnabled bool
	MCPPath    string
	// MCPMaxConcurrent caps the in-flight tools/call invocations server-wide.
	// Beyond this, callers receive a JSON-RPC server-overloaded error. <=0
	// disables the cap. Default 32 — sized for tight agent polling loops
	// without overrunning the GraphRAG in-memory store.
	MCPMaxConcurrent int
	// MCPCallTimeoutMs is the per-invocation deadline for tools/call. A tool
	// that exceeds it gets cancelled and the client receives an RPC timeout
	// error. <=0 disables the deadline. Default 30000 (30s).
	MCPCallTimeoutMs int
	// MCPCacheTTLMs is the lifetime of a memoized tool result for the cheap
	// in-memory GraphRAG tools (get_service_map, impact_analysis, etc.).
	// <=0 disables caching. Default 5000 (5s).
	MCPCacheTTLMs int

	// Compression
	CompressionLevel string // "default", "fast", "best"

	// Vector Index
	VectorIndexMaxEntries int

	// VectorIndexSnapshotPath is the on-disk location for periodic vectordb
	// snapshots. When empty, persistence is disabled and the index rebuilds
	// from DB on every restart (legacy behavior). Default
	// "data/vectordb.snapshot".
	VectorIndexSnapshotPath string

	// VectorIndexSnapshotInterval, e.g. "5m". When set and
	// VectorIndexSnapshotPath is non-empty, the index serializes its state
	// to disk on this cadence. "0" / empty disables periodic writes (a
	// final snapshot still fires on graceful shutdown). Default "5m".
	VectorIndexSnapshotInterval string

	// LogFTSEnabled toggles SQLite FTS5 provisioning + querying. The FTS5
	// inverted index typically consumes 30-40% of SQLite DB disk for
	// log-heavy workloads, while the LIKE fallback (log_repo.go:105) keeps
	// search_logs functional without it. Default false; opt in with
	// LOG_FTS_ENABLED=true. Only meaningful on SQLite; Postgres uses pg_trgm
	// independently of this flag.
	LogFTSEnabled bool

	// GraphRAG worker count (background consumers of the ingestion event channel).
	// Defaults to 4 if unset or <=0. Increase under sustained high ingest.
	GraphRAGWorkerCount int

	// GraphRAG event channel buffer size. Defaults to 10000 if unset or <=0.
	GraphRAGEventQueueSize int

	// Async ingest pipeline (Phase 1 robustness work). Decouples OTLP Export
	// from synchronous DB writes. When enabled, Export() returns as soon as
	// the parsed batch is enqueued; persistence runs on a worker pool.
	//
	// Backpressure is hybrid:
	//   <90% queue       — accept all
	//   90%-100% queue   — drop healthy batches (silent), errors/slow always pass
	//   100% queue       — return RESOURCE_EXHAUSTED so OTLP clients back off
	IngestAsyncEnabled      bool // default true; opt out via INGEST_ASYNC_ENABLED=false
	IngestPipelineQueueSize int  // default 50000 batches; per-deployment tunable
	IngestPipelineWorkers   int  // default 8 worker goroutines
	// IngestPipelinePerTenantCap caps in-flight batches per tenant so a noisy
	// tenant cannot starve siblings of fresh queue slots when fullness is
	// below the soft-backpressure threshold. 0 (default) disables — single-
	// tenant deployments need no cap. Operators on multi-tenant deployments
	// should set INGEST_PIPELINE_PER_TENANT_CAP to roughly Capacity/N where
	// N is the expected number of concurrently-active tenants, with some
	// headroom (e.g. 2× the fair-share value) for short bursts.
	IngestPipelinePerTenantCap int

	// TLS (HTTP + gRPC). When both paths are set, TLS is enabled on both servers.
	// Empty values (default) keep plaintext behavior.
	TLSCertFile string
	TLSKeyFile  string

	// TLSAutoSelfsigned enables zero-friction self-signed TLS bootstrap for dev /
	// internal deployments. Ignored when TLSCertFile/TLSKeyFile are set (explicit
	// cert-file mode wins). Generated material is cached under TLSCacheDir.
	TLSAutoSelfsigned bool
	TLSCacheDir       string

	// API key authentication. When empty, auth middleware is a pass-through.
	// Loaded from API_KEY env var — never logged.
	APIKey string

	// OTelExporterEndpoint enables self-instrumentation. When set, the platform
	// exports its own spans to the configured OTLP endpoint (e.g. "localhost:4317"
	// for self-ingest, or an external collector).
	OTelExporterEndpoint string

	// DefaultTenant is the tenant ID assigned to rows ingested without an explicit
	// X-Tenant-ID header (HTTP) / x-tenant-id gRPC metadata.
	DefaultTenant string

	// OTLPTrustResourceTenant enables resolving the tenant from the OTLP
	// `tenant.id` resource attribute when no transport-level tenant header
	// was provided. Disabled by default because resource attributes are
	// client-controlled — a compromised SDK could set tenant.id to forge
	// another tenant's data. Only turn this on in closed environments where
	// all OTLP producers are trusted.
	OTLPTrustResourceTenant bool

	// APITenantKeysFile, when non-empty, switches API auth from a single
	// shared API_KEY into per-tenant bearer tokens. The file contains one
	// `key=tenant` pair per line; the matched key's tenant OVERRIDES any
	// X-Tenant-ID header so callers cannot cross tenants. Empty = disabled
	// (legacy shared-key mode remains available for single-tenant dev).
	APITenantKeysFile string

	// DevMode disables origin checks for WebSocket and enables dev-friendly defaults.
	// Derived from APP_ENV == "development".
	DevMode bool

	// gRPC server tuning — protects against huge OTLP batches and connection abuse.
	GRPCMaxRecvMB            int
	GRPCMaxConcurrentStreams int

	// AllowSqliteProd lets operators explicitly acknowledge that SQLite is
	// being used outside dev/test. Without it, a production Env + SQLite
	// combination refuses to start.
	AllowSqliteProd bool

	// WSMaxClients caps simultaneous WebSocket connections to /ws*
	// endpoints. 0 = unlimited (default). When set, new connections past
	// the cap receive HTTP 503. Sized for the operator's expected dashboard
	// audience — small for ops dashboards, larger for read-heavy public UIs.
	WSMaxClients int
}
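
The hybrid backpressure rules in the comment above IngestAsyncEnabled can be sketched as a small admission function. This is a minimal illustration, not the package's implementation: the Decision type, the admit function, and the "important" flag (standing in for "errors/slow always pass") are all hypothetical names, and the 90% threshold is taken directly from the comment.

```go
package main

import "fmt"

// Decision is a hypothetical outcome type for one admission check.
type Decision int

const (
	Accept Decision = iota // enqueue the batch
	Drop                   // silently discard a healthy batch
	Reject                 // caller should return RESOURCE_EXHAUSTED
)

func (d Decision) String() string {
	return [...]string{"accept", "drop", "reject"}[d]
}

// admit applies the documented policy to a queue holding `queued` of
// `capacity` batches. `important` marks a batch that carries errors or
// slow spans; such batches pass the soft threshold.
func admit(queued, capacity int, important bool) Decision {
	// 100% queue: push back so OTLP clients back off.
	if capacity <= 0 || queued >= capacity {
		return Reject
	}
	// 90%-100% queue: shed healthy batches only. Integer math avoids floats.
	if queued*10 >= capacity*9 && !important {
		return Drop
	}
	// <90% queue, or an important batch in the soft band.
	return Accept
}

func main() {
	const capacity = 50000 // documented default for IngestPipelineQueueSize
	fmt.Println(admit(10000, capacity, false))
	fmt.Println(admit(46000, capacity, false))
	fmt.Println(admit(46000, capacity, true))
	fmt.Println(admit(50000, capacity, true))
}
```

The sketch checks the hard limit first, so even error-carrying batches are rejected once the queue is completely full, which matches the third row of the documented table.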

func Load

func Load(customPath string) (*Config, error)

func (*Config) GuardSelfInstrumentation

func (c *Config) GuardSelfInstrumentation()

GuardSelfInstrumentation prevents an amplification loop when OTEL_EXPORTER_OTLP_ENDPOINT points at the binary's own gRPC port. Without this, every span the OTel SDK emits would re-enter Export, generate more spans (one per Export call), and re-enter again — unbounded fan-out.

Strategy: when the configured endpoint resolves to a loopback address, the binary's own service name (SelfServiceName) is automatically added to IngestExcludedServices so the ingest filter drops self-emitted batches. Operators can still override by setting the variable explicitly; the guard only adds entries and never removes them.

No-op when self-instrumentation is disabled (empty endpoint) or the endpoint is non-loopback (a separate collector, the operator's responsibility).
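
The guard described above might look roughly like the following. This is a sketch under stated assumptions, not the package's code: isLoopbackEndpoint and guardExcluded are hypothetical helpers, IngestExcludedServices is assumed to be a comma-separated list, and loopback detection here covers only "localhost" and literal loopback IPs, with no DNS lookup.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// selfServiceName mirrors the documented SelfServiceName constant.
const selfServiceName = "otelcontext"

// isLoopbackEndpoint reports whether endpoint targets the local machine.
// An optional scheme prefix such as "http://" is stripped first.
func isLoopbackEndpoint(endpoint string) bool {
	if i := strings.Index(endpoint, "://"); i >= 0 {
		endpoint = endpoint[i+3:]
	}
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		host = endpoint // no port present; treat the whole value as the host
	}
	if strings.EqualFold(host, "localhost") {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// guardExcluded returns the exclusion list with selfServiceName appended
// when endpoint is loopback. It only adds; existing entries are untouched.
func guardExcluded(endpoint, excluded string) string {
	if endpoint == "" || !isLoopbackEndpoint(endpoint) {
		return excluded // disabled or external collector: no-op
	}
	for _, s := range strings.Split(excluded, ",") {
		if strings.TrimSpace(s) == selfServiceName {
			return excluded // already present
		}
	}
	if strings.TrimSpace(excluded) == "" {
		return selfServiceName
	}
	return excluded + "," + selfServiceName
}

func main() {
	fmt.Println(guardExcluded("localhost:4317", "billing"))
	fmt.Println(guardExcluded("collector.example.com:4317", "billing"))
}
```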

func (*Config) TLSCertFileMode

func (c *Config) TLSCertFileMode() bool

TLSCertFileMode reports whether explicit cert-file TLS is configured. This path has precedence over self-signed.

func (*Config) TLSEnabled

func (c *Config) TLSEnabled() bool

TLSEnabled reports whether HTTPS + gRPC-TLS should be served using any mode (explicit files or auto self-signed).

func (*Config) TLSSelfsignedMode

func (c *Config) TLSSelfsignedMode() bool

TLSSelfsignedMode reports whether the self-signed bootstrap path should be used. False when explicit cert files are set (cert-file wins).
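
The precedence between the three TLS predicates can be illustrated with a small stand-in type. This is a sketch, not the package's source: tlsSettings and its methods are hypothetical, and it assumes that cert-file mode requires both paths to be set (the struct comment says "when both paths are set"), so a lone cert or key path does not count.

```go
package main

import "fmt"

// tlsSettings is a hypothetical stand-in for the TLS-related Config fields.
type tlsSettings struct {
	CertFile       string
	KeyFile        string
	AutoSelfsigned bool
}

// certFileMode: explicit files are configured (assumed to need both paths).
func (t tlsSettings) certFileMode() bool {
	return t.CertFile != "" && t.KeyFile != ""
}

// selfsignedMode: auto bootstrap is requested and cert-file mode is not
// active, because explicit cert files take precedence.
func (t tlsSettings) selfsignedMode() bool {
	return t.AutoSelfsigned && !t.certFileMode()
}

// enabled: TLS is served when either mode is active.
func (t tlsSettings) enabled() bool {
	return t.certFileMode() || t.selfsignedMode()
}

func main() {
	both := tlsSettings{CertFile: "server.crt", KeyFile: "server.key", AutoSelfsigned: true}
	fmt.Println(both.certFileMode(), both.selfsignedMode(), both.enabled())

	auto := tlsSettings{AutoSelfsigned: true}
	fmt.Println(auto.certFileMode(), auto.selfsignedMode(), auto.enabled())
}
```

With both explicit files and the self-signed flag set, only cert-file mode is active, which is the "cert-file wins" rule stated above.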

func (*Config) Validate

func (c *Config) Validate() error

Validate checks that all configuration values are within valid ranges. Call this once after Load() during startup to catch misconfiguration early.

func (*Config) ValidateDBForEnv

func (c *Config) ValidateDBForEnv() error

ValidateDBForEnv refuses the combination of SQLite driver + production environment unless AllowSqliteProd is explicitly set. SQLite's single-writer lock caps sustained throughput to ~5 services; using it in production will silently throttle ingestion.

Call once during startup after Load + Validate.
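
The refusal rule can be sketched as a pure function. This is illustrative only: validateDBForEnv is a hypothetical helper, and the literal values "production" and "sqlite" are assumptions, since the documentation does not list the accepted Env or DBDriver strings.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errSqliteInProd is a hypothetical sentinel error for the refused combination.
var errSqliteInProd = errors.New("sqlite driver refused in production; set AllowSqliteProd to acknowledge")

// validateDBForEnv rejects SQLite in a production environment unless the
// operator has explicitly opted in. The "production" and "sqlite" literals
// are assumed values, not confirmed by the package documentation.
func validateDBForEnv(env, driver string, allowSqliteProd bool) error {
	isProd := strings.EqualFold(env, "production")
	isSqlite := strings.EqualFold(driver, "sqlite")
	if isProd && isSqlite && !allowSqliteProd {
		return errSqliteInProd
	}
	return nil
}

func main() {
	fmt.Println(validateDBForEnv("production", "sqlite", false))
	fmt.Println(validateDBForEnv("production", "sqlite", true))
	fmt.Println(validateDBForEnv("development", "sqlite", false))
}
```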
